knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

pfGO

DOI

The goal of pfGO is to package handy functions and datasets specifically enabling Plasmodium falciparum functional enrichment analyses. pfGO acts as a wrapper around the topGO package for much of its enrichment functionality, while also providing several functions for incorporating latest gene-ontology and functional annotations.

pfGO enables chaining together many parallel enrichment-analyses at once, generating thorough logs and outputs supporting reproducible analyses.

Installation

You can install pfGO from GitHub with:

# install.packages("devtools")
devtools::install_github("oberstal/pfGO")

# or if authenticating via ssh:
devtools::install_git("https://github.com/oberstal/pfGO")

See individual functions/data objects for further documentation. E.g.:

?run.topGO.meta
?Pfal_geneID2GO

# for all available functions/data:
?pfGO
# then scroll to the bottom and click the "index" link.

Example (quick start)

This is a basic example demonstrating how to run an enrichment analysis on piggyBac pooled phenotypic screening results to identify processes enabling parasite survival of host fever (using data as published previously):

library(pfGO)
# load included pf GO database and example-data to be tested for functional enrichment
data(Pfal_geneID2GO)
data(exampleMydf)

# run the topGO pipeline on all experimental categories of interest from exampleMydf
run.topGO.meta(mydf = exampleMydf, geneID2GO = Pfal_geneID2GO, pval = 0.05)

View example console output here

Key functions and further documentation

run.topGO.meta

description:

Tests for functional enrichment in gene-categories of interest.

parameters:

outputs:

run.topGO.meta creates several output-files, including:

Primary results from run.topGO.meta will be in "Routput/GO/all.combined.GO.results.tsv". Note that run.topGO.meta will automatically create the Routput directory (and other required output directories nested in ./Routput) in your working directory for you if it does not exist.

more function details

The run.topGO.meta function:

Enrichments are performed by each ontology (molecular function, biological process, cellular compartment; MF, BP, and CC, respectively) sequentially on all groups of interest. Results are combined in the final output-table ("Routput/GO/all.combined.GO.results.tsv").

TopGO automatically accounts for genes that cannot be mapped to GO terms (or are mapped to terms with < 3 genes in the analysis) with "feasible genes" indicated in the topGO.log files in the "Routput/GO" folder.

Concepts for common use-cases:

RNAseq:

In an RNAseq analysis, common interest-categories might be "upregulated", "downregulated", and "neutral" genes. The gene universe would consist of all genes expressed above your threshold cutoffs (not necessarily all genes in the genome).

piggyBac screens:

In pooled piggyBac-mutant screening, common categories might be "sensitive", "tolerant", and "neutral". The gene universe would consist of all genes represented in your screened library of mutants (again, not all genes in the genome).

See the included data object exampleMydf as an example.

Using your own custom GO database:

A correctly formatted geneID2GO object is included for P. falciparum enrichment analyses (Pfal_geneID2GO). You may also provide your own, so long as it is a named character-vector of GO-terms (each vector named by geneID, with GO terms as each element).

You can use the included formatGOdb.curated() function to format a custom GO database from curated GeneDB annotations for several non-model organisms (or the formatGOdb() function to include all GO annotations, if you aren't picky about including automated electronic annotations). If you're studying a model organism, several annotations are already available through the AnnotationDbi bioconductor package that loads with topGO.

Example console output

Example console output generated running the quick-start example data (piggyBac pooled phenotypic screening results to identify processes enabling parasite survival of host fever (similar to as published previously):

# load included pf GO database and example-data to be tested for functional enrichment
data(Pfal_geneID2GO_curated)
data(exampleMydf)

# run the topGO pipeline on all experimental categories of interest from exampleMydf
run.topGO.meta(mydf = exampleMydf, geneID2GO = Pfal_geneID2GO_curated, pval = 0.1)
# if you've run the pipeline, your significant genes in significant terms per category of interest can be loaded:
sig.genes <- read.delim("Routput/GO/all.combined.sig.genes.per.sig.terms.tsv")


oberstal/pfGO documentation built on April 22, 2024, 7:15 a.m.