The goal of pfGO is to package handy functions and datasets for Plasmodium falciparum functional enrichment analyses. pfGO acts as a wrapper around the topGO package for much of its enrichment functionality, and also provides several functions for incorporating the latest gene-ontology and functional annotations.
pfGO enables chaining together many parallel enrichment analyses at once, generating thorough logs and outputs that support reproducible analyses.
You can install pfGO from GitHub with:
# install.packages("devtools")
devtools::install_github("oberstal/pfGO")
# or if authenticating via ssh:
devtools::install_git("https://github.com/oberstal/pfGO")
See individual functions/data objects for further documentation. E.g.:
?run.topGO.meta
?Pfal_geneID2GO
# for all available functions/data:
?pfGO
# then scroll to the bottom and click the "index" link.
This is a basic example demonstrating how to run an enrichment analysis on piggyBac pooled phenotypic screening results to identify processes enabling parasite survival of host fever (using data as published previously):
library(pfGO)
# load included pf GO database and example-data to be tested for functional enrichment
data(Pfal_geneID2GO)
data(exampleMydf)
# run the topGO pipeline on all experimental categories of interest from exampleMydf
run.topGO.meta(mydf = exampleMydf, geneID2GO = Pfal_geneID2GO, pval = 0.05)
View example console output here
Tests for functional enrichment in gene-categories of interest.
run.topGO.meta creates several output-files, including:
significant genes per significant term
plots of the GO-term hierarchy relevant to the analysis
Primary results from run.topGO.meta will be in "Routput/GO/all.combined.GO.results.tsv". Note that run.topGO.meta will automatically create the Routput directory (and other required output directories nested in ./Routput) in your working directory for you if it does not exist.
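Once the pipeline has finished, the combined results table can be read back into R for further filtering or plotting. The snippet below is a minimal sketch using only base R and the output path quoted above; it makes no assumptions about the table's columns and simply prints its structure if the file is present.

```r
# Path to the combined results table, relative to the working directory
results.file <- file.path("Routput", "GO", "all.combined.GO.results.tsv")

if (file.exists(results.file)) {
  # Tab-separated file with a header row
  go.results <- read.delim(results.file, stringsAsFactors = FALSE)
  str(go.results)
} else {
  message("No results found at ", results.file,
          "; run run.topGO.meta() first.")
}
```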
The run.topGO.meta function:
Enrichment tests are run on each ontology in turn (molecular function, biological process, and cellular component; MF, BP, and CC, respectively) for all groups of interest. Results are combined in the final output table ("Routput/GO/all.combined.GO.results.tsv").
topGO automatically accounts for genes that cannot be mapped to GO terms (or that are mapped to terms with fewer than 3 genes in the analysis); the "feasible genes" are indicated in the topGO.log files in the "Routput/GO" folder.
Concepts for common use-cases:
RNAseq:
In an RNAseq analysis, common interest-categories might be "upregulated", "downregulated", and "neutral" genes. The gene universe would consist of all genes expressed above your threshold cutoffs (not necessarily all genes in the genome).
piggyBac screens:
In pooled piggyBac-mutant screening, common categories might be "sensitive", "tolerant", and "neutral". The gene universe would consist of all genes represented in your screened library of mutants (again, not all genes in the genome).
See the included data object exampleMydf as an example.
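As a rough illustration of the RNAseq case, the sketch below assigns each gene in a differential-expression table to an interest category. The gene IDs, values, cutoffs, and column names are all hypothetical. The two-column layout (gene ID, then category) is an assumption about what run.topGO.meta expects, so compare it against head(exampleMydf) before using it.

```r
# Hypothetical differential-expression results; IDs and values are made up.
# Every gene listed here is part of the gene universe (i.e. it passed your
# expression cutoffs), whether or not it ends up in a category of interest.
de <- data.frame(
  geneID = c("gene1", "gene2", "gene3", "gene4", "gene5"),
  log2FC = c(2.3, -1.8, 0.1, 1.2, -0.4),
  padj   = c(0.001, 0.01, 0.80, 0.20, 0.03),
  stringsAsFactors = FALSE
)

# Illustrative cutoffs (assumptions; choose values suited to your data)
lfc.cutoff  <- 1
padj.cutoff <- 0.05

# Start with everything "neutral", then relabel significant genes
is.sig <- de$padj < padj.cutoff
de$category <- "neutral"
de$category[is.sig & de$log2FC >=  lfc.cutoff] <- "upregulated"
de$category[is.sig & de$log2FC <= -lfc.cutoff] <- "downregulated"

# Assumed layout: gene IDs in column 1, interest category in column 2
mydf <- de[, c("geneID", "category")]
table(mydf$category)
```

The same pattern applies to a piggyBac screen: replace the fold-change rule with whatever phenotype score defines "sensitive" and "tolerant", and keep every screened mutant's gene in the table.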
Using your own custom GO database:
A correctly formatted geneID2GO object is included for P. falciparum enrichment analyses (Pfal_geneID2GO). You may also provide your own, so long as it is a named list of character vectors (each vector named by geneID, with GO terms as its elements).
You can use the included formatGOdb.curated() function to format a custom GO database from curated GeneDB annotations for several non-model organisms (or the formatGOdb() function to include all GO annotations, if you aren't picky about including automated electronic annotations). If you're studying a model organism, several annotations are already available through the AnnotationDbi Bioconductor package that loads with topGO.
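If your annotations are not in a format the helper functions accept, a gene-to-GO mapping can also be assembled by hand. topGO's gene-to-GO format is a named list in which each element is a character vector of GO terms and each name is a gene ID. The sketch below builds one with base R from a long-format table; the gene IDs and their term assignments are placeholders, not real annotations.

```r
# Hypothetical long-format annotation table: one row per gene/GO-term pair.
annot <- data.frame(
  geneID = c("gene1", "gene1", "gene2", "gene3"),
  GO     = c("GO:0006412", "GO:0005840", "GO:0006412", "GO:0016020"),
  stringsAsFactors = FALSE
)

# split() groups the GO terms by gene ID, returning a named list
# with one character vector of GO terms per gene
my.geneID2GO <- split(annot$GO, annot$geneID)

str(my.geneID2GO)
```

An object built this way could then be passed as the geneID2GO argument in place of Pfal_geneID2GO.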
Example console output generated by running the quick-start example data (piggyBac pooled phenotypic screening results used to identify processes enabling parasite survival of host fever, similar to those published previously):
# load included pf GO database and example-data to be tested for functional enrichment
data(Pfal_geneID2GO_curated)
data(exampleMydf)
# run the topGO pipeline on all experimental categories of interest from exampleMydf
run.topGO.meta(mydf = exampleMydf, geneID2GO = Pfal_geneID2GO_curated, pval = 0.1)
# if you've run the pipeline, your significant genes in significant terms per category of interest can be loaded:
sig.genes <- read.delim("Routput/GO/all.combined.sig.genes.per.sig.terms.tsv")