In oberstal/pfGO: Plasmodium falciparum functional enrichment analysis tools

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

pfGO

The goal of pfGO is to package handy functions and datasets specifically enabling Plasmodium falciparum functional enrichment analyses. pfGO acts as a wrapper around the topGO package for much of its enrichment functionality, while also providing several functions for incorporating latest gene-ontology and functional annotations.

pfGO enables chaining together many parallel enrichment-analyses at once, generating thorough logs and outputs supporting reproducible analyses.

Installation

You can install pfGO from GitHub with:

# install.packages("devtools")
devtools::install_github("oberstal/pfGO")

# or if authenticating via ssh:
devtools::install_git("https://github.com/oberstal/pfGO")

See individual functions/data objects for further documentation. E.g.:

?run.topGO.meta
?Pfal_geneID2GO

# for all available functions/data:
?pfGO
# then scroll to the bottom and click the "index" link.

Example (quick start)

This is a basic example demonstrating how to run an enrichment analysis on piggyBac pooled phenotypic screening results to identify processes enabling parasite survival of host fever (using data as published previously):

library(pfGO)

# load included pf GO database and example-data to be tested for functional enrichment
data(Pfal_geneID2GO)
data(exampleMydf)

# run the topGO pipeline on all experimental categories of interest from exampleMydf
run.topGO.meta(mydf = exampleMydf, geneID2GO = Pfal_geneID2GO, pval = 0.05)

View example console output here

Key functions and further documentation

run.topGO.meta

description:

Tests for functional enrichment in gene-categories of interest.

parameters:

mydf: data frame with geneIDs in column 1, and interest-category classifications in column 2.
geneID2GO: a list of named vectors of GO IDs--one vector of GO-terms for each geneID.
pval: p-value threshold for significance. Defaults to 0.05.

outputs:

run.topGO.meta creates several output-files, including:

enrichment results
significant genes per significant term
- available in "Routput/GO/all.combined.sig.genes.per.sig.terms.tsv"
plots of the GO-term hierarchy relevant to the analysis
thorough log-files for each gene-category of interest tested against the background of all other genes in the analysis
system logs recording all package versions, etc.

Primary results from run.topGO.meta will be in "Routput/GO/all.combined.GO.results.tsv". Note that run.topGO.meta will automatically create the Routput directory (and other required output directories nested in ./Routput) in your working directory for you if it does not exist.

more function details

The run.topGO.meta function:

defines which genes are "interesting" and which should be defined as background for each category specified in mydf,
makes the GOdata object for topGO,
tests each category of interest for enriched GO-terms against all the other genes included in mydf (the "gene universe"),
and then outputs results to several tables (.tsv files that can be opened in Excel).

Enrichments are performed by each ontology (molecular function, biological process, cellular compartment; MF, BP, and CC, respectively) sequentially on all groups of interest. Results are combined in the final output-table ("Routput/GO/all.combined.GO.results.tsv").

TopGO automatically accounts for genes that cannot be mapped to GO terms (or are mapped to terms with < 3 genes in the analysis) with "feasible genes" indicated in the topGO.log files in the "Routput/GO" folder.

Concepts for common use-cases:

RNAseq:

In an RNAseq analysis, common interest-categories might be "upregulated", "downregulated", and "neutral" genes. The gene universe would consist of all genes expressed above your threshold cutoffs (not necessarily all genes in the genome).

piggyBac screens:

In pooled piggyBac-mutant screening, common categories might be "sensitive", "tolerant", and "neutral". The gene universe would consist of all genes represented in your screened library of mutants (again, not all genes in the genome).

See the included data object exampleMydf as an example.

Using your own custom GO database:

A correctly formatted geneID2GO object is included for P. falciparum enrichment analyses (Pfal_geneID2GO). You may also provide your own, so long as it is a named character-vector of GO-terms (each vector named by geneID, with GO terms as each element).

You can use the included formatGOdb.curated() function to format a custom GO database from curated GeneDB annotations for several non-model organisms (or the formatGOdb() function to include all GO annotations, if you aren't picky about including automated electronic annotations). If you're studying a model organism, several annotations are already available through the AnnotationDbi bioconductor package that loads with topGO.

Example console output

Example console output generated running the quick-start example data (piggyBac pooled phenotypic screening results to identify processes enabling parasite survival of host fever (similar to as published previously):

# load included pf GO database and example-data to be tested for functional enrichment
data(Pfal_geneID2GO_curated)
data(exampleMydf)

# run the topGO pipeline on all experimental categories of interest from exampleMydf
run.topGO.meta(mydf = exampleMydf, geneID2GO = Pfal_geneID2GO_curated, pval = 0.1)

# if you've run the pipeline, your significant genes in significant terms per category of interest can be loaded:
sig.genes <- read.delim("Routput/GO/all.combined.sig.genes.per.sig.terms.tsv")

oberstal/pfGO documentation built on April 22, 2024, 7:15 a.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

Tweet to @rdrrHQ

GitHub issue tracker

ian@mutexlabs.com