Description Usage Arguments Details Value Author(s) References See Also Examples
Tests GO-categories for enrichment of user defined gene sets, using either the hypergeometric (default), Wilcoxon rank-sum, binomial or 2x2 contingency table test.
1 2 3 |
genes |
a data.frame() with gene-symbols (character()) in the first column and test-dependent additional numeric() columns: |
test |
character(). 'hyper' (default) for the hypergeometric test, 'wilcoxon' for the Wilcoxon rank-sum test, 'binomial' for the binomial test and 'contingency' for the 2x2-contingency table test (fisher's exact test or chi-square). |
n_randsets |
integer defining the number of random sets for computing the FWER. |
organismDb |
character(). Annotation package for GO-annotations and gene coordinates. Besides the default 'Homo.sapiens' also 'Mus.musculus' and 'Rattus.norvegicus' are currently available on Bioconductor. |
gene_len |
logical. Correct for gene length. If |
regions |
logical. If |
circ_chrom |
logical. When |
silent |
logical. If |
domains |
optional character() vector containing one or more of the three GO-domains 'cellular_component', 'biological_process' and 'molecular_function'.
If defined, the analysis will be reduced to those domains which saves time.\ cr
If a custom ontology is specified in |
orgDb |
optional character() naming an OrgDb annotation package from Bioconductor. If |
txDb |
optional character() naming an TxDb annotation package from Bioconductor (e.g. 'TxDb.Ptroglodytes.UCSC.panTro4.refGene') for the gene coordinates.
Only needed when |
annotations |
optional data.frame() for custom annotations, with two character() columns:
(1) gene-symbols and (2) GO-categories.
Note that options |
gene_coords |
optional data.frame() for custom gene coordinates, with four columns:
gene-symbols (character), chromosome (character), start (integer), end (integer).
When |
godir |
optional character() specifying a directory () that contains a custom GO-graph
(files 'term.txt', 'term2term.txt' and 'graph_path.txt').
Alternative to the default integrated GO-graph. |
Please also refer to the package's vignette.
GO-annotations are taken from a Bioconductor annotation package (OrganismDb package 'Homo.sapiens' by default), but also other 'OrganismDb' or 'OrgDb' packages can be used. It is also possible to provide custom annotations as a data.frame().
The ontology graph is integrated, but a custom version can be defined as well with parameter 'godir'. As long as the ontology tables are in the right format (see link to description in vignette), any ontology can be used in GOfuncR, it is not restricted to the gene ontology.
The statistical analysis is based on the ontology enrichment software FUNC [2].
go_enrich
offers four different statistical tests: (1) the hypergeometric test for a candidate and a background gene set; (2) the Wilcoxon rank-sum test for genes that are ranked by scores (e.g. p-value for differential expression); (3) the binomial test for genes that have two associated counts (e.g. amino-acid changes on the human and the chimp lineage); and (4) a 2x2-contingency table test for genes that have four associated counts (e.g. for a McDonald-Kreitman test).
To account for multiple testing family-wise error rates are computed using randomsets. Besides naming candidate genes explicitly, for the hypergeometric test it is also possible to provide entire genomic regions as input. The enrichment analysis is then performed for all genes located in or overlapping these regions and the multiple testing correction accounts for the spatial clustering of genes.
A list with components
results |
a data.frame() with the FWERs from the enrichment analyses per ontology category, ordered by 'FWER_overrep', 'raw_p_overrep', -'FWER_underrep', -'raw_p_underrep', 'ontology' and 'node_id', or the corresponding columns if another test then the hypergeometric test is used. This table contains the combined results for all three ontology domains. Note that GO-categories without any annotations of candidate or background genes are not listed. |
genes |
the input |
databases |
a data.frame() with the used annotation packages and GO-graph. |
min_p |
a data.frame() with the minimum p-values from the randomsets that are used to compute the FWER. |
Steffi Grote
[1] Ashburner, M. et al. (2000). Gene Ontology: tool for the unification of biology. Nature Genetics 25: 25-29. doi: 10.1038/75556
[2] Pruefer, K. et al. (2007). FUNC: A package for detecting significant associations between gene
sets and ontological. BMC Bioinformatics 8: 41. doi: 10.1186/1471-2105-8-41
get_parent_nodes
get_child_nodes
get_anno_categories
get_anno_genes
plot_anno_scores
get_names
get_ids
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | #### Note that argument 'n_randsets' is reduced
#### to lower computational time in the following examples.
#### Using the default value is recommended.
#### Perform a GO-enrichment analysis for some human genes
#### with a defined background set
# create input dataframe that defines the candidate and backround genes
candi_gene_ids = c('NCAPG', 'APOL4', 'NGFR', 'NXPH4', 'C21orf59', 'CACNG2',
'AGTR1', 'ANO1', 'BTBD3', 'MTUS1', 'CALB1', 'GYG1', 'PAX2')
bg_gene_ids = c('FGR', 'NPHP1', 'DRD2', 'ABCC10', 'PTBP2', 'JPH4', 'SMARCC2',
'FN1', 'NODAL', 'CYP1A2', 'ACSS1', 'CDHR1', 'SLC25A36', 'LEPR', 'PRPS2',
'TNFAIP3', 'NKX3-1', 'LPAR2', 'PGAM2')
is_candidate = c(rep(1,length(candi_gene_ids)), rep(0,length(bg_gene_ids)))
genes = data.frame(gene_ids=c(candi_gene_ids, bg_gene_ids), is_candidate)
genes
# run enrichment analysis
go_res = go_enrich(genes, n_randset=100)
# go_enrich returns a list with 4 elements:
# 1) results from the anlysis
# (ordered by FWER for overrepresentation of candidate genes)
head(go_res[[1]])
# see the top GOs from every GO-domain
by(go_res[[1]], go_res[[1]][,'ontology'], head)
# 2) all valid input genes
go_res[[2]]
# 3) annotation databases used
go_res[[3]]
# 4) minimum p-values from randomsets
head(go_res[[4]])
#### see the package's vignette for more examples
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.