dot-enrichment_analysis: GAPGOM internal - enrichment_analysis()
In GAPGOM: GAPGOM (novel Gene Annotation Prediction and other GO Metrics)


This function is an internal function and should not be called by the user.


.enrichment_analysis(ordered_score_df, id_select_vector, id_translation_df,
  organism, ontology, enrichment_cutoff, significance, filter_pvals,
  go_amount)

`ordered_score_df`	the score dataframe; see the documentation on GAPGOM::example_score_dataframe for formatting.
`id_select_vector`	gene rowname(s) that you want to keep in the dataset. For example, to include only protein coding genes, pass a vector containing only the ids of protein coding genes. Most importantly, this vector is used in the GO term enrichment, so it should only contain genes that are annotated in the GO databases.
`id_translation_df`	dataframe that translates between rowname, Entrez ID and GO IDs (generated internally using GOSemSim + Entrez IDs).
`organism`	the organism that the genes to be scanned belong to; this option is necessary to select the correct GO DAG. Options are based on the org.db Bioconductor packages; http://www.bioconductor.org/packages/release/BiocViews.html#___OrgDb The following options are available: "fly", "mouse", "rat", "yeast", "zebrafish", "worm", "arabidopsis", "ecolik12", "bovine", "canine", "anopheles", "ecsakai", "chicken", "chimp", "malaria", "rhesus", "pig", "xenopus". Fantom5 data only has "human" and "mouse" available, depending on the dataset.
`ontology`	desired ontology to use for prediction. One of three: "BP" (Biological Process), "MF" (Molecular Function) or "CC" (Cellular Component). Cellular Component is not included with the package's standard data and will thus yield no results.
`enrichment_cutoff`	cutoff for the number of genes to be enriched in the enrichment analysis (default is 250).
`significance`	normalized p-values (FDR) below this number will be kept. Must be a float/double between 0 and 1 (default is 0.05).
`filter_pvals`	filters out p-values that are equal to 0 (default is FALSE).
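
The effect of `significance` and `filter_pvals` can be illustrated with base R. This is a minimal sketch, not GAPGOM's internal code: the GO IDs and p-values are made up, and it assumes that "normalized p-values (fdr)" corresponds to `p.adjust(method = "fdr")` and that zero p-values are dropped before the correction.

```r
# Hypothetical raw p-values for five GO terms (illustrative data only).
pvals <- c("GO:0000001" = 0.001, "GO:0000002" = 0.04, "GO:0000003" = 0,
           "GO:0000004" = 0.5, "GO:0000005" = 0.02)
significance <- 0.05
filter_pvals <- TRUE

# Assumption: filter_pvals removes p-values that are exactly 0
# before the multiple-testing correction is applied.
if (filter_pvals) {
  pvals <- pvals[pvals != 0]
}

# Benjamini-Hochberg correction ("fdr" is an alias for "BH" in p.adjust).
fdr <- p.adjust(pvals, method = "fdr")

# Keep only terms whose adjusted p-value is below the threshold.
kept <- fdr[fdr < significance]
print(kept)
```

With these made-up inputs, only the terms whose adjusted value stays under 0.05 remain in `kept`; a raw p-value just below the threshold can still be dropped after correction.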

Enriches score results from multiple methods to give a better idea of important similarities. This function is specifically made for predicting lncRNA annotation by assuming "guilt by association". For instance, the expression data in this package is actually based on mRNA expression data, but correlated with lncRNA. This expression data is then used in combination with mRNA GO annotation to calculate similarity scores between GO terms.
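
The "guilt by association" idea can be sketched as a GO term over-representation test on the top-scoring genes. The example below is a hedged illustration in base R, not the package's implementation: the gene names, the `annotation` list and the choice of a one-sided hypergeometric test are all assumptions made for demonstration.

```r
# Hypothetical ranked gene list (best score first) and GO annotation.
ranked_genes <- paste0("gene", 1:20)
annotation <- list(
  "GO:0000001" = c("gene1", "gene2", "gene3", "gene15"),
  "GO:0000002" = c("gene10", "gene18", "gene19", "gene20")
)
enrichment_cutoff <- 5  # documented default is 250; kept small for readability

# Genes most similar to the query are assumed to share its function.
top_genes <- head(ranked_genes, enrichment_cutoff)
universe_size <- length(ranked_genes)

# One-sided hypergeometric test per GO term: probability of drawing at
# least `hits` annotated genes among the top genes by chance.
pvals <- vapply(annotation, function(term_genes) {
  hits <- sum(top_genes %in% term_genes)
  annotated <- sum(ranked_genes %in% term_genes)
  phyper(hits - 1, annotated, universe_size - annotated,
         length(top_genes), lower.tail = FALSE)
}, numeric(1))

# Terms over-represented among the top genes come first.
print(sort(pvals))
```

A term whose annotated genes cluster at the top of the ranking receives a small p-value, while a term annotated only to low-ranked genes receives a p-value of 1.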

The resulting dataframe with predictions of similar GO terms, ordered by FDR value. The dataframe contains the following columns: GOID - Gene Ontology ID, Ontology - ontology type (MF or BP), FDR - false discovery rate, Term - description of the GOID. Unlike in expression_prediction, this dataframe has unsorted row numbering and does not contain the method used.

Internal function used in expression_prediction_function().

GAPGOM documentation built on Nov. 8, 2020, 8:08 p.m.