dot-enrichment_analysis: GAPGOM internal - enrichment_analysis()


Description

This function is an internal function and should not be called by the user.

Usage

.enrichment_analysis(ordered_score_df, id_select_vector, id_translation_df,
  organism, ontology, enrichment_cutoff, significance, filter_pvals,
  go_amount)

Arguments

ordered_score_df

The score dataframe. See the documentation of GAPGOM::example_score_dataframe for the expected format.

id_select_vector

Gene rowname(s) that you want to keep in the dataset. For example, if you only want to include protein coding genes, pass a vector containing only the ids of protein coding genes. Most importantly, this vector is used in the GO term enrichment, so it should only contain genes that are annotated in the GO databases.

id_translation_df

Dataframe that holds the translation between rowname, entrez id and GO ids (generated internally using GOSemSim and entrez ids).

organism

The organism that the genes to be scanned belong to. This option is necessary to select the correct GO DAG. Options are based on the org.db Bioconductor packages; see http://www.bioconductor.org/packages/release/BiocViews.html#___OrgDb. The following options are available: "fly", "mouse", "rat", "yeast", "zebrafish", "worm", "arabidopsis", "ecolik12", "bovine", "canine", "anopheles", "ecsakai", "chicken", "chimp", "malaria", "rhesus", "pig", "xenopus". Fantom5 data only has "human" and "mouse" available, depending on the dataset.

ontology

Desired ontology to use for prediction. One of three: "BP" (Biological Process), "MF" (Molecular Function) or "CC" (Cellular Component). Cellular Component is not included with the package's standard data and will thus yield no results.

enrichment_cutoff

Cutoff for the number of genes to be enriched in the enrichment analysis. Default is 250.

significance

Normalized p-values (FDR) that are below this number will be kept. Has to be a float/double between 0 and 1. Default is 0.05.

filter_pvals

If TRUE, filters out p-values that are equal to 0. Default is FALSE.
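The effect of significance and filter_pvals can be sketched in base R as below. This is only an illustration under stated assumptions: the p-values are made up, and whether GAPGOM removes zero p-values before or after the FDR correction is not stated on this page; the sketch assumes it happens before.

```r
# Made-up raw p-values, for illustration only.
p_values <- c(0, 0.001, 0.02, 0.04, 0.5)
significance <- 0.05
filter_pvals <- TRUE

# Assumption: p-values that are exactly 0 are dropped before correction.
if (filter_pvals) {
  p_values <- p_values[p_values != 0]
}

# FDR (Benjamini-Hochberg) correction with base R's p.adjust().
fdr <- p.adjust(p_values, method = "fdr")

# Keep only the corrected values below the significance threshold.
kept <- fdr[fdr < significance]
```

With filter_pvals set to FALSE the zero p-value would stay in the vector and take part in the correction.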

Details

Enriches score results from multiple methods to give a better idea of important similarities. This function is specifically made for predicting lncRNA annotation by assuming "guilt by association". For instance, the expression data in this package is actually based on mRNA expression data, but correlated with lncRNA. This expression data is then used in combination with mRNA GO annotation to calculate similarity scores between GO terms.
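The general idea of enriching the top-scoring genes for GO terms can be sketched as follows. This is a generic over-representation test using Fisher's exact test, not necessarily the procedure GAPGOM uses internally; the gene ids, term names and annotation list are hypothetical.

```r
# Hypothetical gene ids, ordered from best to worst score.
ranked_genes <- paste0("gene", 1:20)

# Hypothetical annotation: term name -> genes annotated with that term.
go_annotation <- list(
  term_A = paste0("gene", c(1, 2, 3, 4, 15)),
  term_B = paste0("gene", c(5, 12, 16, 18, 20))
)

# Only the best-scoring genes take part in the enrichment.
enrichment_cutoff <- 5
top_genes <- head(ranked_genes, enrichment_cutoff)

# One-sided Fisher's exact test per term: is the term over-represented
# among the top genes compared with the remaining genes?
enrich_p <- vapply(go_annotation, function(term_genes) {
  in_top <- factor(ranked_genes %in% top_genes, levels = c(TRUE, FALSE))
  in_term <- factor(ranked_genes %in% term_genes, levels = c(TRUE, FALSE))
  fisher.test(table(in_top, in_term), alternative = "greater")$p.value
}, numeric(1))
```

In this toy data term_A has four of its five genes among the top five, so it receives a much smaller p-value than term_B, which has only one.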

Value

The resulting dataframe with the prediction of similar GO terms, ordered by FDR value. The dataframe contains the following columns: GOID - Gene Ontology ID, Ontology - Ontology type (MF or BP), FDR - False Discovery Rate, Term - description of GOID. However, unlike in expression_prediction, this dataframe will have unsorted row numbering, and it will not contain the method used.
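The "unsorted row numbering" follows from how R orders a dataframe: the rows move, but their original row labels stay attached. A minimal sketch with a made-up result dataframe (the GOID and Term values are placeholders, not real GO entries):

```r
# Made-up result with the columns described above.
result_df <- data.frame(
  GOID = c("term_A", "term_B", "term_C"),
  Ontology = c("MF", "MF", "MF"),
  FDR = c(0.04, 0.001, 0.02),
  Term = c("description A", "description B", "description C"),
  stringsAsFactors = FALSE
)

# Ordering by FDR keeps the original row labels, so they look shuffled.
ordered_df <- result_df[order(result_df$FDR), ]
original_row_labels <- rownames(ordered_df)

# Resetting the row names gives consecutive numbering again.
rownames(ordered_df) <- NULL
```

Here original_row_labels holds the labels in their pre-sort numbering, while ordered_df is renumbered from 1 after the reset.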

Notes

Internal function used in expression_prediction_function().


GAPGOM documentation built on Nov. 8, 2020, 8:08 p.m.