gene_set_enrichment: Evaluate the enrichment for a list of gene sets

View source: R/gene_set_enrichment.R

gene_set_enrichment    R Documentation

Evaluate the enrichment for a list of gene sets

Description

Using the layer-level (group-level) data, this function evaluates whether each gene set in a list (given as Ensembl gene IDs) is enriched among the significant genes (FDR < 0.1 by default) for a given model type result. It tests the alternative hypothesis that OR > 1, i.e. that the gene set is over-represented among the enriched genes. If you want to check depleted genes, set reverse to TRUE.
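The one-sided test described above can be sketched with base R for a single gene set. This is a minimal illustration under stated assumptions, not necessarily the package's exact implementation: the gene IDs, the universe of 20 genes, and the names universe, significant, and gene_set are all hypothetical.

```r
## Toy universe of 20 hypothetical genes (not real Ensembl IDs)
universe <- paste0("gene", 1:20)

## Hypothetical genes called significant (e.g. FDR < 0.1 and t-statistic > 0)
significant <- universe[1:6]

## Hypothetical gene set, already restricted to genes in the universe
gene_set <- universe[c(1, 2, 3, 4, 10, 15)]

## Cross-tabulate set membership against significance.
## Fixing the factor levels keeps the table 2 x 2 even if a cell is empty.
in_set <- factor(universe %in% gene_set, levels = c(FALSE, TRUE))
is_sig <- factor(universe %in% significant, levels = c(FALSE, TRUE))
tab <- table(in_set, is_sig)

## One-sided Fisher's exact test: alternative hypothesis is OR > 1
res <- stats::fisher.test(tab, alternative = "greater")
res$estimate ## odds ratio (conditional maximum likelihood estimate)
res$p.value ## one-sided p-value
```

With alternative = "greater", a small p-value indicates over-representation of the gene set among the significant genes; depletion would not be flagged by this test.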

Usage

gene_set_enrichment(
  gene_list,
  fdr_cut = 0.1,
  modeling_results = fetch_data(type = "modeling_results"),
  model_type = names(modeling_results)[1],
  reverse = FALSE
)

Arguments

gene_list

A named list object (could be a data.frame) where each element of the list is a character vector of Ensembl gene IDs.

fdr_cut

A numeric(1) specifying the FDR cutoff to use for determining significance among the modeling results genes.

modeling_results

Defaults to the output of fetch_data(type = 'modeling_results'). This is a list of tables with the columns f_stat_* or t_stat_* as well as p_value_* and fdr_* plus ensembl. The column names are used to extract the statistic results, the p-values, and the FDR-adjusted p-values. The ensembl column is then used for matching in some cases. See fetch_data() for more details. Typically this is the set of reference statistics used in layer_stat_cor().

model_type

The name of an element of the modeling_results list. With the default modeling results, the options are: enrichment, which tests one human brain layer against the rest (one group vs. the rest); pairwise, which compares two layers (groups) denoted by layerA-layerB such that layerA is greater than layerB; and anova, which determines whether any layer (group) differs from the rest, adjusting for the mean expression level. The statistics for enrichment and pairwise are t-statistics, while those for the anova model are F-statistics.

reverse

A logical(1) indicating whether to multiply the input statistics by -1 and reverse the layerA-layerB column names (splitting on the -) into layerB-layerA.
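The effect of reverse = TRUE can be sketched as follows. This is an illustration only, assuming a hypothetical table with a single column named t_stat_Layer1-Layer2; the object names stats_tab and reversed are made up here, and the package's internal code may differ.

```r
## Hypothetical pairwise statistics for two genes (values are made up)
stats_tab <- data.frame(
    "t_stat_Layer1-Layer2" = c(2.5, -1.0),
    check.names = FALSE
)

## Multiply the statistics by -1
reversed <- -stats_tab

## Swap layerA-layerB into layerB-layerA in each column name
colnames(reversed) <- vapply(colnames(stats_tab), function(nm) {
    layers <- strsplit(sub("^t_stat_", "", nm), "-", fixed = TRUE)[[1]]
    paste0("t_stat_", layers[2], "-", layers[1])
}, character(1), USE.NAMES = FALSE)

reversed
```

After the flip, a gene that was lower in the first layer than in the second has a positive statistic, so it can pass a filter that requires t_stat > 0.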

Details

See https://github.com/LieberInstitute/HumanPilot/blob/master/Analysis/Layer_Guesses/check_clinical_gene_sets.R for the full script from which this family of functions is derived.

Value

A table in long format with the enrichment results using stats::fisher.test().

  • OR odds ratio.

  • Pval p-value for fisher.test().

  • test group or layer in the modeling_results.

  • NumSig Number of genes from the gene set present in modeling_results & with fdr < fdr_cut and t_stat > 0 (unless reverse = TRUE) for test in modeling results.

  • SetSize Number of genes from modeling_results present in gene_set.

  • ID name of gene set.

  • model_type record of the input model type from modeling_results.

  • fdr_cut record of the input fdr_cut.
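As a rough illustration of how NumSig and SetSize relate to the inputs, the sketch below counts both for one gene set and one test. The table modeling, its values, the Layer1 suffix, and the gene IDs are all hypothetical; only the column naming pattern (ensembl, t_stat_*, fdr_*) follows the description of modeling_results above.

```r
## Hypothetical modeling table for one test (values are made up)
modeling <- data.frame(
    ensembl = paste0("gene", 1:8),
    t_stat_Layer1 = c(3.1, 2.4, -2.8, 0.5, 4.0, -0.2, 2.9, 1.1),
    fdr_Layer1 = c(0.01, 0.04, 0.02, 0.60, 0.001, 0.90, 0.20, 0.08)
)

## Hypothetical gene set; "gene99" is absent from the modeling table
gene_set <- c("gene1", "gene3", "gene5", "gene8", "gene99")
fdr_cut <- 0.1

## Genes from the modeling table that belong to the gene set
in_set <- modeling$ensembl %in% gene_set

## Genes that are significant and positively associated with the test
is_sig <- modeling$fdr_Layer1 < fdr_cut & modeling$t_stat_Layer1 > 0

SetSize <- sum(in_set)
NumSig <- sum(in_set & is_sig)
```

Genes in the set that do not appear in the modeling table do not count toward SetSize, and a gene with a small FDR but a negative statistic does not count toward NumSig unless the statistics are reversed.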

Author(s)

Andrew E Jaffe, Leonardo Collado-Torres

See Also

Other Gene set enrichment functions: gene_set_enrichment_plot()

Examples


## Read in the SFARI gene sets included in the package
asd_sfari <- utils::read.csv(
    system.file(
        "extdata",
        "SFARI-Gene_genes_01-03-2020release_02-04-2020export.csv",
        package = "spatialLIBD"
    ),
    as.is = TRUE
)

## Format them appropriately
asd_sfari_geneList <- list(
    Gene_SFARI_all = asd_sfari$ensembl.id,
    Gene_SFARI_high = asd_sfari$ensembl.id[asd_sfari$gene.score < 3],
    Gene_SFARI_syndromic = asd_sfari$ensembl.id[asd_sfari$syndromic == 1]
)

## Obtain the necessary data
if (!exists("modeling_results")) {
    modeling_results <- fetch_data(type = "modeling_results")
}

## Compute the gene set enrichment results
asd_sfari_enrichment <- gene_set_enrichment(
    gene_list = asd_sfari_geneList,
    modeling_results = modeling_results,
    model_type = "enrichment"
)

## Explore the results
asd_sfari_enrichment

LieberInstitute/spatialLIBD documentation built on Dec. 19, 2024, 7:12 p.m.