bootstrap_enrichment_test: Bootstrap cell type enrichment test

View source: R/bootstrap_enrichment_test.R

bootstrap_enrichment_test R Documentation



bootstrap_enrichment_test takes a gene list and a single cell type transcriptome dataset and determines the probability of enrichment and the fold change for each cell type.
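The idea behind a bootstrap enrichment test can be sketched in base R. This is a minimal illustration on made-up data, not the package's implementation: the specificity matrix, the gene names, and the exact p-value and fold-change definitions below are assumptions chosen for clarity.

```r
set.seed(1)

# Toy specificity matrix (hypothetical data): rows are genes, columns are
# cell types, and each row sums to 1 (proportion of expression per cell type).
genes <- paste0("gene", 1:200)
cell_types <- c("astrocyte", "microglia", "neuron")
expr <- matrix(rexp(200 * 3), nrow = 200, dimnames = list(genes, cell_types))
# Make the first 10 genes strongly microglia-biased so the test has a signal.
expr[1:10, "microglia"] <- expr[1:10, "microglia"] + 20
specificity <- expr / rowSums(expr)

hits <- genes[1:10]   # target gene list
bg <- genes           # background, including the hit genes
reps <- 1000          # number of random gene lists

# Observed statistic: summed specificity of the target list per cell type.
hit_sum <- colSums(specificity[hits, , drop = FALSE])

# Null distribution: the same statistic for random lists of equal size.
boot <- t(replicate(reps, {
    random_genes <- sample(bg, length(hits))
    colSums(specificity[random_genes, , drop = FALSE])
}))

# Empirical p-value (share of random lists at or above the observed value)
# and fold change (observed value over the bootstrap mean) per cell type.
p <- colMeans(sweep(boot, 2, hit_sum, FUN = ">="))
fold_change <- hit_sum / colMeans(boot)
print(data.frame(p = p, fold_change = fold_change))
```

In this toy setup only the microglia column should stand out, because the target genes were constructed to be microglia-specific.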


bootstrap_enrichment_test(
  sct_data = NULL,
  hits = NULL,
  bg = NULL,
  genelistSpecies = NULL,
  sctSpecies = NULL,
  sctSpecies_origin = sctSpecies,
  output_species = "human",
  method = "homologene",
  reps = 100,
  no_cores = 1,
  annotLevel = 1,
  geneSizeControl = FALSE,
  controlledCT = NULL,
  mtc_method = "BH",
  sort_results = TRUE,
  verbose = TRUE,
  localHub = FALSE
)



sct_data: List generated using generate_celltype_data.


hits: List of gene symbols containing the target gene list. Will automatically be converted to human gene symbols if geneSizeControl=TRUE.


bg: List of gene symbols containing the background gene list (including hit genes). If bg=NULL, an appropriate gene background will be created automatically.


genelistSpecies: Species that the hits genes came from (no longer limited to just "mouse" and "human"). See list_species for all available species.


sctSpecies: Species that sct_data is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.


sctSpecies_origin: Species that the sct_data originally came from, regardless of its current gene format (e.g. it was previously converted from mouse to human gene orthologs). This is used for computing an appropriate background.


output_species: Species to convert sct_data and hits to (Default: "human"). See list_species for all available species.


method: R package to use for gene mapping:

  • "gprofiler" : Slower but more species and genes.

  • "homologene" : Faster but fewer species and genes.

  • "babelgene" : Faster but fewer species and genes. Also gives consensus scores for each gene mapping based on several different data sources.


reps: Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).


no_cores: Number of cores to parallelise bootstrapping reps over.


annotLevel: An integer indicating which level of sct_data to analyse (Default: 1).


geneSizeControl: Whether to control for GC content and transcript length. Recommended if the gene list originates from genetic studies (Default: FALSE). If set to TRUE, then hits must be from humans.


controlledCT: [Optional] Name of a cell type. If not NULL, the bootstrapping controls for expression within that cell type.


mtc_method: Multiple-testing correction method (passed to p.adjust).


sort_results: Sort enrichment results from smallest to largest p-values.


verbose: Print messages.


localHub: If working offline, set localHub=TRUE to work with a local, non-updated hub; it will only have resources available that have previously been downloaded. If offline, please also see the BiocManager vignette section on offline use to ensure proper functionality.
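The multiple-testing correction and result sorting described in the arguments can be illustrated with base R alone. The sketch below applies p.adjust with the "BH" method to made-up p-values and then orders the rows by p-value; the data frame and its column names (CellType, p, q) are hypothetical and are not taken from the function's actual output.

```r
# Hypothetical per-cell-type p-values, as a bootstrap test might produce.
res <- data.frame(
    CellType = c("astrocyte", "microglia", "neuron", "oligodendrocyte"),
    p = c(0.20, 0.001, 0.03, 0.04)
)

# Multiple-testing correction; "BH" is the Benjamini-Hochberg method and
# matches the documented default of mtc_method.
res$q <- p.adjust(res$p, method = "BH")

# Order rows from smallest to largest p-value, in the spirit of
# sort_results = TRUE.
res <- res[order(res$p), ]
print(res)
```

Any method name accepted by p.adjust (see p.adjust.methods) could be substituted for "BH" in this sketch.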


A list containing the following elements:

  • hit.cells: vector containing the summed proportion of expression in each cell type for the target list.

  • gene_data: data.table showing the number of times each gene appeared in the bootstrap sample.

  • bootstrap_data: matrix in which each row represents the summed proportion of expression in each cell type for one of the random lists

  • controlledCT: the controlled cell type (if applicable)
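Because bootstrap_data holds one row per random list, an empirical p-value derived from it can only move in steps of 1/reps. The sketch below illustrates this on toy objects shaped like hit.cells and bootstrap_data; the values are made up, and the p-value definition (share of random lists at or above the observed value) is an assumption rather than the package's confirmed formula.

```r
set.seed(42)

# Smallest non-zero empirical p-value for several choices of reps, assuming
# p is the share of random lists whose statistic reaches the observed one.
reps_options <- c(100, 1000, 10000)
resolution <- 1 / reps_options
names(resolution) <- reps_options
print(resolution)

# Toy bootstrap_data-like matrix: one row per random list, one column per
# cell type (hypothetical values drawn from a normal distribution).
reps <- 100
bootstrap_data <- matrix(rnorm(reps * 2, mean = 3, sd = 0.5), nrow = reps,
                         dimnames = list(NULL, c("microglia", "neuron")))
# Toy hit.cells-like vector: observed summed proportion per cell type.
hit_cells <- c(microglia = 6, neuron = 3)

# Share of random lists at or above the observed value, per cell type.
p <- colMeans(sweep(bootstrap_data, 2, hit_cells, FUN = ">="))
print(p)
```

With only 100 random lists, a p-value of 0 here can only be read as "below 0.01", which is one way to see why a much larger number of reps is recommended for publication-quality results.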


# Load the single cell data
sct_data <- ewceData::ctd()
# Set the parameters for the analysis
# Use 3 bootstrap lists for speed; for a publishable analysis use >=10,000
reps <- 3
# Load gene list from Alzheimer's disease GWAS
hits <- ewceData::example_genelist()

# Bootstrap significance test, no control for transcript length or GC content
full_results <- EWCE::bootstrap_enrichment_test(
    sct_data = sct_data,
    hits = hits,
    reps = reps,
    annotLevel = 1,
    sctSpecies = "mouse",
    genelistSpecies = "human")

NathanSkene/EWCE documentation built on May 25, 2023, 8:30 a.m.