bootstrap_enrichment_test: Bootstrap cell type enrichment test
In NathanSkene/EWCE: Expression Weighted Celltype Enrichment

Source: R/bootstrap_enrichment_test.R

Description

bootstrap_enrichment_test takes a gene list and a single-cell transcriptome dataset and determines the probability of enrichment and the fold change for each cell type.
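The bootstrap idea behind the enrichment probability can be illustrated in base R. The sketch below is a simplified, hypothetical re-implementation for intuition only, not EWCE's actual code. It assumes a toy specificity matrix `spec` (genes × cell types, each entry the proportion of a gene's expression found in that cell type) and treats every gene in `spec` as the background. For each cell type it sums the specificity of the hit genes, computes the same sum for `reps` random gene lists of equal size, and reports the fraction of random sums at least as large as the observed one as an empirical p-value. The helper name `toy_bootstrap_test` and the toy data are made up for this illustration.

```r
# Hypothetical sketch of a bootstrap enrichment test (NOT EWCE's implementation).
# spec: numeric matrix, rows = genes (named), columns = cell types,
#       entries = proportion of each gene's expression found in that cell type.
toy_bootstrap_test <- function(spec, hits, reps = 1000, seed = 1) {
  set.seed(seed)
  bg <- rownames(spec)                  # assumption: background = every gene in spec
  hits <- intersect(hits, bg)           # drop hits absent from the background
  # Observed statistic: summed specificity of the hit genes in each cell type
  hit.cells <- colSums(spec[hits, , drop = FALSE])
  # Null distribution: the same statistic for `reps` random lists of equal size
  bootstrap_data <- t(vapply(seq_len(reps), function(i) {
    colSums(spec[sample(bg, length(hits)), , drop = FALSE])
  }, numeric(ncol(spec))))
  colnames(bootstrap_data) <- colnames(spec)
  # One-sided empirical p-value: fraction of random lists scoring >= observed
  p <- colMeans(sweep(bootstrap_data, 2, hit.cells, ">="))
  list(hit.cells = hit.cells, bootstrap_data = bootstrap_data, p = p)
}

# Toy data: 100 genes x 3 cell types; genes g1-g10 are made neuron-specific
set.seed(42)
spec <- matrix(runif(300), nrow = 100,
               dimnames = list(paste0("g", 1:100),
                               c("neuron", "astrocyte", "microglia")))
spec[1:10, "neuron"] <- 5
spec <- spec / rowSums(spec)            # each gene's proportions now sum to 1
res <- toy_bootstrap_test(spec, hits = paste0("g", 1:10), reps = 1000)
print(res$p)
```

Here the neuron p-value is expected to be 0, which should be read as p < 1/reps rather than exactly zero; this is why a large number of `reps` matters. The sketch assumes at least two cell types, since `vapply` returns a plain vector rather than a matrix when there is only one.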

Usage

bootstrap_enrichment_test(
  sct_data = NULL,
  hits = NULL,
  bg = NULL,
  genelistSpecies = NULL,
  sctSpecies = NULL,
  sctSpecies_origin = sctSpecies,
  output_species = "human",
  method = "homologene",
  reps = 100,
  no_cores = 1,
  annotLevel = 1,
  geneSizeControl = FALSE,
  controlledCT = NULL,
  mtc_method = "BH",
  sort_results = TRUE,
  standardise_sct_data = TRUE,
  standardise_hits = FALSE,
  verbose = TRUE,
  localHub = FALSE,
  store_gene_data = TRUE
)

Arguments

`sct_data`	List generated using generate_celltype_data.
`hits`	List of gene symbols containing the target gene list. Will automatically be converted to human gene symbols if `geneSizeControl=TRUE`.
`bg`	List of gene symbols containing the background gene list (including hit genes). If `bg=NULL`, an appropriate gene background will be created automatically.
`genelistSpecies`	Species that `hits` genes came from (no longer limited to just "mouse" and "human"). See list_species for all available species.
`sctSpecies`	Species that `sct_data` is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.
`sctSpecies_origin`	Species that the `sct_data` originally came from, regardless of its current gene format (e.g. it was previously converted from mouse to human gene orthologs). This is used for computing an appropriate background.
`output_species`	Species to convert `sct_data` and `hits` to (Default: "human"). See list_species for all available species.
`method`	R package to use for gene mapping: `"gprofiler"`: slower, but supports more species and genes. `"homologene"`: faster, but supports fewer species and genes. `"babelgene"`: faster, but supports fewer species and genes; also gives consensus scores for each gene mapping based on several different data sources.
`reps`	Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
`no_cores`	Number of cores to parallelise bootstrapping `reps` over.
`annotLevel`	An integer indicating which level of `sct_data` to analyse (Default: 1).
`geneSizeControl`	Whether you want to control for GC content and transcript length. Recommended if the gene list originates from genetic studies (Default: FALSE). If set to `TRUE`, then `hits` must be from humans.
`controlledCT`	[Optional] The name of a cell type. If not NULL, the bootstrapping controls for expression within that cell type.
`mtc_method`	Multiple-testing correction method (passed to p.adjust).
`sort_results`	Sort enrichment results from smallest to largest p-values.
`standardise_sct_data`	Should `sct_data` be standardised? If `TRUE`: when `sctSpecies!=output_species`, the `sct_data` will be checked for object formatting and the genes will be converted to the orthologs of the `output_species` with standardise_ctd (which calls map_genes internally). When `sctSpecies==output_species`, the `sct_data` will be checked for object formatting with standardise_ctd, but the gene names will remain untouched.
`standardise_hits`	Should `hits` be standardised? If `TRUE`: When `genelistSpecies!=output_species`, the genes will be converted to the orthologs of the `output_species` with convert_orthologs. When `genelistSpecies==output_species`, the genes will be standardised with map_genes. If `FALSE`, `hits` will be passed on to subsequent steps as-is.
`verbose`	Print messages.
`localHub`	If working offline, set `localHub=TRUE` to work with a local, non-updated hub; it will only have resources available that have previously been downloaded. If offline, please also see the BiocManager vignette section on offline use to ensure proper functionality.
`store_gene_data`	Store sampled gene data for every bootstrap iteration. When the number of bootstrap `reps` is very high (>=100k) and/or the number of genes in `hits` is very high, you may want to set `store_gene_data=FALSE` to avoid using excessive amounts of memory.
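Two of the arguments above interact. Because enrichment p-values are estimated empirically from `reps` random gene lists, they can only take values in steps of 1/`reps`; the per-cell-type p-values are then adjusted with `p.adjust` using `mtc_method`. The base R sketch below uses hypothetical exceedance counts for five cell types (not EWCE output) to show both effects, comparing the default `"BH"` with `"bonferroni"`.

```r
# Illustration only (hypothetical numbers, not EWCE output).
# An empirical bootstrap p-value is (number of random lists scoring >= observed) / reps,
# so it can only take values in steps of 1 / reps.
reps <- 100
resolution <- 1 / reps                  # 0.01; a count of 0 only tells you p < 0.01

exceed_counts <- c(0, 1, 3, 40, 90)     # hypothetical counts for 5 cell types
raw_p <- exceed_counts / reps           # 0.00 0.01 0.03 0.40 0.90

# Multiple-testing correction across cell types, as selected by mtc_method
bh_p   <- p.adjust(raw_p, method = "BH")          # the default mtc_method
bonf_p <- p.adjust(raw_p, method = "bonferroni")
print(bh_p)    # 0.000 0.025 0.050 0.500 0.900
print(bonf_p)  # 0.00 0.05 0.15 1.00 1.00
```

With only 100 repetitions, a single exceedance already yields a raw p-value of 0.01, and after correction across many cell types such a result can easily lose significance; raising `reps` to 10,000 or more gives the resolution needed for publication-quality results.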

Value

A list containing the following elements:

hit.cells: vector containing the summed proportion of expression in each cell type for the target list.
gene_data: data.table showing the number of times each gene appeared in the bootstrap sample.
bootstrap_data: matrix in which each row represents the summed proportion of expression in each cell type for one of the random lists.
controlledCT: the controlled cell type (if applicable).
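`hit.cells` and `bootstrap_data` contain what is needed to express enrichment as a fold change relative to the random lists. The base R sketch below assumes `hit.cells` is a named numeric vector and `bootstrap_data` a matrix with one row per random list and matching cell-type column names, as described above. It uses small hand-made values rather than real EWCE output, and the helper `summarise_enrichment`, with its `fold_change` (observed / mean of random lists) and `sd_from_mean` (standard deviations above the bootstrap mean) columns, is a hypothetical reconstruction rather than EWCE's exact code.

```r
# Hand-made stand-ins for two of the returned elements (NOT real EWCE output):
# hit.cells: named vector, summed expression proportion of the hits per cell type
# bootstrap_data: matrix, one row per random list (here reps = 4), one column per cell type
hit.cells <- c(neuron = 6, astrocyte = 2)
bootstrap_data <- cbind(neuron = c(3, 4, 5, 4), astrocyte = c(2, 3, 1, 2))

# Hypothetical summary: how far each observed value sits from the bootstrap null
summarise_enrichment <- function(hit.cells, bootstrap_data) {
  boot <- bootstrap_data[, names(hit.cells), drop = FALSE]  # align columns by name
  boot_mean <- colMeans(boot)
  boot_sd <- apply(boot, 2, sd)
  data.frame(
    CellType = names(hit.cells),
    fold_change = unname(hit.cells / boot_mean),              # observed / mean of random lists
    sd_from_mean = unname((hit.cells - boot_mean) / boot_sd)  # z-score-like distance
  )
}

summary_df <- summarise_enrichment(hit.cells, bootstrap_data)
print(summary_df)
# neuron: fold_change 1.5, sd_from_mean ~2.45; astrocyte: fold_change 1, sd_from_mean 0
```

Aligning `bootstrap_data` columns by name, rather than by position, guards against the two objects listing cell types in different orders.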

Examples

# Load the single cell data
sct_data <- ewceData::ctd()
# Set the parameters for the analysis
# Use 3 bootstrap lists for speed; for a publishable analysis use >=10,000
reps <- 3
# Load gene list from Alzheimer's disease GWAS
hits <- ewceData::example_genelist()

# Bootstrap significance test, no control for transcript length or GC content
full_results <- EWCE::bootstrap_enrichment_test(
    sct_data = sct_data,
    hits = hits,
    reps = reps,
    annotLevel = 1,
    sctSpecies = "mouse",
    genelistSpecies = "human")

NathanSkene/EWCE documentation built on Feb. 17, 2025, 7:52 a.m.