ewce_expression_data: Bootstrap cell type enrichment test for transcriptome data
In NathanSkene/EWCE: Expression Weighted Celltype Enrichment

ewce_expression_data

R Documentation

Bootstrap cell type enrichment test for transcriptome data

Description

ewce_expression_data takes a differential gene expression (DGE) results table and uses a bootstrap test to determine the probability of cell type enrichment in the up- and downregulated genes.
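The bootstrap idea can be sketched in base R on toy data. This is a minimal illustration under stated assumptions, not the package's implementation: `spec` is a hypothetical gene x cell type matrix standing in for the specificity values stored in `sct_data`, random lists are drawn from the background without replacement, and the p-value is the fraction of random lists whose summed specificity is at least that of the target list.

```r
# Minimal sketch of a bootstrap cell type enrichment test (toy data only).
# Assumption: `spec` is a hypothetical gene x cell type specificity matrix,
# standing in for the values that EWCE stores in `sct_data`.
set.seed(1)
genes <- paste0("gene", 1:200)
cell_types <- c("astrocytes", "microglia", "neurons")
spec <- matrix(runif(length(genes) * length(cell_types)),
               nrow = length(genes),
               dimnames = list(genes, cell_types))
# Each gene's proportions of expression sum to 1 across cell types
spec <- spec / rowSums(spec)

hits <- genes[1:20]  # hypothetical target list, e.g. the top upregulated genes
bg <- genes          # background gene list (includes the hit genes)
reps <- 1000

# Summed proportion of expression in each cell type for the target list
hit_cells <- colSums(spec[hits, , drop = FALSE])

# Same statistic for `reps` random lists of equal size drawn from `bg`
# (rows = random lists, columns = cell types)
bootstrap_data <- t(replicate(
  reps,
  colSums(spec[sample(bg, length(hits)), , drop = FALSE])
))

# One-sided bootstrap p-value per cell type
p <- colSums(sweep(bootstrap_data, 2, hit_cells, ">=")) / reps
print(p)
```

The same procedure is run once for the upregulated list and once for the downregulated list, which is why the function returns separate `.up` and `.down` elements.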

Usage

ewce_expression_data(
  sct_data,
  annotLevel = 1,
  tt,
  sortBy = "t",
  thresh = 250,
  reps = 100,
  ttSpecies = NULL,
  sctSpecies = NULL,
  output_species = NULL,
  bg = NULL,
  method = "homologene",
  verbose = TRUE,
  localHub = FALSE
)

Arguments

`sct_data`	List generated using generate_celltype_data.
`annotLevel`	An integer indicating which level of `sct_data` to analyse (Default: 1).
`tt`	Differential expression table. Can be the output of limma's topTable function. At minimum, one column must store a metric of increased/decreased expression (e.g. log fold change or the t-statistic for differential expression) and another must contain gene symbols.
`sortBy`	Column name of the metric in `tt` that should be used to separate up- from downregulated genes (Default: "t").
`thresh`	The number of up- and downregulated genes to be included in each analysis (Default: 250).
`reps`	Number of random gene lists to generate (Default: 100, but should be >=10,000 for publication-quality results).
`ttSpecies`	The species the differential expression table was generated from.
`sctSpecies`	Species that `sct_data` is currently formatted as (no longer limited to just "mouse" and "human"). See list_species for all available species.
`output_species`	Species to convert `sct_data` and `hits` to (Default: "human"). See list_species for all available species.
`bg`	List of gene symbols containing the background gene list (including hit genes). If `bg=NULL`, an appropriate gene background will be created automatically.
`method`	R package to use for gene mapping: `"gprofiler"`: slower, but supports more species and genes. `"homologene"`: faster, but supports fewer species and genes. `"babelgene"`: faster, but supports fewer species and genes; also gives consensus scores for each gene mapping based on several different data sources.
`verbose`	Print messages.
`localHub`	If working offline, set `localHub = TRUE` to work with a local, non-updated hub; it will only have resources available that have previously been downloaded. If offline, please also see the BiocManager vignette section on offline use to ensure proper functionality.
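The roles of `sortBy` and `thresh` can be illustrated with a small base R sketch. It assumes the table is ordered by the `sortBy` column and that the `thresh` genes with the largest and smallest values form the up- and downregulated lists; the toy table and its `symbol` column name are hypothetical, and this is not the package's own code.

```r
# Sketch: splitting a DGE table into up- and downregulated gene lists.
# The table and the "symbol" column name are hypothetical toy data.
tt <- data.frame(
  symbol = paste0("gene", 1:10),
  t = c(5.1, -3.2, 0.4, 2.8, -6.0, 1.1, -0.7, 4.3, -2.5, 3.9),
  stringsAsFactors = FALSE
)
sortBy <- "t"
thresh <- 3

# Order genes from the most increased to the most decreased expression
tt_sorted <- tt[order(tt[[sortBy]], decreasing = TRUE), ]
up_genes <- head(tt_sorted$symbol, thresh)    # largest values of sortBy
down_genes <- tail(tt_sorted$symbol, thresh)  # smallest values of sortBy
print(up_genes)
print(down_genes)
```

Here `up_genes` is `gene1`, `gene8`, `gene10` and `down_genes` is `gene9`, `gene2`, `gene5`; each list would then be tested separately against the background.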

Value

A list containing five elements:

results: data frame in which each row gives the statistics (p-value, fold change and number of standard deviations from the mean) associated with the enrichment of the stated cell type in the gene list. An additional column, *Direction*, stores whether the result is from the up- or downregulated set.
hit.cells.up: vector containing the summed proportion of expression in each cell type for the upregulated gene list.
hit.cells.down: vector containing the summed proportion of expression in each cell type for the downregulated gene list.
bootstrap_data.up: matrix in which each row represents the summed proportion of expression in each cell type for one of the random lists used to test the upregulated genes.
bootstrap_data.down: matrix in which each row represents the summed proportion of expression in each cell type for one of the random lists used to test the downregulated genes.
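For a single cell type, the three statistics in `results` can be related to the corresponding entries of `hit.cells.*` and `bootstrap_data.*` as sketched below. The numbers are made up, and the formulas are the conventional bootstrap definitions (assumed here, not copied from the package source): the p-value is the fraction of random lists scoring at least as high as the target list, the fold change is the target value over the bootstrap mean, and the standard deviations from the mean is a z-score against the bootstrap distribution.

```r
# Sketch: enrichment statistics for one cell type (toy numbers only).
hit <- 2.0                               # summed proportion for the target list
boot <- c(1.0, 1.5, 1.2, 0.8, 2.1, 1.4)  # summed proportion for each random list

# Fraction of random lists at least as high as the target list
p <- sum(boot >= hit) / length(boot)
# Target value relative to the mean of the random lists
fold_change <- hit / mean(boot)
# Number of standard deviations the target lies above the bootstrap mean
sd_from_mean <- (hit - mean(boot)) / sd(boot)
print(c(p = p, fold_change = fold_change, sd_from_mean = sd_from_mean))
```

With only six random lists, the smallest attainable non-zero p-value is 1/6, which is why `reps` should be large (>=10,000) for publication-quality results.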

Examples

# Load the single cell data
ctd <- ewceData::ctd()

# Set the parameters for the analysis
# Use 3 bootstrap lists for speed; for publishable analysis use >=10,000
reps <- 3
# Use 5 up/down regulated genes (thresh) for speed, default is 250
thresh <- 5
annotLevel <- 1 # Use cell-level annotations (e.g. Interneurons)

# Load the top table
tt_alzh <- ewceData::tt_alzh()

tt_results <- EWCE::ewce_expression_data(
    sct_data = ctd,
    tt = tt_alzh,
    annotLevel = annotLevel,
    thresh = thresh,
    reps = reps,
    ttSpecies = "human",
    sctSpecies = "mouse"
)

NathanSkene/EWCE documentation built on Feb. 17, 2025, 7:52 a.m.