run_de: Run differential expression

Description

Perform differential expression/accessibility (DE/DA) analysis on single-cell data. Libra implements unique DE/DA methods that can all be accessed from one function. These methods encompass traditional single-cell methods as well as methods that account for biological replicates, including pseudobulk and mixed model methods. The code for this package was largely inspired by the Seurat and Muscat packages. Please see the documentation of these packages for further information.

Usage

run_de(
  input,
  meta = NULL,
  replicate_col = "replicate",
  cell_type_col = "cell_type",
  label_col = "label",
  min_cells = 3,
  min_reps = 2,
  min_features = 0,
  de_family = "pseudobulk",
  de_method = "edgeR",
  de_type = "LRT",
  input_type = "scRNA",
  normalization = "log_tp10k",
  binarization = FALSE,
  latent_vars = NULL,
  n_threads = 2
)

Arguments

input

a single-cell matrix to be converted, with features (genes) in rows and cells in columns. Alternatively, a Seurat, monocle3, or SingleCellExperiment object can be input directly.

meta

the accompanying metadata, whose row names match the column names of input.
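As a minimal sketch of the expected shapes, the toy objects below pair a features-by-cells count matrix with a metadata data frame whose row names equal the matrix column names. All gene, cell, replicate, and label names and all values are made up for illustration.

```r
# Toy data: 4 genes x 8 cells. All names and values are hypothetical.
set.seed(1)
counts <- matrix(
  rpois(4 * 8, lambda = 5),
  nrow = 4,
  dimnames = list(paste0("gene", 1:4), paste0("cell", 1:8))
)

# One row per cell, using the default column names from the Usage section.
meta <- data.frame(
  replicate = rep(c("r1", "r2", "r3", "r4"), each = 2),
  cell_type = "neuron",
  label     = rep(c("ctrl", "stim"), each = 4),
  row.names = colnames(counts)
)

# The row names of meta must line up with the columns of the input.
stopifnot(identical(rownames(meta), colnames(counts)))
```

Checking the alignment up front, as in the last line, is cheap and catches metadata that was sorted or subset independently of the matrix.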

replicate_col

the vector in meta containing the replicate information. Defaults to replicate.

cell_type_col

the vector in meta containing the cell type information. Defaults to cell_type.

label_col

the vector in meta containing the experimental label. Defaults to label. Labels 1 and 2 are determined by the factor levels. If the label column of the metadata is not a factor, it will be converted to one with the default level ordering.
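Because the label order follows the factor levels, setting the levels explicitly is one way to choose which condition is treated as label 1. The sketch below uses hypothetical labels "ctrl" and "stim" and base R only.

```r
# Hypothetical labels; without explicit levels, factor() sorts alphabetically.
label <- c("stim", "ctrl", "stim", "ctrl")
default_levels <- levels(factor(label))
default_levels                   # "ctrl" "stim"

# Put "stim" first so that it becomes the first factor level.
label <- factor(label, levels = c("stim", "ctrl"))
levels(label)                    # "stim" "ctrl"
```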

min_cells

the minimum number of cells in a cell type to retain it. Defaults to 3.

min_reps

the minimum number of replicates in a cell type to retain it. Defaults to 2.

min_features

the minimum number of expressing cells (or replicates) for a gene to retain it. Defaults to 0.

de_family

the differential expression/accessibility family to use. Available options are:

  • "singlecell": For single cell differential expression (DE) methods, uses traditional methods implemented by Seurat to test for DE genes. These methods do not take biological replicates into account. There are six options for de_method that can be used, while no input for de_type is required:

    • "wilcox": Wilcoxon Rank-Sum test. The default.

    • "bimod": Likelihood ratio test

    • "t": Student's t-test

    • "negbinom": Negative binomial linear model

    • "LR": Logistic regression

    • "MAST": MAST (requires installation of the MAST package).

    For single cell differential accessibility (DA) methods, uses methods implemented by Signac and custom-implemented methods to test for DA regions. These methods do not take biological replicates into account. There are eight options for de_method that can be used, while no input for de_type is required:

    • "wilcox": Wilcoxon Rank-Sum test. The default.

    • "t": Student's t-test

    • "negbinom": Negative binomial linear model

    • "LR": Logistic regression

    • "fisher": Fisher exact test

    • "binomial": Binomial test

    • "LR_peaks": Logistic regression by peaks

    • "permutation": Permutation testing

  • "pseudobulk": These methods first convert the single-cell expression/single-cell peak matrix to a so-called 'pseudobulk' matrix by summing counts for each gene/peak within biological replicates, and then perform differential expression/differential accessibility analysis using bulk RNA-seq methods. For pseudobulk methods there are six different methods that can be accessed by combinations of de_method and de_type. First specify de_method as one of the following:

    • "edgeR": The edgeR method according to Robinson et al, Bioinformatics, 2010. For this method please specify de_type as either "LRT" or "QLF" as the null hypothesis testing approach. See http://www.bioconductor.org/packages/release/bioc/html/edgeR.html for further information. The default.

    • "DESeq2": The DESeq2 method according to Love et al, Genome Biology, 2014. For this method please specify de_type as either "LRT" or "Wald" as the null hypothesis testing approach. See https://bioconductor.org/packages/release/bioc/html/DESeq2.html for further information.

    • "limma": The limma method according to Ritchie et al, Nucleic Acids Research, 2015. For this method please specify de_type as either "voom" or "trend" as the precise normalization and null hypothesis testing approach within limma. See https://bioconductor.org/packages/release/bioc/html/limma.html for further information.

  • "mixedmodel": Mixed model methods also take biological replicate into account by modelling it as a random effect. Please note that these methods are generally extremely computationally intensive. For mixed model methods there are ten different methods that can be accessed by combinations of de_method and de_type. First specify de_method as one of the following:

    • "negbinom": Negative binomial generalized linear mixed model. The default.

    • "linear": Linear mixed model.

    • "poisson": Poisson generalized linear mixed model.

    • "negbinom_offset": Negative binomial generalized linear mixed model with an offset term to account for sequencing depth differences between cells.

    • "poisson_offset": Poisson generalized linear mixed model with an offset term to account for sequencing depth differences between cells.

    For each of these options the user has the option to use either a Wald or Likelihood ratio testing method by setting de_type to "Wald" or "LRT". Default is LRT.

  • "snapatac_findDAR": SnapATAC findDifferentialAccessibility method. Only for scATAC-seq.
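The pseudobulk aggregation step described above can be sketched in base R. This is an illustration of summing counts within replicates, not Libra's internal implementation; the matrix values and replicate names are made up, and in practice the aggregation would also be done separately for each cell type.

```r
# Toy counts: 3 genes x 6 cells (hypothetical values).
counts <- matrix(
  c(1, 0, 2, 3, 1, 0,
    0, 4, 1, 0, 2, 2,
    5, 1, 0, 1, 0, 3),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6))
)

# Replicate of origin for each cell, in column order.
replicate <- c("r1", "r1", "r2", "r2", "r3", "r3")

# rowsum() sums rows by group, so transpose to group cells,
# then transpose back to genes x replicates.
pseudobulk <- t(rowsum(t(counts), group = replicate))
pseudobulk
```

The resulting genes-by-replicates matrix has one column per biological replicate, which is the shape that bulk methods such as edgeR, DESeq2, and limma expect.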

de_method

the specific differential expression testing method to use. Please see the documentation under de_family for precise usage options, or see the documentation at https://github.com/neurorestore/Libra. This option will default to wilcox for singlecell methods, to edgeR for pseudobulk methods, and negbinom for mixedmodel methods.

input_type

the type of input data, either "scRNA" or "scATAC". Defaults to "scRNA".

normalization

normalization for single-cell based Seurat/Signac methods, options include

  • "log_tp10k": Log TP10K (default)

  • "tp10k": TP10K

  • "log_tp_median": Log TP median

  • "tp_median": TP median

  • "TFIDF": Only for scATAC-seq
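For intuition, the first four options can be sketched in base R under the assumption that "TP10K" means counts scaled to 10,000 per cell, "TP median" means counts scaled to the median library size, and the log variants apply log(1 + x), as in Seurat's LogNormalize. Libra's exact internal computation may differ; the toy values are made up.

```r
# Toy counts: 2 genes x 3 cells (hypothetical values).
counts <- matrix(
  c(1, 3, 0, 10, 5, 5),
  nrow = 2,
  dimnames = list(c("geneA", "geneB"), paste0("cell", 1:3))
)

# Total counts per cell.
lib_size <- colSums(counts)

# TP10K: scale each cell so that its counts sum to 10,000.
tp10k <- sweep(counts, 2, lib_size, "/") * 1e4

# Log TP10K: natural log of (1 + TP10K).
log_tp10k <- log1p(tp10k)

# TP median (assumed definition): scale to the median library size instead.
tp_median <- sweep(counts, 2, lib_size, "/") * median(lib_size)
log_tp_median <- log1p(tp_median)
```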

binarization

whether to binarize the input matrix. Applies to single-cell ATAC-seq only. Defaults to FALSE.

latent_vars

latent variables for single-cell Seurat/Signac based methods.

n_threads

number of threads to use for parallelization in mixed models.

de_type

the specific null hypothesis testing approach to use with the chosen de_method. Please see the documentation under de_family for precise usage options, or see the documentation at https://github.com/neurorestore/Libra. This option defaults to NULL for singlecell methods, and to LRT for pseudobulk and mixedmodel methods.
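The pseudobulk and mixed model combinations listed under de_family can be written down as a small lookup table. The helper check_combo() below is hypothetical and not part of Libra; it only encodes the combinations stated in this documentation.

```r
# de_type values per de_method, as listed under de_family.
valid_types <- list(
  pseudobulk = list(
    edgeR  = c("LRT", "QLF"),
    DESeq2 = c("LRT", "Wald"),
    limma  = c("voom", "trend")
  ),
  mixedmodel = list(
    negbinom        = c("Wald", "LRT"),
    linear          = c("Wald", "LRT"),
    poisson         = c("Wald", "LRT"),
    negbinom_offset = c("Wald", "LRT"),
    poisson_offset  = c("Wald", "LRT")
  )
)

# Hypothetical helper: TRUE if the combination is listed above.
check_combo <- function(de_family, de_method, de_type) {
  de_type %in% valid_types[[de_family]][[de_method]]
}

check_combo("pseudobulk", "edgeR", "QLF")   # TRUE
check_combo("pseudobulk", "limma", "LRT")   # FALSE

# Number of combinations per family: 6 pseudobulk, 10 mixed model.
n_combos <- sapply(valid_types, function(m) sum(lengths(m)))
n_combos
```

A call such as run_de(input, meta = meta, de_family = "pseudobulk", de_method = "edgeR", de_type = "QLF") would then select the edgeR quasi-likelihood F-test, following the argument names shown in the Usage section.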

Value

a data frame containing differential expression results with the following columns:

  • "cell type": The cell type DE tests were run on. By default Libra will run DE on all cell types present in the original metadata.

  • "gene": The gene being tested.

  • "avg_logFC": The average log fold change between conditions. The direction of the logFC can be controlled using factor levels of label_col whereby a positive logFC reflects higher expression in the first level of the factor, compared to the second. This is calculated using Seurat::FoldChange.

  • "label1.pct": Percentage of cells expressing the gene in label 1.

  • "label2.pct": Percentage of cells expressing the gene in label 2.

  • "label1.exp": Mean expression of the gene in label 1.

  • "label2.exp": Mean expression of the gene in label 2.

  • "p_val": The p-value resulting from the null hypothesis test.

  • "p_val_adj": The adjusted p-value according to the Benjamini-Hochberg method (FDR).

  • "de_family/da_family": The differential expression/accessibility method family.

  • "de_method/da_method": The precise differential expression/accessibility method.

  • "de_type/da_type": The differential expression/accessibility method statistical testing type.
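A typical next step is to filter the returned data frame by adjusted p-value. The sketch below uses a mock data frame with a subset of the columns listed above; the values are made up, the column name cell_type is an assumption, and p_val_adj is recomputed with p.adjust() only so that the example is self-contained.

```r
# Mock results with a subset of the documented columns (values are made up).
res <- data.frame(
  cell_type = "neuron",
  gene      = paste0("gene", 1:5),
  avg_logFC = c(1.2, -0.8, 0.1, 2.0, -1.5),
  p_val     = c(0.001, 0.02, 0.6, 0.0004, 0.03)
)

# Benjamini-Hochberg adjustment, the method named for p_val_adj.
res$p_val_adj <- p.adjust(res$p_val, method = "BH")

# Keep genes passing a 5% FDR threshold, ordered by absolute effect size.
sig <- res[res$p_val_adj < 0.05, ]
sig <- sig[order(-abs(sig$avg_logFC)), ]
sig
```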


neurorestore/Libra documentation built on Aug. 31, 2024, 8:53 p.m.