test_differential_abundance | R Documentation |
test_differential_abundance() takes as input A 'tbl' (with at least three columns for sample, feature and transcript abundance) or 'SummarizedExperiment' (more convenient if abstracted to tibble with library(tidySummarizedExperiment)) and returns a consistent object (to the input) with additional columns for the statistics from the hypothesis test.
test_differential_abundance(
.data,
.formula,
.sample = NULL,
.transcript = NULL,
.abundance = NULL,
contrasts = NULL,
method = "edgeR_quasi_likelihood",
test_above_log2_fold_change = NULL,
scaling_method = "TMM",
omit_contrast_in_colnames = FALSE,
prefix = "",
action = "add",
...,
significance_threshold = NULL,
fill_missing_values = NULL,
.contrasts = NULL
)
## S4 method for signature 'spec_tbl_df'
test_differential_abundance(
.data,
.formula,
.sample = NULL,
.transcript = NULL,
.abundance = NULL,
contrasts = NULL,
method = "edgeR_quasi_likelihood",
test_above_log2_fold_change = NULL,
scaling_method = "TMM",
omit_contrast_in_colnames = FALSE,
prefix = "",
action = "add",
...,
significance_threshold = NULL,
fill_missing_values = NULL,
.contrasts = NULL
)
## S4 method for signature 'tbl_df'
test_differential_abundance(
.data,
.formula,
.sample = NULL,
.transcript = NULL,
.abundance = NULL,
contrasts = NULL,
method = "edgeR_quasi_likelihood",
test_above_log2_fold_change = NULL,
scaling_method = "TMM",
omit_contrast_in_colnames = FALSE,
prefix = "",
action = "add",
...,
significance_threshold = NULL,
fill_missing_values = NULL,
.contrasts = NULL
)
## S4 method for signature 'tidybulk'
test_differential_abundance(
.data,
.formula,
.sample = NULL,
.transcript = NULL,
.abundance = NULL,
contrasts = NULL,
method = "edgeR_quasi_likelihood",
test_above_log2_fold_change = NULL,
scaling_method = "TMM",
omit_contrast_in_colnames = FALSE,
prefix = "",
action = "add",
...,
significance_threshold = NULL,
fill_missing_values = NULL,
.contrasts = NULL
)
## S4 method for signature 'SummarizedExperiment'
test_differential_abundance(
.data,
.formula,
.sample = NULL,
.transcript = NULL,
.abundance = NULL,
contrasts = NULL,
method = "edgeR_quasi_likelihood",
test_above_log2_fold_change = NULL,
scaling_method = "TMM",
omit_contrast_in_colnames = FALSE,
prefix = "",
action = "add",
...,
significance_threshold = NULL,
fill_missing_values = NULL,
.contrasts = NULL
)
## S4 method for signature 'RangedSummarizedExperiment'
test_differential_abundance(
.data,
.formula,
.sample = NULL,
.transcript = NULL,
.abundance = NULL,
contrasts = NULL,
method = "edgeR_quasi_likelihood",
test_above_log2_fold_change = NULL,
scaling_method = "TMM",
omit_contrast_in_colnames = FALSE,
prefix = "",
action = "add",
...,
significance_threshold = NULL,
fill_missing_values = NULL,
.contrasts = NULL
)
.data |
A 'tbl' (with at least three columns for sample, feature and transcript abundance) or 'SummarizedExperiment' (more convenient if abstracted to tibble with library(tidySummarizedExperiment)) |
.formula |
A formula representing the desired linear model. If there is more than one factor, they should be in the order factor of interest + additional factors. |
.sample |
The name of the sample column |
.transcript |
The name of the transcript/gene column |
.abundance |
The name of the transcript/gene abundance column |
contrasts |
This parameter takes the format of the contrast parameter of the method of choice. For edgeR and limma-voom is a character vector. For DESeq2 is a list including a character vector of length three. The first covariate is the one the model is tested against (e.g., ~ factor_of_interest) |
method |
A string character. Either "edgeR_quasi_likelihood" (i.e., QLF), "edgeR_likelihood_ratio" (i.e., LRT), "edger_robust_likelihood_ratio", "DESeq2", "limma_voom", "limma_voom_sample_weights" |
test_above_log2_fold_change |
A positive real value. This works for edgeR and limma_voom methods. It uses the 'treat' function, which tests that the difference in abundance is bigger than this threshold rather than zero https://pubmed.ncbi.nlm.nih.gov/19176553. |
scaling_method |
A character string. The scaling method passed to the back-end functions: edgeR and limma-voom (i.e., edgeR::calcNormFactors; "TMM","TMMwsp","RLE","upperquartile"). Setting the parameter to \"none\" will skip the compensation for sequencing-depth for the method edgeR or limma-voom. |
omit_contrast_in_colnames |
If just one contrast is specified you can choose to omit the contrast label in the colnames. |
prefix |
A character string. The prefix you would like to add to the result columns. It is useful if you want to compare several methods. |
action |
A character string. Whether to join the new information to the input tbl (add), or just get the non-redundant tbl with the new information (get). |
... |
Further arguments passed to some of the internal functions. Currently, it is needed just for internal debug. |
significance_threshold |
DEPRECATED - A real between 0 and 1 (usually 0.05). |
fill_missing_values |
DEPRECATED - A boolean. Whether to fill missing sample/transcript values with the median of the transcript. This is rarely needed. |
.contrasts |
DEPRECATED - This parameter takes the format of the contrast parameter of the method of choice. For edgeR and limma-voom is a character vector. For DESeq2 is a list including a character vector of length three. The first covariate is the one the model is tested against (e.g., ~ factor_of_interest) |
'r lifecycle::badge("maturing")'
This function provides the option to use edgeR https://doi.org/10.1093/bioinformatics/btp616, limma-voom https://doi.org/10.1186/gb-2014-15-2-r29, limma_voom_sample_weights https://doi.org/10.1093/nar/gkv412 or DESeq2 https://doi.org/10.1186/s13059-014-0550-8 to perform the testing. All methods use raw counts, irrespective of if scale_abundance or adjust_abundance have been calculated, therefore it is essential to add covariates such as batch effects (if applicable) in the formula.
Underlying method for edgeR framework:
.data |>
# Filter keep_abundant( factor_of_interest = !!(as.symbol(parse_formula(.formula)[1])), minimum_counts = minimum_counts, minimum_proportion = minimum_proportion ) |>
# Format select(!!.transcript,!!.sample,!!.abundance) |> spread(!!.sample,!!.abundance) |> as_matrix(rownames = !!.transcript)
# edgeR edgeR::DGEList(counts = .) |> edgeR::calcNormFactors(method = scaling_method) |> edgeR::estimateDisp(design) |>
# Fit edgeR::glmQLFit(design) |> // or glmFit according to choice edgeR::glmQLFTest(coef = 2, contrast = my_contrasts) // or glmLRT according to choice
Underlying method for DESeq2 framework:
keep_abundant( factor_of_interest = !!as.symbol(parse_formula(.formula)[[1]]), minimum_counts = minimum_counts, minimum_proportion = minimum_proportion ) |>
# DESeq2 DESeq2::DESeqDataSet(design = .formula) |> DESeq2::DESeq() |> DESeq2::results()
Underlying method for glmmSeq framework:
counts = .data assay(my_assay)
# Create design matrix for dispersion, removing random effects design = model.matrix( object = .formula |> lme4::nobars(), data = metadata )
dispersion = counts |> edgeR::estimateDisp(design = design)
glmmSeq( .formula, countdata = counts , metadata = metadata |> as.data.frame(), dispersion = dispersion, progress = TRUE, method = method |> str_remove("(?i)^glmmSeq_" ), )
A consistent object (to the input) with additional columns for the statistics from the test (e.g., log fold change, p-value and false discovery rate).
A consistent object (to the input) with additional columns for the statistics from the test (e.g., log fold change, p-value and false discovery rate).
A consistent object (to the input) with additional columns for the statistics from the hypothesis test (e.g., log fold change, p-value and false discovery rate).
A consistent object (to the input) with additional columns for the statistics from the hypothesis test (e.g., log fold change, p-value and false discovery rate).
A 'SummarizedExperiment' object
A 'SummarizedExperiment' object
# edgeR
tidybulk::se_mini |>
identify_abundant() |>
test_differential_abundance( ~ condition )
# The function `test_differential_abundance` operates with contrasts too
tidybulk::se_mini |>
identify_abundant(factor_of_interest = condition) |>
test_differential_abundance(
~ 0 + condition,
contrasts = c( "conditionTRUE - conditionFALSE")
)
# DESeq2 - equivalent for limma-voom
my_se_mini = tidybulk::se_mini
my_se_mini$condition = factor(my_se_mini$condition)
# demontrating with `fitType` that you can access any arguments to DESeq()
my_se_mini |>
identify_abundant(factor_of_interest = condition) |>
test_differential_abundance( ~ condition, method="deseq2", fitType="local")
# testing above a log2 threshold, passes along value to lfcThreshold of results()
res <- my_se_mini |>
identify_abundant(factor_of_interest = condition) |>
test_differential_abundance( ~ condition, method="deseq2",
fitType="local",
test_above_log2_fold_change=4 )
# Use random intercept and random effect models
se_mini[1:50,] |>
identify_abundant(factor_of_interest = condition) |>
test_differential_abundance(
~ condition + (1 + condition | time),
method = "glmmseq_lme4", cores = 1
)
# confirm that lfcThreshold was used
## Not run:
res |>
mcols() |>
DESeq2::DESeqResults() |>
DESeq2::plotMA()
## End(Not run)
# The function `test_differential_abundance` operates with contrasts too
my_se_mini |>
identify_abundant() |>
test_differential_abundance(
~ 0 + condition,
contrasts = list(c("condition", "TRUE", "FALSE")),
method="deseq2",
fitType="local"
)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.