runDA: Test for differential abundance using edgeR
In fionarhuang/treeclimbR: An algorithm to find optimal signal levels in a tree

runDA

R Documentation

Test for differential abundance using edgeR

Description

Test for differential abundance of entities using functions from the edgeR package. This adapts edgerWrp to accept input as a TreeSummarizedExperiment (TSE) object instead of a matrix. Features can be represented in either rows or columns. By default, features are in rows, samples are in columns, and the sample information is in colData. The tree that stores the hierarchical information about features is in rowTree, and each row of the assays can be mapped to a node of the tree. Data on rows mapped to internal nodes are assumed to be generated from data on leaf nodes (e.g. by aggregating with aggTSE, as in the example below). If normalize = TRUE, edgeR estimates normalization factors for the samples. The library size of each sample is calculated using only the features mapped to leaf nodes.
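To see why the library size is computed from leaf rows only, consider a small base-R sketch with a hypothetical three-leaf tree. The object names (leaf_counts, descendants, and so on) are illustrative and not part of treeclimbR, and internal-node rows are assumed to be sums of their descendant leaves. Because every read then appears once for each node that contains it, summing all rows would inflate the library size.

```r
## Hypothetical toy data: 3 leaf features (rows) x 2 samples (columns)
leaf_counts <- matrix(c(5, 3, 2,
                        1, 4, 6), nrow = 3,
                      dimnames = list(c("t1", "t2", "t3"), c("s1", "s2")))

## Hypothetical tree: internal node n4 has leaves t1 and t2;
## the root n5 has all three leaves
descendants <- list(n4 = c("t1", "t2"), n5 = c("t1", "t2", "t3"))

## Rows for internal nodes: sum the counts of their descendant leaves
internal_counts <- do.call(rbind, lapply(descendants, function(leaves)
    colSums(leaf_counts[leaves, , drop = FALSE])))
all_counts <- rbind(leaf_counts, internal_counts)

## Library size per sample, using leaf rows only
is_leaf <- rownames(all_counts) %in% rownames(leaf_counts)
lib_size <- colSums(all_counts[is_leaf, , drop = FALSE])

## Summing all rows counts each read once for every node that contains it
naive_lib_size <- colSums(all_counts)
print(lib_size)        # s1 = 10, s2 = 11
print(naive_lib_size)  # s1 = 28, s2 = 27
```

The root row (n5) equals the leaf-based library size, which is a quick sanity check that the aggregation covered every leaf.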

Usage

runDA(
  TSE,
  feature_on_row = TRUE,
  assay = NULL,
  option = c("glm", "glmQL"),
  design = NULL,
  contrast = NULL,
  filter_min_count = 10,
  filter_min_total_count = 15,
  filter_large_n = 10,
  filter_min_prop = 0.7,
  normalize = TRUE,
  normalize_method = "TMM",
  group_column = "group",
  design_terms = "group",
  ...
)

Arguments

`TSE`	A `TreeSummarizedExperiment` object.
`feature_on_row`	A logical scalar. If `TRUE` (default), features or entities (e.g. genes, OTUs) are in rows of the `assays` tables, and samples are in columns; otherwise, it's the other way around.
`assay`	A numeric index or assay name to specify which assay from `assays` is used for analysis.
`option`	Either `"glm"` or `"glmQL"`. If `"glm"`, `glmFit` and `glmLRT` are used; otherwise, `glmQLFit` and `glmQLFTest` are used. Details about the difference between the two options are in the help page of `glmQLFit`.
`design`	A numeric design matrix. If `NULL`, the design matrix is created from the columns of the sample annotation named in `design_terms`.
`contrast`	A numeric vector specifying one contrast of the linear model coefficients to be tested equal to zero. Its length must equal the number of columns of `design`. If `NULL`, the last coefficient is tested equal to zero.
`filter_min_count`	A numeric value, passed to the `min.count` argument of `filterByExpr`.
`filter_min_total_count`	A numeric value, passed to the `min.total.count` argument of `filterByExpr`.
`filter_large_n`	A numeric value, passed to the `large.n` argument of `filterByExpr`.
`filter_min_prop`	A numeric value, passed to the `min.prop` argument of `filterByExpr`.
`normalize`	A logical scalar indicating whether to estimate normalization factors (using `calcNormFactors`).
`normalize_method`	Normalization method to be used. See `calcNormFactors` for more details.
`group_column`	The name of the column in the sample annotation providing group labels for samples (currently not used).
`design_terms`	The names of columns from the sample annotation that will be used to generate the design matrix. This is ignored if design is provided.
`...`	More arguments to pass to `glmFit` (`option = "glm"`) or `glmQLFit` (`option = "glmQL"`).
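The interplay of `design_terms`, `design`, and `contrast` can be sketched in base R. The sample annotation and the formula-building step below are assumptions for illustration, not runDA's internal code. The point is that the default contrast (`NULL`) tests the last column of the design matrix, so listing the term of interest last makes it the one tested by default.

```r
## Hypothetical sample annotation: two groups, three batches
sample_data <- data.frame(
    group = factor(rep(c("control", "treated"), each = 3)),
    batch = factor(rep(c("b1", "b2", "b3"), times = 2))
)
design_terms <- c("batch", "group")  # term of interest placed last

## One plausible way to turn column names into a design matrix
form <- stats::as.formula(paste("~", paste(design_terms, collapse = " + ")))
design <- stats::model.matrix(form, data = sample_data)
print(colnames(design))
## "(Intercept)" "batchb2" "batchb3" "grouptreated"

## Explicit equivalent of contrast = NULL: test only the last coefficient,
## with one entry per column of the design matrix
contrast <- c(rep(0, ncol(design) - 1), 1)
print(contrast)  # 0 0 0 1
```

Passing this vector as `contrast` should select the same coefficient (`grouptreated`) as leaving `contrast = NULL`.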

Details

The experimental design is specified by a design matrix, either provided via the argument design or generated from the sample annotation columns listed in design_terms. More details about the calculation of normalization factors can be found in the help page of calcNormFactors.
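For orientation, here is a minimal edgeR workflow of the kind runDA wraps, run on a plain simulated count matrix. The step order (filter, normalize, estimate dispersions, fit, test) and the call to estimateDisp are assumptions about a typical edgeR analysis rather than a copy of runDA's code, and the leaf-based library sizes used for tree data are omitted here.

```r
library(edgeR)

set.seed(1)
## Hypothetical counts: 50 features x 6 samples in two groups of three
counts <- matrix(rnbinom(300, mu = 50, size = 5), nrow = 50,
                 dimnames = list(paste0("f", 1:50), paste0("s", 1:6)))
group <- factor(rep(c("A", "B"), each = 3))
design <- model.matrix(~ group)
option <- "glmQL"   # or "glm"
normalize <- TRUE

y <- DGEList(counts = counts)
## Drop lowly abundant features; values mirror runDA's filter_* defaults
keep <- filterByExpr(y, design = design, min.count = 10,
                     min.total.count = 15, large.n = 10, min.prop = 0.7)
y <- y[keep, , keep.lib.sizes = TRUE]

## Normalization factors, as with normalize_method = "TMM"
if (normalize) y <- calcNormFactors(y, method = "TMM")
y <- estimateDisp(y, design)

## Test the last coefficient (groupB), as with contrast = NULL
contrast <- c(0, 1)
if (option == "glm") {
    fit <- glmFit(y, design)
    res <- glmLRT(fit, contrast = contrast)
} else {
    fit <- glmQLFit(y, design)
    res <- glmQLFTest(fit, contrast = contrast)
}
print(topTags(res, n = 5))
```

Both branches return an object whose `table` holds one row per retained feature, which is the kind of result runDA stores in edgeR_results.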

Value

A list with entries edgeR_results, tree, and nodes_drop.

edgeR_results: The output of glmQLFTest or glmLRT depending on the specified option.
tree: The hierarchical structure of entities that was stored in the input TSE.
nodes_drop: A vector of the alias node labels of entities that were filtered out before the analysis due to low counts.

Author(s)

Ruizhu Huang

Examples

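suppressPackageStartupMessages({
    library(TreeSummarizedExperiment)
    library(treeclimbR)
})

## Load example data set (features in rows, tree stored in rowTree)
lse <- readRDS(system.file("extdata", "da_sim_100_30_18de.rds",
                           package = "treeclimbR"))

## Aggregate counts on internal nodes: one row per node of the
## row tree, leaves included
nodes <- showNode(tree = rowTree(lse), only.leaf = FALSE)
tse <- aggTSE(x = lse, rowLevel = nodes)

## Design matrix from the sample annotation; with contrast = NULL,
## the last coefficient (the group effect) is tested
dd <- model.matrix(~ group, data = colData(tse))
out <- runDA(TSE = tse, feature_on_row = TRUE,
             assay = 1, option = "glmQL",
             design = dd, contrast = NULL,
             normalize = TRUE, filter_min_count = 2)
names(out)       # entries of the returned list
out$nodes_drop   # nodes filtered out due to low counts
edgeR::topTags(out$edgeR_results, sort.by = "PValue")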

fionarhuang/treeclimbR documentation built on June 14, 2025, 4:30 p.m.