test_de: Test for Differential Expression
In const-ae/glmGamPoi: Fit a Gamma-Poisson Generalized Linear Model

test_de

R Documentation

Test for Differential Expression

Description

Conduct a quasi-likelihood ratio test for a Gamma-Poisson fit.

Usage

test_de(
  fit,
  contrast,
  reduced_design = NULL,
  full_design = fit$model_matrix,
  subset_to = NULL,
  pseudobulk_by = NULL,
  pval_adjust_method = "BH",
  sort_by = NULL,
  decreasing = FALSE,
  n_max = Inf,
  max_lfc = 10,
  compute_lfc_se = FALSE,
  verbose = FALSE
)

Arguments

`fit`	object of class `glmGamPoi`. Usually the result of calling `glm_gp(data, ...)`
`contrast`	The contrast to test. Can be a single column name (quoted or as a string) that is removed from the full model matrix of `fit`. Or a complex contrast comparing two or more columns: e.g. `A - B`, `"A - 3 * B"`, `(A + B) / 2 - C` etc. For complicated experimental design that involved nested conditions, you specify the condition level to compare using the `cond()` helper function. Only one of `contrast` or `reduced_design` must be specified.
`reduced_design`	a specification of the reduced design used as a comparison to see what how much better `fit` describes the data. Analogous to the `design` parameter in `glm_gp()`, it can be either a `formula`, a `model.matrix()`, or a `vector`. Only one of `contrast` or `reduced_design` must be specified.
`full_design`	option to specify an alternative `full_design` that can differ from `fit$model_matrix`. Can be a `formula` or a `matrix`. Default: `fit$model_matrix`
`subset_to`	a vector with the same length as `ncol(fit$data)` or an expression that evaluates to such a vector. The expression can reference columns from `colData(fit$data)`. A typical use case in single cell analysis would be to subset to a specific cell type (e.g. `subset_to = cell_type == "T-cells"`). Note that if this argument is set a new the model for the `full_design` is re-fit. Default: `NULL` which means that the data is not subset.
`pseudobulk_by`	DEPRECTATED, please use the pseudobulk function instead. A vector with the same length as `ncol(fit$data)` that is used to split the columns into different groups (calls `split()`). `pseudobulk_by` can also be an expression that evaluates to a vector. The expression can reference columns from `colData(fit$data)`. The counts are summed across the groups to create "pseudobulk" samples. This is typically used in single cell analysis if the cells come from different samples to get a proper estimate of the differences. This is particularly powerful in combination with the `subset_to` parameter to analyze differences between samples for subgroups of cells. Note that this does a fresh fit for both the full and the reduced design. Default: `NULL` which means that the data is not aggregated.
`pval_adjust_method`	one of the p-value adjustment method from p.adjust.methods. Default: `"BH"`.
`sort_by`	the name of the column or an expression used to sort the result. If `sort_by` is `NULL` the table is not sorted. Default: `NULL`
`decreasing`	boolean to decide if the result is sorted increasing or decreasing order. Default: `FALSE`.
`n_max`	the maximum number of rows to return. Default: `Inf` which means that all rows are returned
`max_lfc`	set the maximum absolute log fold change that is returned. Large log fold changes occur for lowly expressed genes because the ratio of two small numbers can be impractically large. For example, limiting the range of log fold changes can clarify the patterns in a volcano plot. Default: `10` which corresponds to a thousand-fold (2^10) increase in expression.
`compute_lfc_se`	Compute standard errors for the log fold changes, and add a column `lfc_se` to the returned dataframe. Only has an effect when using `contrast` instead of `reduced_design`. Default: `FALSE`
`verbose`	a boolean that indicates if information about the individual steps are printed while fitting the GLM. Default: `FALSE`.

Details

The cond() helper function simplifies the specification of a contrast for complex experimental designs. Instead of working out which combination of coefficients corresponds to a research question, you can simply specify the two conditions that you want to compare.

You can only call the cond function inside the contrast argument. The arguments are the selected factor levels for each covariate. To compare two conditions, simply subtract the two cond calls. Internally, the package calls model.matrix using the provided values and the original formula from the fit to produce a vector. Subtracting two of these vectors produces a contrast vector. Missing covariates are filled with the first factor level or zero for numerical covariates.

Value

a data.frame with the following columns

name: the rownames of the input data
pval: the p-values of the quasi-likelihood ratio test
adj_pval: the adjusted p-values returned from p.adjust()
f_statistic: the F-statistic: F = (Dev_{full} - Dev_{red}) / (df_1 * disp_{ql-shrunken})
df1: the degrees of freedom of the test: ncol(design) - ncol(reduced_design)
df2: the degrees of freedom of the fit: ncol(data) - ncol(design) + df_0
lfc: the log2-fold change. If the alternative model is specified by reduced_design will be NA.
lfc_se: the standard error of the log2-fold change. Only calculated if compute_lfc_se == TRUE.

References

Lund, S. P., Nettleton, D., McCarthy, D. J., & Smyth, G. K. (2012). Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology, 11(5). https://doi.org/10.1515/1544-6115.1826.

Examples

 # Make Data
 Y <- matrix(rnbinom(n = 30 * 100, mu = 4, size = 0.3), nrow = 30, ncol = 100)
 annot <- data.frame(mouse = sample(LETTERS[1:6], size = 100, replace = TRUE),
        celltype = sample(c("Tcell", "Bcell", "Macrophages"), size = 100, replace = TRUE),
        cont1 = rnorm(100), cont2 = rnorm(100, mean = 30))
 annot$condition <- ifelse(annot$mouse %in% c("A", "B", "C"), "ctrl", "treated")
 head(annot)
 se <- SummarizedExperiment::SummarizedExperiment(Y, colData = annot)

 # Fit model
 fit <- glm_gp(se, design = ~ condition + celltype + cont1 + cont2)
 # Test with reduced design
 res <- test_de(fit, reduced_design = ~ celltype + cont1 + cont2)
 head(res)
 # Test with contrast argument, the results are identical
 res2 <- test_de(fit, contrast = conditiontreated)
 head(res2)
 # Test with explicit specification of the conditions
 # The results are still identical
 res3 <- test_de(fit, contrast = cond(condition = "treated", celltype = "Bcell") -
                                    cond(condition = "ctrl", celltype = "Bcell"))
 head(res3)


 # The column names of fit$Beta are valid variables in the contrast argument
 colnames(fit$Beta)
 # You can also have more complex contrasts:
 # the following compares cont1 vs cont2:
 test_de(fit, cont1 - cont2, n_max = 4)
 # You can also sort the output
 test_de(fit, cont1 - cont2, n_max = 4,
         sort_by = "pval")
 test_de(fit, cont1 - cont2, n_max = 4,
         sort_by = - abs(f_statistic))

 # If the data has multiple samples, it is a good
 # idea to aggregate the cell counts by samples.
 # This is called forming a "pseudobulk".
 se_reduced <- pseudobulk(se, group_by = vars(mouse, condition, celltype),
                          cont1 = mean(cont1), cont2 = min(cont2))
 fit_reduced <- glm_gp(se_reduced, design = ~ condition + celltype)
 test_de(fit_reduced, contrast = "conditiontreated", n_max = 4)
 test_de(fit_reduced, contrast = cond(condition = "treated", celltype = "Macrophages") -
                                      cond(condition = "ctrl", celltype = "Macrophages"),
         n_max = 4)

const-ae/glmGamPoi documentation built on April 5, 2025, 5:58 p.m.