test_de: Test for Differential Expression
In glmGamPoi: Fit a Gamma-Poisson Generalized Linear Model

Description Usage Arguments Value References See Also Examples

Conduct a quasi-likelihood ratio test for a Gamma-Poisson fit.

test_de(
  fit,
  contrast,
  reduced_design = NULL,
  full_design = fit$model_matrix,
  subset_to = NULL,
  pseudobulk_by = NULL,
  pval_adjust_method = "BH",
  sort_by = NULL,
  decreasing = FALSE,
  n_max = Inf,
  verbose = FALSE
)

`fit`	object of class `glmGamPoi`. Usually the result of calling `glm_gp(data, ...)`
`contrast`	The contrast to test. Can be a single column name (quoted or as a string) that is removed from the full model matrix of `fit`. Or a complex contrast comparing two or more columns: e.g. `A - B`, `"A - 3 * B"`, `(A + B) / 2 - C` etc. Only one of `contrast` or `reduced_design` must be specified.
`reduced_design`	a specification of the reduced design used as a comparison to see what how much better `fit` describes the data. Analogous to the `design` parameter in `glm_gp()`, it can be either a `formula`, a `model.matrix()`, or a `vector`. Only one of `contrast` or `reduced_design` must be specified.
`full_design`	option to specify an alternative `full_design` that can differ from `fit$model_matrix`. Can be a `formula` or a `matrix`. Default: `fit$model_matrix`
`subset_to`	a vector with the same length as `ncol(fit$data)` or an expression that evaluates to such a vector. The expression can reference columns from `colData(fit$data)`. A typical use case in single cell analysis would be to subset to a specific cell type (e.g. `subset_to = cell_type == "T-cells"`). Note that if this argument is set a new the model for the `full_design` is re-fit. Default: `NULL` which means that the data is not subset.
`pseudobulk_by`	a vector with the same length as `ncol(fit$data)` that is used to split the columns into different groups (calls `split()`). `pseudobulk_by` can also be an expression that evaluates to a vector. The expression can reference columns from `colData(fit$data)`. The counts are summed across the groups to create "pseudobulk" samples. This is typically used in single cell analysis if the cells come from different samples to get a proper estimate of the differences. This is particularly powerful in combination with the `subset_to` parameter to analyze differences between samples for subgroups of cells. Note that this does a fresh fit for both the full and the reduced design. Default: `NULL` which means that the data is not aggregated.
`pval_adjust_method`	one of the p-value adjustment method from p.adjust.methods. Default: `"BH"`.
`sort_by`	the name of the column or an expression used to sort the result. If `sort_by` is `NULL` the table is not sorted. Default: `NULL`
`decreasing`	boolean to decide if the result is sorted increasing or decreasing order. Default: `FALSE`.
`n_max`	the maximum number of rows to return. Default: `Inf` which means that all rows are returned
`verbose`	a boolean that indicates if information about the individual steps are printed while fitting the GLM. Default: `FALSE`.

a data.frame with the following columns

name: the rownames of the input data
pval: the p-values of the quasi-likelihood ratio test
adj_pval: the adjusted p-values returned from p.adjust()
f_statistic: the F-statistic: F = (Dev_full - Dev_red) / (df_1 * disp_ql-shrunken)
df1: the degrees of freedom of the test: ncol(design) - ncol(reduced_design)
df2: the degrees of freedom of the fit: ncol(data) - ncol(design) + df_0
lfc: the log2-fold change. If the alternative model is specified by reduced_design will be NA.

Lund, S. P., Nettleton, D., McCarthy, D. J., & Smyth, G. K. (2012). Detecting differential expression in RNA-sequence data using quasi-likelihood with shrunken dispersion estimates. Statistical Applications in Genetics and Molecular Biology, 11(5). https://doi.org/10.1515/1544-6115.1826.

glm_gp()

  Y <- matrix(rnbinom(n = 30 * 100, mu = 4, size = 0.3), nrow = 30, ncol = 100)
  annot <- data.frame(sample = sample(LETTERS[1:6], size = 100, replace = TRUE),
                      cont1 = rnorm(100), cont2 = rnorm(100, mean = 30))
  annot$condition <- ifelse(annot$sample %in% c("A", "B", "C"), "ctrl", "treated")
  head(annot)
  se <- SummarizedExperiment::SummarizedExperiment(Y, colData = annot)
  fit <- glm_gp(se, design = ~ condition + cont1 + cont2)

  # Test with reduced design
  res <- test_de(fit, reduced_design = ~ condition + cont1)
  head(res)

  # Test with contrast argument, the results are identical
  res2 <- test_de(fit, contrast = cont2)
  head(res2)

  # The column names of fit$Beta are valid variables in the contrast argument
  colnames(fit$Beta)


  # You can also have more complex contrasts:
  # the following compares cont1 vs cont2:
  test_de(fit, cont1 - cont2, n_max = 4)


  # You can also sort the output
  test_de(fit, cont1 - cont2, n_max = 4,
          sort_by = "pval")

  test_de(fit, cont1 - cont2, n_max = 4,
          sort_by = - abs(f_statistic))

  # If the data has multiple samples, it is a good
  # idea to aggregate the cell counts by samples.
  # This is called "pseudobulk".
  test_de(fit, contrast = "conditiontreated", n_max = 4,
          pseudobulk_by = sample)


  # You can also do the pseudobulk only on a subset of cells:
  cell_types <- sample(c("Tcell", "Bcell", "Makrophages"), size = 100, replace = TRUE)
  test_de(fit, contrast = "conditiontreated", n_max = 4,
          pseudobulk_by = sample,
          subset_to = cell_types == "Bcell")


  # Be care full, if you included the cell type information in
  # the original fit, after subsetting the design matrix would
  # be degenerate. To fix this, specify the full_design in 'test_de()'
  SummarizedExperiment::colData(se)$ct <- cell_types
  fit_with_celltype <- glm_gp(se, design = ~ condition + cont1 + cont2 + ct)
  test_de(fit_with_celltype, contrast = cont1, n_max = 4,
          full_design =  ~ condition + cont1 + cont2,
          pseudobulk_by = sample,
          subset_to = ct == "Bcell")