test_de: Test for Differential Expression

Description Usage Arguments Value References See Also Examples

View source: R/test_de.R

Description

Conduct a quasi-likelihood ratio test for a Gamma-Poisson fit.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
test_de(
  fit,
  contrast,
  reduced_design = NULL,
  full_design = fit$model_matrix,
  subset_to = NULL,
  pseudobulk_by = NULL,
  pval_adjust_method = "BH",
  sort_by = NULL,
  decreasing = FALSE,
  n_max = Inf,
  verbose = FALSE
)

Arguments

fit

object of class glmGamPoi. Usually the result of calling glm_gp(data, ...)

contrast

The contrast to test. Can be a single column name (quoted or as a string) that is removed from the full model matrix of fit. Or a complex contrast comparing two or more columns: e.g. A - B, "A - 3 * B", (A + B) / 2 - C etc.
Only one of contrast or reduced_design must be specified.

reduced_design

a specification of the reduced design used as a comparison to see what how much better fit describes the data. Analogous to the design parameter in glm_gp(), it can be either a formula, a model.matrix(), or a vector.
Only one of contrast or reduced_design must be specified.

full_design

option to specify an alternative full_design that can differ from fit$model_matrix. Can be a formula or a matrix. Default: fit$model_matrix

subset_to

a vector with the same length as ncol(fit$data) or an expression that evaluates to such a vector. The expression can reference columns from colData(fit$data). A typical use case in single cell analysis would be to subset to a specific cell type (e.g. subset_to = cell_type == "T-cells"). Note that if this argument is set a new the model for the full_design is re-fit.
Default: NULL which means that the data is not subset.

pseudobulk_by

a vector with the same length as ncol(fit$data) that is used to split the columns into different groups (calls split()). pseudobulk_by can also be an expression that evaluates to a vector. The expression can reference columns from colData(fit$data).
The counts are summed across the groups to create "pseudobulk" samples. This is typically used in single cell analysis if the cells come from different samples to get a proper estimate of the differences. This is particularly powerful in combination with the subset_to parameter to analyze differences between samples for subgroups of cells. Note that this does a fresh fit for both the full and the reduced design. Default: NULL which means that the data is not aggregated.

pval_adjust_method

one of the p-value adjustment method from p.adjust.methods. Default: "BH".

sort_by

the name of the column or an expression used to sort the result. If sort_by is NULL the table is not sorted. Default: NULL

decreasing

boolean to decide if the result is sorted increasing or decreasing order. Default: FALSE.

n_max

the maximum number of rows to return. Default: Inf which means that all rows are returned

verbose

a boolean that indicates if information about the individual steps are printed while fitting the GLM. Default: FALSE.

Value

a data.frame with the following columns

name

the rownames of the input data

pval

the p-values of the quasi-likelihood ratio test

adj_pval

the adjusted p-values returned from p.adjust()

f_statistic

the F-statistic: F = (Dev_full - Dev_red) / (df_1 * disp_ql-shrunken)

df1

the degrees of freedom of the test: ncol(design) - ncol(reduced_design)

df2

the degrees of freedom of the fit: ncol(data) - ncol(design) + df_0

lfc

the log2-fold change. If the alternative model is specified by reduced_design will be NA.

References

See Also

glm_gp()

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
  Y <- matrix(rnbinom(n = 30 * 100, mu = 4, size = 0.3), nrow = 30, ncol = 100)
  annot <- data.frame(sample = sample(LETTERS[1:6], size = 100, replace = TRUE),
                      cont1 = rnorm(100), cont2 = rnorm(100, mean = 30))
  annot$condition <- ifelse(annot$sample %in% c("A", "B", "C"), "ctrl", "treated")
  head(annot)
  se <- SummarizedExperiment::SummarizedExperiment(Y, colData = annot)
  fit <- glm_gp(se, design = ~ condition + cont1 + cont2)

  # Test with reduced design
  res <- test_de(fit, reduced_design = ~ condition + cont1)
  head(res)

  # Test with contrast argument, the results are identical
  res2 <- test_de(fit, contrast = cont2)
  head(res2)

  # The column names of fit$Beta are valid variables in the contrast argument
  colnames(fit$Beta)


  # You can also have more complex contrasts:
  # the following compares cont1 vs cont2:
  test_de(fit, cont1 - cont2, n_max = 4)


  # You can also sort the output
  test_de(fit, cont1 - cont2, n_max = 4,
          sort_by = "pval")

  test_de(fit, cont1 - cont2, n_max = 4,
          sort_by = - abs(f_statistic))

  # If the data has multiple samples, it is a good
  # idea to aggregate the cell counts by samples.
  # This is called "pseudobulk".
  test_de(fit, contrast = "conditiontreated", n_max = 4,
          pseudobulk_by = sample)


  # You can also do the pseudobulk only on a subset of cells:
  cell_types <- sample(c("Tcell", "Bcell", "Makrophages"), size = 100, replace = TRUE)
  test_de(fit, contrast = "conditiontreated", n_max = 4,
          pseudobulk_by = sample,
          subset_to = cell_types == "Bcell")


  # Be care full, if you included the cell type information in
  # the original fit, after subsetting the design matrix would
  # be degenerate. To fix this, specify the full_design in 'test_de()'
  SummarizedExperiment::colData(se)$ct <- cell_types
  fit_with_celltype <- glm_gp(se, design = ~ condition + cont1 + cont2 + ct)
  test_de(fit_with_celltype, contrast = cont1, n_max = 4,
          full_design =  ~ condition + cont1 + cont2,
          pseudobulk_by = sample,
          subset_to = ct == "Bcell")

glmGamPoi documentation built on Nov. 8, 2020, 7:14 p.m.