distinct_test: Test for differential state between two groups of samples,...

Description Usage Arguments Value Author(s) See Also Examples

View source: R/distinct_test.R

Description

distinct_test tests for differential state between two groups of samples.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
distinct_test(
  x,
  name_assays_expression = "logcounts",
  name_cluster = "cluster_id",
  name_sample = "sample_id",
  design,
  column_to_test = 2,
  P_1 = 100,
  P_2 = 500,
  P_3 = 2000,
  P_4 = 10000,
  N_breaks = 25,
  min_non_zero_cells = 20,
  n_cores = 1
)

Arguments

x

a linkS4class{SummarizedExperiment} or a linkS4class{SingleCellExperiment} object.

name_assays_expression

a character ("logcounts" by default), indicating the name of the assays(x) element which stores the expression data (i.e., assays(x)$name_assays_expression). We strongly encourage using normalized data, such as counts per million (CPM) or log2-CPM (e.g., 'logcounts' as created via scater::logNormCounts). In case additional covariates are provided (e.g., batch effects), we highly recommend using log-normalized data, such as log2-CPM (e.g., 'logcounts' as created via scater::logNormCounts).

name_cluster

a character ("cluster_id" by default), indicating the name of the colData(x) element which stores the cluster id of each cell (i.e., colData(x)$name_cluster).

name_sample

a character ("sample_id" by default), indicating the name of the colData(x) element which stores the sample id of each cell (i.e., colData(x)$name_sample).

design

a matrix or data.frame with the design matrix of the study (e.g., built via model.matrix(~batches), design must contain one row per sample, while columns include intercept, group and eventual covariates such as batches. Row names of design must indicate the sample ids, and correspond to the names in colData(x)$name_sample.

column_to_test

indicates the column(s) of the design one wants to test (do not include the intercept).

P_1

the number of permutations to use on all gene-cluster combinations.

P_2

the number of permutations to use, when a (raw) p-value is < 0.1 (500 by default).

P_3

the number of permutations to use, when a (raw) p-value is < 0.01 (2,000 by default).

P_4

the number of permutations to use, when a (raw) p-value is < 0.001 (10,000 by default). In order to obtain a finer ranking for the most significant genes, if computational resources are available, we encourage users to set P_4 = 20,000.

N_breaks

the number of breaks at which to evaluate the comulative density function.

min_non_zero_cells

the minimum number of non-zero cells (across all samples) in each cluster for a gene to be evaluated.

n_cores

the number of cores to parallelize the tasks on (parallelization is at the cluster level: each cluster is parallelized on a thread).

Value

A data.frame object. Columns 'gene' and 'cluster_id' contain the gene and cell-cluster name, while 'p_val', 'p_adj.loc' and 'p_adj.glb' report the raw p-values, locally and globally adjusted p-values, via Benjamini and Hochberg (BH) correction. In locally adjusted p-values ('p_adj.loc') BH correction is applied in each cluster separately, while in globally adjusted p-values ('p_adj.glb') BH correction is performed to the results from all clusters. Column 'filtered' indicates whether a gene-cluster result was filtered (if TRUE), or analyzed (if FALSE). A gene-cluster combination is filtered when fewer than 'min_non_zero_cells' non-zero cells are available. Filtered results have raw and adjusted p-values equal to 1.

Author(s)

Simone Tiberi simone.tiberi@uzh.ch

See Also

plot_cdfs, plot_densities, log2_FC, top_results

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
# load the input data:
data("Kang_subset", package = "distinct")
Kang_subset

# create the design of the study:
samples = Kang_subset@metadata$experiment_info$sample_id
group = Kang_subset@metadata$experiment_info$stim
design = model.matrix(~group)
# rownames of the design must indicate sample ids:
rownames(design) = samples
design

# Note that the sample names in `colData(x)$name_sample` have to be the same ones as those in `rownames(design)`.
rownames(design)
unique(SingleCellExperiment::colData(Kang_subset)$sample_id)

# In order to obtain a finer ranking for the most significant genes, if computational resources are available, we encourage users to increase P_4 (i.e., the number of permutations when a raw p-value is < 0.001) and set P_4 = 20,000 (by default P_4 = 10,000).

# The group we would like to test for is in the second column of the design, therefore we will specify: column_to_test = 2

set.seed(61217)
res = distinct_test(
  x = Kang_subset, 
  name_assays_expression = "logcounts",
  name_cluster = "cell",
  design = design,
  column_to_test = 2,
  min_non_zero_cells = 20,
  n_cores = 2)

# We can optionally add the fold change (FC) and log2-FC between groups:
res = log2_FC(res = res,
  x = Kang_subset, 
  name_assays_expression = "cpm",
  name_group = "stim",
  name_cluster = "cell")

# Visualize significant results:
head(top_results(res))

# Visualize significant results from a specified cluster of cells:
top_results(res, cluster = "Dendritic cells")

# By default, results from 'top_results' are sorted by (globally) adjusted p-value;
# they can also be sorted by log2-FC:
top_results(res, cluster = "Dendritic cells", sort_by = "log2FC")

# Visualize significant UP-regulated genes only:
top_results(res, up_down = "UP",
  cluster = "Dendritic cells")

# Plot density and cdf for gene 'ISG15' in cluster 'Dendritic cells'.
plot_densities(x = Kang_subset,
  gene = "ISG15",
  cluster = "Dendritic cells",
  name_assays_expression = "logcounts",
  name_cluster = "cell",
  name_sample = "sample_id",
  name_group = "stim")
 
 plot_cdfs(x = Kang_subset,
   gene = "ISG15",
   cluster = "Dendritic cells",
   name_assays_expression = "logcounts",
   name_cluster = "cell",
   name_sample = "sample_id",
   name_group = "stim")

distinct documentation built on Nov. 8, 2020, 8:20 p.m.