View source: R/distinct_test.R
distinct_test tests for differential state between two groups of samples.
distinct_test(
  x,
  name_assays_expression = "logcounts",
  name_cluster = "cluster_id",
  name_sample = "sample_id",
  design,
  column_to_test = 2,
  P_1 = 100,
  P_2 = 500,
  P_3 = 2000,
  P_4 = 10000,
  N_breaks = 25,
  min_non_zero_cells = 20,
  n_cores = 1
)
x |
a SummarizedExperiment or SingleCellExperiment object. |
name_assays_expression |
a character ("logcounts" by default),
indicating the name of the assays(x) element which stores the expression data (i.e., assays(x)$name_assays_expression).
We strongly encourage using normalized data, such as counts per million (CPM) or log2-CPM (e.g., 'logcounts' as created via |
name_cluster |
a character ("cluster_id" by default), indicating the name of the colData(x) element which stores the cluster id of each cell (i.e., colData(x)$name_cluster). |
name_sample |
a character ("sample_id" by default), indicating the name of the colData(x) element which stores the sample id of each cell (i.e., colData(x)$name_sample). |
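design |
a design matrix of the study (e.g., as created via model.matrix), whose rownames indicate the sample ids. |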
column_to_test |
the column(s) of the design matrix to test (2 by default); do not include the intercept. |
P_1 |
the number of permutations to use on all gene-cluster combinations (100 by default). |
P_2 |
the number of permutations to use, when a (raw) p-value is < 0.1 (500 by default). |
P_3 |
the number of permutations to use, when a (raw) p-value is < 0.01 (2,000 by default). |
P_4 |
the number of permutations to use, when a (raw) p-value is < 0.001 (10,000 by default). In order to obtain a finer ranking for the most significant genes, if computational resources are available, we encourage users to set P_4 = 20,000. |
N_breaks |
the number of breaks at which to evaluate the cumulative distribution function (25 by default). |
min_non_zero_cells |
the minimum number of non-zero cells (across all samples) in each cluster for a gene to be evaluated. |
n_cores |
the number of cores to parallelize the tasks on (parallelization is at the cluster level: each cluster is parallelized on a thread). |
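The tiered permutation scheme described for P_1 to P_4 can be sketched as follows. This is only an illustration, assuming each threshold is checked against the raw p-value obtained so far; `permutation_tier` is a hypothetical helper that is not part of the distinct package, and the package's internal implementation may differ.

```r
# Hypothetical helper (NOT a distinct function): returns the number of
# permutations a gene-cluster combination would reach, given its raw p-value.
# Thresholds (0.1, 0.01, 0.001) follow the argument descriptions above.
permutation_tier = function(p_val, P_1 = 100, P_2 = 500, P_3 = 2000, P_4 = 10000) {
  if (p_val < 0.001) return(P_4)  # most significant results: finest resolution
  if (p_val < 0.01)  return(P_3)
  if (p_val < 0.1)   return(P_2)
  P_1                             # all other results keep the initial P_1 permutations
}

# Permutation counts for four illustrative raw p-values:
tiers = sapply(c(0.5, 0.05, 0.005, 0.0005), permutation_tier)
tiers
```

Concentrating permutations on small p-values keeps the overall cost low while still allowing the most significant genes to be ranked more finely.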
A data.frame object.
Columns 'gene' and 'cluster_id' contain the gene and cell-cluster name, while 'p_val', 'p_adj.loc' and 'p_adj.glb' report the raw p-values and the locally and globally adjusted p-values, respectively; p-values are adjusted via the Benjamini and Hochberg (BH) correction.
For locally adjusted p-values ('p_adj.loc'), the BH correction is applied to each cluster separately, while for globally adjusted p-values ('p_adj.glb'), it is applied jointly to the results from all clusters.
Column 'filtered' indicates whether a gene-cluster result was filtered (if TRUE), or analyzed (if FALSE).
A gene-cluster combination is filtered when fewer than 'min_non_zero_cells' non-zero cells are available.
Filtered results have raw and adjusted p-values equal to 1.
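The difference between local and global BH adjustment can be illustrated with base R's `p.adjust`. This is a minimal sketch on made-up p-values (the data.frame below is invented for illustration and is not output of distinct_test); it shows the two adjustment scopes only and makes no claim about how the package handles filtered results internally.

```r
# Toy results: 3 genes in each of 2 clusters (made-up raw p-values).
toy = data.frame(
  gene       = rep(c("g1", "g2", "g3"), times = 2),
  cluster_id = rep(c("A", "B"), each = 3),
  p_val      = c(0.001, 0.02, 0.5, 0.03, 0.04, 0.9)
)

# Local adjustment: BH applied within each cluster separately.
toy$p_adj.loc = ave(toy$p_val, toy$cluster_id,
                    FUN = function(p) p.adjust(p, method = "BH"))

# Global adjustment: BH applied once to the p-values from all clusters.
toy$p_adj.glb = p.adjust(toy$p_val, method = "BH")

toy
```

Local adjustment corrects for the number of genes tested within a cluster, whereas global adjustment corrects for all gene-cluster combinations at once.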
Simone Tiberi simone.tiberi@uzh.ch
plot_cdfs, plot_densities, log2_FC, top_results
# load the input data:
data("Kang_subset", package = "distinct")
Kang_subset
# create the design of the study:
samples = Kang_subset@metadata$experiment_info$sample_id
group = Kang_subset@metadata$experiment_info$stim
design = model.matrix(~group)
# rownames of the design must indicate sample ids:
rownames(design) = samples
design
# Note that the sample names in `colData(x)$name_sample` have to be the same ones as those in `rownames(design)`.
rownames(design)
unique(SingleCellExperiment::colData(Kang_subset)$sample_id)
# In order to obtain a finer ranking for the most significant genes, if computational resources are available, we encourage users to increase P_4 (i.e., the number of permutations when a raw p-value is < 0.001) and set P_4 = 20,000 (by default P_4 = 10,000).
# The group we would like to test for is in the second column of the design, therefore we will specify: column_to_test = 2
set.seed(61217)
res = distinct_test(
  x = Kang_subset,
  name_assays_expression = "logcounts",
  name_cluster = "cell",
  design = design,
  column_to_test = 2,
  min_non_zero_cells = 20,
  n_cores = 2)
# We can optionally add the fold change (FC) and log2-FC between groups:
res = log2_FC(res = res,
              x = Kang_subset,
              name_assays_expression = "cpm",
              name_group = "stim",
              name_cluster = "cell")
# Visualize significant results:
head(top_results(res))
# Visualize significant results from a specified cluster of cells:
top_results(res, cluster = "Dendritic cells")
# By default, results from 'top_results' are sorted by (globally) adjusted p-value;
# they can also be sorted by log2-FC:
top_results(res, cluster = "Dendritic cells", sort_by = "log2FC")
# Visualize significant UP-regulated genes only:
top_results(res, up_down = "UP",
            cluster = "Dendritic cells")
# Plot density and cdf for gene 'ISG15' in cluster 'Dendritic cells'.
plot_densities(x = Kang_subset,
               gene = "ISG15",
               cluster = "Dendritic cells",
               name_assays_expression = "logcounts",
               name_cluster = "cell",
               name_sample = "sample_id",
               name_group = "stim")
plot_cdfs(x = Kang_subset,
          gene = "ISG15",
          cluster = "Dendritic cells",
          name_assays_expression = "logcounts",
          name_cluster = "cell",
          name_sample = "sample_id",
          name_group = "stim")