View source: R/diffcyt_wrapper.R
diffcyt | R Documentation |
Wrapper function to run complete 'diffcyt' pipeline
diffcyt(
d_input,
experiment_info = NULL,
marker_info = NULL,
design = NULL,
formula = NULL,
contrast,
analysis_type = c("DA", "DS"),
method_DA = c("diffcyt-DA-edgeR", "diffcyt-DA-voom", "diffcyt-DA-GLMM"),
method_DS = c("diffcyt-DS-limma", "diffcyt-DS-LMM"),
markers_to_test = NULL,
clustering_to_use = NULL,
cols_to_include = NULL,
subsampling = FALSE,
n_sub = NULL,
seed_sub = NULL,
transform = TRUE,
cofactor = 5,
cols_clustering = NULL,
xdim = 10,
ydim = 10,
meta_clustering = FALSE,
meta_k = 40,
seed_clustering = NULL,
min_cells = 3,
min_samples = NULL,
normalize = FALSE,
norm_factors = "TMM",
trend_method = "none",
block_id = NULL,
trend = TRUE,
weights = TRUE,
plot = FALSE,
path = ".",
verbose = TRUE
)
d_input |
Input data. Must be either: (i) a |
experiment_info |
|
marker_info |
|
design |
Design matrix, created with |
formula |
Model formula object, created with |
contrast |
Contrast matrix, created with |
analysis_type |
Type of differential analysis to perform: differential abundance
(DA) of cell populations, or differential states (DS) within cell populations.
Options are |
method_DA |
Method to use for calculating differential abundance (DA) tests.
Options are |
method_DS |
Method to use for calculating differential state (DS) tests. Options
are |
markers_to_test |
(Optional) Logical vector specifying which markers to test for
differential expression (from the set of markers stored in the |
clustering_to_use |
(Optional) Column name indicating which set of cluster labels
to use for differential testing, when input data are provided as a |
cols_to_include |
Logical vector indicating which columns to include from the
input data. Default = all columns. See |
subsampling |
Whether to use random subsampling to select an equal number of cells
from each sample. Default = FALSE. See |
n_sub |
Number of cells to select from each sample by random subsampling, if
|
seed_sub |
Random seed for subsampling. Set to an integer value to generate
reproducible results. Default = |
transform |
Whether to apply 'arcsinh' transform. This may be set to FALSE if the
input data has already been transformed. Default = TRUE. See
|
cofactor |
Cofactor parameter for 'arcsinh' transform. Default = 5, which is
appropriate for mass cytometry (CyTOF) data. For fluorescence flow cytometry, we
recommend cofactor = 150 instead. See |
cols_clustering |
Columns to use for clustering. Default = |
xdim |
Horizontal length of grid for self-organizing map for FlowSOM clustering
(number of clusters = |
ydim |
Vertical length of grid for self-organizing map for FlowSOM clustering
(number of clusters = |
meta_clustering |
Whether to include FlowSOM 'meta-clustering' step. Default =
|
meta_k |
Number of meta-clusters for FlowSOM, if |
seed_clustering |
Random seed for clustering. Set to an integer value to generate
reproducible results. Default = |
min_cells |
Filtering parameter. Default = 3. Clusters are kept for differential
testing if they have at least |
min_samples |
Filtering parameter. Default = |
normalize |
Whether to include optional normalization factors to adjust for
composition effects. Default = FALSE. See |
norm_factors |
Normalization factors to use, if |
trend_method |
Method for estimating dispersion trend; passed to function
|
block_id |
(Optional) Vector or factor of block IDs (e.g. patient IDs) for paired
experimental designs, to be included as random effects (for method |
trend |
(Optional) Whether to fit a mean-variance trend when calculating moderated
tests with function |
weights |
(Optional) Whether to include precision weights (for method
|
plot |
Whether to save diagnostic plots (for method |
path |
Path for diagnostic plots, if |
verbose |
Whether to print status messages during each step of the pipeline. Default = TRUE. |
This wrapper function runs the complete 'diffcyt' analysis pipeline, by calling the functions for the individual steps in the pipeline in the correct sequence.
For more details about the functions for the individual steps, see the package vignette and the function help pages. Running the individual functions may provide additional flexibility, especially for complex analyses.
The input data can be provided as a flowSet
or a list of
flowFrames
, DataFrames
, data.frames
, or matrices
(one flowFrame
or list item per sample). Alternatively, it is also possible to
provide the input as a daFrame
object from the CATALYST
Bioconductor
package (Chevrier, Crowell, Zanotelli et al., 2018). This can be useful when initial
exploratory analyses and clustering have been performed using CATALYST
; the
daFrame
object from CATALYST
(containing cluster labels in the
rowData
) can then be provided directly to the diffcyt
functions for
differential testing.
Minimum required arguments when not providing a flowSet
or list of
flowFrames
, DataFrames
, data.frames
, or matrices:
d_input
experiment_info
marker_info
either design
or formula
(depending on the differential testing
method used)
contrast
analysis_type
Minimum required arguments when providing a CATALYST
daFrame
object:
d_input
either design
or formula
(depending on the differential testing
method used)
contrast
analysis_type
Returns a list containing the results object res
, as well as the data
objects d_se
, d_counts
, d_medians
,
d_medians_by_cluster_marker
, and d_medians_by_sample_marker
. (If a
CATALYST
daFrame
object was used as input, the output list contains
objects res
, d_counts
, and d_medians
.) The structure of
res
depends on the differential testing method used. See
testDA_edgeR
, testDA_voom
, testDA_GLMM
,
testDS_limma
, or testDS_LMM
.
# For a complete workflow example demonstrating each step in the 'diffcyt' pipeline,
# see the package vignette.
# Function to create random data (one sample)
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
d
}
# Create random data (without differential signal)
set.seed(123)
d_input <- list(
sample1 = d_random(),
sample2 = d_random(),
sample3 = d_random(),
sample4 = d_random()
)
# Add differential abundance (DA) signal
ix_DA <- 801:900
ix_cols_type <- 1:10
d_input[[3]][ix_DA, ix_cols_type] <- d_random(n = 1000, mean = 2, ncol = 10)
d_input[[4]][ix_DA, ix_cols_type] <- d_random(n = 1000, mean = 2, ncol = 10)
# Add differential states (DS) signal
ix_DS <- 901:1000
ix_cols_DS <- 19:20
d_input[[1]][ix_DS, ix_cols_type] <- d_random(n = 1000, mean = 3, ncol = 10)
d_input[[2]][ix_DS, ix_cols_type] <- d_random(n = 1000, mean = 3, ncol = 10)
d_input[[3]][ix_DS, c(ix_cols_type, ix_cols_DS)] <- d_random(n = 1200, mean = 3, ncol = 12)
d_input[[4]][ix_DS, c(ix_cols_type, ix_cols_DS)] <- d_random(n = 1200, mean = 3, ncol = 12)
experiment_info <- data.frame(
sample_id = factor(paste0("sample", 1:4)),
group_id = factor(c("group1", "group1", "group2", "group2")),
stringsAsFactors = FALSE
)
marker_info <- data.frame(
channel_name = paste0("channel", sprintf("%03d", 1:20)),
marker_name = paste0("marker", sprintf("%02d", 1:20)),
marker_class = factor(c(rep("type", 10), rep("state", 10)),
levels = c("type", "state", "none")),
stringsAsFactors = FALSE
)
# Create design matrix
design <- createDesignMatrix(experiment_info, cols_design = "group_id")
# Create contrast matrix
contrast <- createContrast(c(0, 1))
# Test for differential abundance (DA) of clusters (using default method 'diffcyt-DA-edgeR')
out_DA <- diffcyt(d_input, experiment_info, marker_info,
design = design, contrast = contrast,
analysis_type = "DA", method_DA = "diffcyt-DA-edgeR",
seed_clustering = 123, verbose = FALSE)
# Test for differential states (DS) within clusters (using default method 'diffcyt-DS-limma')
out_DS <- diffcyt(d_input, experiment_info, marker_info,
design = design, contrast = contrast,
analysis_type = "DS", method_DS = "diffcyt-DS-limma",
seed_clustering = 123, verbose = FALSE)
# Display results for top DA clusters
topTable(out_DA, format_vals = TRUE)
# Display results for top DS cluster-marker combinations
topTable(out_DS, format_vals = TRUE)
# Plot heatmap for DA tests
plotHeatmap(out_DA, analysis_type = "DA")
# Plot heatmap for DS tests
plotHeatmap(out_DS, analysis_type = "DS")
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.