View source: R/censcyt_wrapper.R
| censcyt | R Documentation |
censcyt pipeline
Wrapper function to run complete censcyt pipeline
censcyt(
d_input,
experiment_info = NULL,
marker_info = NULL,
design = NULL,
formula = NULL,
contrast,
analysis_type = c("DA"),
method_DA = c("censcyt-DA-censored-GLMM"),
markers_to_test = NULL,
clustering_to_use = NULL,
cols_to_include = NULL,
subsampling = FALSE,
n_sub = NULL,
seed_sub = NULL,
transform = TRUE,
cofactor = 5,
cols_clustering = NULL,
xdim = 10,
ydim = 10,
meta_clustering = FALSE,
meta_k = 40,
seed_clustering = NULL,
min_cells = 3,
min_samples = NULL,
normalize = FALSE,
norm_factors = "TMM",
verbose = TRUE,
mi_reps = 10,
imputation_method = c("km", "km_exp", "km_wei", "km_os", "rs", "mrl", "cc", "pmm"),
BPPARAM = BiocParallel::SerialParam()
)
d_input |
Input data. Must be either: (i) a |
experiment_info |
|
marker_info |
|
design |
Design matrix, created with |
formula |
Model formula object, created with |
contrast |
Contrast matrix, created with |
analysis_type |
Type of differential analysis to perform: differential abundance
(DA) of cell populations. The only option at the moment is |
method_DA |
Method to use for calculating differential abundance (DA) tests.
Currently the only option is |
markers_to_test |
(Optional) Logical vector specifying which markers to test for
differential expression (from the set of markers stored in the |
clustering_to_use |
(Optional) Column name indicating which set of cluster labels
to use for differential testing, when input data are provided as a |
cols_to_include |
Logical vector indicating which columns to include from the
input data. Default = all columns. See |
subsampling |
Whether to use random subsampling to select an equal number of cells
from each sample. Default = FALSE. See |
n_sub |
Number of cells to select from each sample by random subsampling, if
|
seed_sub |
Random seed for subsampling. Set to an integer value to generate
reproducible results. Default = |
transform |
Whether to apply 'arcsinh' transform. This may be set to FALSE if the
input data has already been transformed. Default = TRUE. See
|
cofactor |
Cofactor parameter for 'arcsinh' transform. Default = 5, which is
appropriate for mass cytometry (CyTOF) data. For fluorescence flow cytometry,
cofactor = 150 is recommended instead. See |
cols_clustering |
Columns to use for clustering. Default = |
xdim |
Horizontal length of grid for self-organizing map for FlowSOM clustering
(number of clusters = |
ydim |
Vertical length of grid for self-organizing map for FlowSOM clustering
(number of clusters = |
meta_clustering |
Whether to include FlowSOM 'meta-clustering' step. Default =
|
meta_k |
Number of meta-clusters for FlowSOM, if |
seed_clustering |
Random seed for clustering. Set to an integer value to generate
reproducible results. Default = |
min_cells |
Filtering parameter. Default = 3. Clusters are kept for differential
testing if they have at least |
min_samples |
Filtering parameter. Default = |
normalize |
Whether to include optional normalization factors to adjust for
composition effects. Default = FALSE. See |
norm_factors |
Normalization factors to use, if |
verbose |
Whether to print status messages during each step of the pipeline. Default = TRUE. |
mi_reps |
Number of imputations in multiple imputation.
Default = 10. See |
imputation_method |
Method to be used in the imputation step.
One of |
BPPARAM |
Specification of parallelization option as one of
|
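The min_cells and min_samples arguments control which clusters reach the differential test. The following is a minimal base R sketch of such a filter, assuming the rule is "keep a cluster if it has at least min_cells cells in at least min_samples samples", as in diffcyt. The count matrix and the value of min_samples are hypothetical and chosen only for illustration; this is not the package's internal code.

```r
# Toy cluster-by-sample count matrix (hypothetical numbers: 3 clusters x 4 samples)
counts <- matrix(
  c(10, 12, 0, 8,
     1,  0, 2, 5,
     4,  3, 6, 7),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cluster", 1:3), paste0("sample", 1:4))
)

min_cells <- 3    # same value as the documented default
min_samples <- 2  # set explicitly here; an illustrative value, not the package default

# Assumed rule: keep clusters with >= min_cells cells in >= min_samples samples
keep <- rowSums(counts >= min_cells) >= min_samples
counts_filtered <- counts[keep, , drop = FALSE]

rownames(counts_filtered)
```

In this toy matrix, cluster2 reaches min_cells in only one sample and is therefore dropped, while cluster1 and cluster3 are kept.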
This wrapper function runs the complete diffcyt analysis pipeline.
The only difference is the analysis step, which uses the functions
from censcyt (currently only testDA_censoredGLMM).
For more details about the functions for the individual steps, see
diffcyt, the diffcyt vignette,
the censcyt package vignette and the function help pages. The following
is a slightly adapted summary from diffcyt:
Running the individual functions may provide additional flexibility, especially for complex analyses.
The input data can be provided as a flowSet or a list of
flowFrames, DataFrames, data.frames, or matrices
(one flowFrame or list item per sample). Alternatively, it is also possible to
provide the input as a daFrame object from the CATALYST Bioconductor
package (Chevrier, Crowell, Zanotelli et al., 2018). This can be useful when initial
exploratory analyses and clustering have been performed using CATALYST; the
daFrame object from CATALYST (containing cluster labels in the
rowData) can then be provided directly to the censcyt functions for
differential testing.
Minimum required arguments when providing a flowSet or list of
flowFrames, DataFrames, data.frames, or matrices:
d_input
experiment_info
marker_info
either design or formula (depending on the differential testing
method used)
contrast
analysis_type
Minimum required arguments when providing a CATALYST daFrame object:
d_input
either design or formula (depending on the differential testing
method used)
contrast
analysis_type
Returns a list containing the results object res, as well as the data
objects d_se, d_counts, d_medians,
d_medians_by_cluster_marker, and d_medians_by_sample_marker. (If a
CATALYST daFrame object was used as input, the output list contains
objects res, d_counts, and d_medians.)
# Function to create random data (one sample)
fcs_sim <- function(n = 2000, mean = 0, sd = 1, ncol = 10, cofactor = 5) {
  d <- matrix(sinh(rnorm(n * ncol, mean, sd)) * cofactor, ncol = ncol)
  for (i in seq_len(ncol)) {
    d[seq(n / ncol * (i - 1) + 1, n / ncol * i), i] <- sinh(rnorm(n / ncol, mean + 5, sd)) * cofactor
  }
  colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
  d
}
# Create random data (without differential signal)
set.seed(123)
d_input <- lapply(1:50, function(i) fcs_sim())
# simulate survival time
d_surv <- simulate_singlecluster(50, formula(Y~Surv(X,I)))[c("X","I","TrVal")]
# Add differential abundance (DA) signal
for (i in 1:50) {
  # number of cells in cluster 1
  n_da <- round(sqrt(2000 * d_surv$TrVal[i])) * 9
  # set to no expression
  tmpd <- matrix(sinh(rnorm(n_da * 10, 0, 1)) * 5, ncol = 10)
  # increase expression for cluster 1
  tmpd[, 1] <- sinh(rnorm(n_da, 5, 1)) * 5
  d_input[[i]][seq_len(n_da), ] <- tmpd
}
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:50)),
  survival_time = d_surv$X,
  event_indicator = d_surv$I,
  stringsAsFactors = FALSE
)
marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:10)),
  marker_name = paste0("marker", sprintf("%02d", 1:10)),
  marker_class = factor(c(rep("type", 10)),
                        levels = c("type", "state", "none")),
  stringsAsFactors = FALSE
)
# Create formula
da_formula <- createFormula(experiment_info, cols_fixed = "survival_time",
                            cols_random = "sample_id", event_indicator = "event_indicator")
# Create contrast matrix
contrast <- diffcyt::createContrast(c(0, 1))
# Test for differential abundance (DA) of clusters
out_DA <- censcyt(d_input, experiment_info, marker_info,
                  formula = da_formula, contrast = contrast,
                  analysis_type = "DA", method_DA = "censcyt-DA-censored-GLMM",
                  seed_clustering = 123, verbose = FALSE, mi_reps = 3,
                  BPPARAM = BiocParallel::MulticoreParam(workers = 1),
                  imputation_method = "mrl", meta_clustering = TRUE, meta_k = 10)
# Display results for top DA clusters
diffcyt::topTable(out_DA, format_vals = TRUE)
# Plot heatmap for DA tests
diffcyt::plotHeatmap(out_DA, analysis_type = "DA")
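The example above simulates raw intensities as sinh(z) * cofactor and leaves the transform and cofactor arguments at their defaults (TRUE and 5). The short base R sketch below illustrates why that pairing works, assuming the 'arcsinh' transform is applied as asinh(x / cofactor), which is the usual convention for cytometry data. The variable names are illustrative and not part of the package.

```r
set.seed(1)
cofactor <- 5

# Values on the transformed scale, as drawn inside fcs_sim()
z <- rnorm(1000, mean = 0, sd = 1)

# Simulated raw intensities, constructed the same way as in fcs_sim()
raw <- sinh(z) * cofactor

# Assumed form of the arcsinh transform with a cofactor
transformed <- asinh(raw / cofactor)

# The transform undoes the simulation step up to floating-point error
all.equal(transformed, z)
```

If the input had already been transformed, setting transform = FALSE would avoid applying the transform a second time.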