View source: R/censcyt_wrapper.R
censcyt | R Documentation |
censcyt
Wrapper function to run the complete censcyt pipeline.
censcyt(
  d_input,
  experiment_info = NULL,
  marker_info = NULL,
  design = NULL,
  formula = NULL,
  contrast,
  analysis_type = c("DA"),
  method_DA = c("censcyt-DA-censored-GLMM"),
  markers_to_test = NULL,
  clustering_to_use = NULL,
  cols_to_include = NULL,
  subsampling = FALSE,
  n_sub = NULL,
  seed_sub = NULL,
  transform = TRUE,
  cofactor = 5,
  cols_clustering = NULL,
  xdim = 10,
  ydim = 10,
  meta_clustering = FALSE,
  meta_k = 40,
  seed_clustering = NULL,
  min_cells = 3,
  min_samples = NULL,
  normalize = FALSE,
  norm_factors = "TMM",
  verbose = TRUE,
  mi_reps = 10,
  imputation_method = c("km", "km_exp", "km_wei", "km_os", "rs", "mrl", "cc", "pmm"),
  BPPARAM = BiocParallel::SerialParam()
)
d_input |
Input data. Must be either: (i) a |
experiment_info |
|
marker_info |
|
design |
Design matrix, created with |
formula |
Model formula object, created with |
contrast |
Contrast matrix, created with |
analysis_type |
Type of differential analysis to perform: differential abundance
(DA) of cell populations. The only option at the moment is |
method_DA |
Method to use for calculating differential abundance (DA) tests.
Currently the only option is |
markers_to_test |
(Optional) Logical vector specifying which markers to test for
differential expression (from the set of markers stored in the |
clustering_to_use |
(Optional) Column name indicating which set of cluster labels
to use for differential testing, when input data are provided as a |
cols_to_include |
Logical vector indicating which columns to include from the
input data. Default = all columns. See |
subsampling |
Whether to use random subsampling to select an equal number of cells
from each sample. Default = FALSE. See |
n_sub |
Number of cells to select from each sample by random subsampling, if
|
seed_sub |
Random seed for subsampling. Set to an integer value to generate
reproducible results. Default = |
transform |
Whether to apply 'arcsinh' transform. This may be set to FALSE if the
input data has already been transformed. Default = TRUE. See
|
cofactor |
Cofactor parameter for 'arcsinh' transform. Default = 5, which is
appropriate for mass cytometry (CyTOF) data. For fluorescence flow cytometry,
cofactor = 150 is recommended instead. See |
cols_clustering |
Columns to use for clustering. Default = |
xdim |
Horizontal length of grid for self-organizing map for FlowSOM clustering
(number of clusters = |
ydim |
Vertical length of grid for self-organizing map for FlowSOM clustering
(number of clusters = |
meta_clustering |
Whether to include FlowSOM 'meta-clustering' step. Default =
|
meta_k |
Number of meta-clusters for FlowSOM, if |
seed_clustering |
Random seed for clustering. Set to an integer value to generate
reproducible results. Default = |
min_cells |
Filtering parameter. Default = 3. Clusters are kept for differential
testing if they have at least |
min_samples |
Filtering parameter. Default = |
normalize |
Whether to include optional normalization factors to adjust for
composition effects. Default = FALSE. See |
norm_factors |
Normalization factors to use, if |
verbose |
Whether to print status messages during each step of the pipeline. Default = TRUE. |
mi_reps |
Number of imputations in multiple imputation.
Default = 10. See |
imputation_method |
Method to be used in the imputation step.
One of |
BPPARAM |
Specification of parallelization option as one of
|
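The transform and cofactor arguments above refer to an 'arcsinh' transform. The following base R sketch shows what such a transform looks like, assuming the usual form asinh(x / cofactor); the helper name arcsinh_transform and the input values are illustrative and not part of censcyt or diffcyt. The example further down this page generates its raw data as sinh(...) * cofactor, which is the inverse of this form.

```r
# Sketch of an arcsinh transform with a cofactor.
# Assumption: the transform is asinh(x / cofactor), applied element-wise.
# 'arcsinh_transform' is a hypothetical helper, not a package function.
arcsinh_transform <- function(x, cofactor = 5) {
  asinh(x / cofactor)
}

# Made-up raw intensities for three cells and two markers
raw <- matrix(c(0, 5, 50, 500, 5000, 50000), ncol = 2,
              dimnames = list(NULL, c("marker01", "marker02")))

cytof <- arcsinh_transform(raw, cofactor = 5)    # mass cytometry (CyTOF)
flow  <- arcsinh_transform(raw, cofactor = 150)  # fluorescence flow cytometry

# The transform is inverted by sinh(y) * cofactor
back <- sinh(cytof) * 5
```

A larger cofactor compresses the same raw values more strongly, which is why the recommended value differs between the two technologies.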
This wrapper function runs the complete diffcyt analysis pipeline; the only difference is the analysis step, which uses the functions from censcyt (currently only testDA_censoredGLMM).
For more details about the functions for the individual steps, see diffcyt, the diffcyt vignette, the censcyt package vignette, and the function help pages. The following is a slightly adapted summary from diffcyt:
Running the individual functions may provide additional flexibility, especially for complex analyses.
The input data can be provided as a flowSet or a list of flowFrames, DataFrames, data.frames, or matrices (one flowFrame or list item per sample). Alternatively, it is also possible to provide the input as a daFrame object from the CATALYST Bioconductor package (Chevrier, Crowell, Zanotelli et al., 2018). This can be useful when initial exploratory analyses and clustering have been performed using CATALYST; the daFrame object from CATALYST (containing cluster labels in the rowData) can then be provided directly to the censcyt functions for differential testing.
Minimum required arguments when providing a flowSet or list of flowFrames, DataFrames, data.frames, or matrices:
d_input
experiment_info
marker_info
either design or formula (depending on the differential testing method used)
contrast
analysis_type
Minimum required arguments when providing a CATALYST daFrame object:
d_input
either design or formula (depending on the differential testing method used)
contrast
analysis_type
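When the input is a list of matrices, the arguments d_input, experiment_info, and marker_info have to describe the same samples and markers. A minimal base R sketch of such a consistency check is given below. The helper check_inputs is hypothetical and not part of censcyt or diffcyt, and the assumption that marker_info has one row per data column is taken from the example on this page rather than from a documented requirement.

```r
# Hypothetical helper (not part of censcyt or diffcyt): basic consistency
# checks for list-of-matrices input before calling the wrapper.
check_inputs <- function(d_input, experiment_info, marker_info) {
  stopifnot(
    is.list(d_input),
    # one list item per sample (one row of experiment_info per sample)
    length(d_input) == nrow(experiment_info),
    # assumption: one row of marker_info per column of each sample
    all(vapply(d_input, ncol, integer(1)) == nrow(marker_info)),
    "sample_id" %in% colnames(experiment_info),
    all(c("channel_name", "marker_name", "marker_class") %in%
          colnames(marker_info))
  )
  invisible(TRUE)
}

# Small made-up input: 4 samples, 20 cells each, 3 markers
set.seed(1)
d_input <- lapply(1:4, function(i) {
  m <- matrix(rnorm(20 * 3), ncol = 3)
  colnames(m) <- paste0("marker", sprintf("%02d", 1:3))
  m
})
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)),
  survival_time = c(2.1, 0.7, 3.4, 1.2),
  event_indicator = c(1, 0, 1, 1)
)
marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:3)),
  marker_name = paste0("marker", sprintf("%02d", 1:3)),
  marker_class = factor(rep("type", 3), levels = c("type", "state", "none")),
  stringsAsFactors = FALSE
)

ok <- check_inputs(d_input, experiment_info, marker_info)
```

The check stops with an error as soon as one condition fails, for example when a sample is missing from d_input.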
Returns a list containing the results object res, as well as the data objects d_se, d_counts, d_medians, d_medians_by_cluster_marker, and d_medians_by_sample_marker. (If a CATALYST daFrame object was used as input, the output list contains objects res, d_counts, and d_medians.)
library(censcyt)

# Function to create random data (one sample)
fcs_sim <- function(n = 2000, mean = 0, sd = 1, ncol = 10, cofactor = 5) {
  d <- matrix(sinh(rnorm(n * ncol, mean, sd)) * cofactor, ncol = ncol)
  for (i in seq_len(ncol)) {
    d[seq(n / ncol * (i - 1) + 1, n / ncol * i), i] <-
      sinh(rnorm(n / ncol, mean + 5, sd)) * cofactor
  }
  colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
  d
}

# Create random data (without differential signal)
set.seed(123)
d_input <- lapply(1:50, function(i) fcs_sim())

# Simulate survival time
d_surv <- simulate_singlecluster(50, formula(Y ~ Surv(X, I)))[c("X", "I", "TrVal")]

# Add differential abundance (DA) signal
for (i in 1:50) {
  # number of cells in cluster 1
  n_da <- round(sqrt(2000 * d_surv$TrVal[i])) * 9
  # set to no expression
  tmpd <- matrix(sinh(rnorm(n_da * 10, 0, 1)) * 5, ncol = 10)
  # increase expression for cluster 1
  tmpd[, 1] <- sinh(rnorm(n_da, 5, 1)) * 5
  d_input[[i]][seq_len(n_da), ] <- tmpd
}

experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:50)),
  survival_time = d_surv$X,
  event_indicator = d_surv$I,
  stringsAsFactors = FALSE
)

marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:10)),
  marker_name = paste0("marker", sprintf("%02d", 1:10)),
  marker_class = factor(c(rep("type", 10)), levels = c("type", "state", "none")),
  stringsAsFactors = FALSE
)

# Create formula
da_formula <- createFormula(experiment_info,
                            cols_fixed = "survival_time",
                            cols_random = "sample_id",
                            event_indicator = "event_indicator")

# Create contrast matrix
contrast <- diffcyt::createContrast(c(0, 1))

# Test for differential abundance (DA) of clusters
out_DA <- censcyt(d_input, experiment_info, marker_info,
                  formula = da_formula, contrast = contrast,
                  analysis_type = "DA",
                  method_DA = "censcyt-DA-censored-GLMM",
                  seed_clustering = 123, verbose = FALSE,
                  mi_reps = 3,
                  BPPARAM = BiocParallel::MulticoreParam(workers = 1),
                  imputation_method = "mrl",
                  meta_clustering = TRUE, meta_k = 10)

# Display results for top DA clusters
diffcyt::topTable(out_DA, format_vals = TRUE)

# Plot heatmap for DA tests
diffcyt::plotHeatmap(out_DA, analysis_type = "DA")
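In the example above, the columns survival_time and event_indicator of experiment_info come from simulate_singlecluster. For readers who want to see what such a pair of columns encodes, the base R sketch below builds a right-censored covariate by hand. It assumes the common convention (as in survival::Surv) that an indicator of 1 marks an observed value and 0 a censored one; the variable names true_time and cens_time and the chosen distributions are illustrative only and do not reproduce what simulate_singlecluster does.

```r
# Sketch: constructing a right-censored covariate and its event indicator.
# Assumption: indicator 1 = value observed, 0 = value censored.
set.seed(42)
n <- 8

true_time <- rexp(n, rate = 1)    # underlying (partly unobserved) times
cens_time <- rexp(n, rate = 0.5)  # censoring times

# Only the smaller of the two is observed for each sample
survival_time <- pmin(true_time, cens_time)
event_indicator <- as.integer(true_time <= cens_time)

experiment_info <- data.frame(
  sample_id = factor(paste0("sample", seq_len(n))),
  survival_time = survival_time,
  event_indicator = event_indicator
)
```

For censored samples the recorded survival_time is only a lower bound on the true value, which is the situation the multiple imputation step (mi_reps, imputation_method) is meant to handle.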