Home

/

GitHub

/

retogerber/censcyt

/

censcyt: Run 'censcyt' pipeline

censcyt: Run 'censcyt' pipeline
In retogerber/censcyt: Differential abundance analysis with a right censored covariate in high-dimensional cytometry

View source: R/censcyt_wrapper.R

censcyt

R Documentation

Run `censcyt` pipeline

Description

Wrapper function to run complete censcyt pipeline

Usage

censcyt(
  d_input,
  experiment_info = NULL,
  marker_info = NULL,
  design = NULL,
  formula = NULL,
  contrast,
  analysis_type = c("DA"),
  method_DA = c("censcyt-DA-censored-GLMM"),
  markers_to_test = NULL,
  clustering_to_use = NULL,
  cols_to_include = NULL,
  subsampling = FALSE,
  n_sub = NULL,
  seed_sub = NULL,
  transform = TRUE,
  cofactor = 5,
  cols_clustering = NULL,
  xdim = 10,
  ydim = 10,
  meta_clustering = FALSE,
  meta_k = 40,
  seed_clustering = NULL,
  min_cells = 3,
  min_samples = NULL,
  normalize = FALSE,
  norm_factors = "TMM",
  verbose = TRUE,
  mi_reps = 10,
  imputation_method = c("km", "km_exp", "km_wei", "km_os", "rs", "mrl", "cc", "pmm"),
  BPPARAM = BiocParallel::SerialParam()
)

Arguments

`d_input`	Input data. Must be either: (i) a `flowSet-class` or list of `flowFrame-classs`, `DataFrames`, `data.frames`, or matrices as input (one `flowFrame` or list item per sample) (see `prepareData`); or (ii) a `CATALYST` `daFrame` (containing cluster labels in `rowData`; see vignette for an example).
`experiment_info`	`data.frame`, `DataFrame`, or `{tbl_df}` of experiment information, for example sample IDs and group IDs. Must contain a column named `sample_id`. See `prepareData`. (Not required when providing a `CATALYST` `daFrame` for `d_input`.)
`marker_info`	`data.frame`, `DataFrame`, or `tbl_df` of marker information for each column of data. This should contain columns named `marker_name` and `marker_class`. The columns contain: (i) marker names (and any other column names); and (ii) a factor indicating the marker class for each column (with entries `"type"`, `"state"`, or `"none"`). See `prepareData`. (Not required when providing a `CATALYST` `daFrame` for `d_input`.)
`design`	Design matrix, created with `createDesignMatrix`. See `createDesignMatrix`.
`formula`	Model formula object, created with `createFormula`. See `createFormula`.
`contrast`	Contrast matrix, created with `createContrast`. See `createContrast`.
`analysis_type`	Type of differential analysis to perform: differential abundance (DA) of cell populations. The only option at the moment is `"DA"`. See `testDA_censoredGLMM`.
`method_DA`	Method to use for calculating differential abundance (DA) tests. Currently the only option is `testDA_censoredGLMM`. Default = `testDA_censoredGLMM`.
`markers_to_test`	(Optional) Logical vector specifying which markers to test for differential expression (from the set of markers stored in the `assays` of `d_medians`; for method `testDS_limma` or `testDS_LMM`). Default = all 'cell state' markers, which are identified by the logical vector `id_state_markers` stored in the meta-data of `d_medians`. See `testDS_limma` or `testDS_LMM`.
`clustering_to_use`	(Optional) Column name indicating which set of cluster labels to use for differential testing, when input data are provided as a `CATALYST` `daFrame` object containing multiple sets of cluster labels. (In this case, the `metadata` of the `daFrame` object is assumed to contain a data frame named `cluster_codes`; `clustering_to_use` is the column name of the selected column in `cluster_codes`. If `clustering_to_use` is provided, an identifier `clustering_name` to identify this column will also be saved in the `metadata` of the output object.) Default = NULL, in which case cluster labels stored in column named `cluster_id` in the `rowData` of the `daFrame` object are used.
`cols_to_include`	Logical vector indicating which columns to include from the input data. Default = all columns. See `prepareData`.
`subsampling`	Whether to use random subsampling to select an equal number of cells from each sample. Default = FALSE. See `prepareData`.
`n_sub`	Number of cells to select from each sample by random subsampling, if `subsampling = TRUE`. Default = number of cells in smallest sample. See `prepareData`.
`seed_sub`	Random seed for subsampling. Set to an integer value to generate reproducible results. Default = `NULL`. See `prepareData`.
`transform`	Whether to apply 'arcsinh' transform. This may be set to FALSE if the input data has already been transformed. Default = TRUE. See `transformData`.
`cofactor`	Cofactor parameter for 'arcsinh' transform. Default = 5, which is appropriate for mass cytometry (CyTOF) data. For fluorescence flow cytometry, cofactor = 150 is recommended instead. See `transformData`.
`cols_clustering`	Columns to use for clustering. Default = `NULL`, in which case markers identified as 'cell type' markers (with entries `"type"`) in the vector `marker_class` in the column meta-data of `d_se` will be used. See `generateClusters`.
`xdim`	Horizontal length of grid for self-organizing map for FlowSOM clustering (number of clusters = `xdim` * `ydim`). Default = 10 (i.e. 100 clusters). See `generateClusters`.
`ydim`	Vertical length of grid for self-organizing map for FlowSOM clustering (number of clusters = `xdim` * `ydim`). Default = 10 (i.e. 100 clusters). See `generateClusters`.
`meta_clustering`	Whether to include FlowSOM 'meta-clustering' step. Default = `FALSE`. See `generateClusters`.
`meta_k`	Number of meta-clusters for FlowSOM, if `meta-clustering = TRUE`. Default = 40. See `generateClusters`.
`seed_clustering`	Random seed for clustering. Set to an integer value to generate reproducible results. Default = `NULL`. See `generateClusters`.
`min_cells`	Filtering parameter. Default = 3. Clusters are kept for differential testing if they have at least `min_cells` cells in at least `min_samples` samples. See `testDA_censoredGLMM`.
`min_samples`	Filtering parameter. Default = `number of samples / 2`, which is appropriate for two-group comparisons (of equal size). Clusters are kept for differential testing if they have at least `min_cells` cells in at least `min_samples` samples. See `testDA_censoredGLMM`.
`normalize`	Whether to include optional normalization factors to adjust for composition effects. Default = FALSE. See `testDA_censoredGLMM`.
`norm_factors`	Normalization factors to use, if `normalize = TRUE`. Default = `"TMM"`, in which case normalization factors are calculated automatically using the 'trimmed mean of M-values' (TMM) method from the `edgeR` package. Alternatively, a vector of values can be provided (the values should multiply to 1). See `testDA_censoredGLMM`.
`verbose`	Whether to print status messages during each step of the pipeline. Default = TRUE.
`mi_reps`	Number of imputations in multiple imputation. Default = 10. See `testDA_censoredGLMM`.
`imputation_method`	Method to be used in the imputation step. One of `km`,`km_exp`,`km_wei`,`km_os`, `rs`, `mrl`, `cc`, `pmm`. See `testDA_censoredGLMM`.
`BPPARAM`	Specification of parallelization option as one of `BiocParallelParam` if `BiocParallel` is available otherwise no parallelization is used. e.g. `MulticoreParam-class`(workers=2) for parallelization with two cores. Default is `SerialParam-class`() (no parallelization).

Details

This wrapper function runs the complete diffcyt analysis pipeline where the only difference is the analysis step which uses the functions from censcyt (which is currently only testDA_censoredGLMM).

For more details about the functions for the individual steps, see diffcyt, the diffcyt vignette, the censcyt package vignette and the function help pages. The following is a slightly adapted summary from diffcyt:

Running the individual functions may provide additional flexibility, especially for complex analyses.

The input data can be provided as a flowSet-class or a list of flowFrame-classs, DataFrames, data.frames, or matrices (one flowFrame or list item per sample). Alternatively, it is also possible to provide the input as a daFrame object from the CATALYST Bioconductor package (Chevrier, Crowell, Zanotelli et al., 2018). This can be useful when initial exploratory analyses and clustering have been performed using CATALYST; the daFrame object from CATALYST (containing cluster labels in the rowData) can then be provided directly to the censcyt functions for differential testing.

Minimum required arguments when not providing a flowSet-class or list of flowFrame-classs, DataFrames, data.frames, or matrices:

d_input
experiment_info
marker_info
either design or formula (depending on the differential testing method used)
contrast
analysis_type

Minimum required arguments when providing a CATALYST daFrame object:

d_input
either design or formula (depending on the differential testing method used)
contrast
analysis_type

Value

Returns a list containing the results object res, as well as the data objects d_se, d_counts, d_medians, d_medians_by_cluster_marker, and d_medians_by_sample_marker. (If a CATALYST daFrame object was used as input, the output list contains objects res, d_counts, and d_medians.)

Examples

# Function to create random data (one sample)
fcs_sim <- function(n = 2000, mean = 0, sd = 1, ncol = 10, cofactor = 5) {
  d <- matrix(sinh(rnorm(n*ncol, mean, sd)) * cofactor,ncol=ncol)
  for(i in seq_len(ncol)){
    d[seq(n/ncol*(i-1)+1,n/ncol*(i)),i] <- sinh(rnorm(n/ncol, mean+5, sd)) * cofactor
  }
  colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
  d
}

# Create random data (without differential signal)
set.seed(123)
d_input <- lapply(1:50, function(i) fcs_sim())

# simulate survival time
d_surv <- simulate_singlecluster(50, formula(Y~Surv(X,I)))[c("X","I","TrVal")]

# Add differential abundance (DA) signal
for(i in 1:50){
  # number of cells in cluster 1 
  n_da <- round(sqrt(2000*d_surv$TrVal[i]))*9
  # set to no expression
  tmpd <- matrix(sinh(rnorm(n_da*10, 0, 1)) * 5, ncol=10)
  # increase expresion for cluster 1
  tmpd[ ,1] <- sinh(rnorm(n_da, 5, 1)) * 5
  d_input[[i]][seq_len(n_da), ] <- tmpd
}

experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:50)),
  survival_time = d_surv$X,
  event_indicator= d_surv$I,
  stringsAsFactors = FALSE
)

marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:10)),
  marker_name = paste0("marker", sprintf("%02d", 1:10)),
  marker_class = factor(c(rep("type", 10)),
                        levels = c("type", "state", "none")),
  stringsAsFactors = FALSE
)

# Create formula
da_formula <- createFormula(experiment_info, cols_fixed="survival_time",
                            cols_random = "sample_id",event_indicator = "event_indicator")
# Create contrast matrix
contrast <- diffcyt::createContrast(c(0, 1))

# Test for differential abundance (DA) of clusters
out_DA <- censcyt(d_input, experiment_info, marker_info,
                  formula = da_formula, contrast = contrast,
                  analysis_type = "DA", method_DA = "censcyt-DA-censored-GLMM",
                  seed_clustering = 123, verbose = FALSE, mi_reps = 3,
                  BPPARAM=BiocParallel::MulticoreParam(workers = 1),
                  imputation_method = "mrl",meta_clustering = TRUE, meta_k = 10)

# Display results for top DA clusters
diffcyt::topTable(out_DA, format_vals = TRUE)


# Plot heatmap for DA tests
diffcyt::plotHeatmap(out_DA, analysis_type = "DA")

retogerber/censcyt documentation built on Feb. 7, 2023, 9:56 a.m.

retogerber/censcyt index

README.md

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

retogerber/censcyt
Differential abundance analysis with a right censored covariate in high-dimensional cytometry

censcyt: Run 'censcyt' pipeline
In retogerber/censcyt: Differential abundance analysis with a right censored covariate in high-dimensional cytometry

Run `censcyt` pipeline

Description

Usage

Arguments

Details

Value

Examples

Related to censcyt in retogerber/censcyt...

R Package Documentation

Browse R Packages

We want your feedback!

retogerber/censcyt Differential abundance analysis with a right censored covariate in high-dimensional cytometry

censcyt: Run 'censcyt' pipeline In retogerber/censcyt: Differential abundance analysis with a right censored covariate in high-dimensional cytometry

Run censcyt pipeline

Description

Usage

Arguments

Details

Value

Examples

Related to censcyt in retogerber/censcyt...

R Package Documentation

Browse R Packages

We want your feedback!

retogerber/censcyt
Differential abundance analysis with a right censored covariate in high-dimensional cytometry

censcyt: Run 'censcyt' pipeline
In retogerber/censcyt: Differential abundance analysis with a right censored covariate in high-dimensional cytometry

Run `censcyt` pipeline