diffcyt: Run 'diffcyt' pipeline
In lmweber/diffcyt: Differential discovery in high-dimensional cytometry via high-resolution clustering

diffcyt

R Documentation

Run 'diffcyt' pipeline

Description

Wrapper function to run the complete 'diffcyt' pipeline.

Usage

diffcyt(
  d_input,
  experiment_info = NULL,
  marker_info = NULL,
  design = NULL,
  formula = NULL,
  contrast,
  analysis_type = c("DA", "DS"),
  method_DA = c("diffcyt-DA-edgeR", "diffcyt-DA-voom", "diffcyt-DA-GLMM"),
  method_DS = c("diffcyt-DS-limma", "diffcyt-DS-LMM"),
  markers_to_test = NULL,
  clustering_to_use = NULL,
  cols_to_include = NULL,
  subsampling = FALSE,
  n_sub = NULL,
  seed_sub = NULL,
  transform = TRUE,
  cofactor = 5,
  cols_clustering = NULL,
  xdim = 10,
  ydim = 10,
  meta_clustering = FALSE,
  meta_k = 40,
  seed_clustering = NULL,
  min_cells = 3,
  min_samples = NULL,
  normalize = FALSE,
  norm_factors = "TMM",
  trend_method = "none",
  block_id = NULL,
  trend = TRUE,
  weights = TRUE,
  plot = FALSE,
  path = ".",
  verbose = TRUE
)

Arguments

`d_input`	Input data. Must be either: (i) a `flowSet`, or a list of `flowFrames`, `DataFrames`, `data.frames`, or matrices (one `flowFrame` or list item per sample; see `prepareData`); or (ii) a `CATALYST` `daFrame` containing cluster labels in its `rowData` (see the vignette for an example).
`experiment_info`	`data.frame`, `DataFrame`, or `tbl_df` of experiment information, for example sample IDs and group IDs. Must contain a column named `sample_id`. See `prepareData`. (Not required when providing a `CATALYST` `daFrame` for `d_input`.)
`marker_info`	`data.frame`, `DataFrame`, or `tbl_df` of marker information for each column of data. Must contain columns named `marker_name` and `marker_class`: `marker_name` contains the marker names (and any other column names), and `marker_class` is a factor indicating the marker class of each column (with entries `"type"`, `"state"`, or `"none"`). See `prepareData`. (Not required when providing a `CATALYST` `daFrame` for `d_input`.)
`design`	Design matrix, created with `createDesignMatrix`. See `createDesignMatrix`.
`formula`	Model formula object, created with `createFormula`. See `createFormula`.
`contrast`	Contrast matrix, created with `createContrast`. See `createContrast`.
`analysis_type`	Type of differential analysis to perform: differential abundance (DA) of cell populations, or differential states (DS) within cell populations. Options are `"DA"` and `"DS"`. See `testDA_edgeR`, `testDA_voom`, `testDA_GLMM`, `testDS_limma`, or `testDS_LMM`.
`method_DA`	Method to use for calculating differential abundance (DA) tests. Options are `"diffcyt-DA-edgeR"`, `"diffcyt-DA-voom"`, and `"diffcyt-DA-GLMM"`. Default = `"diffcyt-DA-edgeR"`. See `testDA_edgeR`, `testDA_voom`, or `testDA_GLMM`.
`method_DS`	Method to use for calculating differential state (DS) tests. Options are `"diffcyt-DS-limma"` and `"diffcyt-DS-LMM"`. Default = `"diffcyt-DS-limma"`. See `testDS_limma` or `testDS_LMM`.
`markers_to_test`	(Optional) Logical vector specifying which markers to test for differential expression (from the set of markers stored in the `assays` of `d_medians`; for method `testDS_limma` or `testDS_LMM`). Default = all 'cell state' markers, which are identified by the logical vector `id_state_markers` stored in the meta-data of `d_medians`. See `testDS_limma` or `testDS_LMM`.
`clustering_to_use`	(Optional) Column name indicating which set of cluster labels to use for differential testing, when input data are provided as a `CATALYST` `daFrame` object containing multiple sets of cluster labels. (In this case, the `metadata` of the `daFrame` object is assumed to contain a data frame named `cluster_codes`; `clustering_to_use` is the column name of the selected column in `cluster_codes`. If `clustering_to_use` is provided, an identifier `clustering_name` to identify this column will also be saved in the `metadata` of the output object.) Default = NULL, in which case cluster labels stored in column named `cluster_id` in the `rowData` of the `daFrame` object are used.
`cols_to_include`	Logical vector indicating which columns to include from the input data. Default = all columns. See `prepareData`.
`subsampling`	Whether to use random subsampling to select an equal number of cells from each sample. Default = FALSE. See `prepareData`.
`n_sub`	Number of cells to select from each sample by random subsampling, if `subsampling = TRUE`. Default = number of cells in smallest sample. See `prepareData`.
`seed_sub`	Random seed for subsampling. Set to an integer value to generate reproducible results. Default = `NULL`. See `prepareData`.
`transform`	Whether to apply the 'arcsinh' transform. This may be set to FALSE if the input data have already been transformed. Default = TRUE. See `transformData`.
`cofactor`	Cofactor parameter for 'arcsinh' transform. Default = 5, which is appropriate for mass cytometry (CyTOF) data. For fluorescence flow cytometry, we recommend cofactor = 150 instead. See `transformData`.
`cols_clustering`	Columns to use for clustering. Default = `NULL`, in which case markers identified as 'cell type' markers (with entries `"type"`) in the vector `marker_class` in the column meta-data of `d_se` will be used. See `generateClusters`.
`xdim`	Horizontal length of grid for self-organizing map for FlowSOM clustering (number of clusters = `xdim` * `ydim`). Default = 10 (i.e. 100 clusters). See `generateClusters`.
`ydim`	Vertical length of grid for self-organizing map for FlowSOM clustering (number of clusters = `xdim` * `ydim`). Default = 10 (i.e. 100 clusters). See `generateClusters`.
`meta_clustering`	Whether to include FlowSOM 'meta-clustering' step. Default = `FALSE`. See `generateClusters`.
`meta_k`	Number of meta-clusters for FlowSOM, if `meta_clustering = TRUE`. Default = 40. See `generateClusters`.
`seed_clustering`	Random seed for clustering. Set to an integer value to generate reproducible results. Default = `NULL`. See `generateClusters`.
`min_cells`	Filtering parameter. Default = 3. Clusters are kept for differential testing if they have at least `min_cells` cells in at least `min_samples` samples. See `testDA_edgeR`, `testDA_voom`, `testDA_GLMM`, `testDS_limma`, or `testDS_LMM`.
`min_samples`	Filtering parameter. Default = `number of samples / 2`, which is appropriate for two-group comparisons (of equal size). Clusters are kept for differential testing if they have at least `min_cells` cells in at least `min_samples` samples. See `testDA_edgeR`, `testDA_voom`, `testDA_GLMM`, `testDS_limma`, or `testDS_LMM`.
`normalize`	Whether to include optional normalization factors to adjust for composition effects. Default = FALSE. See `testDA_edgeR`, `testDA_voom`, or `testDA_GLMM`.
`norm_factors`	Normalization factors to use, if `normalize = TRUE`. Default = `"TMM"`, in which case normalization factors are calculated automatically using the 'trimmed mean of M-values' (TMM) method from the `edgeR` package. Alternatively, a vector of values can be provided (the values should multiply to 1). See `testDA_edgeR`, `testDA_voom`, or `testDA_GLMM`.
`trend_method`	Method for estimating the dispersion trend; passed to function `estimateDisp` from the `edgeR` package (for method `testDA_edgeR`). Default = `"none"`. (See the `estimateDisp` help file from the `edgeR` package for other options.) See `testDA_edgeR`.
`block_id`	(Optional) Vector or factor of block IDs (e.g. patient IDs) for paired experimental designs, to be included as random effects (for method `testDA_voom` or `testDS_limma`). If provided, the block IDs will be included as random effects using the `limma` `duplicateCorrelation` methodology. Alternatively, block IDs can be included as fixed effects in the design matrix (`createDesignMatrix`). See `testDA_voom` or `testDS_limma`.
`trend`	(Optional) Whether to fit a mean-variance trend when calculating moderated tests with function `eBayes` from `limma` package (for method `testDS_limma`). When `trend = TRUE`, this is known as the `limma-trend` method (Law et al., 2014; Phipson et al., 2016). Default = TRUE. See `testDS_limma`.
`weights`	(Optional) Whether to include precision weights (for method `testDS_limma` or `testDS_LMM`). For method `testDS_limma`, cluster cell counts will be used as precision weights (across all samples and clusters); this allows the `limma` model fitting functions to account for uncertainty due to the total number of cells per sample (library sizes) and total number of cells per cluster. For method `testDS_LMM`, cluster cell counts will be used as precision weights within each model (across samples, i.e. within the model for each cluster); these represent the relative uncertainty in calculating each median value (within each model). Default = TRUE. See `testDS_limma` or `testDS_LMM`.
`plot`	Whether to save diagnostic plots (for method `testDA_voom` or `testDS_limma`). Default = FALSE. See `testDA_voom` or `testDS_limma`.
`path`	Path for diagnostic plots, if `plot = TRUE` (for method `testDA_voom` or `testDS_limma`). Default = current working directory. See `testDA_voom` or `testDS_limma`.
`verbose`	Whether to print status messages during each step of the pipeline. Default = TRUE.
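The preprocessing arguments `transform`, `cofactor`, `subsampling`, `n_sub`, and `seed_sub` can be illustrated with a minimal base-R sketch. This is an illustration of the documented behaviour under stated assumptions, not the code used inside `prepareData` or `transformData`; the helper names `arcsinh_transform` and `subsample_cells` are hypothetical. It assumes the 'arcsinh' transform has the form asinh(x / cofactor) and that subsampling draws cells at random without replacement.

```r
# Hypothetical helper (not part of diffcyt): 'arcsinh' transform,
# assumed to be of the form asinh(x / cofactor)
arcsinh_transform <- function(x, cofactor = 5) {
  asinh(x / cofactor)
}

# Hypothetical helper (not part of diffcyt): select the same number of
# cells (rows) from each sample, drawn at random without replacement
subsample_cells <- function(samples, n_sub = NULL, seed_sub = NULL) {
  n_cells <- vapply(samples, nrow, integer(1))
  # Mirrors the documented default: number of cells in the smallest sample
  if (is.null(n_sub)) n_sub <- min(n_cells)
  # A seed makes the selection reproducible (this resets the global RNG state)
  if (!is.null(seed_sub)) set.seed(seed_sub)
  lapply(samples, function(m) m[sample(nrow(m), n_sub), , drop = FALSE])
}

# Usage: two samples with 10 and 16 cells, 5 markers each
set.seed(1)
samples <- list(
  sample1 = matrix(rnorm(50), ncol = 5),
  sample2 = matrix(rnorm(80), ncol = 5)
)
samples_sub <- subsample_cells(samples, seed_sub = 123)
vapply(samples_sub, nrow, integer(1))  # 10 cells per sample
samples_trans <- lapply(samples_sub, arcsinh_transform, cofactor = 5)
```

Dividing by the cofactor sets the scale of the transform: asinh(y) is approximately linear for small y and approximately log(2y) for large y, so a larger cofactor (such as 150 for fluorescence data) widens the near-linear region around zero.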

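The filtering rule shared by `min_cells` and `min_samples`, and the requirement that user-supplied `norm_factors` multiply to 1, can be sketched in base R as follows. The helpers `filter_clusters` and `rescale_norm_factors` are hypothetical illustrations, not diffcyt functions; the sketch assumes cluster cell counts are arranged as a matrix with one row per cluster and one column per sample.

```r
# Hypothetical helper (not part of diffcyt): TRUE for clusters that have at
# least `min_cells` cells in at least `min_samples` samples
# counts: matrix of cell counts, one row per cluster, one column per sample
filter_clusters <- function(counts, min_cells = 3, min_samples = ncol(counts) / 2) {
  rowSums(counts >= min_cells) >= min_samples
}

# Hypothetical helper (not part of diffcyt): rescale positive normalization
# factors by their geometric mean so that their product equals 1
rescale_norm_factors <- function(f) {
  stopifnot(all(f > 0))
  f / exp(mean(log(f)))
}

# Usage: 3 clusters x 4 samples; default min_samples = 4 / 2 = 2
counts <- matrix(
  c(10, 12, 0, 1,
     2,  2, 8, 9,
     0,  1, 2, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("cluster", 1:3), paste0("sample", 1:4))
)
filter_clusters(counts)  # cluster1 TRUE, cluster2 TRUE, cluster3 FALSE

f <- rescale_norm_factors(c(0.8, 1.1, 1.3, 0.9))
prod(f)  # 1, up to floating-point error
```

With the default `min_samples` of half the number of samples, a cluster that is present in only one of two equal-sized groups (such as cluster1 or cluster2 here) is still kept for testing, which is why this default suits two-group comparisons.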
Details

This wrapper function runs the complete 'diffcyt' analysis pipeline, by calling the functions for the individual steps in the pipeline in the correct sequence.

For more details about the functions for the individual steps, see the package vignette and the function help pages. Running the individual functions may provide additional flexibility, especially for complex analyses.

The input data can be provided as a flowSet or a list of flowFrames, DataFrames, data.frames, or matrices (one flowFrame or list item per sample). Alternatively, the input can be provided as a daFrame object from the CATALYST Bioconductor package (Chevrier, Crowell, Zanotelli et al., 2018). This can be useful when initial exploratory analyses and clustering have been performed using CATALYST: the daFrame object (containing cluster labels in its rowData) can then be passed directly to the diffcyt functions for differential testing.

Minimum required arguments when providing a flowSet or list of flowFrames, DataFrames, data.frames, or matrices:

d_input
experiment_info
marker_info
either design or formula (depending on the differential testing method used)
contrast
analysis_type
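As a rough illustration of the structure these minimum arguments are documented to have when the input is a list of matrices, the hypothetical helper `check_inputs` below (not a diffcyt function) checks that `experiment_info` has a `sample_id` column, that `marker_info` has `marker_name` and `marker_class` columns with entries `"type"`, `"state"`, or `"none"`, and that there is one `marker_info` row per data column. It additionally assumes one `experiment_info` row per sample, and it does not handle `flowSet` input.

```r
# Hypothetical helper (not part of diffcyt): check the documented minimum
# structure of list-of-matrices input, `experiment_info`, and `marker_info`.
# Assumes one `experiment_info` row per sample; `flowSet` input is not handled.
check_inputs <- function(d_input, experiment_info, marker_info) {
  stopifnot(
    is.list(d_input),
    "sample_id" %in% colnames(experiment_info),
    length(d_input) == nrow(experiment_info),
    all(c("marker_name", "marker_class") %in% colnames(marker_info)),
    all(as.character(marker_info$marker_class) %in% c("type", "state", "none")),
    # one `marker_info` row per column of data in every sample
    all(vapply(d_input, ncol, integer(1)) == nrow(marker_info))
  )
  invisible(TRUE)
}

# Usage: 2 samples, 3 markers (2 'cell type', 1 'cell state')
d_small <- list(sample1 = matrix(0, nrow = 2, ncol = 3),
                sample2 = matrix(0, nrow = 2, ncol = 3))
experiment_small <- data.frame(
  sample_id = factor(c("sample1", "sample2")),
  group_id = factor(c("group1", "group2"))
)
marker_small <- data.frame(
  marker_name = c("marker01", "marker02", "marker03"),
  marker_class = factor(c("type", "type", "state"),
                        levels = c("type", "state", "none"))
)
check_inputs(d_small, experiment_small, marker_small)
```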

Minimum required arguments when providing a CATALYST daFrame object:

d_input
either design or formula (depending on the differential testing method used)
contrast
analysis_type

Value

Returns a list containing the results object res, as well as the data objects d_se, d_counts, d_medians, d_medians_by_cluster_marker, and d_medians_by_sample_marker. (If a CATALYST daFrame object was used as input, the output list contains objects res, d_counts, and d_medians.) The structure of res depends on the differential testing method used. See testDA_edgeR, testDA_voom, testDA_GLMM, testDS_limma, or testDS_LMM.

Examples

# For a complete workflow example demonstrating each step in the 'diffcyt' pipeline,
# see the package vignette.

library(diffcyt)

# Function to create random data (one sample)
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
  colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
  d
}

# Create random data (without differential signal)
set.seed(123)
d_input <- list(
  sample1 = d_random(), 
  sample2 = d_random(), 
  sample3 = d_random(), 
  sample4 = d_random()
)

# Add differential abundance (DA) signal
ix_DA <- 801:900
ix_cols_type <- 1:10
d_input[[3]][ix_DA, ix_cols_type] <- d_random(n = 1000, mean = 2, ncol = 10)
d_input[[4]][ix_DA, ix_cols_type] <- d_random(n = 1000, mean = 2, ncol = 10)

# Add differential states (DS) signal
ix_DS <- 901:1000
ix_cols_DS <- 19:20
d_input[[1]][ix_DS, ix_cols_type] <- d_random(n = 1000, mean = 3, ncol = 10)
d_input[[2]][ix_DS, ix_cols_type] <- d_random(n = 1000, mean = 3, ncol = 10)
d_input[[3]][ix_DS, c(ix_cols_type, ix_cols_DS)] <- d_random(n = 1200, mean = 3, ncol = 12)
d_input[[4]][ix_DS, c(ix_cols_type, ix_cols_DS)] <- d_random(n = 1200, mean = 3, ncol = 12)

experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)), 
  group_id = factor(c("group1", "group1", "group2", "group2")), 
  stringsAsFactors = FALSE
)

marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:20)), 
  marker_name = paste0("marker", sprintf("%02d", 1:20)), 
  marker_class = factor(c(rep("type", 10), rep("state", 10)), 
                        levels = c("type", "state", "none")), 
  stringsAsFactors = FALSE
)

# Create design matrix
design <- createDesignMatrix(experiment_info, cols_design = "group_id")

# Create contrast matrix
contrast <- createContrast(c(0, 1))

# Test for differential abundance (DA) of clusters (using default method 'diffcyt-DA-edgeR')
out_DA <- diffcyt(d_input, experiment_info, marker_info, 
                  design = design, contrast = contrast, 
                  analysis_type = "DA", method_DA = "diffcyt-DA-edgeR", 
                  seed_clustering = 123, verbose = FALSE)

# Test for differential states (DS) within clusters (using default method 'diffcyt-DS-limma')
out_DS <- diffcyt(d_input, experiment_info, marker_info, 
                  design = design, contrast = contrast, 
                  analysis_type = "DS", method_DS = "diffcyt-DS-limma", 
                  seed_clustering = 123, verbose = FALSE)

# Display results for top DA clusters
topTable(out_DA, format_vals = TRUE)

# Display results for top DS cluster-marker combinations
topTable(out_DS, format_vals = TRUE)

# Plot heatmap for DA tests
plotHeatmap(out_DA, analysis_type = "DA")

# Plot heatmap for DS tests
plotHeatmap(out_DS, analysis_type = "DS")
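To see why the simulated DA signal is detectable after the default preprocessing, the base-R sketch below re-creates `d_random` so that it runs on its own, builds one sample with DA signal like sample3 above, and applies asinh(x / 5), assumed here to match the pipeline's default 'arcsinh' transform with `cofactor = 5`. Because `d_random` generates values as sinh(z) * cofactor, the transform recovers the underlying normal values z, so the DA cells have 'cell type' marker values centred near 2 while all other cells are centred near 0.

```r
# Self-contained copy of the random data generator from the Examples above
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
  colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
  d
}

# One sample with DA signal in cells 801:900 for the 'cell type' markers 1:10
set.seed(123)
d_sample <- d_random()
d_sample[801:900, 1:10] <- d_random(n = 1000, mean = 2, ncol = 10)

# Assumed default transform: asinh(x / cofactor) with cofactor = 5,
# which inverts sinh(z) * 5 and recovers the simulated values z
z <- asinh(d_sample / 5)

mean(z[801:900, 1:10])     # close to 2: the DA cells
mean(z[-(801:900), 1:10])  # close to 0: all other cells
```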

lmweber/diffcyt documentation built on Feb. 10, 2025, 5:15 p.m.
