View source: R/2_transformData.R
transformData | R Documentation |
Transform data prior to clustering
transformData(d_se, cofactor = 5)
d_se |
Input data. Assumed to be in the form of a
|
cofactor |
Cofactor parameter for 'arcsinh' transform. Default = 5, which is appropriate for mass cytometry (CyTOF) data. For fluorescence flow cytometry, we recommend cofactor = 150 instead. |
Flow and mass cytometry data should be transformed prior to clustering. The raw data follows an approximately log-normal distribution. Transforming with a log (or similar) function brings the data closer to a normal distribution, which improves clustering performance and allows positive and negative populations to be distinguished more clearly.
This function implements an inverse hyperbolic sine ('arcsinh') transform with adjustable 'cofactor' parameter. The arcsinh transform is widely used for CyTOF data. It behaves similarly to a log transform at high values, but is approximately linear near zero; so unlike the log, it can handle zeros or small negative values. The cofactor parameter controls the width of the linear region. Zero values and small negatives occur in CyTOF data when no ions are detected in a given channel (negatives are due to background subtraction and randomization of integer count values, which are performed by default by the CyTOF instrument software).
Recommended values for the cofactor parameter are 5 (mass cytometry, CyTOF) or 150 (fluorescence flow cytometry); see Bendall et al. (2011), Science, Supplementary Figure S2.
The transform should be applied to protein marker columns only. The
SummarizedExperiment
object created in the previous step
(prepareData
) is assumed to contain a factor marker_class
in the
column meta-data, where entries "none"
indicate non-marker columns. (If this is
not available, all columns will be transformed instead.)
d_se
: Data with transform applied to protein marker columns.
# For a complete workflow example demonstrating each step in the 'diffcyt' pipeline,
# see the package vignette.
# Function to create random data (one sample)
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
d
}
# Create random data (without differential signal)
set.seed(123)
d_input <- list(
sample1 = d_random(),
sample2 = d_random(),
sample3 = d_random(),
sample4 = d_random()
)
experiment_info <- data.frame(
sample_id = factor(paste0("sample", 1:4)),
group_id = factor(c("group1", "group1", "group2", "group2")),
stringsAsFactors = FALSE
)
marker_info <- data.frame(
channel_name = paste0("channel", sprintf("%03d", 1:20)),
marker_name = paste0("marker", sprintf("%02d", 1:20)),
marker_class = factor(c(rep("type", 10), rep("state", 10)),
levels = c("type", "state", "none")),
stringsAsFactors = FALSE
)
# Prepare data
d_se <- prepareData(d_input, experiment_info, marker_info)
# Transform data
d_se <- transformData(d_se)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.