| generateClusters | R Documentation |
Generate high-resolution clusters for diffcyt analysis
generateClusters(
  d_se,
  cols_clustering = NULL,
  xdim = 10,
  ydim = 10,
  meta_clustering = FALSE,
  meta_k = 40,
  seed_clustering = NULL,
  ...
)
d_se | Transformed input data, from prepareData and transformData.
cols_clustering | Columns to use for clustering. Default = NULL, in which case the 'cell type' markers (entries "type" in marker_class) are used.
xdim | Horizontal length of grid for self-organizing map for FlowSOM clustering (number of clusters = xdim * ydim). Default = 10.
ydim | Vertical length of grid for self-organizing map for FlowSOM clustering (number of clusters = xdim * ydim). Default = 10.
meta_clustering | Whether to include FlowSOM 'meta-clustering' step. Default = FALSE.
meta_k | Number of meta-clusters for FlowSOM, if meta_clustering = TRUE. Default = 40.
seed_clustering | Random seed for clustering. Set to an integer value to generate reproducible results. Default = NULL.
... | Other parameters to pass to the FlowSOM clustering algorithm.
Performs clustering to group cells into clusters representing cell populations or subsets, which can then be further analyzed by testing for differential abundance of cell populations or differential states within cell populations. By default, we use high-resolution clustering or over-clustering (i.e. we generate a large number of small clusters), which helps ensure that rare populations are adequately separated from larger ones.
Data is assumed to be in the form of a SummarizedExperiment object
generated with prepareData and transformed with
transformData.
The input data object d_se is assumed to contain a vector marker_class in
the column meta-data. This vector indicates the marker class for each column
("type", "state", or "none"). By default, clustering is performed
using the 'cell type' markers only. For example, in immunological data, this may be the
lineage markers. The choice of cell type markers is an important design choice for the
user, and will depend on the underlying experimental design and research questions. It
may be made based on prior biological knowledge or using data-driven methods. For an
example of a data-driven method of marker ranking and selection, see Nowicka et al.
(2017), F1000Research.
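As a rough illustration of the marker selection described above, the sketch below inspects marker_class and then restricts clustering to a subset of the 'type' markers through cols_clustering. It assumes that cols_clustering accepts a logical vector with one entry per column of d_se; the helper make_sample and the choice of the first five markers are hypothetical and only serve the example.

```r
library(diffcyt)
library(SummarizedExperiment)

set.seed(123)

# Hypothetical helper: random data for one sample (cells x markers)
make_sample <- function(n_cells = 1000, n_markers = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n_cells * n_markers), ncol = n_markers)) * cofactor
  colnames(d) <- paste0("marker", sprintf("%02d", seq_len(n_markers)))
  d
}

d_input <- list(
  sample1 = make_sample(), sample2 = make_sample(),
  sample3 = make_sample(), sample4 = make_sample()
)
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)),
  group_id = factor(c("group1", "group1", "group2", "group2")),
  stringsAsFactors = FALSE
)
marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:20)),
  marker_name = paste0("marker", sprintf("%02d", 1:20)),
  marker_class = factor(c(rep("type", 10), rep("state", 10)),
                        levels = c("type", "state", "none")),
  stringsAsFactors = FALSE
)

d_se <- transformData(prepareData(d_input, experiment_info, marker_info))

# Marker classes are stored in the column meta-data
print(table(colData(d_se)$marker_class))

# Assumption: a logical vector over columns is a valid cols_clustering value.
# Here only the first five 'type' markers are kept for clustering.
use_cols <- colData(d_se)$marker_class == "type"
use_cols[6:10] <- FALSE

d_custom <- generateClusters(d_se, cols_clustering = use_cols, seed_clustering = 1)
```

Leaving cols_clustering at its default of NULL clusters on all markers whose marker_class is "type".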
By default, we use the FlowSOM clustering algorithm (Van Gassen et al.
2015, Cytometry Part A, available from Bioconductor) to generate the clusters.
We previously showed that FlowSOM gives very good clustering performance for
high-dimensional cytometry data, for both major and rare cell populations, and is
extremely fast (Weber and Robinson, 2016, Cytometry Part A).
The clustering is run at high resolution to give a large number of small clusters (i.e.
over-clustering). This is done by running only the initial 'self-organizing map'
clustering step in the FlowSOM algorithm, i.e. without the final
'meta-clustering' step. This ensures that small or rare populations are adequately
separated from larger populations, which is crucial for detecting differential signals
for extremely rare populations.
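The difference between the default over-clustering and the optional meta-clustering step can be sketched as follows. The example runs the function twice on the same random data and counts the distinct cluster labels in each result. It assumes that the labels are stored in a row meta-data column named cluster_id; the helper make_sample and the value meta_k = 20 are arbitrary choices for illustration.

```r
library(diffcyt)
library(SummarizedExperiment)

set.seed(123)

# Hypothetical helper: random data for one sample (cells x markers)
make_sample <- function(n_cells = 1000, n_markers = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n_cells * n_markers), ncol = n_markers)) * cofactor
  colnames(d) <- paste0("marker", sprintf("%02d", seq_len(n_markers)))
  d
}

d_input <- list(
  sample1 = make_sample(), sample2 = make_sample(),
  sample3 = make_sample(), sample4 = make_sample()
)
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)),
  group_id = factor(c("group1", "group1", "group2", "group2")),
  stringsAsFactors = FALSE
)
marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:20)),
  marker_name = paste0("marker", sprintf("%02d", 1:20)),
  marker_class = factor(c(rep("type", 10), rep("state", 10)),
                        levels = c("type", "state", "none")),
  stringsAsFactors = FALSE
)

d_se <- transformData(prepareData(d_input, experiment_info, marker_info))

# Default behaviour: self-organizing map only, on a 10 x 10 grid
d_over <- generateClusters(d_se, xdim = 10, ydim = 10, seed_clustering = 123)

# Optional: merge the grid nodes into meta_k meta-clusters
d_meta <- generateClusters(d_se, xdim = 10, ydim = 10,
                           meta_clustering = TRUE, meta_k = 20,
                           seed_clustering = 123)

# Assumption: cluster labels live in rowData(...)$cluster_id
n_over <- length(unique(rowData(d_over)$cluster_id))
n_meta <- length(unique(rowData(d_meta)$cluster_id))
print(c(over_clustering = n_over, meta_clustering = n_meta))
```

The over-clustered result can contain at most xdim * ydim clusters, while the meta-clustered result can contain at most meta_k.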
The minimum spanning tree (MST) object from BuildMST is stored in the
experiment metadata slot in the SummarizedExperiment object
d_se, and can be accessed with metadata(d_se)$MST.
d_se: Returns the SummarizedExperiment input object, with
cluster labels for each cell stored in an additional column of row meta-data. Row
meta-data can be accessed with rowData. The minimum spanning tree (MST)
object is also stored in the metadata slot, and can be accessed with
metadata(d_se)$MST.
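A minimal sketch of how the returned object might be inspected is shown below. It reads the per-cell cluster labels from the row meta-data and retrieves the MST object from the metadata slot. The column name cluster_id is an assumption about where the labels are stored, and the helper make_sample is hypothetical.

```r
library(diffcyt)
library(SummarizedExperiment)

set.seed(123)

# Hypothetical helper: random data for one sample (cells x markers)
make_sample <- function(n_cells = 1000, n_markers = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n_cells * n_markers), ncol = n_markers)) * cofactor
  colnames(d) <- paste0("marker", sprintf("%02d", seq_len(n_markers)))
  d
}

d_input <- list(
  sample1 = make_sample(), sample2 = make_sample(),
  sample3 = make_sample(), sample4 = make_sample()
)
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)),
  group_id = factor(c("group1", "group1", "group2", "group2")),
  stringsAsFactors = FALSE
)
marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:20)),
  marker_name = paste0("marker", sprintf("%02d", 1:20)),
  marker_class = factor(c(rep("type", 10), rep("state", 10)),
                        levels = c("type", "state", "none")),
  stringsAsFactors = FALSE
)

d_se <- transformData(prepareData(d_input, experiment_info, marker_info))
d_se <- generateClusters(d_se, seed_clustering = 123)

# Assumption: one cluster label per cell, stored in rowData(d_se)$cluster_id
clus <- rowData(d_se)$cluster_id

# Sizes of the largest clusters
print(head(sort(table(clus), decreasing = TRUE)))

# Minimum spanning tree object, as documented above
mst <- metadata(d_se)$MST
print(class(mst))
```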
# For a complete workflow example demonstrating each step in the 'diffcyt' pipeline,
# see the package vignette.
# Function to create random data (one sample)
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
  colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
  d
}
# Create random data (without differential signal)
set.seed(123)
d_input <- list(
  sample1 = d_random(),
  sample2 = d_random(),
  sample3 = d_random(),
  sample4 = d_random()
)
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)),
  group_id = factor(c("group1", "group1", "group2", "group2")),
  stringsAsFactors = FALSE
)
marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:20)),
  marker_name = paste0("marker", sprintf("%02d", 1:20)),
  marker_class = factor(c(rep("type", 10), rep("state", 10)),
                        levels = c("type", "state", "none")),
  stringsAsFactors = FALSE
)
# Prepare data
d_se <- prepareData(d_input, experiment_info, marker_info)
# Transform data
d_se <- transformData(d_se)
# Generate clusters
d_se <- generateClusters(d_se)