View source: R/3_generateClusters.R
generateClusters | R Documentation |
Generate high-resolution clusters for diffcyt analysis
generateClusters(
  d_se,
  cols_clustering = NULL,
  xdim = 10,
  ydim = 10,
  meta_clustering = FALSE,
  meta_k = 40,
  seed_clustering = NULL,
  ...
)
d_se | Transformed input data, from prepareData and transformData.
cols_clustering | Columns to use for clustering. Default = NULL, in which case the 'cell type' markers (marker_class "type" in the column meta-data) are used.
xdim | Horizontal length of grid for self-organizing map for FlowSOM clustering (number of clusters = xdim * ydim). Default = 10.
ydim | Vertical length of grid for self-organizing map for FlowSOM clustering (number of clusters = xdim * ydim). Default = 10.
meta_clustering | Whether to include FlowSOM 'meta-clustering' step. Default = FALSE.
meta_k | Number of meta-clusters for FlowSOM, if meta_clustering = TRUE. Default = 40.
seed_clustering | Random seed for clustering. Set to an integer value to generate reproducible results. Default = NULL.
... | Other parameters to pass to the FlowSOM clustering algorithm.
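As a quick check on how xdim and ydim set the clustering resolution, the grid arithmetic can be sketched in base R. The helper som_grid_size is a hypothetical name introduced only for this illustration; it is not part of diffcyt or FlowSOM.

```r
# Hypothetical helper (not part of diffcyt): number of self-organizing map
# nodes, i.e. clusters before any meta-clustering, for a given grid.
som_grid_size <- function(xdim = 10, ydim = 10) {
  stopifnot(xdim >= 1, ydim >= 1)
  xdim * ydim
}

n_default <- som_grid_size()        # grid defaults from the usage above
n_fine    <- som_grid_size(20, 20)  # a finer grid gives more, smaller clusters
```

With the default 10 x 10 grid this gives 100 clusters; increasing either dimension raises the resolution further.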
Performs clustering to group cells into clusters representing cell populations or subsets, which can then be further analyzed by testing for differential abundance of cell populations or differential states within cell populations. By default, we use high-resolution clustering or over-clustering (i.e. we generate a large number of small clusters), which helps ensure that rare populations are adequately separated from larger ones.
Data is assumed to be in the form of a SummarizedExperiment object generated with prepareData and transformed with transformData.
The input data object d_se is assumed to contain a vector marker_class in the column meta-data. This vector indicates the marker class for each column ("type", "state", or "none"). By default, clustering is performed using the 'cell type' markers only. For example, in immunological data, these may be the lineage markers. The choice of cell type markers is an important design decision for the user, and will depend on the underlying experimental design and research questions. It may be made based on prior biological knowledge or using data-driven methods. For an example of a data-driven method of marker ranking and selection, see Nowicka et al. (2017), F1000Research.
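The default selection of 'cell type' markers described above can be sketched in base R. This is a minimal illustration on a small, made-up marker table; the object names cols_type and type_markers are assumptions for this example, and the code only mimics the selection logic rather than reproducing the package internals.

```r
# Toy marker table (made-up values, for illustration only)
marker_info <- data.frame(
  marker_name = paste0("marker", sprintf("%02d", 1:6)),
  marker_class = factor(
    c("type", "type", "type", "state", "state", "none"),
    levels = c("type", "state", "none")
  ),
  stringsAsFactors = FALSE
)

# Column indices and names of the markers flagged as 'cell type' markers
cols_type    <- which(marker_info$marker_class == "type")
type_markers <- marker_info$marker_name[cols_type]
```

A different subset of columns chosen in the same way could, under the same assumptions, be supplied through the cols_clustering argument instead of relying on the default.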
By default, we use the FlowSOM clustering algorithm (Van Gassen et al. 2015, Cytometry Part A, available from Bioconductor) to generate the clusters. We previously showed that FlowSOM gives very good clustering performance for high-dimensional cytometry data, for both major and rare cell populations, and is extremely fast (Weber and Robinson, 2016, Cytometry Part A).
The clustering is run at high resolution to give a large number of small clusters (i.e. over-clustering). This is done by running only the initial 'self-organizing map' clustering step in the FlowSOM algorithm, i.e. without the final 'meta-clustering' step. This ensures that small or rare populations are adequately separated from larger populations, which is crucial for detecting differential signals for extremely rare populations.
The minimum spanning tree (MST) object from BuildMST is stored in the experiment metadata slot in the SummarizedExperiment object d_se, and can be accessed with metadata(d_se)$MST.
d_se: Returns the SummarizedExperiment input object, with cluster labels for each cell stored in an additional column of row meta-data. Row meta-data can be accessed with rowData. The minimum spanning tree (MST) object is also stored in the metadata slot, and can be accessed with metadata(d_se)$MST.
# For a complete workflow example demonstrating each step in the 'diffcyt' pipeline,
# see the package vignette.

library(diffcyt)

# Function to create random data (one sample).
# Note: n is the total number of values, so the defaults give a matrix of
# n / ncol = 1000 cells (rows) by 20 markers (columns).
d_random <- function(n = 20000, mean = 0, sd = 1, ncol = 20, cofactor = 5) {
  d <- sinh(matrix(rnorm(n, mean, sd), ncol = ncol)) * cofactor
  colnames(d) <- paste0("marker", sprintf("%02d", 1:ncol))
  d
}

# Create random data (without differential signal)
set.seed(123)
d_input <- list(
  sample1 = d_random(),
  sample2 = d_random(),
  sample3 = d_random(),
  sample4 = d_random()
)

# Sample information: two groups with two samples each
experiment_info <- data.frame(
  sample_id = factor(paste0("sample", 1:4)),
  group_id = factor(c("group1", "group1", "group2", "group2")),
  stringsAsFactors = FALSE
)

# Marker information: 10 'cell type' markers and 10 'cell state' markers
marker_info <- data.frame(
  channel_name = paste0("channel", sprintf("%03d", 1:20)),
  marker_name = paste0("marker", sprintf("%02d", 1:20)),
  marker_class = factor(c(rep("type", 10), rep("state", 10)),
                        levels = c("type", "state", "none")),
  stringsAsFactors = FALSE
)

# Prepare data
d_se <- prepareData(d_input, experiment_info, marker_info)

# Transform data
d_se <- transformData(d_se)

# Generate clusters
d_se <- generateClusters(d_se)
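
The effect of the seed_clustering argument (NULL leaves the random number generator untouched, an integer makes results reproducible) can be illustrated with a small base R sketch. Note the assumptions: kmeans stands in for FlowSOM purely for illustration, and cluster_with_seed is a hypothetical helper, not a diffcyt function.

```r
# Hypothetical helper: kmeans stands in for FlowSOM here (an assumption for
# illustration only, not what generateClusters() runs internally).
cluster_with_seed <- function(x, k, seed = NULL) {
  # Same convention as described for seed_clustering: NULL means no seed is set
  if (!is.null(seed)) set.seed(seed)
  kmeans(x, centers = k, iter.max = 50)$cluster
}

# Toy data: 300 points in 2 dimensions
set.seed(1)
x <- matrix(rnorm(600), ncol = 2)

labels_a <- cluster_with_seed(x, k = 5, seed = 123)
labels_b <- cluster_with_seed(x, k = 5, seed = 123)

# Same seed and same input give identical cluster labels
same_labels <- identical(labels_a, labels_b)
```

Without a seed, repeated runs on the same data may assign cells to clusters differently, which is why setting an integer seed is useful when results need to be reproduced exactly.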