cluster_events: Clustering with trimming


Description

Cluster identification with various algorithms and subsequent trimming of each cluster

Usage

bp_kmeans(df, .parameter, .column_name, .k, .trim = 0, .data = NULL, ...)

bp_clara(df, .parameter, .column_name, .k, .trim = 0, .data = NULL, ...)

bp_dbscan(
  df,
  .parameter,
  .column_name,
  .eps = 0.2,
  .MinPts = 50,
  .data = NULL,
  ...
)

bp_mclust(
  df,
  .parameter,
  .column_name,
  .k,
  .trim = 0,
  .sample_frac = 0.05,
  .max_subset = 500,
  .data = NULL,
  ...
)

bp_density_cut(df, .parameter, .column_name, .k, .trim = 0, .data = NULL, ...)

Arguments

df

A tidy data.frame.

.parameter

A character vector giving the name(s) of the column(s) in which populations are identified.

.column_name

A character giving the name of the column to store the population information.

.k

Numeric giving the number of expected clusters, or a set of initial cluster centers.

.trim

A numeric between 0 and 1, giving the fraction of points to remove by marking them NA.

.data

Deprecated. Use df.

...

Additional arguments passed to appropriate methods, see below.

.eps

Reachability distance, see fpc::dbscan().

.MinPts

Minimum number of points within the reachability distance, see fpc::dbscan().

.sample_frac

A numeric between 0 and 1 giving the fraction of points to use in initialisation of Mclust().

.max_subset

A numeric giving the maximum number of events to use in initialisation of Mclust(), see below.

Value

The data.frame in df with the cluster classification added in the column given by .column_name.
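
To illustrate what such a classification column can look like after trimming, the following is a minimal base R sketch on simulated data. It is not the beadplexr implementation: the helper trim_by_distance() is hypothetical, and trimming by Euclidean distance to the cluster mean is an assumption made for illustration; the package's own trimming may use a different criterion.

```r
# Hypothetical helper, not part of beadplexr: within each cluster, mark the
# fraction `trim` of points furthest from the cluster mean as NA.
trim_by_distance <- function(x, cluster, trim = 0.1) {
  x <- as.matrix(x)
  out <- as.character(cluster)
  for (cl in unique(cluster)) {
    idx <- which(cluster == cl)
    centre <- colMeans(x[idx, , drop = FALSE])
    # Euclidean distance of each cluster member to the cluster mean
    d <- sqrt(rowSums(sweep(x[idx, , drop = FALSE], 2, centre)^2))
    n_remove <- floor(trim * length(idx))
    if (n_remove > 0) {
      furthest <- idx[order(d, decreasing = TRUE)[seq_len(n_remove)]]
      out[furthest] <- NA
    }
  }
  out
}

set.seed(1)
# Simulated stand-in for two scatter populations (assumed data)
events <- data.frame(
  fsc = c(rnorm(200, 10, 1), rnorm(200, 20, 1)),
  ssc = c(rnorm(200, 5, 1), rnorm(200, 15, 1))
)

km <- kmeans(events, centers = 2)
events$population <- trim_by_distance(events[, c("fsc", "ssc")],
                                      km$cluster, trim = 0.1)

# Trimmed events show up as NA next to the cluster labels
table(events$population, useNA = "ifany")
```

Events marked NA stay in the data.frame, so they can be inspected or dropped later, for example with dplyr::filter(!is.na(population)).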

Additional parameters

Information on the additional arguments can be found here:

clara

cluster::clara()

kmeans

kmeans()

dbscan

fpc::dbscan()

mclust

mclust::Mclust()

density_cut

approx_adjust()

Default parameters to clara()

cluster::clara() is by default called with the following parameters:

samples

100

pamLike

TRUE
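
For orientation, a direct call to cluster::clara() with these two values could look like the sketch below. The simulated matrix and the choice of k = 2 are assumptions for illustration; how bp_clara() assembles its call internally is not shown here.

```r
library(cluster)

set.seed(1)
# Simulated stand-in for bead events in two well-separated populations
x <- rbind(
  matrix(rnorm(400, mean = 0), ncol = 2),
  matrix(rnorm(400, mean = 8), ncol = 2)
)

# samples: number of sub-datasets drawn from x
# pamLike: use the same swap algorithm as pam() on each sub-dataset
fit <- clara(x, k = 2, samples = 100, pamLike = TRUE)

# One cluster label per row of x
table(fit$clustering)
```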

Parameters to dbscan

Finding the right parameters for the density-based clustering requires some trial and error, but the parameters usually stay stable throughout an entire experiment and over time (assuming that there is only little drift in the flow cytometer). There is no guarantee that the correct number of clusters is returned, and it might be better to use this on the forward-side scatter discrimination.

Scaling of the parameters seems to be appropriate in most cases for the forward-side scatter discrimination and is performed automatically.
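
The effect of scaling on a fixed reachability distance can be sketched by calling fpc::dbscan() directly on simulated data. The data, the object names and the raw instrument scale used here are assumptions for illustration only.

```r
set.seed(1)
# Simulated stand-in for forward and side scatter on a raw instrument scale
scatter <- cbind(
  fsc = c(rnorm(500, 1e5, 5e3), rnorm(500, 2e5, 5e3)),
  ssc = c(rnorm(500, 5e4, 2e3), rnorm(500, 1.5e5, 2e3))
)

# On the raw scale an eps of 0.2 is tiny compared to the spread of the data,
# so no event has 50 neighbours and every event is labelled as noise (0)
raw <- fpc::dbscan(scatter, eps = 0.2, MinPts = 50, scale = FALSE)

# With scale = TRUE each column is centred and scaled before clustering,
# which puts eps on a scale comparable to the data
scaled <- fpc::dbscan(scatter, eps = 0.2, MinPts = 50, scale = TRUE)

table(raw$cluster)
table(scaled$cluster)
```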

Parameters to mclust

Mclust is slow and memory hungry on large datasets. Using a subset of the data to initialise the clustering greatly improves the speed. I have found that a subset of as few as 500 events works well and that a subset of 5000 events gives no markedly better clustering, while initialisation with 500 events makes the clustering complete about 12 times faster than with 5000 events.
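
Initialisation on a subset can be sketched with the initialization argument of mclust::Mclust(). The simulated data and the names n_init and init_rows are hypothetical, and capping the sampled fraction at a maximum is an assumption about how .sample_frac and .max_subset might be combined, not a description of the bp_mclust() internals.

```r
library(mclust)

set.seed(1)
# Simulated stand-in for a large event table with two populations
x <- rbind(
  matrix(rnorm(20000, mean = 0), ncol = 2),
  matrix(rnorm(20000, mean = 6), ncol = 2)
)

# Assumed rule: take a fraction of the events, but never more than a maximum
sample_frac <- 0.05
max_subset <- 500
n_init <- min(floor(sample_frac * nrow(x)), max_subset)
init_rows <- sample(nrow(x), size = n_init)

# Only the initialisation uses the subset; the model is fitted to all of x
fit <- Mclust(x, G = 2, initialization = list(subset = init_rows),
              verbose = FALSE)

table(fit$classification)
```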

Parameters to density_cut

This simple function works by smoothing a density function until the desired number of clusters is found. The clusters are then separated at the lowest point between two neighbouring clusters.
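
The idea can be sketched in base R as follows. This is a hypothetical helper for illustration and not the beadplexr implementation: the name density_cut_sketch(), the step size of the smoothing and the peak detection on the density grid are all assumptions.

```r
# Hypothetical helper: increase the smoothing of a kernel density estimate
# until it has at most k peaks, then cut at the lowest point between
# neighbouring peaks.
density_cut_sketch <- function(x, k, adjust_step = 0.1, max_adjust = 10) {
  adjust <- adjust_step
  repeat {
    dens <- density(x, adjust = adjust)
    y <- dens$y
    n <- length(y)
    # Interior grid points that are higher than their left neighbour and
    # at least as high as their right neighbour count as peaks
    peaks <- which(y[2:(n - 1)] > y[1:(n - 2)] & y[2:(n - 1)] >= y[3:n]) + 1
    if (length(peaks) <= k || adjust >= max_adjust) break
    adjust <- adjust + adjust_step
  }
  if (length(peaks) != k) stop("could not find the requested number of peaks")
  # Position of the lowest density between each pair of neighbouring peaks
  cuts <- vapply(seq_len(k - 1), function(i) {
    between <- peaks[i]:peaks[i + 1]
    dens$x[between[which.min(y[between])]]
  }, numeric(1))
  # Label each value by the interval between cut points it falls into
  findInterval(x, cuts) + 1
}

set.seed(1)
# Simulated one-dimensional signal with two populations
x <- c(rnorm(500, mean = 0), rnorm(500, mean = 6))
population <- density_cut_sketch(x, k = 2)

table(population)
```

Because the cut is placed on a single axis, a sketch like this only applies to one parameter at a time.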

See Also

trim_population(), identify_analyte().

Mclust and dbscan seem to do an excellent job at separating on the forward and side scatter parameters. Mclust and clara both perform well at separating beads in the APC channel, but clara is about 3 times faster than Mclust.

Examples

library(beadplexr)
library(dplyr)
library(ggplot2)

data("lplex")

lplex[[1]] %>%
  # Speed things up a bit by selecting one fourth of the events.
  # Probably not something you'd usually do
  dplyr::sample_frac(0.25) %>%
  bp_kmeans(.parameter = c("FSC-A", "SSC-A"),
            .column_name = "population", .trim = 0.1, .k = 2) %>%
  ggplot() +
  aes(x = `FSC-A`, y = `SSC-A`, colour = population) +
  geom_point()

library(beadplexr)
library(dplyr)
library(ggplot2)

data("lplex")

lplex[[1]] %>%
  # Speed things up a bit by selecting one fourth of the events.
  # Probably not something you'd usually do
  dplyr::sample_frac(0.25) %>%
  bp_clara(.parameter = c("FSC-A", "SSC-A"),
           .column_name = "population", .trim = 0.1, .k = 2) %>%
  ggplot() +
  aes(x = `FSC-A`, y = `SSC-A`, colour = population) +
  geom_point()

lplex[[1]] %>%
  # Speed things up a bit by selecting one fourth of the events.
  # Probably not something you'd usually do
  dplyr::sample_frac(0.25) %>%
  bp_clara(.parameter = c("FSC-A", "SSC-A"),
           .column_name = "population", .trim = 0, .k = 2) %>%
  ggplot() +
  aes(x = `FSC-A`, y = `SSC-A`, colour = population) +
  geom_point()

## Not run: 
library(beadplexr)
library(dplyr)
library(ggplot2)

data("lplex")

lplex[[1]] %>%
  # Speed things up a bit by selecting one fourth of the events.
  # Probably not something you'd usually do
  dplyr::sample_frac(0.25) %>%
  bp_dbscan(.parameter = c("FSC-A", "SSC-A"), .column_name = "population",
            .eps = 0.2, .MinPts = 50) %>%
  ggplot() +
  aes(x = `FSC-A`, y = `SSC-A`, colour = population) +
  geom_point()

pop1 <- lplex[[1]] %>%
  # Speed things up a bit by selecting one fourth of the events.
  # Probably not something you'd usually do
  dplyr::sample_frac(0.25) %>%
  bp_dbscan(.parameter = c("FSC-A", "SSC-A"), .column_name = "population",
    .eps = 0.2, .MinPts = 50) %>%
  dplyr::filter(population == "1")

pop1 %>%
  bp_dbscan(.parameter = c("FL6-H", "FL2-H"), .column_name = "population",
    .eps = 0.2, .MinPts = 50) %>%
  .$population %>% unique

pop1 %>%
  bp_dbscan(.parameter = c("FL6-H", "FL2-H"), .column_name = "population",
    .eps = 0.2, .MinPts = 50, scale = FALSE) %>%
  .$population %>% unique

## End(Not run)

library(beadplexr)
library(magrittr)
library(ggplot2)

data("lplex")

lplex[[1]] %>%
  bp_mclust(.parameter = c("FSC-A", "SSC-A"),
           .column_name = "population", .trim = 0, .k = 2) %>%
  ggplot() +
  aes(x = `FSC-A`, y = `SSC-A`, colour = population) +
  geom_point()

library(beadplexr)
library(magrittr)
library(ggplot2)

data("lplex")

lplex[[1]] %>%
  bp_density_cut(.parameter = c("FSC-A"),
           .column_name = "population", .trim = 0, .k = 2) %>%
  ggplot() +
  aes(x = `FSC-A`, y = `SSC-A`, colour = population) +
  geom_point()

beadplexr documentation built on April 4, 2020, 5:07 p.m.