optimalFlowTemplates: optimalFlowTemplates


View source: R/optimalFlowTemplates.R

Description

Partitions the input clusterings into groups and returns a consensus clustering (template) for each group.

Usage

optimalFlowTemplates(
  database,
  database.names = NULL,
  cov.estimation = "standard",
  alpha.cov = 0.85,
  equal.weights.template = TRUE,
  hclust.method = "complete",
  trimm.template = FALSE,
  templates.number = NA,
  minPts = 2,
  eps = 1,
  consensus.method = "pooling",
  barycenters.number = NA,
  bar.repetitions = 40,
  alpha.bar = 0.05,
  bar.ini.method = "plus-plus",
  consensus.minPts = 3,
  cl.paral = 1
)

Arguments

database

A list where each entry is a partition (clustering) represented as a data frame. All data frames must have the same variables, and the last variable holds the labels of the partition.

database.names

Names of the elements in the database.

cov.estimation

How to estimate the covariance matrix of each cluster in a partition. 'standard' uses cov(), while 'robust' uses robustbase::covMcd.

alpha.cov

Only when cov.estimation = 'robust'. Indicates the value of alpha in robustbase::covMcd.

equal.weights.template

If TRUE, the weights assigned to the clusters in a partition are uniform (1/number of clusters). If FALSE, the weight of each cluster is the proportion of the partition's points that fall in that cluster.

hclust.method

Indicates which clustering method to apply to the matrix of similarity distances between the partitions. Takes values in c('complete', 'single', 'average', 'hdbscan', 'dbscan').

trimm.template

Logical value. Indicates whether some of the entries of database may be left out (trimmed). Default is FALSE.

templates.number

Only if hclust.method is in c('complete', 'single', 'average'). Indicates the number of clusters to use with cutree. If set to NA (default), the function plots the hierarchical tree and asks the user to enter an appropriate number of clusters.

minPts

Only if hclust.method is in c('hdbscan', 'dbscan'). Indicates the value of the argument minPts in dbscan::dbscan and dbscan::hdbscan.

eps

Only if hclust.method = 'dbscan'. Indicates the value of eps in dbscan::dbscan.

consensus.method

Sets the consensus clustering method used when clusters are viewed as multivariate distributions. Takes values in c('pooling', 'k-barycenter', 'hierarchical'). See details.

barycenters.number

Only if consensus.method = 'k-barycenter'. Sets the number, k, of barycenters when using k-barycenters.

bar.repetitions

Only if consensus.method = 'k-barycenter'. Number of times the k-barycenters procedure is repeated. Equivalent to nstart in kmeans.

alpha.bar

Only if consensus.method = 'k-barycenter'. The level of trimming allowed during the k-barycenters procedure.

bar.ini.method

Only if consensus.method = 'k-barycenter'. Takes values in c('rnd', 'plus-plus'). See details.

consensus.minPts

Only if consensus.method = 'hierarchical'. The value of argument minPts for dbscan::hdbscan.

cl.paral

Number of cores to be used in parallel procedures.
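
As an illustration of two of the arguments above, the following base R sketch shows how the two weighting schemes described for equal.weights.template differ on a toy partition, and how a number of groups (as in templates.number) is applied with cutree after hierarchical clustering. The toy data, the object names and the 4 x 4 distance matrix are made up for this example; the package computes its own similarity distances between partitions, and its internal code may differ from this sketch.

```r
# Toy partition: two measurement columns plus a label column (last variable).
partition <- data.frame(
  x = c(0.1, 0.2, 0.3, 5.0, 5.1, 9.8),
  y = c(1.0, 1.1, 0.9, 4.0, 4.2, 7.5),
  label = c("A", "A", "A", "B", "B", "C")
)
labels <- partition[[ncol(partition)]]
counts <- table(labels)

# equal.weights.template = TRUE: every cluster gets 1 / (number of clusters).
uniform_weights <- rep(1 / length(counts), length(counts))
names(uniform_weights) <- names(counts)

# equal.weights.template = FALSE: share of the partition's points per cluster.
proportional_weights <- as.numeric(counts) / sum(counts)
names(proportional_weights) <- names(counts)

# Hypothetical distances between four partitions (assumption: in the package
# these come from its own similarity distance, not from hand-typed values).
partition_names <- paste0("P", 1:4)
toy_dist <- as.dist(matrix(
  c(0, 1, 8, 9,
    1, 0, 7, 8,
    8, 7, 0, 2,
    9, 8, 2, 0),
  nrow = 4, dimnames = list(partition_names, partition_names)
))

# hclust.method = "complete" with templates.number = 2.
tree <- hclust(toy_dist, method = "complete")
groups <- cutree(tree, k = 2)

print(uniform_weights)
print(proportional_weights)
print(groups)
```

In this toy case the partitions P1 and P2 end up in one group and P3 and P4 in the other, so two templates would be built.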

Value

A list containing:

templates

A list representing the consensus clusterings for every group in the partition of the database. Each element of the list is a template partition, which is itself a list containing the cell types in the prototype. Each cell type has the components mean, cov, weight and type.

clustering

Clustering of the input partitions.

database.elliptical

A list containing each cytometry in the database viewed as a mixture distribution. Each element is itself a list containing the cell types in the cytometry, where each cell type has the components mean, cov, weight and type.
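
The nested structure described above can be navigated with ordinary list indexing. The sketch below builds a small hand-made stand-in for the returned list (it is not real output of optimalFlowTemplates) and extracts the cell types and weights of the first template. The values, the character type of the type component and the form of clustering are assumptions made for illustration only.

```r
# Hand-built stand-in for one cell type inside a template (not real output).
cell_type <- list(
  mean = c(1.0, 2.0),   # mean vector of the cell type
  cov = diag(2),        # covariance matrix
  weight = 0.5,         # weight of the cell type in the mixture
  type = "Monocytes"    # assumed here to be a character label
)

# Mock result with the three documented components.
mock_result <- list(
  templates = list(
    list(cell_type, modifyList(cell_type, list(type = "CD4+CD8-")))
  ),
  clustering = c(1, 1),  # assumed form: one group label per input partition
  database.elliptical = list(list(cell_type), list(cell_type))
)

# Cell types and weights in the first template.
first_template <- mock_result$templates[[1]]
first_template_types <- vapply(first_template, function(ct) ct$type, character(1))
first_template_weights <- vapply(first_template, function(ct) ct$weight, numeric(1))

print(first_template_types)
print(first_template_weights)
```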

References

E del Barrio, H Inouzhe, JM Loubes, C Matran and A Mayo-Iscar. (2019) optimalFlow: Optimal-transport approach to flow cytometry gating and population matching. arXiv:1907.08006

Examples

# We construct a small database, selecting only some of the cytometries and
# some cell types, for simplicity and better visualisation.
database <- buildDatabase(
  dataset_names = paste0('Cytometry', c(2:5, 7:9, 12:17, 19, 21)),
  population_ids = c('Monocytes', 'CD4+CD8-', 'Mature SIg Kappa', 'TCRgd-'))

# To choose the number of templates interactively from the hierarchical tree
# and produce a clustering, we can instead use:
# templates.optimalFlow <- optimalFlowTemplates(database = database)

# Here the number of templates is fixed in advance.
templates.optimalFlow <- optimalFlowTemplates(database = database, templates.number = 5,
                                              cl.paral = 1)

optimalFlow documentation built on Nov. 8, 2020, 6:59 p.m.