optimalFlowTemplates: optimalFlowTemplates
In HristoInouzhe/optimalFlow: optimalFlow

Description

Partitions the input clusterings into groups and returns a consensus clustering (template) for every group.

Usage

optimalFlowTemplates(
  database,
  database.names = NULL,
  cov.estimation = "standard",
  alpha.cov = 0.85,
  equal.weights.template = TRUE,
  hclust.method = "complete",
  trimm.template = FALSE,
  templates.number = NA,
  minPts = 2,
  eps = 1,
  consensus.method = "pooling",
  barycenters.number = NA,
  bar.repetitions = 40,
  alpha.bar = 0.05,
  bar.ini.method = "plus-plus",
  consensus.minPts = 3,
  cl.paral = 1
)

Arguments

`database`	A list where each entry is a partition (clustering) represented as a data frame. All entries have the same dimensions, and the last variable holds the labels of the partition.
`database.names`	Names of the elements in the database.
`cov.estimation`	How to estimate covariance matrices in each cluster of a partition. 'standard' is for using cov(), while 'robust' is for using robustbase::covMcd.
`alpha.cov`	Only when cov.estimation = 'robust'. Indicates the value of alpha in robustbase::covMcd.
`equal.weights.template`	If TRUE, weights assigned to every cluster in a partition are uniform (1/number of clusters). If FALSE, the weight of each cluster is its proportion of points relative to the total number of points in the partition.
`hclust.method`	Indicates what kind of hierarchical clustering to do with the similarity distances matrix of the partitions. Takes values in c('complete', 'single', 'average', 'hdbscan', 'dbscan').
`trimm.template`	Logical value. Indicates whether some of the entries of database may be left out. Default is FALSE.
`templates.number`	Only if hclust.method in c('complete', 'single', 'average'). Indicates the number of clusters to use with cutree. If set to NA (default), plots the hierarchical tree and asks the user to introduce an appropriate number of clusters.
`minPts`	Only if hclust.method in c('hdbscan', 'dbscan'). Indicates the value of argument minPts in dbscan::dbscan and dbscan::hdbscan.
`eps`	Only if hclust.method = 'dbscan'. Indicates the value of eps in dbscan::dbscan.
`consensus.method`	Sets the way of doing consensus clustering when clusters are viewed as Multivariate Distributions. Can take values in c('pooling', 'k-barycenter', 'hierarchical'). See details.
`barycenters.number`	Only if consensus.method = 'k-barycenter'. Sets the number, k, of barycenters when using k-barycenters.
`bar.repetitions`	Only if consensus.method = 'k-barycenter'. How many times to repeat the k-barycenters procedure. Equivalent to nstart in kmeans.
`alpha.bar`	Only if consensus.method = 'k-barycenter'. The level of trimming allowed during the k-barycenters procedure.
`bar.ini.method`	Only if consensus.method = 'k-barycenter'. Takes values in c('rnd', 'plus-plus'). See details.
`consensus.minPts`	Only if consensus.method = 'hierarchical'. The value of argument minPts for dbscan::hdbscan.
`cl.paral`	Number of cores to be used in parallel procedures.
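
As an illustration of the input format and of the two weighting schemes selected by equal.weights.template, here is a minimal base-R sketch. The toy data and the helper names make_partition and cluster_weights are hypothetical and not part of optimalFlow; the package's internal computation may differ in detail.

```r
# Hypothetical toy data: two partitions with the same variables and the
# labels stored in the last column, the shape `database` is described to have.
set.seed(1)
make_partition <- function(n_a, n_b) {
  data.frame(
    x = c(rnorm(n_a, mean = 0), rnorm(n_b, mean = 5)),
    y = c(rnorm(n_a, mean = 0), rnorm(n_b, mean = 5)),
    label = c(rep("A", n_a), rep("B", n_b))
  )
}
toy_database <- list(make_partition(30, 70), make_partition(50, 50))

# Hypothetical helper: cluster weights of one partition under both settings.
cluster_weights <- function(partition, equal.weights = TRUE) {
  labels <- partition[[ncol(partition)]]  # last variable holds the labels
  counts <- table(labels)
  if (equal.weights) {
    w <- rep(1 / length(counts), length(counts))  # uniform: 1 / number of clusters
  } else {
    w <- as.numeric(counts) / sum(counts)         # proportion of points per cluster
  }
  setNames(w, names(counts))
}

cluster_weights(toy_database[[1]], equal.weights = TRUE)
cluster_weights(toy_database[[1]], equal.weights = FALSE)
```

With 30 points labelled A and 70 labelled B, the uniform scheme gives each cluster weight 0.5, while the proportional scheme gives 0.3 and 0.7.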

Value

A list containing:

templates: A list representing the consensus clusterings for every group in the partition of the database. Each element of the list is a template partition. Hence it is a list itself, containing the cell types in the prototype, where each element has components: mean, cov, weight and type.
clustering: Clustering of the input partitions.
database.elliptical: A list containing each cytometry in the database viewed as a mixture distribution. Each element of the list is a cytometry viewed as a mixture. Hence it is a list itself, containing the cell types in the cytometry, where each element has components: mean, cov, weight and type.
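
The nested structure described above can be walked with ordinary list operations. The sketch below uses a hand-built stand-in (mock_result, with made-up means, covariances and weights) that only mimics the documented shape; it is not output produced by optimalFlowTemplates.

```r
# Hand-built stand-in mimicking the documented shape: one template
# containing two cell types, each with mean, cov, weight and type.
mock_template <- list(
  list(mean = c(0, 0), cov = diag(2), weight = 0.5, type = "Monocytes"),
  list(mean = c(5, 5), cov = diag(2), weight = 0.5, type = "CD4+CD8-")
)
mock_result <- list(
  templates = list(mock_template),
  clustering = c(1, 1, 1)  # assumed here: group index of each input partition
)

# Summarise the cell types and weights of the first template.
summary_table <- do.call(
  rbind,
  lapply(mock_result$templates[[1]], function(cell_type) {
    data.frame(type = cell_type$type, weight = cell_type$weight)
  })
)
print(summary_table)
```

The same loop applies to an element of database.elliptical, since it is documented to have the same per-cell-type components.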

References

E del Barrio, H Inouzhe, JM Loubes, C Matran and A Mayo-Iscar. (2019) optimalFlow: Optimal-transport approach to flow cytometry gating and population matching. arXiv:1907.08006

Examples

library(optimalFlow)

# We construct a simple database, selecting only some of the cytometries and
# some cell types, for simplicity and better visualisation.
database <- buildDatabase(
  dataset_names = paste0('Cytometry', c(2:5, 7:9, 12:17, 19, 21)),
  population_ids = c('Monocytes', 'CD4+CD8-', 'Mature SIg Kappa', 'TCRgd-'))

# To choose the number of templates interactively from the hierarchical tree,
# leave templates.number unset:
# templates.optimalFlow <- optimalFlowTemplates(database = database)

# Non-interactive call with a fixed number of templates, run on a single core.
templates.optimalFlow <- optimalFlowTemplates(database = database, templates.number = 5,
                                              cl.paral = 1)
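
The templates.number argument is documented as the number of clusters passed to cutree. A minimal base-R sketch of that cutting step follows, using a made-up distance matrix (toy_dist) in place of the similarity distances that the function computes between partitions; how those distances are obtained is not reproduced here.

```r
# Made-up symmetric distances between four hypothetical cytometries.
ids <- paste0("Cytometry", 1:4)
toy_dist <- as.dist(matrix(
  c( 0, 1, 9, 10,
     1, 0, 8,  9,
     9, 8, 0,  1,
    10, 9, 1,  0),
  nrow = 4, dimnames = list(ids, ids)
))

# Build the tree with one of the documented linkage options, then cut it
# into a fixed number of groups, the role played by templates.number.
tree <- hclust(toy_dist, method = "complete")
groups <- cutree(tree, k = 2)
print(groups)
```

Here the first two and the last two cytometries end up in separate groups, because the within-pair distances are much smaller than the between-pair distances.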

HristoInouzhe/optimalFlow documentation built on April 23, 2023, 5:45 p.m.