optimalFlowClassification: optimalFlowClassification
In HristoInouzhe/optimalFlow: optimalFlow

R Documentation

optimalFlowClassification

Description

Performs a supervised classification of input data when a database and a partition of the database are provided.

Usage

optimalFlowClassification(
  X,
  database,
  templates,
  consensus.method = "pooling",
  cov.estimation = "standard",
  alpha.cov = 0.85,
  initial.method = "supervized",
  max.clusters = NA,
  alpha.tclust = 0,
  restr.factor.tclust = 1000,
  classif.method = "qda",
  qda.bar = TRUE,
  cost.function = "points",
  cl.paral = 1,
  equal.weights.voting = TRUE,
  equal.weights.template = TRUE
)

Arguments

`X`	Data sample to be classified.
`database`	A list where each entry is a partition (clustering) represented as a data frame. All entries have the same dimensions, and the last variable contains the labels of the partition.
`templates`	List of the consensus clusterings for every group in the partition of the database, as obtained by optimalFlowTemplates.
`consensus.method`	The consensus.method value that was used in optimalFlowTemplates.
`cov.estimation`	How to estimate covariance matrices in each cluster of a partition. "standard" is for using cov(), while "robust" is for using robustbase::covMcd.
`alpha.cov`	Only when cov.estimation = "robust". Indicates the value of alpha in robustbase::covMcd.
`initial.method`	Indicates how to obtain a partition of X. Takes values in c("supervized", "unsupervized"). "supervized" uses tclust initialized by the templates; "unsupervized" uses flowMeans.
`max.clusters`	The maximum number of clusters for flowMeans. Only when initial.method = "unsupervized".
`alpha.tclust`	Level of trimming allowed for tclust. Only when initial.method = "supervized".
`restr.factor.tclust`	Fixes the restr.fact parameter in tclust. Only when initial.method = "supervized".
`classif.method`	Indicates the type of supervised learning to perform. Takes values in c("matching", "qda", "random forest").
`qda.bar`	Only if classif.method = "qda". If TRUE, the appropriate consensus clustering (template, prototype) is used for learning. If FALSE, the closest partition in the appropriate group is used.
`cost.function`	Only if classif.method = "matching". Indicates the cost function, distance between clusters, to be used for label matching.
`cl.paral`	Number of cores to be used in parallel procedures.
`equal.weights.voting`	Only when classif.method = "qda" and qda.bar = FALSE, or when classif.method = "random forest". Indicates the weights structure when looking for the most similar partition in a group.
`equal.weights.template`	If TRUE, the weights assigned to the clusters in a partition are uniform (1/number of clusters). If FALSE, the weight of each cluster is the proportion of the partition's points that belong to it.
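
The two weighting schemes described for equal.weights.template can be illustrated with a minimal base R sketch. This is not the package's internal code: the label vector and the variable names (labels, uniform.weights, proportional.weights) are made up for illustration only.

```r
# Hypothetical cluster labels for one partition (made-up data, not from the package).
labels <- c("A", "A", "A", "B", "B", "C")
clusters <- sort(unique(labels))

# Analogue of equal.weights.template = TRUE:
# every cluster gets the same weight, 1 / number of clusters.
uniform.weights <- setNames(rep(1 / length(clusters), length(clusters)), clusters)

# Analogue of equal.weights.template = FALSE:
# each cluster is weighted by its share of the points in the partition.
counts <- table(labels)
proportional.weights <- setNames(as.numeric(counts) / length(labels), names(counts))

print(uniform.weights)
print(proportional.weights)
```

Both weight vectors sum to 1; they differ only in whether small clusters count as much as large ones.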

Value

A list formed by:

cluster: Labels assigned to the input data.
clusterings: A list that contains the initial unsupervized or semi-supervized clusterings of the cytometry of interest. It can have as many entries as the number of templates in the semi-supervized case (initial.method = "supervized"), or only one entry in the case of initial.method = "unsupervized". Each entry is a list where the most relevant element for the clusterings is cluster.
assigned.template.index: Label of the group for which the template is closer to the data. When classical qda or random forest are used for classification, there is a second element indicating the index of the cytometry in the cluster used for learning.
cluster.vote: Only when classif.method = "matching" or when consensus.method in c("hierarchical", "k-barycenter"). Vote on the type of every label in the partition of the data. In essence, cluster + cluster.vote return a fuzzy clustering of the data of interest.
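
As a rough sketch of how the returned list might be inspected, the snippet below tabulates the assigned labels. The object res is a hand-built stand-in with made-up values, not real output of optimalFlowClassification; only the element names cluster and assigned.template.index are taken from the description above, and the actual type of each element may differ.

```r
# Hypothetical stand-in for the returned list; all values are invented for illustration.
res <- list(
  cluster = c("Monocytes", "CD4+CD8-", "Monocytes", "TCRgd-", "Monocytes"),
  assigned.template.index = 2
)

# Number of points that received each label.
label.counts <- table(res$cluster)

# Share of points per label.
label.props <- prop.table(label.counts)

print(label.counts)
print(label.props)
```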

References

E del Barrio, H Inouzhe, JM Loubes, C Matran and A Mayo-Iscar. (2019) optimalFlow: Optimal-transport approach to flow cytometry gating and population matching. arXiv:1907.08006

Examples

# We construct a simple database, selecting only some of the cytometries and some cell types, for simplicity and better visualisation.
database <- buildDatabase(
  dataset_names = paste0('Cytometry', c(2:5, 7:9, 12:17, 19, 21)),
    population_ids = c('Monocytes', 'CD4+CD8-', 'Mature SIg Kappa', 'TCRgd-'))
# To select the appropriate number of templates interactively, via a hierarchical tree, and produce a clustering, we can also use:
# templates.optimalFlow <- optimalFlowTemplates(database = database)
templates.optimalFlow <- optimalFlowTemplates(database = database, templates.number = 5,
                                             cl.paral = 1)
# Rows of Cytometry1 that belong to the selected cell types.
selected.rows <- which(match(Cytometry1$`Population ID (name)`,
                             c("Monocytes", "CD4+CD8-", "Mature SIg Kappa", "TCRgd-"),
                             nomatch = 0) > 0)
classification.optimalFlow <- optimalFlowClassification(Cytometry1[selected.rows, 1:10],
                                                        database, templates.optimalFlow,
                                                        cl.paral = 1)
# noise.types is not defined in this example and must exist before this call.
scoreF1.optimalFlow <- optimalFlow::f1Score(classification.optimalFlow$cluster,
                                           Cytometry1[selected.rows, ], noise.types)

HristoInouzhe/optimalFlow documentation built on April 23, 2023, 5:45 p.m.