Performs a supervised classification of input data when a database and a partition of the database are provided.
optimalFlowClassification(
X,
database,
templates,
consensus.method = "pooling",
cov.estimation = "standard",
alpha.cov = 0.85,
initial.method = "supervized",
max.clusters = NA,
alpha.tclust = 0,
restr.factor.tclust = 1000,
classif.method = "qda",
qda.bar = TRUE,
cost.function = "points",
cl.paral = 1,
equal.weights.voting = TRUE,
equal.weights.template = TRUE
)
X |
Data sample to be classified. |
database |
A list where each entry is a partition (clustering) represented as a data frame, all of the same dimensions, where the last variable holds the labels of the partition. |
templates |
List of the consensus clusterings for every group in the partition of the database, as obtained by optimalFlowTemplates. |
consensus.method |
The consensus.method value that was used in optimalFlowTemplates. |
cov.estimation |
How to estimate covariance matrices in each cluster of a partition. "standard" is for using cov(), while "robust" is for using robustbase::covMcd. |
alpha.cov |
Only when cov.estimation = "robust". Indicates the value of alpha in robustbase::covMcd. |
initial.method |
Indicates how to obtain a partition of X. Takes values in c("supervized", "unsupervized"). "supervized" uses tclust initialized by the templates; "unsupervized" uses flowMeans. |
max.clusters |
The maximum number of clusters for flowMeans. Only when initial.method = "unsupervized". |
alpha.tclust |
Level of trimming allowed for tclust. Only when initial.method = "supervized". |
restr.factor.tclust |
Fixes the restr.fact parameter in tclust. Only when initial.method = "supervized". |
classif.method |
Indicates which type of supervised learning to use. Takes values in c("matching", "qda", "random forest"). |
qda.bar |
Only if classif.method = "qda". If TRUE, the appropriate consensus clustering (template, prototype) is used for learning. If FALSE, the closest partition in the appropriate group is used. |
cost.function |
Only if classif.method = "matching". Indicates the cost function, distance between clusters, to be used for label matching. |
cl.paral |
Number of cores to be used in parallel procedures. |
equal.weights.voting |
Only when classif.method = "qda" and qda.bar = FALSE, or when classif.method = "random forest". Indicates the weights structure when looking for the most similar partition in a group. |
equal.weights.template |
If TRUE, the weights assigned to every cluster in a partition are uniform (1/number of clusters). If FALSE, the weight of each cluster is the proportion of points in that cluster relative to the total number of points in the partition. |
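To make the cov.estimation and equal.weights.template options concrete, the following is a minimal base-R sketch of what their descriptions say. It is an illustration under stated assumptions, not the package's internal code: the toy `partition` data frame, its column names, and the variable names are all hypothetical, and the robust branch runs only if robustbase is installed.

```r
# Toy partition in the layout described for `database` entries:
# measurement columns first, cluster labels in the last column.
# (Hypothetical data, for illustration only.)
set.seed(1)
partition <- data.frame(
  marker1 = c(rnorm(50, 0), rnorm(30, 5), rnorm(20, 10)),
  marker2 = c(rnorm(50, 0), rnorm(30, 5), rnorm(20, 10)),
  label   = rep(c("A", "B", "C"), times = c(50, 30, 20)),
  stringsAsFactors = FALSE
)

labels   <- partition[[ncol(partition)]]
clusters <- sort(unique(labels))
measures <- partition[, -ncol(partition)]

# equal.weights.template = TRUE: every cluster gets 1 / (number of clusters).
w_uniform <- setNames(rep(1 / length(clusters), length(clusters)), clusters)

# equal.weights.template = FALSE: each cluster gets its share of the points.
w_proportional <- table(labels) / length(labels)

# cov.estimation = "standard": one cov() matrix per cluster.
cov_standard <- lapply(split(measures, labels), cov)

# cov.estimation = "robust": robustbase::covMcd with alpha set to alpha.cov.
# Computed only when robustbase is available.
if (requireNamespace("robustbase", quietly = TRUE)) {
  cov_robust <- lapply(
    split(measures, labels),
    function(x) robustbase::covMcd(x, alpha = 0.85)$cov
  )
}
```

With uneven cluster sizes, as here, the two weighting schemes differ: the uniform weights are all 1/3, while the proportional weights follow the cluster sizes.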
A list formed by:
Labels assigned to the input data.
A list that contains the initial unsupervised or semi-supervised clusterings of the cytometry of interest. It can have as many entries as there are templates in the semi-supervised case (initial.method = "supervized"), or only one entry when initial.method = "unsupervized". Each entry is a list whose most relevant element for the clustering is cluster.
Label of the group whose template is closest to the data. When classical qda or random forest is used for classification, there is a second element indicating the index of the cytometry in the cluster used for learning.
Only when classif.method = "matching" or when consensus.method in c("hierarchical", "k-barycenter"). Vote on the type of every label in the partition of the data. In essence, cluster + cluster.vote return a fuzzy clustering of the data of interest.
E del Barrio, H Inouzhe, JM Loubes, C Matran and A Mayo-Iscar. (2019) optimalFlow: Optimal-transport approach to flow cytometry gating and population matching. arXiv:1907.08006
# # We construct a simple database, selecting only some of the cytometries and some cell types, for simplicity and better visualisation.
database <- buildDatabase(
dataset_names = paste0('Cytometry', c(2:5, 7:9, 12:17, 19, 21)),
population_ids = c('Monocytes', 'CD4+CD8-', 'Mature SIg Kappa', 'TCRgd-'))
# # To select the appropriate number of templates, via hierarchical tree, in an interactive fashion and produce a clustering we can also use:
# templates.optimalFlow <- optimalFlowTemplates(database = database)
templates.optimalFlow <- optimalFlowTemplates(database = database, templates.number = 5,
  cl.paral = 1)
classification.optimalFlow <- optimalFlowClassification(
  Cytometry1[which(match(Cytometry1$`Population ID (name)`,
    c("Monocytes", "CD4+CD8-", "Mature SIg Kappa", "TCRgd-"), nomatch = 0) > 0), 1:10],
  database, templates.optimalFlow, cl.paral = 1)
scoreF1.optimalFlow <- optimalFlow::f1Score(classification.optimalFlow$cluster,
  Cytometry1[which(match(Cytometry1$`Population ID (name)`,
    c("Monocytes", "CD4+CD8-", "Mature SIg Kappa", "TCRgd-"), nomatch = 0) > 0), ],
  noise.types)
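The row selection used twice in the example, `which(match(..., nomatch = 0) > 0)`, keeps only the cells whose population name is in the chosen set. The sketch below shows the same selection on a small stand-in data frame and compares it with the shorter `%in%` form. The `cytometry` data frame and the extra population names "Basophils" and "Debris" are hypothetical and used only for illustration.

```r
# Hypothetical stand-in for a cytometry data frame; only the label column matters.
cytometry <- data.frame(
  marker = c(0.1, 0.4, 0.2, 0.9, 0.7),
  `Population ID (name)` = c("Monocytes", "Basophils", "TCRgd-", "CD4+CD8-", "Debris"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)
keep_types <- c("Monocytes", "CD4+CD8-", "Mature SIg Kappa", "TCRgd-")

# Selection as written in the example above.
rows_match <- which(match(cytometry$`Population ID (name)`, keep_types, nomatch = 0) > 0)

# Same selection expressed with %in%.
rows_in <- which(cytometry$`Population ID (name)` %in% keep_types)

# Both forms pick the same rows.
same_rows <- identical(rows_match, rows_in)

# Keep only the cells of the selected populations.
subset_cells <- cytometry[rows_in, ]
```

Storing the selected row indices once and reusing them for both the classification input and the F1 score avoids repeating the long expression.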