| chips | R Documentation |
This function provides a partition of a subset of items that has high marginal probability, based on samples from a partition distribution, using the conditional high inclusion probability subset (CHIPS) partition greedy search method (Barrientos, Page, Dahl, Dunson, 2024).
chips(
partitions,
threshold = 0,
nRuns = 64,
intermediateResults = identical(threshold, 0),
allCandidates = FALSE,
andSALSO = !intermediateResults && !allCandidates,
loss = VI(a = 1),
maxNClusters = 0,
initialPartition = integer(0),
nCores = 0
)
partitions |
A |
threshold |
The minimum marginal probability for the subpartition, i.e., the gamma parameter. Values closer to 1.0 will yield a partition of fewer items and values closer to 0.0 will yield a partition of more items. |
nRuns |
The number of runs to try. When multiple runs produce candidate subpartitions, the one allocating the most items is selected; among ties, the candidate with the highest posterior probability is chosen. |
intermediateResults |
Should intermediate subset partitions be returned? |
allCandidates |
Should all the final subset partitions from multiple runs be returned? |
andSALSO |
Should the resulting incomplete partition be completed using SALSO? |
loss |
When |
maxNClusters |
The maximum number of clusters that can be considered by
SALSO, which has important implications for the interpretability of the
resulting clustering and can greatly influence the RAM needed for the
optimization algorithm. If the supplied value is zero, the optimization is
constrained by the maximum number of clusters among the clusterings in
|
initialPartition |
A vector of length |
nCores |
The number of CPU cores to use, i.e., the number of runs executed simultaneously. A value of zero uses all cores on the system. |
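The selection rule described for nRuns (most items allocated, with ties broken by highest probability) can be sketched in base R. This is only an illustration under stated assumptions: the `candidates` data frame and its values are hypothetical, and the code does not reproduce the package's internal implementation.

```r
# Hypothetical candidates, one per run (values are illustrative only).
candidates <- data.frame(
  run = 1:4,
  n_items = c(120L, 135L, 135L, 98L),
  probability = c(0.91, 0.82, 0.86, 0.95)
)

# Rank by most items first, then by highest probability to break ties.
ranked <- candidates[order(-candidates$n_items, -candidates$probability), ]
best <- ranked[1, ]
print(best)
```

Runs 2 and 3 tie on the number of items, so the higher probability of run 3 decides the choice.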
A complete, end-to-end demonstration is provided in the package demo. To run it and to access the accompanying synthetic dataset:
Run the demo: demo("chips", package = "salso")
Load the synthetic data used in the demo: data("synthetic", package = "salso")
A list containing:
chips_partition: If intermediateResults is FALSE, an integer vector giving the
estimated subset partition, encoded using cluster labels, with -1
marking items that are not allocated. If TRUE, an integer matrix with one
intermediate subset partition per row.
n_items: Number of items in the estimated subset partition.
probability: Monte Carlo estimate of the probability of the subset partition.
AUChips: If intermediateResults is TRUE, this element is provided and gives
the area under the probability curve as a function of the number of clusters
after scaling to be between 0 and 1.
chips_and_salso_partition: If andSALSO is TRUE, this element is provided and
gives an integer vector with the
estimated partition of all items, based on CHIPS until the threshold is met
and using SALSO to allocate the rest.
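To make the encoding of chips_partition and the meaning of n_items and probability concrete, the following base R sketch works through a small example. The data are hypothetical, and the estimator shown (the share of sampled partitions that agree with the subset partition on the allocated items, up to relabeling) is an assumption about what a Monte Carlo estimate of this probability looks like, not the package's actual code.

```r
# Hypothetical subset partition over 6 items; -1 marks items not allocated.
subset_partition <- c(1L, 1L, 2L, -1L, 2L, -1L)

# Hypothetical draws from a partition distribution (one clustering per row).
draws <- rbind(
  c(1L, 1L, 2L, 2L, 2L, 3L),
  c(2L, 2L, 1L, 1L, 1L, 1L),
  c(1L, 2L, 3L, 3L, 3L, 1L),
  c(1L, 1L, 2L, 3L, 2L, 3L)
)

allocated <- subset_partition != -1L
n_items <- sum(allocated)

# Relabel clusters by order of first appearance so that clusterings that
# differ only by a permutation of labels compare as equal.
canonical <- function(x) match(x, unique(x))

target <- canonical(subset_partition[allocated])
matches <- apply(
  draws[, allocated, drop = FALSE], 1,
  function(row) identical(canonical(row), target)
)

# Share of draws that reproduce the subset partition on the allocated items.
probability <- mean(matches)
print(c(n_items = n_items, probability = probability))
```

Restricting the comparison to the allocated items is what allows a subset partition to have a much higher probability than any complete partition of all items.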
data(iris.clusterings)
draws <- iris.clusterings
# For examples, use 'nRuns = 1' and 'nCores = 1' for CRAN, but in practice omit this.
all <- chips(draws, nRuns = 1, nCores = 1)
plot(all$n_items, all$probability, type = "l")
subpartition <- threshold(all, 0.80, nCores = 1)
str(subpartition)
# See the full CHIPS demo run: demo("chips", package = "salso")
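As a complement to the example above, the sketch below passes a nonzero threshold directly to chips, so that, given the defaults in the usage section, intermediateResults is FALSE and andSALSO is TRUE. It assumes the salso package is installed and uses only the arguments and return elements documented here; the object name `fit` is arbitrary.

```r
if (requireNamespace("salso", quietly = TRUE)) {
  library(salso)
  data(iris.clusterings)
  draws <- iris.clusterings

  # With threshold = 0.80, the defaults give intermediateResults = FALSE
  # and andSALSO = TRUE, so a single subset partition is returned together
  # with a completed partition of all items.
  fit <- chips(draws, threshold = 0.80, nRuns = 1, nCores = 1)

  # Items left unallocated by the subset partition are labeled -1.
  print(which(fit$chips_partition == -1))

  # Cluster sizes in the partition completed by SALSO.
  print(table(fit$chips_and_salso_partition))
}
```

As in the example above, nRuns = 1 and nCores = 1 only keep the run short; in practice they can be omitted.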