View source: R/predict_mbcfit.R
| predict.mbcfit | R Documentation |
This function predicts cluster assignments for new data from a fitted model
of class mbcfit. It uses the estimated mixture parameters stored in the
model to assign each new observation to one of the model's clusters.
## S3 method for class 'mbcfit'
predict(object, newdata, ...)
| object | An object of class mbcfit. |
| newdata | A numeric vector, matrix, or data frame of observations. Rows correspond to observations and columns correspond to variables/features. Categorical variables and |
| ... | Further arguments passed to or from other methods. |
The predict.mbcfit function uses the parameters of a previously fitted
mbcfit model to allocate new data points to the estimated clusters.
Before predicting, the function checks that the mbcfit object contains
valid estimates and that the dimensionality of newdata matches that of
the model.
The mbcfit object must contain a component named params, which
is itself a list containing the following necessary elements, for a mixture
model with K components:
  proportion: A numeric vector of length K, with elements summing to 1, representing the cluster proportions.
  mean: A numeric matrix of dimensions c(P, K), representing the cluster centers.
  cov: A numeric array of dimensions c(P, P, K), representing the cluster covariance matrices.
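To make the required layout concrete, the following base-R sketch builds a params list by hand for K = 2 components in P = 2 dimensions and checks it against the dimensions listed above. The numbers are made up for illustration; in practice params comes from a fitted model, for example the output of gmix.

```r
# Illustrative (made-up) mixture parameters: K = 2 components, P = 2 dimensions
K <- 2L
P <- 2L
params <- list(
  proportion = c(0.4, 0.6),                 # length K, sums to 1
  mean = matrix(c(0, 0,                     # column k is the center of component k
                  3, 3), nrow = P, ncol = K),
  cov = array(c(diag(P), 0.5 * diag(P)),    # slice [, , k] is the covariance of component k
              dim = c(P, P, K))
)

# Structural checks matching the documented layout
stopifnot(
  length(params$proportion) == K,
  all(params$proportion > 0),
  isTRUE(all.equal(sum(params$proportion), 1)),
  identical(dim(params$mean), c(P, K)),
  identical(dim(params$cov), c(P, P, K))
)
```

Note that the k-th cluster center is a column of params$mean (not a row), while the k-th covariance matrix is the slice params$cov[, , k].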
Here P is the dimensionality of the data used to fit the model. newdata
must have the same dimensionality (ncol(newdata) must be equal to P);
otherwise the function stops with an error message.
The stored mixture parameters must define a valid Gaussian mixture, i.e.
proportions must be positive and sum to 1, and covariance matrices must be
finite, symmetric, and positive definite.
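The requirements above can be expressed as a short validation routine. The sketch below is a hypothetical helper, check_gaussian_params(), written only to illustrate the documented conditions; it is not part of the package API, and the package's own checks and error messages may differ. It assumes newdata is coerced with as.matrix(), so a plain numeric vector is treated here as a single column.

```r
# Hypothetical helper (not part of the package API): validates mixture
# parameters and the dimensionality of new data, stopping with an error
# if any documented requirement is violated.
check_gaussian_params <- function(params, newdata) {
  prop <- params$proportion
  mu <- params$mean
  sigma <- params$cov
  P <- nrow(mu)
  K <- ncol(mu)

  # Dimensionality: newdata must have exactly P columns
  # (assumption: a numeric vector becomes a one-column matrix)
  if (ncol(as.matrix(newdata)) != P) {
    stop("ncol(newdata) must be equal to P = ", P)
  }
  # Proportions: K finite, positive values summing to 1
  if (length(prop) != K || any(!is.finite(prop)) || any(prop <= 0) ||
      !isTRUE(all.equal(sum(prop), 1))) {
    stop("proportions must be positive and sum to 1")
  }
  # Covariances: finite, symmetric, positive definite
  for (k in seq_len(K)) {
    S <- matrix(sigma[, , k], P, P)  # keep a P x P matrix even when P == 1
    if (any(!is.finite(S))) stop("covariance ", k, " is not finite")
    if (!isSymmetric(S)) stop("covariance ", k, " is not symmetric")
    # Positive definite: the smallest eigenvalue must be strictly positive
    ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
    if (min(ev) <= 0) stop("covariance ", k, " is not positive definite")
  }
  invisible(TRUE)
}
```

Checking the smallest eigenvalue is one of several reasonable ways to test positive definiteness; attempting a Cholesky factorization with chol() is a common alternative.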
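The predicted clustering is obtained as the maximum a posteriori (MAP)
assignment using the posterior weights of a Gaussian mixture model
parametrized by params. Denote by z(x) the predicted cluster label for a
point x, by \phi the (multivariate) Gaussian density, and by \pi_k, \mu_k,
\Sigma_k the proportion, mean, and covariance matrix of the k-th component
(params$proportion[k], params$mean[, k], and params$cov[, , k]). Then:
z(x) = \underset{k \in \{1,\ldots,K\}}{\arg\max} \; \frac{\pi_k \phi(x, \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j \phi(x, \mu_j, \Sigma_j)}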
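The MAP rule above can be sketched in base R. The function map_cluster() below is hypothetical and written only to illustrate the formula; it is not the package's implementation. It works on the log scale (log \pi_k plus the Gaussian log-density, computed with stats::mahalanobis() and base::determinant()) and normalizes with the log-sum-exp trick to avoid numerical underflow.

```r
# Hypothetical sketch (not the package's implementation) of the MAP rule:
# assign each row of newdata to the component with the largest posterior weight.
map_cluster <- function(params, newdata) {
  X <- as.matrix(newdata)
  P <- ncol(X)
  K <- length(params$proportion)
  # n x K matrix of log(pi_k) + log phi(x, mu_k, Sigma_k)
  log_w <- sapply(seq_len(K), function(k) {
    S <- matrix(params$cov[, , k], P, P)
    maha <- mahalanobis(X, center = params$mean[, k], cov = S)
    logdet <- as.numeric(determinant(S, logarithm = TRUE)$modulus)
    log(params$proportion[k]) - 0.5 * (P * log(2 * pi) + logdet + maha)
  })
  log_w <- matrix(log_w, ncol = K)  # keep a matrix when nrow(X) == 1
  # Posterior weights via log-sum-exp: subtract the row maximum before exp()
  m <- apply(log_w, 1, max)
  post <- exp(log_w - m)
  post <- post / rowSums(post)
  # Index of the largest posterior weight in each row
  max.col(post, ties.method = "first")
}
```

Since the denominator is the same for every k, taking the argmax of log_w directly would give the same labels; the normalization is only needed if the posterior weights themselves are of interest.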
A vector of predicted cluster labels, one for each observation in
newdata.
Coraggio, Luca and Pietro Coretto (2023). Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score. Journal of Multivariate Analysis, Vol. 196(105181), 1-20. doi:10.1016/j.jmva.2023.105181
See also: gmix
# Load data
data(banknote)
dat <- banknote[, -1]
# Estimate a 3-component Gaussian mixture model
set.seed(123)
res <- gmix(dat, K = 3)
# Cluster labels returned by gmix
print(res$cluster)
# Predict the cluster of a single point
# (drop = FALSE keeps a one-row table rather than a vector)
predict(res, dat[1, , drop = FALSE])
# Predict clusters for a subset
predict(res, dat[1:10, ])
# Predicted clusters on the original dataset are equal to the clustering
# returned by gmix
all(predict(res, dat) == res$cluster)