predict.mbcfit | R Documentation |
This function predicts cluster assignments for new data based on an existing model of class mbcfit. The prediction uses the parameters of the fitted model to assign new observations to clusters.
## S3 method for class 'mbcfit'
predict(object, newdata, ...)
object |
An object of class mbcfit. |
newdata |
A numeric vector, matrix, or data frame of observations. Rows correspond to observations and columns correspond to variables/features. Categorical variables and |
... |
Further arguments passed to or from other methods. |
The predict.mbcfit function uses the parameters of a previously fitted mbcfit model to allocate new data points to the estimated clusters. The function checks that the mbcfit model contains valid estimates and that the dimensionality of the new data matches that of the model.
The mbcfit object must contain a component named params, which is itself a list containing the following elements, for a mixture model with K components:
proportions
A numeric vector of length K, with elements summing to 1, representing cluster proportions.
mean
A numeric matrix of dimensions c(P, K), representing cluster centers.
cov
A numeric array of dimensions c(P, P, K), representing cluster covariance matrices.
The data dimensionality is P, and the dimensionality of the new data must match it (ncol(newdata) must equal P); otherwise the function terminates with an error message.
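The structure described above can be illustrated with a small base R sketch. The helper check_params is hypothetical and is not part of the package; it only mirrors the documented requirements (proportions summing to 1, mean of dimensions c(P, K), cov of dimensions c(P, P, K), and new data with P columns). The actual checks performed by predict.mbcfit may differ.

```r
# Hypothetical helper (an assumption, not the package's own code):
# verifies the params layout described in this help page.
check_params <- function(params, newdata) {
  K <- length(params$proportions)   # number of mixture components
  P <- nrow(params$mean)            # data dimensionality
  newdata <- as.matrix(newdata)
  stopifnot(
    is.numeric(params$proportions),
    abs(sum(params$proportions) - 1) < 1e-8,   # proportions sum to 1
    identical(dim(params$mean), c(P, K)),      # P x K matrix of centers
    identical(dim(params$cov), c(P, P, K)),    # P x P x K covariance array
    ncol(newdata) == P                         # new data must have P columns
  )
  invisible(TRUE)
}

# Illustrative parameters for P = 2 variables and K = 3 components
P <- 2
K <- 3
params <- list(
  proportions = c(0.5, 0.3, 0.2),
  mean = matrix(c(0, 0, 3, 3, -3, 3), nrow = P, ncol = K),
  cov = array(rep(diag(P), K), dim = c(P, P, K))
)

check_params(params, matrix(0, nrow = 5, ncol = 2))       # passes silently
try(check_params(params, matrix(0, nrow = 5, ncol = 3)))  # error: wrong number of columns
```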
The predicted clustering is obtained as the MAP (maximum a posteriori) estimator, using the posterior weights of a Gaussian mixture model with parameters params.
Denoting with z(x) the predicted cluster label for point x, with \pi_k, \mu_k and \Sigma_k the proportion, mean and covariance matrix of the k-th component, and with \phi the (multivariate) Gaussian density:
z(x) = \underset{k \in \{1,\ldots,K\}}{\arg\max} \; \frac{\pi_k \phi(x, \mu_k, \Sigma_k)}{\sum_{j=1}^K \pi_j \phi(x, \mu_j, \Sigma_j)}
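The MAP rule above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: log_dmvnorm and map_labels are hypothetical helper names, and the parameters are made up for the example. Because the denominator is the same for every k, the sketch maximizes the log of the numerator only, which yields the same label.

```r
# Hypothetical helper: log-density of a multivariate Gaussian,
# evaluated on each row of X (an n x P matrix).
log_dmvnorm <- function(X, mu, Sigma) {
  R <- chol(Sigma)                                 # Sigma = t(R) %*% R
  Z <- backsolve(R, t(X) - mu, transpose = TRUE)   # solves t(R) z = x - mu
  -0.5 * colSums(Z^2) - sum(log(diag(R))) - 0.5 * length(mu) * log(2 * pi)
}

# Hypothetical helper: MAP labels from a params list laid out as
# described in this help page (proportions, mean, cov).
map_labels <- function(X, params) {
  X <- as.matrix(X)
  K <- length(params$proportions)
  # n x K matrix of log(pi_k) + log(phi(x, mu_k, Sigma_k))
  log_num <- sapply(seq_len(K), function(k) {
    log(params$proportions[k]) +
      log_dmvnorm(X, params$mean[, k], params$cov[, , k])
  })
  log_num <- matrix(log_num, nrow = nrow(X))   # keep matrix shape when n = 1
  max.col(log_num, ties.method = "first")      # arg max over k, row by row
}

# Illustrative parameters: P = 2 variables, K = 3 components
params <- list(
  proportions = c(0.5, 0.3, 0.2),
  mean = matrix(c(0, 0, 3, 3, -3, 3), nrow = 2, ncol = 3),
  cov = array(rep(diag(2), 3), dim = c(2, 2, 3))
)

# One point at each cluster center
X <- rbind(c(0, 0), c(3, 3), c(-3, 3))
map_labels(X, params)
# [1] 1 2 3
```

Working on the log scale avoids numerical underflow when a point lies far from every cluster center.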
A vector of length nrow(newdata) containing the estimated cluster label for each observation in newdata.
Coraggio, Luca and Pietro Coretto (2023). Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score. Journal of Multivariate Analysis, Vol. 196(105181), 1-20. doi:10.1016/j.jmva.2023.105181
gmix
# load data
data(banknote)
dat <- banknote[,-1]
# Estimate a 3-component Gaussian mixture model
set.seed(123)
res <- gmix(dat, K = 3)
# Cluster in output from gmix
print(res$cluster)
# Predict cluster on a single point
# (keep table dimension)
predict(res, dat[1, , drop=FALSE])
# Predict cluster on a subset
predict(res, dat[1:10, ])
# Predicted clusters on the original dataset are equal to the clustering from the gmix model
all(predict(res, dat) == res$cluster)