
The adaptive *L_1* penalty was proposed by Pan and Shen (2007). Within the framework of model-based clustering, apL1 aims to identify the globally informative variables for clustering high-dimensional data.

```
apL1(tuning, K = NULL, lambda = NULL, y, N = 100, kms.iter = 100, kms.nstart = 100,
     adapt.kms = FALSE, eps.diff = 1e-5, eps.em = 1e-5, model.crit = 'gic')

apL1(tuning = NULL, K, lambda, y, N = 100, kms.iter = 100, kms.nstart = 100,
     adapt.kms = FALSE, eps.diff = 1e-5, eps.em = 1e-5, model.crit = 'gic')
```

`tuning` |
A 2-dimensional vector or a matrix with 2 columns; the first column is the number of clusters *K* and the second column is the tuning parameter *λ* |

`K` |
The number of clusters |

`lambda` |
The tuning parameter |

`y` |
A data matrix with p columns. Each row is a p-dimensional observation. |

`N` |
The maximum number of iterations in the EM algorithm. The default value is 100. |

`kms.iter` |
The maximum number of iterations in the K-means algorithm used to generate the starting values for the EM algorithm. The default value is 100. |

`kms.nstart` |
The number of random starting values used in K-means. The default value is 100. |

`adapt.kms` |
An indicator of whether the cluster means estimated by K-means are used to calculate the adaptive parameters. The default value is FALSE. |

`eps.diff` |
The threshold for the pairwise difference between two mean values; any difference below it is treated as 0. The default value is 1e-5. |

`eps.em` |
The tolerance of the stopping criterion in the EM algorithm. The default value is 1e-5. |

`model.crit` |
The criterion used to select the optimal model: 'gic' (the default) or 'bic' |

A variable *j* is defined as globally informative if there exists at least one pair of clusters *k ≠ k'* such that *μ_{kj} ≠ μ_{k'j}*. Here we assume that each cluster has the same diagonal covariance matrix in the model-based clustering. The apL1 penalty takes the form

*λ ∑_{k=1}^{K} ∑_{j=1}^{d} τ_{kj} |μ_{kj}|,*

where *d* is the number of variables in the data, *K* is the number of clusters, and *τ_{kj}* is an adaptive parameter computed from a pilot estimate *\tilde{μ}_{kj}*. Two choices of the pilot estimate are provided: if `adapt.kms = TRUE`, *\tilde{μ}_{kj}* are the estimates from the K-means algorithm; otherwise, they are the estimates from model-based clustering without penalty.
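As an illustrative sketch (not code from the package), the penalty above can be computed for a *K × d* matrix of cluster means as follows. The function name, the inverse weighting `1 / |\tilde{μ}_{kj}|` (the usual adaptive-lasso convention), and the `eps` guard are assumptions for illustration; the exact weighting used by apL1 may differ.

```r
# Sketch: adaptive L1 penalty for a K x d matrix of cluster means `mu`,
# given pilot estimates `mu.tilde` (same dimensions) and tuning parameter
# `lambda`. Pilot estimates below `eps` are clipped to avoid division by
# zero (hypothetical convention, not from the package documentation).
adaptive_l1_penalty <- function(mu, mu.tilde, lambda, eps = 1e-5) {
  tau <- 1 / pmax(abs(mu.tilde), eps)   # adaptive weights from pilot estimates
  lambda * sum(tau * abs(mu))           # lambda * sum_k sum_j tau_kj |mu_kj|
}
```

With all pilot estimates equal to 1, the weights reduce to 1 and the penalty is simply *λ* times the sum of absolute mean values.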

The EM algorithm is used for estimating the parameters. Since the EM algorithm is sensitive to its starting values, we use the estimates from K-means with multiple starting points as the starting values.

This function returns the estimated parameters and some statistics of the optimal model within the given *K* and *λ*, selected by BIC when `model.crit = 'bic'` or by GIC when `model.crit = 'gic'`.

`mu.hat.best` |
The estimated cluster means in the optimal model |

`sigma.hat.best` |
The estimated covariance in the optimal model |

`p.hat.best` |
The estimated cluster proportions in the optimal model |

`s.hat.best` |
The clustering assignments using the optimal model |

`lambda.best` |
The value of *λ* in the optimal model |

`K.best` |
The value of *K* in the optimal model |

`llh.best` |
The log-likelihood of the optimal model |

`gic.best` |
The GIC of the optimal model |

`bic.best` |
The BIC of the optimal model |

`ct.mu.best` |
The degrees of freedom in the cluster means of the optimal model |
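A hedged usage sketch tying the arguments and return values together (assuming the package providing `apL1` is loaded; the simulated data and grid values below are illustrative, not from the source):

```r
set.seed(1)
# 60 observations, 10 variables; only the first two variables are informative:
y <- matrix(rnorm(60 * 10), 60, 10)
y[31:60, 1:2] <- y[31:60, 1:2] + 2   # shift the second cluster in variables 1-2

# Grid of (K, lambda) pairs; the optimal model is selected by GIC.
tune <- cbind(K = rep(2:3, each = 3), lambda = rep(c(0.5, 1, 2), times = 2))
fit <- apL1(tuning = tune, y = y, model.crit = 'gic')

fit$K.best        # selected number of clusters
fit$lambda.best   # selected tuning parameter
fit$mu.hat.best   # estimated cluster means; variables whose means are zero
                  # across all clusters are non-informative
fit$s.hat.best    # cluster assignments
```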

Pan, W. and Shen, X. (2007). Penalized model-based clustering with application to variable selection. *The Journal of Machine Learning Research* **8**, 1145–1164.
