apL1: Model-based Clustering with APL1
In PARSE: Model-Based Clustering with Regularization Methods for High-Dimensional Data

View source: R/apL1.R

The adaptive L_1 penalty (APL1) was proposed by Pan and Shen (2007). Within the framework of model-based clustering, APL1 aims to identify the globally informative variables for clustering high-dimensional data.

apL1(tuning, K = NULL, lambda = NULL, y, N = 100, kms.iter = 100, kms.nstart = 100,
      adapt.kms = FALSE, eps.diff = 1e-5, eps.em = 1e-5, model.crit = 'gic')
apL1(tuning = NULL, K, lambda, y, N = 100, kms.iter = 100, kms.nstart = 100,
      adapt.kms = FALSE, eps.diff = 1e-5, eps.em = 1e-5, model.crit = 'gic')

`tuning`	A 2-dimensional vector or a matrix with 2 columns. The first column is the number of clusters K and the second column is the tuning parameter λ in the penalty term. If this is missing, then `K` and `lambda` must be provided.
`K`	The number of clusters K.
`lambda`	The tuning parameter λ in the penalty term.
`y`	A p-dimensional data matrix. Each row is an observation.
`N`	The maximum number of iterations in the EM algorithm. The default value is 100.
`kms.iter`	The maximum number of iterations in the K-means algorithm that generates the starting values for the EM algorithm.
`kms.nstart`	The number of starting values in K-means.
`adapt.kms`	An indicator of whether the cluster means estimated by K-means are used to calculate the adaptive parameters in APL1. The default value is FALSE.
`eps.diff`	The lower bound on the pairwise difference of two mean values. Any difference below it is treated as 0.
`eps.em`	The threshold for the stopping criterion of the EM algorithm.
`model.crit`	The criterion used to select the number of clusters K. It is either ‘bic’ for Bayesian Information Criterion or ‘gic’ for Generalized Information Criterion.
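
The two calling conventions differ only in how the candidate (K, λ) values are supplied. The following is a minimal sketch of building a `tuning` matrix in base R, assuming that each row is read as one (K, λ) pair. The names `K_grid`, `lambda_grid` and `fit` are illustrative and not part of PARSE, and the call to `apL1` runs only if the package is installed.

```r
# Candidate values; these particular numbers are illustrative only.
K_grid <- 1:2
lambda_grid <- c(0, 0.1)

# expand.grid() forms every (K, lambda) combination. Column 1 is K and
# column 2 is lambda, matching the layout described for `tuning`.
tuning <- as.matrix(expand.grid(K = K_grid, lambda = lambda_grid))
dim(tuning)

# Assumption: PARSE may not be installed, so the call is guarded.
if (requireNamespace("PARSE", quietly = TRUE)) {
  set.seed(1)
  y <- rbind(matrix(rnorm(100, 0, 1), ncol = 2),
             matrix(rnorm(100, 4, 1), ncol = 2))
  fit <- PARSE::apL1(tuning = tuning, y = y)
  fit$K.best
}
```

Supplying `K` and `lambda` as separate vectors, as in the second usage form, avoids building the matrix by hand.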

A variable j is defined as globally informative if there exists at least one pair of clusters k and k' such that μ_{kj} ≠ μ_{k'j}. The model assumes that all clusters share the same diagonal covariance matrix. The APL1 penalty has the following form,

∑_{j=1}^d ∑_{k=1}^K τ_{kj} |μ_{kj}|,

where d is the number of variables in the data, K is the number of clusters, and τ_{kj} = \tilde{μ}_{kj} are the adaptive parameters. Two choices are provided for τ_{kj}. If adapt.kms == TRUE, \tilde{μ}_{kj} is the estimate from the K-means algorithm; otherwise, \tilde{μ}_{kj} is the estimate from model-based clustering without penalty.
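
To make the penalty and the definition of a globally informative variable concrete, here is a small base R sketch. It is not code from PARSE: the helpers `apl1_penalty` and `globally_informative` are hypothetical, the cluster means are assumed to be stored as a K x d matrix, and the adaptive parameters are passed in as an argument rather than derived, with unit weights used purely as a placeholder.

```r
# Hypothetical helper: weighted L_1 penalty for a K x d matrix of cluster
# means `mu` and a same-shaped matrix of adaptive parameters `tau`.
apl1_penalty <- function(mu, tau) {
  stopifnot(identical(dim(mu), dim(tau)))
  sum(tau * abs(mu))
}

# Hypothetical helper: variable j is globally informative when at least one
# pair of cluster means differs, i.e. the range of column j exceeds eps.diff.
globally_informative <- function(mu, eps.diff = 1e-5) {
  apply(mu, 2, function(m) diff(range(m)) > eps.diff)
}

# Two clusters, three variables; only the second variable separates them.
mu <- rbind(c(0,  1.5, 0),
            c(0, -1.5, 0))
tau <- matrix(1, nrow = 2, ncol = 3)  # unit weights, for illustration only

apl1_penalty(mu, tau)
globally_informative(mu)
```

Because the largest pairwise difference within a column equals its range, checking the range against `eps.diff` is equivalent to checking every pair of clusters.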

The EM algorithm is used to estimate the parameters. Because the EM algorithm depends on its starting values, the estimates from K-means with multiple starting points are used as the starting values.
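
The initialization step can be sketched with `kmeans` from base R. This is an illustration under stated assumptions, not the package's internal code: the names `mu_start`, `p_start` and `sigma_start` are hypothetical, and the pooled per-variable variance is one plausible starting value under the shared diagonal covariance assumption.

```r
set.seed(1)
# Two Gaussian clusters, 50 observations each, in 2 dimensions.
y <- rbind(matrix(rnorm(100, 0, 1), ncol = 2),
           matrix(rnorm(100, 4, 1), ncol = 2))
K <- 2

# iter.max and nstart play the roles of kms.iter and kms.nstart.
km <- kmeans(y, centers = K, iter.max = 100, nstart = 100)

# Starting values derived from the best K-means solution.
mu_start <- km$centers               # K x d matrix of cluster means
p_start <- km$size / nrow(y)         # cluster proportions
resid <- y - km$centers[km$cluster, , drop = FALSE]
sigma_start <- colSums(resid^2) / nrow(y)  # pooled variance per variable
```

Using many random starts makes it less likely that a poor K-means solution is passed on to the EM algorithm.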

This function returns the estimated parameters and some statistics of the optimal model among the given values of K and λ. The optimal model is selected by BIC when model.crit = 'bic' or by GIC when model.crit = 'gic'.

`mu.hat.best`	The estimated cluster means in the optimal model
`sigma.hat.best`	The estimated covariance in the optimal model
`p.hat.best`	The estimated cluster proportions in the optimal model
`s.hat.best`	The clustering assignments using the optimal model
`lambda.best`	The value of λ that provides the optimal model
`K.best`	The value of K that provides the optimal model
`llh.best`	The log-likelihood of the optimal model
`gic.best`	The GIC of the optimal model
`bic.best`	The BIC of the optimal model
`ct.mu.best`	The degrees of freedom in the cluster means of the optimal model

Pan, W. and Shen, X. (2007). Penalized model-based clustering with application to variable selection. The Journal of Machine Learning Research 8, 1145–1164.

nopenalty, apfp, parse

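library(PARSE)
# Two Gaussian clusters, 50 observations each, in 2 dimensions
y <- rbind(matrix(rnorm(100, 0, 1), ncol = 2), matrix(rnorm(100, 4, 1), ncol = 2))
# Candidate values: K in {1, 2} and lambda in {0, 0.1}; selection uses GIC by default
output <- apL1(K = 1:2, lambda = c(0, 0.1), y = y)
output$mu.hat.best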

PARSE documentation built on May 2, 2019, 9:57 a.m.
