apfp: Model-based Clustering with APFP
In PARSE: Model-Based Clustering with Regularization Methods for High-Dimensional Data

View source: R/apfp.R

Description

The adaptive pairwise fusion penalty (APFP) was proposed by Guo et al. (2010). Within the model-based clustering framework, APFP aims to identify the pairwise informative variables for clustering high-dimensional data.

Usage

apfp(tuning, K = NULL, lambda = NULL, y, N = 100, kms.iter = 100, kms.nstart = 100,
      adapt.kms = FALSE, eps.diff = 1e-5, eps.em = 1e-5,
      iter.LQA = 20, eps.LQA = 1e-5, model.crit = 'gic')
apfp(tuning = NULL, K, lambda, y, N = 100, kms.iter = 100, kms.nstart = 100,
      adapt.kms = FALSE, eps.diff = 1e-5, eps.em = 1e-5,
      iter.LQA = 20, eps.LQA = 1e-5, model.crit = 'gic')

Arguments

`tuning`	A vector of length 2, or a matrix with 2 columns, whose first column gives the number of clusters K and whose second column gives the tuning parameter λ of the penalty term. If `tuning` is missing, `K` and `lambda` must be provided.
`K`	The number of clusters K; a vector of candidate values may be supplied.
`lambda`	The tuning parameter λ in the penalty term; a vector of candidate values may be supplied.
`y`	A data matrix in which each row is an observation and each column is a variable.
`N`	The maximum number of iterations in the EM algorithm. The default value is 100.
`kms.iter`	The maximum number of iterations in the K-means algorithm used to generate starting values for the EM algorithm.
`kms.nstart`	The number of random starts in K-means.
`adapt.kms`	A logical indicator of whether to use the cluster means estimated by K-means to compute the adaptive weights in APFP. The default value is FALSE.
`eps.diff`	The threshold for the difference between two cluster means: any absolute pairwise difference below it is treated as 0.
`eps.em`	The tolerance for the stopping criterion of the EM algorithm.
`iter.LQA`	The number of iterations in the estimation of cluster means by the local quadratic approximation (LQA).
`eps.LQA`	The tolerance for the stopping criterion in the estimation of cluster means.
`model.crit`	The criterion used to select the optimal model (K and λ): either 'bic' for the Bayesian Information Criterion or 'gic' for the Generalized Information Criterion.
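A `tuning` matrix covering every combination of candidate K and λ values can be built with base R's `expand.grid`. This is a sketch: the candidate values are illustrative, and whether `apfp` expands separate `K` and `lambda` vectors into the same grid internally is an assumption not stated here.

```r
# Candidate numbers of clusters and penalty levels (illustrative values)
K.grid <- 1:3
lambda.grid <- c(0, 0.5, 1)

# Every (K, lambda) combination as a 2-column matrix:
# column 1 = K, column 2 = lambda, as described for `tuning`
tuning <- as.matrix(expand.grid(K = K.grid, lambda = lambda.grid))
dim(tuning)   # 9 2

# The grid could then be passed as: apfp(tuning = tuning, y = y)
```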

Details

The $j$-th variable is defined as pairwise informative for a pair of clusters $C_k$ and $C_{k'}$ if $\mu_{kj} \neq \mu_{k'j}$. A variable is globally informative if it is pairwise informative for at least one pair of clusters. The model assumes that all clusters share the same diagonal covariance matrix. The APFP penalty takes the form

$$\sum_{j=1}^{d} \sum_{k<k'} \tau_{kk'j} \, |\mu_{kj} - \mu_{k'j}|,$$

where $d$ is the number of variables in the data and $\tau_{kk'j} = |\tilde{\mu}_{kj} - \tilde{\mu}_{k'j}|^{-1}$ are the adaptive weights. Two choices of $\tilde{\mu}_{kj}$ are provided: if `adapt.kms = TRUE`, $\tilde{\mu}_{kj}$ is the estimate from the K-means algorithm; otherwise, it is the estimate from model-based clustering without a penalty.
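The penalty can be evaluated directly from its definition. A minimal sketch, assuming the cluster means are stored as K × d matrices with clusters in rows; `apfp_penalty` is a hypothetical helper, not a PARSE function, and it does not guard against ties in the initial estimates (which would give infinite weights).

```r
# Sum over variables j and cluster pairs k < k' of tau_{kk'j} * |mu_kj - mu_k'j|,
# with tau_{kk'j} = 1 / |mu.tilde_kj - mu.tilde_k'j|.
# mu and mu.tilde: K x d matrices (rows = clusters); this layout is an assumption.
apfp_penalty <- function(mu, mu.tilde) {
  K <- nrow(mu)
  total <- 0
  for (k in seq_len(K - 1)) {          # no pairs when K == 1
    for (kp in (k + 1):K) {
      tau <- 1 / abs(mu.tilde[k, ] - mu.tilde[kp, ])   # adaptive weights, one per variable
      total <- total + sum(tau * abs(mu[k, ] - mu[kp, ]))
    }
  }
  total
}

mu.tilde <- rbind(c(0, 0), c(4, 2))   # illustrative initial estimates
mu <- rbind(c(0.5, 0), c(3.5, 0))     # illustrative current means
apfp_penalty(mu, mu.tilde)            # 0.75
```

Because the weights are inverse initial differences, pairs of means that start close together are penalized heavily and tend to be fused, while well-separated pairs are penalized lightly.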

Estimation uses the EM algorithm. Because the EM algorithm depends on its starting values, the estimates from K-means with multiple random starts are used as the starting values. For estimating the cluster means, APFP uses the local quadratic approximation (LQA).
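The estimation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the PARSE code: the hypothetical helper `em_start` uses base R `kmeans` for starting values (passing `kms.iter` and `kms.nstart` as `iter.max` and `nstart` is an assumed mapping), `e_step` computes posterior cluster probabilities for a Gaussian mixture with a shared diagonal covariance, and `lqa_abs` shows the standard local quadratic approximation of |x|, which turns each absolute-value penalty term into a quadratic. The full EM loop and the penalized update of the means are omitted.

```r
# Starting values from K-means with multiple random starts
em_start <- function(y, K, kms.iter = 100, kms.nstart = 100) {
  y <- as.matrix(y)
  km <- kmeans(y, centers = K, iter.max = kms.iter, nstart = kms.nstart)
  mu <- km$centers                                  # K x d starting means
  p <- tabulate(km$cluster, nbins = K) / nrow(y)    # starting proportions
  resid <- y - mu[km$cluster, , drop = FALSE]
  sigma2 <- colMeans(resid^2)                       # shared diagonal variances
  list(mu = mu, p = p, sigma2 = sigma2)
}

# E-step: posterior probability that observation i belongs to cluster k
e_step <- function(y, mu, p, sigma2) {
  y <- as.matrix(y)
  K <- nrow(mu)
  # log p_k + log N(y_i | mu_k, diag(sigma2)), one column per cluster k
  logdens <- sapply(seq_len(K), function(k) {
    log(p[k]) - 0.5 * colSums((t(y) - mu[k, ])^2 / sigma2) -
      0.5 * sum(log(2 * pi * sigma2))
  })
  logdens <- matrix(logdens, nrow = nrow(y))  # keep n x K shape when n == 1
  m <- apply(logdens, 1, max)                 # normalise on the log scale
  w <- exp(logdens - m)
  w / rowSums(w)                              # n x K, rows sum to 1
}

# LQA of |x| around a nonzero current value x0:
#   |x| ~ |x0| + (x^2 - x0^2) / (2 * |x0|)
# The quadratic touches |x| at x = +/- x0 and lies above it elsewhere.
lqa_abs <- function(x, x0) abs(x0) + (x^2 - x0^2) / (2 * abs(x0))
```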

Value

The function returns the estimated parameters and summary statistics of the optimal model over the given values of K and λ, where the optimal model is selected by BIC when `model.crit = 'bic'` or by GIC when `model.crit = 'gic'`.

`mu.hat.best`	The estimated cluster means in the optimal model
`sigma.hat.best`	The estimated covariance in the optimal model
`p.hat.best`	The estimated cluster proportions in the optimal model
`s.hat.best`	The clustering assignments using the optimal model
`lambda.best`	The value of λ that gives the optimal model
`K.best`	The value of K that gives the optimal model
`llh.best`	The log-likelihood of the optimal model
`gic.best`	The GIC of the optimal model
`bic.best`	The BIC of the optimal model
`ct.mu.best`	The degrees of freedom in the cluster means of the optimal model
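The model selection described above can be illustrated with the conventional BIC, -2 × log-likelihood + log(n) × df, where smaller is better. This is a sketch under assumptions: the GIC used by PARSE is not defined on this page, so it is not shown, and how PARSE counts the degrees of freedom (for example, from `ct.mu.best` plus the proportions and variances) is not documented here. The helper `select_by_bic` and the candidate table are hypothetical illustrations, not package output.

```r
# Conventional BIC: smaller values indicate a better fit/complexity trade-off
bic <- function(llh, n, df) -2 * llh + log(n) * df

# fits: one row per candidate (K, lambda) with its log-likelihood and df
select_by_bic <- function(fits, n) {
  scores <- bic(fits$llh, n, fits$df)
  fits[which.min(scores), c("K", "lambda")]
}

# Illustrative numbers only (not produced by apfp)
fits <- data.frame(K = c(1, 2, 2), lambda = c(0, 0, 1),
                   llh = c(-400, -350, -352), df = c(4, 7, 5))
select_by_bic(fits, n = 100)   # selects K = 2, lambda = 1
```

In this toy table the penalized fit (λ = 1) has a slightly lower likelihood but fewer degrees of freedom, so it wins under BIC.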

References

Guo, J., Levina, E., Michailidis, G., and Zhu, J. (2010) Pairwise variable selection for high-dimensional model-based clustering. Biometrics 66(3), 793–804.

See Also

`nopenalty`, `apL1`, `parse`

Examples

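library(PARSE)
set.seed(1)  # reproducible simulated data
y <- rbind(matrix(rnorm(100, 0, 1), ncol = 2),   # 50 observations around (0, 0)
           matrix(rnorm(100, 4, 1), ncol = 2))   # 50 observations around (4, 4)
output <- apfp(K = 1:2, lambda = c(0, 1), y = y)  # optimal model chosen by GIC (default)
output$mu.hat.best  # estimated cluster means of the optimal model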
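Continuing the example, the definitions of pairwise and globally informative variables from Details can be applied to estimated cluster means. A minimal sketch with a hypothetical helper `informative_vars`: it treats absolute differences below `eps.diff` as 0, mirroring the argument of the same name, and assumes the means form a K × d matrix with clusters in rows (the layout of `mu.hat.best` is not documented here, so transpose it first if needed).

```r
informative_vars <- function(mu, eps.diff = 1e-5) {
  K <- nrow(mu)
  d <- ncol(mu)
  # every cluster pair (k, k') with k < k', one pair per row
  pairs <- if (K >= 2) t(combn(K, 2)) else matrix(integer(0), ncol = 2)
  pairwise <- matrix(FALSE, nrow = nrow(pairs), ncol = d)
  rownames(pairwise) <- paste(pairs[, 1], pairs[, 2], sep = "-")
  for (r in seq_len(nrow(pairs))) {
    # variable j is pairwise informative if |mu_kj - mu_k'j| >= eps.diff
    pairwise[r, ] <- abs(mu[pairs[r, 1], ] - mu[pairs[r, 2], ]) >= eps.diff
  }
  # globally informative: pairwise informative for at least one pair
  list(pairwise = pairwise, global = colSums(pairwise) > 0)
}

# Illustrative 3 x 3 matrix of cluster means (rows = clusters)
mu <- rbind(c(0, 1, 5), c(0, 1, 2), c(3, 1, 2))
informative_vars(mu)$global   # TRUE FALSE TRUE
```

Here the second variable has the same mean in every cluster, so it is not informative for any pair and is the only variable flagged as globally uninformative.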

PARSE documentation built on May 2, 2019, 9:57 a.m.
