PRclust: Find the Solution of Penalized Regression-Based Clustering.
In ChongWu-Biostat/prclust: Penalized Regression-Based Clustering Method


Clustering is unsupervised and exploratory in nature. Yet, it can be performed through penalized regression with grouping pursuit. PRclust performs penalized regression-based clustering with various loss functions and grouping penalties via two algorithms (DC-ADMM and quadratic penalty).

PRclust(data, lambda1, lambda2, tau, 
	loss.method = c("quadratic","lasso"), 
	grouping.penalty = c("gtlp","L1","SCAD","MCP"), 
	algorithm = c("ADMM","Quadratic"), epsilon=0.001)

`data`	input matrix, of dimension nvars x nobs; each column is an observation vector.
`lambda1`	Tuning parameter or step size: lambda1, typically set to 1 for the quadratic penalty based algorithm and to 0.4 for the revised ADMM.
`lambda2`	Tuning parameter: lambda2, the magnitude of grouping penalty.
`tau`	Tuning parameter: tau, related to grouping penalty.
`loss.method`	The loss method. "lasso" stands for L_1 loss function, while "quadratic" stands for the quadratic loss function.
`grouping.penalty`	Grouping penalty. Character: may be abbreviated. "gtlp" means the grouped truncated lasso penalty (gTLP) is used for the grouping penalty. "L1" means the lasso is used for the grouping penalty. "SCAD" and "MCP" are two other non-convex penalties.
`algorithm`	Character: may be abbreviated. The algorithm to use for finding the solution. The default algorithm is "ADMM", which stands for the new algorithm we developed.
`epsilon`	The stopping criterion parameter. The default is 0.001.
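
Two points in the argument list are easy to get wrong: the nvars x nobs layout of `data`, and what "may be abbreviated" means in practice. The sketch below uses only base R. The helper `resolve_penalty` is hypothetical and only illustrates how abbreviation usually works through `match.arg()`; it is an assumption that PRclust resolves its character arguments the same way.

```r
# Usual R layout: rows are observations (150 observations, 4 variables).
x <- as.matrix(iris[, 1:4])

# PRclust expects nvars x nobs, so transpose: each column is one observation.
data <- t(x)
dim(data)

# ASSUMPTION: hypothetical stand-in showing abbreviation via base R's
# match.arg(); the package's internals may differ.
resolve_penalty <- function(grouping.penalty = c("gtlp", "L1", "SCAD", "MCP")) {
  match.arg(grouping.penalty)
}

resolve_penalty()      # argument omitted: the first choice is used
resolve_penalty("SC")  # unique prefix: resolved to the full name
```

Matching in `match.arg()` is case-sensitive, so a prefix such as "sc" would raise an error rather than select "SCAD".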

Cluster analysis is widely used in many fields. Because no class labels are available, it is also called unsupervised learning. Penalized regression-based clustering takes a different approach by treating clustering as a regression problem: each observation is given its own centroid, and a non-convex grouping-pursuit penalty data-adaptively encourages equality among unknown subsets of the centroid estimates. The method can handle some complex clustering situations; for example, with non-convex clusters, where K-means fails, PRclust may perform much better.
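
To make the grouping idea concrete, here is a minimal sketch of the penalized objective as formulated in Pan et al. (2013): a quadratic loss plus the truncated lasso penalty min(|a|, tau) applied to the L2 distance between every pair of centroids. The function names `tlp` and `prclust_objective` are hypothetical, and this only evaluates the objective for given centroids; it is not the package's solver.

```r
# Truncated lasso penalty: grows like |a| up to tau, then stays flat.
tlp <- function(a, tau) pmin(abs(a), tau)

# Objective: 0.5 * sum_i ||x_i - mu_i||^2
#            + lambda * sum_{i<j} TLP(||mu_i - mu_j||_2; tau)
# `data` and `mu` are both nvars x nobs (one column per observation).
prclust_objective <- function(data, mu, lambda, tau) {
  fit_term <- 0.5 * sum((data - mu)^2)
  # dist() works on rows, so transpose to get distances between columns.
  pair_dist <- as.vector(dist(t(mu)))
  fit_term + lambda * sum(tlp(pair_dist, tau))
}

set.seed(1)
data <- cbind(matrix(rnorm(20, 0, 0.33), 2), matrix(rnorm(20, 1, 0.33), 2))

# Candidate 1: no fusion, every observation is its own centroid.
obj_none <- prclust_objective(data, data, lambda = 1, tau = 0.5)

# Candidate 2: centroids fused to the mean of each generated cluster.
mu_two <- cbind(matrix(rowMeans(data[, 1:10]), 2, 10),
                matrix(rowMeans(data[, 11:20]), 2, 10))
obj_two <- prclust_objective(data, mu_two, lambda = 1, tau = 0.5)

c(no_fusion = obj_none, two_clusters = obj_two)
```

Because the penalty is capped at tau, pairs of centroids that are far apart pay a fixed cost, while pairs that are fused exactly pay nothing; this is what lets the penalty group observations without pulling distant clusters together.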

The return value is a list containing the following components.

`mu`	The estimated centroid of each observation.
`theta`	The estimated theta values for the data set; rarely needed in practice.
`group`	The group (cluster) assignment of each point.
`count`	The number of iterations performed.
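
A short sketch of how the returned list might be inspected. The helper `summarize_fit` is hypothetical, and it assumes only the documented components `group` and `count`; a mock list with that shape is used so the sketch runs without the package.

```r
# ASSUMPTION: `fit` is a list with components `group` and `count`,
# as documented above. summarize_fit is a hypothetical helper.
summarize_fit <- function(fit) {
  groups <- as.vector(fit$group)
  list(
    n_clusters = length(unique(groups)),  # number of distinct groups
    sizes      = table(groups),           # observations per group
    iterations = fit$count                # iterations until stopping
  )
}

# Mock result with the documented shape (not real PRclust output).
mock_fit <- list(group = c(1, 1, 2, 2, 2), count = 12)
summarize_fit(mock_fit)
```

With a real fit, the same call would be `summarize_fit(PRclust(data, lambda1, lambda2, tau))`.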

Choosing the tuning parameters can be time-consuming; in practice it is usually done by trial and error.
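
One way to organize the trial and error is a small grid over `lambda2` and `tau` that records how many groups each setting produces. This is a sketch under stated assumptions: `explore_grid` and `fit_fun` are hypothetical names, `lambda1` is fixed at 1, and the grid values are arbitrary starting points rather than recommendations.

```r
# Hypothetical helper: fit once per (lambda2, tau) pair and record the
# number of distinct groups. `fit_fun(data, lambda2, tau)` must return
# a list with a `group` component.
explore_grid <- function(fit_fun, data, lambda2_grid, tau_grid) {
  grid <- expand.grid(lambda2 = lambda2_grid, tau = tau_grid)
  grid$n_clusters <- vapply(seq_len(nrow(grid)), function(i) {
    fit <- fit_fun(data, grid$lambda2[i], grid$tau[i])
    length(unique(as.vector(fit$group)))
  }, integer(1))
  grid
}

# Runs only when the prclust package is installed.
if (requireNamespace("prclust", quietly = TRUE)) {
  set.seed(1)
  data <- cbind(matrix(rnorm(100, 0, 0.33), 2),
                matrix(rnorm(100, 1, 0.33), 2))
  fit_fun <- function(d, lambda2, tau) {
    prclust::PRclust(d, lambda1 = 1, lambda2 = lambda2, tau = tau)
  }
  print(explore_grid(fit_fun, data,
                     lambda2_grid = c(1, 3), tau_grid = c(0.3, 0.5)))
}
```

The resulting table makes it easier to see which settings fuse everything into one group and which leave every observation on its own, so the search can be narrowed from there.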

Chong Wu, Wei Pan

Pan, W., Shen, X., & Liu, B. (2013). Cluster analysis: unsupervised learning via supervised learning with a non-convex penalty. Journal of Machine Learning Research, 14(1), 1865-1889.

Wu, C., Kwon, S., Shen, X., & Pan, W. (2016). A New Algorithm and Theory for Penalized Regression-based Clustering. Journal of Machine Learning Research, 17(188), 1-25.

library("prclust")
# To illustrate the power and strengths of the PRclust method,
# the 6 examples from the original PRclust paper are provided.
################################################
### case 1
################################################
## generate the data
data = matrix(NA,2,100)
data[1,1:50] = rnorm(50,0,0.33)
data[2,1:50] = rnorm(50,0,0.33)
data[1,51:100] = rnorm(50,1,0.33)
data[2,51:100] = rnorm(50,1,0.33)
## set the tuning parameters
lambda1 = 1
lambda2 = 3
tau = 0.5
a = PRclust(data,lambda1,lambda2,tau)
a