# GCV: Calculate the Generalized Cross-Validation Statistic (GCV) in prclust: Penalized Regression-Based Clustering Method

## Description

Calculates the generalized cross-validation (GCV) statistic using the generalized degrees of freedom (GDF).

## Usage

```r
GCV(data, lambda1, lambda2, tau, sigma, B = 100,
    loss.method = c("quadratic", "lasso"),
    grouping.penalty = c("gtlp", "L1", "SCAD", "MCP"),
    algorithm = c("ADMM", "Quadratic"), epsilon = 0.001)
```

## Arguments

- `data`: Numeric data matrix.
- `lambda1`: Tuning parameter or step size; typically set to 1 for the quadratic penalty based algorithm and 0.4 for the revised ADMM.
- `lambda2`: Tuning parameter controlling the magnitude of the grouping penalty.
- `tau`: Tuning parameter of the grouping penalty.
- `sigma`: The perturbation size used when estimating the generalized degrees of freedom.
- `B`: The number of Monte Carlo replicates. The default is 100.
- `loss.method`: Character; may be abbreviated. "quadratic" selects the quadratic loss function and "lasso" the L1 loss function.
- `grouping.penalty`: Character; may be abbreviated. "gtlp" selects the generalized truncated lasso penalty (TLP), "L1" the lasso penalty, and "SCAD" and "MCP" two other non-convex penalties.
- `algorithm`: Character; may be abbreviated. The algorithm used to find the solution. The default, "ADMM", stands for DC-ADMM; "Quadratic" selects the quadratic penalty based algorithm.
- `epsilon`: The stopping criterion parameter. The default is 0.001.
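In many R functions the "may be abbreviated" convention for character arguments is implemented with base R's `match.arg()`. The sketch below assumes, without confirmation from the prclust source, that `GCV()` follows this convention; `resolve_gcv_options()` is a hypothetical stand-in that only shows how such abbreviations would be resolved.

```r
# Hypothetical stand-in with the same option vectors as GCV(); it only shows
# how base R's match.arg() resolves abbreviated character arguments.
resolve_gcv_options <- function(loss.method = c("quadratic", "lasso"),
                                grouping.penalty = c("gtlp", "L1", "SCAD", "MCP"),
                                algorithm = c("ADMM", "Quadratic")) {
  # With no argument supplied, match.arg() returns the first choice (the default)
  loss.method <- match.arg(loss.method)
  grouping.penalty <- match.arg(grouping.penalty)
  algorithm <- match.arg(algorithm)
  list(loss.method = loss.method,
       grouping.penalty = grouping.penalty,
       algorithm = algorithm)
}

str(resolve_gcv_options())
str(resolve_gcv_options(loss.method = "las", grouping.penalty = "SC", algorithm = "Q"))
```

Matching is by unambiguous, case-sensitive prefix, so `"SC"` resolves to `"SCAD"`, while `"scad"` would raise an error.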

## Details

An advantage of the regression approach to clustering is that many existing model selection methods for regression or supervised learning can be applied to clustering. We propose using generalized cross-validation (GCV). GCV can be regarded as an approximation to leave-one-out cross-validation (CV); because leave-one-out CV gives an approximately unbiased estimate of the prediction error, so does GCV.
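To illustrate the link between GCV and leave-one-out CV, here is a minimal sketch for a generic linear smoother `yhat = H %*% y`. Leave-one-out CV has the shortcut `mean(((y - yhat) / (1 - diag(H)))^2)`, and GCV replaces each `diag(H)[i]` by the average `trace(H) / n`. Ridge regression is used only as an example smoother, and `loocv_linear()` and `gcv_linear()` are hypothetical helpers; prclust's estimator is not linear, which is why it relies on GDF instead of `trace(H)`.

```r
# Leave-one-out CV via the hat-matrix shortcut for a linear smoother yhat = H y
loocv_linear <- function(y, H) {
  yhat <- as.vector(H %*% y)
  mean(((y - yhat) / (1 - diag(H)))^2)
}

# GCV: replace each diagonal element of H by the average trace(H) / n
gcv_linear <- function(y, H) {
  n <- length(y)
  yhat <- as.vector(H %*% y)
  mean((y - yhat)^2) / (1 - sum(diag(H)) / n)^2
}

# Example smoother: ridge regression, H = X (X'X + k I)^{-1} X'
set.seed(1)
n <- 40
X <- cbind(1, matrix(rnorm(n * 3), n, 3))
y <- as.vector(X %*% c(1, 2, 0, -1) + rnorm(n))
k <- 2
H <- X %*% solve(crossprod(X) + k * diag(ncol(X)), t(X))

c(LOOCV = loocv_linear(y, H), GCV = gcv_linear(y, H))
```

When all diagonal elements of `H` are equal, for example the grand-mean smoother `H = matrix(1 / n, n, n)`, the two criteria coincide exactly; otherwise GCV is an approximation.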

We use the generalized degrees of freedom (GDF) to account for the data-adaptive nature of estimating the centroids of the observations.
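The following is a minimal sketch of Monte Carlo GDF estimation by data perturbation, following the perturbation idea of Ye (1998): add N(0, sigma^2) noise to the data B times, refit, and sum over data entries the least-squares slopes of the refitted values on their perturbations. The `sigma` and `B` arguments of `GCV()` suggest a procedure of this kind, but this is not prclust's internal code; `estimate_gdf()` and its `fit_fun` argument are hypothetical, and a data matrix would be passed as the vector of its entries.

```r
# Monte Carlo GDF sketch (hypothetical helper, not prclust's internal code).
# fit_fun: any function mapping a numeric vector to fitted values of the same length.
estimate_gdf <- function(y, fit_fun, sigma, B = 100) {
  n <- length(y)
  # B x n matrix of N(0, sigma^2) perturbations, one row per replicate
  delta <- matrix(rnorm(n * B, sd = sigma), nrow = B, ncol = n)
  # Refit on each perturbed data set; the result is B x n
  fitted <- t(apply(delta, 1, function(d) fit_fun(y + d)))
  # For each entry i, least-squares slope of fitted[, i] on delta[, i]
  slopes <- vapply(seq_len(n), function(i) {
    cov(fitted[, i], delta[, i]) / var(delta[, i])
  }, numeric(1))
  sum(slopes)  # estimated GDF
}

set.seed(2)
y <- rnorm(30)
# Identity fit: each fitted value moves one-for-one with its perturbation,
# so the estimate equals length(y) = 30 up to floating-point error.
estimate_gdf(y, identity, sigma = 0.25)
# Grand-mean fit: the estimate is close to 1, up to Monte Carlo error.
estimate_gdf(y, function(v) rep(mean(v), length(v)), sigma = 0.25)
```

A more flexible fit tracks its perturbations more closely and therefore receives a larger GDF, which is the data-adaptive penalty that GCV needs.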

The selected tuning parameters are those giving the smallest GCV value.
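Choosing the tuning parameters with the smallest GCV amounts to a grid search. The sketch below is hypothetical: `select_by_gcv()`, `gcv_fun`, and `toy_gcv()` are illustrative names, and the commented wrapper assumes that the matrix returned by `GCV()` has a `GCV` column, as the example output suggests.

```r
# Hypothetical grid search: evaluate gcv_fun at every row of `grid`
# and return the row with the smallest GCV value.
select_by_gcv <- function(grid, gcv_fun) {
  gcv <- vapply(seq_len(nrow(grid)), function(j) {
    # pass the j-th parameter combination as named arguments
    do.call(gcv_fun, as.list(grid[j, , drop = FALSE]))
  }, numeric(1))
  grid$GCV <- gcv
  list(best = grid[which.min(gcv), , drop = FALSE], all = grid)
}

# With prclust loaded, gcv_fun could wrap GCV() (assumes a "GCV" column):
# gcv_fun <- function(lambda2, tau) {
#   GCV(data, lambda1 = 1, lambda2 = lambda2, tau = tau, sigma = 0.25, B = 10)[, "GCV"]
# }
# Toy stand-in so the sketch runs without prclust:
toy_gcv <- function(lambda2, tau) (lambda2 - 1)^2 + (tau - 0.5)^2

grid <- expand.grid(lambda2 = c(0.7, 1), tau = c(0.3, 0.5))
select_by_gcv(grid, toy_gcv)$best
```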

## Value

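A one-row matrix with columns `GDF` (the estimated generalized degrees of freedom), `GCV` (the generalized cross-validation statistic), `groupNum` (the number of estimated clusters), and `estSigmaSquare` (the estimated error variance), as shown in the example output.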

## Author(s)

Chong Wu, Wei Pan

## References

Pan, W., Shen, X., & Liu, B. (2013). Cluster analysis: unsupervised learning via supervised learning with a non-convex penalty. Journal of Machine Learning Research, 14(1), 1865-1889.

## Examples

```r
set.seed(1)
library("prclust")

# 2 x 50 data matrix: each column is one observation;
# columns 1-25 and 26-50 come from two clusters centred at (0, 0) and (1, 1)
data = matrix(NA, 2, 50)
data[1, 1:25] = rnorm(25, 0, 0.33)
data[2, 1:25] = rnorm(25, 0, 0.33)
data[1, 26:50] = rnorm(25, 1, 0.33)
data[2, 26:50] = rnorm(25, 1, 0.33)

# case 1
gcv1 = GCV(data, lambda1 = 1, lambda2 = 1, tau = 0.5, sigma = 0.25, B = 10)
gcv1

# case 2
gcv2 = GCV(data, lambda1 = 1, lambda2 = 0.7, tau = 0.3, sigma = 0.25, B = 10)
gcv2

# The combination of tuning parameters in case 1 is better than the one in
# case 2 because its GCV value is smaller.
```

### Example output

```
          GDF         GCV groupNum estSigmaSquare
[1,] 41.04462 0.001923263        4      0.1133867
          GDF        GCV groupNum estSigmaSquare
[1,] 84.59114 0.01146571       11      0.1766736
```
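The printed values satisfy, to within rounding, GCV = estSigmaSquare / (N - GDF), where N = 2 x 50 = 100 is the number of entries in the data matrix. This is an observation from the two outputs above rather than a documented formula; if `estSigmaSquare` equals RSS / (N - GDF), it would correspond to GCV = RSS / (N - GDF)^2. The check below uses only the printed numbers.

```r
# GDF, GCV and estSigmaSquare copied from the example output (groupNum omitted)
out <- rbind(
  case1 = c(GDF = 41.04462, GCV = 0.001923263, estSigmaSquare = 0.1133867),
  case2 = c(GDF = 84.59114, GCV = 0.01146571,  estSigmaSquare = 0.1766736)
)
N <- 2 * 50  # entries in the 2 x 50 data matrix

# Compare the printed GCV with estSigmaSquare / (N - GDF)
ratio <- out[, "estSigmaSquare"] / (N - out[, "GDF"])
signif(cbind(GCV = out[, "GCV"], ratio = ratio), 6)
```

Under this reading, a larger GDF shrinks the denominator N - GDF, which is how GCV penalizes the more flexible fit in case 2.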

prclust documentation built on May 2, 2019, 10:24 a.m.