PgaMsgl.cv: Cross validation of hyperparameters of PgaMsgl


Description

Performs k-fold cross validation over the hyperparameters mg (maximum number of groups) and mc (maximum number of single coefficients) of PgaMsgl.

Usage

PgaMsgl.cv(XX, YY, B0, model = c("L020v1", "L020v2", "L121"), Gm, mi = 1000, mg.v, mc.v, minlambda = 1e-5, rlambda = 0.98, mintau = 1e-5, rtau = 0.98, fold = 5, seed = 1, ncores)

Arguments

XX

Matrix X in the model Y=XB. Remember to use as.matrix() if your input is a data frame.

YY

Matrix Y in the model Y=XB. Remember to use as.matrix() if your input is a data frame.

B0

Initial value of the coefficient matrix B in the model Y=XB. Remember to use as.matrix() if your input is a data frame.

model

The sparse group lasso model to fit: one of "L020v1", "L020v2", or "L121".

Gm

Matrix describing the group structure of the coefficient matrix B. It is a matrix of group boundaries: each row defines one group, and the four columns give the row-start, row-end, column-start and column-end of that group. Row and column indices are 1-based.
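To illustrate this layout, the sketch below builds a group matrix for a hypothetical 4 x 6 coefficient matrix split into four 2 x 3 blocks. The dimensions, the choice of blocks, and the column names are assumptions made for the example and do not come from the package.

```r
# Hypothetical 4 x 6 coefficient matrix B split into four 2 x 3 blocks.
# Each row of Gm is one group: row-start, row-end, column-start, column-end (1-based).
Gm <- rbind(
  c(1, 2, 1, 3),
  c(1, 2, 4, 6),
  c(3, 4, 1, 3),
  c(3, 4, 4, 6)
)
# Column labels are illustrative only (an assumption, not required by the text above).
colnames(Gm) <- c("row.start", "row.end", "col.start", "col.end")

# Sanity check: mark every cell of B with the index of the group that covers it.
cover <- matrix(0L, nrow = 4, ncol = 6)
for (g in seq_len(nrow(Gm))) {
  cover[Gm[g, 1]:Gm[g, 2], Gm[g, 3]:Gm[g, 4]] <- g
}
print(cover)
```

Printing cover is a quick way to confirm that the boundaries describe the intended blocks before passing Gm on.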

mi

Maximum number of iterations allowed, default value is 1000.

mg.v

A vector of candidate values for mg, the maximum number of groups to be retained in matrix B. Each value is evaluated by cross validation.

mc.v

A vector of candidate values for mc, the maximum number of single coefficients to be retained in matrix B. Each value is evaluated by cross validation.
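Because every combination of mg and mc is evaluated, the amount of work grows with the product of the two vector lengths. The sketch below counts the combinations for the candidate values used in the Examples section, assuming the function searches the full grid; the names grid and n.combos are introduced here for illustration only.

```r
# Candidate values as in the Examples section (written with the products evaluated).
mg.v <- seq(from = 1, to = 80, by = 2)
mc.v <- seq(from = 25, to = 1250, by = 25)

# ASSUMPTION: every (mg, mc) pair is evaluated, i.e. a full grid search.
grid <- expand.grid(mg = mg.v, mc = mc.v)
n.combos <- nrow(grid)

cat("values of mg:", length(mg.v), "\n")
cat("values of mc:", length(mc.v), "\n")
cat("combinations:", n.combos, "\n")
```

With k-fold cross validation each combination is fitted once per fold, so a coarser grid may be a reasonable first pass when run time matters.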

minlambda

Minimum value of lambda. Only used when model = "L020v2", default value is 1e-5.

rlambda

Rate of lambda decrease. Only used when model = "L020v2", default value is 0.98.

mintau

Minimum value of tau. Only used when model = "L020v2", default value is 1e-5.

rtau

Rate of tau decrease. Only used when model = "L020v2", default value is 0.98.

fold

Number of folds for k-fold cross validation, default value is 5.

seed

Numeric value passed to set.seed() when generating the test and evaluation samples for k-fold cross validation, default value is 1.
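The role of fold and seed can be sketched as follows. This is a generic way of drawing a reproducible k-fold split and is not necessarily how PgaMsgl.cv partitions the samples internally; the sample size n and the names fold.id, test.idx and train.idx are hypothetical.

```r
n    <- 23   # hypothetical number of samples (rows of XX and YY)
fold <- 5
seed <- 1

# Fixing the seed makes the random fold assignment reproducible.
set.seed(seed)

# Assign each sample to one of `fold` folds of nearly equal size, in random order.
fold.id <- sample(rep(seq_len(fold), length.out = n))

for (k in seq_len(fold)) {
  test.idx  <- which(fold.id == k)   # samples held out in fold k
  train.idx <- which(fold.id != k)   # samples used for fitting in fold k
  # A model would be fitted on rows train.idx and evaluated on rows test.idx.
}

print(table(fold.id))
```

Rerunning with the same seed reproduces the same split, which keeps cross validation results comparable across runs.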

ncores

The number of cores to use for parallel execution. Passed to registerDoMC() in the doMC package.

Value

rss

A vector of the residual sum of squares (RSS) of model fitting with each combination of parameters.

RMSE

A vector of the root-mean-square error (RMSE) of model fitting with each combination of parameters.

Rsquare

A vector of the R^2 of model fitting with each combination of parameters.

rss.matr

A matrix of the residual sum of squares (RSS) of model fitting with each combination of parameters.

RMSE.matr

A matrix of the root-mean-square error (RMSE) of model fitting with each combination of parameters.

Rsquare.matr

A matrix of the R^2 of model fitting with each combination of parameters.
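For reference, the three measures can be computed from observed and predicted response matrices as sketched below. These are the textbook definitions; the exact normalization used inside PgaMsgl.cv is not stated here, so treat the details (averaging over all entries, total sum of squares around the grand mean) as assumptions. Y and Yhat are simulated placeholders.

```r
set.seed(1)
# Hypothetical observed responses and predictions (simulated, for illustration only).
Y    <- matrix(rnorm(20), nrow = 10, ncol = 2)
Yhat <- Y + matrix(rnorm(20, sd = 0.1), nrow = 10, ncol = 2)

rss  <- sum((Y - Yhat)^2)        # residual sum of squares
rmse <- sqrt(rss / length(Y))    # root-mean-square error over all entries (assumed normalization)
tss  <- sum((Y - mean(Y))^2)     # total sum of squares around the grand mean (assumed)
r2   <- 1 - rss / tss            # R^2

print(c(rss = rss, rmse = rmse, r2 = r2))
```

Lower RSS and RMSE and higher R^2 indicate a better fit, which is why the Examples section uses which.min() for the first two and which.max() for the third.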

mg.v

A vector of values of the parameter mg tested.

mc.v

A vector of values of the parameter mc tested.
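The matrix-valued results can be used to read off the best pair of hyperparameters directly. The sketch below does this for a small made-up RSS matrix, under the assumption that rows follow mg.v and columns follow mc.v; this orientation is not stated above, so check dim() of the returned matrix against the lengths of mg.v and mc.v before relying on it.

```r
mg.v <- c(10, 20, 30)
mc.v <- c(100, 200)

# Hypothetical RSS matrix (made-up numbers).
# ASSUMPTION: rows correspond to mg.v and columns to mc.v.
rss.matr <- matrix(c(5.0, 3.2, 3.9,
                     4.1, 2.7, 3.5),
                   nrow = length(mg.v), ncol = length(mc.v))

# Row and column position of the smallest RSS (first one if there are ties).
best <- which(rss.matr == min(rss.matr), arr.ind = TRUE)[1, ]

best.mg <- mg.v[best["row"]]
best.mc <- mc.v[best["col"]]

cat("selected mg:", best.mg, " selected mc:", best.mc, "\n")
```

The same pattern applies to RMSE.matr, and to Rsquare.matr with max() in place of min().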

Author(s)

Yiming Qin

Examples

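# Load the lowD example data.
data(lowD)

# Candidate values for mg (maximum number of groups) and mc (maximum number of single coefficients).
mg.v <- seq(from=0.01*100, to=0.8*100, by=0.02*100)
mc.v <- seq(from = 0.01*2500, to = 0.5*2500, by = 0.01*2500)

# Run 5-fold cross validation over the candidate values on 4 cores.
result.cv <- PgaMsgl.cv(lowD$X, lowD$Y, lowD$B0, model="L121", lowD$Gm, lowD$mi, mg.v, mc.v, fold=5, seed=1, ncores=4)

# Select the hyperparameters with the smallest RSS.
grp.max.cv.rss <- result.cv$mg.v[which.min(result.cv$rss)]
coe.max.cv.rss <- result.cv$mc.v[which.min(result.cv$rss)]

# Select the hyperparameters with the smallest RMSE.
grp.max.cv.rmse <- result.cv$mg.v[which.min(result.cv$RMSE)]
coe.max.cv.rmse <- result.cv$mc.v[which.min(result.cv$RMSE)]

# Select the hyperparameters with the largest R^2.
grp.max.cv.r2 <- result.cv$mg.v[which.max(result.cv$Rsquare)]
coe.max.cv.r2 <- result.cv$mc.v[which.max(result.cv$Rsquare)]

# Alternatively, select the hyperparameters with the one-standard-error rule.
# This uses the components rss.se, RMSE.se and Rsquare.se of the result.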

index.minrss <- which.min(result.cv$rss)
grp.max.cv.rss <- result.cv$mg.v[ min( which( result.cv$rss < result.cv$rss[index.minrss] + result.cv$rss.se[index.minrss] ) ) ]
coe.max.cv.rss <- result.cv$mc.v[ min( which( result.cv$rss < result.cv$rss[index.minrss] + result.cv$rss.se[index.minrss] ) ) ]

index.minrmse <- which.min(result.cv$RMSE)
grp.max.cv.rmse <- result.cv$mg.v[ min( which( result.cv$RMSE <= result.cv$RMSE[index.minrmse] + result.cv$RMSE.se[index.minrmse] ) ) ]
coe.max.cv.rmse <- result.cv$mc.v[ min( which( result.cv$RMSE <= result.cv$RMSE[index.minrmse] + result.cv$RMSE.se[index.minrmse] ) ) ]

index.minr2 <- which.max(result.cv$Rsquare)
grp.max.cv.r2 <- result.cv$mg.v[ min( which( result.cv$Rsquare >= result.cv$Rsquare[index.minr2] - result.cv$Rsquare.se[index.minr2] ) ) ]
coe.max.cv.r2 <- result.cv$mc.v[ min( which( result.cv$Rsquare >= result.cv$Rsquare[index.minr2] - result.cv$Rsquare.se[index.minr2] ) ) ]

# Time PgaMsgl fits with fixed values (10 and 120) and with the values selected by each criterion.
system.time(try1 <- PgaMsgl(lowD$X, lowD$Y, lowD$B0, model="L121", lowD$Gm, lowD$mi, 10, 120))
system.time(try2 <- PgaMsgl(lowD$X, lowD$Y, lowD$B0, model="L121", lowD$Gm, lowD$mi, grp.max.cv.rss, coe.max.cv.rss))
system.time(try3 <- PgaMsgl(lowD$X, lowD$Y, lowD$B0, model="L121", lowD$Gm, lowD$mi, grp.max.cv.rmse, coe.max.cv.rmse))
system.time(try4 <- PgaMsgl(lowD$X, lowD$Y, lowD$B0, model="L121", lowD$Gm, lowD$mi, grp.max.cv.r2, coe.max.cv.r2))

TriangularCell/PgaMsgl documentation built on May 28, 2019, 9:33 a.m.