sprmsCV: Cross validation method for SPRM regression models.

Description

k-fold cross validation for the selection of the number of components and the sparsity parameter for sparse partial robust M regression.

Usage

sprmsCV(formula, data, as, etas, nfold = 10, fun = "Hampel", probp1 = 0.95, 
hampelp2 = 0.975, hampelp3 = 0.999, center = "median", scale = "qn", 
plot = TRUE, numit = 100, prec = 0.01, alpha = 0.15)

Arguments

formula

an object of class formula.

data

a data frame or list which contains the variables given in formula.

as

a vector with positive integers, which are the number of SPRM components to be estimated in the models.

etas

a vector of values for the sparsity tuning parameter. Values have to be between 0 and 1.

nfold

the number of folds used for cross validation; the default is nfold=10 for 10-fold CV.

fun

an internal weighting function for case weights. Choices are "Hampel" (preferred), "Huber" or "Fair".

probp1

the 1-alpha value at which to set the first outlier cutoff for the weighting function.

hampelp2

the 1-alpha value for the second cutoff. Only applies to fun="Hampel".

hampelp3

the 1-alpha value for the third cutoff. Only applies to fun="Hampel".

center

type of centering of the data, given as a string that matches an R function, e.g. "mean" or "median".

scale

type of scaling of the data, given as a string that matches an R function, e.g. "sd" or "qn", or alternatively "no" for no scaling.

plot

logical, default is TRUE. If TRUE, two contour plots are generated over the number of components and the sparsity parameter. The first contour plot shows the trimmed mean squared error of the prediction of the response (see Details); the second shows the number of variables in the model.

numit

the maximum number of iterations for the convergence of the coefficient estimates.

prec

a value for the precision of estimation of the coefficients.

alpha

value used for alpha trimmed mean squared error, which is the cross validation criterion (see Details).
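
To make the roles of fun, probp1, hampelp2 and hampelp3 more concrete, here is a minimal sketch of a standard three-part Hampel weight function. It is not the sprm implementation: the helper name hampel_weights is hypothetical, and the sketch assumes that the three cutoffs are standard normal quantiles of the given probabilities and that the input is a vector of robustly standardized residuals.

```r
# Hypothetical helper for illustration; not part of the sprm package.
hampel_weights <- function(r, probp1 = 0.95, hampelp2 = 0.975, hampelp3 = 0.999) {
  # Assumption: cutoffs are standard normal quantiles of the probabilities.
  c1 <- qnorm(probp1)
  c2 <- qnorm(hampelp2)
  c3 <- qnorm(hampelp3)
  a <- abs(r)
  w <- numeric(length(a))            # weight 0 beyond the third cutoff
  w[a <= c1] <- 1                    # regular observations keep full weight
  mid <- a > c1 & a <= c2
  w[mid] <- c1 / a[mid]              # moderate outliers are downweighted
  far <- a > c2 & a <= c3
  w[far] <- c1 * (c3 - a[far]) / ((c3 - c2) * a[far])  # decays to 0 at c3
  w
}

# Weights for a regular point, two increasingly outlying points, and a gross outlier
hampel_weights(c(0, 1.8, 2.5, 5))
```

Under these assumptions, raising probp1 lets more observations keep full weight, while hampelp3 sets the point beyond which an observation no longer influences the fit.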

Details

The alpha-trimmed mean squared error of the predicted response over all observations is used as a robust decision criterion to choose the optimal model.
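
As an illustration of this criterion, the following sketch computes a one-sided trimmed mean of squared prediction errors. It assumes that trimming discards the largest alpha fraction of the squared errors; the exact rounding used inside sprmsCV may differ, and the helper name trimmed_mse is hypothetical. Note that base R's mean(x, trim = alpha) trims both tails, so it is not used here.

```r
# Hypothetical helper for illustration; not part of the sprm package.
trimmed_mse <- function(spe, alpha = 0.15) {
  n <- length(spe)
  keep <- n - floor(alpha * n)      # number of smallest squared errors retained
  mean(sort(spe)[seq_len(keep)])    # average after dropping the largest ones
}

spe <- c(0.5, 1.2, 0.8, 400)        # the last value mimics an outlying observation
mean(spe)                           # ordinary MSE, dominated by the outlier
trimmed_mse(spe, alpha = 0.25)      # trimmed MSE, largest squared error discarded
```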

For some combinations of "a" and "eta" the model cannot be estimated. In that case the function issues the warning 'CV broke off at "a" and "eta"'.

Value

opt.mod

object of class sprm fitted with the selected parameters (see sprms).

spe

array with squared prediction error for each observation and each combination of tuning parameters

nzcoef

array with the number of variables in the model for each cross validation subset and each combination of tuning parameters
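
The spe array can be used to recompute the cross validation criterion, for example with a different trimming level. The sketch below uses a synthetic array in place of res$spe so that it runs without the package. The dimension order (observation, number of components, eta) is an assumption, and the names a_grid, eta_grid and tmse are hypothetical.

```r
# Synthetic stand-in for res$spe; the dimension order
# (observation, a, eta) is an assumption, not confirmed by the documentation.
set.seed(1)
a_grid <- 1:2
eta_grid <- c(0.2, 0.5, 0.8)
spe <- array(rexp(50 * 2 * 3), dim = c(50, 2, 3))
spe[, 2, 2] <- spe[, 2, 2] / 10     # make one combination clearly best

# One-sided trimmed mean of squared errors for every (a, eta) pair
alpha <- 0.15
tmse <- apply(spe, c(2, 3), function(v) {
  keep <- length(v) - floor(alpha * length(v))
  mean(sort(v)[seq_len(keep)])
})

# Grid position of the smallest trimmed MSE and the matching parameter values
best <- which(tmse == min(tmse), arr.ind = TRUE)[1, ]
c(a = a_grid[best[1]], eta = eta_grid[best[2]])
```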

Author(s)

Irene Hoffmann

References

Hoffmann, I., Serneels, S., Filzmoser, P., Croux, C. (2015). Sparse partial robust M regression. Chemometrics and Intelligent Laboratory Systems, 149, 50-59.

Serneels, S., Croux, C., Filzmoser, P., Van Espen, P.J. (2005). Partial Robust M-Regression. Chemometrics and Intelligent Laboratory Systems, 79, 55-64.

See Also

sprms, plot.sprm, predict.sprm, prmsCV

Examples

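set.seed(50235)
# Two groups of observations: 20 at level 3 and 30 at level 4
U1 <- c(rep(3,20), rep(4,30))
U2 <- rep(3.5,50)
# 5 predictors with nonzero coefficients (X1) and 20 with zero coefficients (X2)
X1 <- replicate(5, U1+rnorm(50))
X2 <- replicate(20, U2+rnorm(50))
X <- cbind(X1,X2)
beta <- c(rep(1, 5), rep(0,20))
# 45 regular errors plus 5 outlying errors centered at -20
e <- c(rnorm(45,0,1.5),rnorm(5,-20,1))
y <- X%*%beta + e
d <- as.data.frame(X)
d$y <- y
# 5-fold CV over 1 to 2 components and a grid of sparsity values
res <- sprmsCV(y~., data=d, as=1:2, etas=seq(0,0.9,0.2), nfold=5, fun="Hampel", prec=0.1)
summary(res$opt.mod)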

sprm documentation built on May 2, 2019, 9:57 a.m.