sprmdaCV: Cross validation method for sparse PRM classification models.
In sprm: Sparse and Non-Sparse Partial Robust M Regression and Classification


k-fold cross validation for selecting the number of components and the sparsity parameter for sparse PRM classification.

sprmdaCV(formula, data, as, etas, nfold = 10, fun = "Hampel", 
probp1 = 0.95, hampelp2 = 0.975, hampelp3 = 0.999, probp4=0.01, yweights = TRUE,
class = c("regfit", "lda"), prior = c(0.5, 0.5), center = "median", scale = "qn", 
print = FALSE, plot = TRUE, numit = 100, prec = 0.01)

`formula`	a formula, e.g. `group ~ X1 + X2`, where `group` is a factor with two levels or a numeric vector coding class membership with 1 and -1, and `X1`, `X2` are numeric variables.
`data`	a data frame or list which contains the variables given in formula. The response specified in the formula needs to be a numeric vector coding the class membership with 1 and -1 or a factor with two levels.
`as`	a vector with positive integers, which are the number of SPRM components to be estimated in the models.
`etas`	vector of values for the tuning parameter for the sparsity. Values have to be between 0 and 1.
`nfold`	the number of folds used for cross validation, default is `nfold=10` for 10-fold CV.
`fun`	an internal weighting function for case weights. Choices are `"Hampel"` (preferred), `"Huber"` or `"Fair"`.
`probp1`	the 1-alpha value at which to set the first outlier cutoff for the weighting function.
`hampelp2`	the 1-alpha value for the second cutoff. Only applies to `fun="Hampel"`.
`hampelp3`	the 1-alpha value for the third cutoff. Only applies to `fun="Hampel"`.
`probp4`	a quantile close to zero for the cutoff for potentially wrong class labels (see References). Ignored if `yweights=FALSE`.
`yweights`	logical; if TRUE weights are calculated for observations with potentially wrong class labels.
`class`	type of classification; choices are "regfit" or "lda". If "regfit" an object of class prm is returned.
`prior`	vector of length 2 with prior probabilities of the groups; only used if class="lda".
`center`	type of centering of the data in form of a string that matches an R function, e.g. "mean" or "median".
`scale`	type of scaling for the data in form of a string that matches an R function, e.g. "sd" or "qn" or alternatively "no" for no scaling.
`print`	logical, default is `FALSE`. If `TRUE` the variables included in each component are reported.
`plot`	logical, default is `TRUE`. If `TRUE` two contour plots are generated over the number of components and the sparsity parameter. The first contour plot shows the mean weighted misclassification rate (see Details), the second the number of variables in the model.
`numit`	the maximum number of iterations for the convergence of the coefficient estimates.
`prec`	a value for the precision of estimation of the coefficients.
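
To illustrate how the three cutoffs `probp1`, `hampelp2` and `hampelp3` interact for `fun="Hampel"`, here is a minimal sketch of a Hampel-type weight function in base R. It is not the package's internal code: the helper name `hampel_weights` is hypothetical, and taking the cutoffs as standard normal quantiles of the 1-alpha values is an assumption made for this example.

```r
# Sketch of Hampel-type case weights (hypothetical helper, not part of sprm).
hampel_weights <- function(r, probp1 = 0.95, hampelp2 = 0.975, hampelp3 = 0.999) {
  # Assumption: cutoffs are standard normal quantiles of the 1-alpha values.
  c1 <- qnorm(probp1)
  c2 <- qnorm(hampelp2)
  c3 <- qnorm(hampelp3)

  a <- abs(r)
  w <- numeric(length(a))            # weight 0 by default (beyond third cutoff)

  w[a <= c1] <- 1                    # regular observations keep full weight

  in_mid <- a > c1 & a <= c2         # between first and second cutoff
  w[in_mid] <- c1 / a[in_mid]

  in_tail <- a > c2 & a <= c3        # between second and third cutoff
  w[in_tail] <- c1 * (c3 - a[in_tail]) / ((c3 - c2) * a[in_tail])

  w
}

# Standardized residuals of increasing size receive decreasing weights.
hampel_weights(c(0, 1, 1.8, 2.5, 4))
```

Under these assumptions, observations within the first cutoff get weight 1, weights decline between the first and third cutoff, and anything beyond the third cutoff gets weight 0.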

The robust cross validation criterion is a weighted misclassification rate. Class assignment of outliers is unreliable. Therefore, the case weights from the model are used to downweight the influence of observations which were detected as outliers on the misclassification rate.
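
The idea of a weighted misclassification rate can be sketched in a few lines of base R. The helper `weighted_misclass` is hypothetical, and normalizing by the sum of the weights is an assumption for illustration; the package may compute the criterion differently.

```r
# Sketch of a weighted misclassification rate (hypothetical helper, not part of sprm).
weighted_misclass <- function(y_true, y_pred, w) {
  stopifnot(length(y_true) == length(y_pred), length(w) == length(y_true))
  # Assumption: misclassified cases count with their case weight,
  # normalized by the total weight.
  sum(w * (y_true != y_pred)) / sum(w)
}

y_true <- c(1, 1, 1, -1, -1, -1)
y_pred <- c(1, 1, -1, -1, -1, 1)   # observations 3 and 6 are misclassified
w      <- c(1, 1, 0.1, 1, 1, 1)    # observation 3 was flagged as an outlier

weighted_misclass(y_true, y_pred, w)  # outlier 3 contributes little to the rate
mean(y_true != y_pred)                # unweighted rate for comparison
```

Because the misclassified outlier carries a small weight, the weighted rate is lower than the unweighted one in this toy example.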

For some combinations of `a` and `eta` the model cannot be estimated. In that case the function issues a warning of the form "CV broke off at a and eta" for the affected combination.

`opt.mod`	object of class sprmda with the selected parameters. (see `sprms`)
`pcm`	array with predicted class membership of each observation and for each combination of tuning parameters
`nzcoef`	array with the number of variables in the model for each cross validation subset and each combination of tuning parameters
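
As a sketch of how `nzcoef` might be summarized, the example below averages the number of variables over the cross validation subsets for each combination of tuning parameters. It uses a mock array, and the dimension order (folds, then `as`, then `etas`) is an assumption; check `dim(smod$nzcoef)` on a real result before applying it.

```r
# Mock stand-in for smod$nzcoef; the dimension layout is an assumption.
set.seed(1)
a_grid   <- 2:3
eta_grid <- c(0.1, 0.9)
nfold    <- 5

nzcoef <- array(
  sample(1:4, nfold * length(a_grid) * length(eta_grid), replace = TRUE),
  dim = c(nfold, length(a_grid), length(eta_grid)),
  dimnames = list(fold = paste0("fold", seq_len(nfold)),
                  a    = as.character(a_grid),
                  eta  = as.character(eta_grid))
)

# Average number of variables per (a, eta) combination across folds.
mean_nz <- apply(nzcoef, c(2, 3), mean)
mean_nz
```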

Irene Hoffmann

Hoffmann, I., Filzmoser, P., Serneels, S., Varmuza, K., Sparse and robust PLS for binary classification.

sprmda, biplot.sprmda, predict.sprmda, prmdaCV

data(iris)
data <- droplevels(subset(iris,iris$Species!="setosa"))
## for demonstration with only two values in etas
smod <- sprmdaCV(Species~.,data, as=2:3, etas=c(0.1,0.9), nfold=5, 
                 class="lda", numit=10, prec=0.1)
biplot(smod$opt.mod)
## Not run: 
## in practice a finer grid of as and etas should be searched 
## at the expense of computation time
smod <- sprmdaCV(Species~.,data, as=1:4, etas=seq(0.1,0.9,0.1), nfold=5, 
                 class="lda", numit=10, prec=0.1)

## End(Not run)