prmdaCV: Cross validation method for PRM classification models.


Description

k-fold cross validation for the selection of the number of components for PRM classification.

Usage

prmdaCV(formula, data, as, nfold = 10, fun = "Hampel", probp1 = 0.95, hampelp2 = 0.975,
  hampelp3 = 0.999, probp4 = 0.01, yweights = TRUE,
  class = c("regfit", "lda"), prior = c(0.5, 0.5), center = "median", scale = "qn",
  plot = TRUE, numit = 100, prec = 0.01)

Arguments

formula

a formula, e.g. group ~ X1 + X2, where group is a factor with two levels and X1, X2 are numeric variables.

data

a data frame or list containing the variables given in formula. The response specified in the formula must be either a numeric vector coding the class membership as 1 and -1 or a factor with two levels.

as

a vector of positive integers giving the numbers of PRM components to be estimated in the models.

nfold

the number of folds used for cross validation; the default is nfold=10 for 10-fold CV.

fun

an internal weighting function for case weights. Choices are "Hampel" (preferred), "Huber" or "Fair".

probp1

the 1-alpha value at which to set the first outlier cutoff for the weighting function.

hampelp2

the 1-alpha value for the second cutoff. Only applies to fun="Hampel".

hampelp3

the 1-alpha value for the third cutoff. Only applies to fun="Hampel".

probp4

a quantile close to zero used as the cutoff for potentially wrong class labels (see References). Ignored if yweights=FALSE.

yweights

logical; if TRUE weights are calculated for observations with potentially wrong class labels.

class

type of classification; choices are "regfit" or "lda". If "regfit" an object of class prm is returned.

prior

vector of length 2 with prior probabilities of the groups; only used if class="lda".

center

type of centering of the data, given as a string that matches an R function, e.g. "mean" or "median".

scale

type of scaling of the data, given as a string that matches an R function, e.g. "sd" or "qn", or alternatively "no" for no scaling.

plot

logical, default is TRUE. If TRUE a plot is generated with a mean weighted misclassification rate for each model (see Details).

numit

the maximum number of iterations for the convergence of the coefficient estimates.

prec

a value for the precision of estimation of the coefficients.

Details

The robust cross validation criterion is a weighted misclassification rate. Class assignment of outliers is unreliable. Therefore, the case weights from the model are used to downweight the influence that observations detected as outliers have on the misclassification rate.
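
As a rough illustration of this criterion, the sketch below computes a weighted misclassification rate on toy data. The helper weighted_mcr and the normalisation by the sum of the weights are assumptions made for illustration only; the exact formula used inside prmdaCV may differ (for example, it may average over the two classes).

```r
# Hypothetical helper (not part of sprm): weighted misclassification rate.
# Assumption: misclassified cases are weighted by w and the sum is
# normalised by the total weight.
weighted_mcr <- function(truth, pred, w) {
  stopifnot(length(truth) == length(pred), length(truth) == length(w))
  stopifnot(all(w >= 0), sum(w) > 0)
  miss <- as.numeric(truth != pred)  # 1 if misclassified, 0 otherwise
  sum(w * miss) / sum(w)
}

# Toy data: class labels coded as 1 and -1
truth <- c(1, 1, 1, -1, -1, -1)
pred  <- c(1, 1, -1, -1, -1, 1)
# Made-up case weights: the third observation is treated as an outlier
w     <- c(1, 1, 0.1, 1, 1, 1)

unweighted <- mean(truth != pred)           # plain misclassification rate
weighted   <- weighted_mcr(truth, pred, w)  # outlier contributes less
```

In this toy example the misclassified outlier carries little weight, so the weighted rate is lower than the unweighted one.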

Value

opt.mod

object of class prmda (see prmda).

pcm

matrix with predicted class membership for each observation and each number of components.
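
A matrix of this kind can be summarised per number of components. The sketch below uses a made-up stand-in, pcm_toy, and assumes that rows correspond to observations and columns to the candidate numbers of components; the actual layout and dimnames of pcm are not confirmed here. It computes a plain, unweighted misclassification rate, which is not the weighted criterion that prmdaCV uses for model selection.

```r
# Toy stand-in for pcm (assumed layout: rows = observations,
# columns = number of components); all values are made up.
truth <- c(1, 1, -1, -1, 1)
pcm_toy <- cbind(a1 = c(1, -1, -1, 1, 1),
                 a2 = c(1, 1, -1, -1, -1))

# Unweighted misclassification rate for each column;
# truth is recycled down each column of the matrix.
mcr_by_a <- colMeans(pcm_toy != truth)

# Column with the lowest unweighted rate
best <- names(which.min(mcr_by_a))
```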

Author(s)

Irene Hoffmann

References

Hoffmann, I., Filzmoser, P., Serneels, S., Varmuza, K., Sparse and robust PLS for binary classification.

See Also

prmda, biplot.prmda, predict.prmda, sprmdaCV

Examples

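library(sprm)
data(iris)
# keep two species only, since the response must have two levels
data <- droplevels(subset(iris, Species != "setosa"))
# cross-validate models with 1 and 2 components using LDA classification
mod <- prmdaCV(Species ~ ., data, as = 1:2, class = "lda", numit = 10, prec = 0.1)
biplot(mod$opt.mod)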

sprm documentation built on May 2, 2019, 9:57 a.m.