Robust PLS for binary classification

Description

Robust PLS and discriminant analysis for binary classification problems. This method for dimension reduction and discriminant analysis yields a classification model that is as interpretable as a partial least squares model and is robust to both vertical outliers and leverage points.

Usage

prmda(formula, data, a, fun = "Hampel", probp1 = 0.95, hampelp2 = 0.975, 
hampelp3 = 0.999, probp4 = 0.01, yweights = TRUE, 
class = c("regfit", "lda"), prior = c(0.5, 0.5), 
center = "median", scale = "qn", 
numit = 100, prec = 0.01)

Arguments

formula

a formula, e.g. group ~ X1 + X2, where group is a factor with two levels or a numeric vector coding class membership as 1 and -1, and X1, X2 are numeric variables.

data

a data frame or list which contains the variables given in formula. The response specified in the formula needs to be a numeric vector coding the class membership as 1 and -1, or a factor with two levels.

a

the number of PRM components to be estimated in the model.

fun

an internal weighting function for case weights. Choices are "Hampel" (preferred), "Huber" or "Fair".

probp1

a quantile close to 1 at which to set the first outlier cutoff for the weighting function.

hampelp2

a quantile close to 1 with probp1<hampelp2 for the second cutoff. Only applies to fun="Hampel".

hampelp3

a quantile close to 1 with probp1<hampelp2<hampelp3 for the third cutoff. Only applies to fun="Hampel".

probp4

a quantile close to zero for the cutoff for potentially wrong class labels (see References). Ignored if yweights=FALSE.

yweights

logical; if TRUE weights are calculated for observations with potentially wrong class labels.

class

type of classification; choices are "regfit" or "lda" (see Details). If "regfit", an object of class prm is returned.

prior

vector of length 2 with prior probabilities of the groups; only used if class="lda".

center

type of centering of the data in form of a string that matches an R function, e.g. "mean" or "median".

scale

type of scaling for the data in form of a string that matches an R function, e.g. "sd" or "qn" or alternatively "no" for no scaling.

numit

the maximum number of iterations for the convergence of the case weights.

prec

a value for the precision of the convergence of the case weights.
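The cutoffs probp1, hampelp2 and hampelp3 can be read as the three knots of a Hampel-type weight function. The following is a minimal sketch of such a function in base R, not the package's internal code. It assumes that the cutoffs are standard normal quantiles applied to standardized values; the function name hampel_weight is hypothetical.

```r
# Sketch of a Hampel-type weight function (illustrative only).
# Assumption: cutoffs are standard normal quantiles of the three probabilities.
hampel_weight <- function(x, probp1 = 0.95, hampelp2 = 0.975, hampelp3 = 0.999) {
  q1 <- qnorm(probp1)
  q2 <- qnorm(hampelp2)
  q3 <- qnorm(hampelp3)
  ax <- abs(x)

  # Values inside the first cutoff keep full weight
  w <- rep(1, length(x))

  # Between the first and second cutoff the weight decays like q1 / |x|
  mid <- ax > q1 & ax <= q2
  w[mid] <- q1 / ax[mid]

  # Between the second and third cutoff the weight descends linearly to zero
  outer <- ax > q2 & ax <= q3
  w[outer] <- q1 * (q3 - ax[outer]) / ((q3 - q2) * ax[outer])

  # Beyond the third cutoff the observation is discarded
  w[ax > q3] <- 0
  w
}

hampel_weight(c(0, 1.8, 2.5, 10))
```

With this shape, moderate outliers are downweighted while extreme ones receive weight zero, which is why three increasing quantiles are required for fun="Hampel".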

Details

For class="lda" a robust LDA model is estimated in the PRM score space; for class="regfit" the model is a robust PLS regression model on the binary response.
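To illustrate what an LDA rule in a score space computes, here is a minimal base R sketch. It takes two group centers, a pooled within-group covariance and prior probabilities as given and returns posterior probabilities from the standard two-class LDA rule. The function lda_posterior and the toy inputs are hypothetical; how the package obtains robust estimates of the centers and covariance is not shown here.

```r
# Two-class LDA posteriors from given centers, pooled covariance and priors.
# Illustrative sketch; the inputs are assumed to be estimated elsewhere.
lda_posterior <- function(scores, m1, m2, cov, prior = c(0.5, 0.5)) {
  cov_inv <- solve(cov)

  # Linear discriminant score of each observation for each group
  d1 <- scores %*% cov_inv %*% m1 -
    0.5 * drop(t(m1) %*% cov_inv %*% m1) + log(prior[1])
  d2 <- scores %*% cov_inv %*% m2 -
    0.5 * drop(t(m2) %*% cov_inv %*% m2) + log(prior[2])

  # Posterior probability of group 1 (logistic function of the difference)
  p1 <- drop(1 / (1 + exp(d2 - d1)))
  cbind(p1 = p1, p2 = 1 - p1)
}

# Toy score matrix with two components: one point at each center, one halfway
tscores <- rbind(c(-2, 0), c(2, 0), c(0, 0))
post <- lda_posterior(tscores, m1 = c(-2, 0), m2 = c(2, 0), cov = diag(2))
post
```

Each observation would be assigned to the group with the larger posterior probability; changing prior shifts the decision boundary towards the less likely group.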

Value

prmda returns an object of class prmda.

Functions summary, predict and biplot are available. The generic functions coefficients, fitted.values and residuals can also be used to extract the corresponding elements from the prmda object.

scores

the matrix of scores.

R

Direction vectors (also called weighting vectors or rotation matrix) used to obtain the scores: scores = Xs %*% R.

loadings

the matrix of loadings.

w

the overall case weights used for robust dimension reduction and classification (depending on the weight function). w = sqrt(wy*wt).

wt

the group wise obtained case weights in the score space.

wy

the case weights for potentially mislabeled observations.

Results from LDA model:

ldamod

list with robust pooled within-group covariance (cov) and the two robust group centers (m1, m2) in the score space

ldafit

posterior probabilities from robust LDA in the score space.

ldaclass

predicted class labels from robust LDA in the score space.

Results from the regression model with binary response:

coefficients

vector of coefficients of the weighted regression model.

intercept

intercept of weighted regression model.

residuals

vector of residuals, true response minus estimated response.

fitted.values

the vector of estimated response values.

coefficients.scaled

vector of coefficients of the weighted regression model with scaled data.

intercept.scaled

intercept of weighted regression model with scaled data.

Data preprocessing:

YMeans

value used internally to center response.

XMean

vector used internally to center data.

Xscales

vector used internally to scale data.

Yscales

value used internally to scale response.

inputs

list of inputs: parameters, data and scaled data.
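The two relations given above, scores = Xs %*% R and w = sqrt(wy*wt), can be illustrated with a short base R sketch. It assumes that Xs denotes the centered and scaled data matrix. The direction matrix R and the weight vectors below are hypothetical placeholders rather than output of a fitted model, and mad() stands in for the Qn scale estimator, which base R does not provide.

```r
# Two-class subset of iris, numeric predictors only
X <- as.matrix(subset(iris, Species != "setosa")[, 1:4])

# Assumption: Xs is X centered by column medians and scaled by a robust spread
# (mad() is used here as a stand-in for the "qn" scale)
Xs <- scale(X, center = apply(X, 2, median), scale = apply(X, 2, mad))

# Hypothetical direction matrix R (p x a); a fitted model supplies its own
a <- 2
R <- diag(ncol(X))[, 1:a]

# Scores are the preprocessed data projected onto the directions
scores <- Xs %*% R
dim(scores)

# Overall case weights combine label weights (wy) and score-space weights (wt)
wy <- c(1, 1, 0.25)   # hypothetical weights for possibly mislabeled cases
wt <- c(1, 0.64, 1)   # hypothetical group-wise weights in the score space
w <- sqrt(wy * wt)
w
```

Because w is the geometric mean of wy and wt, an observation is downweighted as soon as either of the two weights is small.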

Author(s)

Irene Hoffmann and Sven Serneels

References

Hoffmann, I., Filzmoser, P., Serneels, S., Varmuza, K., Sparse and robust PLS for binary classification.

See Also

prmdaCV

Examples

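library(sprm)  # provides prmda
data(iris)
# keep the two non-setosa species so that the response has exactly two levels
data <- droplevels(subset(iris, Species != "setosa"))
mod <- prmda(Species ~ ., data, a = 2, class = "lda")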