sprmda: Sparse and robust PLS for binary classification


Description

This method for dimension reduction and discriminant analysis yields a sparse classification model that is as interpretable as a partial least squares model and is robust to both vertical outliers and leverage points.

Usage

sprmda(formula, data, a, eta, fun = "Hampel", probp1 = 0.95, hampelp2 = 0.975,
       hampelp3 = 0.999, probp4 = 0.01, yweights = TRUE,
       class = c("regfit", "lda"), prior = c(0.5, 0.5), center = "median", scale = "qn",
       print = FALSE, numit = 100, prec = 0.01)

Arguments

formula

a formula, e.g. group ~ X1 + X2, where group is a factor with two levels or a numeric vector coding class membership as 1 and -1, and X1, X2 are numeric variables.

data

a data frame or list that contains the variables given in formula. The response specified in the formula must be either a numeric vector coding class membership as 1 and -1 or a factor with two levels.

a

the number of SPRM components to be estimated in the model.

eta

a tuning parameter for the sparsity with 0 <= eta < 1.

fun

an internal weighting function for case weights. Choices are "Hampel" (preferred), "Huber" or "Fair".

probp1

the 1-alpha value at which to set the first outlier cutoff for the weighting function.

hampelp2

the 1-alpha value for the second cutoff. Only applies to fun="Hampel".

hampelp3

the 1-alpha value for the third cutoff. Only applies to fun="Hampel".

probp4

a quantile close to zero used as the cutoff for potentially wrong class labels (see References). Ignored if yweights=FALSE.

yweights

logical; if TRUE, weights are calculated for observations with potentially wrong class labels.

class

type of classification; choices are "regfit" or "lda". If "regfit", an object of class prm is returned.

prior

vector of length 2 with the prior probabilities of the groups; only used if class="lda".

center

type of centering of the data, given as a string that matches an R function, e.g. "mean" or "median".

scale

type of scaling of the data, given as a string that matches an R function, e.g. "sd" or "qn", or alternatively "no" for no scaling.

print

logical, default is FALSE. If TRUE, the variables included in each component are reported.

numit

the maximum number of iterations for the convergence of the coefficient estimates.

prec

a value for the precision of estimation of the coefficients.
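
The roles of the three cutoffs probp1, hampelp2 and hampelp3 can be pictured with the classical Hampel redescending weight function. The following is a minimal base-R sketch, not the package's internal code: the helper name hampel_weights is hypothetical, and it assumes that the cutoffs are standard normal quantiles applied to absolute standardized residuals.

```r
# Hypothetical helper (not part of sprm): classical Hampel weights.
# Assumption: the cutoffs are standard normal quantiles of the probabilities.
hampel_weights <- function(r, probp1 = 0.95, hampelp2 = 0.975, hampelp3 = 0.999) {
  c1 <- qnorm(probp1)    # first cutoff: full weight up to here
  c2 <- qnorm(hampelp2)  # second cutoff: weights decay as c1 / |r|
  c3 <- qnorm(hampelp3)  # third cutoff: zero weight beyond this point
  a <- abs(r)
  w <- numeric(length(a))
  w[a <= c1] <- 1
  mid <- a > c1 & a <= c2
  w[mid] <- c1 / a[mid]
  far <- a > c2 & a <= c3
  w[far] <- c1 / a[far] * (c3 - a[far]) / (c3 - c2)
  w[a > c3] <- 0
  w
}

# Standardized residuals from small to extreme
r <- c(0, 1, 1.8, 2.5, 5)
round(hampel_weights(r), 3)
```

Observations inside the first cutoff keep weight 1, the weight then decreases continuously, and observations beyond the third cutoff are removed from the fit entirely.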

Details

For class="lda", a robust LDA model is estimated in the SPRM score space. For class="regfit", the model is a robust sparse PLS regression model on the binary response.

Value

sprmda returns an object of class sprmda.

The functions summary, predict and biplot are available. The generic functions coefficients, fitted.values and residuals can also be used to extract the corresponding elements from the sprmda object.

scores

the matrix of scores.

R

direction vectors (or weighting vectors or rotation matrix) used to obtain the scores: scores = Xs %*% R.

loadings

the matrix of loadings.

w

the overall case weights used for robust dimension reduction and classification (depending on the weight function). w=sqrt(wy*wt).

wt

the case weights obtained group-wise in the score space.

wy

the case weights for potentially mislabeled observations.

used.vars

Indices of variables included in the model.

Yvar

percentage of the variance of the response explained by each component.

Xvar

percentage of the variance of the variables explained by each component.
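
The relations between the scaled data, the direction vectors and the scores, and between the three sets of case weights, can be checked on toy numbers. All values below are made up for illustration; in a fitted model they are taken from the sprmda object.

```r
# Toy data (made up): 4 observations, 3 centered and scaled variables
Xs <- matrix(c( 0.5, -1.0,  0.2,
               -0.3,  0.8, -0.6,
                1.1,  0.1,  0.4,
               -1.3,  0.1,  0.0), nrow = 4, byrow = TRUE)

# Toy direction vectors for a = 2 components; the zero row mimics a
# variable that the sparsity constraint excluded from the model
R <- matrix(c(0.8, -0.6,
              0.6,  0.8,
              0.0,  0.0), nrow = 3, byrow = TRUE)

scores <- Xs %*% R          # one row per observation, one column per component

# Toy case weights: wy for suspicious labels, wt for outlyingness in the score space
wy <- c(1, 1.00, 0.25, 1)
wt <- c(1, 0.64, 1.00, 0)
w  <- sqrt(wy * wt)         # overall case weights
```

An observation receives a small overall weight as soon as either of its two partial weights is small, and a zero in either one removes it completely.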

Results from LDA model:

ldamod

list with the robust pooled within-group covariance (cov) and the two robust group centers (m1, m2) in the score space.

ldafit

posterior probabilities from robust LDA in the score space.

ldaclass

predicted class labels from robust LDA in the score space.
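
How group centers, a pooled covariance and prior probabilities lead to posterior probabilities and class labels can be sketched with the textbook two-group LDA rule. This is a plain, non-robust illustration with made-up numbers; the function name lda_posterior is hypothetical, the mapping of the first group to the label 1 is an assumption, and the package's own computation may differ in detail.

```r
# Hypothetical helper: two-group LDA posteriors from group centers m1, m2,
# a pooled covariance matrix and prior probabilities.
lda_posterior <- function(scores, m1, m2, cov, prior = c(0.5, 0.5)) {
  cov_inv <- solve(cov)
  # linear discriminant score of every observation for one group
  delta <- function(m, p) {
    drop(scores %*% cov_inv %*% m) - 0.5 * drop(t(m) %*% cov_inv %*% m) + log(p)
  }
  d1 <- delta(m1, prior[1])
  d2 <- delta(m2, prior[2])
  p1 <- 1 / (1 + exp(d2 - d1))   # posterior probability of group 1
  cbind(group1 = p1, group2 = 1 - p1)
}

# Made-up values in a two-dimensional score space
m1 <- c(-1, 0)
m2 <- c(1, 0)
S  <- diag(2)
T_new <- rbind(c(-2, 0.3), c(0, 0), c(1.5, -1))

post <- lda_posterior(T_new, m1, m2, S)
# Assumed coding: group 1 -> 1, group 2 -> -1
pred <- ifelse(post[, "group1"] > 0.5, 1, -1)
```

Changing prior shifts the decision boundary towards the center of the group with the smaller prior probability.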

Results from the regression model with binary response:

coefficients

vector of coefficients of the weighted regression model.

intercept

intercept of weighted regression model.

residuals

vector of residuals, true response minus estimated response.

fitted.values

the vector of estimated response values.

coefficients.scaled

vector of coefficients of the weighted regression model with scaled data.

intercept.scaled

intercept of weighted regression model with scaled data.

Data preprocessing:

YMeans

value used internally to center response.

XMean

vector used internally to center data.

Xscales

vector used internally to scale data.

Yscales

value used internally to scale response.

inputs

list of inputs: parameters, data and scaled data.
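
The preprocessing elements relate to the center and scale arguments, which name R functions as strings. The base-R sketch below resolves such names with match.fun and applies them column by column. The Qn scale estimator is not part of base R, so the sketch substitutes mad as a robust scale; that substitution, the simulated data and all variable names are assumptions made for illustration.

```r
set.seed(1)
X <- cbind(x1 = rnorm(20, mean = 5, sd = 2),
           x2 = rnorm(20, mean = -3, sd = 0.5))
X[1, "x1"] <- 100                      # one gross outlier

center_fun <- match.fun("median")      # as with center = "median"
scale_fun  <- match.fun("mad")         # stand-in for a robust scale such as Qn

x_center <- apply(X, 2, center_fun)    # plays the role of XMean
x_scale  <- apply(X, 2, scale_fun)     # plays the role of Xscales

# Center, then scale, column by column
Xs <- sweep(sweep(X, 2, x_center, "-"), 2, x_scale, "/")
```

With median and mad the outlier in x1 barely moves the centering and scaling values, whereas mean and sd would be pulled strongly towards it.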

Author(s)

Irene Hoffmann and Sven Serneels

References

Hoffmann, I., Filzmoser, P., Serneels, S., Varmuza, K., Sparse and robust PLS for binary classification.

See Also

sprmdaCV

Examples

library(sprm)
data(iris)
# keep the two non-setosa species so that the response has two levels
data <- droplevels(subset(iris, Species != "setosa"))
smod <- sprmda(Species ~ ., data, a = 2, eta = 0.7, class = "lda")
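
A fitted object can then be inspected through the elements listed under Value. The sketch below refits the same model so that it is self-contained, uses only element names documented on this page, and runs the sprm part only if the package is installed. The names dat and fit are arbitrary, and no output is shown because it may depend on the package version.

```r
data(iris)
dat <- droplevels(subset(iris, Species != "setosa"))

# Run the sprm part only when the package is available
if (requireNamespace("sprm", quietly = TRUE)) {
  library(sprm)
  fit <- sprmda(Species ~ ., dat, a = 2, eta = 0.7, class = "lda")
  print(fit$used.vars)        # indices of the variables kept in the model
  print(head(fit$scores))     # scores of the first observations
  print(head(fit$w))          # overall case weights
  print(head(fit$ldaclass))   # predicted class labels from the robust LDA
}
```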

sprm documentation built on May 2, 2019, 9:57 a.m.