Sparse partial robust M regression

Description

Sparse partial robust M regression (SPRM) for models with a univariate response. This method for dimension reduction and regression analysis yields estimates whose interpretability is similar to that of partial least squares, and which are both sparse and robust to vertical outliers and leverage points. The sparsity is tuned with an L1 penalty.

Usage

sprms(formula, data, a, eta, fun = "Hampel", probp1 = 0.95, hampelp2 = 0.975,
hampelp3 = 0.999, center = "median", scale = "qn", print = FALSE, 
numit = 100, prec = 0.01)

Arguments

formula

an object of class formula.

data

a data frame which contains the variables given in formula or a list of two elements, where the first element is the response vector and the second element is a matrix of the explanatory variables.

a

the number of SPRMS components to be estimated in the model.

eta

a tuning parameter for the sparsity with 0 <= eta < 1.

fun

an internal weighting function for case weights. Choices are "Hampel" (preferred), "Huber" or "Fair".

probp1

the 1-alpha value at which to set the first outlier cutoff for the weighting function.

hampelp2

the 1-alpha value for the second cutoff. Only applies to fun="Hampel".

hampelp3

the 1-alpha value for the third cutoff. Only applies to fun="Hampel".

center

type of centering of the data, given as a string that matches an R function, e.g. "mean" or "median".

scale

type of scaling of the data, given as a string that matches an R function, e.g. "sd" or "qn", or alternatively "no" for no scaling.

print

logical, default is FALSE. If TRUE the variables included in each component are reported.

numit

the maximum number of iterations for the convergence of the coefficient estimates.

prec

a value for the precision of estimation of the coefficients.
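To make the roles of probp1, hampelp2 and hampelp3 more concrete, the following is a minimal sketch of a Hampel-type weight function in base R. It assumes that the three cutoffs are standard normal quantiles of the given probabilities and that the input is a vector of standardized residuals; hampel_weights is a hypothetical helper written for illustration and is not necessarily how the package computes its case weights.

```r
# Hypothetical helper for illustration; not a function of the sprm package.
# Assumption: cutoffs are standard normal quantiles of the three probabilities.
hampel_weights <- function(r, probp1 = 0.95, hampelp2 = 0.975, hampelp3 = 0.999) {
  c1 <- qnorm(probp1)    # first cutoff: below it, observations keep full weight
  c2 <- qnorm(hampelp2)  # second cutoff
  c3 <- qnorm(hampelp3)  # third cutoff: beyond it, observations get weight 0
  a <- abs(r)
  w <- rep(1, length(a))
  mid <- a > c1 & a <= c2
  w[mid] <- c1 / a[mid]                                  # moderate downweighting
  far <- a > c2 & a <= c3
  w[far] <- c1 * (c3 - a[far]) / ((c3 - c2) * a[far])    # weight decreases to 0 at c3
  w[a > c3] <- 0
  w
}

# Standardized residuals from clean (0, 1) to clearly outlying (4)
hampel_weights(c(0, 1, 1.8, 2.5, 4))
```

Under these assumptions, raising probp1 lets more observations keep full weight, while lowering hampelp3 makes the function reject outliers sooner.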

Details

The NIPALS algorithm with an L1 sparsity constraint, combined with weighted regression, is used for the model estimation.
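As an illustration of how an L1 constraint can produce sparse direction vectors, the sketch below applies soft thresholding to a weighting vector, with the threshold set to eta times the largest absolute entry. This is the form commonly used in sparse PLS and is assumed here for illustration only; soft_threshold is a hypothetical helper, not a function of the package, and the package's internal steps may differ.

```r
# Hypothetical helper for illustration; not a function of the sprm package.
soft_threshold <- function(w, eta) {
  stopifnot(eta >= 0, eta < 1)
  cut <- eta * max(abs(w))                      # threshold relative to largest entry
  w_sparse <- sign(w) * pmax(abs(w) - cut, 0)   # shrink entries, small ones become 0
  w_sparse / sqrt(sum(w_sparse^2))              # rescale to unit length
}

w <- c(0.8, -0.5, 0.1, 0.05)
soft_threshold(w, eta = 0)    # no sparsity: all entries stay non-zero
soft_threshold(w, eta = 0.5)  # the two smallest entries are set to zero
```

Because eta is strictly smaller than 1, the largest entry of a non-zero vector always survives the thresholding, so at least one variable remains in each component.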

a is the number of components in the model. Note that it is not possible to simply reduce the number of weighting vectors to obtain a model with a smaller number of components. Each model has to be estimated separately due to its dependence on robust case weights.
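In practice this means that comparing models with different numbers of components requires one call to sprms per value of a. The following sketch fits a one-component and a two-component model separately; the simulated data and the object name fits are made up for illustration, and the code only runs the fits if the sprm package is installed.

```r
# Simulated data for illustration only: 3 informative and 7 noise variables
set.seed(1)
n <- 50
X <- matrix(rnorm(n * 10), n, 10)
y <- drop(X[, 1:3] %*% c(1, 1, 1)) + rnorm(n)
d <- data.frame(y = y, X)

fits <- list()
if (requireNamespace("sprm", quietly = TRUE)) {
  library(sprm)
  for (a in 1:2) {
    # Each call estimates case weights and direction vectors from scratch
    fits[[a]] <- sprms(y ~ ., data = d, a = a, eta = 0.5, fun = "Hampel")
  }
}
```

The case weights of the two fits (element w of each returned object) will in general differ, which is why the first component of the two-component model cannot simply be reused as a one-component model.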

Value

sprms returns an object of class sprm.

Functions summary, predict and plot are available. Also the generic functions coefficients, fitted.values and residuals can be used to extract the corresponding elements from the sprm object.

coefficients

vector of coefficients of the weighted regression model.

intercept

intercept of weighted regression model.

wy

the case weights in the y space.

wt

the case weights in the score space.

w

the overall case weights used for weighted regression (depending on the weight function). w=wy*wt.

scores

the matrix of scores.

R

Direction vectors (or weighting vectors or rotation matrix) to obtain the scores. scores=Xs%*%R.

loadings

the matrix of loadings.

fitted.values

the vector of estimated response values.

residuals

vector of residuals, true response minus estimated response.

coefficients.scaled

vector of coefficients of the weighted regression model with scaled data.

intercept.scaled

intercept of weighted regression model with scaled data.

YMeans

value used internally to center response.

XMean

vector used internally to center data.

Xscales

vector used internally to scale data.

Yscales

value used internally to scale response.

Yvar

percentage of contribution for each component to the explanation of the variance of the response.

Xvar

percentage of contribution for each component to the explanation of the variance of the variables.

inputs

list of inputs: parameters, data and scaled data.

used.vars

Indices of variables included in the model.

Author(s)

Sven Serneels, BASF Corp and Irene Hoffmann

References

Hoffmann, I., Serneels, S., Filzmoser, P., Croux, C. (2015). Sparse partial robust M regression. Chemometrics and Intelligent Laboratory Systems, 149, 50-59.

Serneels, S., Croux, C., Filzmoser, P., Van Espen, P.J. (2005). Partial Robust M-Regression. Chemometrics and Intelligent Laboratory Systems, 79, 55-64.

See Also

sprmsCV, plot.sprm, biplot.sprm, predict.sprm, prms

Examples

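set.seed(50235)
# 5 informative variables (X1) with a shift in the group mean, 20 noise variables (X2)
U1 <- c(rep(3,20), rep(4,30))
U2 <- rep(3.5,50)
X1 <- replicate(5, U1+rnorm(50))
X2 <- replicate(20, U2+rnorm(50))
X <- cbind(X1,X2)
# only the first 5 variables contribute to the response
beta <- c(rep(1, 5), rep(0,20))
# errors: 45 regular observations and 5 vertical outliers
e <- c(rnorm(45,0,1.5),rnorm(5,-20,1))
y <- X%*%beta + e
d <- as.data.frame(X)
d$y <- y
# fit a sparse, robust model with one component
mod <- sprms(y~., data=d, a=1, eta=0.5, fun="Hampel")
sprmfit <- predict(mod)

# observed versus fitted response; the outliers lie far from the diagonal
plot(y,sprmfit, main="SPRM")
abline(0,1)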