Sparse least trimmed squares regression
Description
Compute least trimmed squares regression with an L1 penalty on the regression coefficients, which allows for sparse model estimates.
Usage
sparseLTS(x, ...)
## S3 method for class 'formula'
sparseLTS(formula, data, ...)
## Default S3 method:
sparseLTS(x, y, lambda, mode = c("lambda", "fraction"),
alpha = 0.75, normalize = TRUE, intercept = TRUE, nsamp = c(500, 10),
initial = c("sparse", "hyperplane", "random"), ncstep = 2,
use.correction = TRUE, tol = .Machine$double.eps^0.5,
eps = .Machine$double.eps, use.Gram, crit = c("BIC", "PE"),
splits = foldControl(), cost = rtmspe, costArgs = list(),
selectBest = c("hastie", "min"), seFactor = 1, ncores = 1, cl = NULL,
seed = NULL, model = TRUE, ...)

Arguments
x 
a numeric matrix containing the predictor variables. 
formula 
a formula describing the model. 
data 
an optional data frame, list or environment (or object coercible
to a data frame by 
y 
a numeric vector containing the response variable. 
lambda 
a numeric vector of nonnegative values to be used as penalty parameter. 
mode 
a character string specifying the type of penalty parameter. If

alpha 
a numeric value giving the proportion of the residuals for which the L1 penalized sum of squares should be minimized (the default is 0.75). 
normalize 
a logical indicating whether the predictor variables
should be normalized to have unit L2 norm (the default is
TRUE).
intercept 
a logical indicating whether a constant term should be
included in the model (the default is TRUE).
nsamp 
a numeric vector giving the number of subsamples to be used in
the two phases of the algorithm. The first element gives the number of
initial subsamples to be used. The second element gives the number of
subsamples to keep after the first phase of 
initial 
a character string specifying the type of initial subsamples
to be used. If 
ncstep 
a positive integer giving the number of C-steps to perform on all subsamples in the first phase of the algorithm (the default is to perform two C-steps). 
use.correction 
currently ignored. Small sample correction factors may be added in the future. 
tol 
a small positive numeric value giving the tolerance for convergence. 
eps 
a small positive numeric value used to determine whether the variability within a variable is too small (an effective zero). 
use.Gram 
a logical indicating whether the Gram matrix of the
explanatory variables should be precomputed in the lasso fits on the
subsamples. If the number of variables is large, computation may be faster
when this is set to 
crit 
a character string specifying the optimality criterion to be
used for selecting the final model. Possible values are 
splits 
an object giving data splits to be used for prediction error
estimation (see 
cost 
a cost function measuring prediction loss (see

costArgs 
a list of additional arguments to be passed to the
prediction loss function 
selectBest,seFactor 
arguments specifying a criterion for selecting
the best model (see 
ncores 
a positive integer giving the number of processor cores to be
used for parallel computing (the default is 1 for no parallelization). If
this is set to 
cl 
a parallel cluster for parallel computing as generated by

seed 
optional initial seed for the random number generator (see

model 
a logical indicating whether the data 
... 
additional arguments to be passed down. 
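The arguments nsamp and ncstep refer to C-steps (concentration steps) carried out on subsamples. As a rough illustration of what a single C-step does, the following base R sketch refits a model on the current subset and then keeps the h observations with the smallest squared residuals. Note the assumptions: ordinary least squares via lm.fit() stands in for the lasso fits that sparse LTS uses, and the helper names c_step and lts_objective as well as the simulated data are hypothetical and not part of robustHD.

```r
# One C-step (concentration step) for plain, unpenalized LTS.
# Assumption: OLS via lm.fit() replaces the lasso fit used by sparse LTS.
c_step <- function(x, y, subset) {
  h <- length(subset)
  # fit on the current subset, with an intercept column
  fit <- lm.fit(cbind(1, x[subset, , drop = FALSE]), y[subset])
  # squared residuals of all observations under that fit
  res2 <- c(y - cbind(1, x) %*% fit$coefficients)^2
  # new subset: the h observations with the smallest squared residuals
  sort(order(res2)[seq_len(h)])
}

# Sum of squared OLS residuals on a given subset (hypothetical helper)
lts_objective <- function(x, y, subset) {
  fit <- lm.fit(cbind(1, x[subset, , drop = FALSE]), y[subset])
  sum(fit$residuals^2)
}

set.seed(1)
n <- 50; p <- 3
x <- matrix(rnorm(n * p), n, p)
y <- c(x %*% c(1, 2, 0) + rnorm(n, sd = 0.5))
y[1:5] <- y[1:5] + 10          # vertical outliers
h <- floor(0.75 * n)           # subset size chosen for this demo only
s0 <- sort(sample.int(n, h))   # random initial subset
s1 <- c_step(x, y, s0)         # first C-step
s2 <- c_step(x, y, s1)         # second C-step
obj <- c(lts_objective(x, y, s0),
         lts_objective(x, y, s1),
         lts_objective(x, y, s2))
```

For unpenalized LTS, each C-step cannot increase the trimmed sum of squares, so the entries of obj are non-increasing. Performing only a few C-steps on many subsamples and iterating to convergence on the most promising ones is the idea behind the two elements of nsamp.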
Value
If crit is "PE", an object of class "perrySparseLTS" (inheriting from class "perryTuning", see perryTuning). It contains information on the prediction error criterion, and includes the final model with the optimal tuning parameter as component finalModel.
Otherwise an object of class "sparseLTS" with the following components:
lambda 
a numeric vector giving the values of the penalty parameter. 
best 
an integer vector or matrix containing the respective best subsets of h observations found and used for computing the raw estimates. 
objective 
a numeric vector giving the respective values of the sparse LTS objective function, i.e., the L1 penalized sums of the h smallest squared residuals from the raw fits. 
coefficients 
a numeric vector or matrix containing the respective coefficient estimates from the reweighted fits. 
fitted.values 
a numeric vector or matrix containing the respective fitted values of the response from the reweighted fits. 
residuals 
a numeric vector or matrix containing the respective residuals from the reweighted fits. 
center 
a numeric vector giving the robust center estimates of the corresponding reweighted residuals. 
scale 
a numeric vector giving the robust scale estimates of the corresponding reweighted residuals. 
cnp2 
a numeric vector giving the respective consistency factors applied to the scale estimates of the reweighted residuals. 
wt 
an integer vector or matrix containing binary weights that indicate outliers from the respective reweighted fits, i.e., the weights are 1 for observations with reasonably small reweighted residuals and 0 for observations with large reweighted residuals. 
df 
an integer vector giving the respective degrees of freedom of the obtained reweighted model fits, i.e., the number of nonzero coefficient estimates. 
intercept 
a logical indicating whether the model includes a constant term. 
alpha 
a numeric value giving the proportion of the residuals for which the L1 penalized sum of squares was minimized. 
quan 
the number h of observations used to compute the raw estimates. 
raw.coefficients 
a numeric vector or matrix containing the respective coefficient estimates from the raw fits. 
raw.fitted.values 
a numeric vector or matrix containing the respective fitted values of the response from the raw fits. 
raw.residuals 
a numeric vector or matrix containing the respective residuals from the raw fits. 
raw.center 
a numeric vector giving the robust center estimates of the corresponding raw residuals. 
raw.scale 
a numeric vector giving the robust scale estimates of the corresponding raw residuals. 
raw.cnp2 
a numeric value giving the consistency factor applied to the scale estimate of the raw residuals. 
raw.wt 
an integer vector or matrix containing binary weights that indicate outliers from the respective raw fits, i.e., the weights used for the reweighted fits. 
crit 
an object of class 
x 
the predictor matrix (if 
y 
the response variable (if 
call 
the matched function call. 
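The objective component is described above as the L1 penalized sum of the h smallest squared residuals. A minimal base R sketch of that quantity is given below, following the form in the cited paper by Alfons, Croux and Gelper (2013), where the penalty is scaled by h. The function name sparse_lts_objective and the toy data are hypothetical, and the package's internal scaling and intercept handling may differ.

```r
# Sparse LTS objective for a given coefficient vector (illustrative helper,
# not a robustHD function).
sparse_lts_objective <- function(beta, x, y, lambda, h, intercept = 0) {
  res2 <- c(y - intercept - x %*% beta)^2   # squared residuals
  trimmed <- sum(sort(res2)[seq_len(h)])    # sum of the h smallest ones
  trimmed + h * lambda * sum(abs(beta))     # add the L1 penalty
}

x <- cbind(c(1, 2, 3, 4), c(0, 1, 0, 1))
y <- c(1, 2, 3, 40)                          # last observation is an outlier
beta <- c(1, 0)
val <- sparse_lts_objective(beta, x, y, lambda = 0.5, h = 3)
val  # 1.5: the outlier is trimmed, so only the penalty 3 * 0.5 * 1 remains
```

Because only the h smallest squared residuals enter the sum, the outlying fourth observation does not affect the value.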
Note
Package robustHD has a built-in back end for sparse least trimmed squares using the C++ library Armadillo. Another back end is available through package sparseLTSEigen, which uses the C++ library Eigen. The latter is faster, but currently does not work on 32-bit R for Windows.
For both C++ back ends, parallel computing is implemented via OpenMP (http://openmp.org/).
Author(s)
Andreas Alfons
References
Alfons, A., Croux, C. and Gelper, S. (2013) Sparse least trimmed squares regression for analyzing high-dimensional large data sets. The Annals of Applied Statistics, 7(1), 226–248.
See Also
coef, fitted, plot, predict, residuals, wt, ltsReg
Examples
## generate data
# example is not high-dimensional to keep computation time low
library("robustHD")
library("mvtnorm")
set.seed(1234)  # for reproducibility
n <- 100  # number of observations
p <- 25  # number of variables
beta <- rep.int(c(1, 0), c(5, p-5))  # coefficients
sigma <- 0.5  # controls signal-to-noise ratio
epsilon <- 0.1  # contamination level
Sigma <- 0.5^t(sapply(1:p, function(i, j) abs(i-j), 1:p))
x <- rmvnorm(n, sigma=Sigma)  # predictor matrix
e <- rnorm(n)  # error terms
i <- 1:ceiling(epsilon*n)  # observations to be contaminated
e[i] <- e[i] + 5  # vertical outliers
y <- c(x %*% beta + sigma * e)  # response
x[i,] <- x[i,] + 5  # bad leverage points

## fit sparse LTS model for one value of lambda
sparseLTS(x, y, lambda = 0.05, mode = "fraction")

## fit sparse LTS models over a grid of values for lambda
frac <- seq(0.2, 0.05, by = -0.05)
sparseLTS(x, y, lambda = frac, mode = "fraction")
