Use bivariate winsorization to estimate the smallest value of the penalty parameter for sparse least trimmed squares regression that sets all coefficients to zero.
x 
a numeric matrix containing the predictor variables. 
y 
a numeric vector containing the response variable. 
normalize 
a logical indicating whether the winsorized predictor
variables should be normalized to have unit L2 norm (the
default is 
intercept 
a logical indicating whether a constant term should be
included in the model (the default is 
const 
numeric; tuning constant to be used in univariate winsorization (defaults to 2). 
prob 
numeric; probability for the quantile of the chi-squared distribution to be used in bivariate winsorization (defaults to 0.95). 
tol 
a small positive numeric value used to determine singularity
issues in the computation of correlation estimates for bivariate
winsorization (see 
eps 
a small positive numeric value used to determine whether the robust scale estimate of a variable is too small (an effective zero). 
... 
additional arguments to be passed to

The estimation procedure is inspired by the calculation of the respective penalty parameter in the first step of the classical LARS algorithm. First, two-dimensional data blocks consisting of the response with each predictor variable are cleaned via bivariate winsorization. For each block, the following computations are then performed. If an intercept is included in the model, the cleaned response is centered and the corresponding cleaned predictor is centered and scaled to have unit norm. Otherwise the variables are not centered, but the predictor is scaled to have unit norm. Finally, the dot product of the response and the corresponding predictor is computed. The largest absolute value of those dot products, rescaled to fit the parametrization of the sparse LTS definition, yields the estimate of the smallest penalty parameter that sets all coefficients to zero.
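The per-block computations described above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the package's implementation: the helper names clean_block and max_abs_dot are hypothetical, the bivariate winsorization step is replaced by a placeholder that returns the data unchanged, centering uses the ordinary mean, and the final rescaling to the sparse LTS parametrization is left out because its factor is not given here.

```r
# Hypothetical helper (not part of any package): stands in for the
# bivariate winsorization step, which is NOT implemented in this sketch.
clean_block <- function(xj, y) {
  list(x = xj, y = y)  # identity: the block is returned unchanged
}

# Largest absolute dot product over all (response, predictor) blocks.
# Assumption: plain means are used for centering; the rescaling to the
# sparse LTS parametrization is omitted.
max_abs_dot <- function(x, y, intercept = TRUE) {
  dots <- vapply(seq_len(ncol(x)), function(j) {
    block <- clean_block(x[, j], y)
    xj <- block$x
    yj <- block$y
    if (intercept) {
      # center the cleaned response and the cleaned predictor
      xj <- xj - mean(xj)
      yj <- yj - mean(yj)
    }
    # scale the predictor to unit L2 norm
    xj <- xj / sqrt(sum(xj^2))
    # dot product of the response and the predictor
    sum(xj * yj)
  }, numeric(1))
  max(abs(dots))
}

# toy data, for illustration only
set.seed(1)
x_toy <- matrix(rnorm(50 * 3), nrow = 50)
y_toy <- 2 * x_toy[, 1] + rnorm(50)
max_abs_dot(x_toy, y_toy)
```

Because the placeholder does no cleaning, the result of this sketch is not robust to outliers; it only shows the centering, scaling and dot-product steps that follow the winsorization.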
A robust estimate of the smallest value of the penalty parameter for sparse LTS regression that sets all coefficients to zero.
Andreas Alfons
Alfons, A., Croux, C. and Gelper, S. (2013) Sparse least trimmed squares regression for analyzing high-dimensional large data sets. The Annals of Applied Statistics, 7(1), 226–248.
Efron, B., Hastie, T., Johnstone, I. and Tibshirani, R. (2004) Least angle regression. The Annals of Statistics, 32(2), 407–499.
Khan, J.A., Van Aelst, S. and Zamar, R.H. (2007) Robust linear model selection based on least angle regression. Journal of the American Statistical Association, 102(480), 1289–1299.
## generate data
# example is not high-dimensional to keep computation time low
library("mvtnorm")
library("robustHD") # provides lambda0()
set.seed(1234) # for reproducibility
n <- 100 # number of observations
p <- 25 # number of variables
beta <- rep.int(c(1, 0), c(5, p-5)) # coefficients
sigma <- 0.5 # controls signal-to-noise ratio
epsilon <- 0.1 # contamination level
Sigma <- 0.5^t(sapply(1:p, function(i, j) abs(i-j), 1:p))
x <- rmvnorm(n, sigma=Sigma) # predictor matrix
e <- rnorm(n) # error terms
i <- 1:ceiling(epsilon*n) # observations to be contaminated
e[i] <- e[i] + 5 # vertical outliers
y <- c(x %*% beta + sigma * e) # response
x[i,] <- x[i,] + 5 # bad leverage points
## estimate smallest value of the penalty parameter
## that sets all coefficients to 0
lambda0(x, y)
