(Robust) groupwise least angle regression
Description
(Robustly) sequence groups of candidate predictors according to their predictive content and find the optimal model along the sequence.
Usage
grplars(x, ...)
## S3 method for class 'formula'
grplars(formula, data, ...)
## S3 method for class 'data.frame'
grplars(x, y, ...)
## Default S3 method:
grplars(x, y, sMax = NA, assign, fit = TRUE, s = c(0, sMax),
  crit = c("BIC", "PE"), splits = foldControl(), cost = rmspe,
  costArgs = list(), selectBest = c("hastie", "min"), seFactor = 1,
  ncores = 1, cl = NULL, seed = NULL, model = TRUE, ...)
rgrplars(x, ...)
## S3 method for class 'formula'
rgrplars(formula, data, ...)
## S3 method for class 'data.frame'
rgrplars(x, y, ...)
## Default S3 method:
rgrplars(x, y, sMax = NA, assign, centerFun = median,
  scaleFun = mad, regFun = lmrob, regArgs = list(),
  combine = c("min", "euclidean", "mahalanobis"), const = 2, prob = 0.95,
  fit = TRUE, s = c(0, sMax), crit = c("BIC", "PE"), splits = foldControl(),
  cost = rtmspe, costArgs = list(), selectBest = c("hastie", "min"),
  seFactor = 1, ncores = 1, cl = NULL, seed = NULL, model = TRUE, ...)

Arguments
x 
a matrix or data frame containing the candidate predictors. 
formula 
a formula describing the full model. 
data 
an optional data frame, list or environment (or object coercible
to a data frame by 
y 
a numeric vector containing the response. 
sMax 
an integer giving the number of predictor groups to be
sequenced. If it is 
assign 
an integer vector giving the predictor group to which each predictor variable belongs. 
fit 
a logical indicating whether to fit submodels along the sequence
( 
s 
an integer vector of length two giving the first and last
step along the sequence for which to compute submodels. The default
is to start with a model containing only an intercept (step 0) and
iteratively add all groups along the sequence (step 
crit 
a character string specifying the optimality criterion to be
used for selecting the final model. Possible values are 
splits 
an object giving data splits to be used for prediction error
estimation (see 
cost 
a cost function measuring prediction loss (see

costArgs 
a list of additional arguments to be passed to the
prediction loss function 
selectBest,seFactor 
arguments specifying a criterion for selecting
the best model (see 
ncores 
a positive integer giving the number of processor cores to be
used for parallel computing (the default is 1 for no parallelization). If
this is set to 
cl 
a parallel cluster for parallel computing as generated by

seed 
optional initial seed for the random number generator (see

model 
a logical indicating whether the model data should be included in the returned object. 
centerFun 
a function to compute a robust estimate for the center
(defaults to 
scaleFun 
a function to compute a robust estimate for the scale
(defaults to 
regFun 
a function to compute robust linear regressions that can be
interpreted as weighted least squares (defaults to

regArgs 
a list of arguments to be passed to 
combine 
a character string specifying how to combine the data
cleaning weights from the robust regressions with each predictor group.
Possible values are 
const 
numeric; tuning constant for multivariate winsorization to be used in the initial correlation estimates based on adjusted univariate winsorization (defaults to 2). 
prob 
numeric; probability for the quantile of the chi-squared distribution to be used in multivariate winsorization (defaults to 0.95). 
... 
additional arguments to be passed down. 
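The assign argument described above maps each column of x to a predictor group. As a minimal sketch in base R (not taken from the package's internals), one way to obtain such a vector is to expand a formula with model.matrix() and read its "assign" attribute. The data frame d and the names mm and grp_assign are hypothetical and serve only as illustration.

```r
# Hypothetical toy data: one numeric predictor and one three-level factor
d <- data.frame(
  y   = c(1.2, 0.7, 3.1, 2.4, 1.9, 2.8),
  num = c(0.5, 1.1, 2.3, 1.7, 0.9, 2.0),
  grp = factor(c("a", "b", "c", "a", "b", "c"))
)

# Expand the factor into dummy variables
mm <- model.matrix(y ~ num + grp, data = d)

# The "assign" attribute maps each column to the term it came from;
# 0 marks the intercept column, which is dropped here
grp_assign <- attr(mm, "assign")[-1]
x <- mm[, -1, drop = FALSE]

print(colnames(x))  # "num" "grpb" "grpc"
print(grp_assign)   # 1 2 2
```

Here the two dummy columns for grp share group 2, so a groupwise method would treat them as a single block. A matrix and vector built this way have the shape that the default method expects for x and assign.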
Value
If fit is FALSE, an integer vector containing the indices of the sequenced predictor groups.
Else if crit is "PE", an object of class "perrySeqModel" (inheriting from class "perryTuning", see perryTuning). It contains information on the prediction error criterion, and includes the final model as component finalModel.
Otherwise an object of class "grplars" (inheriting from class "seqModel") with the following components:
active 
an integer vector containing the sequence of predictor groups. 
s 
an integer vector containing the steps for which submodels along the sequence have been computed. 
coefficients 
a numeric matrix in which each column contains the regression coefficients of the corresponding submodel along the sequence. 
fitted.values 
a numeric matrix in which each column contains the fitted values of the corresponding submodel along the sequence. 
residuals 
a numeric matrix in which each column contains the residuals of the corresponding submodel along the sequence. 
df 
an integer vector containing the degrees of freedom of the submodels along the sequence (i.e., the number of estimated coefficients). 
robust 
a logical indicating whether a robust fit was computed. 
scale 
a numeric vector giving the robust residual scale estimates for the submodels along the sequence (only returned for a robust fit). 
crit 
an object of class 
muX 
a numeric vector containing the center estimates of the predictor variables. 
sigmaX 
a numeric vector containing the scale estimates of the predictor variables. 
muY 
numeric; the center estimate of the response. 
sigmaY 
numeric; the scale estimate of the response. 
x 
the matrix of candidate predictors (if 
y 
the response (if 
assign 
an integer vector giving the predictor group to which each predictor variable belongs. 
w 
a numeric vector giving the data cleaning weights (only returned for a robust fit). 
call 
the matched function call. 
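The components muX, sigmaX, muY and sigmaY hold center and scale estimates, and the Usage section lists median and mad as the defaults for centerFun and scaleFun in rgrplars. The following base R sketch, using a hypothetical vector v with one outlier, illustrates why such estimates are preferred for a robust fit. It is an illustration of the estimators only, not the package's internal standardization code.

```r
# Hypothetical measurements with a single gross outlier (last value)
v <- c(9.8, 10.1, 10.0, 9.9, 10.2, 50.0)

# Classical estimates are pulled towards the outlier
mu_classic    <- mean(v)
sigma_classic <- sd(v)

# Robust estimates (the defaults of centerFun and scaleFun) barely move
mu_robust    <- median(v)
sigma_robust <- mad(v)

# Robustly standardized values: the outlier stands out clearly
z <- (v - mu_robust) / sigma_robust

print(c(mean = mu_classic, median = mu_robust))
print(c(sd = sigma_classic, mad = sigma_robust))
print(round(z, 2))
```

With the classical estimates, the outlier inflates both the center and the scale, which masks it after standardization. With median and mad, the regular observations stay near zero and the outlier receives a very large standardized value.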
Author(s)
Andreas Alfons
See Also
coef, fitted, plot, predict, residuals, lmrob
Examples
# load the package providing rgrplars() and the TopGear data
library("robustHD")
data("TopGear")
# keep complete observations
keep <- complete.cases(TopGear)
TopGear <- TopGear[keep, ]
# remove information on car model (first three columns)
info <- TopGear[, 1:3]
TopGear <- TopGear[, -(1:3)]
# log-transform price
TopGear$Price <- log(TopGear$Price)
# robust groupwise LARS
rgrplars(MPG ~ ., data = TopGear, sMax = 15)
