lae: Lasso Averaging Estimation


Description

Lasso (least absolute shrinkage and selection operator) estimation is performed and evaluated for different tuning parameter choices. To address the uncertainty of tuning parameter selection, a weighted average of these estimators is calculated. The weight vector is chosen such that a k-fold cross validation criterion is minimized.
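
The weighting step described above can be illustrated with a minimal sketch in base R. It is not the code used by lae: ridge regression in closed form stands in for the Lasso so that no extra package is needed, and the simulated data, the candidate values in lambdas, and the helper names ridge_fit, softmax and cv_criterion are assumptions made purely for illustration.

```r
# Sketch only: ridge (closed form) stands in for the Lasso so base R suffices.
set.seed(1)
n <- 120; p <- 5
X <- scale(matrix(rnorm(n * p), n, p))
y <- as.numeric(X %*% c(2, -1, 0, 0, 0.5) + rnorm(n))
y <- y - mean(y)                      # centre the outcome, so no intercept is needed

lambdas <- c(0.01, 1, 10, 100)        # hypothetical candidate tuning parameters
kfold <- 5
fold <- rep(seq_len(kfold), length.out = n)   # non-random fold assignment

ridge_fit <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

# Out-of-fold predictions: one column per candidate tuning parameter.
cv_pred <- matrix(NA_real_, n, length(lambdas))
for (k in seq_len(kfold)) {
  test <- fold == k
  for (j in seq_along(lambdas)) {
    b <- ridge_fit(X[!test, , drop = FALSE], y[!test], lambdas[j])
    cv_pred[test, j] <- X[test, , drop = FALSE] %*% b
  }
}

# k-fold criterion as a function of the weights; the softmax keeps the
# weights non-negative and summing to one.
softmax <- function(theta) { e <- exp(theta - max(theta)); e / sum(e) }
cv_criterion <- function(theta) mean((y - cv_pred %*% softmax(theta))^2)

opt <- optim(rep(0, length(lambdas)), cv_criterion, method = "BFGS")
w <- softmax(opt$par)                 # averaging weights

# Averaged estimate: weighted combination of the full-data estimates.
coef_all <- sapply(lambdas, function(l) ridge_fit(X, y, l))
beta_avg <- as.numeric(coef_all %*% w)
```

Here w plays the role of the weight vector and beta_avg that of the averaged estimator. Selecting a single tuning parameter corresponds to the special case in which one weight equals one and all others are zero.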

Usage

lae(X, ycol = 1, kfold = 10, B.var = 100, calc.variance = FALSE,
 factor.variables = NULL, glm.family = "gaussian", tries = 10,
 standardize = TRUE, random = FALSE, pd = TRUE, ...)

Arguments

X

A dataframe or matrix containing the data to be analyzed.

ycol

An integer or string specifying the column of the outcome variable. The outcome for glm.family="cox" should be a two-column matrix with columns named 'time' and 'status'. The latter is a binary variable, with '1' indicating event, and '0' indicating right censored.

kfold

An integer specifying the number of folds k of the cross validation criterion that is (i) used for tuning parameter selection and (ii) minimized for Lasso averaging estimation.

B.var

An integer specifying the number of bootstrap replications to be used to estimate the standard error of the Lasso estimator.

calc.variance

A logical value specifying whether the standard error of the estimates should be estimated at all (by means of bootstrapping). See also details below.

factor.variables

A (vector of) string(s) specifying which variables should be treated as factors, i.e. recoded into dummy variables. Factor variables will automatically be recoded if not specified here.

glm.family

A character vector specifying one of the following families: "gaussian", "binomial", "poisson".

tries

An integer for the number of tries in case lae fails; can be relevant if sub-datasets used during cross-validation lead to failure of lasso averaging. In this case, the sub-datasets are randomly re-defined.

standardize

A logical value specifying whether the covariate data should be standardized.

random

A logical value specifying whether creation of datasets for cross validation should be random or not.

pd

A logical value specifying whether messages should be printed or not.

...

Other arguments to be passed, i.e. to cv.glmnet. For example, pass alpha=0 for the Ridge estimator and alpha smaller than 1 for Elastic Net.
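
Because the additional arguments are handed on to cv.glmnet, the effect of alpha can be previewed by calling cv.glmnet directly. The following is a rough sketch on simulated data, assuming the glmnet package is installed; the data and the object names fits and coefs are illustrative only and are not part of lae.

```r
# Sketch only; assumes the glmnet package is installed.
library(glmnet)

set.seed(1)
n <- 100; p <- 6
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x %*% c(1.5, -1, 0, 0, 0.5, 0) + rnorm(n))

fits <- list(
  lasso       = cv.glmnet(x, y, alpha = 1),    # glmnet default
  elastic_net = cv.glmnet(x, y, alpha = 0.5),
  ridge       = cv.glmnet(x, y, alpha = 0)
)

# Coefficients at the cross-validated tuning parameter, intercept in row 1.
coefs <- sapply(fits, function(f) as.matrix(coef(f, s = "lambda.min"))[, 1])
round(coefs, 3)
```

The ridge column has no exact zeros, whereas the Lasso and Elastic Net columns may set some coefficients to zero.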

Details

Note that the candidate tuning parameters are selected automatically (by cv.glmnet). The bootstrap standard error for the Lasso does not assume a fixed tuning parameter, i.e. tuning parameter selection is done separately in each bootstrap sample. Lasso averaging works on standard errors related to each tuning parameter, but the variance between the different weighted estimates is taken into account. The importance measure based on the averaging weights can be interpreted as the importance of variables with respect to their predictive ability.
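
The idea of a bootstrap standard error without a fixed tuning parameter can be sketched as follows. This is a minimal illustration on simulated data, assuming the glmnet package is installed; it is not the internal code of lae, and the helper lasso_coef as well as the value of B are hypothetical.

```r
# Sketch only; assumes the glmnet package is installed.
library(glmnet)

set.seed(1)
n <- 100; p <- 6
x <- matrix(rnorm(n * p), n, p)
y <- as.numeric(x %*% c(1.5, -1, 0, 0, 0.5, 0) + rnorm(n))

# Lasso coefficients with the tuning parameter chosen by k-fold cross validation.
lasso_coef <- function(x, y, kfold = 5) {
  cvfit <- cv.glmnet(x, y, nfolds = kfold)
  as.matrix(coef(cvfit, s = "lambda.min"))[, 1]
}

B <- 50   # number of bootstrap replications (the role played by B.var)
boot <- replicate(B, {
  idx <- sample(n, replace = TRUE)
  lasso_coef(x[idx, , drop = FALSE], y[idx])   # lambda is re-selected in each sample
})

se <- apply(boot, 1, sd)   # one standard error per coefficient, intercept first
```

Since the whole selection procedure is repeated within each replication, se reflects the variability caused by tuning parameter selection as well as by sampling.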

Value

Returns an object of class ‘lae’:

coefficients

A matrix of coefficients and standard errors for Lasso averaging, Lasso selection, and OLS estimation.

variable.importance

A matrix containing the relative importance of each variable based on model averaging weights.

sae.weights

A vector containing the weights used for Lasso averaging.

sel.weights

A vector indicating the complexity parameter that was chosen for Lasso estimation based on k-fold cross validation.

complexity.parameter

A vector of the actual complexity parameter values used as candidate values for Lasso Averaging Estimation.

setup

A list of length two containing the data matrix and model family.

Author(s)

Michael Schomaker

References

Schomaker, M. (2012) Shrinkage Averaging Estimation, Statistical Papers, 53:1015-1034

See Also

plot.lae for visualizing the estimation results.

Examples

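library(MAMI)  # lae() is part of the MAMI package
data(Prostate)
# Lasso averaging with the default settings (10-fold cross validation)
lae(Prostate, ycol = "lpsa")
# Treat gleason as a factor and bootstrap the standard errors, using 5 folds
lae(Prostate, ycol = "lpsa", factor.variables = "gleason", calc.variance = TRUE, kfold = 5)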

MAMI documentation built on May 6, 2019, 3:02 p.m.