Provides p-values for lasso regression. This method implements the multi-sample splitting method for significance testing in a high-dimensional regression context. The basic idea is to split a sample in two, perform variable selection using LASSO on one half and derive p-values using ordinary least squares (OLS) on the other half.
Note that the penalisation parameter λ and s are identical - they are named this way for consistency with the package glmnet.
This method is only implemented for a single response variable, since general lasso regression requires the same set of parameters to be selected for every response variable, which is overly restrictive in some cases.
y | Response vector
X | Design matrix
B | Number of times to partition the sample
s | The value of lambda to use in lasso; the possible choices are discussed below
include | Set of predictors to be force-included in the OLS analysis
gamma.min | Lower bound for gamma in the adaptive search for the best p-value (default 0.05)
fixedP | The fixed number of parameters to use when a fixed number is selected on every split (see below)
nfolds | Number of folds used in the glmnet cross-validation
intercept | Whether to include an intercept in the OLS regression
The method works by randomly partitioning the dataset into two halves. Lasso regression is
performed on one half and, for a particular value of the penalisation parameter lambda,
a subset of the predictor variables is chosen. Ordinary least squares regression is
then performed on the other half of the data. If S variables are chosen for a given
split, each p-value p is Bonferroni-adjusted to S·p. The p-values of all variables
not selected for a given split are set to 1. This process is repeated B times, generating
B sets of p-values. These p-values are then aggregated across splits to provide a single
p-value for each predictor variable. For full details see the original paper. The
aggregation requires an extra parameter γ_min, which is recommended to be 0.05 (and is
set by the parameter gamma.min).
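The two OLS-side steps described above can be sketched in base R. This is an illustrative sketch, not the package's own code: the helper names split_pvals and aggregate_pvals are hypothetical, the lasso selection step is assumed to have been done already (its result is passed in as column indices), and the aggregation follows the quantile formula of the referenced paper, with the infimum over γ approximated on a grid.

```r
# Hypothetical helper: Bonferroni-adjusted OLS p-values for one split.
# X, y     : the held-out half of the data (not the half used for the lasso)
# selected : column indices chosen by the lasso on the other half (assumed given)
split_pvals <- function(X, y, selected, intercept = TRUE) {
  pvals <- rep(1, ncol(X))                 # unselected predictors get p = 1
  if (length(selected) == 0) return(pvals)
  Xs  <- X[, selected, drop = FALSE]
  fit <- if (intercept) lm(y ~ Xs) else lm(y ~ Xs - 1)
  raw <- summary(fit)$coefficients[, "Pr(>|t|)"]
  if (intercept) raw <- raw[-1]            # drop the intercept's p-value
  # Assumes Xs has full column rank, so no coefficient is dropped as aliased.
  pvals[selected] <- pmin(1, raw * length(selected))   # S * p, capped at 1
  pvals
}

# Hypothetical helper: aggregate a B x p matrix of per-split p-values.
# For each predictor j: Q_j(gamma) = min(1, gamma-quantile of P_j / gamma),
# and the final p-value is min(1, (1 - log(gamma.min)) * min over gamma of Q_j).
aggregate_pvals <- function(pmat, gamma.min = 0.05) {
  gammas <- seq(gamma.min, 1, by = 0.01)   # grid approximation (an assumption)
  apply(pmat, 2, function(pj) {
    q <- sapply(gammas, function(g) {
      min(1, quantile(pj / g, probs = g, names = FALSE, type = 1))
    })
    min(1, (1 - log(gamma.min)) * min(q))
  })
}
```

In use, split_pvals would be called once per random split to fill one row of a B-row matrix, and aggregate_pvals would then reduce that matrix to one p-value per column of X.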
Care must be taken with regard to the number of measurements (length(y)) and
the number of folds for cross-validation (nfolds). The package glmnet
requires at least 3 samples in each cross-validation fold when finding the optimal
λ (a cross-validation fold is not the same as a multi-sample split). Therefore, if we start
with N samples, glmnet receives at least floor(N/2), which it then splits
into nfolds folds for cross-validation. As such we need
floor(floor(N/2)/nfolds) >= 3
which is safely satisfied provided N > 6*nfolds + 3.
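The sample-size condition can be checked before running the method. The sketch below is only an illustration: the helper names min_fold_size and enough_samples are hypothetical, and the threshold of 3 is taken from the "at least 3 samples" requirement stated above.

```r
# Hypothetical helpers: check that each cross-validation fold is large enough.
# N      : total number of samples, i.e. length(y)
# nfolds : number of cross-validation folds passed on to glmnet
min_fold_size <- function(N, nfolds) {
  floor(floor(N / 2) / nfolds)   # half the sample, divided among the folds
}

enough_samples <- function(N, nfolds, min_per_fold = 3) {
  min_fold_size(N, nfolds) >= min_per_fold
}

enough_samples(64, 10)   # N > 6*nfolds + 3, so the condition holds
enough_samples(40, 10)   # only 2 samples per fold, so it does not
```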
When choosing B, there is a trade-off between bias and efficiency. A larger B
will lead to a less biased result (i.e. one less sensitive to the random partitioning of
the sample) but can require significantly more computation time. A heuristically 'good'
value is B=50.
The choice of λ = s is detailed in the glmnet package. For standard
problems the best choice may be lambda.min, though if you are specifically trying to
minimise the number of parameters necessary, lambda.1se (a one, not an L) is a good choice.
Alternatively it may be advantageous to select a fixed number of parameters on every split. This
can be performed by setting s="usefixed" and fixedP to the desired number of
parameters.
Occasionally it is necessary to force the inclusion of predictors into the OLS significance testing.
These can be included by setting include to the numeric indices (i.e. the column numbers)
of the predictors to force-include.
Note that forcing the exclusion of an intercept in OLS (by setting intercept = FALSE) can
seriously bias results. Only do this if you are sure that y = 0 when x_i = 0 for all i and
that all relationships are perfectly linear.
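The effect of dropping the intercept can be seen on a small noise-free toy example in base R. The data here are invented purely for illustration: the true relationship has intercept 5 and slope 2, so y is not 0 when x is 0.

```r
# Toy data (an assumption for illustration): y = 5 + 2x exactly
x <- 1:10
y <- 5 + 2 * x

# With an intercept, OLS recovers the true slope of 2
slope_with <- unname(coef(lm(y ~ x))["x"])

# Without an intercept, the slope absorbs the missing offset and is
# overestimated (about 2.71 for these data)
slope_without <- unname(coef(lm(y ~ x - 1))["x"])

c(with = slope_with, without = slope_without)
```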
A vector of p-values, where the ith entry corresponds to the p-value for the predictor
defined by the ith column of X.
Kieran Campbell kieran.campbell@dpag.ox.ac.uk
Meinshausen, Nicolai, Lukas Meier, and Peter Bühlmann. "P-values for high-dimensional regression." Journal of the American Statistical Association 104.488 (2009).