Provides p-values for lasso regression. This method implements the multi-sample splitting method for significance testing in a high-dimensional regression context. The basic idea is to split a sample in two, perform variable selection using the lasso on one half, and derive p-values using ordinary least squares (OLS) on the other half.
Note that the penalisation parameter λ and s are identical; they are named this way for consistency with the package glmnet.
This method is only implemented for a single response variable, since general lasso regression requires the same set of parameters to be selected for every response variable, which is overly restrictive in some cases.
y 
Response vector 
X 
Design matrix 
B 
Number of times to partition the sample 
s 
The value of lambda to use in lasso. Can be:

include 
Set of predictors to be force-included in the OLS analysis 
gamma.min 
Lower bound for gamma in the adaptive search for the best p-value (default 0.05) 
fixedP 
The fixed number of parameters to use (if 
nfolds 
Number of folds to use in the glmnet n-fold cross-validation 
intercept 
Whether to include an intercept in the OLS regression (default = 
The method works by randomly partitioning the dataset into two halves. Lasso regression is
performed on one half and, for a particular value of the penalisation parameter lambda,
a subset of the predictor variables is chosen. Ordinary least squares regression is
then performed on the other half of the data. If S variables are chosen for a given
split, then each p-value p is Bonferroni-adjusted to S*p (capped at 1). The p-values of all
variables not selected for a given split are set to 1. This process is repeated B times,
generating B sets of p-values. These p-values are then aggregated across
splits to provide a single p-value for each predictor variable. For full details see the
original paper. The aggregation requires an extra parameter γ_min, which is recommended
to be 0.05 (and is set by the parameter gamma.min).
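The OLS and aggregation steps described above can be sketched in base R. This is an illustration only, not this package's internals: split_pvalues() and aggregate_pvalues() are hypothetical helper names, the lasso selection is replaced by a simple correlation-based stand-in so the sketch runs without glmnet, and the adaptive search over gamma is approximated on a grid with step 0.01 using R's default quantile definition. The aggregation follows the rule in Meinshausen et al. (2009): for each gamma, take the gamma-quantile of the p-values divided by gamma, then minimise over gamma and multiply by (1 - log(gamma.min)).

```r
# Hypothetical helpers for illustration; not part of this package.

# One split: OLS p-values on the held-out rows, Bonferroni-adjusted by the
# number of selected variables S; unselected variables get a p-value of 1.
# Assumes the selected columns are full rank and S < length(ols_rows).
split_pvalues <- function(y, X, selected, ols_rows) {
  pv <- rep(1, ncol(X))
  S <- length(selected)
  if (S == 0) return(pv)
  y2 <- y[ols_rows]
  X2 <- X[ols_rows, selected, drop = FALSE]
  fit <- lm(y2 ~ X2)
  raw <- summary(fit)$coefficients[-1, "Pr(>|t|)"]  # drop the intercept row
  pv[selected] <- pmin(1, S * raw)
  pv
}

# Aggregate a B x p matrix of adjusted p-values across splits.
aggregate_pvalues <- function(P, gamma.min = 0.05) {
  gammas <- seq(gamma.min, 1, by = 0.01)  # grid approximation (assumption)
  Q <- sapply(gammas, function(g) {
    pmin(1, apply(P, 2, quantile, probs = g, names = FALSE) / g)
  })
  Q <- matrix(Q, nrow = ncol(P))          # p x length(gammas), also when p = 1
  pmin(1, (1 - log(gamma.min)) * apply(Q, 1, min))
}

# Usage on simulated data where only the first predictor matters.
set.seed(1)
n <- 100; p <- 10; B <- 20
X <- matrix(rnorm(n * p), n, p)
y <- 2 * X[, 1] + rnorm(n)
P <- t(replicate(B, {
  sel_rows <- sample(n, floor(n / 2))
  ols_rows <- setdiff(seq_len(n), sel_rows)
  # Stand-in for the lasso step: keep the 3 largest marginal correlations
  score <- abs(cor(X[sel_rows, ], y[sel_rows]))
  selected <- order(score, decreasing = TRUE)[1:3]
  split_pvalues(y, X, selected, ols_rows)
}))
aggregate_pvalues(P)
```

Keeping selection and inference on disjoint halves is what makes the OLS p-values valid; the aggregation then reduces the dependence of the result on any single random split.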
Care must be taken with regard to the number of measurements (length(y)) and
the number of folds for cross-validation (nfolds). The package glmnet
requires at least 3 samples in each cross-validation split when finding the optimal
λ (a cross-validation split is not the same as a multi-sample split). Therefore, if we start
with N samples, glmnet receives at least floor(N/2), which it then splits
into nfolds folds for cross-validation. As such we need
floor(floor(N/2)/nfolds) >= 3,
which is safely satisfied provided N > 6*nfolds + 3.
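The sample-size requirement can be checked before running the method. The sketch below uses a hypothetical helper, enough_samples(), which is not part of this package, and reads "at least 3 samples" per cross-validation fold as a greater-than-or-equal condition (an assumption).

```r
# Hypothetical helper: does each cross-validation fold get enough samples
# after the data have been halved by the multi-sample split?
enough_samples <- function(N, nfolds, min_per_fold = 3) {
  floor(floor(N / 2) / nfolds) >= min_per_fold
}

enough_samples(64, 10)  # TRUE:  floor(32 / 10) = 3
enough_samples(50, 10)  # FALSE: floor(25 / 10) = 2
```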
When choosing B, there is a trade-off between bias and efficiency. A larger B
will lead to a less biased result (i.e. one less sensitive to the random partitioning of the
sample) but can require significantly more computation time. A heuristically 'good' value is
B = 50.
The choice of λ = s is detailed in the glmnet package. For standard
problems the best choice may be lambda.min, though if you are specifically trying to
minimise the number of parameters necessary, lambda.1se (a one, not an L) is a good choice.
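To see how the two values of s differ in practice, the following sketch compares the variables selected by glmnet at each. It assumes glmnet is installed; select_lasso() is a hypothetical helper written for this illustration and is not part of this package. It relies only on glmnet's cv.glmnet() and the coef() method for its result, which accepts s = "lambda.min" or s = "lambda.1se".

```r
# Hypothetical helper: column indices of predictors with non-zero lasso
# coefficients at the cross-validated value of lambda named by s.
select_lasso <- function(X, y, s = "lambda.min", nfolds = 10) {
  cvfit <- glmnet::cv.glmnet(X, y, nfolds = nfolds)
  beta <- as.matrix(coef(cvfit, s = s))[-1, 1]  # drop the intercept
  unname(which(beta != 0))
}

# Usage (runs only if glmnet is available)
if (requireNamespace("glmnet", quietly = TRUE)) {
  set.seed(1)
  X <- matrix(rnorm(100 * 20), 100, 20)
  y <- 3 * X[, 1] + rnorm(100)
  print(select_lasso(X, y, s = "lambda.min", nfolds = 5))
  print(select_lasso(X, y, s = "lambda.1se", nfolds = 5))
}
```

Because lambda.1se is never smaller than lambda.min, it applies at least as much penalisation and so typically keeps fewer predictors.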
Alternatively, it may be advantageous to select a fixed number of parameters on every split. This
can be done by setting s="usefixed" and fixedP to the desired number of
parameters.
Occasionally it is necessary to force the inclusion of predictors in the OLS significance testing.
These can be included by setting include to the numeric indices (i.e. the column numbers)
of the predictors to force-include.
Note that forcing the exclusion of an intercept in OLS (by setting intercept = FALSE) can seriously
bias results. Only do this if you are sure that y = 0 when x_i = 0 for all i and that
all relationships are perfectly linear.
A vector of p-values, where the i-th entry corresponds to the p-value for the predictor
defined by the i-th column of X.
Kieran Campbell
Meinshausen, Nicolai, Lukas Meier, and Peter Bühlmann. "P-values for high-dimensional regression." Journal of the American Statistical Association 104.488 (2009).