Variable Selection on Random Sample Splits.
Performs repeated variable selection via the lasso on random sample splits.
multisplit(x, y, covar = NULL, B = 50)
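x: The SNP data matrix, of size nobs x (number of SNPs), with one row per sample.
y: The response vector. It can be continuous or discrete.
covar: NULL or the matrix of covariates one wishes to control for, of size nobs x (number of covariates).
B: The number of random splits. Default value is 50.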
The samples are divided into two random splits of approximately
equal size. The first subsample is used for variable selection, which is
implemented using glmnet. The first [nobs/6] variables
which enter the lasso path are selected. The procedure is repeated B times.
If one or more covariates are specified, these will be added unpenalized to the regression.
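The selection step for a single split can be sketched as follows. This is a minimal illustration, not the package's own code: the simulated data, the variable names, the reading of [nobs/6] as floor(nobs/6), and the tie-breaking by column order are all assumptions. It relies on the `penalty.factor` argument of `glmnet`, where a factor of 0 leaves a column unpenalized.

```r
library(glmnet)

set.seed(1)
nobs <- 60
nvar <- 100
x <- matrix(rnorm(nobs * nvar), nobs, nvar)    # stand-in for the SNP matrix (simulated)
covar <- matrix(rnorm(nobs * 2), nobs, 2)      # two hypothetical covariates
y <- x[, 1] - x[, 2] + covar[, 1] + rnorm(nobs)

# Draw the first subsample (about half of the samples) for variable selection.
first_half <- sample(nobs, floor(nobs / 2))
n_select <- floor(nobs / 6)                    # assumed meaning of [nobs/6]

# Covariates get penalty factor 0, so they enter the regression unpenalized.
design <- cbind(covar, x)
pf <- c(rep(0, ncol(covar)), rep(1, nvar))
fit <- glmnet(design[first_half, ], y[first_half], penalty.factor = pf)

# For each SNP, find the first step of the lasso path at which its coefficient is nonzero.
beta_snp <- as.matrix(fit$beta)[-seq_len(ncol(covar)), , drop = FALSE]
entry <- apply(beta_snp != 0, 1, function(active) which(active)[1])

# Keep the SNPs that enter earliest; SNPs that never enter (NA) are dropped.
selected <- head(order(entry, na.last = NA), n_select)
selected
```

Repeating this for B independent draws of `first_half` would give one row of selected variables per split.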
A data frame with 2 components: a matrix of size B x [nobs/2] containing the second subsample of each split, and a matrix of size B x [nobs/6] containing the selected variables in each split.
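To make the shape of the first component concrete, the sketch below builds a B x [nobs/2] matrix of second-subsample indices in base R. It is only an illustration under assumptions: the name `out_sample` is hypothetical, [nobs/2] is read as floor(nobs/2), and nobs is chosen even so both halves have the same size.

```r
set.seed(1)
nobs <- 60
B <- 50

# One row per split: the samples NOT drawn into the first (selection) subsample.
out_sample <- t(replicate(B, {
  first_half <- sample(nobs, ceiling(nobs / 2))
  setdiff(seq_len(nobs), first_half)
}))

dim(out_sample)
```

Each row holds the indices of the samples held out from selection in that split.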
Meinshausen, N., Meier, L. and Bühlmann, P. (2009), P-values for high-dimensional regression, Journal of the American Statistical Association 104, 1671-1681.