multisplit: Variable Selection on Random Sample Splits.


Description

Performs repeated variable selection via the lasso on random sample splits.

Usage

multisplit(x, y, covar = NULL, B = 50)

Arguments

x

The SNP data matrix, of size nobs x nvar. Each row represents a subject, each column a SNP.

y

The response vector, of length nobs. It can be continuous or discrete.

covar

Either NULL (the default) or a matrix of covariates to control for, of size nobs x ncovar.

B

The number of random splits. Default value is 50.

Details

The samples are divided at random into two subsamples of approximately equal size. The first subsample is used for variable selection with the lasso, implemented via glmnet: the first [nobs/6] variables to enter the lasso path are selected. The second subsample is not used for selection; its indices are returned instead (see Value). The procedure is repeated B times.
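The split-and-select procedure can be sketched in R as follows. This is a minimal illustration, not the hierGWAS source: the helper names one_split and multisplit_sketch are hypothetical, a continuous response (glmnet's "gaussian" family) is assumed, and variables entering the path at the same lambda are ordered by column index.

```r
library(glmnet)

# Hypothetical helper (not the hierGWAS source): one random split, then lasso
# screening on the first half. Assumes a continuous response (gaussian family).
one_split <- function(x, y) {
  n <- nrow(x)
  n.sel <- floor(n / 6)
  first.half <- sample.int(n, size = floor(n / 2))
  second.half <- setdiff(seq_len(n), first.half)

  fit <- glmnet(x[first.half, , drop = FALSE], y[first.half], family = "gaussian")
  nonzero <- as.matrix(fit$beta) != 0
  # Index of the first lambda at which each variable becomes non-zero (Inf if never)
  entry <- apply(nonzero, 1, function(nz) if (any(nz)) which(nz)[1] else Inf)
  selected <- order(entry)[seq_len(n.sel)]
  selected <- selected[is.finite(entry[selected])]
  # Pad with NA so that every split yields a row of the same length
  length(selected) <- n.sel
  list(second.half = second.half, selected = selected)
}

# Hypothetical wrapper: repeat the split B times and stack the results row-wise
multisplit_sketch <- function(x, y, B = 50) {
  splits <- replicate(B, one_split(x, y), simplify = FALSE)
  list(
    second.half = do.call(rbind, lapply(splits, `[[`, "second.half")),
    selected    = do.call(rbind, lapply(splits, `[[`, "selected"))
  )
}
```

With nobs = 60, each call yields a B x 30 matrix of held-out sample indices and a B x 10 matrix of selected variable indices, matching the shapes described under Value.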

If one or more covariates are specified, these will be added unpenalized to the regression.
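glmnet supports unpenalized terms through its penalty.factor argument: a penalty factor of 0 means the corresponding column is never shrunk and stays in the model along the whole path. The sketch below assumes covariates are handled this way; the helper fit_with_covariates is hypothetical, and hierGWAS may implement it differently.

```r
library(glmnet)

# Hypothetical helper: append the covariates to the SNP matrix and give them
# penalty factor 0, so the lasso penalizes only the SNP columns.
fit_with_covariates <- function(x, y, covar) {
  xx <- cbind(x, covar)
  pf <- c(rep(1, ncol(x)), rep(0, ncol(covar)))
  glmnet(xx, y, family = "gaussian", penalty.factor = pf)
}
```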

Value

A list with 2 components: a matrix of size B x [nobs/2] containing the indices of the observations in the second subsample of each split, and a matrix of size B x [nobs/6] containing the indices of the variables selected in each split.
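One way to read the B x [nobs/6] selection matrix is to count how often each variable was selected across the B splits. Such counts are not part of the returned value; the helper selection_counts below is a hypothetical illustration, and it assumes unused slots (if any) are stored as NA.

```r
# Hypothetical helper: count how often each variable index appears across the
# B rows of a selection matrix; NA entries are ignored.
selection_counts <- function(sel, nvar) {
  tabulate(sel[!is.na(sel)], nbins = nvar)
}

# Toy selection matrix for B = 2 splits (illustrative values only)
sel <- rbind(c(3, 5, 9),
             c(5, 9, NA))
counts <- selection_counts(sel, nvar = 10)
```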

References

Meinshausen, N., Meier, L. and Bühlmann, P. (2009), P-values for high-dimensional regression, Journal of the American Statistical Association 104, 1671-1681.

Examples

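library(MASS)
library(hierGWAS)
set.seed(1)
# 60 subjects, 200 independent standard normal predictors
x <- mvrnorm(60, mu = rep(0, 200), Sigma = diag(200))
# Sparse signal: only variables 3, 5 and 9 have a non-zero effect
beta <- rep(0, 200)
beta[c(5, 9, 3)] <- 3
y <- x %*% beta + rnorm(60)
res.multisplit <- multisplit(x, y)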

