snpnet | R Documentation |
Fit the entire lasso or elastic-net solution path using the Batch Screening Iterative Lasso (BASIL) algorithm on large phenotype-genotype datasets.
snpnet(genotype.pfile, phenotype.file, phenotype, family = NULL, covariates = NULL, alpha = 1, nlambda = 100, lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04), lambda = NULL, split.col = NULL, p.factor = NULL, status.col = NULL, mem = NULL, configs = NULL)
genotype.pfile |
the prefix of the PLINK 2.0 pgen fileset that contains the genotypes. The files genotype.pfile.pgen, genotype.pfile.pvar.zst, and genotype.pfile.psam are assumed to exist. |
phenotype.file |
the path of the file that contains the phenotype values and can be read as a table. It must have FID (family ID) and IID (individual ID) columns identifying each individual, plus the phenotype column(s). Optionally, covariate columns and a column specifying the training/validation split can be included in this file. |
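As an illustration of the layout described above, the following sketch writes a toy phenotype file and reads it back. Only the FID and IID column names come from this page; the other column names (age, sex, height, split), the tab separator, and the .phe extension are assumptions made for the example.

```r
# Toy phenotype table. FID and IID are required; every other column name
# here is a hypothetical choice for illustration.
n <- 6
phe <- data.frame(
  FID    = sprintf("F%02d", seq_len(n)),
  IID    = sprintf("I%02d", seq_len(n)),
  age    = c(41, 55, 63, 38, 47, 59),                         # covariate
  sex    = c(1, 2, 2, 1, 1, 2),                               # covariate
  height = c(171.2, 160.5, 158.9, 180.3, 175.0, 162.4),       # phenotype
  split  = c("train", "train", "val", "train", "val", "train"),
  stringsAsFactors = FALSE
)

# Write as a tab-separated table (the separator is an assumption) and read it back.
phe.path <- tempfile(fileext = ".phe")
write.table(phe, phe.path, sep = "\t", quote = FALSE, row.names = FALSE)
phe.back <- read.table(phe.path, header = TRUE, sep = "\t",
                       stringsAsFactors = FALSE)
```

With such a file, the phenotype, covariates, and split.col arguments would be "height", c("age", "sex"), and "split", respectively.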
phenotype |
the name of the phenotype. Must be the same as the corresponding column name in the phenotype file. |
family |
the type of the phenotype: "gaussian", "binomial", or "cox". If not provided or NULL, it will be detected based on the number of levels in the response. |
covariates |
a character vector containing the names of the covariates included in the lasso fitting, whose coefficients will not be penalized. The names must exist in the column names of the phenotype file. |
alpha |
the elastic-net mixing parameter, where the penalty is defined as alpha * ||beta||_1 + (1-alpha)/2 * ||beta||_2^2. alpha = 1 corresponds to the lasso penalty, while alpha = 0 corresponds to the ridge penalty. |
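To make the penalty formula concrete, here is a small base-R sketch that evaluates it for a given coefficient vector. The function name elastic_net_penalty and the example coefficients are hypothetical; this is not code from the package.

```r
# Evaluate alpha * ||beta||_1 + (1 - alpha)/2 * ||beta||_2^2 for a vector beta.
elastic_net_penalty <- function(beta, alpha = 1) {
  stopifnot(alpha >= 0, alpha <= 1)
  alpha * sum(abs(beta)) + (1 - alpha) / 2 * sum(beta^2)
}

beta <- c(0.5, -1, 0, 2)
pen.lasso <- elastic_net_penalty(beta, alpha = 1)    # pure L1 (lasso) penalty
pen.ridge <- elastic_net_penalty(beta, alpha = 0)    # pure ridge penalty
pen.mixed <- elastic_net_penalty(beta, alpha = 0.5)  # equal mix of the two
```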
nlambda |
the number of lambda values - default is 100. |
lambda.min.ratio |
smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value, i.e. the smallest value for which all coefficients are zero. The default depends on the sample size nobs relative to the number of actual variables nvars (after QC filtering). If nobs > nvars, the default is 0.0001, close to zero. If nobs < nvars, the default is 0.01. A very small value of lambda.min.ratio will lead to a saturated fit in the nobs < nvars case. |
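The sketch below shows how nlambda and lambda.min.ratio could determine a lambda sequence. The log-spaced grid follows the usual glmnet convention and is an assumption here, since this page does not state the spacing; lambda_path and the value of lambda.max are hypothetical.

```r
# Build a decreasing lambda sequence from lambda.max down to
# lambda.max * lambda.min.ratio. Log spacing is assumed, not documented.
lambda_path <- function(lambda.max, nobs, nvars, nlambda = 100,
                        lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04)) {
  exp(seq(log(lambda.max), log(lambda.max * lambda.min.ratio),
          length.out = nlambda))
}

# nobs < nvars, so the default ratio is 0.01 and the path ends at 2 * 0.01.
path <- lambda_path(lambda.max = 2, nobs = 500, nvars = 10000)
range(path)
```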
lambda |
the full list of lambda values on which the lasso/elastic-net will be solved. Once provided, 'nlambda' and 'lambda.min.ratio' will be ignored. It can be used for refitting after the optimal parameter is selected by validation. |
split.col |
the column name in the phenotype file that specifies the membership of individuals to the training or the validation set. The individuals marked as "train" and "val" will be treated as the training and validation set, respectively. When specified, the model performance is evaluated on both the training and the validation sets. |
p.factor |
a named vector of separate penalty factors applied to each coefficient. Each factor multiplies lambda, which allows different shrinkage for different variables. If not provided, the default is 1 for all variables. Otherwise, it should contain a positive value for every variable. |
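A minimal sketch of building such a vector follows. The variant names are hypothetical placeholders; in practice the names would have to match whatever identifiers the package uses for the variants, which this page does not specify.

```r
# Hypothetical variant identifiers, for illustration only.
vars <- c("rs0001_A", "rs0002_G", "rs0003_T")

# Start from the default factor of 1 and relax the penalty on one variant.
p.factor <- setNames(rep(1, length(vars)), vars)
p.factor["rs0002_G"] <- 0.5

# The vector should be complete (no missing values) and strictly positive.
stopifnot(!anyNA(p.factor), all(p.factor > 0))

# Each factor multiplies lambda, giving a per-variable penalty level.
lambda <- 0.1
effective.penalty <- lambda * p.factor
```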
status.col |
the column name for the status column for Cox proportional hazards model. When running the Cox model, the specified column must exist in the phenotype file. |
mem |
Memory (MB) available for the program. It tells PLINK 2.0 the amount of memory it can harness for the computation. IMPORTANT if using a job scheduler. |
configs |
a list of other config parameters. |
Junyang Qian, Wenfei Du, Yosuke Tanigawa, Matthew Aguirre, Robert Tibshirani, Manuel A. Rivas, and Trevor Hastie. "A Fast and Flexible Algorithm for Solving the Lasso in Large-scale and Ultrahigh-dimensional Problems." bioRxiv (2019): https://doi.org/10.1101/630079
A list containing the solution path, the metric evaluated on training/validation set and others.
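A hedged usage sketch, based only on the signature shown above: the file paths, the column names, and the memory value are hypothetical placeholders, and the call runs only if the package is installed and the input files exist. The structure of the returned list is inspected rather than assumed.

```r
# All paths and column names below are hypothetical placeholders.
args <- list(
  genotype.pfile = "data/sample",      # expects data/sample.{pgen,pvar.zst,psam}
  phenotype.file = "data/sample.phe",
  phenotype      = "height",
  family         = "gaussian",
  covariates     = c("age", "sex"),
  split.col      = "split",
  mem            = 16000               # memory in MB passed on to PLINK 2.0
)

# Run only when both the package and the input files are available.
have.pkg  <- requireNamespace("snpnet", quietly = TRUE)
have.data <- file.exists(paste0(args$genotype.pfile, ".pgen")) &&
  file.exists(args$phenotype.file)

if (have.pkg && have.data) {
  fit <- do.call(snpnet::snpnet, args)
  str(fit, max.level = 1)              # inspect the top-level list elements
}
```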