GridLMMnet: LASSO solutions in a linear mixed model using GridLMM
In deruncie/GridLMM: Efficient Mixed Models for GWAS with multiple Random Effects

GridLMMnet

R Documentation

LASSO solutions in a linear mixed model using GridLMM

Description

Finds LASSO or Elastic Net solutions for a multiple regression problem with correlated errors.

Usage

GridLMMnet(
  formula,
  data,
  X,
  X_ID = "ID",
  weights = NULL,
  centerX = TRUE,
  scaleX = TRUE,
  relmat = NULL,
  normalize_relmat = TRUE,
  h2_step = 0.1,
  h2_start = NULL,
  alpha = 1,
  lambdaType = "s2e",
  scoreType = "LL",
  nlambda = 100,
  lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04),
  lambda = NULL,
  penalty.factor = NULL,
  nfolds = NULL,
  foldid = NULL,
  RE_setup = NULL,
  V_setup = NULL,
  save_V_folder = NULL,
  diagonalize = T,
  mc.cores = parallel::detectCores(),
  clusterType = "mclapply",
  verbose = T,
  ...
)
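
The main inputs relate to one another through identifiers: `X_ID` names the column of `data` that points at rows of `X`, and `relmat` is a list named after the grouping factor. The following is a minimal base-R sketch of inputs with that shape. All object names, sizes, and the way `K` is built are illustrative assumptions, not taken from the package.

```r
# Hypothetical toy inputs; names and sizes are illustrative only.
set.seed(1)
n_lines <- 5     # rows of X
n_markers <- 8   # columns of X (the penalized variables)

# X: one row per line, with rownames acting as identifiers.
X <- matrix(rnorm(n_lines * n_markers), nrow = n_lines,
            dimnames = list(paste0("line", seq_len(n_lines)),
                            paste0("m", seq_len(n_markers))))

# data: several observations may reference the same row of X via the ID column.
data <- data.frame(ID = rep(rownames(X), each = 2), stringsAsFactors = FALSE)
data$y <- rnorm(nrow(data))

# Row of X that each observation refers to (the mapping X_ID = "ID" expresses).
x_row <- match(data$ID, rownames(X))

# A relationship matrix for the grouping factor "ID" (assumed construction).
# The list name matches the grouping column, and the rownames of K cover
# every level of that factor.
K <- tcrossprod(scale(X)) / n_markers
relmat <- list(ID = K)
```

Under these assumptions, a call of the form `GridLMMnet(y ~ 1 + (1 | ID), data = data, X = X, X_ID = "ID", relmat = relmat)` would match the signature above.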

Arguments

`formula`	A two-sided linear formula as used in `lmer`, describing the fixed effects and random effects of the model on the RHS and the response on the LHS. Note: correlated random effects are not implemented, so using one vertical bar (`|`) or two (`||`) is identical. At least one random effect is needed. Unlike `lmer`, random effects can have as many levels as there are observations.
`data`	A data frame containing the variables named in `formula`.
`X`	Variables in the model that will be penalized with the elastic net penalty. Covariates specified in `formula` are not penalized.
`X_ID`	Column of `data` that identifies the row of `X` that corresponds to each observation. Multiple observations may reference the same row of `X`.
`weights`	An optional vector of observation-specific weights.
`centerX`	TRUE/FALSE: whether to center `X`. Applied to the `X` matrix before `X` is used to form any GRMs.
`scaleX`	TRUE/FALSE: whether to scale `X`. Applied to the `X` matrix before `X` is used to form any GRMs.
`relmat`	Either: 1) A list of matrices that are proportional to the (within) covariance structures of the group level effects. 2) A list of lists with elements (`K`, `p`) with a covariance matrix and an integer listing the number of markers used to estimate the covariance matrix. This is used for appropriate downdating of `V` to remove proximal markers for each test. The names of the matrices / list elements should correspond to the columns in `data` that are used as grouping factors. All levels of the grouping factor should appear as rownames of the corresponding matrix.
`normalize_relmat`	Should ZKZt matrices be normalized so that mean(diag) == 1? Default: TRUE.
`h2_step`	Step size of the grid
`h2_start`	Optional. Matrix with each row a vector of `h^2` parameters defining starting values for the grid. Typically ML/REML solutions for the null model. If null, will be calculated using GridLMM_ML.
`alpha`	The elastic net mixing parameter, with `0 \le \alpha \le 1`. The penalty is defined as `(1-\alpha)/2 ||\beta||_2^2 + \alpha ||\beta||_1`. `alpha=1` is the lasso penalty, and `alpha=0` the ridge penalty.
`nlambda`	The number of `lambda` values - default is 100.
`lambda.min.ratio`	Smallest value for `lambda`, as a fraction of `lambda.max`, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size `nobs` relative to the number of variables `nvars`. If `nobs > nvars`, the default is `0.0001`, close to zero. If `nobs < nvars`, the default is `0.01`. A very small value of `lambda.min.ratio` will lead to a saturated fit in the `nobs < nvars` case. This is undefined for `"binomial"` and `"multinomial"` models, and `glmnet` will exit gracefully when the percentage deviance explained is almost 1.
`lambda`	A user-supplied `lambda` sequence. Typical usage is to have the program compute its own `lambda` sequence based on `nlambda` and `lambda.min.ratio`. Supplying a value of `lambda` overrides this. WARNING: use with care. Avoid supplying a single value for `lambda` (for predictions after CV use `predict()` instead). Supply instead a decreasing sequence of `lambda` values. `glmnet` relies on its warm starts for speed, and it's often faster to fit a whole path than to compute a single fit.
`penalty.factor`	Separate penalty factors can be applied to each coefficient. This is a number that multiplies `lambda` to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables (and implicitly infinity for variables listed in `exclude`). Note: the penalty factors are internally rescaled to sum to nvars, and the lambda sequence will reflect this change.
`foldid`	A vector of integers that divides the data into a set of non-overlapping folds for cross-validation.
`V_setup`	Optional. A list produced by a GridLMM function containing the pre-processed V decompositions for each grid vertex, or the information necessary to create this. Generally saved from a previous run of GridLMM on the same data.
`save_V_folder`	Optional. A character vector giving a folder in which to save pre-processed V decomposition files for future or repeated use. If null, V decompositions are stored in memory.
`diagonalize`	If TRUE and the model includes only a single random effect, the "GEMMA" trick will be used to diagonalize V. This is done by calculating the SVD of K, which can be slow for large samples.
`mc.cores`	Number of processor cores used for parallel evaluations. Note that this uses 'mclapply', so memory requirements grow rapidly with `mc.cores`, because the marker matrix gets duplicated in memory for each core.
`verbose`	Should progress be printed to the screen?
`...`
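
The penalty-related arguments above (`alpha`, `penalty.factor`, `nlambda`, `lambda.min.ratio`) can be illustrated with a few lines of base R. This is a sketch of the definitions as stated in the argument descriptions, not the package's internal code. The helper names are hypothetical, and the log-spaced form of the `lambda` sequence is an assumption about how the path is laid out.

```r
# Elastic net penalty for a coefficient vector, as defined under `alpha`.
elastic_net_penalty <- function(beta, alpha) {
  stopifnot(alpha >= 0, alpha <= 1)
  (1 - alpha) / 2 * sum(beta^2) + alpha * sum(abs(beta))
}

# Decreasing lambda sequence from lambda_max down to
# lambda_max * lambda_min_ratio (log-spacing is assumed here).
lambda_sequence <- function(lambda_max, nlambda = 100, lambda_min_ratio = 1e-4) {
  exp(seq(log(lambda_max), log(lambda_max * lambda_min_ratio),
          length.out = nlambda))
}

# Rescale penalty factors so that they sum to the number of variables.
rescale_penalty_factor <- function(pf) pf * length(pf) / sum(pf)

beta <- c(1, -2, 0)
lasso_pen <- elastic_net_penalty(beta, alpha = 1)  # sum of |beta|
ridge_pen <- elastic_net_penalty(beta, alpha = 0)  # half the sum of beta^2

lambdas <- lambda_sequence(lambda_max = 2, nlambda = 5, lambda_min_ratio = 0.01)

# A zero factor leaves that variable unpenalized after rescaling as well.
pf <- rescale_penalty_factor(c(0, 1, 1, 1))
```

Here `lambda_max = 2` is an arbitrary placeholder; in practice it is derived from the data.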

Details

Finds the full LASSO or Elastic Net solution path by running glmnet at each grid vertex. If foldid is provided, cross-validation scores will be calculated.
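
To make "each grid vertex" concrete, the sketch below builds a grid of `h^2` values for a single random effect and shows how one decomposition of `K` diagonalizes `V` at every vertex, in the spirit of the `diagonalize` argument. It is a conceptual illustration under stated assumptions: the grid range, the form `V = h2 * K + (1 - h2) * I`, and the use of `eigen()` in place of an SVD are assumptions made here, and no penalized fit is run.

```r
set.seed(2)
n <- 6
Z <- matrix(rnorm(n * 10), nrow = n)
K <- tcrossprod(Z) / ncol(Z)
K <- K / mean(diag(K))             # normalize so that mean(diag(K)) == 1

h2_step <- 0.1
h2_grid <- seq(0, 1 - h2_step, by = h2_step)  # assumed grid, one random effect

# Decompose K once; the same rotation is reused at every grid vertex.
eig <- eigen(K, symmetric = TRUE)
U <- eig$vectors
d <- eig$values

y <- rnorm(n)
y_rot <- crossprod(U, y)           # U'y

# After rotation by U, V = h2 * K + (1 - h2) * I has this diagonal.
rotated_var <- lapply(h2_grid, function(h2) h2 * d + (1 - h2))

# Whitened response at one vertex. Rotated predictors would be scaled the
# same way before being passed to a penalized regression such as glmnet.
h2 <- h2_grid[4]
y_white <- as.numeric(y_rot) / sqrt(rotated_var[[4]])

# Explicit V at the same vertex, for comparison with the diagonal form.
V <- h2 * K + (1 - h2) * diag(n)
```

The point of the rotation is that the costly decomposition happens once, while each additional vertex only requires rescaling by a vector.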

Value

If foldid and nfolds are both NULL, an object with S3 class "glmnet", "*", where "*" is "elnet". See glmnet. Otherwise, an object with S3 class "cv.glmnet". See cv.glmnet.
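
Since supplying `foldid` switches the return value to a cross-validation object, it may help to see one way such a vector can be built. The following base-R sketch assigns observations to roughly equal-sized random folds. The sample size and number of folds are arbitrary, and this is not necessarily how the package assigns folds when only `nfolds` is given.

```r
set.seed(3)
n_obs <- 23    # hypothetical number of observations
nfolds <- 5

# Recycle fold labels 1..nfolds across observations, then shuffle them.
foldid <- sample(rep(seq_len(nfolds), length.out = n_obs))

# Fold sizes differ by at most one observation.
fold_sizes <- table(foldid)
```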

deruncie/GridLMM documentation built on July 3, 2025, 6:32 p.m.