GridLMMnet: LASSO solutions in a linear mixed model using GridLMM

View source: R/GridLMMnet.R

GridLMMnetR Documentation

LASSO solutions in a linear mixed model using GridLMM

Description

Finds LASSO or Elastic Net solutions for a multiple regression problem with correlated errors.

Usage

GridLMMnet(
  formula,
  data,
  X,
  X_ID = "ID",
  weights = NULL,
  centerX = TRUE,
  scaleX = TRUE,
  relmat = NULL,
  normalize_relmat = TRUE,
  h2_step = 0.1,
  h2_start = NULL,
  alpha = 1,
  lambdaType = "s2e",
  scoreType = "LL",
  nlambda = 100,
  lambda.min.ratio = ifelse(nobs < nvars, 0.01, 1e-04),
  lambda = NULL,
  penalty.factor = NULL,
  nfolds = NULL,
  foldid = NULL,
  RE_setup = NULL,
  V_setup = NULL,
  save_V_folder = NULL,
  diagonalize = T,
  mc.cores = parallel::detectCores(),
  clusterType = "mclapply",
  verbose = T,
  ...
)

Arguments

formula

A two-sided linear formula as used in lmer describing the fixed-effects and random-effects of the model on the RHS and the response on the LHS. Note: correlated random-effects are not implemented, so using one or two vertical bars (|) or one is identical. At least one random effect is needed. Unlike lmer, random effects can have as many as there are observations.

data

A data frame containing the variables named in formula.

X

Variables in model that well be penalized with the elastic net penalty. Covariates specified in formula are not penalized.

X_ID

Column of data that identifies the row of X that corresponding to each observation. It is possible that multiple observations reference the same row of X.

weights

An optional vector of observation-specific weights.

centerX

TRUE/FALSE for each. Applied to the X matrix before using X to form any GRMs.

scaleX

TRUE/FALSE for each. Applied to the X matrix before using X to form any GRMs.

relmat

Either: 1) A list of matrices that are proportional to the (within) covariance structures of the group level effects. 2) A list of lists with elements (K, p) with a covariance matrix and an integer listing the number of markers used to estimate the covariance matrix. This is used for appropriate downdating of V to remove proximal markers for each test. The names of the matrices / list elements should correspond to the columns in data that are used as grouping factors. All levels of the grouping factor should appear as rownames of the corresponding matrix.

normalize_relmat

should ZKZt matrices be normalized so that mean(diag) == 1? Default (true)

h2_step

Step size of the grid

h2_start

Optional. Matrix with each row a vector of h^2 parameters defining starting values for the grid. Typically ML/REML solutions for the null model. If null, will be calculated using GridLMM_ML.

alpha

The elasticnet mixing parameter, with 0\le\alpha\le 1. The penalty is defined as

(1-\alpha)/2||\beta||_2^2+\alpha||\beta||_1.

alpha=1 is the lasso penalty, and alpha=0 the ridge penalty.

nlambda

The number of lambda values - default is 100.

lambda.min.ratio

Smallest value for lambda, as a fraction of lambda.max, the (data derived) entry value (i.e. the smallest value for which all coefficients are zero). The default depends on the sample size nobs relative to the number of variables nvars. If nobs > nvars, the default is 0.0001, close to zero. If nobs < nvars, the default is 0.01. A very small value of lambda.min.ratio will lead to a saturated fit in the nobs < nvars case. This is undefined for "binomial" and "multinomial" models, and glmnet will exit gracefully when the percentage deviance explained is almost 1.

lambda

A user supplied lambda sequence. Typical usage is to have the program compute its own lambda sequence based on nlambda and lambda.min.ratio. Supplying a value of lambda overrides this. WARNING: use with care. Avoid supplying a single value for lambda (for predictions after CV use predict() instead). Supply instead a decreasing sequence of lambda values. glmnet relies on its warms starts for speed, and its often faster to fit a whole path than compute a single fit.

penalty.factor

Separate penalty factors can be applied to each coefficient. This is a number that multiplies lambda to allow differential shrinkage. Can be 0 for some variables, which implies no shrinkage, and that variable is always included in the model. Default is 1 for all variables (and implicitly infinity for variables listed in exclude). Note: the penalty factors are internally rescaled to sum to nvars, and the lambda sequence will reflect this change.

foldid

vector of integers that divide the data into a set of non-overlapping folds for cross-validation.

V_setup

Optional. A list produced by a GridLMM function containing the pre-processed V decompositions for each grid vertex, or the information necessary to create this. Generally saved from a previous run of GridLMM on the same data.

save_V_folder

Optional. A character vector giving a folder to save pre-processed V decomposition files for future / repeated use. If null, V decompositions are stored in memory

diagonalize

If TRUE and the model includes only a single random effect, the "GEMMA" trick will be used to diagonalize V. This is done by calculating the SVD of K, which can be slow for large samples.

mc.cores

Number of processor cores used for parallel evaluations. Note that this uses 'mclapply', so the memory requires grow rapidly with mc.cores, because the marker matrix gets duplicated in memory for each core.

verbose

Should progress be printed to the screen?

...

Details

Finds the full LASSO or Elastic Net solution path by running glmnet at each grid vertex. If foldid is provided, cross-validation scores will be calculated.

Value

If foldid and nfold are null, an object with S3 class "glmnet","*" , where "*" is "elnet". See glmnet. Otherwise, an object with S3 class "cv.glmnet". See cv.glmnet.


deruncie/GridLMM documentation built on May 2, 2023, 7:18 p.m.