cv_plmm | R Documentation |
Performs k-fold cross validation for lasso-, MCP-, or SCAD-penalized
linear mixed models over a grid of values for the regularization parameter lambda.
cv_plmm(
design,
y = NULL,
K = NULL,
diag_K = NULL,
eta_star = NULL,
penalty = "lasso",
type = "blup",
gamma,
alpha = 1,
lambda_min,
nlambda = 100,
lambda,
eps = 1e-04,
max_iter = 10000,
warn = TRUE,
init = NULL,
cluster,
nfolds = 5,
seed,
fold = NULL,
trace = FALSE,
save_rds = NULL,
return_fit = TRUE,
...
)
design |
The first argument must be one of three things:
(1) |
y |
Optional: In the case where |
K |
Similarity matrix used to rotate the data. This should either be (1) a known matrix that reflects the covariance of y, (2) an estimate (Default is |
diag_K |
Logical: should K be a diagonal matrix? This would reflect observations that are unrelated, or that can be treated as unrelated. Defaults to FALSE. Note: plmm() does not check to see if a matrix is diagonal. If you want to use a diagonal K matrix, you must set diag_K = TRUE. |
eta_star |
Optional argument to input a specific eta term rather than estimate it from the data. If K is a known covariance matrix that is full rank, this should be 1. |
penalty |
The penalty to be applied to the model. Either "lasso" (the default), "SCAD", or "MCP". |
type |
A character argument indicating what should be returned from predict.plmm(). If type == 'lp', predictions are based on the linear predictor, X beta. If type == 'blup', predictions are based on the sum of the linear predictor and the estimated random effect (BLUP). Defaults to 'blup', as this has been shown to be a superior prediction method in many applications. |
gamma |
The tuning parameter of the MCP/SCAD penalty (see details). Default is 3 for MCP and 3.7 for SCAD. |
alpha |
Tuning parameter for the Mnet estimator which controls the relative contributions from the MCP/SCAD penalty and the ridge, or L2 penalty. alpha=1 is equivalent to MCP/SCAD penalty, while alpha=0 would be equivalent to ridge regression. However, alpha=0 is not supported; alpha may be arbitrarily small, but not exactly 0. |
lambda_min |
The smallest value for lambda, as a fraction of lambda.max. Default is .001 if the number of observations is larger than the number of covariates and .05 otherwise. |
nlambda |
Length of the sequence of lambda. Default is 100. |
lambda |
A user-specified sequence of lambda values. By default, a sequence of values of length nlambda is computed, equally spaced on the log scale. |
eps |
Convergence threshold. The algorithm iterates until the RMSD for the change in linear predictors for each coefficient is less than eps. Default is |
max_iter |
Maximum number of iterations (total across entire path). Default is 10000. |
warn |
Return warning messages for failures to converge and model saturation? Default is TRUE. |
init |
Initial values for coefficients. Default is 0 for all columns of X. |
cluster |
Option for in-memory data only: cv_plmm() can be run in parallel across a cluster using the parallel package. The cluster must be set up in advance using parallel::makeCluster(). The cluster must then be passed to cv_plmm(). Note: this option is not yet implemented for filebacked data. |
nfolds |
The number of cross-validation folds. Default is 5. |
seed |
You may set the seed of the random number generator in order to obtain reproducible results. |
fold |
Which fold each observation belongs to. By default, the observations are randomly assigned. |
trace |
If set to TRUE, inform the user of progress by announcing the beginning of each CV fold. Default is FALSE. |
save_rds |
Optional: if a filepath and name without the '.rds' suffix is specified (e.g., |
return_fit |
Optional: a logical value indicating whether the fitted model should be returned as a |
... |
Additional arguments to |
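The lambda and fold arguments both accept user-built vectors. The base R sketch below shows one way such vectors could be constructed. It is an illustration under stated assumptions, not the package's internal code: lambda_max, lambda_min_frac, and n are hypothetical values chosen for the example, and the balanced random fold assignment is one common convention rather than a confirmed description of what cv_plmm() does by default.

```r
# Hypothetical inputs for illustration; cv_plmm() derives its own values from the data
lambda_max      <- 2      # assumed largest lambda (made up for this sketch)
lambda_min_frac <- 0.001  # smallest lambda as a fraction of lambda_max
nlambda         <- 100    # length of the lambda sequence

# Decreasing sequence, equally spaced on the log scale
lambda <- exp(seq(log(lambda_max),
                  log(lambda_min_frac * lambda_max),
                  length.out = nlambda))

# Balanced random fold assignment: recycle the labels 1..nfolds, then shuffle
set.seed(123)
n      <- 23  # hypothetical number of observations
nfolds <- 5
fold   <- sample(rep_len(seq_len(nfolds), n))

# These vectors could then be supplied as cv_plmm(design, lambda = lambda, fold = fold),
# provided that length(fold) matches the number of observations in the design.
```

Supplying fold explicitly keeps the assignment fixed across runs, which is useful when comparing penalties on identical splits.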
A list with 12 items:
type: the type of prediction used ('lp' or 'blup')
cve: numeric vector with the cross validation error (CVE) at each value of lambda
cvse: numeric vector with the estimated standard error associated with each value of cve
fold: numeric n-length vector of integers indicating the fold to which each observation was assigned
lambda: numeric vector of lambda values
fit: the overall fit of the object, including all predictors; this is a list as returned by plmm()
min: the index corresponding to the value of lambda that minimizes cve
lambda_min: the lambda value at which cve is minimized
min1se: the index corresponding to the value of lambda within 1 standard error of that which minimizes cve
lambda1se: the largest value of lambda such that error is within 1 standard error of the minimum
null.dev: numeric value representing the deviance for the intercept-only model. If you have supplied your own lambda sequence, this quantity may not be meaningful.
estimated_Sigma: an n x n matrix representing the estimated covariance matrix.
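The relationship among cve, cvse, min, lambda_min, min1se, and lambda1se can be sketched in base R. The numbers below are toy values invented for illustration, not output from cv_plmm(). The sketch also assumes that the one-standard-error threshold uses the standard error at the minimizing lambda, which is the usual convention but is not confirmed here for this package.

```r
# Toy values for illustration only; not output from cv_plmm()
lambda <- c(1, 0.5, 0.25, 0.125, 0.0625)  # decreasing lambda grid
cve    <- c(2.00, 1.50, 1.20, 1.10, 1.15) # cross validation error per lambda
cvse   <- c(0.20, 0.15, 0.15, 0.15, 0.15) # standard error of each cve value

# Index and value of the lambda that minimizes cve
min_idx    <- which.min(cve)
lambda_min <- lambda[min_idx]

# One-standard-error rule: the largest lambda whose error is within
# 1 standard error of the minimum (assumed threshold: cve + cvse at the minimum)
threshold  <- cve[min_idx] + cvse[min_idx]
within_1se <- which(cve <= threshold)
min1se_idx <- within_1se[which.max(lambda[within_1se])]
lambda1se  <- lambda[min1se_idx]
```

Choosing lambda1se over lambda_min trades a small increase in estimated error for a more heavily penalized, and typically sparser, model.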
admix_design <- create_design(X = admix$X, y = admix$y)
cv_fit <- cv_plmm(design = admix_design)
print(summary(cv_fit))
plot(cv_fit)
# Note: for examples with filebacked data, see the filebacking vignette
# https://pbreheny.github.io/plmmr/articles/filebacking.html