gesso.cv: Cross-Validation


View source: R/gesso.R

Description

Performs nfolds-fold cross-validation to tune the hyperparameters lambda_1 and lambda_2 of the gesso model.

Usage

gesso.cv(G, E, Y, C = NULL, normalize = TRUE, normalize_response = FALSE, grid = NULL,
         grid_size = 20, grid_min_ratio = NULL, alpha = NULL, family = "gaussian", 
         type_measure = "loss", fold_ids = NULL, nfolds = 4, 
         parallel = TRUE, seed = 42, tolerance = 1e-3, max_iterations = 5000, 
         min_working_set_size = 100, verbose = TRUE)

Arguments

G

matrix of main effects of size n x p, variables organized by columns

E

vector of environmental measurements

Y

outcome vector. Set family="gaussian" for the continuous outcome and family="binomial" for the binary outcome with 0/1 levels

C

matrix of confounders of size n x m, variables organized by columns

normalize

TRUE to normalize matrix G and vector E

normalize_response

TRUE to normalize vector Y (for family="gaussian")

grid

grid sequence for tuning hyperparameters; the same grid is used for lambda_1 and lambda_2

grid_size

specify grid_size to generate grid automatically. Grid is generated by calculating max_lambda from the data (smallest lambda such that all the coefficients are zero). min_lambda is calculated as a product of max_lambda and grid_min_ratio. The program then generates grid_size values equidistant on the log10 scale from min_lambda to max_lambda

grid_min_ratio

parameter to determine min_lambda (smallest value for the grid of lambdas), default is 0.1 for p > n, 0.01 otherwise

alpha

if NULL independent 2D grid is used for (lambda_1, lambda_2), else 1D grid is used where lambda_2 = alpha * lambda_1, i.e. (lambda_1, alpha * lambda_1)

family

"gaussian" for continuous outcome and "binomial" for binary

type_measure

loss to use for cross-validation. Specify type_measure="loss" for negative log likelihood or type_measure="auc" for AUC (for family="binomial" only)

fold_ids

option to supply custom fold assignments

tolerance

tolerance for the dual gap convergence criterion

max_iterations

maximum number of iterations

min_working_set_size

minimum size of the working set

nfolds

number of cross-validation splits

parallel

TRUE to enable parallel cross-validation

seed

random seed that controls the random fold assignments

verbose

TRUE to print messages
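
The grid_size, grid_min_ratio and alpha arguments described above can be illustrated with a small base-R sketch. It does not call gesso: max_lambda is a made-up value here (the package derives it from the data), and the helper name make_log_grid and the ordering of the grid are assumptions made only for illustration.

```r
# Hypothetical helper: grid_size values equidistant on the log10 scale
# between min_lambda = max_lambda * grid_min_ratio and max_lambda.
make_log_grid <- function(max_lambda, grid_min_ratio, grid_size) {
  min_lambda <- max_lambda * grid_min_ratio
  10^seq(log10(min_lambda), log10(max_lambda), length.out = grid_size)
}

# max_lambda = 1 is an illustrative value, not computed from data.
grid <- make_log_grid(max_lambda = 1, grid_min_ratio = 0.01, grid_size = 5)

# alpha = NULL: independent 2D grid, every (lambda_1, lambda_2) combination.
grid_2d <- expand.grid(lambda_1 = grid, lambda_2 = grid)

# alpha given: 1D grid with lambda_2 tied to lambda_1.
alpha <- 0.5
grid_1d <- data.frame(lambda_1 = grid, lambda_2 = alpha * grid)

nrow(grid_2d)  # grid_size^2 pairs
nrow(grid_1d)  # grid_size pairs
```

The 2D grid grows quadratically with grid_size, so fixing alpha may be a reasonable way to reduce the cost of cross-validation when a tied penalty is acceptable.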

Value

A list of objects

cv_result

a tibble with cross-validation results: the loss and the number of non-zero coefficients, both averaged across folds, for each (lambda_1, lambda_2) value on the path. It can be used for custom parameter tuning (e.g. selecting (lambda_1, lambda_2) with a certain number of non-zero main effects and/or a certain number of interactions).

  • mean_loss: loss value averaged across folds, vector of length (number of lambda_1 values) x (number of lambda_2 values)

  • mean_beta_g_nonzero: number of non-zero main effects averaged across folds, vector of the same length

  • mean_beta_gxe_nonzero: number of non-zero interactions averaged across folds, vector of the same length

  • lambda_1: lambda_1 path, decreasing

  • lambda_2: lambda_2 path, oscillating

lambda_min

a tibble of optimal (lambda_1, lambda_2) values, tuning parameter values that give minimum cross-validation loss (mean_loss)

fit

list, return of the function gesso.fit on the full data

grid

vector of values used for hyperparameter tuning

full_cv_result

inner variables
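
As a sketch of the custom tuning mentioned for cv_result, the snippet below picks the (lambda_1, lambda_2) pair with the lowest mean loss among pairs that select at most a given number of interactions. The data frame is a made-up stand-in for tune_model$cv_result that uses the column names listed above; the numbers and the threshold max_gxe are illustrative assumptions, not package output.

```r
# Mock stand-in for tune_model$cv_result; all values are made up.
cv_result <- data.frame(
  lambda_1              = c(1.0, 1.0, 0.1, 0.1),
  lambda_2              = c(1.0, 0.1, 1.0, 0.1),
  mean_loss             = c(2.0, 1.6, 1.2, 0.9),
  mean_beta_g_nonzero   = c(0, 0, 8, 12),
  mean_beta_gxe_nonzero = c(0, 3, 0, 7)
)

# Keep pairs with at most max_gxe non-zero interactions on average,
# then take the pair with the smallest mean loss among them.
max_gxe <- 5
candidates <- cv_result[cv_result$mean_beta_gxe_nonzero <= max_gxe, ]
best <- candidates[which.min(candidates$mean_loss), c("lambda_1", "lambda_2")]
best
```

The unconstrained minimum (the last row here) is what lambda_min reports; the constrained choice trades some loss for a sparser set of interactions.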

Examples

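library(gesso)
data = data.gen()  # simulated data set
# Tune (lambda_1, lambda_2) by 3-fold cross-validation on an automatically generated grid
tune_model = gesso.cv(data$G_train, data$E_train, data$Y_train,
                      grid_size=20, parallel=TRUE, nfolds=3)
# Coefficients at the loss-minimizing (lambda_1, lambda_2)
coefficients = gesso.coef(tune_model$fit, tune_model$lambda_min)
gxe_coefficients = coefficients$beta_gxe
g_coefficients = coefficients$beta_g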
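
For the fold_ids argument, custom fold assignments could be built along the following lines. This is a base-R sketch only: the page does not state the format gesso.cv expects, so an integer vector of length n with labels 1..nfolds is an assumption, and n is an illustrative sample size.

```r
set.seed(42)   # fixes the random assignment
n <- 10        # illustrative sample size
nfolds <- 4

# Repeat the fold labels 1..nfolds up to length n, then shuffle them.
fold_ids <- sample(rep(seq_len(nfolds), length.out = n))

table(fold_ids)  # fold sizes differ by at most one
```

Building the labels with rep() before shuffling keeps the folds balanced, which a plain sample(nfolds, n, replace = TRUE) would not guarantee.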

gesso documentation built on Nov. 30, 2021, 9:09 a.m.