LEGIT_cv: Cross-validation for the LEGIT model
In AlexiaJM/LEGIT: Latent Environmental & Genetic InTeraction (LEGIT) Model


Performs cross-validation of the LEGIT model. Note that this implementation is not very fast, since it is written in R.

LEGIT_cv

`data`	data.frame of the dataset to be used.
`genes`	data.frame of the variables inside the genetic score G (can be any sort of variable, doesn't even have to be genetic).
`env`	data.frame of the variables inside the environmental score E (can be any sort of variable, doesn't even have to be environmental).
`formula`	Model formula. Use E for the environmental score and G for the genetic score. Do not manually code interactions; write them in the formula instead (ex: G*E*z or G:E:z).
`cv_iter`	Number of cross-validation iterations (Default = 5).
`cv_folds`	Number of cross-validation folds (Default = 10). Using `cv_folds=NROW(data)` will lead to leave-one-out cross-validation.
`folds`	Optional list of vectors containing the fold number for each observation. Bypasses `cv_iter` and `cv_folds`. Setting your own folds could be important for certain data types like time series or longitudinal data.
`classification`	Set to TRUE if you are doing classification (binary outcome).
`start_genes`	Optional starting points for genetic score (must be the same length as the number of columns of `genes`).
`start_env`	Optional starting points for environmental score (must be the same length as the number of columns of `env`).
`eps`	Threshold for convergence (.01 for quick batch simulations, .0001 for accurate results).
`maxiter`	Maximum number of iterations.
`family`	Outcome distribution and link function (Default = gaussian).
`ylim`	Optional vector containing the known min and max of the outcome variable. Even if your outcome is known to be in [a,b], if you assume a Gaussian distribution, predict() could return values outside this range. This parameter ensures that this never happens. This is not necessary with a distribution that already assumes the proper range (ex: [0,1] with binomial distribution).
`seed`	Seed for cross-validation folds.
`Huber_p`	Parameter controlling the Huber cross-validation error (Default = 1.345).
`id`	Optional id of observations, can be a vector or data.frame (only used when returning list of possible outliers).
`crossover`	If not NULL, estimates the crossover point of E using the provided value as starting point (To test for diathesis-stress vs differential susceptibility).
`crossover_fixed`	If TRUE, instead of estimating the crossover point of E, we force/fix it to the value of `crossover`. (Used when creating a diathesis-stress model) (Default = FALSE).
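
The `folds` argument above is described as a list of vectors of fold numbers, one number per observation. As a minimal sketch of how such a list might be built for longitudinal data, the base-R code below assigns whole subjects to folds so that repeated measures from one subject never end up in both the training and test sets. The names `subject_id` and `make_subject_folds` are hypothetical, and the layout (one vector per cross-validation iteration) is an assumption based on the argument description.

```r
# Hypothetical longitudinal layout: 20 subjects, 5 repeated measures each.
# `subject_id` is an illustrative name, not part of the LEGIT API.
set.seed(777)
n_subjects <- 20
n_visits <- 5
subject_id <- rep(seq_len(n_subjects), each = n_visits)

# Assign whole subjects (not single rows) to folds.
make_subject_folds <- function(subject_id, k) {
  subjects <- unique(subject_id)
  # Shuffled, near-balanced fold labels, one per subject
  subject_fold <- sample(rep_len(seq_len(k), length(subjects)))
  # Expand back to one fold number per observation
  subject_fold[match(subject_id, subjects)]
}

# Assumed layout: one vector of fold numbers per cross-validation iteration
my_folds <- list(
  make_subject_folds(subject_id, k = 5),
  make_subject_folds(subject_id, k = 5)
)
```

Under these assumptions, the list could then be supplied as `folds = my_folds`, in which case `cv_iter` and `cv_folds` would be bypassed as described above.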

An object of class function of length 1.

If `classification = FALSE`, returns a list containing, in the following order: a vector of the cross-validated R^2 at each iteration, a vector of the Huber cross-validation error at each iteration, a vector of the L1-norm cross-validation error at each iteration, and a matrix of the possible outliers (standardized residuals > 2.5 or < -2.5) with their corresponding standardized residuals and standardized Pearson residuals.

If `classification = TRUE`, returns a list containing, in the following order: a vector of the cross-validated R^2 at each iteration, a vector of the Huber cross-validation error at each iteration, a vector of the L1-norm cross-validation error at each iteration, a vector of the AUC at each iteration, a matrix of the best choice of threshold (based on the Youden index) with the corresponding specificity and sensitivity at each iteration, and a list of objects of class "roc" (to be able to make ROC curve plots) at each iteration.

The Huber and L1-norm cross-validation errors are alternatives to the usual L2-norm cross-validation error (on which the R^2 is based) that are more resistant to outliers. For both, lower values are better.
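
To illustrate why the Huber and L1-norm errors are more resistant to outliers than the L2-norm error, here is a minimal base-R sketch using the standard Huber loss with the default `Huber_p` value of 1.345. The function `huber_loss` and the toy residuals are made up for illustration; how LEGIT_cv scales or aggregates residuals internally is not shown here and may differ.

```r
# Standard Huber loss: quadratic for small residuals, linear beyond `k`.
# `huber_loss` is a hypothetical helper, not a LEGIT function.
huber_loss <- function(r, k = 1.345) {
  ifelse(abs(r) <= k, 0.5 * r^2, k * (abs(r) - 0.5 * k))
}

# Toy residuals (made up for illustration); the last one is an outlier.
res <- c(0.5, -1, 0.2, 10)

l2_error    <- mean(res^2)            # squared error, dominated by the outlier
l1_error    <- mean(abs(res))         # absolute error, grows linearly
huber_error <- mean(huber_loss(res))  # quadratic near zero, linear in the tail

print(c(L2 = l2_error, L1 = l1_error, Huber = huber_error))
```

In this toy case the single large residual inflates the L2-norm error far more than the other two, which is the behaviour the robust alternatives are meant to avoid.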

Denis Heng-Yan Leung. Cross-validation in nonparametric regression with outliers. Annals of Statistics (2005): 2291-2310.

## Not run: 
train = example_3way(250, 2.5, seed=777)
# Cross-validation 4 times with 5 Folds
cv_5folds = LEGIT_cv(train$data, train$G, train$E, y ~ G*E*z, cv_iter=4, cv_folds=5)
cv_5folds
# Leave-one-out cross-validation (Note: very slow)
cv_loo = LEGIT_cv(train$data, train$G, train$E, y ~ G*E*z, cv_iter=1, cv_folds=250)
cv_loo
# Cross-validation 4 times with 5 Folds (binary outcome)
train_bin = example_2way(500, 2.5, logit=TRUE, seed=777)
cv_5folds_bin = LEGIT_cv(train_bin$data, train_bin$G, train_bin$E, y ~ G*E,
                         cv_iter=4, cv_folds=5, classification=TRUE, family=binomial)
cv_5folds_bin
par(mfrow=c(2,2))
# One ROC curve per cross-validation iteration
for (i in 1:4) pROC::plot.roc(cv_5folds_bin$roc_curve[[i]])

## End(Not run)
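
For readers unfamiliar with the Youden index used to pick the classification threshold, the following base-R sketch shows the idea on made-up data: for each candidate threshold, compute sensitivity + specificity - 1 and keep the threshold that maximizes it. The vectors `prob` and `y` are invented for illustration, and the candidate thresholds are simply the observed probabilities; the package (via pROC) may choose candidate thresholds differently.

```r
# Made-up predicted probabilities and true binary labels (illustrative only)
prob <- c(0.1, 0.3, 0.35, 0.6, 0.7, 0.9)
y    <- c(0,   0,   1,    1,   0,   1)

# Candidate thresholds: here, each distinct predicted probability
thresholds <- sort(unique(prob))

# Youden index J = sensitivity + specificity - 1 at each threshold
youden <- sapply(thresholds, function(t) {
  pred <- as.integer(prob >= t)
  sens <- sum(pred == 1 & y == 1) / sum(y == 1)
  spec <- sum(pred == 0 & y == 0) / sum(y == 0)
  sens + spec - 1
})

# Threshold with the largest Youden index
best_threshold <- thresholds[which.max(youden)]
print(best_threshold)
```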