Uses cross-validation on the LEGIT model. Note that this is not a very fast implementation since it was written in R.
data |
data.frame of the dataset to be used. |
genes |
data.frame of the variables inside the genetic score G (can be any sort of variable, doesn't even have to be genetic). |
env |
data.frame of the variables inside the environmental score E (can be any sort of variable, doesn't even have to be environmental). |
formula |
Model formula. Use E for the environmental score and G for the genetic score. Do not manually code interactions, write them in the formula instead (ex: G*E*z or G:E:z). |
cv_iter |
Number of cross-validation iterations (Default = 5). |
cv_folds |
Number of cross-validation folds (Default = 10). Setting cv_folds to the number of observations gives leave-one-out cross-validation. |
folds |
Optional list of vectors containing the fold number for each observation. If provided, it bypasses cv_iter and cv_folds. Setting your own folds could be important for certain data types like time series or longitudinal data. |
classification |
Set to TRUE if you are doing classification (binary outcome). |
start_genes |
Optional starting points for genetic score (must be the same length as the number of columns of genes). |
start_env |
Optional starting points for environmental score (must be the same length as the number of columns of env). |
eps |
Threshold for convergence (.01 for quick batch simulations, .0001 for accurate results). |
maxiter |
Maximum number of iterations. |
family |
Outcome distribution and link function (Default = gaussian). |
ylim |
Optional vector containing the known min and max of the outcome variable. Even if your outcome is known to be in [a,b], if you assume a Gaussian distribution, predict() could return values outside this range. This parameter ensures that this never happens. This is not necessary with a distribution that already assumes the proper range (ex: [0,1] with binomial distribution). |
seed |
Seed for cross-validation folds. |
Huber_p |
Parameter controlling the Huber cross-validation error (Default = 1.345). |
id |
Optional id of observations; can be a vector or data.frame (only used when returning the list of possible outliers). |
crossover |
If not NULL, estimates the crossover point of E using the provided value as starting point (To test for diathesis-stress vs differential susceptibility). |
crossover_fixed |
If TRUE, instead of estimating the crossover point of E, we force/fix it to the value of "crossover". (Used when creating a diathesis-stress model) (Default = FALSE). |
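The folds argument is described above as a list of vectors giving the fold number of each observation. As a rough sketch of how such a list might be built for longitudinal data, the base R code below assigns folds per subject so that all rows from one subject stay in the same fold. The data frame long_data, its subject column, and the helper make_grouped_folds are hypothetical names used only for illustration; the exact structure LEGIT_cv expects should be checked against the package before passing the result as folds.

```r
# Hypothetical longitudinal data: 6 subjects with 4 repeated measures each
long_data <- data.frame(subject = rep(1:6, each = 4), y = rnorm(24))

# Hypothetical helper: returns one vector of fold numbers per cross-validation
# iteration, with every row of a given subject assigned to the same fold
make_grouped_folds <- function(group, n_folds, n_iter, seed = 777) {
  set.seed(seed)
  ids <- unique(group)
  lapply(seq_len(n_iter), function(i) {
    # Shuffle balanced fold labels across subjects, then map them back to rows
    fold_of_id <- sample(rep(seq_len(n_folds), length.out = length(ids)))
    fold_of_id[match(group, ids)]
  })
}

folds <- make_grouped_folds(long_data$subject, n_folds = 3, n_iter = 2)
```

Grouping by subject is only one possible design; for time series, contiguous blocks of observations may be a better fit than random assignment.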
An object of class function of length 1.
If classification = FALSE, returns a list containing, in the following order: a vector of the cross-validated R^2 at each iteration, a vector of the Huber cross-validation error at each iteration, a vector of the L1-norm cross-validation error at each iteration, and a matrix of the possible outliers (standardized residuals > 2.5 or < -2.5) with their corresponding standardized residuals and standardized Pearson residuals.
If classification = TRUE, returns a list containing, in the following order: a vector of the cross-validated R^2 at each iteration, a vector of the Huber cross-validation error at each iteration, a vector of the L1-norm cross-validation error at each iteration, a vector of the AUC at each iteration, a matrix of the best choice of threshold (based on the Youden index) and the corresponding specificity and sensitivity at each iteration, and a list of objects of class "roc" (to be able to make ROC curve plots) at each iteration.
The Huber and L1-norm cross-validation errors are alternatives to the usual L2-norm cross-validation error (on which the R^2 is based) that are more resistant to outliers; lower values are better.
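To illustrate why the Huber and L1-norm errors react less strongly to outliers than the L2-norm error, here is a minimal base R sketch using the standard Huber loss with threshold 1.345 (the Huber_p default). The function name huber_loss and the toy residuals are hypothetical, and the exact scaling or averaging that LEGIT_cv applies internally is not stated in this page and may differ.

```r
# Standard Huber loss: quadratic for small residuals, linear beyond threshold p
huber_loss <- function(r, p = 1.345) {
  ifelse(abs(r) <= p, 0.5 * r^2, p * (abs(r) - 0.5 * p))
}

# Toy out-of-fold residuals; the last one is an outlier
res <- c(0.2, -0.5, 0.1, 0.4, 8)

l2_error    <- mean(res^2)            # squared error: dominated by the outlier
l1_error    <- mean(abs(res))         # absolute error: grows linearly with it
huber_error <- mean(huber_loss(res))  # quadratic near zero, linear in the tail

c(L2 = l2_error, L1 = l1_error, Huber = huber_error)
```

On these toy residuals the single outlier inflates the L2-norm error far more than the other two measures, which is the behaviour the alternatives are meant to avoid.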
Denis Heng-Yan Leung. Cross-validation in nonparametric regression with outliers. Annals of Statistics (2005): 2291-2310.
## Not run:
train = example_3way(250, 2.5, seed=777)
# Cross-validation 4 times with 5 Folds
cv_5folds = LEGIT_cv(train$data, train$G, train$E, y ~ G*E*z, cv_iter=4, cv_folds=5)
cv_5folds
# Leave-one-out cross-validation (Note: very slow)
cv_loo = LEGIT_cv(train$data, train$G, train$E, y ~ G*E*z, cv_iter=1, cv_folds=250)
cv_loo
# Cross-validation 4 times with 5 Folds (binary outcome)
train_bin = example_2way(500, 2.5, logit=TRUE, seed=777)
cv_5folds_bin = LEGIT_cv(train_bin$data, train_bin$G, train_bin$E, y ~ G*E,
    cv_iter=4, cv_folds=5, classification=TRUE, family=binomial)
cv_5folds_bin
par(mfrow=c(2,2))
pROC::plot.roc(cv_5folds_bin$roc_curve[[1]])
pROC::plot.roc(cv_5folds_bin$roc_curve[[2]])
pROC::plot.roc(cv_5folds_bin$roc_curve[[3]])
pROC::plot.roc(cv_5folds_bin$roc_curve[[4]])
## End(Not run)
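For the classification case, the returned threshold is described as being chosen by the Youden index. The following base R sketch shows one way that criterion can be computed, J = sensitivity + specificity - 1, on hypothetical predicted probabilities. The vectors prob and y and the helper youden are illustrative assumptions; pROC, which produces the "roc" objects, may evaluate a different set of candidate thresholds.

```r
# Hypothetical predicted probabilities and true binary outcomes
prob <- c(0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9)
y    <- c(0,   0,   0,    1,   0,   1,   1,   1)

# Youden index J = sensitivity + specificity - 1 at a given threshold
youden <- function(threshold, prob, y) {
  pred <- as.integer(prob >= threshold)
  sensitivity <- sum(pred == 1 & y == 1) / sum(y == 1)
  specificity <- sum(pred == 0 & y == 0) / sum(y == 0)
  c(threshold = threshold, sensitivity = sensitivity,
    specificity = specificity, J = sensitivity + specificity - 1)
}

# Evaluate every observed probability as a candidate threshold;
# which.max keeps the first threshold reaching the largest J
candidates <- t(sapply(sort(unique(prob)), youden, prob = prob, y = y))
best <- candidates[which.max(candidates[, "J"]), ]
best
```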