stepwise_search | R Documentation |

[Fast, recommended for a small number of variables] Adds the best variable or drops the worst variable one at a time in the genetic score (if `search="genes"`) or the environmental score (if `search="env"`). You can select the search criterion (AIC, BIC, cross-validation error, cross-validation AUC) used to determine which variable is the best or worst and should be added or dropped. When the number of variables in *G* and *E* is large, this search does not generally converge to the optimal subset, so the function is only recommended when you have a small number of variables (e.g. 2 environments, 6 genetic variants). If using cross-validation (`search_criterion="cv"` or `search_criterion="cv_AUC"`), cross-validating with each variable is extremely slow; to avoid it, we recommend setting a p-value threshold (`p_threshold`) and forcing the algorithm not to look at models with a bigger AIC (`exclude_worse_AIC=TRUE`).
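To make the "add the best variable" idea concrete, here is a minimal sketch of a single forward step driven by AIC. It uses only base R `glm()` on simulated data and is not the LEGIT implementation; the data frame `dat` and the variable names `g1`, `g2`, `g3` are made up for illustration.

```r
# Illustrative only: one forward step of a stepwise search by AIC.
# NOT the LEGIT implementation; the data and variable names are hypothetical.
set.seed(1)
n <- 200
dat <- data.frame(g1 = rnorm(n), g2 = rnorm(n), g3 = rnorm(n))
dat$y <- 1 + 2 * dat$g1 + rnorm(n)   # only g1 truly affects y

current    <- character(0)           # variables already in the model
candidates <- c("g1", "g2", "g3")    # variables that may be added

# AIC of the current (intercept-only) model
base_aic <- AIC(glm(y ~ 1, data = dat, family = gaussian))

# AIC obtained by adding each candidate on its own
cand_aic <- sapply(candidates, function(v) {
  f <- reformulate(c(current, v), response = "y")
  AIC(glm(f, data = dat, family = gaussian))
})

# Keep the best candidate only if it improves on the current model
best <- names(which.min(cand_aic))
if (cand_aic[[best]] < base_aic) current <- c(current, best)
print(current)
```

A full search would repeat this step (and, for bidirectional searches, also try dropping variables) until no candidate improves the criterion or the step limit is reached.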

```
stepwise_search(
  data,
  formula,
  interactive_mode = FALSE,
  genes_original = NULL,
  env_original = NULL,
  genes_extra = NULL,
  env_extra = NULL,
  search_type = "bidirectional-forward",
  search = "both",
  search_criterion = "AIC",
  forward_exclude_p_bigger = 0.2,
  backward_exclude_p_smaller = 0.01,
  exclude_worse_AIC = TRUE,
  max_steps = 100,
  cv_iter = 5,
  cv_folds = 10,
  folds = NULL,
  Huber_p = 1.345,
  classification = FALSE,
  start_genes = NULL,
  start_env = NULL,
  eps = 0.01,
  maxiter = 100,
  family = gaussian,
  ylim = NULL,
  seed = NULL,
  print = TRUE,
  remove_miss = FALSE,
  test_only = FALSE
)
```

`data` |
data.frame of the dataset to be used. |

`formula` |
Model formula. Use |

`interactive_mode` |
If TRUE, uses interactive mode. In interactive mode, at each iteration, the user is shown the AIC, BIC, p-value and also the cross-validation |

`genes_original` |
data.frame of the variables inside the genetic score |

`env_original` |
data.frame of the variables inside the environmental score |

`genes_extra` |
data.frame of the additional variables to try including inside the genetic score |

`env_extra` |
data.frame of the variables to try including inside the environmental score |

`search_type` |
If |

`search` |
If |

`search_criterion` |
Criterion used to determine which variable is the best to add or worst to drop. If |

`forward_exclude_p_bigger` |
If p-value > |

`backward_exclude_p_smaller` |
If p-value < |

`exclude_worse_AIC` |
If the AIC with the variable > the AIC without the variable, we ignore the variable (Default = TRUE). This is an exclusion option whose purpose is to skip variables that are likely not worth examining, which makes the algorithm faster, especially with cross-validation. Set to FALSE to prevent any exclusion here. |

`max_steps` |
Maximum number of steps taken (Default = 100). |

`cv_iter` |
Number of cross-validation iterations (Default = 5). |

`cv_folds` |
Number of cross-validation folds (Default = 10). Using |

`folds` |
Optional list of vectors containing the fold number for each observation. Bypasses `cv_iter` and `cv_folds`. Setting your own folds can be important for certain data types, such as time series or longitudinal data. |

`Huber_p` |
Parameter controlling the Huber cross-validation error (Default = 1.345). |

`classification` |
Set to TRUE if you are doing classification (binary outcome). |

`start_genes` |
Optional starting points for genetic score (must be the same length as the number of columns of |

`start_env` |
Optional starting points for environmental score (must be the same length as the number of columns of |

`eps` |
Threshold for convergence (.01 for quick batch simulations, .0001 for accurate results). |

`maxiter` |
Maximum number of iterations. |

`family` |
Outcome distribution and link function (Default = gaussian). |

`ylim` |
Optional vector containing the known min and max of the outcome variable. Even if your outcome is known to be in [a,b], if you assume a Gaussian distribution, predict() could return values outside this range. This parameter ensures that this never happens. This is not necessary with a distribution that already assumes the proper range (ex: [0,1] with binomial distribution). |

`seed` |
Seed for cross-validation folds. |

`print` |
If TRUE, prints all the steps and notes/warnings. Highly recommended unless you are batch running multiple stepwise searches (Default = TRUE). |

`remove_miss` |
If TRUE, removes missing data completely; otherwise, missing data is only removed when adding or dropping a variable (Default = FALSE). |

`test_only` |
If TRUE, only uses the first fold for training and predicts the other folds; does not train on the other folds. So instead of cross-validation, this gives you a train/test split and you get the test R-squared as output. |
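As an illustration of the `folds` argument described above, the following sketch builds contiguous (blocked) folds, which can suit time-ordered data better than random folds. It assumes that `folds` takes one integer vector per cross-validation iteration, each giving the fold number of every observation; that reading of the argument, and the values of `n` and `n_folds`, are assumptions made for this example.

```r
# Illustrative only: contiguous (blocked) folds for time-ordered data.
# Assumption: `folds` is a list with one integer vector per cross-validation
# iteration, each giving the fold number of every observation.
n       <- 100   # number of observations (hypothetical)
n_folds <- 5     # number of folds (hypothetical)

# Observations 1-20 go to fold 1, 21-40 to fold 2, and so on
blocked <- as.integer(ceiling(seq_len(n) / (n / n_folds)))

# A single iteration is enough here, since blocked folds are deterministic
folds <- list(blocked)
print(table(folds[[1]]))
```

The resulting list could then be passed as `folds=folds` in a call that uses `search_criterion="cv"`.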

Returns an object of the class "LEGIT", which is a list containing, in the following order: a glm fit of the main model, a glm fit of the genetic score, a glm fit of the environmental score, and a list of the true model parameters (AIC, BIC, rank, df.residual, null.deviance), which the individual model parts (main, genetic, environmental) do not estimate properly.

```
## Not run:
## Continuous example
train = example_3way(250, 2.5, seed=777)
# Forward search for genes based on BIC (in interactive mode)
forward_genes_BIC = stepwise_search(train$data, genes_extra=train$G, env_original=train$E,
  formula=y ~ E*G*z, search_type="forward", search="genes", search_criterion="BIC",
  interactive_mode=TRUE)
# Bidirectional-backward search for environments based on cross-validation error
bidir_backward_env_cv = stepwise_search(train$data, genes_original=train$G, env_original=train$E,
  formula=y ~ E*G*z, search_type="bidirectional-backward", search="env", search_criterion="cv")
## Binary example
train_bin = example_2way(500, 2.5, logit=TRUE, seed=777)
# Forward search for genes based on cross-validated AUC (in interactive mode)
forward_genes_AUC = stepwise_search(train_bin$data, genes_extra=train_bin$G,
  env_original=train_bin$E, formula=y ~ E*G, search_type="forward", search="genes",
  search_criterion="cv_AUC", classification=TRUE, family=binomial, interactive_mode=TRUE)
# Bidirectional-forward search for genes based on AIC
bidir_forward_genes_AIC = stepwise_search(train_bin$data, genes_extra=train_bin$G,
  env_original=train_bin$E, formula=y ~ E*G, search_type="bidirectional-forward", search="genes",
  search_criterion="AIC", classification=TRUE, family=binomial)
## End(Not run)
```
