LEGIT: Latent Environmental & Genetic InTeraction (LEGIT) model

Description Usage Arguments Value References Examples

View source: R/LEGIT.R

Description

Constructs a generalized linear model (glm) with a weighted latent environmental score and weighted latent genetic score using alternating optimization.

Usage

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
LEGIT(
  data,
  genes,
  env,
  formula,
  start_genes = NULL,
  start_env = NULL,
  eps = 0.001,
  maxiter = 100,
  family = gaussian,
  ylim = NULL,
  print = TRUE,
  print_steps = FALSE,
  crossover = NULL,
  crossover_fixed = FALSE,
  reverse_code = FALSE,
  rescale = FALSE,
  lme4 = FALSE
)

Arguments

data

data.frame of the dataset to be used.

genes

data.frame of the variables inside the genetic score G (can be any sort of variable, doesn't even have to be genetic).

env

data.frame of the variables inside the environmental score E (can be any sort of variable, doesn't even have to be environmental).

formula

Model formula. Use E for the environmental score and G for the genetic score. Do not manually code interactions, write them in the formula instead (ex: G*E*z or G:E:z).

start_genes

Optional starting points for genetic score (must be the same length as the number of columns of genes).

start_env

Optional starting points for environmental score (must be the same length as the number of columns of env).

eps

Threshold for convergence (.01 for quick batch simulations, .0001 for accurate results).

maxiter

Maximum number of iterations.

family

Outcome distribution and link function (Default = gaussian).

ylim

Optional vector containing the known min and max of the outcome variable. Even if your outcome is known to be in [a,b], if you assume a Gaussian distribution, predict() could return values outside this range. This parameter ensures that this never happens. This is not necessary with a distribution that already assumes the proper range (ex: [0,1] with binomial distribution).

print

If FALSE, nothing except warnings will be printed (Default = TRUE).

print_steps

If TRUE, print the parameters at all iterations, good for debugging (Default = FALSE).

crossover

If not NULL, estimates the crossover point of E using the provided value as starting point (To test for diathesis-stress vs differential susceptibility).

crossover_fixed

If TRUE, instead of estimating the crossover point of E, we force/fix it to the value of "crossover". (Used when creating a diathes-stress model) (Default = FALSE).

reverse_code

If TRUE, after fitting the model, the genes with negative weights are reverse coded (ex: g_rev = 1 - g). It assumes that the original coding is in [0,1]. The purpose of this option is to prevent genes with negative weights which cause interpretation problems (ex: depression normally decreases attention but with a negative genetic score, it increases attention). Warning, using this option with GxG interactions could cause nonsensical results since GxG could be inverted. Also note that this may fail with certain models (Default=FALSE).

rescale

If TRUE, the environmental variables are automatically rescaled to the range [-1,1]. This improves interpretability (Default=FALSE).

lme4

If TRUE, uses lme4::lmer or lme4::glmer; Note that is an experimental feature, bugs may arise and certain functions may fail. Currently only summary(), plot(), GxE_interaction_test(), LEGIT(), LEGIT_cv() work. Also note that the AIC and certain elements ignore the existence of the genes and environment variables, thus the AIC may not be used for variable selection of the genes and the environment. However, the AIC can still be used to compare models with the same genes and environments. (Default=FALSE).

Value

Returns an object of the class "LEGIT" which is list containing, in the following order: a glm fit of the main model, a glm fit of the genetic score, a glm fit of the environmental score, a list of the true model parameters (AIC, BIC, rank, df.residual, null.deviance) for which the individual model parts (main, genetic, environmental) don't estimate properly and the formula.

References

Alexia Jolicoeur-Martineau, Ashley Wazana, Eszter Szekely, Meir Steiner, Alison S. Fleming, James L. Kennedy, Michael J. Meaney, Celia M.T. Greenwood and the MAVAN team. Alternating optimization for GxE modelling with weighted genetic and environmental scores: examples from the MAVAN study (2017). arXiv:1703.08111.

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
train = example_2way(500, 1, seed=777)
fit_best = LEGIT(train$data, train$G, train$E, y ~ G*E, train$coef_G, train$coef_E)
fit_default = LEGIT(train$data, train$G, train$E, y ~ G*E)
summary(fit_default)
summary(fit_best)

train = example_3way(500, 2.5, seed=777)
fit_best = LEGIT(train$data, train$G, train$E, y ~ G*E*z, train$coef_G, train$coef_E)
fit_default = LEGIT(train$data, train$G, train$E, y ~ G*E*z)
summary(fit_default)
summary(fit_best)

LEGIT documentation built on Jan. 12, 2022, 1:08 a.m.