IMLEGIT: Independent Multiple Latent Environmental & Genetic...



Description

Constructs a generalized linear model (glm) with latent variables using alternating optimization. This is an extension of the LEGIT model to accommodate more than 2 latent variables.
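The idea behind alternating optimization can be illustrated with a minimal base-R sketch for two latent variables and a Gaussian outcome, y ~ G*E. This is not the LEGIT implementation. The following are all assumptions made for illustration:

- the helper name `alt_opt_gxe`;
- the equal starting weights;
- the "weights stopped moving" stopping rule;
- the rescaling of each weight vector to sum(|w|) = 1, which matches the weights printed in the example output below.

Each pass first fits the main model with the latent scores held fixed. It then re-estimates one block of weights at a time with everything else held fixed. Because the model is linear in each block once the rest is fixed, every step is an ordinary least-squares fit.

```r
# Hypothetical helper, not part of LEGIT: alternating least squares for y ~ G*E,
# where G = Xg %*% wg and E = Xe %*% we are latent scores with unknown weights.
alt_opt_gxe <- function(y, Xg, Xe, eps = 1e-6, maxiter = 100) {
  # Assumed scaling constraint: the weights of each latent variable satisfy sum(|w|) = 1
  normalize <- function(w) w / sum(abs(w))
  wg <- normalize(rep(1, ncol(Xg)))  # equal starting weights (an arbitrary choice)
  we <- normalize(rep(1, ncol(Xe)))
  for (iter in seq_len(maxiter)) {
    G <- drop(Xg %*% wg)
    E <- drop(Xe %*% we)
    # Step 1: fit the main model, treating the current latent scores as observed
    b <- coef(lm(y ~ G * E))
    b0 <- b[["(Intercept)"]]; bG <- b[["G"]]; bE <- b[["E"]]; bGE <- b[["G:E"]]
    # Step 2: y - b0 - bE*E = (Xg %*% wg) * (bG + bGE*E) is linear in wg
    wg_new <- lm.fit(Xg * (bG + bGE * E), y - b0 - bE * E)$coefficients
    G_new <- drop(Xg %*% wg_new)
    # Step 3: y - b0 - bG*G = (Xe %*% we) * (bE + bGE*G) is linear in we
    we_new <- lm.fit(Xe * (bE + bGE * G_new), y - b0 - bG * G_new)$coefficients
    # Rescale both weight vectors; the next main-model fit absorbs the scale change
    wg_new <- normalize(wg_new)
    we_new <- normalize(we_new)
    delta <- max(abs(c(wg_new - wg, we_new - we)))
    wg <- wg_new
    we <- we_new
    if (delta < eps) break  # assumed stopping rule: weights stopped moving
  }
  list(weights_G = unname(wg), weights_E = unname(we), iterations = iter)
}

# Simulated data: 3 binary "genetic" and 3 Gaussian "environmental" variables
set.seed(777)
n <- 500
Xg <- matrix(rbinom(n * 3, 1, 0.5), n, 3)
Xe <- matrix(rnorm(n * 3), n, 3)
true_wg <- c(0.5, 0.3, -0.2)
true_we <- c(0.6, -0.3, 0.1)
G <- drop(Xg %*% true_wg)
E <- drop(Xe %*% true_we)
y <- 1 + 2 * G + 3 * E + 4 * G * E + rnorm(n)

fit <- alt_opt_gxe(y, Xg, Xe)
round(fit$weights_G, 2)  # close to true_wg (weights are identified only up to sign)
round(fit$weights_E, 2)
```

Every step minimizes the residual sum of squares over one block of parameters. The rescaling is absorbed exactly by the next main-model fit, so the fit never gets worse from one pass to the next. For other families the same block structure applies with glm() in place of least squares, because the linear predictor is still linear in each block of weights.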

Usage

IMLEGIT(data, latent_var, formula, start_latent_var = NULL, eps = 0.001,
  maxiter = 100, family = gaussian, ylim = NULL, print = TRUE)

Arguments

data

data.frame of the dataset to be used.

latent_var

List of data.frames. Each element is the dataset used to construct one latent variable. For interpretability and proper convergence, using the same variable in more than one latent variable is strongly discouraged. Naming the list elements is recommended to prevent confusion; otherwise, the latent variables are named L1, L2, ... (see the examples below for more details).

formula

Model formula. The names of latent_var can be used in the formula to represent the latent variables; if names(latent_var) is NULL, use L1, L2, ... instead. Do not code interactions manually; write them in the formula instead (e.g., G*E1*E2 or G:E1:E2).
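A formula such as y ~ G*E*Z is expanded by R's standard formula rules. The latent variables therefore enter as main effects plus every interaction between them. The base-R call below (nothing LEGIT-specific) lists the resulting terms, which are the same terms that appear in the $fit_main coefficient table of the three-latent-variable example. The `:` operator, by contrast, keeps only the product term.

```r
# Expand the interaction shorthand with base R's terms()
term_labels <- attr(terms(y ~ G * E * Z), "term.labels")
print(term_labels)         # G, E, Z, G:E, G:Z, E:Z, G:E:Z

# G:E:Z alone keeps only the three-way product, without lower-order terms
colon_labels <- attr(terms(y ~ G:E:Z), "term.labels")
print(colon_labels)        # G:E:Z
```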

start_latent_var

Optional list of starting points for the latent variables. The list must have one element per latent variable, and each element must have the same length as the number of variables in the corresponding latent variable dataset.
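A starting list with the required shape can be built from latent_var itself. The helper make_start below is hypothetical (it is not a LEGIT function). It uses equal weights as an arbitrary choice, and the small data frames are made up for illustration.

```r
# Hypothetical helper: one starting vector per latent variable, with one
# entry per column of the corresponding data.frame (equal weights here).
make_start <- function(latent_var) {
  lapply(latent_var, function(d) rep(1 / ncol(d), ncol(d)))
}

# Made-up latent variable datasets: 4 genetic and 3 environmental variables
latent_var <- list(
  G = data.frame(g1 = c(0, 1, 1), g2 = c(1, 0, 1), g3 = c(0, 0, 1), g4 = c(1, 1, 0)),
  E = data.frame(e1 = c(0.2, 1.5, -0.3), e2 = c(1.1, 0.4, 0.9), e3 = c(-0.7, 0.0, 0.8))
)
start <- make_start(latent_var)

# Check the two shape requirements: same number of elements as latent
# variables, and one starting value per variable of each latent variable
stopifnot(length(start) == length(latent_var),
          all(lengths(start) == vapply(latent_var, ncol, integer(1))))

# start could then be passed as:
# IMLEGIT(data, latent_var, y ~ G*E, start_latent_var = start)
```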

eps

Threshold for convergence: 0.01 for quick batch simulations, 0.0001 for accurate results (Default = 0.001).

maxiter

Maximum number of iterations (Default = 100).

family

Outcome distribution and link function (Default = gaussian).

ylim

Optional vector c(min, max) giving the known bounds of the outcome variable. Even when the outcome is known to lie in [a, b], predict() can return values outside this range if a Gaussian distribution is assumed; setting ylim ensures that this never happens. It is not necessary with a distribution that already assumes the proper range (e.g., [0, 1] with the binomial distribution).
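One simple way to respect known bounds is to truncate predictions to [a, b]. The sketch below assumes truncation with pmin/pmax. How LEGIT actually enforces ylim is not described on this page, and clamp_to_ylim is a hypothetical name.

```r
# Hypothetical helper: truncate predictions to the known range ylim = c(a, b)
clamp_to_ylim <- function(pred, ylim = NULL) {
  if (is.null(ylim)) return(pred)            # no bounds supplied: leave unchanged
  stopifnot(length(ylim) == 2, ylim[1] < ylim[2])
  pmin(pmax(pred, ylim[1]), ylim[2])
}

# Gaussian predictions for a score known to lie in [0, 10]
pred <- c(-0.4, 2.5, 7.9, 10.3)
clamp_to_ylim(pred, ylim = c(0, 10))         # 0.0  2.5  7.9 10.0
```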

print

If FALSE, nothing except warnings will be printed. (Default = TRUE).

Value

Returns an object of class "IMLEGIT", which is a list containing, in the following order: the glm fit of the main model, a list of the glm fits of the latent variables, and a list of the true model parameters (AIC, BIC, rank, df.residual, null.deviance), which the individual glm fits (the main model and the latent variable models) do not estimate correctly on their own.
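The "true" parameters in the example output below can be reproduced by hand. The count used here is inferred from the printed numbers and is an assumption, not taken from the package source:

- each latent variable contributes one fewer free parameter than it has weights, because its printed weights are scaled so that their absolute values sum to 1;
- under this count, the two-way model has rank 4 + (6 - 1) + (3 - 1) = 11;
- the main glm alone reports 4 coefficients, which matches the "(-7 not defined because of singularities)" line (4 - 11 = -7).

The helper name true_gaussian_stats is hypothetical.

```r
# Hypothetical helper reproducing the "true model" statistics for a Gaussian fit.
# Assumption: each latent variable adds (number of weights - 1) free parameters.
true_gaussian_stats <- function(deviance, n, n_main_coef, n_weights) {
  rank <- n_main_coef + sum(n_weights - 1)
  df_residual <- n - rank
  # Same formula as stats::gaussian()$aic, plus the 2 * rank that glm() adds
  aic <- n * (log(2 * pi * deviance / n) + 1) + 2 + 2 * rank
  c(rank = rank, df.residual = df_residual,
    dispersion = deviance / df_residual, AIC = aic)
}

# Two-way example: (Intercept), G, E, G:E; G has 6 weights, E has 3
two_way <- true_gaussian_stats(468.9, n = 500, n_main_coef = 4, n_weights = c(6, 3))
# Three-way example: 8 main coefficients; G, E and Z have 6, 3 and 3 weights
three_way <- true_gaussian_stats(528.98, n = 500, n_main_coef = 8, n_weights = c(6, 3, 3))
round(two_way, 4)
round(three_way, 4)
```

The results match the printed values: rank 11 and 17, residual degrees of freedom 489 and 483, dispersion about 0.9589 and 1.0952, and AIC 1410.8 and 1483.1. The small differences come from the rounding of the printed deviances.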

References

Alexia Jolicoeur-Martineau, Ashley Wazana, Eszter Szekely, Meir Steiner, Alison S. Fleming, James L. Kennedy, Michael J. Meaney, Celia M.T. Greenwood and the MAVAN team. Alternating optimization for GxE modelling with weighted genetic and environmental scores: examples from the MAVAN study (2017). arXiv:1703.08111.

Examples

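library(LEGIT)

# Two latent variables (G and E) with a G x E interaction
train = example_2way(500, 1, seed=777)
# Start from the simulated (true) weights
fit_best = IMLEGIT(train$data, list(G=train$G, E=train$E), y ~ G*E,
  start_latent_var = list(train$coef_G, train$coef_E))
# Use the default starting points
fit_default = IMLEGIT(train$data, list(G=train$G, E=train$E), y ~ G*E)
summary(fit_default)
summary(fit_best)

# Three latent variables (G, E and Z) with a three-way interaction
train = example_3way_3latent(500, 1, seed=777)
fit_best = IMLEGIT(train$data, train$latent_var, y ~ G*E*Z,
  start_latent_var = list(train$coef_G, train$coef_E, train$coef_Z))
fit_default = IMLEGIT(train$data, train$latent_var, y ~ G*E*Z)
summary(fit_default)
summary(fit_best)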

Example output

Loading required package: formula.tools
Converged in 9 iterations
Converged in 11 iterations
$fit_main

Call:
stats::glm(formula = formula, family = family, data = data, model = FALSE, 
    y = FALSE)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.78312  -0.57544   0.01023   0.63266   2.72051  

Coefficients: (-7 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.98949    0.04421 -22.382  < 2e-16 ***
G            2.07210    0.30017   6.903 1.59e-11 ***
E            3.08836    0.05005  61.706  < 2e-16 ***
G:E          5.26692    0.33619  15.666  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.9588924)

    Null deviance: 4333.5  on 499  degrees of freedom
Residual deviance:  468.9  on 489  degrees of freedom
AIC: 1410.8

Number of Fisher Scoring iterations: 2


$fit_G

Call:
stats::glm(formula = formula_step[[i]], family = family, data = data, 
    model = FALSE, y = FALSE)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.78303  -0.57544   0.01055   0.63263   2.72214  

Coefficients: (-5 not defined because of singularities)
      Estimate Std. Error t value Pr(>|t|)    
g1     0.09097    0.02211   4.114 4.57e-05 ***
g2     0.06109    0.02739   2.230   0.0262 *  
g3    -0.30112    0.02053 -14.667  < 2e-16 ***
g4     0.10666    0.02193   4.863 1.56e-06 ***
g1_g3  0.18372    0.04182   4.393 1.37e-05 ***
g2_g3  0.25644    0.04550   5.636 2.94e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.9588907)

    Null deviance: 4333.5  on 500  degrees of freedom
Residual deviance:  468.9  on 489  degrees of freedom
AIC: 1410.8

Number of Fisher Scoring iterations: 2


$fit_E

Call:
stats::glm(formula = formula_step[[i]], family = family, data = data, 
    model = FALSE, y = FALSE)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.78317  -0.57543   0.01023   0.63265   2.72052  

Coefficients: (-8 not defined because of singularities)
    Estimate Std. Error t value Pr(>|t|)    
e1 -0.458840   0.009451  -48.55   <2e-16 ***
e2  0.349199   0.009151   38.16   <2e-16 ***
e3  0.191962   0.009278   20.69   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.9588924)

    Null deviance: 4333.5  on 500  degrees of freedom
Residual deviance:  468.9  on 489  degrees of freedom
AIC: 1410.8

Number of Fisher Scoring iterations: 2


$fit_main

Call:
stats::glm(formula = formula, family = family, data = data, model = FALSE, 
    y = FALSE)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.78303  -0.57577   0.01051   0.63213   2.72088  

Coefficients: (-7 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -0.98932    0.04421 -22.379  < 2e-16 ***
G            2.07182    0.30020   6.901  1.6e-11 ***
E            3.08877    0.05005  61.714  < 2e-16 ***
G:E          5.26743    0.33622  15.667  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.9588912)

    Null deviance: 4333.5  on 499  degrees of freedom
Residual deviance:  468.9  on 489  degrees of freedom
AIC: 1410.8

Number of Fisher Scoring iterations: 2


$fit_G

Call:
stats::glm(formula = formula_step[[i]], family = family, data = data, 
    model = FALSE, y = FALSE)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.78296  -0.57577   0.01054   0.63210   2.72229  

Coefficients: (-5 not defined because of singularities)
      Estimate Std. Error t value Pr(>|t|)    
g1     0.09090    0.02211   4.111 4.62e-05 ***
g2     0.06104    0.02739   2.228   0.0263 *  
g3    -0.30118    0.02053 -14.671  < 2e-16 ***
g4     0.10661    0.02193   4.861 1.58e-06 ***
g1_g3  0.18379    0.04182   4.395 1.36e-05 ***
g2_g3  0.25649    0.04550   5.637 2.92e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.9588899)

    Null deviance: 4333.5  on 500  degrees of freedom
Residual deviance:  468.9  on 489  degrees of freedom
AIC: 1410.8

Number of Fisher Scoring iterations: 2


$fit_E

Call:
stats::glm(formula = formula_step[[i]], family = family, data = data, 
    model = FALSE, y = FALSE)

Deviance Residuals: 
     Min        1Q    Median        3Q       Max  
-2.78307  -0.57577   0.01051   0.63212   2.72089  

Coefficients: (-8 not defined because of singularities)
    Estimate Std. Error t value Pr(>|t|)    
e1 -0.458845   0.009452  -48.55   <2e-16 ***
e2  0.349195   0.009151   38.16   <2e-16 ***
e3  0.191961   0.009278   20.69   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 0.9588912)

    Null deviance: 4333.5  on 500  degrees of freedom
Residual deviance:  468.9  on 489  degrees of freedom
AIC: 1410.8

Number of Fisher Scoring iterations: 2


Converged in 6 iterations
Converged in 8 iterations
$fit_main

Call:
stats::glm(formula = formula, family = family, data = data, model = FALSE, 
    y = FALSE)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.2859  -0.6868   0.0383   0.6744   3.0719  

Coefficients: (-9 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.18841    0.20357 -10.750  < 2e-16 ***
G            3.58237    1.10101   3.254  0.00122 ** 
E            2.91802    0.24181  12.067  < 2e-16 ***
Z            1.02907    0.06514  15.797  < 2e-16 ***
G:E          7.54922    1.44259   5.233 2.49e-07 ***
G:Z          1.61548    0.34724   4.652 4.24e-06 ***
E:Z         -1.53780    0.07488 -20.537  < 2e-16 ***
G:E:Z        1.41976    0.43812   3.241  0.00128 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 1.095196)

    Null deviance: 5134.92  on 499  degrees of freedom
Residual deviance:  528.98  on 483  degrees of freedom
AIC: 1483.1

Number of Fisher Scoring iterations: 2


$fit_G

Call:
stats::glm(formula = formula_step[[i]], family = family, data = data, 
    model = FALSE, y = FALSE)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.2859  -0.6850   0.0383   0.6748   3.0726  

Coefficients: (-11 not defined because of singularities)
       Estimate Std. Error t value Pr(>|t|)    
g1     0.196216   0.009334  21.021  < 2e-16 ***
g2     0.163200   0.010855  15.035  < 2e-16 ***
g3    -0.289568   0.008224 -35.211  < 2e-16 ***
g4     0.114555   0.008830  12.974  < 2e-16 ***
g1_g3  0.058679   0.016888   3.475 0.000558 ***
g2_g3  0.177782   0.018256   9.738  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 1.095187)

    Null deviance: 5134.92  on 500  degrees of freedom
Residual deviance:  528.98  on 483  degrees of freedom
AIC: 1483.1

Number of Fisher Scoring iterations: 2


$fit_E

Call:
stats::glm(formula = formula_step[[i]], family = family, data = data, 
    model = FALSE, y = FALSE)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.2858  -0.6869   0.0383   0.6744   3.0725  

Coefficients: (-14 not defined because of singularities)
   Estimate Std. Error t value Pr(>|t|)    
e1 -0.44290    0.01185  -37.36   <2e-16 ***
e2  0.33701    0.01285   26.24   <2e-16 ***
e3  0.22008    0.01279   17.21   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 1.095196)

    Null deviance: 5134.92  on 500  degrees of freedom
Residual deviance:  528.98  on 483  degrees of freedom
AIC: 1483.1

Number of Fisher Scoring iterations: 2


$fit_Z

Call:
stats::glm(formula = formula_step[[i]], family = family, data = data, 
    model = FALSE, y = FALSE)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.2861  -0.6868   0.0385   0.6743   3.0723  

Coefficients: (-14 not defined because of singularities)
   Estimate Std. Error t value Pr(>|t|)    
z1  0.18158    0.02052   8.849  < 2e-16 ***
z2  0.74541    0.02135  34.921  < 2e-16 ***
z3  0.07302    0.01985   3.679  0.00026 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 1.095196)

    Null deviance: 5134.92  on 500  degrees of freedom
Residual deviance:  528.98  on 483  degrees of freedom
AIC: 1483.1

Number of Fisher Scoring iterations: 2


$fit_main

Call:
stats::glm(formula = formula, family = family, data = data, model = FALSE, 
    y = FALSE)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.2843  -0.6890   0.0256   0.6746   3.0711  

Coefficients: (-9 not defined because of singularities)
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -2.18181    0.20304 -10.746  < 2e-16 ***
G            3.60279    1.10178   3.270  0.00115 ** 
E            2.92403    0.24123  12.122  < 2e-16 ***
Z            1.02969    0.06497  15.849  < 2e-16 ***
G:E          7.52947    1.44405   5.214 2.74e-07 ***
G:Z          1.61119    0.34747   4.637 4.56e-06 ***
E:Z         -1.53617    0.07468 -20.570  < 2e-16 ***
G:E:Z        1.42947    0.43858   3.259  0.00120 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 1.095191)

    Null deviance: 5134.92  on 499  degrees of freedom
Residual deviance:  528.98  on 483  degrees of freedom
AIC: 1483.1

Number of Fisher Scoring iterations: 2


$fit_G

Call:
stats::glm(formula = formula_step[[i]], family = family, data = data, 
    model = FALSE, y = FALSE)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.2843  -0.6911   0.0256   0.6693   3.0705  

Coefficients: (-11 not defined because of singularities)
       Estimate Std. Error t value Pr(>|t|)    
g1     0.195433   0.009330  20.946  < 2e-16 ***
g2     0.162524   0.010851  14.977  < 2e-16 ***
g3    -0.290351   0.008221 -35.317  < 2e-16 ***
g4     0.113960   0.008828  12.909  < 2e-16 ***
g1_g3  0.059325   0.016885   3.514 0.000484 ***
g2_g3  0.178407   0.018251   9.775  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 1.095183)

    Null deviance: 5134.92  on 500  degrees of freedom
Residual deviance:  528.97  on 483  degrees of freedom
AIC: 1483.1

Number of Fisher Scoring iterations: 2


$fit_E

Call:
stats::glm(formula = formula_step[[i]], family = family, data = data, 
    model = FALSE, y = FALSE)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.2844  -0.6890   0.0256   0.6746   3.0706  

Coefficients: (-14 not defined because of singularities)
   Estimate Std. Error t value Pr(>|t|)    
e1 -0.44272    0.01185  -37.36   <2e-16 ***
e2  0.33728    0.01285   26.25   <2e-16 ***
e3  0.21999    0.01278   17.21   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 1.095191)

    Null deviance: 5134.92  on 500  degrees of freedom
Residual deviance:  528.98  on 483  degrees of freedom
AIC: 1483.1

Number of Fisher Scoring iterations: 2


$fit_Z

Call:
stats::glm(formula = formula_step[[i]], family = family, data = data, 
    model = FALSE, y = FALSE)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.2841  -0.6890   0.0254   0.6747   3.0708  

Coefficients: (-14 not defined because of singularities)
   Estimate Std. Error t value Pr(>|t|)    
z1  0.18131    0.02054   8.830  < 2e-16 ***
z2  0.74581    0.02136  34.913  < 2e-16 ***
z3  0.07287    0.01986   3.669  0.00027 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for gaussian family taken to be 1.09519)

    Null deviance: 5134.92  on 500  degrees of freedom
Residual deviance:  528.98  on 483  degrees of freedom
AIC: 1483.1

Number of Fisher Scoring iterations: 2
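The output above can be checked by hand. The summaries appear in call order: fit_default first, then fit_best. In every $fit_G, $fit_E and $fit_Z table, the absolute values of the weights sum to 1 up to rounding. The two runs of each example also agree to about four decimal places, so the default starting points reached essentially the same solution as starting from the simulated weights. The snippet below copies the G weights of the two-latent-variable example from the printed tables.

```r
# G weights copied from the two $fit_G tables of the two-latent-variable example
w_G_default <- c(g1 = 0.09097, g2 = 0.06109, g3 = -0.30112, g4 = 0.10666,
                 g1_g3 = 0.18372, g2_g3 = 0.25644)
w_G_best    <- c(g1 = 0.09090, g2 = 0.06104, g3 = -0.30118, g4 = 0.10661,
                 g1_g3 = 0.18379, g2_g3 = 0.25649)

sum(abs(w_G_default))               # 1, up to rounding of the printed values
sum(abs(w_G_best))                  # 1, up to rounding of the printed values
max(abs(w_G_default - w_G_best))    # largest discrepancy between the two fits
```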

LEGIT documentation built on June 24, 2018, 5:01 p.m.