# Example 2: Using SEAGLE with Simulated Data In SEAGLE: Scalable Exact Algorithm for Large-Scale Set-Based Gene-Environment Interaction Tests

This tutorial demonstrates how to use the SEAGLE package when the user inputs a matrix ${\bf G}$. We'll begin by loading the SEAGLE package.

knitr::opts_chunk$set( collapse = TRUE, comment = "#>" )  library(SEAGLE)  As an example, we'll generate some synthetic data for usage in this tutorial. Let's consider a dataset with$n=5000$individuals and$L=100$loci, where the first$40$are causal. The makeSimData function generates a covariate matrix$\widetilde{\bf X} \in \mathbb{R}^{n \times 3}$, where the first column is the all ones vector for the intercept and the second and third columns are${\bf X} \sim \text{N}(0,1)$and${\bf E} \sim \text{N}(0,1)$, respectively. The last two columns are scaled to have$0$mean and unit variance. The makeSimData function additionally generates the genetic marker matrix${\bf G}$with synthetic haplotype data from the COSI software. Detailed procedures for generating${\bf G}$can be found in the accompanying journal manuscript. Finally, the makeSimData function also generates a continuous phenotype${\bf y}$according to the following fixed effects model $${\bf y} = \tilde{\bf X} \boldsymbol{\gamma}{\widetilde{\bf X}} + {\bf G}\boldsymbol{\gamma}{G} + \text{diag}(E){\bf G}\boldsymbol{\gamma}{GE} + {\bf e}.$$ Here,$\boldsymbol{\gamma}{\tilde{\bf X}}$is the all ones vector of length$P=3$,$\boldsymbol{\gamma}{G} \in \mathbb{R}^{L}$,$\boldsymbol{\gamma}{GE}\in \mathbb{R}^{L}$, and${\bf e} \sim \text{N}({\bf 0}, \sigma\, {\bf I}{n})$. The entries of$\boldsymbol{\gamma}{G}$and$\boldsymbol{\gamma}{GE}$pertaining to causal loci are set to be$\gamma{G}$= gammaG and$\gamma_{GE}$= gammaGE, respectively. The remaining entries of$\boldsymbol{\gamma}{G}$and$\boldsymbol{\gamma}{GE}$pertaining to non-causal loci are set to$0$. dat <- makeSimData(H=cosihap, n=5000, L=100, gammaG=1, gammaGE=0, causal=40, seed=1)  Now that we have our data, we can prepare it for use in the SEAGLE algorithm. We will input our${\bf y}$,${\bf X}$,${\bf E}$, and${\bf G}$into the prep.SEAGLE function. The intercept = 1 parameter indicates that the first column of${\bf X}$is the all ones vector for the intercept. This preparation procedure formats the input data for the SEAGLE function by checking the dimensions of the input data. It also pre-computes a QR decomposition for$\widetilde{\bf X} = \begin{pmatrix} {\bf 1}{n} & {\bf X} & {\bf E} \end{pmatrix}$, where${\bf 1}{n}$denotes the all ones vector of length$n$. objSEAGLE <- prep.SEAGLE(y=dat$y, X=dat$X, intercept=1, E=dat$E, G=dat$G)  Finally, we'll input the prepared data into the SEAGLE function to compute the score-like test statistic$T$and its corresponding p-value. The init.tau and init.sigma parameters are the initial values for$\tau$and$\sigma$employed in the REML EM algorithm. res <- SEAGLE(objSEAGLE, init.tau=0.5, init.sigma=0.5) res$T
res$pv  The score-like test statistic$T$for the G$\times$E effect and its corresponding p-value can be found in res$T and res\$pv, respectively.

## Try the SEAGLE package in your browser

Any scripts or data that you put into this service are public.

SEAGLE documentation built on Nov. 6, 2021, 1:06 a.m.