In etma: Epistasis Test in Meta-Analysis

%\VignetteEngine{knitr::knitr} %\VignetteIndexEntry{etma} %\VignetteEncoding{UTF-8}

knitr::opts_chunk$set(echo = TRUE)

A tutorial of epistasis detection using ETMA.

Introduction:

Epistasis Test in Meta-Analysis (ETMA) is a statistical method using summary data from genetic association studies to detect gene-gene interaction. This package etma has a main function for detecting epistasis using ETMA, and contains three complete example data sets.

Background:

Conventional genome-wide association studies (GWAS) have been proven to be a successful strategy for identifying genetic variants associated with complex human traits. However, there is still a large heritability gap between GWAS and transitional family studies. The ‘missing heritability’ has been suggested to be due to lack of studies focused on epistasis, also called gene–gene interactions, because individual trials have often had insufficient sample size. Meta-analysis is a common method for increasing statistical power. However, sufficient detailed information is difficult to obtain. A previous study employed a meta-regression-based method to detect epistasis, but it faced the challenge of inconsistent estimates. Here, we describe a Markov chain Monte Carlo-based method, called ‘Epistasis Test in Meta-Analysis’ (ETMA), which uses genotype summary data to obtain consistent estimates of epistasis effects in meta-analysis.

Installation:

User may open the main R window and enter the following text to install etma package (assuming an internet connection and appropriate access rights on the computer):

install.packages("etma")

After installation, the user will need to enter the following text to load the etma package:

library(etma)

Datasets:

Use the data command to load these data and the print command to view them as follows. To analyze the data, use help(read.table) to view the details. User can use the help command to view the detailed definition of variables.

GSTs family and cancer

data(data.GST)
head(data.GST)

PAH metabolism pathway and oral cancer

data(data.PAH)
head(data.PAH)

RAS and chronic kidney disease

data(data.RAS)
head(data.RAS)

Simple example:

The main function of etma package is ‘ETMA’, and ETMA use an n by 8 matrix including the numbers of variants of SNP1 and SNP2 in case and control in each study (n is the number of studies) to analyse gene-gene interaction. Thus, the inputs of ETMA function include: (1) the number of wild type of SNP1 in case group, (2) the number of mutation type of SNP1 in case group, (3) the number of wild type of SNP1 in control group, (4) the number of mutation type of SNP1 in control group, (5) the number of wild type of SNP2 in case group, (6) the number of mutation type of SNP2 in case group, (7) the number of wild type of SNP2 in control group, and (8) the number of mutation type of SNP1 in control group.

Because ETMA is based on MCMC and a 2-steps iteration process, the main options of ETMA function include: (1) the maximum number of iterations (default is 20), (2) the length of chain to obtain the study-level parameters in step 1 (default is 20,000), (3) the length of chain to obtain the global-level parameters in step 2 (default is 200,000), and (4) the start seed of this algorithm (default is a random seed). Moreover, user also can choose whether want to export MCMC plots in each iterations.

The main outputs include: (1) the beta values (logarithmic ORs) of each SNP and interaction term, (2) the variance covariance matrix of beta value, and (3) the p matrix in iterations process. According these outputs, we can calculate ORs, their confidence intervals, and p values.

Use the ETMA command to analyze gene–gene interaction using ETMA and save the results to ggint.toy (Note: the computing time in this example is about 3-5 secs).

ggint.toy=ETMA(case.ACE.0,case.ACE.1,ctrl.ACE.0,ctrl.ACE.1,
                  case.AGT.0,case.AGT.1,ctrl.AGT.0,ctrl.AGT.1,
                  data=data.RAS,iterations.step1=100,iterations.step2=300,
                  start.seed=1,show.detailed.plot=FALSE,show.final.plot=FALSE)

After the analysis, use the print and summary commands to view the result of gene–gene interaction analysis.

print(ggint.toy)
summary(ggint.toy)

Complete example:

Following examples are complete examples. They need 20,000/200,000 learning time in step 1/step 2, respectively (default). Please note they need more than 15 mins, and one of example need about 3 hrs. The complete learning time is necessary in real data analysis. Please use default setting as following to analysis your data.

GSTs family and cancer (note: the computing time for this example is about 3 h):

ggint1=ETMA(case.GSTM1.0,case.GSTM1.1,ctrl.GSTM1.0,ctrl.GSTM1.1,
           case.GSTT1.0,case.GSTT1.1,ctrl.GSTT1.0,ctrl.GSTT1.1,
           data=data.GST,start.seed=1,show.detailed.plot=TRUE,show.final.plot=TRUE)
print(ggint1)
summary(ggint1)

PAH metabolism pathway and oral cancer (note: the computing time for this example is about 15 min):

ggint2=ETMA(case.CYP1A1.0,case.CYP1A1.1,ctrl.CYP1A1.0,ctrl.CYP1A1.1,
           case.GSTM1.0,case.GSTM1.1,ctrl.GSTM1.0,ctrl.GSTM1.1,
           data=data.PAH,start.seed=1,show.detailed.plot=TRUE,show.final.plot=TRUE)
print(ggint2)
summary(ggint2)

RAS and chronic kidney disease (note: the computing time for this example is about 15 min):

ggint3=ETMA(case.ACE.0,case.ACE.1,ctrl.ACE.0,ctrl.ACE.1,
           case.AGT.0,case.AGT.1,ctrl.AGT.0,ctrl.AGT.1,
           data=data.RAS,start.seed=1,show.detailed.plot=TRUE,show.final.plot=TRUE)
print(ggint3)
summary(ggint3)