Interpretable eQTL Effect Sizes Using a Log of Linear Model"

# knitr::opts_chunk$set(echo = TRUE)

Introduction

We present a package for estimation of cis-eQTL effect sizes, using a new model called ACME which respects biological understanding of cis-eQTL action. The model, fully presented and validated in [@palowitch2017estimation], involves an additive effect of allele count and multiplicative component random noise (hence "ACME": Additive-Contribution, Multiplicative-Error), and is defined as

$$y_i = \log(\beta_0 + \beta_1 s_i) + Z_i^T \gamma + \epsilon_i$$

where

We estimate the model using a fast iterative algorithm.\ The algorithm estimates the model which is nonlinear only with respect to $\eta = \beta_1 / \beta_0$

$$y_i = \log(1 + s_i \eta) + \log(\beta_0) + Z_i^T \gamma + \epsilon_i$$

Installation

ACMEeqtl can be installed with the following command.

install.packages("ACMEeqtl")

Using the Package

ACMEeqtl package provides functions for analysis of a single gene-SNP pair as well as fast parallel testing of all local gene-SNP pairs.

library(pander)
panderOptions("digits", 3)
library(ACMEeqtl)
library(ACMEeqtl)

Testing a Single Gene-SNP Pair

First we generate sample gene expression, SNP allele counts, and a set of covariates.

# Model parameters
beta0 = 10000
beta1 = 50000

# Data dimensions
nsample = 1000
ncvrt = 19

### Data generation
### Zero average covariates
cvrt = matrix(rnorm(nsample * ncvrt), nsample, ncvrt)
cvrt = t(t(cvrt) - colMeans(cvrt))

# Generate SNPs
s = rbinom(n = nsample, size = 2, prob = 0.2)

# Generate log-normalized expression
y = log(beta0 + beta1 * s) + 
    cvrt %*% rnorm(ncvrt) + 
    rnorm(nsample)

We provide two equivalent functions for model estimation.

z1 = effectSizeEstimationR(s, y, cvrt)
z2 = effectSizeEstimationC(s, y, cvrt)

pander(rbind(z1,z2))

Variables z1, z2 show that the estimation was done in r z1[3] iterations, with estimated parameters

Testing All Local Gene-SNP Pairs

First we generate a eQTL dataset in filematrix format (see filematrix package).

tempdirectory = tempdir()
z = create_artificial_data(
    nsample = 100,
    ngene = 100,
    nsnp = 501,
    ncvrt = 1,
    minMAF = 0.2,
    saveDir = tempdirectory,
    returnData = FALSE,
    savefmat = TRUE,
    savetxt = FALSE,
    verbose = FALSE)

In this example, we use 2 CPU cores (threads) for testing of all gene-SNP pairs within 100,000 bp.

multithreadACME(
    genefm = "gene",
    snpsfm = "snps",
    glocfm = "gene_loc",
    slocfm = "snps_loc",
    cvrtfm = "cvrt",
    acmefm = "ACME",
    cisdist = 1.5e+06,
    threads = 2,
    workdir = file.path(tempdirectory, "filematrices"),
    verbose = FALSE)

Now the filematrix ACME holds estimations for all local gene-SNP pairs.

fm = fm.open(file.path(tempdirectory, "filematrices", "ACME"))
TenResults = fm[,1:10]
rownames(TenResults) = rownames(fm)
close(fm)

pander(t(TenResults))

Testing Multi-SNP Model for All Local Gene-SNP Pairs

Now we can estimate multi-SNP ACME models for each gene:

multisnpACME(
    genefm = "gene",
    snpsfm = "snps",
    glocfm = "gene_loc",
    slocfm = "snps_loc",
    cvrtfm = "cvrt",
    acmefm = "ACME",
    workdir = file.path(tempdirectory, "filematrices"),
    genecap = Inf,
    verbose = FALSE)

Now the filematrix ACME_multiSNP holds estimations for all multi-SNP models.

fm = fm.open(file.path(tempdirectory, "filematrices", "ACME_multiSNP"))
TenResults = fm[,1:10]
rownames(TenResults) = rownames(fm)
close(fm)

pander(t(TenResults))

Note that each multi-SNP model will contain at least one SNP, even if that initial SNP was not significant under the single-SNP models. This initial SNP will be the one with the highest adjusted-R$^2$ value among the single-SNP models. However, after the initial SNP, further SNPs are added only if the combined model's adjusted-R$^2$ is greater than that from the previous combined model.

References



Try the ACMEeqtl package in your browser

Any scripts or data that you put into this service are public.

ACMEeqtl documentation built on May 2, 2019, 4:03 p.m.