# knitr::opts_chunk$set(echo = TRUE)
We present a package for estimation of cis-eQTL effect sizes, using a new model called ACME which respects biological understanding of cis-eQTL action. The model, fully presented and validated in [@palowitch2017estimation], involves an additive effect of allele count and multiplicative component random noise (hence "ACME": Additive-Contribution, Multiplicative-Error), and is defined as
$$y_i = \log(\beta_0 + \beta_1 s_i) + Z_i^T \gamma + \epsilon_i$$
where
We estimate the model using a fast iterative algorithm.\ The algorithm estimates the model which is nonlinear only with respect to $\eta = \beta_1 / \beta_0$
$$y_i = \log(1 + s_i \eta) + \log(\beta_0) + Z_i^T \gamma + \epsilon_i$$
ACMEeqtl can be installed with the following command.
install.packages("ACMEeqtl")
ACMEeqtl package provides functions for analysis of a single gene-SNP pair as well as fast parallel testing of all local gene-SNP pairs.
library(pander) panderOptions("digits", 3) library(ACMEeqtl)
library(ACMEeqtl)
First we generate sample gene expression, SNP allele counts, and a set of covariates.
# Model parameters beta0 = 10000 beta1 = 50000 # Data dimensions nsample = 1000 ncvrt = 19 ### Data generation ### Zero average covariates cvrt = matrix(rnorm(nsample * ncvrt), nsample, ncvrt) cvrt = t(t(cvrt) - colMeans(cvrt)) # Generate SNPs s = rbinom(n = nsample, size = 2, prob = 0.2) # Generate log-normalized expression y = log(beta0 + beta1 * s) + cvrt %*% rnorm(ncvrt) + rnorm(nsample)
We provide two equivalent functions for model estimation.
effectSizeEstimationR
-- fully coded in ReffectSizeEstimationC
-- faster version with core coded in C.z1 = effectSizeEstimationR(s, y, cvrt) z2 = effectSizeEstimationC(s, y, cvrt) pander(rbind(z1,z2))
Variables z1
, z2
show that the estimation was done in r z1[3]
iterations, with estimated parameters
r as.character(round(z1[1],1))
(true parameter is r as.character(beta0)
)r as.character(round(z1[2],1))
(true parameter is r as.character(beta1)
)First we generate a eQTL dataset in filematrix format (see filematrix package).
tempdirectory = tempdir() z = create_artificial_data( nsample = 100, ngene = 100, nsnp = 501, ncvrt = 1, minMAF = 0.2, saveDir = tempdirectory, returnData = FALSE, savefmat = TRUE, savetxt = FALSE, verbose = FALSE)
In this example, we use 2 CPU cores (threads) for testing of all gene-SNP pairs within 100,000 bp.
multithreadACME( genefm = "gene", snpsfm = "snps", glocfm = "gene_loc", slocfm = "snps_loc", cvrtfm = "cvrt", acmefm = "ACME", cisdist = 1.5e+06, threads = 2, workdir = file.path(tempdirectory, "filematrices"), verbose = FALSE)
Now the filematrix ACME
holds estimations for all local gene-SNP pairs.
fm = fm.open(file.path(tempdirectory, "filematrices", "ACME")) TenResults = fm[,1:10] rownames(TenResults) = rownames(fm) close(fm) pander(t(TenResults))
Now we can estimate multi-SNP ACME models for each gene:
multisnpACME( genefm = "gene", snpsfm = "snps", glocfm = "gene_loc", slocfm = "snps_loc", cvrtfm = "cvrt", acmefm = "ACME", workdir = file.path(tempdirectory, "filematrices"), genecap = Inf, verbose = FALSE)
Now the filematrix ACME_multiSNP
holds estimations for all multi-SNP models.
fm = fm.open(file.path(tempdirectory, "filematrices", "ACME_multiSNP")) TenResults = fm[,1:10] rownames(TenResults) = rownames(fm) close(fm) pander(t(TenResults))
Note that each multi-SNP model will contain at least one SNP, even if that initial SNP was not significant under the single-SNP models. This initial SNP will be the one with the highest adjusted-R$^2$ value among the single-SNP models. However, after the initial SNP, further SNPs are added only if the combined model's adjusted-R$^2$ is greater than that from the previous combined model.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.