# knitr::opts_chunk$set(echo = TRUE)
We present a package for estimation of cis-eQTL effect sizes, using a new model called ACME which respects biological understanding of cis-eQTL action. The model, fully presented and validated in [@palowitch2017estimation], involves an additive effect of allele count and multiplicative component random noise (hence "ACME": Additive-Contribution, Multiplicative-Error), and is defined as
$$y_i = \log(\beta_0 + \beta_1 s_i) + Z_i^T \gamma + \epsilon_i$$
where
We estimate the model using a fast iterative algorithm.\ The algorithm estimates the model which is nonlinear only with respect to $\eta = \beta_1 / \beta_0$
$$y_i = \log(1 + s_i \eta) + \log(\beta_0) + Z_i^T \gamma + \epsilon_i$$
ACMEeqtl can be installed with the following command.
install.packages("ACMEeqtl")
ACMEeqtl package provides functions for analysis of a single gene-SNP pair as well as fast parallel testing of all local gene-SNP pairs.
library(pander) panderOptions("digits", 3) library(ACMEeqtl)
library(ACMEeqtl)
First we generate sample gene expression, SNP allele counts, and a set of covariates.
# Model parameters beta0 = 10000 beta1 = 50000 # Data dimensions nsample = 1000 ncvrt = 19 ### Data generation ### Zero average covariates cvrt = matrix(rnorm(nsample * ncvrt), nsample, ncvrt) cvrt = t(t(cvrt) - colMeans(cvrt)) # Generate SNPs s = rbinom(n = nsample, size = 2, prob = 0.2) # Generate log-normalized expression y = log(beta0 + beta1 * s) + cvrt %*% rnorm(ncvrt) + rnorm(nsample)
We provide two equivalent functions for model estimation.
effectSizeEstimationR
-- fully coded in ReffectSizeEstimationC
-- faster version with core coded in C.z1 = effectSizeEstimationR(s, y, cvrt) z2 = effectSizeEstimationC(s, y, cvrt) pander(rbind(z1,z2))
Variables z1
, z2
show that the estimation was done in r z1[3]
iterations, with estimated parameters
r as.character(round(z1[1],1))
(true parameter is r as.character(beta0)
)r as.character(round(z1[2],1))
(true parameter is r as.character(beta1)
)First we generate a eQTL dataset in filematrix format (see filematrix package).
tempdirectory = tempdir() z = create_artificial_data( nsample = 100, ngene = 100, nsnp = 501, ncvrt = 1, minMAF = 0.2, saveDir = tempdirectory, returnData = FALSE, savefmat = TRUE, savetxt = FALSE, verbose = FALSE)
In this example, we use 2 CPU cores (threads) for testing of all gene-SNP pairs within 100,000 bp.
multithreadACME( genefm = "gene", snpsfm = "snps", glocfm = "gene_loc", slocfm = "snps_loc", cvrtfm = "cvrt", acmefm = "ACME", cisdist = 1.5e+06, threads = 2, workdir = file.path(tempdirectory, "filematrices"), verbose = FALSE)
Now the filematrix ACME
holds estimations for all local gene-SNP pairs.
fm = fm.open(file.path(tempdirectory, "filematrices", "ACME")) TenResults = fm[,1:10] rownames(TenResults) = rownames(fm) close(fm) pander(t(TenResults))
Now we can estimate multi-SNP ACME models for each gene:
multisnpACME( genefm = "gene", snpsfm = "snps", glocfm = "gene_loc", slocfm = "snps_loc", cvrtfm = "cvrt", acmefm = "ACME", workdir = file.path(tempdirectory, "filematrices"), genecap = Inf, verbose = FALSE)
Now the filematrix ACME_multiSNP
holds estimations for all multi-SNP models.
fm = fm.open(file.path(tempdirectory, "filematrices", "ACME_multiSNP")) TenResults = fm[,1:10] rownames(TenResults) = rownames(fm) close(fm) pander(t(TenResults))
Note that each multi-SNP model will contain at least one SNP, even if that initial SNP was not significant under the single-SNP models. This initial SNP will be the one with the highest adjusted-R$^2$ value among the single-SNP models. However, after the initial SNP, further SNPs are added only if the combined model's adjusted-R$^2$ is greater than that from the previous combined model.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.