gmap.ac.gpd | R Documentation |
The gmap.ac.gpd function performs genomic mediation analysis with an adaptive permutation scheme and adaptive confounder selection. It tests for mediation effects in a set of user-specified mediation trios (e.g., eQTL, cis-gene and trans-gene) in the genome, assuming that a cis-association is present. As the pool of potential confounders, the function uses either a user-provided set of variables, real or constructed by other methods, or all the PCs computed from the feature data. When the empirical P-value is small enough, a Generalized Pareto Distribution (GPD) fit is used to estimate a more accurate empirical P-value.
It returns the mediation P-values (the nominal P-value, the empirical P-values computed directly from the permutations, and the empirical P-values estimated using GPD fitting), the linear model estimates (e.g., t_stat, std.error, beta, beta.total) and the proportions mediated (e.g., the percentage of reduction in trans-effects after accounting for cis-mediation) based on the mediation tests i) adjusting for known confounders only, and ii) adjusting for known confounders and adaptively selected potential confounders for each mediation trio.
gmap.ac.gpd(
  snp.dat,
  fea.dat,
  known.conf,
  trios.idx,
  cl = NULL,
  cov.pool = NULL,
  pc.num = 30,
  Minperm = 100,
  Maxperm = 10000,
  gpd.perm = 0.01,
  fdr = 0.05,
  fdr_filter = 0.1
)
snp.dat |
The eQTL genotype matrix. Each row is an eQTL, each column is a sample. |
fea.dat |
A feature profile matrix. Each row is for one feature, each column is a sample. |
known.conf |
A matrix of known confounders that are adjusted for in all mediation tests. Each row is a confounder, each column is a sample. |
trios.idx |
A matrix of selected trio indices (row numbers) for
mediation tests. Each row consists of the index (i.e., row number) of the
eQTL in |
cl |
Parallel backend if it is set up. It is used for parallel
computing. We set |
cov.pool |
The pool of candidate confounding variables from which
potential confounders are adaptively selected to adjust for each trio. Each
row is a covariate, each column is a sample. We set |
pc.num |
If |
Minperm |
The minimum number of permutations. When the number of
permutation statistics better than the original statistic is greater than
|
Maxperm |
The maximum number of permutations. We set |
gpd.perm |
Determines when the GPD fit is used. When the proportion of
permutation statistics better than the original statistic (i.e., the
empirical P-value) is less than gpd.perm, the GPD is fitted to estimate
the empirical P-value. We set
|
fdr |
The false discovery rate to select confounders. We set
|
fdr_filter |
The false discovery rate to filter common child and
intermediate variables. We set |
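As a rough illustration of the default confounder pool (all PCs of the feature data when cov.pool = NULL), the sketch below builds such a pool from a features-by-samples matrix. It assumes base R's prcomp() with centering and scaling on simulated data; the function's internal construction may differ, and the object name pc.pool is hypothetical.

```r
set.seed(1)

# Simulated feature matrix: 50 features (rows) x 20 samples (columns).
fea.dat <- matrix(rnorm(50 * 20), nrow = 50, ncol = 20)

# prcomp() expects samples in rows, so the matrix is transposed first.
# Centering and scaling are assumptions made for this sketch.
pca <- prcomp(t(fea.dat), center = TRUE, scale. = TRUE)

# pca$x is samples x PCs: each column is one PC.
pc.matrix <- pca$x

# Transposing gives one covariate per row and one sample per column,
# which is the orientation described for cov.pool.
pc.pool <- t(pc.matrix)
dim(pc.pool)
```

A matrix in this orientation could also be passed explicitly as cov.pool if a different pool (for example, a subset of PCs) is preferred.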
The function performs genomic mediation analysis with an adaptive permutation scheme and adaptive confounder selection.
Adaptive permutation scheme
With a fixed permutation scheme, insignificant adjusted P-values can be estimated well with few permutations, while many more are needed to estimate highly significant ones. Therefore, we implemented an alternative permutation scheme that adapts the number of permutations to the significance level of each variant-phenotype pair.
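The idea of adapting the number of permutations can be sketched as follows. This is a minimal illustration, not the package's implementation: the helper adaptive_perm_p and its arguments min.exceed and max.perm are hypothetical, the test statistic is a simple absolute correlation rather than a mediation statistic, and the (exceed + 1) / (nperm + 1) estimate is one common convention.

```r
# Hypothetical helper, not part of the package.
# Permutes y until either `min.exceed` permuted statistics are at least as
# extreme as the observed one, or `max.perm` permutations have been run.
adaptive_perm_p <- function(x, y, min.exceed = 100, max.perm = 10000) {
  obs <- abs(cor(x, y))
  exceed <- 0
  nperm <- 0
  while (nperm < max.perm && exceed < min.exceed) {
    nperm <- nperm + 1
    perm.stat <- abs(cor(x, sample(y)))
    if (perm.stat >= obs) exceed <- exceed + 1
  }
  list(nperm = nperm, empirical.p = (exceed + 1) / (nperm + 1))
}

set.seed(1)
n <- 100
x <- rnorm(n)

# Unrelated variables: the loop usually stops well before max.perm.
null.res <- adaptive_perm_p(x, rnorm(n), max.perm = 2000)

# Strongly related variables: permuted statistics essentially never reach
# the observed one, so all max.perm permutations are spent.
sig.res <- adaptive_perm_p(x, x + rnorm(n, sd = 0.5), max.perm = 2000)

null.res$nperm
sig.res$nperm
```

Clearly insignificant tests therefore stop early, and the permutation budget is concentrated on the tests that need it.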
Adaptive confounding adjustment
One challenge in mediation tests in genomic studies is how to adjust for unmeasured variables that confound the relationship between the cis- and trans-genes (i.e., mediator and outcome). The function adaptively selects the variables to adjust for in each mediation trio from a large pool of constructed or real potential confounding variables. It accepts variables known to be potential cis- and trans-gene (mediator-outcome) confounders, which are adjusted for in all mediation tests (known.conf), and a pool of candidate confounders from which the potential confounders for each mediation test are adaptively selected (cov.pool). When no pool is provided (cov.pool = NULL), all the PCs based on the feature profile (fea.dat) are constructed as the potential confounder pool.
Calculating empirical P-values using GPD fitting
With a fixed number of permutations N, the smallest empirical P-value that can be calculated is 1/N, so estimating a smaller P-value requires a larger number of permutations. Therefore, we model the tail of the permutation statistics as a Generalized Pareto Distribution (GPD), which allows smaller empirical P-values to be estimated from fewer permutations.
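The tail approximation can be sketched as below. This is an illustration under stated assumptions, not the package's fitting routine: the helper gpd_tail_p and its argument n.exc are hypothetical, the GPD parameters are estimated by the method of moments for self-containment (Knijnenburg et al. describe maximum-likelihood fitting with a goodness-of-fit check), and the permutation statistics are simulated from a standard normal.

```r
# Hypothetical helper, not part of the package.
# Fits a GPD to the n.exc largest permutation statistics (method of moments)
# and extrapolates the tail probability at the observed statistic.
gpd_tail_p <- function(obs, perm.stats, n.exc = 250) {
  n <- length(perm.stats)
  stopifnot(n > n.exc)
  sorted <- sort(perm.stats, decreasing = TRUE)
  # Threshold halfway between the n.exc-th and (n.exc + 1)-th largest values.
  thr <- (sorted[n.exc] + sorted[n.exc + 1]) / 2
  stopifnot(obs > thr)
  z <- sorted[seq_len(n.exc)] - thr  # exceedances over the threshold
  m <- mean(z)
  v <- var(z)
  # Method-of-moments estimates of the GPD shape (xi) and scale (sigma).
  xi <- 0.5 * (1 - m^2 / v)
  sigma <- 0.5 * m * (m^2 / v + 1)
  y <- obs - thr
  # GPD survival function; the exponential limit is used when xi is ~0.
  surv <- if (abs(xi) < 1e-8) {
    exp(-y / sigma)
  } else {
    max(0, 1 + xi * y / sigma)^(-1 / xi)
  }
  # P(stat > thr) * P(stat > obs | stat > thr)
  (n.exc / n) * surv
}

set.seed(1)
perm.stats <- rnorm(5000)  # simulated permutation statistics
obs <- 3.5                 # simulated observed statistic

# Counting estimate: cannot go below 1 / (N + 1).
p.count <- (sum(perm.stats >= obs) + 1) / (length(perm.stats) + 1)

# GPD estimate: extrapolates beyond the resolution of the permutations.
p.gpd <- gpd_tail_p(obs, perm.stats)
```

Because the estimate is a product of the exceedance fraction and a fitted tail probability, it is not bounded below by 1/N, which is what makes small P-values reachable with a moderate number of permutations.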
The algorithm will return a list of nperm, empirical.p, empirical.p.gpd, nominal.p, beta, std.error, t_stat, beta.total, beta.change.
nperm |
The actual number of permutations for testing mediation. |
empirical.p |
The empirical mediation P-values obtained from nperm permutations. A matrix with dimension of the number of trios. |
empirical.p.gpd |
The empirical mediation P-values estimated from nperm permutations using the GPD fit. A matrix with dimension of the number of trios. |
nominal.p |
The mediation nominal P-values. A matrix with dimension of the number of trios. |
std.error |
The standard error of feature1 in the fitted linear models. A matrix with dimension of the number of trios. |
t_stat |
The t-statistic of feature1 in the fitted linear models. A matrix with dimension of the number of trios. |
beta |
The beta value of feature2 from the fitted linear models that include feature1. A matrix with dimension of the number of trios. |
beta.total |
The beta value of feature2 from the fitted linear models that do not include feature1. A matrix with dimension of the number of trios. |
beta.change |
The proportions mediated. A matrix with dimension of the number of trios. |
pc.matrix |
PCs will be returned if the PCs based on expression data are used as the pool of potential confounders. Each column is a PC. |
sel.conf.ind |
An indicator matrix with dimension of the
number of trios by the number of covariates in |
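To make the "proportion mediated" quantity concrete, the sketch below computes one common version of it on simulated data: the relative reduction in the eQTL's effect on the trans-gene after the cis-gene is added to the model. The variable names and the formula are assumptions for illustration; the exact definitions behind beta, beta.total and beta.change in the function's output are not spelled out here and may differ.

```r
set.seed(1)
n <- 200

# Simulated trio: the SNP affects the cis-gene, which affects the trans-gene.
snp <- rbinom(n, size = 2, prob = 0.3)
cis <- 0.8 * snp + rnorm(n)
trans <- 0.6 * cis + rnorm(n)

# Total effect of the SNP on the trans-gene (mediator not in the model).
total.fit <- lm(trans ~ snp)
beta.total <- coef(total.fit)[["snp"]]

# Effect of the SNP on the trans-gene after accounting for the cis-gene.
direct.fit <- lm(trans ~ snp + cis)
beta.direct <- coef(direct.fit)[["snp"]]

# Relative reduction in the trans-effect once the mediator is included.
prop.mediated <- (beta.total - beta.direct) / beta.total

# t-statistic of the mediator in the adjusted model.
t.cis <- summary(direct.fit)$coefficients["cis", "t value"]
```

In a real analysis the known and selected confounders would be added as further covariates to both models.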
Ongen H, Buil A, Brown AA, Dermitzakis ET, Delaneau O. (2016) Fast and efficient QTL mapper for thousands of molecular phenotypes. Bioinformatics 32:1479–1485. doi: 10.1093/bioinformatics/btv722
Yang F, Wang J, GTEx Consortium, Pierce BL, Chen LS. (2017) Identifying cis-mediators for trans-eQTLs across many human tissues using genomic mediation analysis. Genome Research 27:1859–1871. doi: 10.1101/gr.216754.116
Knijnenburg TA, Wessels LFA, Reinders MJT, Shmulevich I. (2009) Fewer permutations, more accurate P-values. Bioinformatics 25:i161–i168. doi: 10.1093/bioinformatics/btp211
output <- gmap.ac.gpd(known.conf = dat$known.conf, fea.dat = dat$fea.dat,
                      snp.dat = dat$snp.dat, trios.idx = dat$trios.idx[1:10,],
                      Minperm = 100, Maxperm = 10000)

## Not run: 
## generate a cluster with 2 nodes for parallel computing
library(parallel)  # provides makeCluster() and stopCluster()
cl <- makeCluster(2)

## Use the specified pool of candidate confounding variables
## When the empirical P-value is less than 0.02, a more accurate
## empirical P-value is estimated using the GPD fit.
output <- gmap.ac.gpd(known.conf = dat$known.conf, fea.dat = dat$fea.dat,
                      snp.dat = dat$snp.dat, trios.idx = dat$trios.idx[1:10,],
                      cl = cl, cov.pool = dat$cov.pool,
                      Minperm = 100, Maxperm = 10000, gpd.perm = 0.02)

stopCluster(cl)
## End(Not run)