gmfp.ac.gpd: Genomic Mediation analysis with Fixed Permutation scheme and...

View source: R/gmfp.ac.gpd.R

gmfp.ac.gpd R Documentation

Genomic Mediation analysis with Fixed Permutation scheme and Adaptive Confounders and Generalized Pareto Distribution (GPD)

Description

The gmfp.ac.gpd function performs genomic mediation analysis with a Fixed Permutation scheme and Adaptive Confounders. It tests for mediation effects for a set of user-specified mediation trios (e.g., eQTL, cis-gene and trans-gene) in the genome, assuming that the cis-association is present. As the potential confounder pool, the function takes either a user-provided pool of potential confounding variables, real or constructed by other methods, or all the PCs based on the feature data. When the empirical P-value is small enough, a GPD fit is used to estimate a more accurate empirical P-value.

It returns the mediation P-values (the nominal P-value, the empirical P-values obtained from the ordinary permutation calculation, and the empirical P-values estimated using GPD fitting), the coefficients of the linear models (e.g., t_stat, std.error, beta, beta.total), and the proportions mediated (e.g., the percentage of reduction in trans-effects after accounting for cis-mediation) based on the mediation tests i) adjusting for known confounders only, and ii) adjusting for known confounders and adaptively selected potential confounders for each mediation trio.

Usage

gmfp.ac.gpd(
  snp.dat,
  fea.dat,
  known.conf,
  trios.idx,
  cl = NULL,
  cov.pool = NULL,
  pc.num = 30,
  nperm = 10000,
  gpd.perm = 0.01,
  fdr = 0.05,
  fdr_filter = 0.1
)

Arguments

snp.dat

The eQTL genotype matrix. Each row is an eQTL, each column is a sample.

fea.dat

A feature profile matrix. Each row is for one feature, each column is a sample.

known.conf

A matrix of known confounders that are adjusted for in all mediation tests. Each row is a confounder, each column is a sample.

trios.idx

A matrix of selected trio indices (row numbers) for mediation tests. Each row consists of the index (i.e., row number) of the eQTL in snp.dat, the index of the cis-gene feature in fea.dat, and the index of the trans-gene feature in fea.dat. The dimension is the number of trios by three.

cl

A parallel backend, if one has been set up, used for parallel computing. The default is cl = NULL.

cov.pool

The pool of candidate confounding variables from which potential confounders are adaptively selected to adjust for each trio. Each row is a covariate, each column is a sample. The default is cov.pool = NULL, in which case the PCs of the feature data are calculated and used as the pool.

pc.num

If cov.pool = NULL, the first pc.num PCs are used as cov.pool. The default is pc.num = 30. Please ensure the value is less than the number of columns of the pool.

nperm

The number of permutations for testing mediation. If nperm=0, only the nominal P-value is calculated. We set nperm=10000 as default.

gpd.perm

Determines when the GPD fit is used. When the proportion of permutation statistics that are better than the original statistic is less than gpd.perm, a GPD is fitted to estimate the empirical P-value. The default is gpd.perm = 0.01.

fdr

The false discovery rate to select confounders. We set fdr=0.05 as default.

fdr_filter

The false discovery rate to filter common child and intermediate variables. We set fdr_filter=0.1 as default.
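The input layout described above can be sketched with simulated data. This is only an illustration: the sizes, the random values, and the names n.samples, n.snps, n.features and pca are hypothetical, and the use of prcomp() with centering and scaling is an assumed stand-in for how the PC pool might be built when cov.pool = NULL, not the package's confirmed procedure.

```r
set.seed(1)
n.samples  <- 50   # hypothetical number of samples
n.snps     <- 5    # hypothetical number of eQTLs
n.features <- 40   # hypothetical number of features

# Rows are eQTLs / features / confounders; columns are samples.
snp.dat <- matrix(rbinom(n.snps * n.samples, size = 2, prob = 0.3),
                  nrow = n.snps, ncol = n.samples)
fea.dat <- matrix(rnorm(n.features * n.samples),
                  nrow = n.features, ncol = n.samples)
known.conf <- matrix(rnorm(2 * n.samples), nrow = 2, ncol = n.samples)

# One trio per row: eQTL row in snp.dat, then the cis-gene row and the
# trans-gene row in fea.dat.
trios.idx <- cbind(snp = c(1, 2, 3), cis = c(4, 8, 15), trans = c(16, 23, 40))

# Assumed equivalent of cov.pool = NULL: PCs of the feature data.
# prcomp() expects samples in rows, so fea.dat is transposed first.
pc.num <- 30
pca <- prcomp(t(fea.dat), center = TRUE, scale. = TRUE)
pc.matrix <- pca$x[, seq_len(pc.num), drop = FALSE]  # samples x PCs
cov.pool <- t(pc.matrix)                             # covariates x samples
```

Every matrix shares the same sample columns, and each index in trios.idx must be a valid row number of snp.dat or fea.dat.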

Details

The function performs genomic mediation analysis with a Fixed Permutation scheme and Adaptive Confounders.

Fixed Permutation scheme: When calculating the empirical P-value, the data are permuted a fixed number of times, and the statistic is recalculated after each permutation. If N is the number of permutations and M is the number of permutation statistics that are better than the original statistic, then the empirical P-value = (M + 1) / (N + 1).

Adaptive Confounding adjustment: One challenge in mediation tests in genomic studies is how to adjust for unmeasured confounding variables in the cis- and trans-gene (i.e., mediator-outcome) relationship. The function adaptively selects the variables to adjust for in each mediation trio, given a large pool of constructed or real potential confounding variables. It allows the input of variables known to be potential cis- and trans-gene (mediator-outcome) confounders in all mediation tests (known.conf), and the input of a pool of candidate confounders from which potential confounders for each mediation test are adaptively selected (cov.pool). When no pool is provided (cov.pool = NULL), all the PCs based on the feature profile (fea.dat) are constructed as the potential confounder pool.

Calculating empirical P-values using GPD fitting: Using a fixed number of permutations to calculate empirical P-values has the disadvantage that the smallest empirical P-value that can be obtained is 1/(N + 1), so a smaller P-value requires a larger number of permutations. Therefore, we model the tail of the permutation statistics as a Generalized Pareto Distribution (GPD), which allows smaller empirical P-values to be estimated with fewer permutations.
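The two P-value calculations can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: the permutation statistics are simulated, the choice of the 250 largest statistics as the tail (n.exc), the midpoint threshold, and the helper functions gpd.nll and gpd.surv are hypothetical, and the fit uses a plain maximum-likelihood search with optim().

```r
set.seed(42)

# Hypothetical permutation null: N permutation statistics and one
# observed statistic (larger means more extreme).
N <- 1000
perm.stats <- abs(rnorm(N))
obs.stat <- 3.2

# Fixed permutation scheme: M statistics at least as extreme as observed.
M <- sum(perm.stats >= obs.stat)
p.perm <- (M + 1) / (N + 1)

# GPD tail approximation. Assumed detail: the n.exc largest permutation
# statistics form the tail, with the threshold midway to the next one.
n.exc <- 250
sorted <- sort(perm.stats, decreasing = TRUE)
threshold <- (sorted[n.exc] + sorted[n.exc + 1]) / 2
exceed <- sorted[seq_len(n.exc)] - threshold  # positive exceedances

# Negative log-likelihood of a GPD with scale sigma > 0 and shape xi.
gpd.nll <- function(par, z) {
  sigma <- exp(par[1])  # log-parameterised so that sigma stays positive
  xi <- par[2]
  if (abs(xi) < 1e-8) {
    return(length(z) * log(sigma) + sum(z) / sigma)  # exponential limit
  }
  u <- 1 + xi * z / sigma
  if (any(u <= 0)) return(1e10)  # parameters outside the support
  length(z) * log(sigma) + (1 / xi + 1) * sum(log(u))
}

fit <- optim(c(log(mean(exceed)), 0.1), gpd.nll, z = exceed)
sigma.hat <- exp(fit$par[1])
xi.hat <- fit$par[2]

# Survival function of the fitted GPD.
gpd.surv <- function(z, sigma, xi) {
  if (abs(xi) < 1e-8) return(exp(-z / sigma))
  max(0, 1 + xi * z / sigma)^(-1 / xi)
}

# Tail fraction times the GPD probability of exceeding the observed value.
p.gpd <- (n.exc / N) * gpd.surv(obs.stat - threshold, sigma.hat, xi.hat)
```

Here p.perm can never fall below 1/(N + 1), whereas p.gpd extrapolates into the tail and can be smaller. With a negative fitted shape the GPD has a finite upper bound, so p.gpd can be exactly zero when the observed statistic lies beyond it.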

Value

The algorithm will return a list of empirical.p, empirical.p.gpd, nominal.p, beta, std.error, t_stat, beta.total, beta.change.

empirical.p

The mediation empirical P-values obtained from nperm permutations. A matrix with dimension of the number of trios.

empirical.p.gpd

The mediation empirical P-values obtained from nperm permutations using the GPD fit. A matrix with dimension of the number of trios.

nominal.p

The mediation nominal P-values. A matrix with dimension of the number of trios.

std.error

The std.error value of feature1 returned from the fitted linear models. A matrix with dimension of the number of trios.

t_stat

The t_stat value of feature1 returned from the fitted linear models. A matrix with dimension of the number of trios.

beta

The beta value of feature2 returned from the linear models fitted with feature1 included. A matrix with dimension of the number of trios.

beta.total

The beta value of feature2 returned from the linear models fitted without feature1. A matrix with dimension of the number of trios.

beta.change

The proportions mediated. A matrix with dimension of the number of trios.

pc.matrix

PCs will be returned if the PCs based on feature data are used as the pool of potential confounders. Each column is a PC.

sel.conf.ind

An indicator matrix with dimension of the number of trios by the number of covariates in cov.pool, or in pc.matrix if the principal components (PCs) based on feature data are used as the pool of potential confounders.
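The proportion mediated can be illustrated with two linear models on simulated data. This sketch assumes the proportion is the relative reduction in the eQTL's trans-effect once the cis-gene is included in the model, following the wording in the Description; the exact formula used by the function is not stated here, and the variables snp, conf, cis, trans, fit.total, fit.direct and beta.direct are hypothetical.

```r
set.seed(7)
n <- 500
snp   <- rbinom(n, size = 2, prob = 0.3)    # eQTL genotype
conf  <- rnorm(n)                           # one known confounder
cis   <- 1.0 * snp + 0.5 * conf + rnorm(n)  # cis-gene (mediator)
trans <- 1.0 * cis + 0.5 * conf + rnorm(n)  # trans-gene (outcome)

# Total trans-effect of the eQTL, without the cis-gene in the model.
fit.total <- lm(trans ~ snp + conf)
beta.total <- coef(fit.total)["snp"]

# Remaining trans-effect after accounting for the cis-gene.
fit.direct <- lm(trans ~ snp + cis + conf)
beta.direct <- coef(fit.direct)["snp"]

# Assumed definition: relative reduction in the trans-effect.
beta.change <- unname((beta.total - beta.direct) / beta.total)
```

Because the simulated trans-effect runs entirely through the cis-gene, beta.direct is close to zero and beta.change is close to one.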

References

Yang F, Wang J, GTEx Consortium, Pierce BL, Chen LS. (2017) Identifying cis-mediators for trans-eQTLs across many human tissues using genomic mediation analysis. Genome Research. 27:1859–1871. doi: 10.1101/gr.216754.116

Knijnenburg TA, Wessels LFA, Reinders MJT, Shmulevich I. (2009) Fewer permutations, more accurate P-values. Bioinformatics. 25:i161–i168. doi: 10.1093/bioinformatics/btp211

Examples


output <- gmfp.ac.gpd(known.conf = dat$known.conf, fea.dat = dat$fea.dat,
                      snp.dat = dat$snp.dat, trios.idx = dat$trios.idx[1:10,], nperm = 100)

## Not run: 
  ## generate a cluster with 2 nodes for parallel computing
  library(parallel)  # provides makeCluster() and stopCluster()
  cl <- makeCluster(2)

  ## Use the specified pool of candidate confounding variables.
  ## When the empirical P-value is less than 0.02, a more accurate
  ## empirical P-value is estimated using the GPD fit.
  output <- gmfp.ac.gpd(known.conf = dat$known.conf, fea.dat = dat$fea.dat,
                        snp.dat = dat$snp.dat, trios.idx = dat$trios.idx[1:10,],
                        cl = cl, cov.pool = dat$cov.pool, nperm = 100, gpd.perm = 0.02)

  stopCluster(cl)

## End(Not run)


QidiPeng/eQTLMAPT documentation built on Jan. 25, 2023, 11:03 p.m.