powerEQTL.ANOVA: Power Calculation for EQTL Analysis Based on Un-Balanced... In powerEQTL: Power and Sample Size Calculation for Bulk Tissue and Single-Cell eQTL Analysis

Description

Power calculation for eQTL analysis that tests if a SNP is associated to a gene probe by using un-balanced one-way ANOVA. This function can be used to calculate one of the 3 parameters (power, sample size, and minimum allowable MAF) by setting the corresponding parameter as NULL and providing values for the other 2 parameters.

Usage

 1 2 3 4 5 6 7 8 9 powerEQTL.ANOVA(MAF, deltaVec=c(-0.13, 0.13), n=200, power = NULL, sigma = 0.13, FWER = 0.05, nTests = 200000, n.lower = 4, n.upper = 1e+30)

Arguments

 MAF numeric. Minor allele frequency. deltaVec numeric. A vector having 2 elements. The first element is equal to mu_2 - mu_1 and the second element is equal to mu_3 - mu_2, where mu_1 is the mean gene expression level for the mutation homozygotes, mu_2 is the mean gene expression level for the heterozygotes, and mu_3 is the mean gene expression level for the wild-type gene expression level. n integer. Total number of subjects. power numeric. Power for testing if 3 genotypes have the same mean gene expression levels. sigma numeric. Standard deviation of the random error. FWER numeric. Family-wise Type I error rate. nTests integer. Number of tests (i.e., number of all (SNP, gene) pairs) in eQTL analysis. n.lower numeric. Lower bound of the total number of subjects. Only used when calculating total number of subjects. n.upper numeric. Upper bound of the total number of subjects. Only used when calculating total number of subjects.

Details

If we would like to test potential non-linear relationship between genotype of a SNP and expression of a gene, we can use un-balanced one-way ANOVA. Actually, an article published by the GTEx Consortium in 2013 used this approach.

Suppose there are k=3 groups of subjects: (1) mutation homozygotes; (2) heterozygotes; and (3) wildtype homozygotes. We would like to test if the mean expression μ_i, i=1, …, k, of the gene is the same among the k groups of subjects. We can use the following one-way ANOVA model to characterize the relationship between observed gene expression level y_{ij} and the population mean expression level μ_i:

y_{ij} = μ_i + ε_{ij}, \quad ε_{ij} \sim N≤ft(0, σ^2\right),

where i=1,…, k, j= 1, …, n_i, y_{ij} is the observed gene expression level for the j-th subject in the i-th group, μ_i is the mean gene expression level of the i-th group, ε_{ij} is the random error, k is the number of groups, n_i is the number of subjects in the i-th group. Denote the total number of subjects as N = ∑_{i=1}^{k} n_i. That is, we have n_1 mutation homozygotes, n_2 heterozygotes, and n_3 wildtype homozygotes.

We would like to test the null hypothesis H_0 and alternative hypothesis H_1:

H_0: μ_1 = μ_2 = μ_3,

H_1: \mbox{not all means are the same}.

According to O'Brien and Muller (1993), the power calculation formula for unbalanced one-way ANOVA is

power = Pr(F >= F(1 - alpha, k - 1, N - k)| F ~ F(k - 1, N - k, lambda)),

where k = 3 is the number of groups of subjects, N is the total number of subjects, F_{1 - alpha}(k - 1, N - k) is the 100 * (1 - alpha)-th percentile of central F distribution with degrees of freedoms k - 1 and N - k, and F_{k - 1, N - k, lambda} is the non-central F distribution with degrees of freedoms k - 1 and N - k and non-central parameter (ncp) lambda. The ncp lambda is equal to

lambda = N * sum(wi * (mu_i - mu)^2, i = 1,.., k)/sigma^2,

where mu_i is the mean gene expression level for the i-th group of subjects, w_i is the weight for the i-th group of subjects, sigma^2 is the variance of the random errors in ANOVA (assuming each group has equal variance), and mu is the weighted mean gene expression level

mu = sum(w_i * mu_i, i = 1, ..., k).

The weights w_i=n_i/N are the sample proportions for the 3 groups of subjects, where N=n_1+n_2+n_3 is the total number of subjects. Hence, sum(w_i, i = 1, 2, 3) = 1. Based on Hardy-Weinberg Equilibrium, we have w_1 = θ^2, w_2 = 2θ(1-θ), and w_3 = (1-θ)^2, where θ is MAF.

Without loss of generality, we set μ_1 = -δ_1, μ_2=0, and μ_3 = δ_2.

We adopted the parameters from the GTEx cohort (see the Power analysis" section of Nature Genetics, 2013; https://www.nature.com/articles/ng.2653), where they modeled the expression data as having a log-normal distribution with a log standard deviation of 0.13 within each genotype class (AA, AB, BB). This level of noise is based on estimates from initial GTEx data. In their power analysis, they assumed the across-genotype difference delta = 0.13 (i.e., equivalent to detecting a log expression change similar to the standard deviation within a single genotype class).

Value

power if the input parameter power = NULL.

sample size (total number of subjects) if the input parameter n = NULL;

minimum detectable slope if the input parameter slope = NULL.

Author(s)

Xianjun Dong <XDONG@rics.bwh.harvard.edu>, Xiaoqi Li<xli85@bwh.harvard.edu>, Tzuu-Wang Chang <Chang.Tzuu-Wang@mgh.harvard.edu>, Scott T. Weiss <restw@channing.harvard.edu>, Weiliang Qiu <weiliang.qiu@gmail.com>

References

The GTEx Consortium. The Genotype-Tissue Expression (GTEx) project. Nature Genetics, 45:580-585, 2013.

O'Brien, RG and Muller, KE. Unified power analysis for t-tests through multivariate hypotheses. In LK Edwards, editor, Applied Analysis of Variance in Behavioral Science, pages 297-344. New York: Dekker, 1993.

Dong X, Li X, Chang T-W, Scherzer CR, Weiss ST, and Qiu W. powerEQTL: An R package and shiny application for sample size and power calculation of bulk tissue and single-cell eQTL analysis. Bioinformatics, 2021;, btab385

Examples

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 # calculate power powerEQTL.ANOVA(MAF = 0.1, deltaVec = c(-0.13, 0.13), n = 282, power = NULL, sigma = 0.13, FWER = 0.05, nTests = 200000) # calculate sample size (total number of subjects) powerEQTL.ANOVA(MAF = 0.1, deltaVec = c(-0.13, 0.13), n = NULL, power = 0.8, sigma = 0.13, FWER = 0.05, nTests = 200000) # calculate minimum allowable MAF powerEQTL.ANOVA(MAF = NULL, deltaVec = c(-0.13, 0.13), n = 282, power = 0.8, sigma = 0.13, FWER = 0.05, nTests = 200000)

powerEQTL documentation built on July 22, 2021, 9:08 a.m.