Description Usage Arguments Details Value Note Author(s) References Examples
View source: R/powerEQTL.scRNAseq.sim.R
Power calculation for association between genotype and gene expression based on single cell RNAseq data via simulation from ZINB mixed effects regression model. This function can be used to calculate one of the 4 parameters (power, sample size, minimum detectable slope, and minimum allowable MAF) by setting the corresponding parameter as NULL and providing values for the other 3 parameters.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 | powerEQTL.scRNAseq.sim(slope,
n,
m,
power = NULL,
m.int = -1,
sigma.int = 1,
zero.p = 0.1,
theta = 1,
MAF = 0.2,
FWER = 0.05,
nTests = 1,
nSim = 1000,
estMethod = "GLMMadaptive",
nCores = 1,
n.lower = 2.01,
n.upper = 1e+4,
slope.lower = 1e-6,
slope.upper = log(1.0e+6),
MAF.lower = 0.05,
MAF.upper = 0.49
)
|
slope |
numeric. Slope (see details). |
n |
integer. Total number of subjects. |
m |
integer. Number of cells per subject. |
power |
numeric. Power for testing if the slope is equal to zero. |
m.int |
numeric. Mean of random intercept (see details). |
sigma.int |
numeric. Standard deviation of random intercept (see details). |
zero.p |
numeric. Probability that an excess zero occurs. |
theta |
numeric. Dispersion parameter of negative binomial distribution.
The smaller |
MAF |
numeric. Minor allele frequency of the SNP. |
FWER |
numeric. Family-wise Type I error rate. |
nTests |
integer. Number of tests (i.e., number of all (SNP, gene) pairs) in eQTL analysis. |
nSim |
integer. Number of simulated datasets to be generated. |
estMethod |
character. Indicates which method would be used to fit zero inflated negative binomial mixed effects model. Currently, the possible choice is “GLMMadaptive”. |
nCores |
integer. Number of computer cores used by |
n.lower |
numeric. Lower bound of the total number of subjects. Only used when calculating total number of subjects. |
n.upper |
numeric. Upper bound of the total number of subjects. Only used when calculating total number of subjects. |
slope.lower |
numeric. Lower bound of the slope. Only used when calculating minimum slope. |
slope.upper |
numeric. Upper bound of the slope Only used when calculating minimum slope. |
MAF.lower |
numeric. Lower bound of the MAF. Only used when calculating minimum MAF. |
MAF.upper |
numeric. Upper bound of the MAF Only used when calculating minimum MAF. |
This function calculates the power for testing if genotypes of a SNP is associated with the expression of a gene via nSim
datasets generated from zero-inflated negative binomial (ZINB) regression model.
Each dataset is generated from zero-inflated negative binomial mixed effects regression model with only one covariate: genotype. That is, the read counts of a gene follows a mixture of 2-component distributions. One component takes only one value: zero. The other component is negative binomial distribution, which takes non-negative values 0, 1, 2, .... The log mean of the negative binomial distribution is linear function of the genotype.
For each dataset, the p-value for testing if the slope for genotype is equal to zero will be calculated.
The proportion of p-values < α is the estimated power, where α = FWER/nTests.
Each simulated dataset contains gene expression levels of one gene and genotypes of one SNP for subjects with multiple cells. The gene expression levels (read counts) follow zero-inflated negative binomial distribution. Denote Y_{ij} as the read counts for the j-th cell of the i-th subject, i=1,…, n, j=1,…, m, n is the number of subjects, and m is the number of cells per subject. Denote p as the probability that Y_{ij}=0 is an excess zero. With probability 1-p, Y_{ij} follows a negative binomial distribution NB(μ, θ), where μ is the mean (i.e., μ=E(Y_{ij})) and θ is the dispersion parameter. The variance of the NB distribution is μ + μ^2/θ. The relationship between gene expression and genotype for the i-th subject is characterized by the equation
μ_i = \exp(β_{0i} + β_{1} x_i),
where β_{0i} is the random intercept following a normal distribution N(β_0, σ^2) to account for within-subject correlation of gene expression, β_0 is the mean of the random intercept, σ is the standard deviation of the random intercept, β_1 is the slope, and x_i is the additive-coded genotype for the SNP with minor allele frequency MAF.
We assume that the SNP satisfies the Hardy-Weinberg Equilibrium. That is, the probabilities of the 3 genotypes (0, 1, 2) are (1-MAF)^2, 2 MAF (1-MAF), MAF^2, respectively.
For simplicity, we assume that excess zeros are caused by technical issues, hence are not related to genotypes.
power if the input parameter power = NULL
.
sample size (total number of subjects) if the input parameter n = NULL
;
minimum detectable slope if the input parameter slope = NULL
;
minimum allowable MAF if the input parameter MAF = NULL
.
The speed of simulation approach is slow. It is recommended to run the function powerEQTL.scRNAseq.sim
in parallel computing environment (e.g., Unix/Linux) and set nCores > 1
.
Also, it is important to set appropriate ranges to search sample size, minimum detectable slope, or minimum allowable MAF. If the function (e.g., GLMMadaptive) to fit data returns an error message
for a simulated dataset, then the function powerEQTL.scRNAseq.sim
will try a few more runs with different simulated datasets. If all the runs failed, zero (for power estimation) or NA
or NaN
(for estimating slope, sample size, or MAF) would be output. It is recommended to set nSim >= 1000
to get stable results.
Xianjun Dong <XDONG@rics.bwh.harvard.edu>, Xiaoqi Li<xli85@bwh.harvard.edu>, Tzuu-Wang Chang <Chang.Tzuu-Wang@mgh.harvard.edu>, Scott T. Weiss <restw@channing.harvard.edu>, Weiliang Qiu <weiliang.qiu@gmail.com>
Dong X, Li X, Chang T-W, Scherzer CR, Weiss ST, and Qiu W. powerEQTL: An R package and shiny application for sample size and power calculation of bulk tissue and single-cell eQTL analysis. Bioinformatics, 2021;, btab385
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 | nSubj = 10
nCellPerSubj = 10
# calculate power
power = powerEQTL.scRNAseq.sim(
slope = 1.62, # slope
n = nSubj, # total number of subjects
m = nCellPerSubj, # number of cells per subject
power = NULL, # power to be estimated
m.int = -1, # mean of random intercept
sigma.int = 1, # SD of the random intercept
zero.p = 0.01, # probability that an excess zero occurs
theta = 1, # dispersion parameter of NB distribution NB(mu, theta)
MAF = 0.45,
FWER = 0.05,
nTests = 1,
nSim = 5, # number of simulations
estMethod = "GLMMadaptive", # parameter estimation method for ZINB
nCores = 1 # number of computer cores used by 'mclapply'
)
print(power)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.