Home

/

CRAN

/

powerEQTL

/

powerEQTL.scRNAseq.sim: Power Calculation for Association Between Genotype and Gene...

powerEQTL.scRNAseq.sim: Power Calculation for Association Between Genotype and Gene...
In powerEQTL: Power and Sample Size Calculation for Bulk Tissue and Single-Cell eQTL Analysis

View source: R/powerEQTL.scRNAseq.sim.R

powerEQTL.scRNAseq.sim

R Documentation

Power Calculation for Association Between Genotype and Gene Expression Based on Single Cell RNAseq Data via Simulation from ZINB Mixed Effects Regression Model

Description

Power calculation for association between genotype and gene expression based on single cell RNAseq data via simulation from ZINB mixed effects regression model. This function can be used to calculate one of the 4 parameters (power, sample size, minimum detectable slope, and minimum allowable MAF) by setting the corresponding parameter as NULL and providing values for the other 3 parameters.

Usage

powerEQTL.scRNAseq.sim(slope, 
		       n, 
		       m, 
                       power = NULL,
		       m.int = -1, 
		       sigma.int = 1, 
		       zero.p = 0.1, 
		       theta = 1, 
		       MAF = 0.2, 
		       FWER = 0.05, 
		       nTests = 1, 
		       nSim = 1000, 
		       estMethod = "GLMMadaptive", 
		       nCores = 1,
                       n.lower = 2.01,
                       n.upper = 1e+4,
                       slope.lower =  1e-6,
                       slope.upper = log(1.0e+6),
                       MAF.lower = 0.05,
                       MAF.upper = 0.49
)

Arguments

`slope`	numeric. Slope (see details).
`n`	integer. Total number of subjects.
`m`	integer. Number of cells per subject.
`power`	numeric. Power for testing if the slope is equal to zero.
`m.int`	numeric. Mean of random intercept (see details).
`sigma.int`	numeric. Standard deviation of random intercept (see details).
`zero.p`	numeric. Probability that an excess zero occurs.
`theta`	numeric. Dispersion parameter of negative binomial distribution. The smaller `theta` is, the larger variance of NB random variable is.
`MAF`	numeric. Minor allele frequency of the SNP.
`FWER`	numeric. Family-wise Type I error rate.
`nTests`	integer. Number of tests (i.e., number of all (SNP, gene) pairs) in eQTL analysis.
`nSim`	integer. Number of simulated datasets to be generated.
`estMethod`	character. Indicates which method would be used to fit zero inflated negative binomial mixed effects model. Currently, the possible choice is “GLMMadaptive”.
`nCores`	integer. Number of computer cores used by `mclapply` for parallel computing. For Windows, `nCores=1`.
`n.lower`	numeric. Lower bound of the total number of subjects. Only used when calculating total number of subjects.
`n.upper`	numeric. Upper bound of the total number of subjects. Only used when calculating total number of subjects.
`slope.lower`	numeric. Lower bound of the slope. Only used when calculating minimum slope.
`slope.upper`	numeric. Upper bound of the slope Only used when calculating minimum slope.
`MAF.lower`	numeric. Lower bound of the MAF. Only used when calculating minimum MAF.
`MAF.upper`	numeric. Upper bound of the MAF Only used when calculating minimum MAF.

Details

This function calculates the power for testing if genotypes of a SNP is associated with the expression of a gene via nSim datasets generated from zero-inflated negative binomial (ZINB) regression model.

Each dataset is generated from zero-inflated negative binomial mixed effects regression model with only one covariate: genotype. That is, the read counts of a gene follows a mixture of 2-component distributions. One component takes only one value: zero. The other component is negative binomial distribution, which takes non-negative values 0, 1, 2, .... The log mean of the negative binomial distribution is linear function of the genotype.

For each dataset, the p-value for testing if the slope for genotype is equal to zero will be calculated.

The proportion of p-values < \alpha is the estimated power, where \alpha = FWER/nTests.

Each simulated dataset contains gene expression levels of one gene and genotypes of one SNP for subjects with multiple cells. The gene expression levels (read counts) follow zero-inflated negative binomial distribution. Denote Y_{ij} as the read counts for the j-th cell of the i-th subject, i=1,\ldots, n, j=1,\ldots, m, n is the number of subjects, and m is the number of cells per subject. Denote p as the probability that Y_{ij}=0 is an excess zero. With probability 1-p, Y_{ij} follows a negative binomial distribution NB(\mu, \theta), where \mu is the mean (i.e., \mu=E(Y_{ij})) and \theta is the dispersion parameter. The variance of the NB distribution is \mu + \mu^2/\theta. The relationship between gene expression and genotype for the i-th subject is characterized by the equation

\mu_i = \exp(\beta_{0i} + \beta_{1} x_i),

where \beta_{0i} is the random intercept following a normal distribution N(\beta_0, \sigma^2) to account for within-subject correlation of gene expression, \beta_0 is the mean of the random intercept, \sigma is the standard deviation of the random intercept, \beta_1 is the slope, and x_i is the additive-coded genotype for the SNP with minor allele frequency MAF.

We assume that the SNP satisfies the Hardy-Weinberg Equilibrium. That is, the probabilities of the 3 genotypes (0, 1, 2) are (1-MAF)^2, 2 MAF (1-MAF), MAF^2, respectively.

For simplicity, we assume that excess zeros are caused by technical issues, hence are not related to genotypes.

Value

power if the input parameter power = NULL.

sample size (total number of subjects) if the input parameter n = NULL;

minimum detectable slope if the input parameter slope = NULL;

minimum allowable MAF if the input parameter MAF = NULL.

Note

The speed of simulation approach is slow. It is recommended to run the function powerEQTL.scRNAseq.sim in parallel computing environment (e.g., Unix/Linux) and set nCores > 1. Also, it is important to set appropriate ranges to search sample size, minimum detectable slope, or minimum allowable MAF. If the function (e.g., GLMMadaptive) to fit data returns an error message for a simulated dataset, then the function powerEQTL.scRNAseq.sim will try a few more runs with different simulated datasets. If all the runs failed, zero (for power estimation) or NA or NaN (for estimating slope, sample size, or MAF) would be output. It is recommended to set nSim >= 1000 to get stable results.

Author(s)

Xianjun Dong <XDONG@rics.bwh.harvard.edu>, Xiaoqi Li<xli85@bwh.harvard.edu>, Tzuu-Wang Chang <Chang.Tzuu-Wang@mgh.harvard.edu>, Scott T. Weiss <restw@channing.harvard.edu>, Weiliang Qiu <weiliang.qiu@gmail.com>

References

Dong X, Li X, Chang T-W, Scherzer CR, Weiss ST, and Qiu W. powerEQTL: An R package and shiny application for sample size and power calculation of bulk tissue and single-cell eQTL analysis. Bioinformatics, 2021;, btab385

Examples


nSubj = 10
nCellPerSubj = 10

# calculate power
power = powerEQTL.scRNAseq.sim(
  slope = 1.62, # slope
  n = nSubj, # total number of subjects
  m = nCellPerSubj, # number of cells per subject 
  power = NULL, # power to be estimated
  m.int = -1, # mean of random intercept
  sigma.int = 1, # SD of the random intercept
  zero.p = 0.01, # probability that an excess zero occurs
  theta = 1, # dispersion parameter of NB distribution NB(mu, theta)
  MAF = 0.45, 
  FWER = 0.05,
  nTests = 1,
  nSim  =  5, # number of simulations
  estMethod = "GLMMadaptive", # parameter estimation method for ZINB
  nCores = 1 # number of computer cores used by 'mclapply'
)

print(power)

powerEQTL documentation built on June 8, 2025, 10:37 a.m.

powerEQTL index

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

powerEQTL
Power and Sample Size Calculation for Bulk Tissue and Single-Cell eQTL Analysis

powerEQTL.scRNAseq.sim: Power Calculation for Association Between Genotype and Gene...
In powerEQTL: Power and Sample Size Calculation for Bulk Tissue and Single-Cell eQTL Analysis

Power Calculation for Association Between Genotype and Gene Expression Based on Single Cell RNAseq Data via Simulation from ZINB Mixed Effects Regression Model

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

Examples

Related to powerEQTL.scRNAseq.sim in powerEQTL...

R Package Documentation

Browse R Packages

We want your feedback!

powerEQTL Power and Sample Size Calculation for Bulk Tissue and Single-Cell eQTL Analysis

powerEQTL.scRNAseq.sim: Power Calculation for Association Between Genotype and Gene... In powerEQTL: Power and Sample Size Calculation for Bulk Tissue and Single-Cell eQTL Analysis

Power Calculation for Association Between Genotype and Gene Expression Based on Single Cell RNAseq Data via Simulation from ZINB Mixed Effects Regression Model

Description

Usage

Arguments

Details

Value

Note

Author(s)

References

Examples

Related to powerEQTL.scRNAseq.sim in powerEQTL...

R Package Documentation

Browse R Packages

We want your feedback!

powerEQTL
Power and Sample Size Calculation for Bulk Tissue and Single-Cell eQTL Analysis

powerEQTL.scRNAseq.sim: Power Calculation for Association Between Genotype and Gene...
In powerEQTL: Power and Sample Size Calculation for Bulk Tissue and Single-Cell eQTL Analysis