Description Usage Arguments Details Value Author(s) References Examples
View source: R/simDat_scRNAseq_eQTL.R
Generate gene expression levels of one gene and genotypes of one SNP for subjects with multiple cells based on ZINB mixed effects regression model.
1 2 3 4 5 6 7 8 | simDat.eQTL.scRNAseq(nSubj = 50,
nCellPerSubj = 100,
zero.p = 0.01,
m.int = 0,
sigma.int = 1,
slope = 1,
theta = 1,
MAF = 0.45)
|
nSubj |
integer. Total number of subjects. |
nCellPerSubj |
integer. Number of cells per subject. |
zero.p |
numeric. Probability that an excess zero occurs. |
m.int |
numeric. Mean of random intercept (see details). |
sigma.int |
numeric. Standard deviation of random intercept (see details). |
slope |
numeric. Slope (see details). |
theta |
numeric. dispersion parameter of negative binomial distribution.
The smaller |
MAF |
numeric. Minor allele frequency of the SNP. |
This function simulates gene expression levels of one gene and genotypes of one SNP for subjects with multiple cells based on zero-inflated negative binomial (ZINB) regression model with only one covariate: genotype. That is, the read counts of a gene follows a mixture of 2-component distributions. One component takes only one value: zero. The other component is negative binomial distribution, which takes non-negative values 0, 1, 2, .... The log mean of the negative binomial distribution is linear function of the genotype.
Denote Y_{ij} as the read counts for the j-th cell of the i-th subject, i=1,…, n, j=1,…, m, n is the number of subjects, and m is the number of cells per subject.
Denote p as the probability that Y_{ij}=0 is an excess zero. With probability 1-p, Y_{ij} follows a negative binomial distribution NB(μ, θ), where μ is the mean (i.e., μ=E(Y_{ij})) and θ is the dispersion parameter. The variance of the NB distribution is μ + μ^2/θ. The relationship between gene expression and genotype for the i-th subject is characterized by the equation
μ_i = \exp(β_{0i} + β_{1} x_i),
where β_{0i} is the random intercept following a normal distribution N(β_0, σ^2) to account for within-subject correlation of gene expression, β_0 is the mean of the random intercept, σ is the standard deviation of the random intercept, β_1 is the slope, and x_i is the additive-coded genotype for the SNP with minor allele frequency MAF.
We assume that the SNP satisfies the Hardy-Weinberg Equilibrium. That is, the probabilities of the 3 genotypes (0, 1, 2) are (1-MAF)^2, 2 MAF (1-MAF), MAF^2, respectively.
For simplicity, we assume that excess zeros are caused by technical issues, hence are not related to genotypes.
A data frame with 3 columns:
id |
subject id |
geno |
additive-coded genotype of the SNP |
counts |
gene expression of the gene |
Xianjun Dong <XDONG@rics.bwh.harvard.edu>, Xiaoqi Li<xli85@bwh.harvard.edu>, Tzuu-Wang Chang <Chang.Tzuu-Wang@mgh.harvard.edu>, Scott T. Weiss <restw@channing.harvard.edu>, Weiliang Qiu <weiliang.qiu@gmail.com>
Dong X, Li X, Chang T-W, Scherzer CR, Weiss ST, and Qiu W. powerEQTL: An R package and shiny application for sample size and power calculation of bulk tissue and single-cell eQTL analysis. Bioinformatics, 2021;, btab385
1 2 3 4 5 6 7 8 9 10 |
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.