esSim: An ExpressionSet Object Storing Simulated Genotype Data

Description

An ExpressionSet object storing simulated genotype data. The minor allele frequency (MAF) of cases has the same prior as that of controls.

Usage

data("esSim")

Details

In this simulation, we generate additive-coded genotypes for 3 clusters of SNPs based on a mixture of 3 Bayesian hierarchical models.

In cluster +, the minor allele frequency (MAF) θ_{x+} of cases is greater than the MAF θ_{y+} of controls.

In cluster 0, the MAF θ_{0} of cases is equal to the MAF of controls.

In cluster -, the MAF θ_{x-} of cases is smaller than the MAF θ_{y-} of controls.

The proportions of the 3 clusters of SNPs are π_{+}, π_{0}, and π_{-}, respectively.

We assume a “half-flat shape” bivariate prior for the MAF in cluster +:

2 h_{+}(θ_{x+}) h_{+}(θ_{y+}) I(θ_{x+} > θ_{y+}),

where I(a) is the indicator function taking value 1 if the event a is true and value 0 otherwise. The function h_{+} is the probability density function of the beta distribution Beta(α_{+}, β_{+}).

We assume θ_{0} has the beta prior Beta(α_0, β_0).

We also assume a “half-flat shape” bivariate prior for the MAF in cluster -:

2 h_{-}(θ_{x-}) h_{-}(θ_{y-}) I(θ_{x-} < θ_{y-}),

where the indicator enforces that the MAF of cases is smaller than the MAF of controls. The function h_{-} is the probability density function of the beta distribution Beta(α_{-}, β_{-}).
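One way to draw from a half-flat prior can be sketched as follows. The density 2h(x)h(y)I(x > y) is the joint density of the larger and smaller of two independent draws from h, so sorting two beta draws is sufficient. The helper name `r_half_flat` and its arguments are hypothetical and are not part of GWASbyCluster; this is an illustration under that assumption, not necessarily how esSim was generated.

```r
# Sketch: draw n (case, control) MAF pairs from a half-flat prior.
# r_half_flat is a hypothetical helper, not a GWASbyCluster function.
r_half_flat <- function(n, alpha, beta, cases_higher = TRUE) {
  a <- rbeta(n, alpha, beta)   # first independent Beta(alpha, beta) draw
  b <- rbeta(n, alpha, beta)   # second independent Beta(alpha, beta) draw
  hi <- pmax(a, b)             # element-wise larger draw
  lo <- pmin(a, b)             # element-wise smaller draw
  if (cases_higher) {
    # cluster +: MAF of cases (theta_x) exceeds MAF of controls (theta_y)
    data.frame(theta_x = hi, theta_y = lo)
  } else {
    # cluster -: MAF of cases (theta_x) is below MAF of controls (theta_y)
    data.frame(theta_x = lo, theta_y = hi)
  }
}

set.seed(1)
plus  <- r_half_flat(5, alpha = 2, beta = 5, cases_higher = TRUE)
minus <- r_half_flat(5, alpha = 2, beta = 5, cases_higher = FALSE)
print(plus)
print(minus)
```

Sorting avoids rejection sampling: every pair of draws is used, and the ordering constraint holds by construction.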

Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes. That is, given MAF θ, the probabilities of genotypes are

Pr(geno=2) = θ^2

Pr(geno=1) = 2θ(1-θ)

Pr(geno=0) = (1-θ)^2

We also assume the genotypes 0 (wild-type), 1 (heterozygote), and 2 (mutation) follow a multinomial distribution Multinomial{1, [(1-θ)^2, 2θ(1-θ), θ^2]}, with the probabilities listed in the order of genotypes 0, 1, and 2.
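The Hardy-Weinberg probabilities and the single-trial multinomial draw can be illustrated with a short sketch. The helper names `hwe_probs` and `r_geno` are hypothetical, and the MAF value 0.3 is chosen only for illustration.

```r
# Sketch: genotype probabilities under Hardy-Weinberg equilibrium.
# hwe_probs and r_geno are hypothetical helpers, not GWASbyCluster functions.
hwe_probs <- function(theta) {
  c("0" = (1 - theta)^2,            # wild-type
    "1" = 2 * theta * (1 - theta),  # heterozygote
    "2" = theta^2)                  # mutation
}

# Draw n additive-coded genotypes (0, 1, or 2) for one SNP with MAF theta.
r_geno <- function(n, theta) {
  sample(0:2, size = n, replace = TRUE, prob = hwe_probs(theta))
}

p <- hwe_probs(0.3)
print(p)       # 0.49 0.42 0.09
print(sum(p))  # the three probabilities sum to 1

set.seed(1)
g <- r_geno(100, theta = 0.3)
print(table(g))
```

Because the genotype counts the minor alleles on two chromosomes, `rbinom(n, size = 2, prob = theta)` yields the same distribution.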

We set the number of cases as 100, the number of controls as 100, and the number of SNPs as 1000.

The hyperparameters are α_{+}=2, β_{+}=5, π_{+}=0.1, α_{0}=2, β_{0}=5, π_{0}=0.8, α_{-}=2, β_{-}=5, π_{-}=0.1.

Note that when we generate MAFs from the half-flat shape bivariate priors, we might get very small MAFs or MAFs > 0.5. In these cases, we delete the SNP.

So the final number of SNPs generated might be less than the initially set number of 1000 SNPs.
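The filtering step can be sketched as below, using the hyperparameters stated above. The lower cutoff `maf_min = 0.05` is an assumption made only for illustration, since the text does not say what counts as a "very small" MAF. The sketch is not expected to reproduce the SNP counts of the stored dataset.

```r
# Sketch: assign SNPs to clusters, draw MAFs, and drop out-of-range SNPs.
# maf_min is a hypothetical threshold; the source does not specify one.
set.seed(1)
n_snps  <- 1000
cluster <- sample(c("-", "0", "+"), n_snps, replace = TRUE,
                  prob = c(0.1, 0.8, 0.1))   # pi_-, pi_0, pi_+

# All three clusters share alpha = 2 and beta = 5, so two beta draws per SNP suffice.
a <- rbeta(n_snps, 2, 5)
b <- rbeta(n_snps, 2, 5)

# Cases (theta_x) and controls (theta_y): ordered in clusters + and -, equal in cluster 0.
theta_x <- ifelse(cluster == "+", pmax(a, b),
           ifelse(cluster == "-", pmin(a, b), a))
theta_y <- ifelse(cluster == "+", pmin(a, b),
           ifelse(cluster == "-", pmax(a, b), a))

maf_min <- 0.05  # assumed cutoff for "very small" MAFs
keep <- theta_x >= maf_min & theta_x <= 0.5 &
        theta_y >= maf_min & theta_y <= 0.5

print(sum(keep))             # number of SNPs retained
print(table(cluster[keep]))  # retained SNPs per cluster
```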

For the dataset stored in esSim, there are 872 SNPs. 83 SNPs are in cluster -, 714 SNPs are in cluster 0, and 75 SNPs are in cluster +.

References

Yan X, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019) https://www.nature.com/articles/s41598-019-50229-6.

Examples

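library(Biobase)        # provides pData() and fData()
library(GWASbyCluster)  # provides the esSim dataset

data(esSim)
print(esSim)

# subject-level (phenotype) data
pDat <- pData(esSim)
print(pDat[1:2, ])
print(table(pDat$memSubjs))

# SNP-level (feature) data
fDat <- fData(esSim)
print(fDat[1:2, ])
print(table(fDat$memGenes))
print(table(fDat$memGenes2))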

ubcxzhang/GWASbyCluster documentation built on Nov. 5, 2019, 11:03 a.m.