estMemSNPs.oneSetHyperPara: Estimate SNP cluster membership

In ubcxzhang/GWASbyCluster: Identifying Significant SNPs in Genome Wide Association Studies (GWAS) via Clustering


View source: R/estMemSNPs_v10.R

Estimate SNP cluster membership. Only update cluster mixture proportions. Assume all 3 clusters have the same set of hyperparameters.

estMemSNPs.oneSetHyperPara(es, 
           var.memSubjs = "memSubjs", 
           eps = 1.0e-3,
           MaxIter = 50, 
           bVec = rep(3, 3), 
           pvalAdjMethod = "none", 
           method = "FDR",
           fdr = 0.05,
           verbose = FALSE)

`es`	An ExpressionSet object storing SNP genotype data. It contains 3 matrices. The first matrix, which can be extracted by the `exprs` method (e.g., `exprs(es)`), stores genotype data, where rows are SNPs and columns are subjects. The second matrix, which can be extracted by the `pData` method (e.g., `pData(es)`), stores phenotype data describing subjects. Rows are subjects, and columns are phenotype variables. The third matrix, which can be extracted by the `fData` method (e.g., `fData(es)`), stores feature data describing SNPs. Rows are SNPs and columns are feature variables.
`var.memSubjs`	character. The name of the phenotype variable indicating each subject's case-control status. It must take only two values: 1 indicating case and 0 indicating control.
`eps`	numeric. A small positive number used as the convergence threshold of the EM algorithm.
`MaxIter`	integer. A positive integer indicating the maximum number of iterations of the EM algorithm.
`bVec`	numeric. A vector of 3 elements, one per cluster (the default is `rep(3, 3)`). Indicates the parameters of the symmetric Dirichlet prior for the cluster mixture proportions.
`pvalAdjMethod`	character. Indicates the p-value adjustment method. See `p.adjust`.
`method`	character. The method used to obtain SNP cluster membership from the responsibility matrix. The default value is “FDR”. The other possible value is “max”. See Details.
`fdr`	numeric. A small positive FDR threshold used to call SNP cluster membership.
`verbose`	logical. Indicating if intermediate and final results should be output.

We characterize the distribution of genotypes of SNPs by a mixture of 3 Bayesian hierarchical models. The 3 Bayesian hierarchical models correspond to 3 clusters of SNPs.

In cluster +, the minor allele frequency (MAF) θ_{x+} of cases is greater than the MAF θ_{y+} of controls.

In cluster 0, the MAF θ_{0} of cases is equal to the MAF of controls.

In cluster -, the MAF θ_{x-} of cases is smaller than the MAF θ_{y-} of controls.

The proportions of the 3 clusters of SNPs are π_{+}, π_{0}, and π_{-}, respectively.

We assume a “half-flat shape” bivariate prior for the MAF in cluster +

2h(θ_{x+})h(θ_{y+}) I(θ_{x+}>θ_{y+}),

where I(a) is the indicator function taking value 1 if the event a is true, and value 0 otherwise. The function h is the probability density function of the beta distribution Beta(α, β).

We assume θ_{0} has the beta prior Beta(α, β).

We also assume a “half-flat shape” bivariate prior for the MAF in cluster -

2h(θ_{x-})h(θ_{y-}) I(θ_{x-}<θ_{y-}).
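
As a rough illustration of the “half-flat shape” prior, note that the density 2h(θ_x)h(θ_y)I(θ_x>θ_y) is the joint density of the larger and the smaller of two independent Beta(α, β) draws. The sketch below uses that fact to simulate MAF pairs. The helper name `r_half_flat` and the values α = 2, β = 5 are assumptions made for this example and are not part of the package.

```r
# Draw n pairs (theta_x, theta_y) from the density
# 2 * h(theta_x) * h(theta_y) * I(theta_x > theta_y),
# where h is the Beta(alpha, beta) probability density function.
# Sorting two independent Beta draws yields exactly this joint density.
r_half_flat <- function(n, alpha, beta) {
  u <- rbeta(n, alpha, beta)
  v <- rbeta(n, alpha, beta)
  cbind(theta_x = pmax(u, v), theta_y = pmin(u, v))
}

set.seed(1)
# alpha = 2 and beta = 5 are illustrative values only
plus <- r_half_flat(1000, alpha = 2, beta = 5)  # cluster +: MAF of cases > MAF of controls

# cluster -: same construction with the roles of cases and controls swapped
minus <- plus[, c("theta_y", "theta_x")]
colnames(minus) <- c("theta_x", "theta_y")

print(head(plus, 3))
print(head(minus, 3))
```

Every row of `plus` has `theta_x` at least as large as `theta_y`, and every row of `minus` has the opposite ordering, matching the definitions of clusters + and -.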

Given a SNP, we assume Hardy-Weinberg equilibrium holds for its genotypes. That is, given MAF θ, the probabilities of genotypes are

Pr(geno=2) = θ^2

Pr(geno=1) = 2θ(1-θ)

Pr(geno=0) = (1-θ)^2

We also assume that the genotypes 0 (wild-type), 1 (heterozygote), and 2 (mutation) follow a multinomial distribution Multinomial{1, [θ^2, 2θ(1-θ), (1-θ)^2]}, where the probabilities are listed for genotypes 2, 1, and 0, respectively.
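
The Hardy-Weinberg probabilities above can be checked numerically with a few lines of R. This is only a sketch: the helper names `hwe_probs` and `sim_genotypes` and the value θ = 0.2 are invented for illustration and do not come from the package.

```r
# Genotype probabilities under Hardy-Weinberg equilibrium for MAF theta,
# named by genotype code (the number of minor alleles).
hwe_probs <- function(theta) {
  c("0" = (1 - theta)^2, "1" = 2 * theta * (1 - theta), "2" = theta^2)
}

# Simulate genotypes of n subjects for one SNP:
# one multinomial draw of size 1 per subject.
sim_genotypes <- function(n, theta) {
  sample(0:2, size = n, replace = TRUE, prob = hwe_probs(theta))
}

p <- hwe_probs(0.2)
print(p)

set.seed(1)
geno <- sim_genotypes(500, theta = 0.2)
print(table(geno))
```

For θ = 0.2 the three probabilities are 0.64, 0.32, and 0.04 for genotypes 0, 1, and 2, and they sum to 1.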

For each SNP, we calculate the posterior probability that it belongs to cluster k, for each of the 3 clusters. This forms a matrix with 3 columns, in which rows are SNPs. The 1st column is the posterior probability that the SNP belongs to cluster -, the 2nd column is the posterior probability that it belongs to cluster 0, and the 3rd column is the posterior probability that it belongs to cluster +. We call this posterior probability matrix the responsibility matrix. To determine which cluster a SNP eventually belongs to, we can use 2 methods. The first (the default) is the “FDR” method, which uses an FDR criterion to determine SNP cluster membership. The second is the “max” method, which assigns each SNP to the cluster with the maximum posterior probability.
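
The maximum posterior probability rule can be sketched on a toy responsibility matrix as follows. The matrix values and the column labels "-", "0", and "+" are made up for this example, and the labels the package actually returns in `memSNPs` may differ. The FDR rule is not sketched here because its exact criterion is not spelled out on this page.

```r
# Toy responsibility matrix: rows are SNPs; columns are clusters -, 0, +.
# Each row sums to 1. The values are invented for illustration.
wMat <- rbind(
  c(0.70, 0.25, 0.05),
  c(0.10, 0.85, 0.05),
  c(0.02, 0.08, 0.90)
)
colnames(wMat) <- c("-", "0", "+")  # hypothetical cluster labels

# "max" rule: assign each SNP to the cluster with the largest
# posterior probability (ties go to the first such column).
mem_max <- colnames(wMat)[max.col(wMat, ties.method = "first")]
print(mem_max)
```

Here the three toy SNPs are assigned to clusters -, 0, and +, respectively.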

A list of 10 elements

`wMat`	matrix of posterior probabilities. The rows are SNPs. There are 3 columns. The first column is the posterior probability that a SNP belongs to cluster - given genotypes of subjects. The second column is the posterior probability that a SNP belongs to cluster 0 given genotypes of subjects. The third column is the posterior probability that a SNP belongs to cluster + given genotypes of subjects.
`memSNPs`	a vector of SNP cluster membership for the 3-cluster partition from the mixture of 3 Bayesian hierarchical models.
`memSNPs2`	a vector of binary SNP cluster membership. 1 indicates the SNP has different MAFs between cases and controls. 0 indicates the SNP has the same MAF in cases as that in controls.
`piVec`	a vector of cluster mixture proportions.
`alpha`	the first shape parameter of the beta prior for MAF, obtained from initial 3-cluster partitions based on GWAS.
`beta`	the second shape parameter of the beta prior for MAF, obtained from initial 3-cluster partitions based on GWAS.
`loop`	number of iterations in the EM algorithm.
`diff`	sum of the squared differences of cluster mixture proportions between the current iteration and the previous iteration in the EM algorithm. If `diff < eps`, we claim the EM algorithm converges.
`res.limma`	object returned by limma
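
To make the roles of `piVec`, `loop`, `diff`, `eps`, `MaxIter`, and `bVec` concrete, here is a minimal sketch of an EM loop that updates only the mixture proportions. It is not the package's implementation: the function `em_pi` is hypothetical, the input `lik` (an n x 3 matrix of per-SNP marginal likelihoods under each cluster) is assumed to be precomputed, and the M-step shown is one standard MAP update under a Dirichlet prior, which may differ from what the package does. A real implementation would likely work on the log scale to avoid underflow.

```r
# lik:  n x 3 matrix; lik[i, k] is the marginal likelihood of SNP i
#       under cluster k (hypothetical, precomputed input).
# bVec: parameters of the Dirichlet prior on the mixture proportions.
em_pi <- function(lik, bVec = rep(3, 3), eps = 1e-3, MaxIter = 50) {
  K <- ncol(lik)
  piVec <- rep(1 / K, K)  # start from equal proportions
  w <- NULL
  diff <- Inf
  loop <- 0
  while (loop < MaxIter && diff >= eps) {
    loop <- loop + 1
    # E-step: responsibilities w[i, k] proportional to piVec[k] * lik[i, k]
    w <- sweep(lik, 2, piVec, `*`)
    w <- w / rowSums(w)
    # M-step: MAP update of the proportions under the Dirichlet(bVec) prior
    piNew <- (colSums(w) + bVec - 1) / (nrow(lik) + sum(bVec) - K)
    # convergence measure: sum of squared changes in the proportions
    diff <- sum((piNew - piVec)^2)
    piVec <- piNew
  }
  # wMat holds the responsibilities from the last E-step
  list(piVec = piVec, wMat = w, loop = loop, diff = diff)
}

set.seed(1)
lik <- matrix(runif(300), ncol = 3)  # toy likelihoods, for illustration only
fit <- em_pi(lik)
print(fit$piVec)
print(c(loop = fit$loop, diff = fit$diff))
```

The loop stops either when `diff < eps` or after `MaxIter` iterations, which mirrors how `eps`, `MaxIter`, `loop`, and `diff` are described above.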

Yan Xu <yanxu@uvic.ca>, Li Xing <sfulxing@gmail.com>, Jessica Su <rejas@channing.harvard.edu>, Xuekui Zhang <xuekui@uvic.ca>, Weiliang Qiu <Weiliang.Qiu@gmail.com>

Xu Y, Xing L, Su J, Zhang X, Qiu W. Model-based clustering for identifying disease-associated SNPs in case-control genome-wide association studies. Scientific Reports 9, Article number: 13686 (2019). https://www.nature.com/articles/s41598-019-50229-6

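library(GWASbyCluster)

# load the simulated ExpressionSet and inspect its SNP feature data
data(esSimDiffPriors)
print(esSimDiffPriors)
fDat = fData(esSimDiffPriors)
print(fDat[1:2,])
# SNP cluster membership recorded in the feature data
print(table(fDat$memGenes))

# estimate SNP cluster membership, updating only the mixture proportions
res = estMemSNPs.oneSetHyperPara(
  es = esSimDiffPriors, 
  var.memSubjs = "memSubjs")

# cross-tabulate recorded versus estimated cluster membership
print(table(fDat$memGenes, res$memSNPs))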

ubcxzhang/GWASbyCluster documentation built on Nov. 5, 2019, 11:03 a.m.
