# em.hmm: EM algorithm for HMM to estimate LIS statistic In PLIS: Multiplicity control using Pooled LIS statistic

## Description

em.hmm calculates the MLE for a HMM model with hidden states being 0/1. the distribution of observed Z values given state 0 is assumed to be normal and gvien state 1, is assumed to be a normal mixture with L components

## Usage

 `1` ```em.hmm(x, L=2, maxiter = 1000, est.null = FALSE) ```

## Arguments

 `x` the observed Z values `L` the number of components in the non-null mixture, default value=2 `maxiter` the maximum number of iterations, default value=1000 `est.null` logical. If FALSE (the default) set the null distribution as N(0,1), otherwise will estimate the null distribution.

None.

## Value

 `pii` the initial state distribution, pii=(prob. of being 0, prob. of being 1) `A` transition matrix, A=(a00 a01\ a10 a11) `f0` the null distribution `pc` probability weights of each component in the non-null mixture `f1` an L by 2 matrix, specifying the dist. of each component in the non-null mixture `LIS` the LIS statistics `ni` the number of iterations excecuted `logL` log likelihood `BIC` BIC score for the estimated model `converged` Logic, Convergence indicator of the EM procedure

## Author(s)

Wei Z, Sun W, Wang K and Hakonarson H

## References

Multiple Testing in Genome-Wide Association Studies via Hidden Markov Models, Bioinformatics, 2009

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73``` ```##(1) Example for analyzing simulated data grp1.nonNull.loci=c(21:30, 51:60); grp2.nonNull.loci=c(41:60) grp1.theta<-grp2.theta<-rep(0,200) grp1.theta[grp1.nonNull.loci]=2; grp2.theta[grp2.nonNull.loci]=2 grp1.zval=rnorm(n=length(grp1.theta),mean=grp1.theta) grp2.zval=rnorm(n=length(grp2.theta),mean=grp2.theta) ##Group 1 #Use default L=2 grp1.L2rlts=em.hmm(grp1.zval) #Use true value L=1 grp1.L1rlts=em.hmm(grp1.zval,L=1) #Choose L by BIC criteria grp1.Allrlts=sapply(1:3, function(k) em.hmm(grp1.zval,L=k)) BICs=c() for(i in 1:3) { BICs=c(BICs,grp1.Allrlts[[i]]\$BIC) } grp1.BICrlts=grp1.Allrlts[[which(BICs==max(BICs))]] rank(grp1.BICrlts\$LIS)[grp1.nonNull.loci] rank(-abs(grp1.zval))[grp1.nonNull.loci] ##Group 2 grp2.Allrlts=sapply(1:3, function(k) em.hmm(grp2.zval,L=k)) BICs=c() for(i in 1:3) { BICs=c(BICs,grp2.Allrlts[[i]]\$BIC) } grp2.BICrlts=grp2.Allrlts[[which(BICs==max(BICs))]] rank(grp2.BICrlts\$LIS)[grp2.nonNull.loci] rank(-abs(grp2.zval))[grp2.nonNull.loci] ##PLIS: control global FDR states=plis(c(grp1.BICrlts\$LIS,grp2.BICrlts\$LIS),fdr=0.1,adjust=FALSE)\$States #0 accept; 1 reject under fdr level 0.1 ##(2) Example for analyzing Genome-Wide Association Studies (GWAS) data #Information in GWAS.SampleData can be obtained by using PLINK #http://pngu.mgh.harvard.edu/~purcell/plink/ #not running #please uncomment to run # #data(GWAS.SampleData) # #chr1.data=GWAS.SampleData[which(GWAS.SampleData[,"CHR"]==1),] #chr6.data=GWAS.SampleData[which(GWAS.SampleData[,"CHR"]==6),] # ##Make sure SNPs in the linear physical order #chr1.data<-chr1.data[order(chr1.data[,"BP"]),] #chr6.data<-chr6.data[order(chr6.data[,"BP"]),] # ##convert p values by chi_sq test to z values; odds ratio (OR) is needed. #chr1.zval<-rep(0, nrow(chr1.data)) #chr1.ors=(chr1.data[,"OR"]>1) #chr1.zval[chr1.ors]<-qnorm(chr1.data[chr1.ors, "P"]/2, 0, 1, lower.tail=FALSE) #chr1.zval[!chr1.ors]<-qnorm(chr1.data[!chr1.ors, "P"]/2, 0, 1, lower.tail=TRUE) #chr1.L2rlts=em.hmm(chr1.zval) # #chr6.zval<-rep(0, nrow(chr6.data)) #chr6.ors=(chr6.data[,"OR"]>1) #chr6.zval[chr6.ors]<-qnorm(chr6.data[chr6.ors, "P"]/2, 0, 1, lower.tail=FALSE) #chr6.zval[!chr6.ors]<-qnorm(chr6.data[!chr6.ors, "P"]/2, 0, 1, lower.tail=TRUE) #chr6.L2rlts=em.hmm(chr6.zval) # ##Note that for analyzing a chromosome in real GWAS dataset, em.hmm can take as long as 10+ hrs ##L=2 or 3 is recommended for GWAS based on our experience ##em.hmm can be run in parallel for different chromosomes before applying the PLIS procedure #plis.rlts=plis(c(chr1.L2rlts\$LIS,chr6.L2rlts\$LIS),fdr=0.01) #all.Rlts=cbind(rbind(chr1.data,chr6.data), LIS=c(chr1.L2rlts\$LIS,chr6.L2rlts\$LIS), gFDR=plis.rlts\$aLIS, fdr001state=plis.rlts\$States) #all.Rlts[order(all.Rlts[,"LIS"])[1:10],] ```