Description Usage Arguments Details Value Author(s) References Examples
Estimate Bayes and local Bayes false discovery rates (FDRs) from multiple studies, for replicability analysis and for meta-analysis, as presented in Heller and Yekutieli (see reference below).
1 2 3 4 5 6 |
pdf.binned.z |
A 3-dimensional array which contains for each study (first dimension), the probabilities of a z-score to fall in the bin (second dimension), under each hypothesis status (third dimension). The third dimension can be of size 2 or 3, depending on the number of association states: if the association can be either null or non-null (e.g. only in one direction), the dimension is 2; if the association can be either null, or positive, or negative, the dimension is 3.
Element |
binned.z.mat |
A matrix of the bin numbers for each of the z-scores (rows) in each study (columns).
Element |
non.null |
Indicates the desired analysis: |
non.null.rows |
Vector of row indices in |
Pi.previous.result |
An optional Vector of probabilities for each association status. If |
control |
List of control parameters to pass to the EM algorithm. See |
clusters |
Used for performing analysis in each cluster, and than aggregating results together.Default value is |
clusters.ldr.report |
Sets whether local fdr values (available through the function |
clusters.verbose |
if set to |
For N
studies, each examining the same M
features, the binned z-scores and the (estimated) probabilities under the null and non-null states in each study are given as input.
These inputs can be produced from the z-scores using the function ztobins
.
The function calls piem
for the computation of the probabilities for each vector of association status. The number of probabilies estimated is x^N
, where x=2,3
is the number of possible association states in each study.
The function calls ldr
for the computation of the conditional probability of each of the vectors of association status in the null set given the binned z-scores. The null set contains the rows in hconfigs(N,x)
that: are excluded from non.null.rows
if non.null
is user.defined
; that are non-zero if non.null
is meta-analysis
; that contain at most one 1 if non.null
is replication
and x=2
; that contain at most one 1 or one -1 if non.null
is replication
and x=3
.
The local Bayes FDR is estimated to be the sum of conditional probabilities in the null set for each feature. The empirical Bayes FDR is the average of all local Bayes FDRs that are at most the value of the local Bayes FDR for each feature. The list of discoveries at level q are all features with empirical Bayes FDR at most q.
If many studies are available, one may not be able to compute RepFDR directly at the original data. If however, different groups of studies are known to be independent (e.g., if a SNP is non null for studies 1,2 is independent of the SNP being non null in studies 3,4) one may Run RepFDR in each cluster seperatly and then aggregate the results. This is done by providing a vector for the clusters
argument, with an integer value stating the cluster membership for each study.
See the values section below for the results returned from this function, when partitioning the data to clusters.
See vignette('RepFDR') for a complete example, on how to run RepFDR in clusters.
mat |
An |
Pi |
Vector of the estimated probabilities for each of the |
repfdr.mat.percluster |
Returned if |
repfdr.Pi.percluster |
Returned if |
ldr |
Returned if |
Ruth Heller, Shachar Kaufman, Shay Yaacoby, Barak Brill, Daniel Yekutieli.
Heller, R., & Yekutieli, D. (2014). Replicability analysis for genome-wide association studies. The Annals of Applied Statistics, 8(1), 481-498.
Heller, R., Yaacoby, S., & Yekutieli, D. (2014). repfdr: a tool for replicability analysis for genome-wide association studies. Bioinformatics, btu434.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 | #### Example 1: a simulation; each feature in each study has two association states,
#### null and positive, prior is known
#This example generates the Z scores for two studies, with 0.05 probability to have
# non - null signal in each study.
# The prior matrix is being pregenerated to show the optimal values.
# if this matrix was not supplied, the repfdr method would estimate it
# using an EM algorithm. See the next examples for estimating the prior as well using repfdr.
set.seed(1)
n = 2 #two studies
m=10000 # ten thounsand, SNPs
H_Study_1 = rbinom(m,1,prob = 0.05) #signal of 1, for SNPS with association in the first study
H_Study_2 = rbinom(m,1,prob = 0.05) #signal of 1, for SNPS with association in the second study
Zmat = matrix(rnorm(n*m),nrow = m) #generate matrix
#insert signal (mean shift of 3) for the first study
Zmat[which(H_Study_1==1),1] = Zmat[which(H_Study_1==1),1] + 4
#insert signal to the second study
Zmat[which(H_Study_2==1),2] = Zmat[which(H_Study_2==1),2] + 4
#estimate densities via ztobins:
ztobins_res = ztobins(Zmat,n.association.status = 2,plot.diagnostics = FALSE,n.bin= 100)
#writing out the prior explicitly. If this was not supplied,
#the repfdr would try to estimate this prior from the data.
Precomputed_Pi = matrix(NA,ncol = 3,nrow = 4)
Precomputed_Pi[,1] = c(0,1,0,1)
Precomputed_Pi[,2] = c(0,0,1,1)
Precomputed_Pi[,3] = c(0.95^2,0.95*0.05,0.95*0.05,0.05^2)
colnames(Precomputed_Pi) = c('Study 1','Study 2','Pi')
#run repfdr
repfdr_res = repfdr(ztobins_res$pdf.binned.z,
ztobins_res$binned.z.mat,
non.null = 'replication',
Pi.previous.result = Precomputed_Pi)
#The precomputed prior matrix. if this would not
repfdr_res$Pi
#local fdr0 and Fdr for each SNP
head(repfdr_res$mat)
Non_Null = which(H_Study_1 ==1 & H_Study_2 == 1)
Reported = which(repfdr_res$mat[,2] <= 0.05)
TP = length(intersect(Reported, Non_Null))
TP
FP = length(Reported) - TP
FP
FN = length(Non_Null - TP)
FN
#### Example 2: a simulation; each feature in each study has two association states,
#### null and positive, prior is estimated
## Not run:
# a) Replicablity analysis:
data(binned_zmat_sim) # this loads the binned z-scores as well as the (estimated) probabilities
# in each bin for each state
output.rep <- repfdr(pbz_sim, bz_sim, "replication")
BayesFdr.rep <- output.rep$mat[,"Fdr"]
Rej <- (BayesFdr.rep <= 0.05)
sum(Rej)
# which of the tests are true replicability findings? (we know this since the data was simulated)
data(hmat_sim)
true.rep <- apply(hmat_sim,1,function(y){ sum(y==1)>1 })
# Compute the false discovery proportion (FDP) for replicability:
sum(Rej * !true.rep) / sum(true.rep)
# we can use the previously calculated Pi for further computations (e.g meta-analysis):
Pi_sim <- output.rep$Pi
# b) meta-analysis:
output.meta <- repfdr(pbz_sim, bz_sim, "meta-analysis", Pi.previous.result = Pi_sim)
BayesFdr.meta <- output.meta$mat[,"Fdr"]
Rej <- (BayesFdr.meta <= 0.05)
sum(Rej)
# which of the tests are true association findings? (we know this since the data was simulated)
true.assoc <- rowSums(hmat_sim) >= 1
# Compute the false discovery proportion (FDP) for association:
sum(Rej * !true.assoc) / sum(true.assoc)
## End(Not run)
## Not run:
#### Example 3: SNPs data; each SNP in each study has three association states,
#### negative, null, or positive:
# load the bins of the z-scores and their probabilities.
download.file('http://www.math.tau.ac.il/~ruheller/repfdr_RData/binned_zmat.RData',
destfile = "binned_zmat.RData")
load(file = "binned_zmat.RData")
# can also be generated from SNPlocation - see ztobins documentation.
# load the prior probabilities for each association status vector.
data(Pi)
Pi # the proportions vector was computed using piem()
# with the following command: Pi <- piem(pbz, bz)$last.iteration
# a) replicablity analysis:
output.rep <- repfdr(pbz, bz, "replication",Pi.previous.result=Pi)
BayesFdr.rep <- output.rep$mat[,"Fdr"]
Rej <- sum(BayesFdr.rep <= 0.05)
sum(Rej)
# The posterior probabilities for the first five features with Bayes FDR at most 0.05:
post <- ldr(pbz,bz[order(BayesFdr.rep)[1:5],],Pi)
round(post,4)
# posteriors for a subset of the association status vectors can also be reported:
H <- hconfigs( dim(bz)[2], 3)
h.replicability = apply(H, 1, function(y) {sum(y == 1)> 1 | sum(y == -1) >1})
post <- ldr(pbz,bz[order(BayesFdr.rep)[1:5],],Pi,h.vecs= which(h.replicability==1))
round(post,4)
# b) meta-analysis:
output.meta <- repfdr(pbz, bz, "meta-analysis", Pi.previous.result = Pi)
BayesFdr.meta <- output.meta$mat[,"Fdr"]
Rej <- sum(BayesFdr.meta <= 0.05)
sum(Rej)
## End(Not run)
## manhattan plot (ploting can take a while):
# code for manhattan plot by Stephen Turner (see copyrights at the source code manhattan.r)
## Not run:
data(SNPlocations)
par(mfrow=c(2,1))
# Replication
manhattan(dataframe=cbind(SNPlocations,P=BayesFdr.rep),ymax=10.5,pch=20,
limitchromosomes=1:4,suggestiveline=-log(0.05,10),genomewideline=F,cex=0.25,
annotate=SNPlocations$SNP[BayesFdr.rep<=0.05],main="Replication")
# Association
manhattan(dataframe=cbind(SNPlocations,P=BayesFdr.meta),ymax=10.5,cex=0.25,
limitchromosomes=1:4,suggestiveline=-log(0.05,10),genomewideline=F,pch=20,
annotate=SNPlocations$SNP[BayesFdr.rep<=0.05],main="Meta-analysis")
par(mfrow=c(1,1))
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.