mixtwice: Large-scale hypothesis testing by variance mixing
In wiscstatman/MixTwice: MixTwice--a package for large-scale hypothesis testing

View source: R/mixtwice.R

mixtwice

R Documentation

Large-scale hypothesis testing by variance mixing

Description

MixTwice deploys large-scale hypothesis testing in the case when testing units provide estimated effects and estimated standard errors. It produces empirical Bayesian local false discovery and sign rates for tests of zero effect.

Usage

mixtwice(thetaHat, s2, Btheta, Bsigma, df, prop)

Arguments

`thetaHat`	Estimated effect sizes (vector over testing units))
`s2`	Estimated squared standard errors of thetaHat (vector over testing units)
`Btheta`	Grid size parameter for effect distribution
`Bsigma2`	Grid size parameter for variance distribution
`df`	Degrees of freedom in chisquare associated with estimated standard error
`prop`	Proportion of units randomly selected to fit the distribution, with default, `prop = 1` (use all units to fit the distribution).

Details

mixtwice takes estimated effects and standard errors over a set of testing units. To compute local error-rate statistics, it finds nonparametric MLEs of the underlying distributions. It is similar to "ashr", except that mixtwice allows both a mixing distribution of underlying effects theta as well as a mixing distribution over underlying variance parameters. Furthermore, it treats the effect mixing distribution nonparametrically, but enforces the shape constraint that this distribution is unimodal with mode at theta=0. (We do not assume symmetry). The distribution of variance parameters is also treated nonparametrically, but with no shape constraints. The observations are assumed to be emitted from a normal distribution (on estimated effects) and an independent Chi-square distribution (on estimated squared standard errors).

Value

`grid.theta`	Support of the estimated mixing distribution on effects
`grid.sigma2`	Support of the estimated mixing distribution of variances
`mix.theta`	Estimated distribution of effect size, on grid.theta
`mix.sigma2`	Estimated distribution of variance, on grid.sigma2
`lfdr`	Local false discovery rate for each testing unit
`lfsr`	Local false sign rate for each testing unit

Note

cite the biorxiv/arxiv paper

Author(s)

Zihao Zheng, Michael A.Newton

References

Zheng et al. MixTwice: Large scale hypothesis testing for peptide arrays by variance mixing. Technical Report, October 2020.

Examples

### basic setting 

pi = 0.5 ## true value of null proportion

p = 1000; n = 10 ## number of testing units and sample size of each unit

p1=(1-pi)*p; p2=pi*p ## number of non-null and null

### Example 1: for a single group testing problem for zero effect

mu=c(rnorm(round(p1), mean=0, sd=1), rep(0, round(p2)))

sd=rep(1, p)

x=NULL

for (i in 1:(p1+p2)) {
  
  xx=rnorm(n, mu[i], sd[i])
  
  x=rbind(x,xx)
}


thetaHat = rowMeans(x) ## effect size of each testing unit

s2 = apply(x, 1, sd)^2/n ## estimated variance of effect size

mm1=mixtwice(thetaHat=thetaHat, s2=s2, Btheta = 15, Bsigma2 = 10, df=n-1)

## summarize and visualize the result

# estimated mixing distribution and true mixing distribution
plot(mm1$grid.theta, cumsum(mm1$mix.theta), type = "s", 
     xlab = "grid.theta", ylab = "ecdf of theta", lwd = 2)
lines(ecdf(mu),cex = 0.1, lwd = 0.5, lty = 2, col = "red")

plot(mm1$grid.sigma2, cumsum(mm1$mix.sigma2), type = "s",
     xlab = "grid.sigma2", ylab = "ecdf of sigma2", lwd = 2)

# effect size and estimated local false discovery rate

plot(thetaHat, mm1$lfdr, pch = ".", cex = 2, xlab = "estimated effect size", ylab = "local false discovery rate")

# true positive and false positive rate, under the level of 0.05

mean(mm1$lfsr[1:500]<=0.05) # true positive rate
mean(mm1$lfsr[501:1000]<=0.05) # false positive rate

# null proportion estimation

max(mm1$mix.theta)

### Example 2: for a two-group comparison testing problem for zero difference

mu0 = rep(0, p)
mu1=c(rnorm(round(p1), mean=0, sd=1), rep(0, round(p2)))

sd=rep(1, p)

x0<-x1<-NULL

for (i in 1:p) {
  
  xx0=rnorm(n, mu0[i], sd[i])
  
  x0=rbind(x0,xx0)
  
  xx1=rnorm(n, mu1[i], sd[i])
  
  x1=rbind(x1,xx1)
}


thetaHat = rowMeans(x1) - rowMeans(x0) ## effect size of each testing unit

s2 = apply(x1, 1, sd)^2/n + apply(x0, 1, sd)^2/n ## estimated variance of effect size

mm2=mixtwice(thetaHat=thetaHat, s2=s2, Btheta = 15, Bsigma2 = 10, df=2*n-2)

## summarize and visualize the result

# estimated mixing distribution and true mixing distribution
plot(mm2$grid.theta, cumsum(mm2$mix.theta), type = "s", 
     xlab = "grid.theta", ylab = "ecdf of theta", lwd = 2)
lines(ecdf(mu),cex = 0.1, lwd = 0.5, lty = 2, col = "red")

plot(mm2$grid.sigma2, cumsum(mm2$mix.sigma2), type = "s",
     xlab = "grid.sigma2", ylab = "ecdf of sigma2", lwd = 2)

# effect size and estimated local false discovery rate

plot(thetaHat, mm2$lfdr, pch = ".", cex = 2, xlab = "estimated effect size", ylab = "local false discovery rate")

# true positive and false positive rate, under the level of 0.05

mean(mm2$lfsr[1:500]<=0.05) # true positive rate
mean(mm2$lfsr[501:1000]<=0.05) # false positive rate

# null proportion estimation

pi_est = max(mm2$mix.theta)

wiscstatman/MixTwice documentation built on March 29, 2024, 12:28 p.m.