# FDRreg: False Discovery Rate Regression In FDRreg: False discovery rate regression

## Description

Estimate an empirical-Bayes false-discovery rate regression model for test statistics z and regressors X.

## Usage

 ```1 2``` ```FDRreg(z, covars, nulltype = 'empirical', type = 'linear', nmc = 10000, nburn = 500, nmids = 150, densknots = 10, regknots = 5) ```

## Arguments

 `z` An N dimensional vector; z_i is the test statistic for observation i. `covars` An N x P dimensional design matrix; x_i is the ith row. This is assumed not to have a column of ones representing an intercept; just like in lm() and glm(), this will be added by the fitting algorithm. `nulltype` Choices are 'empirical' for an empirical null using Efron's central-matching method, or 'theoretical' for a standard normal null. `type` Choices are 'linear' for a standard logistic regression, or 'additive' for an additive logit model, in which case each column of covars is expanded using a b-spline basis. `nmc` The number of MCMC iterations saved. Defaults to 10,000. `nburn` The number of initial MCMC iterations discarded as burn-in. Defaults to 500. `nmids` How many bins should be used in the estimation of the marginal density f(z)? Defaults to 150. `densknots` How many knots should be used to estimate the marginal density f(z) via spline-based Poisson regression? Defaults to 10; the function will warn you if it looks like you've used too few, using a simple deviance statistic. `regknots` Used only if type='additive'. How many knots should be used to estimate each partial regression function f_j(x_j)? Defaults to 5.

## Details

This model assumes that a z-statistic z arises from

f(z_i) = w_i f^1(z) + (1-w_i) f^0(z) ,

where f^1(z) and f^0(z) are the densities/marginal likelihoods under the alternative and null hypotheses, respectively, and where w_i is the prior probability that z_i is a signal (non-null case). Efron's method is used to estimate f(z) nonparametrically; f^0(z) may either be the theoretical (standard normal) null, or an empirical null which can be estimated using the middle 25 percent of the data. The prior probabilities w_i are estimated via logistic regression against covariates, using the Polya-Gamma Gibbs sampler of Polson, Scott, and Windle (JASA, 2013).

## Value

 `z` The test statistics provided as the argument z. `localfdr` The corresponding vector of local false discovery rates (lfdr) for the elements of z. localfdr[i] is simply 1 minus the fitted posterior probability that z[i] comes from the non-null (signal) population. It is important to remember that localfdr is not necessarily monotonic in z, because the regression model allows the prior probability that z[i] is a signal to change with covariates x[i]. `FDR` The corresponding vector of cut-level false discovery rates (FDR) for the elements of z. Used for extracting findings at a given FDR level. FDR[i] is the estimated false discovery rate for the cohort of test statistics whose local fdr's are at least as small as localfdr[i] — that is, the z[j]'s such that localfdr[j] <= localfdr[i]. `X` The design matrix used in the regression. This will include an added column for an intercept, along with the spline basis expansion if type='additive'. `grid` Length nmids: equally-spaced midpoints of the histogram bins used to estimate f(z) via Poisson spline regression. `breaks` Length nmids: the breakpoints of the histogram used to estimate f(z) via Poisson spline regression. `grid.fz` Length nmids: the estimated value of f(z) at the histogram midpoints. `grid.f0z` Length nmids: the estimated value of f^0(z), the assumed (either theoretical or empirical) null density at the histogram midpoints. `grid.zcounts` Length nmids: The number of z-scores that fell into each histogram bin. `dnull` The estimated (or assumed) null density at each of the observed z scores; dnull[i] corresponds to z[i]. `dmix` The estimated marginal density f(z) at each point z[i]. This should look like a good, smooth fit to the histogram of z. `empirical.null` A list with two members mu0 and sig0, representing the mean and standard deviation of the empirical null estimated using Efron's central-matching method. Always returned, but only used if nulltype='empirical'. `betasave` A matrix of posterior draws. Each row is a single posterior draw of the vector of regression coefficients corresponding to the columns of the returned X. `priorprob` The estimated prior probability of being a signa for each observation z_i. Here priorprob[i] = P(z_i is non-null). `postprob` The estimated posterior probabilities of being a signal each observation z_i: postprob[i] = P(z_i is non-null | data), and localfdr[i] = 1-postprob[i]. `fjindex` A list of indices of length ncol(covars), where covars is the matrix of covariates you fed in. Mainly useful if type='additive', in which case fjind[[j]] gives you a vector of indices telling you which columns of the returned X and betasave correspond to the basis expansion of the original design matrix covars[,j].

## References

J.G. Scott, R. Kelly, M.A. Smith, P. Zhou, and R.E. Kass (2013). False discovery rate regression: application to neural synchrony detection in primary visual cortex. arXiv:1307.3495 [stat.ME].

N.G. Polson, J.G. Scott, and J. Windle (2013. Bayesian inference for logistic models using Polya-Gamma latent variables. Journal of the American Statistical Association (Theory and Methods) 108(504): 1339-49 (2013). arXiv:1205.0310 [stat.ME].

Efron (2004). Large-scale simultaneous hypothesis testing: the choice of a null hypothesis. J. Amer. Statist. Assoc. 99, 96-104.

Efron (2005). Local false discovery rates. Preprint, Dept. of Statistics, Stanford University.

## Examples

 ``` 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48``` ```library(FDRreg) # Simulated data P = 2 N = 10000 betatrue = c(-3.5,rep(1/sqrt(P), P)) X = matrix(rnorm(N*P), N,P) psi = crossprod(t(cbind(1,X)), betatrue) wsuccess = 1/{1+exp(-psi)} # Some theta's are signals, most are noise gammatrue = rbinom(N,1,wsuccess) table(gammatrue) # Density of signals thetatrue = rnorm(N,3,0.5) thetatrue[gammatrue==0] = 0 z = rnorm(N, thetatrue, 1) hist(z, 100, prob=TRUE, col='lightblue', border=NA) curve(dnorm(x,0,1), add=TRUE, n=1001) ## Not run: # Fit the model fdr1 <- FDRreg(z, covars=X, nmc=2500, nburn=100, nmids=120, nulltype='theoretical') # Show the empirical-Bayes estimate of the mixture density # and the findings at a specific FDR level Q = 0.1 plotFDR(fdr1, Q=Q, showfz=TRUE) # Posterior distribution of the intercept hist(fdr1\$betasave[,1], 20) # Compare actual versus estimated prior probabilities of being a signal plot(wsuccess, fdr1\$priorprob) # Covariate effects plot(X[,1], log(fdr1\$priorprob/{1-fdr1\$priorprob}), ylab='Logit of prior probability') plot(X[,2], log(fdr1\$priorprob/{1-fdr1\$priorprob}), ylab='Logit of prior probability') # Local FDR plot(z, fdr1\$localfdr, ylab='Local false-discovery rate') # Extract findings at level FDR = Q myfindings = which(fdr1\$FDR <= Q) ## End(Not run) ```

FDRreg documentation built on May 2, 2019, 12:36 a.m.