marg1: Implements a profile likelihood based algorithm for...


View source: R/myFUN.R

Description

This function estimates the signal proportion and the signal density from the marginal distribution of Y, using a profile-likelihood-based approach. It returns the vector of estimated local false discovery rates and the corresponding rejection set at a prespecified false discovery rate level.
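This page does not state exactly how the rejection set is built from the local false discovery rates. A common rule for local-fdr-based FDR control, assumed here and not confirmed by this page, rejects the hypotheses with the smallest local fdr values for as long as the average local fdr of the rejected set stays at or below level. The helper name lfdr_rejset is hypothetical and not part of NPMLEmix.

```r
# Hypothetical helper (not part of NPMLEmix): assumed step-up rule on local fdr.
# Rejects the k hypotheses with the smallest local fdr, where k is the largest
# number such that the average local fdr of the rejected set is <= level.
lfdr_rejset <- function(localfdr, level = 0.05) {
  ord <- order(localfdr)                                   # smallest local fdr first
  running_mean <- cumsum(localfdr[ord]) / seq_along(ord)   # nondecreasing in k
  k <- sum(running_mean <= level)                          # largest admissible k
  rejset <- integer(length(localfdr))                      # 0 = do not reject
  if (k > 0) rejset[ord[seq_len(k)]] <- 1L                 # 1 = reject
  rejset
}

lfdr_rejset(c(0.01, 0.5, 0.02, 0.9, 0.04), level = 0.05)
```

Because the running mean of ascending values never decreases, counting the entries at or below level gives the largest admissible k directly.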

Usage

marg1(y, x, blambda = 1e-06/length(y), level = 0.05)

Arguments

y

The observed vector of z-scores.

x

The n × p data matrix, where n must equal the length of y. To include an intercept, add a column of 1's to x.

blambda

The tolerance threshold used in the quasi-Newton step that estimates the signal proportion. Defaults to 1e-6/length(y). We recommend leaving it unchanged unless you are certain a different value is needed.

level

The level at which the false discovery rate is to be controlled. Should be a scalar in [0, 1]. Defaults to 0.05.
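Preparing x with an intercept column can be sketched in base R as follows. The simulated covariates and z-scores are purely illustrative, and the marg1 call is left commented out because it requires the NPMLEmix package together with Rmosek.

```r
# Illustrative data only: n = 200 observations, 2 covariates.
set.seed(1)
n <- 200
x_raw <- matrix(rnorm(n * 2), nrow = n, ncol = 2)
y <- rnorm(n)                      # stand-in for observed z-scores

# Prepend a column of 1's so that the logistic model includes an intercept.
x <- cbind(1, x_raw)
stopifnot(nrow(x) == length(y))    # marg1 requires n == length(y)
dim(x)

# fit <- marg1(y, x, level = 0.05)   # needs NPMLEmix and Rmosek installed
```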

Details

Note that the marginal distribution of Y under the two-component mixture model with covariates (Deb et al. 2018, see References) is the same as that in a standard two-groups model (Efron 2008, see References). Fixing π̄ = E[π(X)], the signal density φ_1(·) is estimated using the Rmosek optimization suite. The primary idea is to approximate the mixing distribution G(·) by a discrete distribution with max{100, √n} atoms, so that φ_1(·) becomes a mixture of that many Gaussian components. The covariate-dependent signal proportion π(·), parameterized by the logistic coefficient vector b, is then estimated using the BFGS algorithm. Finally, the algorithm chooses the value of π̄ that maximizes the profile likelihood.
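The three steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the NPMLEmix implementation: the null density is taken to be N(0, 1); φ_1(y) is assumed to be Σ_k w_k φ(y − a_k) with unit-variance components on an equally spaced grid of atoms over the range of y; EM fixed-point updates are swapped in for the Rmosek convex solver; and b is fitted without constraints for each grid value of π̄. The helper names fit_f1, fit_b and profile_fit are hypothetical.

```r
# ASSUMPTIONS (not from the NPMLEmix source): N(0, 1) null, unit-variance
# Gaussian components on an equally spaced grid, EM instead of Rmosek.

# Step 1: for fixed pibar, estimate grid weights w by EM on the marginal
# density (1 - pibar) * dnorm(y) + pibar * sum_k w_k * dnorm(y - a_k).
fit_f1 <- function(y, pibar, atoms, iters = 200) {
  A <- dnorm(outer(y, atoms, "-"))            # n x K matrix of dnorm(y_i - a_k)
  phi0 <- dnorm(y)
  w <- rep(1 / length(atoms), length(atoms))
  for (it in seq_len(iters)) {
    marg <- (1 - pibar) * phi0 + pibar * as.vector(A %*% w)
    R <- pibar * sweep(A, 2, w, "*") / marg   # posterior weight of atom k for y_i
    w <- colSums(R) / sum(R)                  # M-step, keeps sum(w) == 1
  }
  f1y <- as.vector(A %*% w)
  list(atoms = atoms, probs = w, f1y = f1y, ll = mean(log(f1y)))
}

# Step 2: for a fixed signal density, fit logistic coefficients b by BFGS.
fit_b <- function(y, x, f1y) {
  phi0 <- dnorm(y)
  negll <- function(b) {
    p <- plogis(as.vector(x %*% b))           # assumed signal probability pi(x)
    -sum(log((1 - p) * phi0 + p * f1y))
  }
  optim(rep(0, ncol(x)), negll, method = "BFGS")$par
}

# Step 3: profile likelihood over a grid of pibar values.
profile_fit <- function(y, x, grid = seq(0.1, 0.9, by = 0.1)) {
  K <- max(100, ceiling(sqrt(length(y))))     # max{100, sqrt(n)} atoms
  atoms <- seq(min(y), max(y), length.out = K)
  fits <- lapply(grid, function(pibar) {
    kwo <- fit_f1(y, pibar, atoms)
    b <- fit_b(y, x, kwo$f1y)
    p <- plogis(as.vector(x %*% b))
    den <- (1 - p) * dnorm(y) + p * kwo$f1y
    list(p = p, b = b, f1y = kwo$f1y, kwo = kwo, den = den,
         localfdr = (1 - p) * dnorm(y) / den, ll = sum(log(den)))
  })
  ll_list <- vapply(fits, function(f) f$ll, numeric(1))
  best <- fits[[which.max(ll_list)]]
  c(best, list(pi0 = mean(best$p), pibar = grid[which.max(ll_list)],
               ll_list = ll_list))
}

# Usage on simulated data (hypothetical signal mean 3 and logistic pi(x)).
set.seed(42)
n <- 300
x <- cbind(1, rnorm(n))
is_signal <- rbinom(n, 1, plogis(-1 + x[, 2]))
y <- rnorm(n, mean = 3 * is_signal)
fit <- profile_fit(y, x)
fit$pibar
```

EM is used here only because it needs nothing beyond base R; a convex solver such as the one the page names typically converges faster and more precisely on the same grid problem.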

Value

This function returns a list consisting of the following:

p

The estimated prior probabilities, i.e., π̂(·) evaluated at the data points.

b

The estimates for the coefficient vector in the logistic function.

f1y

The estimated signal density φ_1(·) evaluated at the data points.

kwo

A list with four components: atoms, the vector of means of the Gaussian components used to approximate G(·); probs, the vector of mixing probabilities of those components; f1y, identical to f1y above; and ll, the average of the logarithms of the entries of f1y.

localfdr

The vector of estimated local false discovery rates evaluated at the data points.

den

The vector of estimated conditional densities of Y given X, evaluated at the data points.

ll

The log-likelihood evaluated at the estimated optima.

rejset

A vector of 1s and 0s, where 1 indicates that the corresponding hypothesis is rejected.

pi0

The average of the entries of the vector p.

ll_list

The vector of profile log-likelihoods over a pre-determined grid of values for π̄. The largest element of this vector is returned as ll.
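The returned fields are linked. The following sketch shows one plausible way they fit together, under assumptions this page does not confirm: p is the estimated probability that each hypothesis is non-null, the null density is N(0, 1), φ_1 is a unit-variance Gaussian location mixture over kwo's atoms and probs, and ll is the sum of the log conditional densities. The helper assemble_outputs and the input values are hypothetical.

```r
# Hypothetical helper: rebuilds the Value fields from fitted pieces,
# assuming p = P(non-null | x) and a N(0, 1) null density.
assemble_outputs <- function(y, p, atoms, probs) {
  f1y <- as.vector(dnorm(outer(y, atoms, "-")) %*% probs)  # signal density at y
  den <- (1 - p) * dnorm(y) + p * f1y                       # density of y_i given x_i
  localfdr <- (1 - p) * dnorm(y) / den                      # P(null | y_i, x_i)
  list(f1y = f1y,
       kwo = list(atoms = atoms, probs = probs, f1y = f1y, ll = mean(log(f1y))),
       den = den, localfdr = localfdr, ll = sum(log(den)), pi0 = mean(p))
}

out <- assemble_outputs(y = c(-0.5, 0.2, 3.1), p = c(0.1, 0.2, 0.6),
                        atoms = c(0, 3), probs = c(0.4, 0.6))
round(out$localfdr, 3)
```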

References

Deb, N., Saha, S., Guntuboyina, A. and Sen, B., 2018. Two-component Mixture Model in the Presence of Covariates. arXiv preprint arXiv:1810.07897.

Koenker, R. and Mizera, I., 2014. Convex optimization, shape constraints, compound decisions, and empirical Bayes rules. Journal of the American Statistical Association, 109(506), pp.674-685.

Efron, B., 2008. Microarrays, empirical Bayes and the two-groups model. Statistical Science, 23(1), pp.1-22.


NPMLEmix documentation built on Dec. 6, 2020, 9:06 a.m.