marg2: Implements a non-linear least squares based algorithm for...

Description

This function estimates the signal proportion and the signal density by modeling the conditional mean of Y given X=x and fitting it with a non-linear least squares regression. It returns the vector of estimated local false discovery rates and the corresponding rejection set at a prespecified level for the false discovery rate.

Usage

marg2(y, x, nlslambda = 1e-06/length(y), level = 0.05)

Arguments

y

The observed vector of z-scores.

x

The n \times p data matrix, where n must equal the length of y. To include an intercept, add a column of 1's to x.

nlslambda

The tolerance threshold for the quasi-Newton approach used to solve the non-linear least squares problem. Default is 1e-6/length(y). We recommend leaving it unchanged unless you have a specific reason to adjust it.

level

The level at which the false discovery rate is to be controlled. Must be a scalar in [0,1]. Default is 0.05.

Details

The conditional mean of Y|X under the model is a non-linear function of the parameters, i.e., the logistic coefficients and the mean of the marginal distribution of Y, μ^* = \mathbf{E}[Y]. The resulting least squares problem is non-convex in the parameters. It is solved by varying μ^* over a predetermined grid and, for each grid value, optimizing over the logistic coefficients. The fitted logistic function is the marg2() estimate of π^*(\cdot). The estimate of φ_1(\cdot) is obtained as in the marg1() method, using the Rmosek optimization suite and the same discrete approximation to the mixing distribution G(\cdot).
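The grid-then-optimize idea can be illustrated with a minimal sketch in base R. This is not the package's implementation: the function name expit_nls_profile, the argument mu_grid, and the assumed mean function mu * plogis(x %*% beta) are all hypothetical choices made for illustration. The sketch also picks the grid value with the smallest residual sum of squares, whereas the Value section indicates that marg2() compares grid values through a profile log-likelihood (ll_list).

```r
# Hypothetical sketch (not the NPMLEmix code): profile non-linear least squares.
# ASSUMPTION: the conditional mean is modeled as mu * plogis(x %*% beta).
expit_nls_profile <- function(y, x, mu_grid) {
  fits <- lapply(mu_grid, function(mu) {
    # Residual sum of squares in beta for a fixed grid value mu
    rss <- function(beta) sum((y - mu * plogis(drop(x %*% beta)))^2)
    # Quasi-Newton (BFGS) search over the logistic coefficients
    opt <- optim(rep(0, ncol(x)), rss, method = "BFGS")
    list(mu = mu, beta = opt$par, rss = opt$value)
  })
  # Keep the grid point with the smallest RSS (illustrative criterion only)
  best <- fits[[which.min(vapply(fits, function(f) f$rss, numeric(1)))]]
  best$p <- plogis(drop(x %*% best$beta))  # fitted prior probabilities
  best
}

# Toy data: intercept plus one covariate, signals centered at 2
set.seed(1)
n <- 200
x <- cbind(1, runif(n))
p_true <- plogis(drop(x %*% c(-1, 2)))
y <- rnorm(n, mean = 2 * rbinom(n, 1, p_true))
mu_grid <- seq(0.5, 4, by = 0.5)
fit <- expit_nls_profile(y, x, mu_grid)
```

Fixing the mean parameter on a grid turns each inner problem into a smooth fit over the logistic coefficients only, which is why a standard quasi-Newton routine such as optim() with method = "BFGS" is enough for the inner step in this sketch.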

Value

This function returns a list consisting of the following:

p

The estimated prior probabilities, i.e., \hat{π}(\cdot) evaluated at the data points.

b

The estimates for the coefficient vector in the logistic function.

f1y

The vector of estimated signal densities evaluated at the data points.

kwo

A list with four items:
i. atoms: The vector of means for the Gaussian distributions used to approximate G(\cdot).
ii. probs: The vector of probabilities for each Gaussian component used to approximate G(\cdot).
iii. f1y: Same as f1y above.
iv. ll: The average of the logarithmic values of f1y.

localfdr

The vector of estimated local false discovery rates evaluated at the data points.

den

The vector of estimated conditional densities evaluated at the data points.

ll

The log-likelihood evaluated at the estimated optima.

rejset

The vector of 1s and 0s where 1 indicates that the corresponding hypothesis is to be rejected.

pi0

The average of the entries of the vector p.

ll_list

The vector of profile log-likelihoods corresponding to a predetermined set of grid points for μ^*. The largest element of this vector is returned as ll.
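How p, f1y, den, localfdr and rejset might relate to one another can be sketched in base R. The helpers local_fdr and reject_by_lfdr below are hypothetical and are not part of NPMLEmix. The sketch assumes a standard normal null density for the z-scores and uses a common thresholding rule for local false discovery rates (reject the hypotheses with the smallest values for as long as their running average stays at or below level). This page does not state which rule marg2() actually applies.

```r
# Hypothetical helper: local fdr under an ASSUMED N(0, 1) null density.
local_fdr <- function(y, p, f1y) {
  null_part <- (1 - p) * dnorm(y)   # null component of the mixture
  den <- null_part + p * f1y        # mixture density at each data point
  list(den = den, localfdr = null_part / den)
}

# Hypothetical helper: reject the k smallest local fdrs, where k is the
# largest count whose running mean stays at or below `level`.
reject_by_lfdr <- function(localfdr, level = 0.05) {
  ord <- order(localfdr)
  running_mean <- cumsum(localfdr[ord]) / seq_along(ord)
  k <- sum(running_mean <= level)   # running mean is non-decreasing
  rejset <- integer(length(localfdr))
  rejset[ord[seq_len(k)]] <- 1L
  rejset
}

# Toy inputs standing in for the y, p and f1y vectors described above
y <- c(-0.2, 0.1, 3.5, 4.0)
p <- rep(0.3, 4)
f1y <- dnorm(y, mean = 3)           # made-up signal density for illustration
out <- local_fdr(y, p, f1y)
rejset <- reject_by_lfdr(out$localfdr, level = 0.05)
```

Because the sorted local fdr values are non-decreasing, their running mean is too, so counting the entries at or below level gives the largest admissible number of rejections directly.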

References

Deb, N., Saha, S., Guntuboyina, A. and Sen, B., 2018. Two-component Mixture Model in the Presence of Covariates. arXiv preprint arXiv:1810.07897.

Koenker, R. and Mizera, I., 2014. Convex optimization, shape constraints, compound decisions, and empirical Bayes rules. Journal of the American Statistical Association, 109(506), pp.674-685.

Examples

library(NPMLEmix)
### Generate example data ###
st <- makedata(100, cbind(runif(100), runif(100)), c(0, 1, -1), c(0, 1), c(0.4, 0.6), c(1, 1))
### Use the default rejection level (an intercept column is added to x) ###
defm2 <- marg2(st$y, cbind(1, st$xs))
### Use a new rejection level of 0.1 ###
nodefm2 <- marg2(st$y, cbind(1, st$xs), level = 0.1)
### Output the vector of prior probabilities ###
defm2$p
### Output the rejection set ###
nodefm2$rejset

NabarunD/NPMLEmix documentation built on June 19, 2020, 12:11 p.m.