R/ctgm-lfdr.R

Defines functions ctgm_lfdr

Documented in ctgm_lfdr

#---------------------------------------------------------------
# Compute oracle local FDR estimate by revealing all p-values
#---------------------------------------------------------------

#' Fitting Conditional Two-Groups Models on Unmasked P-Values
#'
#' \code{ctgm_lfdr} computes the oracle local FDR estimate, by using all p-values without masking.
#'
#' \code{ctgm_lfdr} implements the EM algorithm to fit pi(x) and mu(x) on unmasked p-values. Although it is not related to FDR control of AdaPT, it provides useful measures for post-hoc justification and other purposes.
#' For instance, one can use these local FDR estimates for prioritizing the hypotheses if strict FDR control is not required.
#'
#' In contrast to \code{adapt}, \code{cytm_lfdr} does not guarantee FDR control unless the model is correctly specified. It is recommended to use \code{ctgm_lfdr} only when FDR control is not required.
#'
#' \code{x} should have a type compatible to the fitting functions in \code{models}. For GLM and GAM, \code{x} should be a data.frame. For glmnet, \code{x} should be a matrix.
#'
#' \code{models} could either be an \code{adapt_model} object, if a single model is used, or a list of \code{adapt_model} objects, each of which corresponding to a model. Each element should be generated by \code{\link{gen_adapt_model}}. For glm/gam/glmnet, one can use the shortcut by running \code{\link{gen_adapt_model}} with name = "glm" or "gam" or "glmnet" but without specifying \code{pifun}, \code{mufun}, \code{pifun_init} and \code{mufun_init}. See examples below.
#'
#' When \code{type = "over"}, it yields a conservative estimate of local FDR
#' \deqn{lfdr(p) = (1 - \pi_{1} + \pi_{1}f_{1}(1)) / (1 - \pi_{1} + \pi_{1}f_{1}(p)).}
#' When \code{type = "raw"}, it yields the original local FDR.
#' \deqn{lfdr(p) = (1 - \pi_{1}) / (1 - \pi_{1} + \pi_{1}f_{1}(p)).}
#' The former is shown to be more stable and reliable because the weak identifiability in conditional mixture models.
#'
#' @param x covariates (i.e. side-information). Should be compatible to \code{models}. See Details
#' @param pvals a vector of values in [0, 1]. P-values
#' @param models an object of class "\code{adapt_model}" or a list of objects of class "adapt_model". See Details
#' @param dist an object of class "\code{\link{gen_exp_family}}". \code{\link{beta_family}()} as default
#' @param type a character. Either "over" or "raw" indicating the type of local FDR estimates. See Details
#' @param params0 a list in the form of list(pix = , mux = ). Initial values of pi(x) and mu(x). Both can be set as NULL
#' @param niter a positive integer. Number of EM iterations.
#' @param cr a string. The criterion for model selection with BIC as default. Also support AIC, AICC and HIC
#' @param verbose a logical values in the form of list(fit = , ms = ). Indicate whether the progress of model fitting and model selection is displayed
#'
#' @return
#' \itemize{
#' \item{lfdr}{a vector of values in [0, 1]. Local FDR estimates of each hypothesis.}
#' \item{model}{an \code{adapt_model} object. The selected model if multiple models are provided.}
#' }
#'
#' @examples
#' \donttest{
#' # Load estrogen data
#' data(estrogen)
#' pvals <- as.numeric(estrogen$pvals)
#' x <- data.frame(x = as.numeric(estrogen$ord_high))
#' dist <- beta_family()
#'
#' # Subsample the data for convenience
#' inds <- (x$x <= 5000)
#' pvals <- pvals[inds]
#' x <- x[inds,,drop = FALSE]
#'
#' # Generate models for function adapt
#' library("splines")
#' formulas <- paste0("ns(x, df = ", 6:10, ")")
#' models <- lapply(formulas, function(formula){
#'     piargs <- muargs <- list(formula = formula)
#'     gen_adapt_model(name = "glm", piargs = piargs, muargs = muargs)
#' })
#'
#' # Run ctgm_lfdr with two types of lfdr estimates
#' res_over <- ctgm_lfdr(x, pvals, models, type = "over")
#' res_raw <- ctgm_lfdr(x, pvals, models, type = "raw")
#'
#' # Compare two estimates
#' par(mfrow = c(2, 1))
#' hist(res_over$lfdr)
#' hist(res_raw$lfdr)
#' }
#'
#' @export
ctgm_lfdr <- function(x, pvals, models,
                      dist = beta_family(),
                      type = c("over", "raw"),
                      params0 = list(pix = NULL, mux = NULL),
                      niter = 50,
                      cr = "BIC",
                      verbose = TRUE
                      ){
    Mstep_type <- "weighted"
    lfdr_type <- type[1]
    tol <- 0

    res <- EM_mix_ms(x, pvals, rep(0, length(pvals)), dist, models, cr,
                     params0, niter, tol, verbose, Mstep_type)
    lfdr <- compute_lfdr_mix(pvals, dist, res$params, lfdr_type)
    return(list(lfdr = lfdr, model = res$model))
}

Try the adaptMT package in your browser

Any scripts or data that you put into this service are public.

adaptMT documentation built on May 1, 2019, 10:15 p.m.