R/radjust_sym.R
In radjust: Replicability Adjusted p-Values for Two Independent Studies with Multiple Endpoints

Documented in has_all_names has_some_names radjust_sym

#' @title Adjust p-values for Replicability across Two Independent Studies with Multiple Endpoints
#'
#' @description  Given two vectors of p-values from two independent studies, returns the adjusted p-values for
#'  false discovery rate control on replicability claims.
#'
#' @param pv1,pv2 numeric vectors of p-values. If \code{directional_rep_claim} is \code{TRUE}, they must be left-sided p-values.
#'  Can be either the p-values for the selected features from each study (the default input type),
#'  or the p-values for all the features from each study.
#'  Can be either of the same length (so the same location in each vector corresponds to the same feature) or with names (so the same name in each vector corresponds to the same feature).
#' @param w1 fraction between zero and one, of the relative weight for the p-values from study 1. Default value is 0.5
#'  (see Details for other values).
#' @param input_type whether \code{pv1} and \code{pv2} contain all the p-values from each study or only the selected ones (the default).
#' @param general_dependency \code{TRUE} or \code{FALSE}, indicating whether to correct for general dependency.
#' The recommended default value is \code{FALSE} (see Details).
#' @param directional_rep_claim  \code{TRUE} or \code{FALSE}, indicating whether to perform directional replicability analysis. The default value is \code{FALSE}. If \code{TRUE}, \code{pv1} and \code{pv2} should be left-sided p-values (see Details).
#' @param variant A character string specifying the chosen variant for a potential increase in the number of discoveries.
#' Must be one of \code{"non-adaptive-with-alpha-selection"} (default), \code{"adaptive"}, or \code{"non-adaptive"} (see Details).
#' @param alpha The threshold on p-values for selecting the features in each study and the significance level for replicability analysis (see Details).
#'
#' @return The function returns a list with the following elements:
#'
#'   \tabular{ll}{
#'     \code{call}  \tab  the function call. \cr
#'     \code{inputs}  \tab  a list with the function's input parameters (except \code{pv1} and \code{pv2}). \cr
#'     \code{results_table} \tab  a data frame with the features selected in both studies and their r-values (see description below).\cr
#'     \code{selected1}  \tab the features selected in study 1 (when the variant is either \code{"adaptive"} or \code{"non-adaptive-with-alpha-selection"}). \cr
#'     \code{selected2}  \tab the features selected in study 2, same as above. \cr
#'     \code{n_selected1} \tab  the number of selected features in study 1. \cr
#'     \code{n_selected2} \tab  the number of selected features in study 2. \cr
#'     \code{pi1}  \tab  the estimate of the true-nulls fraction in the study 1 among the selected in study 2, when \code{variant = "adaptive"}.\cr
#'     \code{pi2}  \tab  the estimate of the true-nulls fraction in the study 2 among the selected in study 1, when \code{variant = "adaptive"}.
#'   }
#'
#' The third element in the list, \code{results_table}, includes the following columns:
#'
#'   \tabular{lll}{
#'     \code{name}         \tab char.   \tab the name of the feature as extracted from the named vectors, or the location, if the input vectors are not named. \cr
#'     \code{p.value.1}    \tab numeric \tab the one-sided p-value from study 1 as inputed (denoted by \code{pv1}). When \code{directional_rep_claim==TRUE} the one-sided p-values in the direction of effect are presented (i.e, \code{min(pv1,1-pv1)}). \cr
#'     \code{p.value.2}    \tab numeric \tab the one-sided p-value from study 2 as inputed (denoted by \code{pv2}). When \code{directional_rep_claim==TRUE} the one-sided p-values in the direction of effect are presented (i.e, \code{min(pv2,1-pv2)}).  \cr
#'     \code{r.value}      \tab numeric \tab the replicability adjusted p-value (= r-value). \cr
#'     \code{Direction}    \tab char.   \tab the direction of the replicability claim, when \code{directional_rep_claim = TRUE}. \cr
#'     \code{Significant}  \tab char.   \tab indicates for which features replicability claims can be made at level \eqn{\alpha}, when \code{variant} is set to \code{"adaptive"} or \code{"non-adaptive-with-alpha-selection"}.
#'   }
#'
#' @details For FDR control at level \eqn{\alpha} on replicability claims, all features with \strong{\eqn{r}-value} at most \eqn{\alpha} are declared
#'  as replicated.
#'  In addition, the discoveries from study 1 among the replicability claims have an FDR control guarantee at level \eqn{w_{1}\alpha}{w1 * \alpha}.
#'  Similarly, the discoveries from study 2 among the replicability claims have an FDR control guarantee at level \eqn{(1-w_{1})\alpha}{(1-w1) * \alpha}.
#'
#'  Setting \code{w1} to a value different than half is appropriate for stricter FDR control in one of the studies.
#'  For example, if study two has a much larger sample size than study one (and both studies examine the same problem), then
#'  setting \eqn{w_{1} > 0.5}{w1 > 0.5} will provide a stricter FDR control for the larger study and greater power for the replicability analysis,
#'  see Bogomolov and Heller (2018) for details.
#'
#'  The theoretical FDR control guarantees assume independence within each vector of p-values. However, empirical
#'  investigations suggest that the method is robust to deviations from independence. In practice, we recommend using it whenever the
#'  Benjamini-Hochberg procedure is appropriate for use with single studies, as this procedure can be viewed as a two-dimensional
#'  Benjamini-Hochberg procedure which enjoys similar robustness properties. For general dependence, we provide the option to apply
#'  a more conservative procedure with theoretical FDR control guarantee for any type of dependence,
#'  by setting \code{general_dependency} to TRUE.
#'
#'  If \code{variant} is \code{"non-adaptive"}, then the non-adaptive  replicability analysis procedure of Bogomolov and Heller (2018)
#'  is applied on the input p-values \code{pv1} and \code{pv2}.
#'  If \code{variant} is \code{"non-adaptive-with-alpha-selection"}, then for a user specified \code{alpha} (default 0.05) only p-values from
#'  study one below \eqn{w_{1}\alpha}{w1 * \alpha} and from study
#'  two below \eqn{(1-w_{1})\alpha}{(1-w1) * \alpha} are considered for replicability analysis. This additional step prevents
#'  including in the selected sets features that cannot be discovered as replicability claims at the nominal FDR level
#'  \eqn{\alpha}, thus reducing the multiplicity adjustment necessary for replicability analysis. If \code{variant} is \code{"adaptive"}, then for a user specified \code{alpha}
#'  the adaptive replicability analysis procedure is applied on the dataset, see Bogomolov and Heller (2018) for details.
#'
#' The meaning of the replicability claim for a feature if \code{directional_rep_claim} is \code{FALSE}, is that both null hypotheses  are false (or both alternatives are true).  Setting \code{directional_rep_claim} to
#' \code{TRUE} is useful if the discoveries of interest are directional but the direction is unknown. For example, a directional replicability
#' claim for a feature is the claim that both associations examined for it are positive, or both associations examined for it
#' are negative, but not that one association is positive and the other negative. For  directional replicability analysis,
#' the input p-values \code{pv1} and \code{pv2} should be the left-sided input p-values
#' (left-sided is the choice without loss of generality, since we assume the left and right sided p-values sum to one for each null hypothesis).
#'
#' @references Bogomolov, M. and Heller, R. (2018). Assessing replicability of findings across two studies of multiple
#' features. Biometrika.
#'
#' @seealso \code{\link{radjust_pf}} for replicability analysis in primary and follow-up studies.
#'
#' @export
#'
#' @examples
#'
#' data(mice)
#' ## transform the two-sided p-values to one-sided in the same direction (left):
#' ## (we use the direction of the test statistic to do so and assume that it is continuous)
#'
#' pv1 <- ifelse(mice$dir_is_left1, mice$twosided_pv1/2, 1-mice$twosided_pv1/2)
#' pv2 <- ifelse(mice$dir_is_left2, mice$twosided_pv2/2, 1-mice$twosided_pv2/2)
#'
#' ## run the examples as in the article:
#'
#' mice_rv_adaptive <- radjust_sym(pv1, pv2, input_type = "all", directional_rep_claim = TRUE,
#'                                 variant = "adaptive", alpha=0.05)
#' print(mice_rv_adaptive)
#'
#' mice_rv_non_adpt_sel <- radjust_sym(pv1, pv2, input_type = "all", directional_rep_claim = TRUE,
#'                                     variant = "non-adaptive-with-alpha-selection", alpha=0.05)
#' print(mice_rv_non_adpt_sel)
#'
#' mice_rv_non_adpt <- radjust_sym(pv1, pv2, input_type = "selected", directional_rep_claim = TRUE,
#'                                 variant = "non-adaptive")
#' print(mice_rv_non_adpt)
#'

radjust_sym <- function(pv1,
                         pv2,
                         w1 = 0.5,
                         input_type = c("selected_features", "all_features"),
                         general_dependency = FALSE,
                         directional_rep_claim = FALSE,
                         variant = c("non-adaptive-with-alpha-selection", "adaptive", "non-adaptive"),
                         alpha = if (variant == "non-adaptive") NULL else 0.05
)
{

  ## ---- validate inputs ----
  stopifnot(is.vector(pv1, mode = "numeric"), pv1 <= 1, 0 < pv1,
            is.vector(pv2, mode = "numeric"), pv2 <= 1, 0 < pv2,
            is.vector(w1, mode = "numeric"),  w1 < 1, 0 < w1, length(w1) == 1)
  stopifnot(is.logical(general_dependency),
            is.logical(directional_rep_claim))

  variant <- match.arg(variant)
  if (variant == "non-adaptive" && !is.null(alpha))
    warning("variant == 'non-adaptive', alpha value is ignored")
  if (variant != "non-adaptive" && is.null(alpha))
    stop("variant != 'non-adaptive' but alpha value is not specified")
  input_type <- match.arg(input_type)

  ## ---- generate messages regarding the matching method (i.e by names or positions)----
  if (has_some_names(pv1) || has_some_names(pv2))
  {
    if (has_all_names(pv1) && has_all_names(pv2))
    {
      if (length(pv1) != length(pv2))
      message("Note:\tpv1 and pv2 differ in length and have names
\t-> matching features by names.")
      else
      message("Note:\tpv1 and pv2 have the same length and have names
\t-> matching features by names.")
    } else
        stop("Some of the names are missing. Check with names(pv1) or names(pv2)")
  } else {
    if (length(pv1) != length(pv2))
      stop("\tpv1 and pv2 differ in length and don't have names
\tCannot match features.")
    else
      message(
        "Note:\tpv1 and pv2 have the same length and don't have names
\t-> matching features by location."
      )
  }

  ## ---- do the matching (by names or positions) ----
  if (has_all_names(pv1) && has_all_names(pv2))
  {
    # The following will produce NAs where there is no match, so to get lengths and sums in the next sections we use sum( , na.rm = TRUE).
    pv_union_df <- merge(pv1, pv2 , by.x = names(pv1), by.y = names(pv2))
    pv1 <- pv_union_df$pv1
    names(pv1) <- pv_union_df$names.pv1
    pv2 <- pv_union_df$pv2
    names(pv2) <- pv_union_df$names.pv2
  } else
    names(pv1) <- names(pv2) <- (1:length(pv1)) # If there are no names, then we assign indices as names.
  # From now on pv1, pv2 have the same length.

  ## ---- handle directional claims case----
  if (directional_rep_claim)
  {
    message(
      "Note:\tDirectional replicability claim option is set to TRUE.
\tMake sure you have entered the *left* sided p-values."
    )

    # since the user enters left p-values only, we need the 'real' p-values for some of the calculations.
    pv1real  <- ifelse(pv1 <= 0.5, pv1, 1 - pv1)
    pv2real  <- ifelse(pv2 <= 0.5, pv2, 1 - pv2)

    rep_claim_direction <- ifelse(pv1 <= 0.5 & pv2 <= 0.5, "Left",ifelse(pv1 > 0.5 & pv2 > 0.5, "Right","Opposite"))

    pv1tag <- ifelse(pv2 <= 0.5, pv1, 1 - pv1) # as in the article
    pv2tag <- ifelse(pv1 <= 0.5, pv2, 1 - pv2)

  } else {
    pv1real  <- pv1 # we allow pv > 0.5 in the non-directional claims case
    pv2real  <- pv2
    pv1tag <- pv1
    pv2tag <- pv2
  }

  ## ---- generate the logical s1, s2 and s12 vectors - the selections in each study and in both ----
  # Note that NAs in p1 and p2 are kept NAs in the selections vectors
  if (variant == "non-adaptive")
  {
    if (input_type == "all_features")
      stop("The variant cannot be 'non-adaptive' when input_type equals to 'all_features'")
    s1  <- ifelse(is.na(pv1), NA, TRUE) # "select" all except NAs
    s2  <- ifelse(is.na(pv2), NA, TRUE)
    R1 <- sum(s1, na.rm = TRUE)
    R2 <- sum(s2, na.rm = TRUE)
    pi1 <- 1
    pi2 <- 1
  } else
   if (variant == "non-adaptive-with-alpha-selection")
    {
      s1 <- (pv1real <= w1 * alpha)
      s2 <- (pv2real <= (1 - w1) * alpha)
      R1 <- sum(s1, na.rm = TRUE)
      R2 <- sum(s2, na.rm = TRUE)
      pi1 <- 1
      pi2 <- 1
    } else
      if (variant == "adaptive")
      {
        s1 <- (pv1real <= alpha)
        s2 <- (pv2real <= alpha)

        R1 <- sum(s1, na.rm = TRUE)
        R2 <- sum(s2, na.rm = TRUE)

        # the fraction of true nulls in study 1 among the selected in 2
        pi1 <- (1 + sum(pv1tag[s2] > alpha, na.rm = TRUE)) / (R2 * (1 - alpha))
        # the fraction of true nulls in study 2 among the selected in 1
        pi2 <- (1 + sum(pv2tag[s1] > alpha, na.rm = TRUE)) / (R1 * (1 - alpha))
      }

  if (directional_rep_claim)
    s12 <- (s1 & s2 & rep_claim_direction != "Opposite")
  else
    s12 <- (s1 & s2)

   ## ---- estimate pi1 and pi2 (when adaptive method is selected) ----

  if (general_dependency)
  {
    d1 <- sum(1 / (1:R2))
    d2 <- sum(1 / (1:R1))
  }
  else
  {
    d1 <- 1
    d2 <- 1
  }

  ## ---- the adjustment procedure ----
  z <- pmax(pi1 * d1 * pv1tag * R2 / w1, pi2 * d2 * pv2tag * R1 / (1 - w1))[s12]
  oz <- order(z, decreasing = TRUE)
  ozr <- order(oz)
  rv <- cummin((z / rank(z, ties.method = "max"))[oz])[ozr]
  rv <- pmin(rv, 1)
  names(rv) <- names(pv1)[s12]

  ## ---- the table of the intersection ----
  tbl_s12 <- data.frame(
      `name` = names(pv1)[s12],
      `p_value1` = pv1tag[s12],
      `p_value2` = pv2tag[s12],
      `r_value`  = rv
    )

  if (directional_rep_claim) tbl_s12$`Direction` <- rep_claim_direction[s12]

  if (!(variant == "non-adaptive")) tbl_s12$`Significant` <- ifelse(rv <= alpha,"*","")

  ## ---- generate the output ----

  cl <- match.call()
  inputs <- list(
    w1 = w1,
    input_type = input_type,
    general_dependency = general_dependency,
    directional_rep_claim = directional_rep_claim,
    variant = variant,
    alpha = alpha)

  output <- list(
    call = cl,
    inputs = inputs,
    results_table = tbl_s12,
    selected1 = pv1real[s1],
    selected2 = pv2real[s2],
    n_selected1 = R1,
    n_selected2 = R2,
    pi1 = if (variant == "adaptive") pi1 else NULL,
    pi2 = if (variant == "adaptive") pi2 else NULL
  )

  class(output) <- "radjust"
  return(output)
  ## ---- END ----
}

#' @exportClass radjust

#' @export
print.radjust <- function (x, digits_df = max(3L, getOption("digits") - 1L), ...)
{
  if (!inherits(x, "radjust"))
    warning("calling print.radjust(<fake-radjust-object>) ...")
  cat("\n")
  cat("\tReplicability Analysis\n")
  cat("\nCall:\n", paste(deparse(x$call), sep = "\n", collapse = "\n"), "\n\n", sep = "")
  cat("Selection (", if (x$inputs$variant == "non-adaptive") "as-inputed"  else x$inputs$variant,"):\n", sep = "")
  cat(x$n_selected1," features selected in study 1.\n", sep = "")
  cat(x$n_selected2," features selected in study 2.\n", sep = "")
  cat(nrow(x$results_table)," features selected in both studies.\n", sep = "")
  if (x$inputs$variant == "adaptive")
  {
    cat("\nEstimates for fraction of nulls among the selected in the other study:\n", sep = "")
    cat(x$pi1," in study 1.\n", sep = "")
    cat(x$pi2," in study 2.\n", sep = "")
  }
  cat("\nFeatures selected in both studies:\n", sep = "")
  print.data.frame(x$results_table, row.names = F, digits = digits_df,...)
  if (x$inputs$variant != "non-adaptive")
    cat("\n", sum(x$results_table$Significant == "*")," features are discovered in the ",
        if (x$inputs$directional_rep_claim) "directional" else "",
        " replicability analysis (alpha = ",x$inputs$alpha,").\n", sep = "")
  invisible(x)
}

#' @title Check Whether a Vector Has All\\Some Names
#' @description NAs in the values are allowed.
#' @examples
#' radjust:::has_some_names(c(a = 1, b = 2, 3))
#' radjust:::has_some_names(c(a = 1, b = 2, NA))
#' radjust:::has_all_names(c(a = 1, b = 2, 3))
#' radjust:::has_all_names(c(a = 1, b = 2, NA))
#' @keywords internal
has_some_names <- function(x) {!is.null(names(x))}

#' @rdname has_some_names
#' @keywords internal
has_all_names <- function(x) {!is.null(names(x)) & all(names(x)!="")}
Any scripts or data that you put into this service are public.
radjust documentation built on May 2, 2019, 3:40 p.m.
rdrr.io home R language documentation Run R code online
CRAN packages Bioconductor packages R-Forge packages GitHub packages
Note that we can't provide technical support on individual packages. You should contact the package authors for that.
radjust
Replicability Adjusted p-Values for Two Independent Studies with Multiple Endpoints

R/radjust_sym.R
In radjust: Replicability Adjusted p-Values for Two Independent Studies with Multiple Endpoints

Defines functions radjust_sym has_some_names has_all_names

Documented in has_all_names has_some_names radjust_sym

Try the radjust package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

radjust Replicability Adjusted p-Values for Two Independent Studies with Multiple Endpoints

R/radjust_sym.R In radjust: Replicability Adjusted p-Values for Two Independent Studies with Multiple Endpoints

Defines functions radjust_sym has_some_names has_all_names

Documented in has_all_names has_some_names radjust_sym

Try the radjust package in your browser

R Package Documentation

Browse R Packages

We want your feedback!

radjust
Replicability Adjusted p-Values for Two Independent Studies with Multiple Endpoints

R/radjust_sym.R
In radjust: Replicability Adjusted p-Values for Two Independent Studies with Multiple Endpoints