eval_reject_prob: Evaluate the rejection probability of a hypothesis test.
In Yu-Group/simChef: Intensive Computational Experiments Made Easy

Source file: R/evaluator-lib-inference.R


Description

Evaluate the probability of rejecting the null hypothesis across various levels of significance (possibly for multiple hypothesis tests, one for each feature).
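For a single test, the rejection probability at level alpha is estimated by the fraction of experimental replicates whose p-value is at most alpha. The base-R sketch below illustrates that quantity. It assumes a test rejects when p <= alpha. `reject_prob_at()` is a hypothetical helper written for illustration and is not part of simChef. Its `na_rm` argument is assumed to behave like `na.rm` in `mean()`.

```r
# Hypothetical helper (not part of simChef): empirical rejection probability
# of one test across replicates, assuming rejection when p <= alpha.
reject_prob_at <- function(pvals, alphas, na_rm = FALSE) {
  vapply(
    alphas,
    function(a) mean(pvals <= a, na.rm = na_rm),  # share of replicates rejecting
    numeric(1)
  )
}

# p-values for one feature across five made-up replicates
pvals <- c(0.001, 0.03, 0.2, 0.04, 0.6)
reject_prob_at(pvals, alphas = c(0.01, 0.05, 0.1))
# expected: 0.2 0.6 0.6

# Under this assumption, a missing p-value propagates unless na_rm = TRUE
reject_prob_at(c(0.01, NA), alphas = 0.05)                # NA
reject_prob_at(c(0.01, NA), alphas = 0.05, na_rm = TRUE)  # 1
```

Because the estimate is a mean of 0/1 rejection indicators, it can only take values that are multiples of one over the number of replicates.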

Usage

eval_reject_prob(
  fit_results,
  vary_params = NULL,
  nested_cols = NULL,
  feature_col = NULL,
  pval_col,
  group_cols = NULL,
  alphas = NULL,
  na_rm = FALSE
)

Arguments

`fit_results`	A tibble, as returned by `fit_experiment()`.
`vary_params`	A vector of `DGP` or `Method` parameter names that are varied in the `Experiment`.
`nested_cols`	(Optional) A character string or vector specifying the name of the column(s) in `fit_results` that need to be unnested before evaluating results. Default is `NULL`, meaning no columns in `fit_results` need to be unnested prior to computation.
`feature_col`	(Optional) A character string identifying the column in `fit_results` with the feature names or IDs.
`pval_col`	A character string identifying the column in `fit_results` with the estimated p-values. Each element in this column should be an array of length `p`, where `p` is the number of features and the feature order aligns with that of `feature_col`.
`group_cols`	(Optional) A character string or vector specifying the column(s) to group rows by before evaluating metrics. This is useful for assessing within-group metrics.
`alphas`	(Optional) Vector of significance levels at which to evaluate the rejection probability. By default, `alphas` is `NULL`, which evaluates the full empirical cumulative distribution of the p-values, i.e., the rejection probability is evaluated at all possible significance levels.
`na_rm`	A `logical` value indicating whether `NA` values should be stripped before the computation proceeds.
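With the default `alphas = NULL`, the rejection probability curve is the empirical cumulative distribution function (ECDF) of the p-values, which only changes value at observed p-values. The base-R sketch below uses `stats::ecdf()` to build that curve for one feature. Evaluating it at the sorted unique p-values is an assumption about which levels "all possible significance levels" refers to; the data are made up.

```r
# Sketch of the full rejection-probability curve for one feature.
# Assumption: the curve is evaluated at each distinct observed p-value.
pvals <- c(0.001, 0.03, 0.2, 0.04, 0.6)

Fhat <- stats::ecdf(pvals)         # Fhat(t) = proportion of pvals <= t
alpha_grid <- sort(unique(pvals))  # points where the step function jumps
curve <- data.frame(.alpha = alpha_grid, reject_prob = Fhat(alpha_grid))
print(curve)
# reject_prob: 0.2 0.4 0.6 0.8 1.0
```

Since the ECDF is constant between consecutive observed p-values, these points are enough to describe the rejection probability at every alpha in [0, 1].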

Value

A grouped tibble containing both identifying information and the rejection probability results aggregated over experimental replicates. Specifically, the identifier columns include `.dgp_name`, `.method_name`, any columns specified by `group_cols` and `vary_params`, and the feature column named by `feature_col`, if supplied. In addition, there are results columns `.alpha` and `reject_prob`, which respectively give the significance level and the estimated rejection probability (averaged across experimental replicates).
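The aggregation described above can be mimicked in base R on an already-unnested table with one row per replicate and feature. This sketch only illustrates the layout of the result (one row per DGP, method, feature, and `.alpha`). The toy data are made up, rejection is assumed to mean p <= alpha, and simChef's own implementation may differ (it returns a grouped tibble, not a `data.frame`).

```r
# Toy unnested results: 2 replicates x 1 DGP x 1 method x 2 features (made-up values)
long <- data.frame(
  .rep = rep(1:2, each = 2),
  .dgp_name = "DGP1",
  .method_name = "Method",
  feature = rep(c("featureA", "featureB"), times = 2),
  pval = c(0.01, 0.5, 0.04, 0.08)
)
alphas <- c(0.05, 0.1)

# Split by the identifier columns, then compute the share of replicates
# with pval <= alpha at each alpha (the rejection probability).
groups <- split(long, long[c(".dgp_name", ".method_name", "feature")], drop = TRUE)
pieces <- lapply(groups, function(g) {
  data.frame(
    .dgp_name = g$.dgp_name[1],
    .method_name = g$.method_name[1],
    feature = g$feature[1],
    .alpha = alphas,
    reject_prob = vapply(alphas, function(a) mean(g$pval <= a), numeric(1))
  )
})
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(result)
```

In this sketch, columns from `group_cols` or `vary_params` would simply be additional splitting columns alongside `.dgp_name`, `.method_name`, and the feature column.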

Examples

# generate example fit_results data for a feature selection problem
fit_results <- tibble::tibble(
  .rep = rep(1:2, times = 2),
  .dgp_name = c("DGP1", "DGP1", "DGP2", "DGP2"),
  .method_name = "Method",
  feature_info = lapply(
    1:4,
    FUN = function(i) {
      tibble::tibble(
        # feature names
        feature = c("featureA", "featureB", "featureC"),
        # true feature support
        true_support = c(TRUE, FALSE, TRUE),
        # estimated p-values
        pval = 10^(sample(-3:0, 3, replace = TRUE))
      )
    }
  )
)

# evaluate rejection probabilities for each feature across all possible values of alpha
eval_results <- eval_reject_prob(
  fit_results,
  nested_cols = "feature_info",
  feature_col = "feature",
  pval_col = "pval"
)

# evaluate rejection probability for each feature at specific values of alpha
eval_results <- eval_reject_prob(
  fit_results,
  nested_cols = "feature_info",
  feature_col = "feature",
  pval_col = "pval",
  alphas = c(0.05, 0.1)
)

Yu-Group/simChef documentation built on Feb. 27, 2025, 9:19 p.m.