# R/RcppExports.R

# Generated by using Rcpp::compileAttributes() -> do not edit by hand
# Generator token: 10BE3573-1514-4C36-9D1C-5A225CD40393

#' Compute Binned Classification Matrix for AUC Calculation
#'
#' @description Preprocesses prediction vectors by binning values and creating a matrix structure
#'              optimized for fast AUC computation. This function:
#'              1. Cleans and combines prediction vectors
#'              2. Performs range-based binning
#'              3. Computes background suitability data histogram
#'              4. Generates test prediction comparison matrix
#'
#' @param test_prediction Numeric vector (arma::vec) of prediction values for test data
#' @param prediction Numeric vector (arma::vec) of prediction values for background suitability data
#' @param n_bins Integer specifying number of bins to use for discretization (default = 1000)
#'
#' @return A numeric matrix where:
#'          - Rows correspond to test observations
#'          - Columns correspond to bins
#'          - Values represent bin indices in descending order (n_bins, n_bins-1, ..., 1)
#'
#' @details
#' This function prepares data for efficient AUC computation by:
#' 1. Cleaning: Removes non-finite values from background suitability predictions
#' 2. Combining: Merges background suitability and test predictions
#' 3. Binning: Discretizes values into [1, n_bins] range using:
#'        bin = floor((value - min) * (n_bins-1)/range) + 1
#' 4. Histogram: Computes background suitability data distribution across bins (parallelized)
#' 5. Matrix construction: Creates comparison matrix for test observations
#'
#' The output matrix enables fast computation of classification metrics by allowing
#' vectorized comparisons during AUC calculation.
#' This follows the same rationale as the implementation in the ntbox R package.
#'
#' @section Parallelization:
#' Uses OpenMP parallelization for:
#' - Binning computations
#' - Histogram counting
#'
#' @section Input Requirements:
#' - Input vectors must be non-empty
#' - background suitability predictions must contain at least one finite value
#' - Prediction range must be > 0 (non-constant values)
#'
#' @examples
#' # Internal routine: no R-level wrapper is exported, so this is not run
#' \dontrun{
#' bg_pred <- runif(1000)
#' test_pred <- runif(500)
#' bin_matrix <- bigclass_matrix(test_pred, bg_pred, n_bins = 500)
#' dim(bin_matrix)  # 500 rows x 500 columns
#' }
#'
#' @seealso \code{\link{auc_parallel}} for the main AUC computation function
NULL
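The binning formula described above can be sketched in plain R. This is an illustrative translation only: `bin_values` is a hypothetical helper, not part of the package, which performs this step in C++ with OpenMP.

```r
# Pure-R sketch of the range-based binning step (hypothetical helper):
bin_values <- function(values, n_bins = 1000) {
  v <- values[is.finite(values)]        # 1. drop non-finite values
  rng <- max(v) - min(v)
  stopifnot(rng > 0)                    # constant predictions are invalid input
  # 3. map each value into the integer range [1, n_bins]
  floor((v - min(v)) * (n_bins - 1) / rng) + 1
}

bin_values(c(0, 0.25, 0.5, 1), n_bins = 1000)  # 1 250 500 1000
```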

#' Compute AUC Metrics for single bootstrap iteration
#'
#' @description Calculates partial and complete AUC metrics for a single bootstrap sample.
#' This function is the computational core for bootstrap AUC estimation.
#'
#' @param big_classpixels Numeric matrix where:
#'          - Rows represent test observations
#'          - Columns represent bins
#'          - Values are bin indices (n_bins, n_bins-1, ... 1)
#' @param fractional_area Numeric vector of cumulative fractional areas (x-axis values for ROC)
#' @param test_prediction Numeric vector of binned test predictions (output from binning process)
#' @param n_samp Integer specifying number of test observations to sample
#' @param error_sens Double specifying sensitivity threshold for partial AUC (1 - error_rate)
#' @param compute_full_auc Boolean indicating whether to compute complete AUC
#'
#' @return A numeric matrix with 1 row and 4 columns containing:
#' \itemize{
#'   \item Column 1: Complete AUC (NA if compute_full_auc = FALSE)
#'   \item Column 2: Partial AUC for model (sensitivity > error_sens)
#'   \item Column 3: Partial AUC for random model (reference)
#'   \item Column 4: Ratio of model AUC to random AUC (model/reference)
#' }
#'
#' @details
#' The function performs these steps:
#' 1. Randomly samples test predictions (without replacement)
#' 2. Computes omission matrix by comparing bin indices with sampled predictions
#' 3. Calculates sensitivity as 1 - mean omission rate per bin
#' 4. Filters bins where sensitivity exceeds threshold
#' 5. Computes partial AUC for model and random reference
#' 6. Optionally computes complete AUC using all bins
#' 7. Calculates AUC ratio (model/reference)
#'
#' Special cases:
#' - Returns matrix of NAs if < 2 bins meet sensitivity threshold
#' - Returns zeros if either partial AUC is 0 (to prevent division by zero)
#'
#' @section Parallelization:
#' Uses OpenMP parallelization for:
#' - Omission matrix computation
#'
#' @section Algorithm Notes:
#' - Partial AUC focuses on high-sensitivity region (error_sens to 1.0)
#' - Random reference AUC is the theoretical AUC for uniform predictions
#' - Binning enables efficient vectorized comparisons
#'
#' @seealso \code{\link{auc_parallel}} for the main bootstrap function,
#'          \code{\link{trap_roc}} for AUC calculation method
NULL
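The per-iteration steps above can be sketched in base R, under the assumption that bin thresholds run from `n_bins` down to 1 in step with increasing `fractional_area`. `one_iteration` is a hypothetical helper for illustration; the package implements this step in C++.

```r
# Illustrative pure-R sketch of a single bootstrap iteration (hypothetical helper):
one_iteration <- function(fractional_area, test_bins, n_samp, error_sens) {
  n_bins <- length(fractional_area)
  # 1. sample test predictions without replacement
  samp <- sample(test_bins, n_samp)
  # 2-3. sensitivity at each bin threshold (thresholds n_bins, ..., 1,
  #      assumed aligned with increasing fractional_area)
  thresholds <- n_bins:1
  sens <- vapply(thresholds, function(b) mean(samp >= b), numeric(1))
  # 4. keep bins whose sensitivity exceeds the threshold
  keep <- sens > error_sens
  if (sum(keep) < 2) return(c(NA_real_, NA_real_, NA_real_))
  x <- fractional_area[keep]
  # 5. trapezoidal partial AUC for the model and the random (diagonal) reference
  auc_model <- sum(diff(x) * (head(sens[keep], -1) + tail(sens[keep], -1)) / 2)
  auc_rand  <- sum(diff(x) * (head(x, -1) + tail(x, -1)) / 2)
  # 7. ratio, guarding against division by zero
  c(auc_model, auc_rand, if (auc_rand > 0) auc_model / auc_rand else 0)
}
```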

#' Execute parallel bootstrap iterations for AUC Calculation
#'
#' @description Coordinates the parallel execution of multiple bootstrap iterations for AUC metrics computation.
#' This function serves as the parallel driver for the main AUC calculation workflow.
#'
#' @param big_classpixels Numeric matrix of bin comparison values (from \code{\link{bigclass_matrix}})
#' @param fractional_area Numeric vector of cumulative fractional areas (x-axis values)
#' @param test_prediction Numeric vector of binned test predictions
#' @param n_samp Integer specifying number of test observations to sample per iteration
#' @param error_sens Double specifying sensitivity threshold for partial AUC
#' @param n_iterations Integer specifying number of bootstrap iterations
#' @param compute_full_auc Boolean indicating whether to compute complete AUC
#'
#' @return A numeric matrix with `n_iterations` rows and 4 columns containing:
#' \itemize{
#'   \item auc_complete: Complete AUC (NA when compute_full_auc = FALSE)
#'   \item auc_pmodel: Partial AUC for the model
#'   \item auc_prand: Partial AUC for random model
#'   \item ratio: Ratio of model AUC to random AUC
#' }
#'
#' @details
#' This function manages the bootstrap process by:
#' 1. Creating a results matrix to store outputs from all iterations
#' 2. Using OpenMP to parallelize iterations across available cores
#' 3. For each iteration:
#'    - Calls \code{\link{calc_aucDF_arma}} to compute AUC metrics
#'    - Stores results in the output matrix
#'
#' @section Parallel Execution:
#' - Iterations are distributed across available CPU cores
#' - Each thread computes one bootstrap iteration independently
#' - Thread-safe through:
#'   * Private result storage per iteration
#'   * Atomic writes to results matrix
#'
#' @section Performance Notes:
#' - Scaling is approximately linear with core count
#' - Memory overhead is minimal (shared input data, private result rows)
#' - Critical for efficient bootstrap implementation
#'
NULL

#' Calculate Area Under Curve (AUC) using trapezoidal rule
#'
#' @description Computes the area under a curve using the trapezoidal rule of numerical integration.
#'
#' @param x Numeric vector (arma::vec) of x-coordinates (should be sorted in increasing order)
#' @param y Numeric vector (arma::vec) of y-coordinates corresponding to x-coordinates
#'
#' @return The computed area under the curve as a double-precision value.
#'
#' @details
#' The trapezoidal rule approximates the area under the curve by dividing it into trapezoids.
#' For each pair of adjacent points (x[i], y[i]) and (x[i+1], y[i+1]), it calculates the area of the trapezoid formed.
#' The total AUC is the sum of all these individual trapezoid areas.
#'
#' Special cases:
#' - Returns 0 if there are fewer than 2 points (no area can be calculated)
#' - Handles both increasing and decreasing x values (though typically x should be increasing for ROC curves)
#'
#' @examples
#' # R code example:
#' x <- c(0, 0.5, 1, 1.5, 2)
#' y <- c(0, 0.7, 0.9, 0.95, 1)
#' trap_roc(x, y)
#'
#' @seealso \code{\link{integrate}} for R's built-in integration functions
#' @export
trap_roc <- function(x, y) {
    .Call('_fpROC_trap_roc', PACKAGE = 'fpROC', x, y)
}

#' Parallel AUC and partial AUC calculation with optimized memory usage
#'
#' @description Computes bootstrap estimates of partial and complete AUC using parallel processing and optimized binning.
#'
#' @param test_prediction Numeric vector of test prediction values
#' @param prediction Numeric vector of model predictions (background suitability data)
#' @param threshold Percentage threshold for partial AUC calculation (default = 5.0)
#' @param sample_percentage Percentage of test data to sample in each iteration (default = 50.0)
#' @param iterations Number of bootstrap iterations (default = 500)
#' @param compute_full_auc Boolean indicating whether to compute complete AUC (default = TRUE)
#' @param n_bins Number of bins for discretization (default = 500)
#'
#' @return A numeric matrix with `iterations` rows and 4 columns containing:
#' \itemize{
#'   \item auc_complete: Complete AUC (NA when compute_full_auc = FALSE)
#'   \item auc_pmodel: Partial AUC for the model (sensitivity > 1 - threshold/100)
#'   \item auc_prand: Partial AUC for random model (reference)
#'   \item ratio: Ratio of model AUC to random AUC (model/reference)
#' }
#'
#' @details
#' This function implements a highly optimized AUC calculation pipeline:
#' 1. Cleans input data (removes non-finite values)
#' 2. Combines background and test predictions
#' 3. Performs range-based binning (discretization)
#' 4. Computes cumulative distribution of background predictions
#' 5. Runs bootstrap iterations in parallel:
#'    - Samples test predictions
#'    - Computes sensitivity-specificity curves
#'    - Calculates partial and complete AUC
#'
#' Key optimizations:
#' - OpenMP parallelization for binning and bootstrap
#' - Vectorized operations using Armadillo
#'
#' @section Partial AUC:
#' The partial AUC focuses on the high-sensitivity region defined by:
#' Sensitivity > 1 - (threshold/100)
#'
#' @examples
#' # Basic usage with random data
#' set.seed(123)
#' bg_pred <- runif(1000)   # bg predictions
#' test_pred <- runif(500)     # Test predictions
#'
#' # Compute only partial AUC metrics (100 iterations for the example)
#' results <- auc_parallel(test_pred, bg_pred,
#'                             threshold = 5.0,
#'                             compute_full_auc = FALSE,
#'                             iterations = 100)
#'
#' # View first 5 iterations
#' head(results, 5)
#'
#' # Summarize results (assume complete AUC was not computed)
#' summary <- summarize_auc_results(results, has_complete_auc = FALSE)
#'
#' # Interpretation:
#' # - auc_pmodel: Model's partial AUC (higher is better)
#' # - auc_prand: Random model's partial AUC
#' # - ratio: Model AUC / Random AUC (>1 indicates better than random)
#'
#' # Compute both partial and complete AUC
#' full_results <- auc_parallel(test_pred, bg_pred,
#'                                  compute_full_auc = TRUE,
#'                                  iterations = 100)
#'
#'
#' @seealso \code{\link{summarize_auc_results}} for results processing,
#'          \code{\link{trap_roc}} for integration method
#' @export
auc_parallel <- function(test_prediction, prediction, threshold = 5.0, sample_percentage = 50.0, iterations = 500L, compute_full_auc = TRUE, n_bins = 500L) {
    .Call('_fpROC_auc_parallel', PACKAGE = 'fpROC', test_prediction, prediction, threshold, sample_percentage, iterations, compute_full_auc, n_bins)
}

#' Summarize Bootstrap AUC Results
#'
#' Computes aggregated statistics from bootstrap AUC iterations. This function processes
#' the raw output of \code{\link{auc_parallel}} to produce meaningful summary metrics of the
#' partial ROC test.
#'
#' @param auc_results Numeric matrix output from \code{\link{auc_parallel}}
#'        (dimensions: n_iterations x 4)
#' @param has_complete_auc Boolean indicating whether complete AUC was computed in the
#'        bootstrap iterations (affects first summary column)
#'
#' @return A numeric matrix with 1 row and 5 columns containing:
#' \itemize{
#'   \item mean_complete_auc: Mean of complete AUC values (NA if not computed)
#'   \item mean_pauc: Mean of partial AUC values for the model
#'   \item mean_pauc_rand: Mean of partial AUC values for random model (reference)
#'   \item mean_auc_ratio: Mean of AUC ratios (model/random)
#'   \item prop_ratio_gt1: Proportion of iterations where ratio > 1 (performance better than random)
#' }
#'
#' @details
#' This function:
#' 1. Filters iterations with non-finite ratio values (handles bootstrap failures)
#' 2. Computes means for each AUC metric across valid iterations
#' 3. Calculates the proportion of iterations where the model outperforms random
#'    (ratio > 1); this proportion is how the p-value of the test is computed.
#'
#' Special handling:
#' - Returns all NAs if no valid iterations exist
#' - First column (complete AUC) depends on \code{has_complete_auc} parameter
#' - Handles NaN/Inf values safely by filtering
#'
#' @section Interpretation Guide:
#' - \code{mean_auc_ratio > 1}: Model generally outperforms random predictions
#' - \code{prop_ratio_gt1 = 0.9}: 90% of iterations showed better-than-random performance
#' - \code{mean_pauc}: Absolute performance measure (higher = better discrimination)
#'
#' @examples
#' # Basic usage with simulated results
#' set.seed(123)
#' # Simulate bootstrap output (100 iterations x 4 metrics)
#' auc_matrix <- cbind(
#'   complete = rnorm(100, 0.85, 0.05),  # Complete AUC
#'   pmodel   = rnorm(100, 0.15, 0.03),  # Partial model AUC
#'   prand    = rnorm(100, 0.08, 0.02),  # Partial random AUC
#'   ratio    = rnorm(100, 1.9, 0.4)     # Ratio
#' )
#'
#' # Summarize results (assuming complete AUC was computed)
#' summary <- summarize_auc_results(auc_matrix, has_complete_auc = TRUE)
#'
#' # Typical output interpretation:
#' # - mean_complete_auc: 0.85 (good overall discrimination)
#' # - mean_pauc: 0.15 (absolute partial AUC)
#' # - mean_pauc_rand: 0.08 (random expectation)
#' # - mean_pAUCratio: 1.9 (model 90% better than random)
#' # - p_value: 0.98 (98% of iterations showed model > random)
#'
#' # Real-world usage with actual AUC function output
#' \donttest{
#' # First run bootstrap AUC calculation
#' bg_pred <- runif(1000)
#' test_pred <- runif(500)
#' auc_output <- auc_parallel(
#'   test_prediction = test_pred,
#'   prediction = bg_pred,
#'   iterations = 100
#' )
#'
#' # Then summarize results (complete AUC not computed in this case)
#' summary <- summarize_auc_results(auc_output, has_complete_auc = FALSE)
#'
#' # Print summary statistics
#' colnames(summary) <- c("mean_complete_auc", "mean_pauc",
#'                       "mean_pauc_rand", "mean_pAUCratio", "p_value")
#' print(summary)
#'
#' # Expected output structure:
#' #      mean_complete_auc mean_pauc mean_pauc_rand mean_pAUCratio    p_value
#' # [1,]               NA     0.152          0.083       1.83           0.94
#' }
#'
#' @seealso \code{\link{auc_parallel}} for generating the input matrix
#' @export
summarize_auc_results <- function(auc_results, has_complete_auc) {
    .Call('_fpROC_summarize_auc_results', PACKAGE = 'fpROC', auc_results, has_complete_auc)
}
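The aggregation steps above can be sketched in base R. `summarize_r` is a hypothetical stand-in for the exported C++ implementation:

```r
# Pure-R sketch of the aggregation performed by summarize_auc_results()
# (hypothetical helper; column order matches the auc_parallel() output):
summarize_r <- function(auc_results, has_complete_auc = TRUE) {
  ok <- is.finite(auc_results[, 4])       # 1. drop failed iterations (non-finite ratio)
  if (!any(ok)) return(matrix(NA_real_, 1, 5))
  m <- auc_results[ok, , drop = FALSE]
  cbind(
    mean_complete_auc = if (has_complete_auc) mean(m[, 1]) else NA_real_,
    mean_pauc         = mean(m[, 2]),     # 2. means across valid iterations
    mean_pauc_rand    = mean(m[, 3]),
    mean_auc_ratio    = mean(m[, 4]),
    prop_ratio_gt1    = mean(m[, 4] > 1)  # 3. proportion beating random
  )
}
```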

# fpROC documentation built on Aug. 8, 2025, 6:47 p.m.