```r
knitr::opts_chunk$set(
  echo = TRUE,
  message = FALSE,
  warning = FALSE,
  collapse = TRUE,
  comment = "#>"
)
```
In clinical practice and research, evaluating diagnostic test performance typically requires a "gold standard" reference test to determine the true disease status. However, in many situations a perfect gold standard is unavailable, expensive, invasive, or unethical to perform. The `nogoldstandard` module in the ClinicoPath package provides methods for analyzing the performance of multiple diagnostic tests without requiring a gold standard reference.
This vignette demonstrates how to use the `nogoldstandard` module to:

- Estimate disease prevalence and the sensitivity and specificity of each test without a gold standard
- Compare the five available analysis methods
- Obtain bootstrap confidence intervals for the estimates
- Interpret the resulting tables and plots
When no perfect reference test exists, researchers typically face unsatisfactory choices, such as treating an imperfect test as if it were a gold standard or restricting the analysis to cases that can be verified.
Statistical methods for "no gold standard" analysis offer an alternative approach, allowing estimation of test performance metrics by using the patterns of agreement and disagreement among multiple imperfect tests.
The `nogoldstandard` module implements five different approaches:
Latent Class Analysis (LCA): Assumes disease status is an unobserved (latent) variable and models the relationship between this latent variable and the observed test results.
Composite Reference: Creates a reference standard by taking the majority result across all tests as the "true" status (this and the next two rule-based approaches are illustrated in the sketch after this list).
All Tests Positive: Considers disease present only when all tests are positive (high specificity approach).
Any Test Positive: Considers disease present when any test is positive (high sensitivity approach).
Bayesian Analysis: Uses prior distributions and an EM algorithm to estimate parameters, potentially incorporating prior knowledge.
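To make the rule-based approaches concrete, the following stand-alone sketch shows how composite (majority), all-positive, and any-positive reference standards could be derived by hand from a small matrix of binary test results. The toy matrix and variable names are illustrative only and are not part of the module.

```r
# Illustration only: deriving the three rule-based reference standards
# from a matrix of binary test results (1 = positive)
tests <- cbind(
  test1 = c(1, 1, 0, 0, 1),
  test2 = c(1, 0, 0, 1, 1),
  test3 = c(1, 1, 0, 0, 0)
)

n_pos <- rowSums(tests)

composite_ref <- as.integer(n_pos > ncol(tests) / 2)  # majority of tests positive
all_pos_ref   <- as.integer(n_pos == ncol(tests))     # every test positive
any_pos_ref   <- as.integer(n_pos > 0)                # at least one test positive

cbind(tests, composite = composite_ref, all = all_pos_ref, any = any_pos_ref)
```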
The `nogoldstandard` module is part of the ClinicoPath package:
```r
# Install from CRAN
install.packages("ClinicoPath")

# Or the development version
# install.packages("devtools")
devtools::install_github("sbalci/ClinicoPath")
```
Let's create a sample dataset with 4 diagnostic tests performed on 200 patients:
```r
set.seed(123)

n <- 200  # Number of patients

# True disease status (unknown in practice)
true_status <- rbinom(n, 1, 0.3)  # 30% prevalence

# Create 4 imperfect tests with different sensitivity/specificity
test1 <- ifelse(true_status == 1,
                rbinom(n, 1, 0.90),   # 90% sensitivity
                rbinom(n, 1, 0.10))   # 90% specificity
test2 <- ifelse(true_status == 1,
                rbinom(n, 1, 0.75),   # 75% sensitivity
                rbinom(n, 1, 0.05))   # 95% specificity
test3 <- ifelse(true_status == 1,
                rbinom(n, 1, 0.85),   # 85% sensitivity
                rbinom(n, 1, 0.15))   # 85% specificity
test4 <- ifelse(true_status == 1,
                rbinom(n, 1, 0.70),   # 70% sensitivity
                rbinom(n, 1, 0.05))   # 95% specificity

# Convert to categorical format
test1 <- factor(ifelse(test1 == 1, "pos", "neg"))
test2 <- factor(ifelse(test2 == 1, "pos", "neg"))
test3 <- factor(ifelse(test3 == 1, "pos", "neg"))
test4 <- factor(ifelse(test4 == 1, "pos", "neg"))

# Create data frame
data <- data.frame(
  caseID = 1:n,
  test1 = test1,
  test2 = test2,
  test3 = test3,
  test4 = test4
)

# View the first few rows
head(data)
```
Now let's analyze this data using the Latent Class Analysis method:
```r
library(ClinicoPath)

result <- nogoldstandard(
  data = data,
  test1 = "test1", test1Positive = "pos",
  test2 = "test2", test2Positive = "pos",
  test3 = "test3", test3Positive = "pos",
  test4 = "test4", test4Positive = "pos",
  test5 = NULL,
  method = "latent_class"
)
```
The result will contain:

- The estimated disease prevalence
- Sensitivity and specificity estimates for each test
- Plots summarizing the results, including the test agreement plot
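In jamovi the tables are rendered automatically; from R, printing `result` shows all tables, and individual tables can be extracted as data frames. The sketch below assumes the prevalence table is accessible as `result$prevalence`, the accessor used later in this vignette; other table names may differ.

```r
# Print all result tables
result

# Extract the prevalence table as a data frame
# (the same pattern is used later in this vignette)
result$prevalence$asDF()
```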
One of the key features of the `nogoldstandard` module is the ability to compute confidence intervals using bootstrap resampling, which provides a measure of uncertainty around the estimated parameters.
Bootstrap resampling involves:

1. Drawing many samples of the same size from the observed data, with replacement
2. Re-running the chosen analysis on each resampled dataset
3. Taking percentiles of the resulting estimates as the confidence interval
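To make the idea concrete, here is a minimal stand-alone sketch of a percentile bootstrap for a simple statistic, the positivity rate of `test1` in the simulated data; the module applies the same resample-and-recompute logic to its prevalence, sensitivity, and specificity estimates.

```r
set.seed(456)
nboot <- 1000
boot_stats <- numeric(nboot)

for (b in seq_len(nboot)) {
  idx <- sample(nrow(data), replace = TRUE)          # resample patients
  boot_stats[b] <- mean(data$test1[idx] == "pos")    # recompute the statistic
}

# Percentile 95% confidence interval
quantile(boot_stats, c(0.025, 0.975))
```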
The enhanced version of the `nogoldstandard` module includes detailed progress reporting during bootstrap analysis:
```r
result_with_ci <- nogoldstandard(
  data = data,
  test1 = "test1", test1Positive = "pos",
  test2 = "test2", test2Positive = "pos",
  test3 = "test3", test3Positive = "pos",
  test4 = "test4", test4Positive = "pos",
  method = "latent_class",
  bootstrap = TRUE,   # Enable bootstrap
  nboot = 1000,       # Number of bootstrap samples
  alpha = 0.05        # For 95% confidence intervals
)
```
When running this analysis, you'll see progress updates in the console:
```
=== Bootstrap Analysis ===
Starting bootstrap with 1000 iterations for latent_class method
Estimating confidence intervals for prevalence
  50/1000 (5.0%) - 50 successful, 0 errors - 12.3 sec elapsed, ~234.7 sec remaining
  100/1000 (10.0%) - 100 successful, 0 errors - 24.8 sec elapsed, ~223.2 sec remaining
  ...
  950/1000 (95.0%) - 942 successful, 8 errors - 236.5 sec elapsed, ~12.5 sec remaining
  1000/1000 (100.0%) - 991 successful, 9 errors - 249.3 sec elapsed, ~0.0 sec remaining

=== Bootstrap Complete ===
Total time: 249.3 seconds (4.01 iterations/sec)
Successful iterations: 991 (99.1%)
Failed iterations: 9 (0.9%)
Confidence interval (95.0%): [0.2145, 0.2987]
```
The time required for bootstrap analysis depends on several factors:
| Dataset Size | 100 Iterations | 1,000 Iterations | 10,000 Iterations |
|--------------|----------------|------------------|-------------------|
| Small (<100 obs, 2-3 tests) | 5-30 sec | 30 sec - 5 min | 5-50 min |
| Medium (100-1,000 obs, 3-4 tests) | 30 sec - 2 min | 3-20 min | 30 min - 3 hrs |
| Large (>1,000 obs, 5 tests) | 1-5 min | 10-60 min | 1-8 hrs |
The analysis method also affects duration:

- Latent Class Analysis: slowest (multiple model-fitting attempts)
- Bayesian Analysis: moderate (iterative EM algorithm)
- Composite/All/Any Test: fastest (simple calculations)
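Before committing to a large `nboot`, one option is to time a single run without bootstrap and multiply by the planned number of iterations. A rough sketch (the projection ignores per-iteration variation):

```r
# Time one analysis run without bootstrap to gauge the per-iteration cost
one_run <- system.time(
  nogoldstandard(
    data = data,
    test1 = "test1", test1Positive = "pos",
    test2 = "test2", test2Positive = "pos",
    test3 = "test3", test3Positive = "pos",
    test4 = "test4", test4Positive = "pos",
    method = "latent_class"
  )
)

# Very rough projection for 1000 bootstrap iterations
one_run["elapsed"] * 1000
```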
Let's compare the results from different analysis methods:
```r
# Run the analysis with each method
methods <- c("latent_class", "composite", "all_positive", "any_positive", "bayesian")
results <- list()

for (method in methods) {
  results[[method]] <- nogoldstandard(
    data = data,
    test1 = "test1", test1Positive = "pos",
    test2 = "test2", test2Positive = "pos",
    test3 = "test3", test3Positive = "pos",
    test4 = "test4", test4Positive = "pos",
    method = method
  )
}

# Extract prevalence estimates
prevalence_estimates <- sapply(results, function(x) {
  x$prevalence$asDF()$estimate[1]
})

print(prevalence_estimates)
```
Different methods may produce different estimates. In general:

- "All Tests Positive" classifies the fewest patients as diseased and therefore tends to give the lowest prevalence estimate.
- "Any Test Positive" classifies the most patients as diseased and therefore tends to give the highest prevalence estimate.
- The Composite Reference, Latent Class, and Bayesian approaches usually fall between these extremes.

The estimated prevalence represents the proportion of the population expected to have the disease. Because each method defines "disease present" differently, prevalence estimates should be interpreted and reported together with the method used to obtain them.
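Because this vignette simulates the data, the true prevalence is known, so each method's estimate can be checked against it (this is only possible in a simulation):

```r
# True prevalence in the simulated cohort (unknown in real applications)
mean(true_status)

# Difference between each method's estimate and the simulated truth
prevalence_estimates - mean(true_status)
```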
For each test, the module estimates:

- Sensitivity: the probability of a positive result in diseased patients
- Specificity: the probability of a negative result in non-diseased patients
- Bootstrap confidence intervals for both, when bootstrap is enabled
Sensitivity and specificity estimates can be used to compare the tests with one another, to judge which tests are better suited to ruling disease in or out, and to inform how the tests might be combined in practice.
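In this simulated example the true disease status is available, so the module's estimates can be compared with the empirical sensitivity and specificity computed directly from the simulation. A sketch for `test1`:

```r
# Empirical sensitivity and specificity of test1 against the simulated truth
sens1 <- mean(data$test1[true_status == 1] == "pos")
spec1 <- mean(data$test1[true_status == 0] == "neg")
c(sensitivity = sens1, specificity = spec1)
```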
The test agreement plot shows the proportion of agreement between each pair of tests. This visualization helps identify pairs of tests that behave very similarly (and may therefore violate the conditional independence assumption discussed below) as well as tests that disagree markedly with the others.
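The same pairwise agreement proportions can be computed by hand from the example data; a minimal sketch:

```r
# Pairwise proportion of agreement between tests
test_cols <- c("test1", "test2", "test3", "test4")
agreement <- outer(
  test_cols, test_cols,
  Vectorize(function(a, b) mean(data[[a]] == data[[b]]))
)
dimnames(agreement) <- list(test_cols, test_cols)
round(agreement, 2)
```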
The `nogoldstandard` module can handle missing test results, so patients with incomplete test panels do not have to be excluded before the analysis.
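As an illustration, missingness can be introduced into the simulated data and the analysis re-run; the call below reuses the arguments shown earlier, and how much missingness is tolerable will depend on the chosen method.

```r
# Introduce missing results for a handful of patients in test4
data_missing <- data
data_missing$test4[sample(nrow(data_missing), 10)] <- NA

result_missing <- nogoldstandard(
  data = data_missing,
  test1 = "test1", test1Positive = "pos",
  test2 = "test2", test2Positive = "pos",
  test3 = "test3", test3Positive = "pos",
  test4 = "test4", test4Positive = "pos",
  method = "latent_class"
)
```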
If you have prior knowledge about disease prevalence or test characteristics, the Bayesian method can incorporate this information:
```r
# Example of Bayesian analysis with informative priors would be shown here
# This feature would require customization of the Bayesian method code
```
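As a purely conceptual sketch (not part of the module's API), prior knowledge about prevalence can be encoded as a Beta distribution; with a rule-based disease classification, a conjugate update shows how the prior and the observed counts would combine. All numbers below are hypothetical.

```r
# Conceptual illustration only: encoding a ~30% prevalence belief as a
# Beta(30, 70) prior and combining it with hypothetical observed counts
prior_alpha  <- 30    # prior "pseudo-positives"
prior_beta   <- 70    # prior "pseudo-negatives"
observed_pos <- 55    # hypothetical number of patients classified as diseased
observed_n   <- 200   # total number of patients

# Posterior mean prevalence under the conjugate Beta-binomial update
(prior_alpha + observed_pos) / (prior_alpha + prior_beta + observed_n)
```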
The default Latent Class Analysis assumes conditional independence between tests given the true disease status. If this assumption is violated, results may be biased (Albert & Dodd, 2004). Extensions that handle conditional dependence include random-effects latent class models and Bayesian models with pairwise covariance terms (Dendukuri & Joseph, 2001).
When possible, validate results against external information, such as a subset of cases verified with a definitive procedure, clinical follow-up, or published estimates of test performance.
Below is the implementation of the bootstrap function with progress reporting:
```r
.calculateBootstrapCI = function(data, method, nboot, alpha, type, test_index = NULL) {
    # Simple bootstrap implementation with progress indicators
    n <- nrow(data)
    boot_results <- numeric(nboot)

    # Show starting message
    message("\n=== Bootstrap Analysis ===")
    message(sprintf("Starting bootstrap with %d iterations for %s method", nboot, method))
    message(sprintf("Estimating confidence intervals for %s", type))
    if (!is.null(test_index)) {
        message(sprintf("Test index: %d", test_index))
    }

    # Progress tracking variables
    start_time <- Sys.time()
    last_update <- start_time
    update_interval <- max(1, floor(nboot / 20))  # Update ~20 times during the process
    success_count <- 0
    error_count <- 0

    for (b in 1:nboot) {
        # Resample data
        boot_indices <- sample(n, n, replace = TRUE)
        boot_data <- data[boot_indices, ]

        # Run analysis on bootstrap sample
        boot_result <- NULL
        tryCatch({
            if (method == "latent_class") {
                boot_result <- private$.runLCA(boot_data, names(data), NULL)
            } else if (method == "composite") {
                boot_result <- private$.runComposite(boot_data)
            } else if (method == "all_positive") {
                boot_result <- private$.runAllPositive(boot_data)
            } else if (method == "any_positive") {
                boot_result <- private$.runAnyPositive(boot_data)
            } else if (method == "bayesian") {
                boot_result <- private$.runBayesian(boot_data)
            }
            success_count <- success_count + 1
        }, error = function(e) {
            # Count errors but continue the bootstrap
            # (<<- is needed so the counter in the enclosing function is updated)
            error_count <<- error_count + 1
        })

        # Extract the relevant statistic
        if (!is.null(boot_result)) {
            if (type == "prevalence") {
                boot_results[b] <- boot_result$prevalence
            } else if (type == "sensitivity" && !is.null(test_index)) {
                boot_results[b] <- boot_result$sensitivities[test_index]
            } else if (type == "specificity" && !is.null(test_index)) {
                boot_results[b] <- boot_result$specificities[test_index]
            }
        } else {
            boot_results[b] <- NA
        }

        # Show progress updates
        current_time <- Sys.time()
        if (b %% update_interval == 0 || b == nboot ||
            as.numeric(difftime(current_time, last_update, units = "secs")) > 10) {

            elapsed <- as.numeric(difftime(current_time, start_time, units = "secs"))
            percent_done <- b / nboot * 100
            est_total <- elapsed / percent_done * 100
            est_remaining <- est_total - elapsed

            message(sprintf("  %d/%d (%.1f%%) - %d successful, %d errors - %.1f sec elapsed, ~%.1f sec remaining",
                            b, nboot, percent_done, success_count, error_count,
                            elapsed, est_remaining))
            last_update <- current_time
        }
    }

    # Show final statistics
    total_time <- as.numeric(difftime(Sys.time(), start_time, units = "secs"))
    message("\n=== Bootstrap Complete ===")
    message(sprintf("Total time: %.1f seconds (%.2f iterations/sec)", total_time, nboot / total_time))
    message(sprintf("Successful iterations: %d (%.1f%%)", success_count, success_count / nboot * 100))
    message(sprintf("Failed iterations: %d (%.1f%%)", error_count, error_count / nboot * 100))

    # Calculate the percentile CI
    boot_results <- boot_results[!is.na(boot_results)]
    if (length(boot_results) > 0) {
        ci <- quantile(boot_results, c(alpha / 2, 1 - alpha / 2), na.rm = TRUE)
        message(sprintf("Confidence interval (%.1f%%): [%.4f, %.4f]",
                        (1 - alpha) * 100, ci[1], ci[2]))
        return(list(lower = ci[1], upper = ci[2]))
    } else {
        message("WARNING: No valid bootstrap results obtained. Returning NA.")
        return(list(lower = NA, upper = NA))
    }
}
```
Hui SL, Walter SD. Estimating the error rates of diagnostic tests. Biometrics. 1980;36(1):167-171.
Joseph L, Gyorkos TW, Coupal L. Bayesian estimation of disease prevalence and the parameters of diagnostic tests in the absence of a gold standard. Am J Epidemiol. 1995;141(3):263-272.
Albert PS, Dodd LE. A cautionary note on the robustness of latent class models for estimating diagnostic error without a gold standard. Biometrics. 2004;60(2):427-435.
Collins J, Huynh M. Estimation of diagnostic test accuracy without full verification: a review of latent class methods. Stat Med. 2014;33(24):4141-4169.
Dendukuri N, Joseph L. Bayesian approaches to modeling the conditional dependence between multiple diagnostic tests. Biometrics. 2001;57(1):158-167.
```r
sessionInfo()
```