knitr::opts_chunk$set(echo = TRUE, cache = FALSE)
suppressPackageStartupMessages( library(tidyverse) ) suppressPackageStartupMessages( library(knitr) ) suppressPackageStartupMessages( library(furrr) ) suppressPackageStartupMessages( library(future) ) suppressPackageStartupMessages( library(simaerep) )
Perhaps there might be scenarios in which one prefers to use a parametric test vs the non-parametric bootstrap-based resampling method to calculate AE under-reporting. simaerep
provides a function that does bootstrap resampling of the entire study preserving the actual site parameters such as number of patients and visit_med75.
Using this pool we can check whether the p-values calculated by poisson.test()
represent accurate probabilities.
We simulate three studies with varying ae per visit rate and then use all functions we already introduced.
df_visit1 <- sim_test_data_study( n_pat = 1000, n_sites = 100, frac_site_with_ur = 0.2, ur_rate = 0.5, max_visit_mean = 20, max_visit_sd = 4, ae_per_visit_mean = 0.5 ) df_visit1$study_id <- "ae_per_visit: 0.5" df_visit2 <- sim_test_data_study( n_pat = 1000, n_sites = 100, frac_site_with_ur = 0.2, ur_rate = 0.5, max_visit_mean = 20, max_visit_sd = 4, ae_per_visit_mean = 0.2 ) df_visit2$study_id <- "ae_per_visit: 0.2" df_visit3 <- sim_test_data_study( n_pat = 1000, n_sites = 100, frac_site_with_ur = 0.2, ur_rate = 0.5, max_visit_mean = 20, max_visit_sd = 4, ae_per_visit_mean = 0.05 ) df_visit3$study_id <- "ae_per_visit: 0.05" df_visit <- bind_rows(df_visit1, df_visit2, df_visit3) df_site <- site_aggr(df_visit) df_sim_sites <- sim_sites(df_site, df_visit) df_visit df_site df_sim_sites
sim_studies() reproduces each study a hundred times using bootstrap resampling maintaining the number of patients and the visit_med75 of each site.
set.seed(1) plan(multisession, workers = 6) df_sim_studies <- sim_studies(df_visit, df_site, r = 100, parallel = TRUE, poisson_test = TRUE, prob_lower = TRUE, .progress = FALSE) df_sim_studies
get_ecd_values() uses the p-value distribution in the dataframe returned from sim_studies() to train a empirical cumulative distribution function for each study which is then used to calculate the probability of a specific p-value or lower from the poisson.test p-value returned from sim_sites()
Note: Dots on the edge of the graph that are cut off have zero value for the corresponding axis.
df_check_pval <- get_ecd_values(df_sim_studies, df_sim_sites, val_str = "pval") df_check_pval df_check_pval %>% ggplot(aes(log(pval, base = 10), log(pval_ecd, base = 10))) + geom_point(alpha = 0.5, size = 2) + facet_wrap(~ study_id) + geom_abline(slope = 1, linetype = 2) + coord_cartesian( xlim = c(-5,0), ylim = c(-5,0) ) + theme(aspect.ratio = 1)
We can see that the p-values from poisson.test (x-axis) more or less match the probability with which they are represented in the resampled studies (y-axis), but there is some skewness to their relationship.
Note: Dots on the edge of the graph that are cut off have zero value for the corresponding axis. For the tie simulations using 1000 repeats the smallest value greater zero we get for prob_low is 0.001 (1e-3).
df_check_prob <- get_ecd_values(df_sim_studies, df_sim_sites, val_str = "prob_low") df_check_prob df_check_prob %>% ggplot(aes(log(prob_low, base = 10), log(prob_low_ecd, base = 10))) + geom_point(alpha = 0.5, size = 2) + facet_wrap(~ study_id) + geom_abline(slope = 1, linetype = 2) + coord_cartesian( xlim = c(-5,0), ylim = c(-5,0) ) + theme(aspect.ratio = 1)
Here we see that the probabilities from the site simulations (x-axis) perfectly match probability with which they are represented in the resampled studies (y-axis).
We see that the bootstrap method gives us more accurate probabilities than the p-values derived from poisson.test even though the AE generation in the sample data follows a strict poisson process. For real clinical trial data with different types of studies for which it is not clear whether AE generation is truly based on a pure poisson process we recommend to use the bootstrap method. If poisson.test shall be used we recommend checking the applicability as we just demonstrated on clinical data set.
plan(sequential)
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.