knitr::opts_chunk$set(echo = TRUE, cache = FALSE, message = FALSE)
suppressPackageStartupMessages(library(dplyr)) suppressPackageStartupMessages(library(tibble)) suppressPackageStartupMessages(library(tidyr)) suppressPackageStartupMessages(library(simaerep))
With the latest 0.6.0
release we have added an alternative version of the {simaerep} algorithm that was coded using solely dbplyr
compatible table operations.
r
timesThis comes with the following advantages and disadvantages:
dbplyr
compatibility means that code execution can be done in a database back-end as opposed to in-memory.set.seed(1) df_visit <- sim_test_data_study( n_pat = 1000, # number of patients in study n_sites = 100, # number of sites in study frac_site_with_ur = 0.05, # fraction of sites under-reporting ur_rate = 0.4, # rate of under-reporting ae_per_visit_mean = 0.5 # mean AE per patient visit ) df_visit$study_id <- "A"
Here we use the standard version of the algorithm.
aerep_trad <- simaerep(df_visit) plot(aerep_trad)
To use the patient-level matching algorithm we set inframe=TRUE
and visit_med75=FALSE
.
The original algorithm uses fixed seeds before sampling while the inframe method does not. In order to obtain consistent results we need to manually set a seed.
set.seed(1) aerep_inframe <- simaerep( df_visit, inframe = TRUE, visit_med75 = FALSE )
The plot shows that for all sites 10/10 patients were used and none were excluded. We also observe that the site average has become more noisy as less patients are used to calculate the averages for the higher visit numbers.
plot(aerep_inframe)
The inframe method includes this noisier data but does not compare average event counts but event per visit rates.
We can find events_per_visit_site
and events_per_visit_study
in df_eval
. The latter is the average event rate
obtained in the simulation in which each patient has been resampled according to its maximum visit.
aerep_inframe$df_eval
We can also force the inframe method to use the visit_med75 this will pre-filter df_visit
, which adds an extra step
and decreases performance.
set.seed(1) aerep_inframe_visit_med75 <- simaerep( df_visit, inframe = TRUE, visit_med75 = TRUE ) plot(aerep_inframe_visit_med75)
We can demonstrate the database-backend compatibility by using a connection to a in memory duckdb
database. In order to set the number of replications we need to create a new table in our back-end that has one column with as many rows as the desired replications.
A lazy reference to this table can then be passed to the r
parameter.
con <- DBI::dbConnect(duckdb::duckdb(), dbdir = ":memory:") df_r <- tibble(rep = seq(1, 1000)) dplyr::copy_to(con, df_visit, "visit") dplyr::copy_to(con, df_r, "r") tbl_visit <- tbl(con, "visit") tbl_r <- tbl(con, "r") aerep <- simaerep(tbl_visit, r = tbl_r, visit_med75 = FALSE)
When inspecting df_eval
we see that it is still a lazy table object.
aerep$df_eval
We can convert it to sql code. The cte
option makes the sql code more readable.
sql_eval <- dbplyr::sql_render(aerep$df_eval, sql_options = dbplyr::sql_options(cte = TRUE)) stringr::str_trunc(sql_eval, 500)
We can take that code and wrap it in a CREATE TABLE
statement
sql_create <- glue::glue("CREATE TABLE eval AS ({sql_eval})") DBI::dbExecute(con, sql_create)
Retrieve the new table from the database.
tbl_eval <- tbl(con, "eval") tbl_eval
We plot the results from the {simaerep} object.
plot(aerep)
Or more efficiently by using plot_study()
when we have already written the simaerep
results into the database.
Here we avoid that the results are being recalculated just for the sake of creating a plot. However this requires
that we save df_site
to the database as well.
sql_site <- dbplyr::sql_render(aerep$df_site) DBI::dbExecute(con, glue::glue("CREATE TABLE site AS ({sql_site})")) tbl_site <- tbl(con, "site") plot_study(tbl_visit, tbl_site, tbl_eval, study = "A")
DBI::dbDisconnect(con)
Here we perform some examplary tests to illustrate the increase in-memory calculation time of the inframe calculation method.
This is the calculation time for the default settings
system.time({simaerep(df_visit, inframe = FALSE, visit_med75 = TRUE, under_only = TRUE, progress = FALSE)})
inframe calculation time is higher
system.time({simaerep(df_visit, inframe = TRUE, visit_med75 = FALSE, under_only = TRUE)})
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.