The rfacts package is an R interface to the Fixed and Adaptive Clinical Trial Simulator (FACTS) on Unix-like systems. It programmatically invokes FACTS to run clinical trial simulations, and it aggregates simulation output data into tidy data frames. These capabilities provide end-to-end automation for large-scale simulation workflows, and they enhance computational reproducibility. For more information, please visit the documentation website.
rfacts is not a product of nor supported by Berry Consultants. The code base of rfacts is completely independent from that of FACTS, and the former only invokes the latter through dynamic system calls.

rfacts only works on Unix-like systems. It requires paths to pre-compiled versions of Mono, FLFLL, and the FACTS Linux engines. See the installation instructions below and the configuration guide.

To install the latest release from CRAN, open R and run the following.
install.packages("rfacts")
To install the latest development version:
install.packages("remotes")
remotes::install_github("EliLillyCo/rfacts")
Next, set the RFACTS_PATHS
environment variable appropriately. For
instructions, please see the configuration
guide.
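As a minimal sketch, you can set the variable from within R before loading rfacts. The path below is a placeholder; the actual value and its format are described in the configuration guide.
# Hypothetical path: point RFACTS_PATHS at your site-specific configuration.
Sys.setenv(RFACTS_PATHS = "/path/to/your/rfacts_paths_config")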
First, create a *.facts
XML file using the
FACTS GUI. The rfacts
package has several built-in examples, included with permission from
Berry Consultants LLC.
library(rfacts)
# get_facts_file_example() returns the path to
# an example FACTS file shipped with rfacts itself.
# For FACTS files you create yourself in the FACTS GUI,
# you can skip get_facts_file_example().
facts_file <- get_facts_file_example("contin.facts")
basename(facts_file)
#> [1] "contin.facts"
Then, run trial simulations with run_facts(). By default, the results are written to a temporary directory. Set the output_path argument to customize the path.
out <- run_facts(
facts_file,
n_sims = 2,
verbose = FALSE
)
out
#> [1] "/tmp/RtmpFv78Hj/file427b6fd72e42"
head(get_csv_files(out))
#> [1] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/patients00001.csv"
#> [2] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/patients00002.csv"
#> [3] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_ignore_00001.csv"
#> [4] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_ignore_00002.csv"
#> [5] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_locf_00001.csv"
#> [6] "/tmp/RtmpFv78Hj/file427b6fd72e42/contin/acc1_drop1_resp1_params/weeks_freq_locf_00002.csv"
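For example, here is a minimal sketch of writing the results to a persistent directory instead of a temporary one (the directory name below is a placeholder; not run here):
# Hypothetical persistent output directory.
run_facts(
  facts_file,
  n_sims = 2,
  output_path = "contin_sims",
  verbose = FALSE
)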
Use read_patients() to read and aggregate all the patients*.csv files. rfacts has several such functions, including read_weeks() and read_mcmc().
read_patients(out)
#> # A tibble: 2,400 x 15
#> facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#> <chr> <chr> <int> <chr> <chr> <chr>
#> 1 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 2 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 3 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 4 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 5 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 6 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 7 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 8 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 9 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> # … with 2,390 more rows, and 9 more variables: facts_header <chr>,
#> # subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> # dropout <int>, baseline <lgl>, visit_1 <dbl>
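The other readers mentioned above follow the same pattern. For example, a sketch of aggregating the weekly CSV files from the same run (output omitted):
# Aggregate the weekly output files into one data frame.
weeks <- read_weeks(out)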
run_facts() has two sequential stages:

1. run_flfll(): generate the *.param files and the folder structure for the FACTS Linux engines.
2. run_engine(): execute the instructions in the *.param files to conduct trial simulations and produce CSV output.

out <- run_flfll(facts_file, verbose = FALSE)
run_engine(facts_file, param_files = out, n_sims = 4, verbose = FALSE)
read_patients(out)
#> # A tibble: 4,800 x 15
#> facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#> <chr> <chr> <int> <chr> <chr> <chr>
#> 1 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 2 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 3 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 4 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 5 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 6 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 7 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 8 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 9 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
#> # subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> # dropout <int>, baseline <lgl>, visit_1 <dbl>
run_engine() automatically detects the Linux engine required for your FACTS file. If you know the engine in advance, you can use a specific engine function such as run_engine_contin() or run_engine_dichot().
out <- run_flfll(facts_file, verbose = FALSE)
run_engine_contin(param_files = out, n_sims = 4, verbose = FALSE)
read_patients(out)
#> # A tibble: 4,800 x 15
#> facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#> <chr> <chr> <int> <chr> <chr> <chr>
#> 1 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 2 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 3 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 4 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 5 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 6 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 7 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 8 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 9 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
#> # subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> # dropout <int>, baseline <lgl>, visit_1 <dbl>
If you are unsure which engine function to use, call get_facts_engine().
get_facts_engine(facts_file)
#> [1] "run_engine_contin"
If we take control of the simulation process, we can pick and choose which FACTS simulation scenarios to run and read.
# Example FACTS file built into rfacts.
facts_file <- get_facts_file_example("contin.facts")
# Set up the files for the scenarios.
param_files <- run_flfll(facts_file, verbose = FALSE)
# Each scenario has its own folder with internal parameter files.
scenarios <- get_param_dirs(param_files) # get_param_dirs() is not available in rfacts <= 1.0.0
scenarios
#> [1] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp1_params"
#> [2] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc1_drop1_resp2_params"
#> [3] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp1_params"
#> [4] "/tmp/RtmpFv78Hj/file427b68486ae6/contin/acc2_drop1_resp2_params"
# Let's pick one of those scenarios and run the simulations.
scenario <- scenarios[1]
run_engine_contin(scenario, n_sims = 2, verbose = FALSE)
read_patients(scenario)
#> # A tibble: 600 x 15
#> facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#> <chr> <chr> <int> <chr> <chr> <chr>
#> 1 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 2 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 3 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 4 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 5 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 6 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 7 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 8 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 9 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> # … with 590 more rows, and 9 more variables: facts_header <chr>,
#> # subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> # dropout <int>, baseline <lgl>, visit_1 <dbl>
rfacts makes it straightforward to parallelize across simulations. First, use run_flfll() to create a directory of param files. Be sure to supply an output_path that all the parallel workers can access (e.g. not a tempfile()).
library(rfacts)
facts_file <- get_facts_file_example("contin.facts")
param_files <- file.path(getwd(), "param_files")
run_flfll(facts_file, param_files)
#> [1] "/home/c240390/projects/rfacts/param_files"
Next, write a custom function that accepts the param files, runs a single simulation for each param file, and returns the important data in memory. Be sure to set a unique seed for each simulation iteration.
sim_once <- function(iter, param_files) {
# Copy param files to a temp file in order to
# (1) Avoid race conditions in parallel processing, and
# (2) Make things run faster: temp files are on local node storage.
out <- tempfile()
fs::dir_copy(path = param_files, new_path = out)
# Run the engine once per param file.
run_engine_contin(out, n_sims = 1L, seed = iter)
# Return aggregated patients files.
read_patients(out) # Reads fast because `out` is a tempfile().
}
At this point, we should test this function locally without parallel computing.
library(dplyr)
# All the patients files were named patients00001.csv,
# so do not trust the facts_sim column.
# For data post-processing, use the facts_id column instead.
lapply(seq_len(4), sim_once, param_files = param_files) %>%
bind_rows()
#> # A tibble: 4,800 x 15
#> facts_file facts_scenario facts_sim facts_id facts_output facts_csv
#> <chr> <chr> <int> <chr> <chr> <chr>
#> 1 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 2 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 3 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 4 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 5 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 6 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 7 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 8 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 9 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> 10 contin.fa… acc1_drop1_re… 1 file427… patients /tmp/Rtm…
#> # … with 4,790 more rows, and 9 more variables: facts_header <chr>,
#> # subject <int>, region <int>, date <dbl>, dose <int>, lastvisit <int>,
#> # dropout <int>, baseline <lgl>, visit_1 <dbl>
Parallel computing happens when we call sim_once() repeatedly over several parallel workers. A powerful and convenient parallel computing solution is clustermq. Here is a sketch of how to use it with rfacts. mclapply() from the parallel package is a quick and dirty alternative.
# Configure clustermq to use your scheduler and your template file.
# If you are using a scheduler like SGE, you need to write a template file
# like clustermq.tmpl. To learn how, visit
# https://mschubert.github.io/clustermq/articles/userguide.html#configuration-1
options(clustermq.scheduler = "sge", clustermq.template = "clustermq.tmpl")
# Run the computation.
library(clustermq)
patients <- Q(
fun = sim_once,
iter = seq_len(50),
const = list(param_files = param_files),
pkgs = c("fs", "rfacts"),
n_jobs = 4
) %>%
bind_rows()
# Show aggregated patient data.
patients
Alternatives to clustermq include parallel::mclapply(), furrr::future_map(), and future.apply::future_lapply().
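As a rough sketch of the parallel::mclapply() route, assuming the same sim_once() and param_files objects defined above and a machine with at least four available cores:
library(dplyr)
library(parallel)
# Run sim_once() on 50 iterations across 4 forked workers
# and combine the per-iteration patient data.
patients <- mclapply(
  seq_len(50),
  sim_once,
  param_files = param_files,
  mc.cores = 4
) %>%
  bind_rows()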
Various get_facts_*()
functions interrogate FACTS files.
get_facts_scenarios(facts_file)
#> [1] "acc1_drop1_resp1" "acc1_drop1_resp2" "acc2_drop1_resp1" "acc2_drop1_resp2"
get_facts_version(facts_file)
#> [1] "6.2.5.22668"
get_facts_versions()
#> [1] "6.3.1" "6.2.5" "6.0.0.1"