knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.path = "man/figures/README-", out.width = "100%" )
This package provides a grammar for data preparation and evaluation of fixed-origin and rolling-origin prediction models using data collected at irregular intervals.
You can install the GitHub version of gpmodels with:
remotes::install_github('ML4LHS/gpmodels')
Start by loading the package and defining your time_frame(). A time_frame is simply a list with the class time_frame. It contains all the key information needed to describe both your fixed dataset (such as demographics, one row per patient) and your temporal dataset (one row per observation linked to a timestamp).
library(gpmodels)
library(magrittr)
library(lubridate)

future::plan('multisession')

unlink(file.path(tempdir(), 'gpmodels_dir', '*.*'))

tf = time_frame(fixed_data = sample_fixed_data,
                temporal_data = sample_temporal_data %>% dplyr::filter(id %in% 1:100),
                fixed_id = 'id',
                fixed_start = 'admit_time',
                fixed_end = 'dc_time',
                temporal_id = 'id',
                temporal_time = 'time',
                temporal_variable = 'variable',
                temporal_category = 'category',
                temporal_value = 'value',
                step = hours(6),
                max_length = days(7), # optional parameter to limit to first 7 days of hospitalization
                output_folder = file.path(tempdir(), 'gpmodels_dir'),
                create_folder = TRUE)
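To make the two inputs concrete, here is a minimal sketch of data in the shape the time_frame() call above expects. The column names are taken from the arguments in that call; the data frames toy_fixed and toy_temporal, their values, and the pairing of each variable with a category are invented for illustration and are not the package's sample data.

```r
# Hypothetical toy data; all values are invented for illustration only.
# Column names mirror the arguments passed to time_frame() above.
toy_fixed <- data.frame(
  id         = c(1, 2),
  admit_time = as.POSIXct(c("2020-01-01 08:00:00", "2020-01-03 14:00:00"), tz = "UTC"),
  dc_time    = as.POSIXct(c("2020-01-04 10:00:00", "2020-01-05 09:00:00"), tz = "UTC")
)

# Assumption: 'cr' is filed under 'lab' and 'med_aspirin' under 'med'
toy_temporal <- data.frame(
  id       = c(1, 1, 1, 2),
  time     = as.POSIXct(c("2020-01-01 09:30:00", "2020-01-02 07:15:00",
                          "2020-01-02 08:00:00", "2020-01-03 15:45:00"), tz = "UTC"),
  variable = c("cr", "cr", "med_aspirin", "cr"),
  category = c("lab", "lab", "med", "lab"),
  value    = c(1.1, 1.4, 1, 0.9)
)

# Fixed data: one row per patient
stopifnot(!anyDuplicated(toy_fixed$id))

# Temporal data: any number of timestamped rows per patient
table(toy_temporal$id)
```

The temporal table is in long format, so a new measurement type adds rows rather than columns.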
names(tf)
tf$step
tf$step_units
tf$fixed_data_dict
tf$temporal_data_dict
tf = tf %>% pre_dummy_code()
tf$fixed_data_dict
tf$temporal_data_dict
The default method writes output to the folder defined in your time_frame. When you write your output to file, you can chain add_predictors() and add_outcomes() functions together. This is possible because these functions invisibly return a time_frame.
If, however, you set output_file to FALSE, then your actual output is returned (rather than the time_frame), so you cannot chain functions.
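The chaining behavior can be illustrated with a small base R sketch. The helper add_step() and the list toy_tf are hypothetical stand-ins, not gpmodels functions or objects; they only show how an invisibly returned object can be handed to the next call.

```r
# Hypothetical helper, not part of gpmodels: records a step name, then
# returns the modified object invisibly so calls can be chained.
add_step <- function(obj, step_name) {
  obj$steps <- c(obj$steps, step_name)
  invisible(obj)
}

# A plain list standing in for a time_frame
toy_tf <- list(steps = character(0))

# Nested calls are equivalent to:
#   toy_tf %>% add_step("predictors") %>% add_step("outcomes")
result <- add_step(add_step(toy_tf, "predictors"), "outcomes")
result$steps

# invisible() only suppresses auto-printing; the value is still returned
returned_visibly <- withVisible(add_step(toy_tf, "predictors"))$visible
returned_visibly
```

If a function in the chain returned a data frame of results instead of the object, the next call would receive that data frame, which is why returning the actual output ends the chain.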
tf %>%
  add_rolling_predictors(variables = 'cr', # Note: You can supply a vector of variables
                         lookback = hours(12),
                         window = hours(6),
                         stats = c(mean = mean,
                                   min = min,
                                   max = max,
                                   median = median,
                                   length = length)) %>%
  add_baseline_predictors(variables = 'cr', # add baseline creatinine
                          lookback = days(90),
                          offset = hours(10),
                          stats = c(min = min)) %>%
  add_growing_predictors(variables = 'cr', # cumulative max creatinine since admission
                         stats = c(max = max)) %>%
  add_rolling_predictors(category = 'med', # Note: category is always a regular expression
                         lookback = days(7),
                         stats = c(sum = sum)) %>%
  add_rolling_outcomes(variables = 'cr',
                       lookahead = hours(24),
                       stats = c(max = max))
You can provide combine_output() with a set of data frames separated by commas, or you can provide a vector of file names using the files argument. If you leave files blank, it will automatically find all the .csv files in the output_folder of your time_frame.
The resulting data frame is essentially ready for modeling (using tidymodels, for example). Make sure to keep all rows for an individual patient in the same fold if you divide this dataset into multiple folds.
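One way to keep patients together is to assign folds to patient identifiers rather than to rows. The sketch below uses base R on an invented data frame, toy_model_data; the column name id and the choice of three folds are assumptions, so adjust them to match your combined output.

```r
set.seed(42)

# Hypothetical stand-in for the combined output: 6 patients, 3 rows each
toy_model_data <- data.frame(
  id = rep(1:6, each = 3),
  x  = rnorm(18)
)

n_folds <- 3

# Assign one fold per patient, not per row
patient_ids <- unique(toy_model_data$id)
fold_of_patient <- sample(rep(seq_len(n_folds), length.out = length(patient_ids)))
names(fold_of_patient) <- patient_ids

# Look up each row's fold through its patient id
toy_model_data$fold <- unname(fold_of_patient[as.character(toy_model_data$id)])

# Check: every patient appears in exactly one fold
folds_per_patient <- tapply(toy_model_data$fold, toy_model_data$id,
                            function(f) length(unique(f)))
stopifnot(all(folds_per_patient == 1))
```

If you work in tidymodels, rsample's group_vfold_cv() offers grouped resampling for the same purpose.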
model_data = combine_output(tf)
head(model_data)
If you simply want to test time_frame, you may prefer not to write your output to file. You can accomplish this by setting output_file to FALSE.
tf %>%
  add_rolling_predictors(variables = 'cr',
                         lookback = hours(12),
                         window = hours(6),
                         stats = c(mean = mean,
                                   min = min,
                                   max = max,
                                   median = median,
                                   length = length),
                         output_file = FALSE) %>%
  head()

tf %>%
  add_rolling_predictors(variables = c('cr', 'med_aspirin'),
                         lookback = weeks(1),
                         stats = c(length = length),
                         output_file = FALSE) %>%
  head()

tf %>%
  add_rolling_predictors(category = 'lab|med',
                         lookback = hours(12),
                         stats = c(length = length),
                         output_file = FALSE) %>%
  head()
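Because category is treated as a regular expression, a pattern such as 'lab|med' can match more labels than you might expect. The base R sketch below shows the general behavior of an unanchored versus an anchored pattern. The category labels are invented, and using grepl() here is an assumption for illustration; the package's own matching may differ in detail.

```r
# Hypothetical category labels; the real labels come from your temporal data
categories <- c("lab", "med", "vitals", "medication_order")

# Unanchored pattern: matches any label containing "lab" or "med"
matched_loose <- categories[grepl("lab|med", categories)]
matched_loose

# Anchored pattern: matches only the exact labels "lab" and "med"
matched_exact <- categories[grepl("^(lab|med)$", categories)]
matched_exact
```

Anchoring with ^ and $ is worth considering when one category name is a substring of another.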
benchmark_results = list()

# future::plan('multisession')

benchmark_results[['multisession']] =
  microbenchmark::microbenchmark(
    tf %>%
      add_rolling_predictors(variables = 'cr',
                             lookback = hours(48),
                             window = hours(6),
                             stats = c(mean = mean,
                                       min = min,
                                       max = max,
                                       median = median,
                                       length = length)),
    times = 1
  )

tf_with_chunks = tf
tf_with_chunks$chunk_size = 20

benchmark_results[['multisession with chunk_size 20']] =
  microbenchmark::microbenchmark(
    tf_with_chunks %>%
      add_rolling_predictors(variables = 'cr',
                             lookback = hours(48),
                             window = hours(6),
                             stats = c(mean = mean,
                                       min = min,
                                       max = max,
                                       median = median,
                                       length = length)),
    times = 1
  )

future::plan('sequential')

benchmark_results[['sequential']] =
  microbenchmark::microbenchmark(
    tf %>%
      add_rolling_predictors(variables = 'cr',
                             lookback = hours(48),
                             window = hours(6),
                             stats = c(mean = mean,
                                       min = min,
                                       max = max,
                                       median = median,
                                       length = length)),
    times = 1
  )
benchmark_results
unlink(file.path(tempdir(), 'gpmodels_dir', '*.*'))