cohort: Create NONMEM ready datasets for simulated clinical PK trials

View source: R/cohort.R

cohort R Documentation

Description

The cohort() function is used to generate NONMEM-ready datasets for clinical PK trial simulations, either by sampling from real datasets or generating synthetic data.

Usage

cohort(
  data = NULL,
  include = NULL,
  n = NULL,
  obs_times = NULL,
  dose_times = NULL,
  amt = NULL,
  param = NULL,
  original_id = TRUE,
  pop_size = NULL,
  replace = FALSE,
  keep = NULL,
  tad = FALSE
)

Arguments

data

A data frame or data frame extension containing only one row per individual.

include

A character string containing a logical R expression that specifies the inclusion criteria for this cohort. For example, if "WT" and "HT" are variables corresponding to weight in kg and height in cm, respectively, you can sample only individuals under 50 kg and under 150 cm by writing: include = "WT < 50 & HT < 150"

n

Optionally, the number of patients to enroll. When using existing data, cohort() will randomly sample n patients from your data.

obs_times

A numeric vector of observation times.

dose_times

A numeric vector of dosing times.

amt

Either a fixed numeric dose, or a function that computes a dose based on variables in your data (or variables specified in param). If using a function, it must be vectorized, and the names of its arguments must match the names of the variables in data (or param). This function should be defined in the environment in which cohort() is called.

param

If creating synthetic data, this is where you specify the distribution and parameters to use for random sampling. Supply a list with named fields, each of which corresponds to a variable. The value of each field should be another list, containing (in order):

- the name of an R stats function for random sampling, e.g. "rnorm", "runif", "rlnorm", etc.
- the arguments to the above function (except "n", which you will have already specified).

For example, to create a normally-distributed random variable called "WT" with mean 16.3 and standard deviation 2.5, and a binomially distributed random variable called "HIV" with p = 0.34, write:

param = list("WT" = list("rnorm", 16.3, 2.5), "HIV" = list("rbinom", 1, 0.34))

original_id

When TRUE, the default, cohort() will keep the same IDs as the input data. To create new IDs starting with 1, use original_id = FALSE.

pop_size

Optional. When generating synthetic data, cohort() will generate a population of size pop_size * n, and then randomly sample n individuals from it. The default value is 10.

replace

Optional. Whether to sample with replacement. Default: FALSE.

keep

Optional. Character vector of column names that you do not want converted to numeric.

tad

Optional. Whether to calculate time after dose (TAD).
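
To illustrate what time after dose means, the sketch below computes it in base R for a single individual: for each record, TAD is the time elapsed since the most recent dosing record. The helper add_tad and the column names TIME, EVID, and TAD are assumptions made for this illustration, and the code is not necessarily how cohort() computes the column.

```r
# Hypothetical helper (not part of savictools): time after the most recent
# dose for the records of ONE individual.
# Assumes EVID == 1 marks dosing records and EVID == 0 marks observations.
add_tad <- function(df) {
  # Sort by time; at equal times, put the dose before the observation.
  df <- df[order(df$TIME, -df$EVID), ]
  # Time of dosing on dose rows, NA elsewhere.
  last_dose <- ifelse(df$EVID == 1, df$TIME, NA)
  # Carry the most recent dose time forward onto the following rows.
  for (i in seq_along(last_dose)[-1]) {
    if (is.na(last_dose[i])) last_dose[i] <- last_dose[i - 1]
  }
  # Records before the first dose keep TAD = NA.
  df$TAD <- df$TIME - last_dose
  df
}

records <- data.frame(
  ID   = 1,
  TIME = c(0, 0, 4, 5, 8, 12, 12, 16),
  EVID = c(1, 0, 0, 1, 0, 1, 0, 0)
)
with_tad <- add_tad(records)
with_tad
```

In a dataset with several individuals, the same helper would be applied to each ID separately.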

Value

A tibble::tibble() in NONMEM format with dosing and observation records.

Real Data

To sample from an existing dataset, pass a data frame or file name to data, and leave param unspecified.

Data will be filtered by the criteria specified in include, which must use only variable names present in the data. cohort() then randomly samples n individuals from the filtered data.
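
The filter-then-sample behaviour described above can be pictured with a short base R sketch. It is only an illustration under stated assumptions: the data frame pop is made up, and evaluating the inclusion string with eval(parse()) is one possible approach, not necessarily what cohort() does internally.

```r
# Made-up data with one row per individual (for illustration only).
pop <- data.frame(
  ID = 1:100,
  WT = seq(5, 30, length.out = 100),   # weight in kg
  HT = rep(c(100, 140), times = 50)    # height in cm
)

inc <- "WT > 10 & HT < 120"
n <- 20

# Evaluate the inclusion string with the columns of pop in scope.
keep_row <- eval(parse(text = inc), envir = pop)
eligible <- pop[keep_row, ]

# Randomly sample n eligible individuals without replacement.
set.seed(123)
sampled <- eligible[sample(nrow(eligible), n, replace = FALSE), ]
nrow(sampled)
```

Sampling without replacement requires at least n eligible individuals; with replace = TRUE the same individual could be drawn more than once.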

Synthetic Data

To generate synthetic data, leave data unspecified and pass the details of the distributions from which you wish to sample to param. Currently, all variables are assumed to be independent.
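
As a rough picture of how a param specification maps onto random draws, the base R sketch below turns each entry into a call to the named sampling function, one variable at a time, so the variables are independent. The helper draw_one and the size n_draw are hypothetical names used only here; cohort() may implement this differently.

```r
param <- list("WT"  = list("rnorm", 16.3, 2.5),
              "HIV" = list("rbinom", 1, 0.34))

n_draw <- 200  # hypothetical number of individuals to generate

# Hypothetical helper: the first element names the sampling function and the
# remaining elements are its arguments after n, in order.
draw_one <- function(spec, n) {
  do.call(spec[[1]], c(list(n), spec[-1]))
}

set.seed(42)
# Each variable is drawn separately, so no correlation is induced between them.
synthetic <- as.data.frame(lapply(param, draw_one, n = n_draw))
synthetic$ID <- seq_len(n_draw)
head(synthetic)
```

Here list("rnorm", 16.3, 2.5) becomes rnorm(200, 16.3, 2.5), and list("rbinom", 1, 0.34) becomes rbinom(200, 1, 0.34).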

Dosing and Observation Events

After sampling is finished, cohort() will create duplicate rows for each individual corresponding to the timepoints specified in dose_times and obs_times, and create a column "EVID" distinguishing between dosing and observation events.
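
The row expansion can be sketched in base R as follows. The two-person data frame is made up, and coding EVID as 1 for dosing records and 0 for observation records follows the usual NONMEM convention; the exact columns and ordering that cohort() produces may differ.

```r
# Two made-up individuals, one row each.
ids <- data.frame(ID = 1:2, WT = c(14, 18))

obs_times  <- seq(0, 24, by = 4)
dose_times <- c(0, 5, 12)

# merge() on data frames with no shared columns returns every combination of
# rows, i.e. one copy of each individual per time point.
doses <- merge(ids, data.frame(TIME = dose_times, EVID = 1))
obs   <- merge(ids, data.frame(TIME = obs_times,  EVID = 0))

events <- rbind(doses, obs)
# Order by individual and time; at equal times the dose comes first.
events <- events[order(events$ID, events$TIME, -events$EVID), ]
rownames(events) <- NULL
events
```

Each individual ends up with three dosing rows and seven observation rows.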

Finally, the function (or fixed amount) passed to amt is used to calculate the dose for each individual at each dosing event. If using a function, it must be vectorized, defined in the calling environment, and its arguments must match the names of variables used in your data. See Example 3 for details.
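
The vectorization requirement is easy to get wrong, so here is a small base R illustration. A function built on if/else handles only one value at a time, whereas ifelse() (or wrapping the function in Vectorize()) handles a whole column. The names dose_scalar, dose_vec, and dose_ifelse are hypothetical, and matching arguments to columns through formals() is just one way to show the naming rule, not a description of the internals of cohort().

```r
# Not vectorized: if () expects a single TRUE or FALSE.
dose_scalar <- function(WT) {
  if (WT < 16) 150 else if (WT < 20) 200 else 250
}

# Two vectorized alternatives.
dose_vec    <- Vectorize(dose_scalar)
dose_ifelse <- function(WT) ifelse(WT < 16, 150, ifelse(WT < 20, 200, 250))

wt <- c(12, 16, 19.5, 25)
dose_vec(wt)
dose_ifelse(wt)

# The argument name WT must match a column name so that the right column can
# be passed to the function.
df <- data.frame(ID = 1:4, WT = wt)
df$AMT <- do.call(dose_ifelse, as.list(df[names(formals(dose_ifelse))]))
df
```

Both vectorized versions return one dose per weight, which is what a per-individual dose column needs.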

Author(s)

Sandy Floren

Examples

pop_example

# 1. Sampling 20 individuals, above 10 kg and below 120 cm, with a fixed dose of
# 200 mg, observing every 4 hours for one day and dosing at times 0, 5, and 12.
# Note that the data has columns called "WT" and "HT".

inc <- "WT > 10 & HT < 120"
ot <- seq(0, 24, by = 4)
dt <- c(0, 5, 12)

cohort(
  pop_example,
  include = inc,
  n = 20,
  obs_times = ot,
  dose_times = dt,
  amt = 200
)


# 2. Simulating data. We assume WT and HT are normally distributed random
# variables, with means and standard deviations of 16 and 3.4 for WT and 132
# and 13.6 for HT.

p1 <-
  list("WT" = list("rnorm", 16, 3.4),
       "HT" = list("rnorm", 132, 13.6))

cohort(
  param = p1,
  include = inc,
  n = 20,
  pop_size = 1000,
  obs_times = ot,
  dose_times = dt,
  amt = 200,
  original_id = FALSE
)

# 3. As in (2), except we now define a dosing function.

dose_fun <- function(WT) {
  ifelse(WT < 16, 150,
         ifelse(WT < 20, 200, 250))
}

cohort(
  param = p1,
  include = inc,
  n = 20,
  pop_size = 500,
  obs_times = ot,
  dose_times = dt,
  original_id = FALSE,
  amt = dose_fun
)

saviclab/savictools documentation built on Dec. 7, 2023, 11:56 p.m.