simulate: Simulate Counterfactual Outcomes Under Intervention

View source: R/simulate.R

simulate    R Documentation

Simulate Counterfactual Outcomes Under Intervention

Description

This internal function simulates a new dataset containing covariates, outcome probabilities, competing event probabilities (if any), outcomes, and competing events (if any) based on an observed dataset and a user-specified intervention.

Usage

simulate(
  o,
  fitcov,
  fitY,
  fitD,
  ymodel_predict_custom,
  yrestrictions,
  compevent_restrictions,
  restrictions,
  outcome_name,
  compevent_name,
  time_name,
  intvars,
  interventions,
  int_times,
  histvars,
  histvals,
  histories,
  comprisk,
  ranges,
  outcome_type,
  subseed,
  obs_data,
  time_points,
  parallel,
  covnames,
  covtypes,
  covparams,
  covpredict_custom,
  basecovs,
  max_visits,
  baselags,
  below_zero_indicator,
  min_time,
  show_progress,
  pb,
  int_visit_type,
  sim_trunc,
  ...
)

Arguments

o

Integer specifying the index of the current intervention.

fitcov

List of model fits for the time-varying covariates.

fitY

Model fit for the outcome variable.

fitD

Model fit for the competing event variable, if any.

ymodel_predict_custom

Function obtaining predictions from the custom outcome model specified in ymodel_fit_custom. See the vignette "Using Custom Outcome Models in gfoRmula" for details.

yrestrictions

List of vectors. Each vector contains as its first entry a condition and as its second entry an integer. When the condition is TRUE, the outcome variable is simulated according to the fitted model; when the condition is FALSE, the outcome variable takes on the value in the second entry.
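
The following base R sketch illustrates the TRUE/FALSE rule described above on a toy data frame. It is not the package's internal code: the data frame sim, its columns, and the choice to write the condition as a character string are assumptions made for illustration.

```r
# Hypothetical simulated rows at one time point; Y holds model-based values.
sim <- data.frame(L1 = c(0, 1, 1, 0), Y = c(0.20, 0.35, 0.10, 0.50))

# One restriction in the documented shape: c(condition, value).
# c() stores both entries as character strings, so the value is converted back.
yrestrictions <- list(c("L1 == 1", 0))

for (r in yrestrictions) {
  # TRUE where the fitted model should be used for the outcome
  model_applies <- eval(parse(text = r[1]), envir = sim)
  # Where the condition is FALSE, the outcome takes the fixed value instead
  sim$Y[!model_applies] <- as.numeric(r[2])
}
```

Rows with L1 equal to 1 keep their model-based value, and all other rows receive the value in the second entry.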

compevent_restrictions

List of vectors. Each vector contains as its first entry a condition and as its second entry an integer. When the condition is TRUE, the competing event variable is simulated according to the fitted model; when the condition is FALSE, the competing event variable takes on the value in the second entry.

restrictions

List of vectors. Each vector contains four entries: (1) a covariate for which a priori knowledge of its distribution is available; (2) a condition under which no such knowledge is available, and which must be TRUE for the distribution of that covariate to be estimated via a parametric model or other fitting procedure; (3) a function that determines the distribution of that covariate when the condition in the second entry is FALSE, so that the a priori knowledge applies; and (4) a value used by the function in the third entry. The default is NA.

outcome_name

Character string specifying the name of the outcome variable in obs_data.

compevent_name

Character string specifying the name of the competing event variable in obs_data.

time_name

Character string specifying the name of the time variable in obs_data.

intvars

List, whose elements are vectors of character strings. The kth vector in intvars specifies the name(s) of the variable(s) to be intervened on in each round of the simulation under the kth intervention in interventions.

interventions

List, whose elements are lists of vectors. Each list in interventions specifies a unique intervention on the relevant variable(s) in intvars. Each vector contains a function implementing a particular intervention on a single variable, optionally followed by one or more "intervention values" (i.e., integers used to specify the treatment regime).

int_times

List, whose elements are lists of vectors. The kth list in int_times corresponds to the kth intervention in interventions. Each vector specifies the time points in which the relevant intervention is applied on the corresponding variable in intvars. When an intervention is not applied, the simulated natural course value is used. By default, this argument is set so that all interventions are applied in all time points.
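
As a rough sketch of how intvars, interventions, and int_times line up, the base R snippet below builds the three nested lists for two interventions on a single variable. The variable name "A", the placeholder function set_to_value, and the assumption that time points are indexed from 0 are all hypothetical; the package's own intervention functions are not reproduced here.

```r
time_points <- 4

# Hypothetical stand-in for an intervention function.
set_to_value <- function(...) NULL

# The kth entry of each list describes the kth intervention.
intvars <- list("A", "A")

# Each inner vector is a function followed by its intervention values.
# c() returns a list here because a function cannot be stored in an atomic vector.
interventions <- list(
  list(c(set_to_value, rep(0, time_points))),  # intervention 1: A set to 0
  list(c(set_to_value, rep(1, time_points)))   # intervention 2: A set to 1
)

# Time points at which each intervention is applied (assumed 0-based).
int_times <- list(
  list(0:(time_points - 1)),  # intervention 1 applied at every time point
  list(0:1)                   # intervention 2 applied at times 0 and 1 only
)
```

At time points left out of int_times, such as times 2 and 3 for the second intervention, the description above implies that the simulated natural course value would be used.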

histvars

List of vectors. The kth vector specifies the names of the variables for which the kth history function in histories is to be applied.

histvals

List of length two. The first element is a numeric vector specifying the lags used in the model statements (e.g., if lag1_varname and lag2_varname were included in the model statements, this vector would be c(1,2)). The second element is a numeric vector specifying the lag averages used in the model statements.
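
One way to see what histvals holds is to derive it from the term names used in the model statements. The sketch below does this with base R regular expressions. The term names are made up, and it assumes that lag averages appear as lag_cumavgi_varname terms; neither detail is confirmed by this page.

```r
# Hypothetical term names collected from the model statements.
terms_used <- c("lag1_A", "lag2_A", "lag1_L1", "lag_cumavg1_L2", "L2")

# Extract the numeric index i from terms that start with a given prefix.
extract_index <- function(terms, prefix) {
  pattern <- paste0("^", prefix, "[0-9]+_")
  hits <- regmatches(terms, regexpr(pattern, terms))  # matching prefixes only
  sort(unique(as.numeric(gsub("[^0-9]", "", hits))))
}

histvals <- list(
  extract_index(terms_used, "lag"),        # lags used: 1 and 2
  extract_index(terms_used, "lag_cumavg")  # lag averages used: 1
)
```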

histories

Vector of history functions to apply to the variables specified in histvars.

comprisk

Logical scalar indicating the presence of a competing event.

ranges

List of vectors. Each vector contains the minimum and maximum values of one of the covariates in covnames.

outcome_type

Character string specifying the "type" of the outcome. The possible "types" are: "survival", "continuous_eof", and "binary_eof".

subseed

Integer specifying the seed for this simulation.

obs_data

Data table containing the observed data.

time_points

Number of time points to simulate.

parallel

Logical scalar indicating whether to parallelize simulations of different interventions across multiple cores.

covnames

Vector of character strings specifying the names of the time-varying covariates in obs_data.

covtypes

Vector of character strings specifying the "type" of each time-varying covariate included in covnames. The possible "types" are: "binary", "normal", "categorical", "bounded normal", "zero-inflated normal", "truncated normal", "absorbing", "categorical time", and "custom".

covparams

List of vectors, where each vector contains information for one parameter used in the modeling of the time-varying covariates (e.g., model statement, family, link function, etc.). Each vector must be the same length as covnames and in the same order. If a parameter is not required for a certain covariate, it should be set to NA at that index.
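
A minimal sketch of the shape described above, assuming three covariates and two parameters. The element names covmodels and covlink, the covariate names, and the model formulas are illustrative assumptions rather than values taken from this page.

```r
covnames <- c("L1", "L2", "A")

covparams <- list(
  # One model statement per covariate, in the same order as covnames.
  covmodels = c(L1 ~ lag1_A + lag1_L1,
                L2 ~ lag1_A + L1,
                A  ~ L1 + L2),
  # A link function is assumed not to be needed for L2, so that index is NA.
  covlink = c("logit", NA, "logit")
)

# Every parameter vector must line up with covnames.
stopifnot(all(lengths(covparams) == length(covnames)))
```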

covpredict_custom

Vector containing custom prediction functions for time-varying covariates that do not fall within the pre-defined covariate types. It should be in the same order as covnames. If a custom prediction function is not required for a particular covariate, then that index should be set to NA.

basecovs

Vector of character strings specifying the names of baseline covariates in obs_data.

max_visits

A vector of one or more values denoting the maximum number of times a binary covariate representing a visit process may be missed before the individual is censored from the data (in the observed data) or a visit is forced (in the simulated data). Multiple values exist in the vector when the modeling of more than one covariate is attached to a visit process.
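
The forced-visit rule for simulated data can be pictured with the toy function below. This is one reading of the description, not the package's implementation: it assumes a visit is forced at the first time point after max_visits consecutive misses, and the function name force_visits is hypothetical.

```r
# visit: 0/1 indicator over time for one simulated individual.
force_visits <- function(visit, max_visits) {
  missed <- 0
  for (t in seq_along(visit)) {
    if (visit[t] == 1) {
      missed <- 0              # a visit resets the count of consecutive misses
    } else if (missed >= max_visits) {
      visit[t] <- 1            # allowed misses are used up, so force a visit
      missed <- 0
    } else {
      missed <- missed + 1     # another miss within the allowed number
    }
  }
  visit
}

forced <- force_visits(c(1, 0, 0, 0, 0, 1), max_visits = 2)
```

With max_visits set to 2, the third consecutive miss (the fourth time point) is turned into a visit.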

baselags

Logical scalar specifying the convention used for lagi and lag_cumavgi terms in the model statements when pre-baseline times are not included in obs_data and when the current time index, t, is such that t < i. If this argument is set to FALSE, the values of all lagi and lag_cumavgi terms in this context are set to 0 (for non-categorical covariates) or the reference level (for categorical covariates). If this argument is set to TRUE, the values of lagi and lag_cumavgi terms are set to their values at time 0. The default is FALSE.
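
The two conventions can be compared with a small helper for a single non-categorical covariate. The function lag_value and the example values are hypothetical, and the sketch covers only the lagi case, not lag_cumavgi terms or categorical covariates.

```r
# x: one individual's covariate values at times 0, 1, 2, ...
# Returns the value of the lag-i term at time t.
lag_value <- function(x, t, i, baselags = FALSE) {
  if (t >= i) {
    return(x[t - i + 1])  # the lagged time t - i exists in the data
  }
  # t < i: the lagged time would fall before baseline
  if (baselags) x[1] else 0
}

x <- c(5, 7, 9)
no_baselags   <- lag_value(x, t = 0, i = 1)                   # set to 0
with_baselags <- lag_value(x, t = 0, i = 1, baselags = TRUE)  # time-0 value
ordinary_lag  <- lag_value(x, t = 2, i = 1)                   # value at time 1
```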

below_zero_indicator

Logical scalar indicating whether the observed data set contains rows for time t < 0.

min_time

Numeric scalar specifying the lowest value of time t in the observed data set.

show_progress

Logical scalar indicating whether to print a progress bar for the number of bootstrap samples completed in the R console. This argument is only applicable when parallel is set to FALSE and bootstrap samples are used (i.e., nsamples is set to a value greater than 0). The default is TRUE.

pb

Progress bar R6 object. See progress_bar for further details.

int_visit_type

Vector of logicals. The kth element is a logical specifying whether to carry forward the intervened value (rather than the natural value) of the treatment variable(s) when applying a carry-forward restriction for the kth intervention in interventions. When the kth element is set to FALSE, the natural value of the treatment variable(s) in the kth intervention in interventions will be carried forward. By default, this argument is set so that the intervened value of the treatment variable(s) is carried forward for all interventions.

sim_trunc

Logical scalar indicating whether to truncate simulated covariates to their range in the observed data set. This argument is only applicable for covariates of type "normal", "bounded normal", "truncated normal", and "zero-inflated normal".
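
Truncation to the observed range can be sketched in base R with pmin and pmax. The covariate name L2, its range, and the simulated values are made up for illustration; the way the range is stored (a named list of minimum and maximum pairs) is an assumption modeled on the ranges argument.

```r
# Observed minimum and maximum of a hypothetical covariate L2.
ranges <- list(L2 = c(0.5, 3.0))

# Hypothetical simulated values, two of which fall outside the observed range.
sim_L2 <- c(-0.2, 1.1, 3.7, 2.4)

sim_trunc <- TRUE
if (sim_trunc) {
  # Values below the minimum are raised to it; values above the maximum are lowered to it.
  sim_L2 <- pmin(pmax(sim_L2, ranges$L2[1]), ranges$L2[2])
}
```

Values already inside the range, here 1.1 and 2.4, are left unchanged.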

...

Other arguments, which are passed to the functions in covpredict_custom.

Value

A data table containing simulated data under the specified intervention.


gfoRmula documentation built on Oct. 1, 2024, 9:06 a.m.