bootstrap_helper: Bootstrap Observed Data and Simulate Under All Interventions

View source: R/bootstrap.R


Description

This internal function carries out one bootstrap iteration. It resamples the observed data set with replacement and then uses the resampled data set to simulate data under the natural course and each intervention, estimating the survival outcome, the binary end-of-follow-up outcome, or the continuous end-of-follow-up outcome. Repeating this across iterations yields bootstrap confidence intervals and standard errors.
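A single resampling step can be sketched in base R as follows. This is an illustration, not the package's implementation. It assumes that subjects are identified by a column named `id`, that whole subjects (all of their rows) are resampled, and that the rth element of bootseeds seeds the rth iteration. `resample_subjects()` is a hypothetical helper.

```r
# Hypothetical helper: draw one bootstrap sample by resampling whole subjects
# (all rows sharing an id) with replacement. Assumes an `id` column.
resample_subjects <- function(obs_data, seed) {
  set.seed(seed)                                   # one seed per iteration
  ids <- unique(obs_data$id)
  drawn <- ids[sample.int(length(ids), length(ids), replace = TRUE)]
  # Stack each drawn subject's rows and give every draw a fresh id, so that
  # a subject drawn twice is treated as two distinct subjects.
  pieces <- lapply(seq_along(drawn), function(k) {
    rows <- obs_data[obs_data$id == drawn[k], , drop = FALSE]
    rows$id <- k
    rows
  })
  do.call(rbind, pieces)
}

obs_data <- data.frame(id = rep(1:3, each = 2), t0 = rep(0:1, 3),
                       Y = c(0, 0, 0, 1, 0, 0))
bootseeds <- c(101, 202)
r <- 1
boot_data <- resample_subjects(obs_data, bootseeds[r])
```

Re-assigning ids after resampling keeps per-subject operations (lags, cumulative averages) from merging the rows of duplicated subjects.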

Usage

bootstrap_helper(
  r,
  time_points,
  obs_data,
  bootseeds,
  outcome_type,
  intvars,
  interventions,
  int_times,
  ref_int,
  covparams,
  covnames,
  covtypes,
  covfits_custom,
  covpredict_custom,
  basecovs,
  histvars,
  histvals,
  histories,
  ymodel,
  ymodel_fit_custom,
  ymodel_predict_custom,
  yrestrictions,
  compevent_restrictions,
  restrictions,
  comprisk,
  compevent_model,
  time_name,
  outcome_name,
  compevent_name,
  ranges,
  parallel,
  ncores,
  max_visits,
  hazardratio,
  intcomp,
  boot_diag,
  nsimul,
  baselags,
  below_zero_indicator,
  min_time,
  show_progress,
  pb,
  int_visit_type,
  sim_trunc,
  ...
)

Arguments

r

Integer specifying the index of the current iteration of the bootstrap.

time_points

Number of time points to simulate.

obs_data

Data table containing the observed data.

bootseeds

Vector of integers specifying the seeds. One seed is used to initialize each bootstrap iteration.

outcome_type

Character string specifying the "type" of the outcome. The possible "types" are: "survival", "continuous_eof", and "binary_eof".

intvars

List whose elements are vectors of character strings. The kth vector in intvars specifies the name(s) of the variable(s) to be intervened on in each round of the simulation under the kth intervention in interventions.

interventions

List whose elements are lists of vectors. Each list in interventions specifies a unique intervention on the relevant variable(s) in intvars. Each vector contains a function implementing a particular intervention on a single variable, optionally followed by one or more "intervention values" (i.e., integers used to specify the treatment regime).

int_times

List whose elements are lists of vectors. The kth list in int_times corresponds to the kth intervention in interventions. Each vector specifies the time points at which the relevant intervention is applied to the corresponding variable in intvars. When an intervention is not applied, the simulated natural course value is used. By default, this argument is set so that all interventions are applied at all time points.
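The nesting of intvars, interventions, and int_times can be illustrated with two interventions on a single treatment variable `A` over time points 0 to 4. The function `set_value()` is a hypothetical stand-in for an intervention function (the package provides its own, whose signatures are not reproduced here); only the shape of the lists is the point of this sketch.

```r
# Hypothetical intervention function: set the treatment to a fixed value.
set_value <- function(x, value) rep(value, length(x))

time_points <- 5  # time points 0, 1, ..., 4

# Intervention 1: always treat; intervention 2: never treat.
intvars <- list(c("A"), c("A"))

# Each intervention is a list with one vector per intervened variable:
# the function first, then its intervention value(s). c() of a function
# and a number yields a list, which is what lets them share one "vector".
interventions <- list(
  list(c(set_value, 1)),
  list(c(set_value, 0))
)

# Apply intervention 1 at every time point and intervention 2 only from
# time 2 on; at other times the simulated natural course value is kept.
int_times <- list(
  list(0:(time_points - 1)),
  list(2:(time_points - 1))
)
```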

ref_int

Integer denoting the intervention to be used as the reference for calculating the risk ratio and risk difference. 0 denotes the natural course, while subsequent integers denote user-specified interventions in the order that they are named in interventions.
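How ref_int selects the denominator (for ratios) and the subtrahend (for differences) can be shown with made-up risks at one time point. The first element is the natural course (0) and the kth following element is the kth intervention; the numbers are illustrative, not package output.

```r
# Illustrative risks: natural course, intervention 1, intervention 2.
risks <- c(natural = 0.30, int1 = 0.20, int2 = 0.15)
ref_int <- 0  # 0 = natural course; k = kth user-specified intervention

ref_risk <- risks[ref_int + 1]   # shift by one because R indexes from 1
risk_ratio <- risks / ref_risk   # ratios: 1, 0.667, 0.5
risk_diff  <- risks - ref_risk   # differences: 0, -0.10, -0.15
```

The reference row of each result is therefore always 1 (ratio) or 0 (difference).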

covparams

List of vectors, where each vector contains information for one parameter used in the modeling of the time-varying covariates (e.g., model statement, family, link function, etc.). Each vector must be the same length as covnames and in the same order. If a parameter is not required for a certain covariate, it should be set to NA at that index.

covnames

Vector of character strings specifying the names of the time-varying covariates in obs_data.

covtypes

Vector of character strings specifying the "type" of each time-varying covariate included in covnames. The possible "types" are: "binary", "normal", "categorical", "bounded normal", "zero-inflated normal", "truncated normal", "absorbing", "categorical time", and "custom".
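The positional alignment between covnames, covtypes, and the vectors in covparams can be sketched as below. The element names `covmodels` and `covlink` follow the naming used in the package's examples, but treat the exact set of recognized names as an assumption and check the package documentation. NA marks a parameter that does not apply to a covariate.

```r
covnames <- c("L1", "L2", "A")
covtypes <- c("binary", "normal", "custom")

# One vector per parameter; position k always refers to covnames[k].
covparams <- list(
  covmodels = c(L1 ~ lag1_A + lag1_L1,
                L2 ~ L1 + lag1_L2,
                A  ~ L1 + L2 + lag1_A),
  covlink   = c("logit", "identity", NA)  # assumed: no link for the custom fit
)

# Every parameter vector must line up with covnames.
stopifnot(all(lengths(covparams) == length(covnames)))
```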

covfits_custom

Vector containing custom fit functions for time-varying covariates that do not fall within the pre-defined covariate types. It should be in the same order as covnames. If a custom fit function is not required for a particular covariate (e.g., if the first covariate is of type "binary" but the second is of type "custom"), then that index should be set to NA.

covpredict_custom

Vector containing custom prediction functions for time-varying covariates that do not fall within the pre-defined covariate types. It should be in the same order as covnames. If a custom prediction function is not required for a particular covariate, then that index should be set to NA.

basecovs

Vector of character strings specifying the names of baseline covariates in obs_data.

histvars

List of vectors. The kth vector specifies the names of the variables to which the kth history function in histories is applied.

histvals

List of length two. The first element is a numeric vector specifying the lags used in the model statements (e.g., if lag1_varname and lag2_varname were included in the model statements, this vector would be c(1,2)). The second element is a numeric vector specifying the lag averages used in the model statements.

histories

Vector of history functions to apply to the variables specified in histvars.
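To make the history terms concrete, the base-R sketch below computes, per subject, the lagged cumulative average that a term such as `lag_cumavg1_L` refers to. It mirrors the naming convention in the model statements but is not the package's history function. The helper name, the assumption that rows are sorted by time within subject, and the zero fill before the first time point (the baselags = FALSE convention for non-categorical covariates) are illustrative choices.

```r
# Hypothetical helper: within each subject, cumavg at time t is the mean of
# x over times 0..t; the lag-1 version shifts it back one time point.
# Assumes rows are already sorted by time within each subject.
lag_cumavg1 <- function(x, id, fill = 0) {
  ave(x, id, FUN = function(v) {
    cumavg <- cumsum(v) / seq_along(v)
    c(fill, head(cumavg, -1))
  })
}

d <- data.frame(id = c(1, 1, 1, 2, 2), t0 = c(0, 1, 2, 0, 1),
                L = c(2, 4, 6, 1, 3))
d$lag_cumavg1_L <- lag_cumavg1(d$L, d$id)
d$lag_cumavg1_L  # 0 2 3 0 1
```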

ymodel

Model statement for the outcome variable.

ymodel_fit_custom

Function specifying a custom outcome model. See the vignette "Using Custom Outcome Models in gfoRmula" for details.

ymodel_predict_custom

Function obtaining predictions from the custom outcome model specified in ymodel_fit_custom. See the vignette "Using Custom Outcome Models in gfoRmula" for details.

yrestrictions

List of vectors. Each vector contains a condition as its first entry and an integer as its second entry. When the condition is TRUE, the outcome variable is simulated according to the fitted model; when the condition is FALSE, the outcome variable takes on the value in the second entry.

compevent_restrictions

List of vectors. Each vector contains a condition as its first entry and an integer as its second entry. When the condition is TRUE, the competing event variable is simulated according to the fitted model; when the condition is FALSE, the competing event variable takes on the value in the second entry.
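The effect of a restriction entry can be sketched with a hypothetical helper that evaluates the condition against the simulated data and overrides the variable where the condition is FALSE. Writing the condition as a string is an assumption about how conditions are supplied. The same logic would apply to compevent_restrictions, with the competing event in place of the outcome.

```r
# Hypothetical helper: keep model-based values where the condition holds,
# otherwise assign the fixed value from the restriction's second entry.
apply_restriction <- function(data, var, restriction) {
  cond <- eval(parse(text = restriction[[1]]), envir = data)
  data[[var]][!cond] <- as.numeric(restriction[[2]])
  data
}

sim <- data.frame(L1 = c(0, 1, 1, 0), Y = c(1, 1, 0, 1))  # Y from the model
yrestrictions <- list(c("L1 == 1", 0))  # the outcome can occur only if L1 == 1

for (res in yrestrictions) sim <- apply_restriction(sim, "Y", res)
sim$Y  # 0 1 0 0
```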

restrictions

List of vectors, one per restricted covariate. Each vector contains four entries: (1) the name of a covariate for which a priori knowledge of its distribution is available; (2) a condition under which no such knowledge is available, which must be TRUE for the distribution of that covariate to be estimated via a parametric model or other fitting procedure; (3) a function for estimating the distribution of that covariate when the condition in the second entry is FALSE, i.e., when a priori knowledge of the covariate distribution is available; and (4) a value used by the function in the third entry. The default is NA.

comprisk

Logical scalar indicating the presence of a competing event.

compevent_model

Model statement for the competing event variable.

time_name

Character string specifying the name of the time variable in obs_data.

outcome_name

Character string specifying the name of the outcome variable in obs_data.

compevent_name

Character string specifying the name of the competing event variable in obs_data.

ranges

List of vectors. Each vector contains the minimum and maximum values of one of the covariates in covnames.

parallel

Logical scalar indicating whether to parallelize simulations of different interventions across multiple cores.

ncores

Integer specifying the number of cores to use in parallel simulation.

max_visits

A vector of one or more values denoting the maximum number of times a binary covariate representing a visit process may be missed before the individual is censored from the data (in the observed data) or a visit is forced (in the simulated data). The vector contains multiple values when the modeling of more than one covariate is attached to a visit process. A value of NA should be provided when there is no visit process.
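The simulated-data side of max_visits can be sketched for one subject: once the number of missed visits reaches the limit, the next visit is forced. The helper is hypothetical, and counting consecutive misses (reset by any visit) is an assumption about how the limit is applied.

```r
# Hypothetical helper: visit is a 0/1 vector for one subject ordered by time.
force_visits <- function(visit, max_visits) {
  missed <- 0
  for (t in seq_along(visit)) {
    if (missed >= max_visits) visit[t] <- 1  # limit reached: force a visit
    if (visit[t] == 1) missed <- 0 else missed <- missed + 1
  }
  visit
}

force_visits(c(0, 0, 0, 0, 1, 0), max_visits = 2)  # 0 0 1 0 1 0
```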

hazardratio

Logical scalar indicating whether the hazard ratio should be computed between two interventions.

intcomp

List of two numbers indicating a pair of interventions to be compared by a hazard ratio. The default is NA, resulting in no hazard ratio calculation.

boot_diag

Logical scalar indicating whether to return the coefficients, standard errors, and variance-covariance matrices of the parameters of the fitted models in the bootstrap samples. The default is FALSE.

nsimul

Number of subjects for whom to simulate data. By default, this argument is set equal to the number of subjects in obs_data.

baselags

Logical scalar for specifying the convention used for lagi and lag_cumavgi terms in the model statements when pre-baseline times are not included in obs_data and when the current time index, t, is such that t < i. If this argument is set to FALSE, the value of all lagi and lag_cumavgi terms in this context are set to 0 (for non-categorical covariates) or the reference level (for categorical covariates). If this argument is set to TRUE, the value of lagi and lag_cumavgi terms are set to their values at time 0. The default is FALSE.
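The two baselags conventions differ only in how a lag term is filled when t < i. A hypothetical per-subject helper for a non-categorical covariate, with time starting at 0, illustrates the difference for a lag2 term (a categorical covariate would use its reference level instead of 0).

```r
# Hypothetical helper for one subject's values ordered by time 0, 1, 2, ...
lag_i <- function(x, i, baselags = FALSE) {
  fill <- if (baselags) x[1] else 0   # time-0 value vs. 0
  n <- length(x)
  if (i >= n) return(rep(fill, n))
  c(rep(fill, i), x[seq_len(n - i)])
}

L <- c(5, 7, 9, 11)
lag_i(L, 2, baselags = FALSE)  # 0 0 5 7
lag_i(L, 2, baselags = TRUE)   # 5 5 5 7
```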

below_zero_indicator

Logical scalar indicating whether the observed data set contains rows for time t < 0.

min_time

Numeric scalar specifying the lowest value of time t in the observed data set.

show_progress

Logical scalar indicating whether to print, in the R console, a progress bar showing the number of bootstrap samples completed. This argument is only applicable when parallel is set to FALSE and bootstrap samples are used (i.e., nsamples is set to a value greater than 0). The default is TRUE.

pb

Progress bar R6 object. See progress_bar for further details.

int_visit_type

Vector of logicals. The kth element specifies whether to carry forward the intervened value (rather than the natural value) of the treatment variable(s) when applying a carry-forward restriction for the kth intervention in interventions. When the kth element is set to FALSE, the natural value of the treatment variable(s) in the kth intervention in interventions is carried forward. By default, this argument is set so that the intervened value of the treatment variable(s) is carried forward for all interventions.

sim_trunc

Logical scalar indicating whether to truncate simulated covariates to their range in the observed data set. This argument is only applicable for covariates of type "normal", "bounded normal", "truncated normal", and "zero-inflated normal".
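The ranges argument and sim_trunc work together: ranges records each covariate's observed minimum and maximum, and truncation clamps simulated values to that interval. A minimal base-R sketch with made-up data and draws; `truncate_to_range()` is a hypothetical helper, not a package function.

```r
obs_data <- data.frame(L1 = c(1.2, 3.5, 2.8), L2 = c(10, 14, 12))
covnames <- c("L1", "L2")

# One c(min, max) vector per covariate, in covnames order.
ranges <- lapply(covnames, function(v) range(obs_data[[v]]))

# Clamp simulated values of a covariate to its observed range.
truncate_to_range <- function(x, rng) pmin(pmax(x, rng[1]), rng[2])

sim_L1 <- c(0.5, 2.0, 4.1)
truncate_to_range(sim_L1, ranges[[1]])  # 1.2 2.0 3.5
```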

...

Other arguments

Value

A list with the following components:

Result

Matrix containing risks over time under the natural course and under each user-specified intervention.

ResultRatio

Matrix containing risk ratios over time, relative to the reference intervention given by ref_int, under the natural course and under each user-specified intervention.

ResultDiff

Matrix containing risk differences over time, relative to the reference intervention given by ref_int, under the natural course and under each user-specified intervention.

bootcoeffs

List of the coefficients of the fitted models. If the argument boot_diag is set to FALSE, a value of NA is given.

bootstderrs

List of the standard errors of the coefficients of the fitted models. If the argument boot_diag is set to FALSE, a value of NA is given.


CausalInference/gfoRmula documentation built on Oct. 1, 2024, 8:36 p.m.