gformula_binary_eof: Estimation of Binary End-of-Follow-Up Outcome Under the...

View source: R/gformula.R

Description

Based on an observed data set, this internal function estimates the outcome probability at end-of-follow-up under multiple user-specified interventions using the parametric g-formula. See Lin et al. (2019) for further details concerning the application and implementation of the parametric g-formula.
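As a rough illustration of the idea behind the parametric g-formula, the sketch below fits covariate and outcome models to a small simulated data set, simulates covariate histories under a static intervention, and averages the predicted outcome probabilities. It is a simplified base R sketch, not the package's implementation; the data-generating process, variable names, and the helper `simulate_risk` are all hypothetical.

```r
# Minimal parametric g-formula sketch in base R (NOT the gfoRmula implementation).
# The data-generating process and all names below are hypothetical.
set.seed(1)
n <- 2000
expit <- function(x) 1 / (1 + exp(-x))

# Hypothetical observed data: two time points, outcome measured at the end.
L0 <- rnorm(n)
A0 <- rbinom(n, 1, expit(0.5 * L0))
L1 <- rnorm(n, mean = 0.5 * L0 - 0.4 * A0)
A1 <- rbinom(n, 1, expit(0.5 * L1 + A0))
Y  <- rbinom(n, 1, expit(-1 + 0.6 * L1 - 0.5 * A0 - 0.5 * A1))
obs <- data.frame(L0, A0, L1, A1, Y)

# Step 1: fit parametric models for the time-varying covariate and the outcome.
fit_L1 <- lm(L1 ~ L0 + A0, data = obs)
fit_Y  <- glm(Y ~ L1 + A0 + A1, family = binomial(), data = obs)

# Step 2: simulate covariate histories under a static intervention, starting
# from baseline values resampled from the observed data.
simulate_risk <- function(a0, a1, nsimul = 10000) {
  sim <- data.frame(L0 = sample(obs$L0, nsimul, replace = TRUE),
                    A0 = a0, A1 = a1)
  sim$L1 <- predict(fit_L1, newdata = sim) +
    rnorm(nsimul, sd = summary(fit_L1)$sigma)
  # Step 3: average the predicted outcome probabilities over simulated histories.
  mean(predict(fit_Y, newdata = sim, type = "response"))
}

risk_never  <- simulate_risk(a0 = 0, a1 = 0)
risk_always <- simulate_risk(a0 = 1, a1 = 1)
c(never = risk_never, always = risk_always)
```

The package additionally handles the natural course, history functions, restrictions, and bootstrapping, none of which this sketch attempts.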

Usage

gformula_binary_eof(
  obs_data,
  id,
  time_name,
  covnames,
  covtypes,
  covparams,
  covfits_custom = NA,
  covpredict_custom = NA,
  histvars = NULL,
  histories = NA,
  basecovs = NA,
  outcome_name,
  ymodel,
  intvars = NULL,
  interventions = NULL,
  int_times = NULL,
  int_descript = NULL,
  ref_int = 0,
  visitprocess = NA,
  restrictions = NA,
  yrestrictions = NA,
  baselags = FALSE,
  nsimul = NA,
  sim_data_b = FALSE,
  seed,
  nsamples = 0,
  parallel = FALSE,
  ncores = NA,
  ci_method = "percentile",
  threads,
  model_fits = FALSE,
  boot_diag = FALSE,
  show_progress = TRUE,
  ...
)

Arguments

obs_data

Data table containing the observed data.

id

Character string specifying the name of the ID variable in obs_data.

time_name

Character string specifying the name of the time variable in obs_data.

covnames

Vector of character strings specifying the names of the time-varying covariates in obs_data.

covtypes

Vector of character strings specifying the "type" of each time-varying covariate included in covnames. The possible "types" are: "binary", "normal", "categorical", "bounded normal", "zero-inflated normal", "truncated normal", "absorbing", "categorical time", and "custom".

covparams

List of vectors, where each vector contains information for one parameter used in the modeling of the time-varying covariates (e.g., model statement, family, link function, etc.). Each vector must be the same length as covnames and in the same order. If a parameter is not required for a certain covariate, it should be set to NA at that index.
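The layout described above might look like the following sketch. The element name `link_example` is a hypothetical placeholder used only to show the NA convention; consult the package documentation for the parameter names it actually recognizes.

```r
covnames <- c("cov1", "cov2", "treat")   # hypothetical covariate names

covparams <- list(
  # One model statement per covariate, in the same order as covnames.
  covmodels = c(cov1 ~ lag1_cov1 + time,
                cov2 ~ cov1 + lag1_cov2 + time,
                treat ~ lag1_treat + cov1 + cov2 + time),
  # Hypothetical second parameter: needed for one covariate only, NA elsewhere.
  link_example = c(NA, "log", NA)
)

# Every parameter vector must have one entry per covariate.
stopifnot(all(lengths(covparams) == length(covnames)))
```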

covfits_custom

Vector containing custom fit functions for time-varying covariates that do not fall within the pre-defined covariate types. It should be in the same order as covnames. If a custom fit function is not required for a particular covariate (e.g., if the first covariate is of type "binary" but the second is of type "custom"), then that index should be set to NA. The default is NA.

covpredict_custom

Vector containing custom prediction functions for time-varying covariates that do not fall within the pre-defined covariate types. It should be in the same order as covnames. If a custom prediction function is not required for a particular covariate, then that index should be set to NA. The default is NA.

histvars

List of vectors. The kth vector specifies the names of the variables to which the kth history function in histories is applied. The default is NULL.

histories

Vector of history functions to apply to the variables specified in histvars. The default is NA.
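To make the notion of a history function concrete, the base R sketch below computes a one-period lag and a cumulative average for a single subject. It only illustrates the kind of derived variable that model statements refer to (e.g., `lag1_cov1`, `cumavg_cov1`); the values are hypothetical, and it assumes the cumulative average includes the current time point.

```r
# Hypothetical covariate values for one subject at times 0, 1, 2, 3.
x <- c(2, 4, 6, 8)

# One-period lag: the value at the previous time point (0 before baseline).
lag1_x <- c(0, head(x, -1))

# Cumulative average: running mean up to and including the current time.
cumavg_x <- cumsum(x) / seq_along(x)

data.frame(time = 0:3, x, lag1_x, cumavg_x)
```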

basecovs

Vector of character strings specifying the names of baseline covariates in obs_data. These covariates are not simulated using a model but rather carry their value over all time points from the first time point of obs_data. These covariates should not be included in covnames. The default is NA.

outcome_name

Character string specifying the name of the outcome variable in obs_data.

ymodel

Model statement for the outcome variable.

intvars

List, whose elements are vectors of character strings. The kth vector in intvars specifies the name(s) of the variable(s) to be intervened on in each round of the simulation under the kth intervention in interventions.

interventions

List, whose elements are lists of vectors. Each list in interventions specifies a unique intervention on the relevant variable(s) in intvars. Each vector contains a function implementing a particular intervention on a single variable, optionally followed by one or more "intervention values" (i.e., integers used to specify the treatment regime).
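As an illustration of what an intervention function with "intervention values" does, the sketch below mimics a threshold-style intervention in base R: values outside the interval given by the two intervention values are moved to the nearest bound. `apply_threshold` is a hypothetical helper, not the package's own function.

```r
# Hypothetical helper: clamp values to the interval [lower, upper].
apply_threshold <- function(x, lower, upper) pmin(pmax(x, lower), upper)

# Hypothetical natural-course treatment values for four subjects.
natural <- c(0.2, 1.5, 0.9, 3.0)

# Lower bound 1, no upper bound: values below 1 are raised to 1.
intervened <- apply_threshold(natural, lower = 1, upper = Inf)
intervened
```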

int_times

List, whose elements are lists of vectors. The kth list in int_times corresponds to the kth intervention in interventions. Each vector specifies the time points in which the relevant intervention is applied on the corresponding variable in intvars. When an intervention is not applied, the simulated natural course value is used. By default, this argument is set so that all interventions are applied in all time points.
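The effect of restricting an intervention to certain time points can be sketched as follows: at times listed in the intervention times the intervened value is used, and at all other times the simulated natural course value is kept. All values here are hypothetical.

```r
times <- 0:6
# Hypothetical simulated natural-course treatment values for one subject.
natural_course <- c(1, 0, 1, 1, 0, 1, 1)

# Apply a "never treat" (static 0) intervention at times 0 through 3 only.
times_intervened <- 0:3
treat <- ifelse(times %in% times_intervened, 0, natural_course)

data.frame(time = times, natural_course, treat)
```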

int_descript

Vector of character strings, each describing an intervention. It must be in the same order as the entries in interventions.

ref_int

Integer denoting the intervention to be used as the reference for calculating the end-of-follow-up mean ratio and mean difference. 0 denotes the natural course, while subsequent integers denote user-specified interventions in the order that they are named in interventions. The default is 0.
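The role of the reference intervention can be sketched with a few lines of base R: every estimate is compared with the estimate under the reference, giving a ratio and a difference. The probabilities below are hypothetical numbers chosen for illustration, not package output.

```r
# Hypothetical estimated outcome probabilities, indexed by intervention number
# (0 = natural course, 1 and 2 = user-specified interventions).
est <- c(`0` = 0.20, `1` = 0.10, `2` = 0.15)

ref_int <- 0
ref <- est[[as.character(ref_int)]]

comparison <- data.frame(
  intervention = names(est),
  estimate     = unname(est),
  mean_ratio   = unname(est / ref),
  mean_diff    = unname(est - ref)
)
comparison
```

Changing `ref_int` to 1 in this sketch would express every estimate relative to the first user-specified intervention instead.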

visitprocess

List of vectors. Each vector contains as its first entry the covariate name of a visit process; its second entry the name of a covariate whose modeling depends on the visit process; and its third entry the maximum number of consecutive visits that can be missed before an individual is censored. The default is NA.

restrictions

List of vectors. Each vector has four entries: (1) a covariate for which a priori knowledge of its distribution is available under some conditions; (2) a condition that must be TRUE for the distribution of that covariate to be estimated via a parametric model or other fitting procedure (i.e., the condition under which no a priori knowledge is available); (3) a function that determines the covariate's distribution when the condition in the second entry is FALSE, so that the a priori knowledge applies; and (4) a value used by the function in the third entry. The default is NA.

yrestrictions

List of vectors. Each vector contains as its first entry a condition and as its second entry an integer. When the condition is TRUE, the outcome variable is simulated according to the fitted model; when the condition is FALSE, the outcome variable takes on the value in the second entry. The default is NA.

baselags

Logical scalar specifying the convention used for lagi and lag_cumavgi terms (where i is the number of lagged time points) in the model statements when pre-baseline times are not included in obs_data and the current time index, t, is such that t < i. If this argument is set to FALSE, all lagi and lag_cumavgi terms in this context are set to 0 (for non-categorical covariates) or the reference level (for categorical covariates). If this argument is set to TRUE, the lagi and lag_cumavgi terms are set to their values at time 0. The default is FALSE.
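The two conventions can be contrasted with a small base R sketch for a non-categorical covariate. `lag_k` is a hypothetical helper written for illustration; it is not part of the package.

```r
# Hypothetical helper: lag a per-subject series by k time points, filling
# the pre-baseline positions according to the baselags convention.
lag_k <- function(x, k, baselags = FALSE) {
  fill <- if (baselags) x[1] else 0   # time-0 value versus 0
  c(rep(fill, min(k, length(x))), head(x, -k))
}

x <- c(5, 7, 9)   # hypothetical covariate values at times 0, 1, 2

lag2_zero <- lag_k(x, k = 2, baselags = FALSE)  # pre-baseline lags set to 0
lag2_base <- lag_k(x, k = 2, baselags = TRUE)   # pre-baseline lags set to time-0 value
rbind(lag2_zero, lag2_base)
```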

nsimul

Number of subjects for whom to simulate data. By default, this argument is set equal to the number of subjects in obs_data.

sim_data_b

Logical scalar indicating whether to return the simulated data set. If bootstrap samples are used (i.e., nsamples is set to a value greater than 0), this argument must be set to FALSE. The default is FALSE.

seed

Starting seed for simulations and bootstrapping.

nsamples

Integer specifying the number of bootstrap samples to generate. The default is 0.

parallel

Logical scalar indicating whether to parallelize simulations of different interventions across multiple cores. The default is FALSE.

ncores

Integer specifying the number of CPU cores to use in parallel simulation. This argument is required when parallel is set to TRUE. In many applications, users may wish to set this argument equal to parallel::detectCores() - 1.
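A defensive way to derive this value is sketched below. It assumes only the base `parallel` package; the guard covers the case where `detectCores()` returns NA or reports a single core.

```r
# detectCores() can return NA when the core count cannot be determined.
available <- parallel::detectCores()

# Leave one core free, but never request fewer than one.
ncores <- if (is.na(available)) 1L else max(1L, available - 1L)
ncores
```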

ci_method

Character string specifying the method for calculating the bootstrap 95% confidence intervals, if applicable. The options are "percentile" and "normal". The default is "percentile".
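The difference between the two options can be sketched in base R as follows. This shows the standard percentile and normal-approximation bootstrap intervals and assumes the package follows these textbook definitions; the bootstrap estimates are simulated, hypothetical values.

```r
set.seed(42)
# Hypothetical bootstrap replicates of an estimate (e.g., a mean ratio).
boot_est  <- rnorm(500, mean = 0.8, sd = 0.05)
point_est <- 0.8   # hypothetical estimate from the original sample

# Percentile method: 2.5th and 97.5th percentiles of the bootstrap estimates.
percentile_ci <- unname(quantile(boot_est, probs = c(0.025, 0.975)))

# Normal method: point estimate +/- z * bootstrap standard error.
boot_se   <- sd(boot_est)
normal_ci <- point_est + c(-1, 1) * qnorm(0.975) * boot_se

rbind(percentile = percentile_ci, normal = normal_ci)
```

The percentile interval need not be symmetric around the point estimate, whereas the normal interval always is.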

threads

Integer specifying the number of threads to be used in data.table. See setDTthreads for further details.

model_fits

Logical scalar indicating whether to return the fitted models. Note that if this argument is set to TRUE, the output of this function may use a lot of memory. The default is FALSE.

boot_diag

Logical scalar indicating whether to return the coefficients, standard errors, and variance-covariance matrices of the parameters of the fitted models in the bootstrap samples. The default is FALSE.

show_progress

Logical scalar indicating whether to print a progress bar for the number of bootstrap samples completed in the R console. This argument is only applicable when parallel is set to FALSE and bootstrap samples are used (i.e., nsamples is set to a value greater than 0). The default is TRUE.

...

Other arguments, which are passed to the functions in covpredict_custom.

Value

An object of class gformula_binary_eof. The object is a list with the following components:

result

Results table containing the estimated outcome probability for all interventions (including natural course) at the last time point. If bootstrapping was used, the results table includes the bootstrap end-of-follow-up mean ratio, standard error, and 95% confidence interval.

coeffs

A list of the coefficients of the fitted models.

stderrs

A list of the standard errors of the coefficients of the fitted models.

vcovs

A list of the variance-covariance matrices of the parameters of the fitted models.

rmses

A list of root mean square error (RMSE) values of the fitted models.

fits

A list of the fitted models for the time-varying covariates and outcome. If model_fits is set to FALSE, a value of NULL is given.

sim_data

A list of data tables of the simulated data. Each element in the list corresponds to one of the interventions. If the argument sim_data_b is set to FALSE, a value of NA is given.

bootcoeffs

A list, where the kth element is a list containing the coefficients of the fitted models corresponding to the kth bootstrap sample. If boot_diag is set to FALSE, a value of NULL is given.

bootstderrs

A list, where the kth element is a list containing the standard errors of the coefficients of the fitted models corresponding to the kth bootstrap sample. If boot_diag is set to FALSE, a value of NULL is given.

bootvcovs

A list, where the kth element is a list containing the variance-covariance matrices of the parameters of the fitted models corresponding to the kth bootstrap sample. If boot_diag is set to FALSE, a value of NULL is given.

...

Some additional elements.

The results for the g-formula simulation under various interventions for the last time point are printed with the print.gformula_binary_eof function. To generate graphs comparing the mean estimated and observed covariate values over time, use the plot.gformula_binary_eof function.

References

McGrath S, Lin V, Zhang Z, Petito LC, Logan RW, Hernán MA, Young JG. gfoRmula: An R package for estimating the effects of sustained treatment strategies via the parametric g-formula. Patterns. 2020;1:100008.

Robins JM. A new approach to causal inference in mortality studies with a sustained exposure period: application to the healthy worker survivor effect. Mathematical Modelling. 1986;7:1393–1512. [Errata (1987) in Computers and Mathematics with Applications 14, 917-921. Addendum (1987) in Computers and Mathematics with Applications 14, 923-945. Errata (1987) to addendum in Computers and Mathematics with Applications 18, 477.]

See Also

gformula

Examples

## Estimating the effect of threshold interventions on the mean of a binary
## end of follow-up outcome

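# Load the package, which provides gformula_binary_eof() and the
# binary_eofdata example data set
library(gfoRmula)

id <- 'id_num'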
time_name <- 'time'
covnames <- c('cov1', 'cov2', 'treat')
outcome_name <- 'outcome'
histories <- c(lagged, cumavg)
histvars <- list(c('treat', 'cov1', 'cov2'), c('cov1', 'cov2'))
covtypes <- c('binary', 'zero-inflated normal', 'normal')
covparams <- list(covmodels = c(cov1 ~ lag1_treat + lag1_cov1 + lag1_cov2 + cov3 +
                                  time,
                                cov2 ~ lag1_treat + cov1 + lag1_cov1 + lag1_cov2 +
                                  cov3 + time,
                                treat ~ lag1_treat + cumavg_cov1 +
                                  cumavg_cov2 + cov3 + time))
ymodel <- outcome ~  treat + cov1 + cov2 + lag1_cov1 + lag1_cov2 + cov3
intvars <- list('treat', 'treat')
interventions <- list(list(c(static, rep(0, 7))),
                      list(c(threshold, 1, Inf)))
int_descript <- c('Never treat', 'Threshold - lower bound 1')
nsimul <- 10000
ncores <- 2

gform_bin_eof <- gformula_binary_eof(obs_data = binary_eofdata, id = id,
                                     time_name = time_name,
                                     covnames = covnames,
                                     outcome_name = outcome_name,
                                     covtypes = covtypes,
                                     covparams = covparams,
                                     ymodel = ymodel,
                                     intvars = intvars,
                                     interventions = interventions,
                                     int_descript = int_descript,
                                     histories = histories, histvars = histvars,
                                     basecovs = c("cov3"), seed = 1234,
                                     parallel = TRUE, nsamples = 5,
                                     nsimul = nsimul, ncores = ncores)
gform_bin_eof

gfoRmula documentation built on July 13, 2021, 9:07 a.m.