R/ICE_stratify.R

Defines functions ice_strat

#' ICE Stratifying Treatment History Estimator including Both Classical, Hazard-Based, and Doubly Robust Version for A Single
#' Intervention Strategy
#'
#' This function estimates the risk over time for survival outcome using the given observed data set following
#' a single user-defined intervention strategy by the parametric g-formula iterative conditional expectation (ICE)
#' estimator. This function provides the classical ICE stratifying treatment history method,
#' the hazard based ICE stratifying treatment history method, and doubly robust weighted ICE method.
#'
#' @param data a data frame containing the observed data in long format.
#' @param K a numerical value that indicates the total number of time points.
#' @param id a character string indicating the ID variable name in \code{data}.
#' @param time_name a character string indicating the time variable name in \code{data}.
#' @param outcome_name a character string indicating the outcome variable name in \code{data}.
#' @param censor_name a character string indicating the censor variable name in \code{data}.
#' Could be \code{NULL} if there is no censoring variable in \code{data}. Default is \code{NULL}.
#' @param competing_name a character string indicating the competing variable name in \code{data}.
#' Could be \code{NULL} if there is no competing variable in \code{data}. Default is \code{NULL}.
#' @param total_effect a logical indicating how the competing event is handled. TRUE for total effect. FALSE for direct effect.
#' @param outcome_model a formula specifying the model statement for the outcome model.
#' @param censor_model a formula specifying the model statement for the censoring model in IP weighted natural course risk.
#' Could be \code{NULL} if censoring variable \code{censor_name} is \code{NULL}. Default is \code{NULL}.
#' @param competing_model a formula specifying the model statement for the competing model in hazard based ICE estimator.
#' Could be \code{NULL} for classical ICE estimator. Default is \code{NULL}.
#' @param hazard_model a formula specifying the model statement for the hazard model for hazard-based ICE estimator. Default is \code{NULL}.
#' @param interventions a list of functions indicating intervention strategy for the intervention variables in \code{intervention_names}.
#' Interventions could be specified using the following functions:
#' \itemize{
#' \item{Always treat:} {\code{static(1)} }
#' \item{Never treat:} {\code{static(0)} }
#' \item{Natural course:} {\code{natural_course()} }
#' \item{Dynamic treat:} {\code{dynamic(condition, strategy_before, strategy_after, absorb)} }
#' \item{Threshold Intervention:} {\code{threshold(upper_bound, lower_bound)}}
#' \item{Grace period:} {\code{grace_period(type, nperiod, condition)} }
#' }
#' For more details, please read the corresponding documentation of each intervention function.
#' If the length of the \code{interventions} list is larger than 1, then the function will process them as multiple treatments simultaneously.
#' The user defined intervention functions must return the intervened value as a vector in the length of the number of rows in \code{data}.
#' The user defined intervention functions in \code{interventions} must be in corresponding order of \code{intervention_names}.
#' The same logic applies for the built in intervention functions.
#' @param intervention_names a list of character strings indicating the intervention variable names in correspondence with the intervention strategies in \code{interventions}.
#' Length of the vector must be in the same size of \code{interventions}.
#' @param compute_nc_risk a logical to indicate whether to compute observed natural course risk. TRUE for observed natural course risk computation. FALSE for no natural course risk computation. Default is TRUE.
#' @param hazard_based a logical indicating whether to use hazard-based ICE estimator or classical ICE estimator. TRUE for hazard-based estimator. FALSE for classical estimator.
#' @param weighted a logical indicating whether to use weighted ICE estimtor. TRUE for doubly robust weighted ICE estimator. FALSE for singly robust ICE estimator. Note, the doubly robust version uses the
#' classical stratified ICE. Default is FALSE.
#' @param treat_model a list of formulae specifying the model statement for the corresponding treatment variable used in the doubly robust ICE estimator. The length of list must match with
#' the length of \code{obs_treatment_names}. Multiple treatments passed in \code{obs_treatment_names} are allowed and must follow the corresponding order as in \code{treat_model_covar}.
#' @param obs_treatment_names a list of character strings specifying the treatment variables to be used in the model for observed treatments of the weighted ICE estimator.
#' @param intervention_times a list of numerics indicating the time points to which the specified intervention is applied. Default to be \code{NULL}.
#' @param intervention_description a character string specifying the description of the specified intervention strategy.
#' @param verbose a logical specifying whether progress of the algorithm is printed.
#' 
#' @return A list containing the following components:
#' \item{gformula_risk_last_time} {The estimated risk for the specified intervention(s) at the last time step.}
#' \item{gformula_risk} {A table containing the estimated risk for the specified intervention(s) at each time step.}
#' \item{weight_h} {A list containing the inverse probability weighted estimates of the observed risk. Could be \code{NULL} if \code{compute_nc_risk} is FALSE.}
#' \item{ipw_model} {A model object for the probability of censoring used in the inverse probability weighted estimate of the observed risk. Could be \code{NULL} if \code{compute_nc_risk} is FALSE.}
#' \item{np_model} {A model object for the probability of having competing event used in the inverse probability weighted estimate of the observed risk. Could be \code{NULL} if \code{compute_nc_risk} is FALSE.}
#' \item{outcome_init}{A list containing the fitted models with the summary, standard errors of the coefficients, variance-covariance matrices of the parameters, and the root mean square error (RMSE) values for the outcome model of the first iteration in the ICE algorithm.}
#' \item{comp_init}{A list containing the fitted models with the summary, standard errors of the coefficients, variance-covariance matrices of the parameters, and the root mean square error (RMSE) values for the competing model of the first iteration in the ICE algorithm (if applicable). Could be \code{NULL} if \code{competing_name} is \code{NULL}.}
#' \item{outcome_by_step}{A list containing the fitted models with the summary, standard errors of the coefficients, variance-covariance matrices of the parameters, and the root mean square error (RMSE) values for the outcome model of each subsequent iteration in the ICE algorithm.}
#' \item{comp_by_step}{A list containing the fitted models with the summary, standard errors of the coefficients, variance-covariance matrices of the parameters, and the root mean square error (RMSE) values for the competing model of each subsequent iteration in the ICE algorithm (if applicable). Could be \code{NULL} if \code{competing_name} is \code{NULL}.}
#' \item{hazard_by_step}{A list containing the fitted models with the summary, standard errors of the coefficients, variance-covariance matrices of the parameters, and the root mean square error (RMSE) values for the hazard model of each iteration in the ICE algorithm (if applicable). Could be \code{NULL} if \code{hazard_based} is FALSE.}
#' @import magrittr reshape2 nnet speedglm dplyr stringr
#' @importFrom stats reshape
#' @importFrom stats quasibinomial
#' @importFrom stats fitted
#' @importFrom data.table data.table
#'
#' @references Wen L, Young JG, Robins JM, Hernán MA. Parametric g-formula implementations for causal survival analyses. Biometrics. 2021;77(2):740-753.
#' @references McGrath S, Lin V, Zhang Z, Petito LC, Logan RW, Hernán MA, and JG Young. gfoRmula: An R package for estimating the effects of sustained treatment strategies via the parametric g-formula. Patterns. 2020;1:100008.
#' @references Young JG, Herńan MA, Robins JM. Identification, estimation and approximation of risk under interventions that depend on the natural value of treatment using observational data. Epidemiologic Methods. 2014;3(1):1-19.
#' @references Young JG, Vatsa R, Murray EJ, Hernán MA. Interval-cohort designs and bias in the estimation of per-protocol effects: a simulation study. Trials. 2019;20(1):552.
#' @references Díaz, I, Williams, N, Hoffman, KL, & Schenck, EJ. Nonparametric causal effects based on longitudinal modified treatment policies. Journal of the American Statistical Association. 2021;118(542), 846–857.
#' @references Young JG, Stensrud MJ, Tchetgen Tchetgen EJ, Hernán MA. A causal framework for classical statistical estimands in failure-time settings with competing events. Statistics in medicine. 2020;39(8):1199-1236.
#' @references Wen L, Hernán MA, Robins JM. Multiply robust estimators of causal effects for survival outcomes. Scandinavian journal of statistics, theory and applications. 2022;49(3):1304-1328.
#' @references Haneuse S, Rotnitzky A. Estimation of the effect of interventions that modify the received treatment. Statistics in medicine. 2013;32(30):5260-5277.
#' @references McGrath S, Young JG, Hernán MA. Revisiting the g-null Paradox. Epidemiology. 2022;33(1):114-120.
#' 
#' @noRd
#'
#' @examples
#'
#' # Estimate risks for the always treat intervention
#' # (i.e. constantly treat over all time points)
#' # Hazard based stratified ICE, direct effect for competing event
#'
#' ice_static <- ice_strat(data = data, K = 5, id = "id",
#' time_name = "t0", outcome_name = "Y",
#' competing_name = "D", censor_name = "C",
#' total_effect = FALSE,
#' outcome_model = Y ~ L1 + L2,
#' censor_model = C ~ L1 + L2,
#' competing_model = D ~ L1 + L2,
#' hazard_model = Y ~ L1 + L2,
#' interventions = list(static(1)),
#' intervention_names = list("A"),
#' hazard_based = TRUE,
#' obs_treatment_names = list("A"),
#' intervention_description = "Always Treat")
#'
#' ice_static
#'
#' # Estimate risks for both the observed inverse probability weighted
#' # natural course risk and natural course strategy
#' # Classical stratified ICE, total effect for competing event
#'
#' ice_natural_course <- ice_strat(data = data, K = 5, id = "id",
#' time_name = "t0", outcome_name = "Y",
#' competing_name = "D", censor_name = "C",
#' total_effect = TRUE,
#' outcome_model = Y ~ L1 + L2,
#' censor_model = C ~ L1 + L2,
#' interventions = list(natural_course()),
#' intervention_names = list("A"),
#' compute_nc_risk = TRUE,
#' hazard_based = FALSE,
#' obs_treatment_names = list("A"),
#' intervention_description = "Natural Course")
#'
#' ice_natural_course
#'
#' # Estimate risks for the dynamic treat intervention based on the covariate L1 > 0
#' # (i.e. treat when L1 > 0 and absorbing once one initiates treatment;
#' # not treat otherwise)
#' # Classical stratified ICE, direct effect for competing event
#'
#' ice_dynamic <- ice_strat(data = data, K = 5, id = "id",
#' time_name = "t0", outcome_name = "Y",
#' competing_name = "D", censor_name = "C",
#' total_effect = FALSE,
#' outcome_model = Y ~ L1 + L2,
#' censor_model = C ~ L1 + L2,
#' interventions = list(dynamic("L1 > 0", static(0), static(1), absorb = TRUE)),
#' intervention_names = list("A"),
#' hazard_based = FALSE,
#' obs_treatment_names = list("A"),
#' intervention_description = "Dynamic Treat")
#'
#' ice_dynamic
#'
#' # Estimate risks for the always treat intervention
#' # (i.e. constantly treat over all time points)
#' # Weighted ICE, competing event as total effect,
#' # with L1 included in the treatment model for A
#' # treatment model: A ~ L1
#'
#' ice_weighted <- ice_strat(data = data, K = 5, id = "id",
#' time_name = "t0", outcome_name = "Y",
#' competing_name = "D", censor_name = "C",
#' total_effect = TRUE,
#' outcome_model = Y ~ L1 + L2,
#' censor_model = C ~ L1 + L2,
#' competing_model = D ~ L1 + L2,
#' interventions = list(static(1)),
#' intervention_names = list("A"),
#' weighted = TRUE, treat_model = list(A ~ L1),
#' obs_treatment_names = list("A"),
#' intervention_description = "Always Treat")
#'
#' ice_weighted
#'

ice_strat <- function(data, K, id, time_name, outcome_name,
                      censor_name = NULL, competing_name = NULL, total_effect,
                      outcome_model, censor_model = NULL, competing_model = NULL,
                      hazard_model = NULL,
                      interventions, intervention_names, intervention_times = NULL,
                      compute_nc_risk = TRUE, hazard_based, weighted = FALSE,
                      treat_model, obs_treatment_names,
                      intervention_description, verbose = TRUE) {

  ## 0. some pre-processing

  outcome_varname <- outcome_name
  competing_varname <- competing_name
  censor_varname <- censor_name
  intervention_varnames <- intervention_names
  time0 <- time_name
  obs_treatment_varnames <- obs_treatment_names
  all_treat_vars <- unlist(obs_treatment_varnames)

  fit_all <- fit_summary <- fit_stderr <- fit_vcov <- fit_rmse <- c()

  if (is.null(intervention_times) | (length(intervention_times[[1]]) == 0)) {
    intervention_times <- list()
    for (i in 1:length(intervention_varnames[[1]])) {
      intervention_times <- c(intervention_times, list(0:(K-1)))
    }
    intervention_times <- list(intervention_times)
  }

  if (weighted) {
    hazard_based <- FALSE
    treat_model_covar <- list()
    for (l in 1:length(treat_model)) {
      treat_model_covar <- c(treat_model_covar, list(str_remove_all(unlist(str_split(as.character(treat_model[[l]])[3], "[+]")), " ")))
    }
  }

  outcome_covar <- str_remove_all(unlist(str_split(as.character(outcome_model)[3], "[+]")), " ")
  censor_covar <- str_remove_all(unlist(str_split(as.character(censor_model)[3], "[+]")), " ")
  competing_covar <- str_remove_all(unlist(str_split(as.character(competing_model)[3], "[+]")), " ")
  hazard_covar <- str_remove_all(unlist(str_split(as.character(hazard_model)[3], "[+]")), " ")


  natural_course_type <- ifelse(is.null(censor_varname), "mean", "ipw")

  if (!is.null(competing_varname) & total_effect & hazard_based) {

    data[, outcome_varname] <- ifelse(data[, competing_varname] == 1, 0, data[, outcome_varname])

  }
  
  ## first create lag terms
  
  outcome_covar_lags <- outcome_covar[str_detect(outcome_covar, "lag[0-9999]_")]
  censor_covar_lags <- censor_covar[str_detect(censor_covar, "lag[0-9999]_")]
  competing_covar_lags <- competing_covar[str_detect(competing_covar, "lag[0-9999]_")]
  hazard_covar_lags <- hazard_covar[str_detect(hazard_covar, "lag[0-9999]_")]
  
  all_lags <- c(outcome_covar_lags, censor_covar_lags, competing_covar_lags, hazard_covar_lags)
  all_lags <- unique(all_lags)
  all_lags <- na.omit(all_lags)
  
  if (length(all_lags) > 0) {
  
  data$factor_id <- as.factor(data[, id])
  
  for (i in 1:length(all_lags)) {
    ilag <- all_lags[i]
    
    if (str_detect(ilag, "^[a-zA-Z._]+\\(")) {
      ilag <- str_split(ilag, ",")[[1]][1]
      ilag <- str_split(ilag, "\\(")[[1]][2]
      ilag <- str_split(ilag, "\\)")[[1]][1]
    }
    
    lag_components <- str_split(ilag, "_")
    lag_unit <- as.numeric(str_replace_all(lag_components[[1]][1], "lag", ""))
    lag_var <- lag_components[[1]][2]
    
    
    group_dta <- group_by(data, "factor_id")
    group_dta[, "lag_var_new"] <- data[, lag_var]
    group_dta <- group_dta %>%
      mutate(lagged_var = dplyr::lag(group_dta$lag_var_new, n = lag_unit, default = 0))
    
    data[, ilag] <- group_dta$lagged_var
    
  }
  }
  
  ## need to rebuild covars by calling the transform functions
  ## outcome covar
  outcome_covar_new <- c()
  data_add <- data
  for (i in 1:length(outcome_covar)) {
    
    icovar <- outcome_covar[i]
    
    column_name <- get_column_name_covar(icovar)
    
    if (!column_name %in% colnames(data)) {
      old_ncol <- ncol(data_add)
      data_add <- data %>% mutate(eval(parse(text = icovar)))
      new_ncol <- ncol(data_add)
      
      new_column <- data_add[, new_ncol]
      new_column <- as.matrix(new_column)
      
      ## create new columns
      
      if (ncol(new_column) > 1) {
        multiple_varname <- paste0(paste0(column_name, "."), 1:(ncol(new_column)))
        data[, multiple_varname] <- new_column
        outcome_covar_new <- c(outcome_covar_new, multiple_varname)
      } else {
        data[, column_name] <- as.vector(new_column)
        outcome_covar_new <- c(outcome_covar_new, column_name)
      }
    } else {
      outcome_covar_new <- c(outcome_covar_new, column_name)
    }
  }
  
  ## censor covar
  censor_covar_new <- c()
  if (!is.null(censor_varname) & !is.null(censor_model)) {
  data_add <- data
  for (i in 1:length(censor_covar)) {
    
    icovar <- censor_covar[i]
    
    column_name <- get_column_name_covar(icovar)
    
    if (!column_name %in% colnames(data)) {
      old_ncol <- ncol(data_add)
      data_add <- data %>% mutate(eval(parse(text = icovar)))
      new_ncol <- ncol(data_add)
      
      new_column <- data_add[, new_ncol]
      new_column <- as.matrix(new_column)
      
      ## create new columns
      
      if (ncol(new_column) > 1) {
        multiple_varname <- paste0(paste0(column_name, "."), 1:(ncol(new_column)))
        data[, multiple_varname] <- new_column
        censor_covar_new <- c(censor_covar_new, multiple_varname)
      } else {
        data[, column_name] <- as.vector(new_column)
        censor_covar_new <- c(censor_covar_new, column_name)
      }
    } else {
      censor_covar_new <- c(censor_covar_new, column_name)
    }
  }
  }
  
  ## competing covar
  competing_covar_new <- c()
  if (!is.null(competing_varname) & !is.null(competing_model)) {
  data_add <- data
  for (i in 1:length(competing_covar)) {
    
    icovar <- competing_covar[i]
    
    column_name <- get_column_name_covar(icovar)
    
    if (!column_name %in% colnames(data)) {
      old_ncol <- ncol(data_add)
      data_add <- data %>% mutate(eval(parse(text = icovar)))
      new_ncol <- ncol(data_add)
      
      new_column <- data_add[, new_ncol]
      new_column <- as.matrix(new_column)
      
      ## create new columns
      
      if (ncol(new_column) > 1) {
        multiple_varname <- paste0(paste0(column_name, "."), 1:(ncol(new_column)))
        data[, multiple_varname] <- new_column
        competing_covar_new <- c(competing_covar_new, multiple_varname)
      } else {
        data[, column_name] <- as.vector(new_column)
        competing_covar_new <- c(competing_covar_new, column_name)
      }
    } else {
      competing_covar_new <- c(competing_covar_new, column_name)
    }
  }
  }
  
  ## hazard covar
  hazard_covar_new <- c()
  
  if (hazard_based) {
    data_add <- data
    for (i in 1:length(hazard_covar)) {
      
      icovar <- hazard_covar[i]
      
      column_name <- get_column_name_covar(icovar)
      
      if (!column_name %in% colnames(data)) {
        old_ncol <- ncol(data_add)
        data_add <- data %>% mutate(eval(parse(text = icovar)))
        new_ncol <- ncol(data_add)
        
        new_column <- data_add[, new_ncol]
        new_column <- as.matrix(new_column)
        
        ## create new columns
        
        if (ncol(new_column) > 1) {
          multiple_varname <- paste0(paste0(column_name, "."), 1:(ncol(new_column)))
          data[, multiple_varname] <- new_column
          hazard_covar_new <- c(hazard_covar_new, multiple_varname)
        } else {
          data[, column_name] <- as.vector(new_column)
          hazard_covar_new <- c(hazard_covar_new, column_name)
        }
      } else {
        hazard_covar_new <- c(hazard_covar_new, column_name)
      }
    }
    
    hazard_covar <- hazard_covar_new
  }
  
  
  
  ## replace the old covar
  outcome_covar <- outcome_covar_new
  censor_covar <- censor_covar_new
  competing_covar <- competing_covar_new


  ## 1. compute the IPW hazard for natural course
  np_model <- c()
  if (compute_nc_risk) {

    obs_treatment_varname <- intervention_varnames[[1]]
    
    censor_covar_nc <- censor_covar
    
    if (!is.null(competing_varname)) {
      
      if (is.null(competing_model)) {
        competing_covar_nc <- outcome_covar
      } else {
        competing_covar_nc <- competing_covar
      }

      if (total_effect == FALSE) {
        competing_formula <- as.formula(paste0(competing_varname, "~",
                                               paste0(competing_covar_nc, collapse = "+")))
      competing_fit <- speedglm(competing_formula, data = data, family = binomial())

      ## add in this competing fit

      fit_np <- fit_np_summary <- fit_np_stderr <- fit_np_vcov <- fit_np_rmse <- c()
      
      this_np_fit <- list(competing_fit)
      this_np_summary <- list(get_summary(competing_fit))
      this_np_stderr <- list(get_stderr(competing_fit))
      this_np_vcov <- list(get_vcov(competing_fit))
      this_np_rmse <- list(get_rmse(competing_fit))
      
      fit_np <- c(fit_np, this_np_fit)
      fit_np_summary <- c(fit_np_summary, this_np_summary)
      fit_np_stderr <- c(fit_np_stderr, this_np_stderr)
      fit_np_vcov <- c(fit_np_vcov, this_np_vcov)
      fit_np_rmse <- c(fit_np_rmse, this_np_rmse)
      
      np_model <- list(fit = fit_np, 
                       summary = fit_np_summary, 
                       stderr = fit_np_stderr, 
                       vcov = fit_np_stderr, 
                       rmse = fit_np_rmse)

      }
    }

    nc_compute <- natural_course_ipweighted(data, id, censor_varname,
                                            K, time0, outcome_varname, censor_covar_nc,
                                            competing_varname, competing_fit,
                                            total_effect)

    risk_weighted <- nc_compute$risk_weighted
    logit_censor <- nc_compute$logit_censor
  } else {
    risk_weighted <- logit_censor <- NULL
  }

  ## 2. compute the intervention for each treatment variable

  if (is.null(censor_varname)) {
    data$C <- 0
    censor_varname <- "C"
    null_censor <- TRUE
  } else {
    null_censor <- FALSE
  }

  abar_all <- list()
  data <- as.data.frame(data)

  assign.global(data, "interv_data")

  for (i in 1:length(intervention_varnames[[1]])) {
    
    treat <- i
    treatment_varname <- intervention_varnames[[1]][[treat]]

    intervention_f <- interventions[[1]][[treat]]

    if (any(str_detect(as.character(substitute(interventions[[treat]])), "grace_period"))) {
      my.arrayofA <- paste0("interv_it_", treatment_varname)
    } else {
      interv_it <- intervention_f
      interv_data[, paste0("interv_it_", treatment_varname, "_", treat)] <- interv_it
      my.arrayofA <- paste0("interv_it_", treatment_varname, "_", treat)
    }
    
    ## check if this treatment variable is lagged
    
    lag_treat_ind <- str_detect(unique(c(outcome_covar, censor_covar, competing_covar)), paste0("lag[0-9999]_", treatment_varname))

    if (any(lag_treat_ind)) {
      
      stop("Lagging treatment variables in stratified ICE is invalid. It is valid in pooled ICE.")
    #   
    #   lag_treat_names <- unique(c(outcome_covar, censor_covar, competing_covar))[lag_treat_ind]
    #   # lag_treat_names_list <- list(treatment_varname = lag_treat_names)
    #   all_treat_lags <- c(all_treat_lags, lag_treat_names)
    #   
    #   lags <- sapply(str_split(lag_treat_names, "_"), function(x) {as.numeric(str_replace(x[1], "lag", ""))})
    #   
    #   for (i in 1:length(lags)) {
    #     lag_unit <- lags[i]
    #     data_interv_tmp <- interv_data
    #     data_interv_tmp$lag_var <- interv_data[, paste0("interv_it_", treatment_varname, "_", treat)]
    #     
    #     data_interv_tmp <- data_interv_tmp %>%
    #       mutate(lagged_treatment = dplyr::lag(lag_var, n = lag_unit, default = 0))
    #     
    #     interv_data[, paste0("interv_it_", lag_treat_names[i], "_", treat)] <- data_interv_tmp$lagged_treatment
    #     abar_lag <- paste0("interv_it_", lag_treat_names[i], "_", treat)
    #     treat_lag_abar <- c(treat_lag_abar, list(abar_lag))
    #   }
    #   
    }

    abar_all[[treat]] <- my.arrayofA
  }
  
  ## add lag variables as intervention variables
  
  # if (length(all_treat_lags) > 0) {
  #   intervention_varnames <- list(c(intervention_varnames[[1]], as.list(all_treat_lags)))
  #   intervention_times <- list(c(intervention_times[[1]], rep_list(0:(K-1), length(all_treat_lags))))
  #   abar_all <- c(abar_all, treat_lag_abar)
  # }

  data <- interv_data

  ## 3a. if weighted ICE, then

  if (weighted & hazard_based == FALSE) {
    ## weighted ICE
    ntreatment = length(obs_treatment_varnames)
    data[, "pred_obs_prod"] <- 1
    
    for (i in 1:ntreatment) {
      treatment_i <- obs_treatment_varnames[[i]]
      nA_level <- length(unique(data[, treatment_i]))
      acovar <- treat_model_covar[[i]]
      
      acovar_lags <- unique(acovar[str_detect(acovar, "lag[0-9999]_")])
      
      if (length(acovar_lags) > 0) {
        
        data$factor_id <- as.factor(data[, id])
        
        for (i in 1:length(acovar_lags)) {
          ilag <- acovar_lags[i]
          
          if (str_detect(ilag, "^[a-zA-Z._]+\\(")) {
            ilag <- str_split(ilag, ",")[[1]][1]
            ilag <- str_split(ilag, "\\(")[[1]][2]
            ilag <- str_split(ilag, "\\)")[[1]][1]
          }
          
          lag_components <- str_split(ilag, "_")
          lag_unit <- as.numeric(str_replace_all(lag_components[[1]][1], "lag", ""))
          lag_var <- lag_components[[1]][2]
          
          
          group_dta <- group_by(data, "factor_id")
          group_dta[, "lag_var_new"] <- data[, lag_var]
          group_dta <- group_dta %>%
            mutate(lagged_var = dplyr::lag(group_dta$lag_var_new, n = lag_unit, default = 0))
          
          data[, ilag] <- group_dta$lagged_var
          
        }
      }
      
      ## need to rebuild covars by calling the transform functions
      ## treatment model covar
      acovar_new <- c()
      data_add <- data
      for (i in 1:length(acovar)) {
        
        icovar <- acovar[i]
        
        column_name <- get_column_name_covar(icovar)
        
        if (!column_name %in% colnames(data)) {
          old_ncol <- ncol(data_add)
          data_add <- data %>% mutate(eval(parse(text = icovar)))
          new_ncol <- ncol(data_add)
          
          new_column <- data_add[, new_ncol]
          new_column <- as.matrix(new_column)
          
          ## create new columns
          
          if (ncol(new_column) > 1) {
            multiple_varname <- paste0(paste0(column_name, "."), 1:(ncol(new_column)))
            data[, multiple_varname] <- new_column
            acovar_new <- c(acovar_new, multiple_varname)
          } else {
            data[, column_name] <- as.vector(new_column)
            acovar_new <- c(acovar_new, column_name)
          }
        } else {
          acovar_new <- c(acovar_new, column_name)
        }
      }
      
      acovar <- acovar_new

      aformula <- as.formula(paste0(treatment_i, "~", paste0(acovar, collapse = "+")))

      cformula <- as.formula(paste0(censor_varname, "~", paste0(c(censor_covar, treatment_i), collapse = "+")))

      if (!is.null(competing_varname) & total_effect == FALSE) {
        dformula <- as.formula(paste0(competing_varname, "~", paste0(c(competing_covar, treatment_i), collapse = "+")))
      }

      if (null_censor) {
        data$pred_c = 0
      } else {
        cfit = speedglm(cformula, family = binomial(), data = data)
        data$pred_c = predict(cfit, newdata = data, type="response")
      }

      if (!is.null(competing_varname) & total_effect == FALSE) {
        dfit = speedglm(dformula, family = binomial(), data = data)
        data$pred_d = predict(dfit, newdata = data, type="response")

        data$pred_d = (1- data$pred_d) * (1-data$pred_c)

      }


      if (nA_level > 2) {
        afit = multinom(aformula, data = data, trace = FALSE)
        predicted_df <- fitted(afit)
        A_names <- sort(unique(data[, treatment_i]))
        for (i in 1:nA_level) {
          A_name <- A_names[i]
          data[, paste0("pred_obs_", i)] <- predicted_df[, i]
          match_rows <- which(data[, treatment_i] == A_name)
          data[match_rows, "pred_obs_all"] <- data[match_rows, paste0("pred_obs_", i)]
          data <- data %>% dplyr::select(-c(paste0("pred_obs_", i)))

        }

        pred_obsa <- 1- data$pred_c

        if (!is.null(competing_varname) & total_effect == FALSE) {
          pred_da = ifelse(data$pred_d != 0, (data$pred_d) * pred_obsa, pred_obsa)
          data[, paste0("pred_obs_", treatment_i)] = pred_da
        } else {
          data[, paste0("pred_obs_", treatment_i)] = pred_obsa
        }

        data <- data %>% dplyr::select(-c("pred_obs_all"))

        } else {
          afit = speedglm(aformula, family = binomial(), data = data)
          


          pred_obs = predict(afit, newdata = data, type="response")
          pred_obs = ifelse(data[, treatment_i]==1, pred_obs, 1-pred_obs)

          pred_obsa <- 1- data$pred_c

          if (!is.null(competing_varname) & total_effect == FALSE) {
            pred_da = ifelse(data$pred_d != 0, (data$pred_d) * pred_obsa, pred_obsa)
            data[, paste0("pred_obs_", treatment_i)] = pred_da
          } else {
            data[, paste0("pred_obs_", treatment_i)] = pred_obsa
          }

        }


      data[, "pred_obs_prod"] <- data[, "pred_obs_prod"] * data[, paste0("pred_obs_", treatment_i)]

      data <- data %>% dplyr::select(-c(paste0("pred_obs_", treatment_i)))
    }



    dffullwide <- reshape(data, idvar = id, timevar = time0, direction = "wide", sep = "_")


    for (i in 1:(K-1)) {
      if (i == 1) {
        dffullwide[, paste0(outcome_varname, "_", i)] <- ifelse(dffullwide[, paste0(outcome_varname, "_", i-1)] == 1, 1, dffullwide[, paste0(outcome_varname, "_", i)])
      } else {
        dffullwide[, paste0(outcome_varname, "_", i)] <- ifelse(!is.na(dffullwide[, paste0(outcome_varname, "_", i-1)]) & dffullwide[, paste0(outcome_varname, "_", i-1)] == 1,
                                                                    1, dffullwide[, paste0(outcome_varname, "_", i)])
      }

    }

      for (t in 0:(K-1)) {
        dffullwide[, paste0("pred_obs_", t)] <- dffullwide[ ,paste0("pred_obs_", "prod", "_", t)]
      }



    for (t in 0:(K-1)) {
      if (t == 0) {
        dffullwide$pi0 = dffullwide$pred_obs_0
      } else {
        dffullwide[, paste0("pi", t)] <- dffullwide[, paste0("pi", t-1)] * dffullwide[, paste0("pred_obs", "_", t)]
      }

    }
  } else {
    dffullwide <- reshape(data, idvar = id, timevar = time0, direction = "wide", sep = "_")
  }

  ## 4. reshape data and fit outcome model
  tmpdata = as.data.frame(dffullwide)
  formula_full <- as.formula(paste0(outcome_varname,"~", paste0(c(outcome_covar), collapse = "+")))
  yfitog = speedglm(formula_full, family = binomial(), data = data) #This is from the data generation mechanism

  paramtmp = (yfitog)$coef

  ## add in this outcome fit
  
  this_outcome_fit <- yfitog
  this_outcome_summary <- get_summary(yfitog)
  this_outcome_stderr <- get_stderr(yfitog)
  this_outcome_vcov <- get_vcov(yfitog)
  this_outcome_rmse <- get_rmse(yfitog)
  
  outcome_init <- list(fit = this_outcome_fit, 
                       summary = this_outcome_summary, 
                       stderr = this_outcome_stderr, 
                       vcov = this_outcome_vcov, 
                       rmse = this_outcome_rmse)
  
  fit_outcome <- fit_outcome_summary <- fit_outcome_stderr <- fit_outcome_vcov <- fit_outcome_rmse <- c()
  fit_comp <- fit_comp_summary <- fit_comp_stderr <- fit_comp_vcov <- fit_comp_rmse <- c()

  ## 5. prepare data for regression at each time step
  for (i in 1:(K - 1)) {

    C_lag <- paste0(censor_varname, "_", i - 1)
    Y_lag <- paste0(outcome_varname, "_", i - 1)

    C <- paste0(censor_varname, "_", i)
    Y <- paste0(outcome_varname, "_", i)


    if (i == 1) {

      tmpdata[, C] <- ifelse(is.na(tmpdata[, Y_lag]) & tmpdata[, C_lag] == 1, 1, tmpdata[, C])
      tmpdata[, Y] <- ifelse(tmpdata[, Y_lag] == 1, 1, tmpdata[, Y])

    } else {

      tmpdata[, C] <- ifelse(is.na(tmpdata[, Y_lag]) & tmpdata[, C_lag] == 1, 1, tmpdata[, C])
      tmpdata[, Y] <- ifelse(!is.na(tmpdata[, Y_lag]) & tmpdata[, Y_lag] == 1, 1, tmpdata[, Y])

    }

  }

  for (i in 1:K) {

    C_lag <- paste0(censor_varname, "_", K - i)
    Y_lag <- paste0(outcome_varname, "_", K - i)

    C <- paste0(censor_varname, "_", K - i + 1)
    Y <- paste0(outcome_varname, "_", K - i + 1)

    tmpdata[, Y] <- tmpdata[, Y_lag]
    tmpdata[, C] <- tmpdata[, C_lag]
  }

  tmpdata$Y_0 = tmpdata$C_0 = NULL

  hazard_by_step <- c()
  comp_by_step <- c()
  
  if (hazard_based) {
    #### get outcome model for each time point ######

    outcome_pred_times <- list()
    fit_haz <- fit_haz_summary <- fit_haz_stderr <- fit_haz_vcov <- fit_haz_rmse <- c()

    for (i in 1:(K)) {

      stratify_dta_all <- c()
      no_interv_treat <- c()
      interv_treat <- c()
      for (treat in 1:length(intervention_varnames[[1]])) {
        intervention_variable <- intervention_varnames[[1]][[treat]]
        abar <- abar_all[[treat]][[1]]
        int_time <- intervention_times[[1]][[treat]]

        ## change here the index of competing variable
        if (!is.null(competing_varname) & total_effect == FALSE) {
          stratify_dta <- c(paste0(censor_varname, "_", i), paste0(competing_varname, "_", i-1),
                            sapply(0:(i-1), function(x){paste0(intervention_variable, "_", x)}))
        } else {
          stratify_dta <- c(paste0(censor_varname, "_", i),
                            sapply(0:(i-1), function(x){paste0(intervention_variable, "_", x)}))
        }



        if (is.character(abar)) {
          abar_names_extra <- sapply(0:(i-1), function(x){paste0(abar, "_", x)})

          if (!is.null(competing_varname) & total_effect == FALSE) {
            stratify_dta <- sapply(stratify_dta,
                                   function(x){
                                     idx <- which(stratify_dta == x) - 2
                                     if (grepl(censor_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else if (grepl(competing_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else if (grepl(outcome_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else {paste0(x, "==", abar_names_extra[idx])}})
          } else {
            stratify_dta <- sapply(stratify_dta,
                                   function(x){
                                     idx <- which(stratify_dta == x) - 1
                                     if (grepl(censor_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else {paste0(x, "==", abar_names_extra[idx])}})
          }


        } else {
          if (!is.null(competing_varname) & total_effect == FALSE) {
            stratify_dta <- sapply(stratify_dta,
                                   function(x){
                                     if (grepl(censor_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else if (grepl(competing_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else if (grepl(outcome_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else {paste0(x, "==", abar)}})
          } else {
            stratify_dta <- sapply(stratify_dta,
                                   function(x){
                                     if (grepl(censor_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else {paste0(x, "==", abar)}})
          }

        }
          stratify_dta_all <- c(stratify_dta_all, stratify_dta)
      if ((i-1) %in% int_time) {
          interv_treat <- c(interv_treat, intervention_variable)
      } else {
          no_interv_treat <- c(no_interv_treat, intervention_variable)
        }
      }

      stratify_dta_combine <- paste0(stratify_dta_all, collapse = "&")

      pred_data <- tmpdata %>% dplyr::filter(eval(parse(text = stratify_dta_combine)))

      treat_as_covar <- all_treat_vars[!all_treat_vars %in% interv_treat]

      outcome_covar_t <- c(hazard_covar, treat_as_covar)

      covar_outcome_t <- paste0(outcome_covar_t, "_", i-1)



      outcome_formula <- as.formula(paste0(outcome_varname, "_", i, "~",
                                           paste0(covar_outcome_t, collapse = "+")))

      tmp_fit <- speedglm(outcome_formula, family = binomial(), data = pred_data)

      outcome_pred_times[[i]] <- tmp_fit

      this_haz_fit <- list(tmp_fit)
      this_haz_summary <- list(get_summary(tmp_fit))
      this_haz_stderr <- list(get_stderr(tmp_fit))
      this_haz_vcov <- list(get_vcov(tmp_fit))
      this_haz_rmse <- list(get_rmse(tmp_fit))

      names(this_haz_fit) <- names(this_haz_summary) <- names(this_haz_stderr) <- names(this_haz_vcov) <- names(this_haz_rmse) <- paste0("hazard.", "Time", i)

      fit_haz <- c(fit_haz, this_haz_fit)
      fit_haz_summary <- c(fit_haz_summary, this_haz_summary)
      fit_haz_stderr <- c(fit_haz_stderr, this_haz_stderr)
      fit_haz_vcov <- c(fit_haz_vcov, this_haz_vcov)
      fit_haz_rmse <- c(fit_haz_rmse, this_haz_rmse)

    }

    
    hazard_by_step <- list(fit = fit_haz, 
                            summary = fit_haz_summary, 
                            stderr = fit_haz_stderr, 
                            vcov = fit_haz_vcov, 
                            rmse = fit_haz_rmse)
    
    if (!is.null(competing_varname) & total_effect == TRUE) {
      comp_pred_times <- list()
      fit_comp <- fit_comp_summary <- fit_comp_stderr <- fit_comp_vcov <- fit_comp_rmse <- c()

      for (i in 1:K) {

        stratify_dta_all <- c()
        no_interv_treat <- c()
        interv_treat <- c()
        for (treat in 1:length(intervention_varnames[[1]])) {
          intervention_variable <- intervention_varnames[[1]][[treat]]
          abar <- abar_all[[treat]][[1]]
          int_time <- intervention_times[[1]][[treat]]

          if (i == 1) {
            stratify_dta <- c(paste0(censor_varname, "_", i), paste0(competing_varname, "_", i-1),
                              sapply(0:(i-1), function(x){paste0(intervention_variable, "_", x)}))
          } else {
            stratify_dta <- c(paste0(censor_varname, "_", i), paste0(competing_varname, "_", i-2),
                              sapply(0:(i-1), function(x){paste0(intervention_variable, "_", x)}))
          }

          if (is.character(abar)) {
            abar_names_extra <- sapply(0:(i-1), function(x){paste0(abar, "_", x)})

            stratify_dta <- sapply(stratify_dta,
                                   function(x){
                                     idx <- which(stratify_dta == x) - 2
                                     if (grepl(censor_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else if (grepl(competing_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else if (grepl(outcome_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else {paste0(x, "==", abar_names_extra[idx])}})


          } else {
            stratify_dta <- sapply(stratify_dta,
                                   function(x){
                                     if (grepl(censor_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else if (grepl(competing_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else if (grepl(outcome_varname, x)) {
                                       paste0("(!is.na(", x,"))&", x, "==", 0)
                                     } else {paste0(x, "==", abar)}})

          }
          stratify_dta_all <- c(stratify_dta_all, stratify_dta)

          if ((i-1) %in% int_time) {
          interv_treat <- c(interv_treat, intervention_variable)
          } else {
            no_interv_treat <- c(no_interv_treat, intervention_variable)
          }
        }


        stratify_dta_combine <- paste0(stratify_dta_all, collapse = "&")
        pred_data <- tmpdata %>% dplyr::filter(eval(parse(text = stratify_dta_combine)))

        treat_as_covar <- all_treat_vars[!all_treat_vars %in% interv_treat]

        competing_covar_t <- c(outcome_covar, treat_as_covar)

        covar_competing_t <- paste0(competing_covar_t, "_", i-1)

        comp_formula <- as.formula(paste0(competing_varname, "_", i-1, "~",
                                          paste0(covar_competing_t, collapse = "+")))

        tmp_fit <- speedglm(comp_formula, family = binomial(), data = pred_data)

        comp_pred_times[[i]] <- tmp_fit

        this_comp_fit <- list(tmp_fit)
        this_comp_summary <- list(get_summary(tmp_fit))
        this_comp_stderr <- list(get_stderr(tmp_fit))
        this_comp_vcov <- list(get_vcov(tmp_fit))
        this_comp_rmse <- list(get_rmse(tmp_fit))

        names(this_comp_fit) <- names(this_comp_summary) <- names(this_comp_stderr) <- names(this_comp_vcov) <- names(this_comp_rmse) <- paste0("competing.", "Time", i)

        fit_comp <- c(fit_comp, this_comp_fit)
        fit_comp_summary <- c(fit_comp_summary, this_comp_summary)
        fit_comp_stderr <- c(fit_comp_stderr, this_comp_stderr)
        fit_comp_vcov <- c(fit_comp_vcov, this_comp_vcov)
        fit_comp_rmse <- c(fit_comp_rmse, this_comp_rmse)

      }
      
      comp_by_time <- list(fit = fit_comp, 
                             summary = fit_comp_summary, 
                             stderr = fit_comp_stderr, 
                             vcov = fit_comp_vcov, 
                             rmse = fit_comp_rmse)

    }
  }


  ## 6. regression at each time point
  
  if (verbose) {
    message("Running ", intervention_description[[1]], "... \n")
  }
  
  meany <- matrix(NA, ncol = K + 1, nrow = length(my.arrayofA))

  for (i in 1:K) {
    t <- K - i + 1
    it_run <- TRUE
    it <- t
    
    if (verbose) {
      message(paste0("Running Time ", t, "...", "\n"))
    }

    if (hazard_based) {
      this_fit <- outcome_pred_times[[t]]
      if (t == 1) {
        fitdata <- tmpdata
        fitdata[, paste0("y", t, "pred")] <- predict(this_fit, newdata = fitdata, type="response")
        it_run <- FALSE
      }

      it <- t - 1
    }

    if (it_run) {
      for (j in 1:it) {

        q <- it - j

        interv_treat <- c()

        for (treat in 1:length(intervention_varnames[[1]])) {
          intervention_variable <- intervention_varnames[[1]][[treat]]
          int_time <- intervention_times[[1]][[treat]]

          times_extra <- 0:q

          if (q %in% int_time) {
            interv_treat <- c(interv_treat, intervention_variable)
          }
        }


        if (q == it-1) {
          stratify_names_extra_all <- c()
          for (treat in 1:length(intervention_varnames[[1]])) {
            intervention_variable <- intervention_varnames[[1]][[treat]]
            abar <- abar_all[[treat]][[1]]

            times_extra <- 0:q

            if (!is.null(competing_varname) & total_effect == FALSE) {
              stratify_names_extra <- c(paste0(censor_varname, "_", q+1), paste0(competing_varname, "_", q),
                                        sapply(times_extra, function(x){paste0(intervention_variable, "_", x)}))
            } else {
              stratify_names_extra <- c(paste0(censor_varname, "_", q+1),
                                        sapply(times_extra, function(x){paste0(intervention_variable, "_", x)}))
            }

            if (is.character(abar)) {
              abar_names_extra <- sapply(times_extra, function(x){paste0(abar, "_", x)})

              if (!is.null(competing_varname) & total_effect == FALSE) {
                stratify_names_extra <- sapply(stratify_names_extra,
                                               function(x){
                                                 idx <- which(stratify_names_extra == x) - 2
                                                 if (grepl(censor_varname, x)) {
                                                   paste0("(!is.na(", x,"))&", x, "==", 0)
                                                 } else if (grepl(competing_varname, x)) {
                                                   paste0("(!is.na(", x,"))&", x, "==", 0)
                                                 } else if (grepl(outcome_varname, x)) {
                                                   paste0("(!is.na(", x,"))&", x, "==", 0)
                                                 } else {
                                                   paste0(x, "==", abar_names_extra[idx])}})
              } else {

                stratify_names_extra <- sapply(stratify_names_extra,
                                               function(x){
                                                 idx <- which(stratify_names_extra == x) - 1
                                                 if (grepl(censor_varname, x)) {
                                                   paste0("(!is.na(", x,"))&", x, "==", 0)
                                                 } else {
                                                     paste0(x, "==", abar_names_extra[idx])}})
              }

            } else {
              if (!is.null(competing_varname) & total_effect == FALSE) {
                stratify_names_extra <- sapply(stratify_names_extra,
                                               function(x){
                                                 if (grepl(censor_varname, x)) {
                                                   paste0("(!is.na(", x,"))&", x, "==", 0)
                                                 } else if (grepl(competing_varname, x)) {
                                                   paste0("(!is.na(", x,"))&", x, "==", 0)
                                                 } else if (grepl(outcome_varname, x)) {
                                                   paste0("(!is.na(", x,"))&", x, "==", 0)
                                                 } else {
                                                     paste0(x, "==", abar)}})
              } else {

                stratify_names_extra <- sapply(stratify_names_extra,
                                               function(x){
                                                 if (grepl(censor_varname, x)) {
                                                   paste0("(!is.na(", x,"))&", x, "==", 0)
                                                 } else {
                                                     paste0(x, "==", abar)}})
              }
            }

            stratify_names_extra_all <- c(stratify_names_extra_all, stratify_names_extra)
          }
          if (q == t - 1) {
            stratify_names_extra_all <- c(stratify_names_extra_all, paste0("(!is.na(", outcome_varname, "_", t,"))"))
          }

          stratify_names_extra_combine <- paste0(stratify_names_extra_all, collapse = "&")
          fitdata <- tmpdata %>% dplyr::filter(eval(parse(text = stratify_names_extra_combine)))
          if (hazard_based) {
            fitdata[, paste0("y", t, "pred")] <- predict(this_fit, newdata = fitdata, type="response")
          }

        }
        
        treat_as_covar <- all_treat_vars[!all_treat_vars %in% interv_treat]

        if ((length(treat_as_covar) > 0)) {
          warn_mssg <- paste(paste0(intervention_description, ":"), "Add", paste0(treat_as_covar, collapse = ", "), "to model at time", q, ".")
          warning(warn_mssg)
        }
        outcome_covar_t <- c(outcome_covar, treat_as_covar)
        covar_temp <- paste0(outcome_covar_t, "_", q)


        if (hazard_based) {
          temp_formula <- as.formula(paste0("y", t, "pred", "~",
                                            paste0(covar_temp, collapse = "+")))
        } else {
          if (q == t - 1) {
            temp_formula <- as.formula(paste0(outcome_varname, "_", t, "~",
                                              paste0(covar_temp, collapse = "+")))

          } else {
            temp_formula <- as.formula(paste0("y", t, "pred", "~",
                                              paste0(covar_temp, collapse = "+")))

          }
        }

        if (weighted & hazard_based == FALSE) {
          fitdata$w <- 1/fitdata[, paste0("pi", q)]

          if (q == t - 1) {
            fit <- speedglm(temp_formula, family = quasibinomial(), weights = fitdata$w, data = fitdata)
          } else {
            fit <- speedglm(temp_formula, family = quasibinomial(), weights = fitdata$w, data = fitdata)
          }

        } else {

          if (q == t - 1) {
            fit <- speedglm(temp_formula, family = binomial(), data = fitdata)
          } else {
            fit <- speedglm(temp_formula, family = quasibinomial(), data = fitdata)
          }

        }

        # add fit into list here

        this_outcome_fit <- list(fit)
        this_outcome_summary <- list(get_summary(fit))
        this_outcome_stderr <- list(get_stderr(fit))
        this_outcome_vcov <- list(get_vcov(fit))
        this_outcome_rmse <- list(get_rmse(fit))

        names(this_outcome_fit) <- names(this_outcome_summary) <- names(this_outcome_stderr) <- names(this_outcome_vcov) <- names(this_outcome_rmse) <- paste0("outcome.Time", t, ".Step", q)

        fit_outcome <- c(fit_outcome, this_outcome_fit)
        fit_outcome_summary <- c(fit_outcome_summary, this_outcome_summary)
        fit_outcome_stderr <- c(fit_outcome_stderr, this_outcome_stderr)
        fit_outcome_vcov <- c(fit_outcome_vcov, this_outcome_vcov)
        fit_outcome_rmse <- c(fit_outcome_rmse, this_outcome_rmse)

        stratify_names_all <- c()
        interv_treat <- c()
        for (treat in 1:length(intervention_varnames[[1]])) {
          intervention_variable <- intervention_varnames[[1]][[treat]]
          abar <- abar_all[[treat]][[1]]
          int_time <- intervention_times[[1]][[treat]]

          times <- 0:(q-1)

          if (q %in% int_time) {
            interv_treat <- c(interv_treat, intervention_variable)
          }

          if (hazard_based) {
            if (!is.null(competing_varname) & total_effect == FALSE) {
              stratify_names <- c(paste0(censor_varname, "_", q+1), paste0(competing_varname, "_", q),
                                  sapply(times, function(x){paste0(intervention_variable, "_", x)}))
            } else {
              stratify_names <- c(paste0(censor_varname, "_", q+1),
                                  sapply(times, function(x){paste0(intervention_variable, "_", x)}))
            }
          } else {
            if (!is.null(competing_varname) & total_effect == FALSE) {
              stratify_names <- c(paste0(censor_varname, "_", q), paste0(competing_varname, "_", q-1),
                                  sapply(times, function(x){paste0(intervention_variable, "_", x)}))
            } else {
              stratify_names <- c(paste0(censor_varname, "_", q),
                                  sapply(times, function(x){paste0(intervention_variable, "_", x)}))
            }
          }

          if (is.character(abar)){
            abar_names <- sapply(times, function(x){paste0(abar, "_", x)})

            if (!is.null(competing_varname) & total_effect == FALSE) {
              stratify_names <- sapply(stratify_names,
                                       function(x){
                                         idx <- which(stratify_names == x) - 2
                                         if (grepl(censor_varname, x)) {
                                           paste0("(!is.na(", x,"))&", x, "==", 0)
                                         } else if (grepl(competing_varname, x)) {
                                           paste0("(!is.na(", x,"))&", x, "==", 0)
                                         } else {paste0(x, "==", abar_names[idx])}})
            } else {
              stratify_names <- sapply(stratify_names,
                                       function(x){
                                         idx <- which(stratify_names == x) - 1
                                         if (grepl(censor_varname, x)) {
                                           paste0("(!is.na(", x,"))&", x, "==", 0)
                                         } else {paste0(x, "==", abar_names[idx])}})
            }
          } else {

            if (!is.null(competing_varname) & total_effect == FALSE) {
              stratify_names <- sapply(stratify_names,
                                       function(x){
                                         if (grepl(censor_varname, x)) {
                                           paste0("(!is.na(", x,"))&", x, "==", 0)
                                         } else if (grepl(competing_varname, x)) {
                                           paste0("(!is.na(", x,"))&", x, "==", 0)
                                         } else {paste0(x, "==", abar)}})
            } else {
              stratify_names <- sapply(stratify_names,
                                       function(x){
                                         if (grepl(censor_varname, x)) {
                                           paste0("(!is.na(", x,"))&", x, "==", 0)
                                         } else {paste0(x, "==", abar)}})
            }

          }
          stratify_names_all <- c(stratify_names_all, stratify_names)
        }
        stratify_names_combine <- paste0(stratify_names_all, collapse = "&")

        if (q == 0) {
          fitdata <- tmpdata
        } else {
          fitdata <- tmpdata %>% dplyr::filter(eval(parse(text = stratify_names_combine)))
        }

        if (weighted & hazard_based == FALSE) {
          fitdata$w <- 1/fitdata[, paste0("pi", q)]
          fitdata[, paste0("y", t, "pred")] = predict(fit, newdata = fitdata, type="response", weight = fitdata$w)

        } else {
          fitdata[, paste0("y", t, "pred")] = predict(fit, newdata = fitdata, type="response")
        }

        if (hazard_based) {
          weight_haz <- predict(outcome_pred_times[[q+1]], newdata = fitdata, type="response")

          if (!is.null(competing_varname) & total_effect == TRUE) {
            weight_comp <- predict(comp_pred_times[[q+1]], newdata = fitdata, type="response")
            fitdata[, paste0("y", t, "pred")] <- fitdata[, paste0("y", t, "pred")] * (1 - weight_haz) * (1 - weight_comp) + weight_haz
          } else {
            fitdata[, paste0("y", t, "pred")] = fitdata[, paste0("y", t, "pred")] *
              (1 - weight_haz) + weight_haz
          }
        } else {
          if (q != 0) {
            if (!is.null(competing_varname) & total_effect == TRUE) {
              fitdata[, paste0("y", t, "pred")] = case_when(fitdata[, paste0(competing_varname, "_", q-1)] == 1 ~ 0,
                                                            (fitdata[, paste0(outcome_varname, "_", q)] == 1) & (fitdata[, paste0(competing_varname, "_", q-1)] == 0) ~ 1,
                                                            (fitdata[, paste0(outcome_varname, "_", q)] == 0) & (fitdata[, paste0(competing_varname, "_", q-1)] == 0) ~ fitdata[, paste0("y", t, "pred")])

            } else {
              fitdata[, paste0("y", t, "pred")] = ifelse(fitdata[, paste0(outcome_varname, "_", q)] == 1,
                                                         1, fitdata[, paste0("y", t, "pred")])
            }
          }
        }

      }
    }

    meany[, t + 1] <- mean(fitdata[, paste0("y", t, "pred")])

  }
  meany[, 1] <- 0
  meany <- as.data.frame(meany)

  fit_all <- c(fit_all, list(fit_outcome))
  fit_summary <- c(fit_summary, list(fit_outcome_summary))
  fit_stderr <- c(fit_summary, list(fit_outcome_stderr))
  fit_vcov <- c(fit_vcov, list(fit_outcome_vcov))
  fit_rmse <- c(fit_rmse, list(fit_outcome_rmse))
  
  outcome_by_step <- list(fit = fit_all, 
                          summary = fit_summary, 
                          stderr = fit_stderr, 
                          vcov = fit_vcov, 
                          rmse = fit_rmse)

  time_name <- c()

  for (i in 0:K) {

    time_name <- c(time_name, paste0("time", i))
  }

  colnames(meany) <- time_name
  rownames(meany) <- intervention_description

  return(list(gformula_risk_last_time = meany[length(meany)], gformula_risk = meany,
              weight_h = risk_weighted, ipw_model = logit_censor,
              np_model = np_model, 
              outcome_init = outcome_init, comp_init = c(),
              outcome_by_step = outcome_by_step, 
              hazard_by_step = hazard_by_step, 
              comp_by_step = comp_by_step))
}

Try the gfoRmulaICE package in your browser

Any scripts or data that you put into this service are public.

gfoRmulaICE documentation built on April 3, 2025, 8:53 p.m.