R/get_cohort_expectedCI_VE_crr.R

Defines functions get_cohort_expectedCI_VE_crr

Documented in get_cohort_expectedCI_VE_crr

#' Expected lower and expected upper limit of the confidence intervals of variant and vaccine-specific efficacy based on cumulative-risk ratio.
#' @description
#' The function `get_cohort_expectedCI_VE_crr` simulates confindence intervals for variant and vaccine-specific efficacy (VE) for a given sample size.
#' The efficacy is defined as `VE = 1 - cumulative-risk ratio`, and the function returns expected lower and expected upper confidence interval limit
#' for both absolute and relative VE.
#'
#' @param anticipated_VE_for_each_brand_and_variant a matrix of vaccine efficacy of each vaccine (row) against each variant (column). Each value must be a real number between 0 and 1.
#' @param brand_proportions_in_vaccinated a vector denoting the proportion in which vaccines are given in the vaccinated subjects of the study cohort. Each value of this vector must be a real number between 0 and 1 and the sum of the values of this vector must be equal to 1.
#' @param overall_vaccine_coverage the proportion of the study cohort that will be vaccinated. It should be a real number between 0 and 1.
#' @param proportion_variants_in_unvaccinated_cases a vector of the proportions in which each variant is expected to be present in the unvaccinated and infected subjects in the study cohort. Each value of this vector must be a real number between 0 and 1 and the sum of the values of this vector must be equal to 1.
#' @param overall_attack_rate_in_unvaccinated the proportion of the study cohort that is expected to infected over the study period. It should be a real number between 0 and 1.
#' @param calculate_relative_VE a logical indicating if calculations should also be done for relative vaccine efficacy (default `TRUE`).
#' @param alpha controls the width `[100*alpha/2, 100*(1 - alpha/2)]%` of the confidence interval. It is a numeric that must take a value between 0 and 1.
#' @param confounder_adjustment_Rsquared we use this parameter to adjust the calculations for potential confounders using the methodology proposed by Hsieh and Lavori (2000). It represents the amount of variance (R^2) explained in a regression model where vaccination status is the outcome and confounders of interest are predictors. It is a numeric that must take a value between 0 (no adjustment for confounders) and 1.
#' @param prob_missing_data to adjust the calculations for non-informative and random subject loss to follow-up/dropout. it should take a numeric value between 0 and 1.
#' @param total_subjects a vector of study cohort size for which calculations should be done.
#' @param nsims total number of Monte Carlo simulations conducted.
#'
#' @details
#' In this function efficacy is defined as `VE = 1 - cumulative-risk ratio`, where 'cumulative-risk ratio' is
#' the ratio of cumulative-risk of being a case of a particular variant/variant among the groups being compared.
#' When the groups being compared are a particular vaccine versus placebo then we call the VE
#' as the absolute VE of the vaccine. For `M` vaccines there are `M` absolute VE, one each for the `M` vaccines.
#' When the groups being compared are a particular vaccine versus another vaccine then we call the VE
#' as the relative VE of the vaccines, for a particular variant. For `M` vaccines and `I` variants there are `I x 2 x utils::combn(M, 2)`
#' permutations of relative VE of two vaccines against the same variant.
#'
#' We first transform the user inputs for `I` variants and `M` vaccines into a `(I + 1) x (M + 1)` cross table of
#' cumulative-risks of being a case or a control over the study period. The overall sum of all cumulative-risks,
#' i.e., all cells, of this table is 1. The first row of our cumulative-risk table contain cumulative-risk of being a control.
#' The first column corresponds to subjects who are unvaccinated.
#' Thus, the cell `{1,1}` contains the probability (cumulative-risk) that over the study period a subject will be a control and unvaccinated.
#' The remaining `Ì` rows correspond to subjects who are cases of a particular variant/variant of the pathogen,
#' and the remaining `M` columns correspond to subjects who are vaccinated with a particular vaccine.
#'
#' The next step is to simulate the data. To speed up our computations we sample an `(I + 1) x (M + 1)`
#' cross table of data from a multinomial distribution with probabilities taken from our cumulative-risk table.
#' The total subjects sampled in the cross table are are `total_subjects * (1 - prob_missing_data)`.
#' We then estimate the absolute and relative VE of each vaccine using the cumulative-risks based on the sampled data.
#' The confidence intervals with widths `[100*alpha/2, 100*(1 - alpha/2)]%` are obtained using normal approximation
#' to the distribution of log of cumulative-risk ratio (Morris and Gardner, 1988).
#' To adjust for confounders, the standard-error used in the confidence interval is rescaled to `SE/(1 - confounder_adjustment_Rsquared)` (Hsieh and Lavori, 2000)
#' We repeat this procedure `nsims` times, and in each such simulation we obtain `nsims` confidence intervals.
#'
#' To conduct simulations faster all the calculations are done without using for loops.
#' Instead we use a three-dimensional R arrays, with one-dimension for `nsims`,
#' another for the sample size vector `total_subjects`,
#' and another for a vector containing the flattened cross table of simulated data on cases and controls.
#'
#' @return A data frame consisting of the input parameters, absolute and relative VE combinations,
#' and the expected lower and expected upper width of the confidence intervals for each absolute and relative VE combination.
#'
#' @examples As an example we recommend running the function without passing any parameter to it.
#' The default scenario is for three vaccines and three pathogen variants.
#'
#' @references
#' 1. Hsieh, F. Y., & Lavori, P. W. (2000). Sample-size calculations for the Cox proportional hazards regression model with nonbinary covariates. Controlled clinical trials, 21(6), 552-560.
#' 2. Morris, J. A., & Gardner, M. J. (1988). Statistics in medicine: Calculating confidence intervals for relative risks (odds ratios) and standardised ratios and rates. British medical journal (Clinical research ed.), 296(6632), 1313.
#' @importFrom utils combn
#' @export
get_cohort_expectedCI_VE_crr = function(anticipated_VE_for_each_brand_and_variant=
                                          matrix(data=c(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 1), nrow = 3, ncol = 3, byrow = T,
                                                 dimnames = list(paste0('brand', 1:3), paste0('variant', 1:3))),
                                        brand_proportions_in_vaccinated =
                                          c('brand1'=0.3, 'brand2'=0.5, 'brand3'=0.2),
                                        overall_vaccine_coverage=0.3,
                                        proportion_variants_in_unvaccinated_cases = c('variant1'=0.6, 'variant2'=0.3, 'variant3'=0.1),
                                        overall_attack_rate_in_unvaccinated = 0.1,
                                        calculate_relative_VE = T,
                                        alpha=0.05,
                                        confounder_adjustment_Rsquared = 0,
                                        prob_missing_data = 0.1,
                                        total_subjects=seq(1000, 10000, 25),
                                        nsims = 500){

  check_input(anticipated_VE_for_each_brand_and_variant, brand_proportions_in_vaccinated, proportion_variants_in_unvaccinated_cases)

  total_vaccine_brands = length(brand_proportions_in_vaccinated)
  total_case_variants = length(proportion_variants_in_unvaccinated_cases)

  relative_VE_combn = get_vaccine_comparison_combinations(total_vaccine_brands, total_case_variants, calculate_relative_VE)

  prob_tables = get_cohort_full_table_crr(anticipated_VE_for_each_brand_and_variant, overall_vaccine_coverage, overall_attack_rate_in_unvaccinated,
                                          proportion_variants_in_unvaccinated_cases, brand_proportions_in_vaccinated)

  flat_full_table = c(prob_tables$full_table)
  flat_conditional_table = c(prob_tables$conditional_table)

  total_total_subject_settings = length(total_subjects)
  missing_data_adjusted_total_subjects = round(total_subjects * (1-prob_missing_data))

  cell_counts_sims = array(data = do.call('cbind', lapply(missing_data_adjusted_total_subjects, rmultinom, n=nsims, prob=flat_full_table)),
                           dim = c((1+total_vaccine_brands)*(1+total_case_variants), nsims, total_total_subject_settings))

  #summarizing the counts per vaccine among cases. so want to add over all variants per vaccine
  cell_counts_sims_total_by_brand = array(
    c(
      apply(cell_counts_sims, c(3), function(x){
        cumsum_overall = rbind(0,colCumsums(x))
        a = seq(1, nrow(x), by = 1 + total_case_variants)
        b = seq(1 + total_case_variants, nrow(x), by = 1 + total_case_variants)
        return(cumsum_overall[b + 1,] - cumsum_overall[a,])
      })
    ),
    dim = c(1+total_vaccine_brands, nsims, total_total_subject_settings)
  )

  #cell_counts_sims = rmultinom(n=nsims, size=missing_data_adjusted_total_subjects, prob = c(full_table))
  #Haldane's correction
  cell_counts_sims[cell_counts_sims==0] = 0.5
  cell_counts_sims_total_by_brand[cell_counts_sims_total_by_brand==0] = 0.5

  CONTROL_ROW = 1
  brand1_denominator_indices = relative_VE_combn[BRAND1,] + CONTROL_ROW
  brand1_numerator_indices = relative_VE_combn[BRAND1,]*(1 + total_case_variants) + CONTROL_ROW + relative_VE_combn[VARIANT,]
  brand2_denominator_indices = relative_VE_combn[BRAND2,] + CONTROL_ROW
  brand2_numerator_indices =  relative_VE_combn[BRAND2,]*(1 + total_case_variants) + CONTROL_ROW + relative_VE_combn[VARIANT,]

  #dimensions of the following are ncol(relative_VE_combn) x nsims
  numerator_vaccine1 = cell_counts_sims[brand1_numerator_indices,,,drop=F]
  denominator_vaccine1 = cell_counts_sims_total_by_brand[brand1_denominator_indices,,,drop=F]
  numerator_vaccine2 = cell_counts_sims[brand2_numerator_indices,,,drop=F]
  denominator_vaccine2 = cell_counts_sims_total_by_brand[brand2_denominator_indices,,,drop=F]

  estimate = (numerator_vaccine1/denominator_vaccine1)/(numerator_vaccine2/denominator_vaccine2)
  standard_error = sqrt(1/numerator_vaccine1 + 1/numerator_vaccine2 - 1/denominator_vaccine1 - 1/denominator_vaccine2)
  confounder_adjusted_standard_error = standard_error / (1-confounder_adjustment_Rsquared)

  log_estimate = log(estimate)
  quantile_times_adjusted_standard_error = qnorm(1-alpha/2) * confounder_adjusted_standard_error
  low = exp(log_estimate - quantile_times_adjusted_standard_error)
  upp = exp(log_estimate + quantile_times_adjusted_standard_error)

  #expected_VE = rowMeans(1-estimate, na.rm=T)
  expected_VE = apply(X = 1-estimate, MARGIN = 3, FUN = rowMeans, na.rm=T)
  avg_lower_limit = apply(X = 1-upp, MARGIN = 3, FUN = rowMeans, na.rm=T)
  avg_upper_limit = apply(X = 1-low, MARGIN = 3, FUN = rowMeans, na.rm=T)

  anticipated_VE = 1 - flat_conditional_table[brand1_numerator_indices]/flat_conditional_table[brand2_numerator_indices]

  ret = data.frame(vaccine_1 = rep(paste("Brand", relative_VE_combn[BRAND1,]), total_total_subject_settings),
                   vaccine_2 = rep(ifelse(relative_VE_combn[BRAND2,]==0,
                                          no = paste("Brand", relative_VE_combn[BRAND2,]),
                                          yes = "Unvaccinated"), total_total_subject_settings),
                   variant = rep(paste("Variant", relative_VE_combn[VARIANT,]), total_total_subject_settings),
                   total_subjects = rep(total_subjects, each=ncol(relative_VE_combn)),
                   anticipated_VE = anticipated_VE,
                   expected_VE = c(expected_VE),
                   bias_VE = c(expected_VE - anticipated_VE),
                   avg_lower_limit = c(avg_lower_limit),
                   avg_upper_limit = c(avg_upper_limit),
                   alpha = alpha,
                   overall_vaccine_coverage = overall_vaccine_coverage,
                   overall_attack_rate_in_unvaccinated = overall_attack_rate_in_unvaccinated,
                   confounder_adjustment_Rsquared = confounder_adjustment_Rsquared,
                   prob_missing_data = prob_missing_data
  )

  ret = cbind(ret,
              t(data.frame(brand_proportions_in_vaccinated,
                           row.names = paste0('brand_prop_', 1:total_vaccine_brands))),
              t(data.frame(proportion_variants_in_unvaccinated_cases,
                           row.names = paste0('variant_prop_', 1:total_case_variants))))

  return(ret)
}
anirudhtomer/vess documentation built on Feb. 24, 2022, 3 p.m.