need to delete: impact_results$[[VARIANT]]$groups$ec 2-23m A$ predict.bsts reg.mean beta.mat

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.height = 3,
  fig.width = 5,
  fig.align = "center", 
  dpi=300, 
    out.width="600px"
)

Installing the package

Uncomment and run the following 3 lines of code to install the package. You will be prompted to manually select a number to indicate whether to update dependencies and which ones.

#install.packages('devtools') 
#library(devtools) 
#devtools::install_github('https://github.com/weinbergerlab/InterventionEvaluatR') 
library(xtable)
library(knitr)
library(htmlTable)
library(InterventionEvaluatR)
library(coda) 
#Save the Ecuador mortality data as part of the package
# library(RCurl)
# script <- getURL("https://raw.githubusercontent.com/weinbergerlab/paho-pneumonia-mortality/master/Data/PAHO%20all%20age%20cuts_SubChapters.csv", ssl.verifypeer = FALSE)
# 
# paho1 <-read.csv(text=getURL("https://raw.githubusercontent.com/weinbergerlab/paho-pneumonia-mortality/master/Data/PAHO%20all%20age%20cuts_SubChapters.csv"),  header=T)
# ecuador_mortality<-paho1[paho1$age_group %in% c('ec 2-59m A','ec 2-23m A','ec 24-59m A'), ]
# ecuador_mortality$age_group<-factor(ecuador_mortality$age_group)
# save(ecuador_mortality, file='../data/ecuador_mortality.rda')

title: "Estimated change in mortality associated with the introduction of vaccine in Ecuador"

Analysis goal

The goal for this analysis is to quantify changes in the rates of mortality due to pneumonia following the introduction of PCV10 in Ecuador in August 2010. To do this, we will use data from the national mortality registry from Ecuador for the period from 2005-2016. We will learn how to analyze these time series data and how to interpret the results.

The main outcome of interest for the analysis will be to evaluate changes in J12_J18_prim, which is the number of deaths where J12-J18 is coded as the primary cause of death (In Ecuador, only a single cause of death is recorded).


View dataset

We are using a dataset that has 3 different age strata: 2-23 months, 24-59 months, and 2-59 months, combined. The dataset contains monthly time series for a number of different causes of death that are classified by ICD10 sub-chapter. The columns have counts per month of deaths for the causes indicated by the range of ICD10 codes indicated in the header name.

Let's load the data, which are included with the package

    data(ecuador_mortality, package = "InterventionEvaluatR") #load the data
    head(ecuador_mortality[,1:5]) #View first few rows and columns
    ds<-ecuador_mortality

Ensure your date variable is in an R date format. If your variable is in a character or factor format, you need to tell R the format. -- %m is a 2 digit month; %b is a month abbreviation (ie Jan, Feb) -- %d is a 2 digit day (0-31), -- %Y is a 4 digit year (e.g. 2011), %y is a 2 digit year (e.g. 11).
These codes are separated by a dash or slash or space. Modify the tryFormats script below if needed to match the format in your dataset

ds$monthdate<-as.Date(ds$monthdate, tryFormats=c('%Y-%m-%d',
                                                    '%m-%d-%Y',
                                                    '%m/%d/%Y',
                                                    '%Y/%m/%d',
                                                    '%d/%m/%Y'
                                                    ) )

Set parameters for analysis

Here we need to set a few parameters. We Use the evaluatr.init() function to specify the name of the dataset, the date at which the vaccine is introduced, the date at which we want to begin evaluating the vaccine (typically 1-2 year after vaccine introduction). We also provide some information on the dataset, sch as whether the data are monthly or quarterly (n_seasons), the variable names for the grouping variable, the date variable, the outcome variable, and the denominator variable (if any). You can also set the number of interations for the MCMC. the default is to use a burn-in period of 5000 iterations and to sample 10,000 iterations afterthe burn in. This is a decent place to start. After evaluating model convergence (see below), you might want to increase the burn-in period.

analysis <- evaluatr.init(
  country = "Ecuador", data = ds,
  post_period_start = "2010-08-01", #First 'post-intervention' month is Jan 2012
  eval_period_start = "2011-08-01", #We ignore first 12 month of data to allow for vaccine ramp up
  eval_period_end = "2015-12-01", #The evaluation period lasts 2 years
  n_seasons = 12, #This is monthly data, so select 12
  year_def = "cal_year", # we are in southern hemisphere, so aggregate results by calendar year (Jan-Dec)
  group_name = "age_group",  #Strata categry name
  date_name = "monthdate", #Date variable name
  outcome_name = "J12_J18_prim", #Outcome variable name
  denom_name = "acm_noj_prim" , #Denominator variable name
  log.covars=TRUE, #log-transform the covariates
  set.burnN=5000,
  set.sampleN=10000,
  ridge= F
)
set.seed(1)

Sort data by age group and month

    ds<-ds[order(ds[, analysis$group_name], ds[,analysis$date_name]),] #Sort data by age group and month

Run analyses

This function runs Interrupted time series analysis, an alternative time trend analysis, synthetic controls, and STL+PCA. All of these results are saved in the object 'impact_results'.

Run a simple control analysis, controlling for 1 control variable at a time

Before getting into more complicated analyses, we will first try to fit a simple Poisson regression model (with overdispersion) where we adjust for seasonality and 1 control variable at a time. These models are fit to data from the pre-vaccine periods and extrapolated to the post-vaccine period. Fitting 1 control variable at a time allows us to see how the use of different controls influences the results. This take a few minutes, so we will use a pre-run set of results.

 glmer_results= evaluatr.univariate(analysis)

Then plot the results. The results are ordered by goodness of fit (based on AIC scores), with best fitting covariates on top. Each plot represents a different age group. Overall, we see a generally consistent pattern. The use of the subchapter R00-09 as a control variable leads to estimates that are closer to 1 (no effect). This subchapter is "Symptoms and signs involving the circulatory and respiratory systems". These are often considered non-specific 'junk' codes. There could be arguments for or against using this subchapter as a control. On the downside, it is possible that actual pneumonia deaths incorrectly were assigned a code of R00-99, and the vaccine could therefore reduce the incidence of R00-09 codes and bias the estimates towards no effect. On the upside, the use of these junk codes as a control could help to adjust for underlying improvements or changes in coding quality.

par(mar=c(4,5,1,1)) #fix margins
group.labels<-as.character(unique(analysis$input_data[,analysis$group_name]))
lapply(glmer_results,evaluatr.univariate.plot)

Run Synthetic control analysis

For teaching purposes, this code has been pre-run since it takes some time and computational resources.

impact_results = evaluatr.impact(analysis)

Were there any strata that were too sparse to analyze?

if (!is.null(names(analysis$sparse_groups[analysis$sparse_groups])) && length(names(analysis$sparse_groups[analysis$sparse_groups])) != 0) {
  print(xtable(data.frame("Sparse Groups" = names(analysis$sparse_groups[analysis$sparse_groups]), check.names = FALSE), align = "cc"), type="html")
}

Compare estimates from different models

This shows the estimated rate ratio and 95% credible intervals from a synthetic controls analysis; a time-trend analysis where we used the specified denominator (all non-respiratory deaths) to adjust the number of pneumonia deaths in each month and a linear trend for time; a classic interrupted time series analysis (segmented regression); and the STL+PCA approach, which smooths and combines the control variables prior to including them in the model.

results.table<- cbind.data.frame(
  impact_results$full$rr_mean_intervals, 
  impact_results$time$rr_mean_intervals, 
  impact_results$its$rr_mean_intervals, 
  impact_results$pca$rr_mean_intervals)
  table<-xtable(results.table)

  htmlTable(table)

Cases averted

How many cases were prevented from the time of vaccine introduction to the last time point in each stratum (+/- 95% CrI)? You can modify the number 'last.point' to pull out the cumulative number of cases at any point in the time series. In this case we are printing results fromthe SC model

last.point<-dim(impact_results$full$cumsum_prevented)[1]
cum.prevented<-impact_results$full$cumsum_prevented[last.point,,]

Format and print table

cum1<- round(t(cum.prevented))
cum2<- paste0(cum1[,'50%'], ' (', cum1[,'2.5%'],', ',cum1[,'97.5%'],')')
cum3<-cbind.data.frame(row.names(cum1), cum2)
names(cum3)<-c('Stratum','Cases Averted (95% CrI)')
  htmlTable(cum3, align='l')

Number of variables selected in SC analysis

The Synthetic controls analysis uses a Bayesian variable selection algorithm to weight the candidate covariates. In each MCMC iteration, it tests a different combination of variables. The model size indicates how many variables are selected in any given model. If <1 variable is selected on average, this indicates that no suitable control variables were identified. In this example 1-2 variables were selected on average in the 2-23m and 2-59 m age categories, while no controls were identified in the 24-59m age group (the average model size is <1 (0.44)).

model_size = data.frame(t(analysis$model_size))
htmlTable(model_size, align='c')

Inclusion Probabilities

incl_probs <- NULL
for (group in analysis$groups) {
  incl_prob <- impact_results$full$groups[[group]]$inclusion_probs[-c(1:(analysis$n_seasons - 1)), ]
  incl_prob <- incl_prob[order(-incl_prob$inclusion_probs), ]
  incl_prob <- incl_prob[c(1:3), ]
  incl_prob2 <- round(incl_prob[, 2],2)
  incl_prob_names <- incl_prob[, 1]
  incl_prob3 <- data.frame("Group" = group, "Greatest Inclusion Variable" = incl_prob_names[1], "Greatest Inclusion Probability" = incl_prob2[1], "Second Greatest Inclusion Variable" = incl_prob_names[2], "Second Greatest Inclusion Probability" = incl_prob2[2], "Third Greatest Inclusion Variable" = incl_prob_names[3], "Third Greatest Inclusion Probability" = incl_prob2[3], check.names = FALSE)
  incl_probs <- rbind(incl_probs, incl_prob3)
}
rownames(incl_probs) <- NULL

Here we can see which variables received the most weight in the models for each strata. Values range from 0-1, with values closer to 1 indicating that the variable was included in a larger proportion of the variable combinations that were tested A value of 1 would indicate that the variable was always included in the model, a value of 0.48 would indicate the variable was included in 48% of the models that were tested.

htmlTable(incl_probs)

Sensitivity analysis

In the sensitivity analyses, we re-run the synthetic controls analysis, but sequentially drop the top 1,2, or 3 variables that were most influential (received the most weight) in the original analysis. This allows us to evaluate whether the results are sensitive to the inclusion of a specific variable

  sensitivity_results <- evaluatr.sensitivity(analysis)

The purpose of this analysis is to evaluate the sensitivity of the synthetic controls results to the inclusion of specific control variables. The model is first fit with all variables, then the top-weighted variable is excluded, and the model is re-run. If the results change drastically after dropping 1 variable, this indicates that that variable is highly influential on the results. If the results don't change, this indicates that the results are robust to the exclusion of that variable (this assumes that the variable that is excluded received a decent amount of weight prior to exclusion)

if (exists("sensitivity_results")) {
  htmlTable(sensitivity_results$sensitivity_table_intervals)
}

Generate and save the plots

plots <- evaluatr.plots(analysis)

Plot the results for 1 age group

First look at the results from the synthetic controls model for 1 age group.

This first plot shows a monthly time series, with observed, fitted, and counterfacual values. The observed number of deaths is shown in the black line. The fitted values for the pre-vaccine period are shown in the red dotted line, and the counterfactual estimate with its 95% credible interval is shown as the white dotted line and gray shaded area. if the black line is below the gray shaded area, this would indicate that obsrved cases are lower than expected based on changes in the control diseases in the post-vaccine period. If the controls appropriately adjust for underlying trends, then this would reflect an effect of the vaccine.

      plots$groups[["ec 2-23m A"]]$pred_full 

It is sometimes easier to look at the results if we aggregate the observed and expected values up to the annual time scale. Here the observed values are shown as black dots. When the black dots go below the gray shaded area, this indicates that the observed cases are lower than expected based on changes in the control diseases in the post-vaccine period. If the controls appropriately adjust for underlying trends, then this would reflect an effect of the vaccine. The vertical line indicates the date of vaccine introduction. The year when the vaccine is introduced is included to the right of the line, even if the vaccine was introduced part-way through the year. For instance, regardless of whether the the vaccine was introduced in January 2009 or November 2009, the observed dot for 2009 would be to the right of the line.

      plots$groups[["ec 2-23m A"]]$pred_full_agg 

Finally, we can look at the cumulative cases prevented. In this example, there have been 445 cases prevented (95%CrI: 58, 931) from the time of vaccine introduction to the last day month of the study period. This is calculated by takin the difference between the observed and fitted number of cases in each month, and summing them. If atleast 1 control disease is identified from the synthetic controls model, then the result here is drawn from that model, otherwise, it is drawn from the STL+PCA model.

      plots$groups[["ec 2-23m A"]]$cumsum_prevented 

Printing plots for all models and age groups

We instead might want to just print everything for all age groups and models. We can use the following code to do that

Plot Observed vs expected yearly time series

      par(mfrow=c(4,1))
for (group in names(plots$groups)) {
      print(plots$groups[[group]]$pred_full_agg )
      print(plots$groups[[group]]$pred_best_agg )
      print(plots$groups[[group]]$pred_time_agg )
      print(plots$groups[[group]]$pred_pca_agg )
}

Save results

#output_file <- "Results" # Directory where results will be saved.
#output_file <- paste0(output_file, "_", analysis$country, "_", format(Sys.time(), "%Y-%m-%d-%H%M%S"), ".Rds")
#evaluatr.save(analysis, output_file)


weinbergerlab/InterventionEvaluatR documentation built on June 28, 2022, 4:20 p.m.