knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
library(tidyr)
library(ggplot2)
library(designmatch)
library(MatchIt)
library(Matching)
library(sensitivitymv)
library(sensitivitymw)
library(knitr)
library(cobalt)
library(dplyr)

Introduction.

Over the past few years, technology has made it easier for researchers to access a variety of observational data, including information collected from patient medical records, national surveys, and other sources. The availability of such data provides opportunities for exploring important research questions; however, to do so one has to be careful in designing studies based on these data. A common use of an observational study is in estimating a treatment effect. This typically involves defining a variable to measure exposure and then using it to identify a treatment and a control group. Since the gold standard for estimating a treatment effect is data collected from a randomized clinical trial, observational data have to be preprocessed using observational study design algorithms to ensure that the data used to estimate the treatment effect are as similar as possible to data that would have been obtained from a clinical trial.

One of the assumptions made when estimating treatment effects from observational studies is the ignorability assumption: that the observed covariates are sufficient to explain any meaningful differences between the treatment and control groups. The consequence of this assumption is that if the groups being compared are similar, on average, on these observed covariates, then any difference in the outcome being studied is due to treatment, the only distinguishing factor between the groups. The design phase of an observational study is therefore usually focused on achieving the best possible balance of observed covariates between the treated and control groups. Several design algorithms have been proposed, and R packages that implement them are widely available. The main steps during the design phase of an observational study are listed below; a minimal code sketch of steps 2 through 4 follows the list.

  1. Determine the variables that confound the relationship between treatment and outcome.
  2. Using the confounders from step 1, compute a metric (e.g. a propensity score) to measure the distance between any two observations.
  3. Match observations based on the metric from step 2.
  4. Assess balance of the variables from step 1 across the treatment and control groups. If balance is poor, repeat steps 2-4.
  5. Estimate the treatment effect in the preprocessed sample.
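
As a minimal sketch of steps 2 through 4, using only the MatchIt and cobalt packages loaded in the setup chunk above (variable names follow MatchIt's bundled lalonde data):

library(MatchIt)
library(cobalt)
data("lalonde", package = "MatchIt")
## Steps 2-3: matchit fits a logistic-regression propensity score internally
## and performs 1:1 nearest neighbor matching on it
m <- matchit(treat ~ age + educ + married + re74 + re75,
             data = lalonde, method = "nearest")
## Step 4: compare covariate balance before and after matching;
## if balance is poor, revisit the model or the matching method
bal.tab(m, un = TRUE)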

In their 2008 paper, Stuart and Rubin provide a detailed description of the major design algorithms and of the R packages that implement them.

When designing an observational study, one usually has to run several matching algorithms and pick the one that gives the best balance. With the various matching algorithms housed in different R packages, for some time it was hard to compare across design methods. To ease this process, Greifer developed the cobalt package, which interacts seamlessly with objects from the Matching and MatchIt packages and generates key balance summary statistics.

The main critique of observational studies that use matched sampling designs is that the treatment effect estimates obtained are premised on the assumption that the observed variables are sufficient to explain the differences between the treated and control groups. To address this criticism, Rosenbaum developed a sensitivity analysis framework that, when a significant treatment effect exists, computes a parameter quantifying the magnitude of bias due to unmeasured confounders that would be required to nullify the significant result. Presenting the results of such a sensitivity analysis allows the reader to make an informed decision on how much faith to place in your findings.
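
Concretely, for two units $j$ and $k$ with the same observed covariates and treatment assignment probabilities $\pi_j$ and $\pi_k$, Rosenbaum's model bounds the odds ratio of treatment assignment by a single sensitivity parameter $\Gamma \geq 1$:

$$\frac{1}{\Gamma} \leq \frac{\pi_j (1 - \pi_k)}{\pi_k (1 - \pi_j)} \leq \Gamma$$

Here $\Gamma = 1$ corresponds to a randomized experiment (no hidden bias), and the sensitivity analysis reports how large $\Gamma$ must be before the study's conclusion changes.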

A significant portion of the sensitivity methods developed by Rosenbaum has been implemented in the rbounds package (Keele, 2008). However, the rbounds functions only interact with objects from the Matching package, and the package does not implement a sensitivity analysis method for time-to-event outcomes. To perform sensitivity analysis involving matched sets, Rosenbaum developed two packages, sensitivitymv and sensitivitymw (Rosenbaum, 2013). These two packages also provide more robust test statistics that increase the power of the sensitivity analysis. However, performing sensitivity analysis with these packages requires reshaping your data set to conform to a specific format.

It is important to note that this package does not replace the highly sophisticated matching and sensitivity tools in the packages described above. What it does is bridge the most common matching packages to the various sensitivity packages where such connections do not exist and would otherwise require extra work on the part of the analyst.

Why use this package?

Easy sensitivity analysis when matched sampling is not necessarily done with the Matching package.

Our package interacts seamlessly with objects from packages that implement matching methods not contained in the Matching package, specifically MatchIt (Ho et al. 2007) and designmatch (Zubizarreta et al. 2016). This package therefore gives the user the freedom to perform sensitivity analysis using output from any of these matching packages.

Sensitivity analysis for matched sets from any package.

Unlike rbounds, which implements sensitivity analysis for matched pairs only, this package contains functions that accept objects from the various matching packages that produce matched sets. These functions simultaneously transform the objects and perform the sensitivity analysis. Using the package in this way can save the significant time required to transform objects from the various packages into valid inputs for sensitivitymv or sensitivitymw, the two packages that perform sensitivity analysis for matched sets.
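
For context, sensitivitymv expects its input as a matrix with one matched set per row: the treated unit's response in column 1 and the matched controls' responses in the remaining columns. The tiny example below, with made-up outcome values, shows the format that the functions in this package construct for you:

library(sensitivitymv)
## Two matched sets, each with 1 treated unit (column 1) and 2 controls
y <- matrix(c(5.2, 3.1, 4.0,
              6.8, 6.1, 5.5),
            ncol = 3, byrow = TRUE)
senmv(y, gamma = 1.5)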

Balance and Sensitivity Analysis Plots.

Most matching packages contain plots and tables useful for judging the quality of matches, the most common being the Love plot (named for Thomas Love). This package adds to these tools by providing sensitivity analysis and amplification plots (Rosenbaum, 2013) to help visualize and interpret findings.
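
For readers new to amplification: a single sensitivity parameter $\Gamma$ can be unpacked into pairs $(\Lambda, \Delta)$ describing an unobserved confounder's association with treatment and with outcome. The underlying computation is available through sensitivitymv's amplify function (loaded in the setup chunk above); the amplification plots in this package visualize this interpretation:

## Values of Delta that, paired with Lambda = 3, 4, 5, amplify Gamma = 2
amplify(2, c(3, 4, 5))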

Using the package.

This package contains four main functions: rubinRules(), pens2(), binarysens2(), and Survsens(). Other functions in the package include love_plot(), multiControlSens(), ampPlot(), and eda(). In the sections that follow, a detailed description of how each is used is given.

Currently this package is housed on GitHub, so to install it use the following code:

library(devtools)
devtools::install_github("Ngendahimana/SensitivityR5")

rubinRules

rubinRules() is a function that can be used to evaluate the three balance measures suggested by Rubin (2001), which are based on theory published in earlier work by Rubin and Thomas. Rubin provides three guidelines to help ensure that the results of regression adjustment - often conducted in the analysis phase of observational studies to remove any residual imbalances - are valid. The three guidelines are:

  1. The absolute standardized difference in the mean propensity score (on the linear, i.e. logit, scale) between the treatment and control groups should be close to 0, ideally below 0.5.
  2. The ratio of the variances of the propensity score in the treatment and control groups should be close to 1, ideally between 0.5 and 2.
  3. For each covariate, the ratio of the variances of the residuals after adjusting for the propensity score should be close to 1, ideally between 0.5 and 2.

An example of how to use the rubinRules function to assess these guidelines is shown below:

library(SensitivityR5)

data("toy",package = "SensitivityR5")
psmodel <- glm(treated ~ covA + covB + covC + covD + covE + covF + Asqr + BC + BD, family=binomial(), data=toy)
toy$ps <- psmodel$fitted
toy$linps <- psmodel$linear.predictors
covlist1 = c('covA', 'covB', 'covC', 'covD', 'covE', 'covF.Middle', 'covF.High', 'Asqr', 'BC', 'BD')
k = rubinRules2(data = toy, Treatment = 'treated', covlist = covlist1)
k$plot

The first two Rubin balance measures are displayed at the top of the plot, while the third balance measure, which consists of a series of residual variance ratios, one per variable, is displayed inside the dot plot. The vertical lines indicate the acceptable range for the residual variance ratios; ratios outside the region bounded by these lines are considered unacceptable. The plot is a ggplot object and can be modified like any ggplot graph. For example, to modify the plot title and background, the following code could be executed:

k$plot + theme_bw() + labs(title = "Assessing Rubin (2001) Balance Measures")

Various items can be extracted from the list k using conventional R methods. For example, to obtain a data set of the residual variance ratios, you can do the following:

k$RUBIN3

Love Plot

Although a Love plot for assessing standardized balance measures of objects produced by Matching and MatchIt has already been implemented in the cobalt package through the love.plot function, that function cannot interact with designmatch objects directly. On the other hand, the Love plot currently implemented in the designmatch package is not a ggplot object, which makes it hard to customize. The love_plot function in this package allows you to generate ggplot Love plots using objects from any of the three matching packages, including designmatch.

The code chunk below shows how to accomplish this.

Using designmatch objects

This example uses designmatch, which implements cardinality matching. Cardinality matching balances variables directly rather than using a summary measure to quantify the distance between observations, and it optimizes the sample size rather than simply a distance measure such as the propensity score. The designmatch package also enables other, more sophisticated matching schemes such as fine and near-fine balance, exact matching, and near-exact matching. These algorithms can be appealing in various circumstances, especially when the researcher wishes to incorporate subject matter understanding of the question at hand.

data("lalonde",package = "designmatch")
attach(lalonde)
## Treatment indicator
t_ind = lalonde$treatment

## Distance matrix
dist_mat = NULL

## Subset matching weight.
subset_weight = 1 

# Moment balance: constrain differences in means to be at most .05 standard deviations apart
mom_covs = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
mom_tols = round(absstddif(mom_covs, t_ind, .05), 2)
mom = list(covs = mom_covs, tols = mom_tols)

## Fine balance
#fine_covs = cbind(black, hispan, married, nodegree)
#fine = list(covs = fine_covs)

## Exact matching
#exact_covs = cbind(black)
#exact = list(covs = exact_covs)

## Solver options
t_max = 60*5
solver = "glpk"
approximate = 1
solver = list(name = solver, t_max = t_max, approximate = approximate,round_cplex = 0, trace = 0)

## Cardinality matching
out = bmatch(t_ind = t_ind, dist_mat = dist_mat, subset_weight = subset_weight, mom = mom,  solver = solver)

# Indices of the treated units and matched controls
t_id = out$t_id
c_id = out$c_id
detach(lalonde)

To assess balance with the love_plot function in this package after cardinality matching, you would execute the following code:

p = love_plot(X = out, data = lalonde, covList = c("age", "education", "black", "hispanic", "married", "nodegree", "re74", "re75"))

p

Note that out is the object returned by designmatch's bmatch function, which here implements cardinality matching.

Unlike the existing Love plot in designmatch, which does not allow one to modify its appearance, this plot can be enhanced using ggplot2 features. For example, we can add desired standardized difference threshold lines, say at -0.1 and 0.1, and remove the legend title with the following code:

p + theme_bw() + geom_vline(xintercept = -0.1) + geom_vline(xintercept = 0.1) + theme(legend.title = element_blank())

Objects from the MatchIt and Matching packages can also be used directly with the love_plot function, and the function can also be used to extract other important balance measures. We illustrate this below using the optimal matching functionality within the MatchIt package.
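
A sketch of this usage, assuming love_plot dispatches on MatchIt objects the same way it does on designmatch output (note that method = "optimal" requires the optmatch package):

library(MatchIt)
data("lalonde", package = "Matching")
## Optimal pair matching on a propensity score fitted internally by matchit
m.opt = matchit(treat ~ age + educ + black + hisp + married + nodegr + re74 + re75,
                data = lalonde, method = "optimal")
p2 = love_plot(X = m.opt, data = lalonde,
               covList = c("age", "educ", "black", "hisp", "married", "nodegr", "re74", "re75"))
p2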

Sensitivity Analysis.

As mentioned earlier, one of the aims of this package is to facilitate a seamless transition from the analysis of an observational study to performing sensitivity analysis. In the sections that follow, we illustrate how to use the functions within this package to accomplish this task.

pens2 Function.

This function is meant to facilitate sensitivity analysis for continuous outcomes. There is a similar function in the rbounds package that implements this method; however, rbounds interacts only with objects from the Matching package, which implements a limited set of matched sampling methods.

Using the lalonde dataset, the code below illustrates how to do a sensitivity analysis after implementing an optimal matching algorithm.

library(Matching);library(MatchIt)
data("lalonde",package = "Matching")

## Sensitivity analysis with a matchit object
m.out = matchit(treat ~ age  + educ +  black + hisp +married + nodegr + re74  + re75  +
u74 + u75, family=binomial, data = lalonde, method = "optimal")

## Estimating the treatment effect. Ideally, balance assessment should be done prior to estimating the treatment effect
mod = lm(re78 ~ treat + age + educ + black + hisp + married + nodegr + re74 + re75 + u74 + u75, data = match.data(m.out))

## Sensitivity analysis
sens.out = pens2(x = m.out, y = "re78", Gamma = 2, GammaInc = 0.1, est = 629.7)

Note that when using MatchIt objects you have to specify the effect estimate, or supply -1 if the effect estimate is negative and 1 otherwise. After executing this code, you can access a table of sensitivity parameters and their respective Rosenbaum bounds by running the code below:

kable(sens.out$bounds)

The other advantage of this package is that it allows you to use objects produced by the designmatch package to run sensitivity analysis. This is especially important when one wants to implement more sophisticated matching methods that are not available in either MatchIt or Matching, such as fine or near-fine balance. For example, to run a sensitivity analysis using the designmatch object out created earlier and obtain the Rosenbaum sensitivity bounds, the following code would be used:

data("lalonde",package = "designmatch")
attach(lalonde)
## Treatment indicator
t_ind = treatment
## Distance matrix
dist_mat = NULL
## Subset matching weight
subset_weight = 1
## Moment balance: constrain differences in means to be at most .05 standard deviations apart
mom_covs = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
mom_tols = round(absstddif(mom_covs, t_ind, .05), 2)
mom = list(covs = mom_covs, tols = mom_tols)
## Fine balance
fine_covs = cbind(black, hispanic, married, nodegree)
fine = list(covs = fine_covs)
## Exact matching
exact_covs = cbind(black) 
exact = list(covs = exact_covs)
## Solver options
t_max = 60*5
solver = "glpk"
approximate = 1
solver = list(name = solver, t_max = t_max, approximate = approximate,round_cplex = 0, trace = 0)
## Match
out = bmatch(t_ind = t_ind, dist_mat = dist_mat, subset_weight = subset_weight,mom = mom, fine = fine, exact = exact, solver = solver)
sens2 = pens2(x = out, y = "re78", Gamma = 2, GammaInc = 0.1, est = 234, treat = "treatment", data = lalonde)
kable(sens2$bounds)
detach(lalonde)

Survsens function.

At the time of writing this vignette, no sensitivity package has a function that implements sensitivity analysis for time-to-event outcomes. Survsens performs this analysis and is largely based on lecture notes provided by Prof. Thomas Love in his traveling course on the design of observational studies. We illustrate how to use this function using a simulated data set called toy that is part of this package. To run a sensitivity analysis with this function, you first have to perform matching using a matching package of your choice and assess the quality of your match. For example, we can first match using either the Matching or the MatchIt package with the following code:

data("toy",package = "SensitivityR5")
psmodel <- glm(treated ~ covA + covB + covC + covD + covE + covF + Asqr + BC + BD, family=binomial(), data=toy)
toy$ps <- psmodel$fitted
toy$linps <- psmodel$linear.predictors
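## Option A: 1:1 matching on the linear propensity score with the Matching package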
X <- toy$linps ## matching on the linear propensity score
Tr <- as.logical(toy$treated)
Y <- toy$out3.time
match1 <- Match(Y=Y, Tr=Tr, X=X, M = 1, replace=FALSE, ties=FALSE)

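## Option B: 1:1 nearest neighbor matching with MatchIt (this is the object used in the Survsens call below)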
match.it <- matchit(treated ~ covA + covB + covC + covD + covE + covF + Asqr + BC + BD, data = toy, method='nearest', ratio=1) 

After assessing the quality of the matches and estimating the treatment effect, you can then do a sensitivity analysis with the Survsens function, using code similar to the example below, and extract the Rosenbaum bounds table.

res_Surv = Survsens(x = match.it, data = toy, exp = 'treated', outcome = 'out2',
                    failtime = 'out3.time', Gamma = 1.2, Gammainterval = 0.01,
                    alpha = 0.05, # significance level (0.05 assumed)
                    plot_title = 'Time To Event Outcome Sensitivity Plot')

kable(res_Surv$bounds)
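
If, as with rubinRules, Survsens also returns its sensitivity plot as a ggplot object in the result list (an assumption here; only the bounds table is shown above), it could be restyled the same way:

res_Surv$plot + theme_bw()  ## hypothetical: assumes the returned list contains a ggplot element named plot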

multiControlSens function

Currently, sensitivitymw is the only package that implements sensitivity analysis for matched sampling designs that allow matched sets rather than simply matched pairs. The multiControlSens function complements the sensitivitymw package by processing objects from the various matching packages, allowing a seamless transition to sensitivity analysis. For example, using the lalonde data, if we match 2 controls to 1 treated unit using the nearest neighbor option in the matchit function, we can perform the sensitivity analysis using the following code:

data("lalonde",package ="designmatch")
covs0 = c("age", "education", "black", "hispanic", "married", "nodegree", "re74", "re75")
m.out <- matchit(f.build("treatment", covs0), data = lalonde, method = "nearest", replace = TRUE,ratio = 2)
res1 = multiControlSens(X = m.out, outcomeName = "re78", Gamma = 2, GammaInc = 0.1, n_contrl = 2)

This function returns the Rosenbaum sensitivity bounds and a reformatted data set that can be used for other analyses. The first few rows of the data returned by the function are shown below:

kable(head(res1$data,5))

The Rosenbaum bounds can also be extracted:

kable(res1$pvalues)

From the table above, we observe that if we design an observational study in which 2 controls are matched to each treated unit, the odds of exposure to an unobserved confounder in one exposure group would have to be about 1.3 times the odds in the other exposure group before the qualitative finding of a significant treatment difference between these exposure groups would be nullified.

binarysens2 function.

The rbounds package has implemented the method developed by Rosenbaum for sensitivity analysis of binary outcomes via its binarysens function, and binarysens2 takes most of its coding inspiration from binarysens. What binarysens2 adds is the ability to do sensitivity analysis using objects from matching algorithms that binarysens is currently unable to interact with. In the sections that follow, we illustrate how to use binarysens2 with objects from the various packages.

Using MatchIt objects.

data("GerberGreenImai",package = "Matching")
## Sensitivity analysis with a MatchIt object
m.out = matchit(PHN.C1 ~ PERSONS + VOTE96.1 + NEW + MAJORPTY + AGE + WARD, family = binomial, data = GerberGreenImai, method = "nearest")
mod = lm(VOTED98 ~ PHN.C1 + PERSONS + VOTE96.1 + NEW + MAJORPTY + AGE + WARD, data = match.data(m.out))
binarysens2(x = m.out, y = "VOTED98", Gamma = 2, GammaInc = .1)

The graph above plots the p-value upper bounds corresponding to various $\Gamma$ parameters. The coordinates displayed on the plot refer to the maximum bias allowable under a specified significance level ($\alpha$), beyond which the significant finding of a true treatment effect can be attributed to unmeasured biases.

pens2 function

This function is meant to facilitate sensitivity analysis when you have a continuous outcome. Again, unlike the corresponding function in rbounds, it can handle objects from multiple matching algorithms, including cardinality matching. In the sections that follow, we illustrate how to use it to perform sensitivity analysis with objects from the MatchIt and designmatch packages.

with MatchIt objects

We use the lalonde data in this illustration. Suppose you perform a nearest neighbor match on the observed covariates; you would then execute the following code.

data("lalonde",package = "Matching")
m.out = matchit(treat ~ age  + educ +  black + hisp +married + nodegr + re74  + re75  +
u74 + u75, family=binomial, data = lalonde, method = "nearest")

Ideally, the next step, which we skip here, would be to assess the balance of the baseline covariates using tools such as the Rubin and Love plots shown earlier, and finally to estimate an effect. To perform sensitivity analysis with pens2, you then need to pass the MatchIt object, the estimated treatment effect, the name of the outcome variable, and the maximum value of the Rosenbaum sensitivity parameter.

mod = lm(re78 ~ treat + age + educ + black + hisp + married + nodegr + re74 + re75 + u74 + u75, data = match.data(m.out))

sens_object = pens2(x = m.out, y = "re78", Gamma = 2, GammaInc = 0.1, est = 629.7)

The pens2 function returns a list of items that includes a table of bounds for the range of sensitivity parameters passed to the function. To obtain the table, you can call it from the list as follows:

kable(sens_object$bounds)

with designmatch objects.

If, on the other hand, you have chosen to do cardinality matching, which optimizes the sample size under constraints of covariate balance, then you would follow the same steps as in the section above. First, perform the matching with the designmatch package:

data("lalonde",package = "designmatch")
library(designmatch)
attach(lalonde)
## Treatment indicator
t_ind = treatment
## Distance matrix
dist_mat = NULL
## Subset matching weight
subset_weight = 1
## Moment balance: constrain differences in means to be at most .05 standard deviations apart
mom_covs = cbind(age, education, black, hispanic, married, nodegree, re74, re75)
mom_tols = round(absstddif(mom_covs, t_ind, .05), 2)
mom = list(covs = mom_covs, tols = mom_tols)
## Fine balance
fine_covs = cbind(black, hispanic, married, nodegree)
fine = list(covs = fine_covs)
## Exact matching
exact_covs = cbind(black)
exact = list(covs = exact_covs)
## Solver options
t_max = 60*5
solver = "glpk"
approximate = 1
solver = list(name = solver, t_max = t_max, approximate = approximate,round_cplex = 0, trace = 0)
## Match
out = bmatch(t_ind = t_ind, dist_mat = dist_mat, subset_weight = subset_weight,mom = mom, fine = fine, exact = exact, solver = solver)

Note that in this matching scheme we have specified fine matching on marital status, race, etc. These specifications are all optional. After matching, you can then call pens2 to perform the sensitivity analysis.

cardMatch = pens2(x = out, y = "re78", Gamma = 2, GammaInc = 0.1, est = 234, treat = "treatment", data = lalonde)

kable(head(cardMatch$bounds,5))
detach(lalonde)

