Finds optimal near-far match

Description

Finds the optimal near-far match by maximizing the partial F statistic (for continuous treatments) or the residual deviance (for binary treatments). PLEASE NOTE the required column ordering of the input data frame, described under Details.

Usage

opt.nearfar(dta, trt.bin = FALSE, imp.var = NA, tol.var = NA,
 adjust.IV = TRUE, sink.range = c(0, 0.5), cutp.range = NA, max.time.seconds = 300)

Arguments

dta

Data frame wherein first column is outcome, second column is treatment, third column is IV, and fourth through last columns are measured confounders

trt.bin

TRUE if treatment of interest is binary; FALSE otherwise

imp.var

A list of (up to 5) named variables to prioritize in the “near” matching

tol.var

A list of (up to 5) tolerances attached to the prioritized variables where 0 is highest priority

adjust.IV

If TRUE, include the measured confounders in the treatment~IV model that is optimized; if FALSE, exclude them

sink.range

A two-element vector of (min, max) for the range of sinks over which to optimize in the near-far match; default is (0, 0.5), so that at most 50% of observations can be removed

cutp.range

A two-element vector of (min, max) for the range of cutpoints (how far apart the IV will become) over which to optimize in the near-far match; default is (one SD of IV, range of IV)

max.time.seconds

How long to let the optimization algorithm run; default is 300 seconds = 5 minutes
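
As a minimal sketch of preparing these inputs, the base R snippet below reorders a data frame into the required column order and builds a cutpoint range that mirrors the documented default. The data frame raw and its column names are hypothetical, and reading "range of IV" as max minus min is an assumption, not something taken from the package source.

```r
# Hypothetical raw data with columns in an arbitrary order
set.seed(1)
raw <- data.frame(
  age     = rnorm(50, 60, 10),  # measured confounder
  dist    = rnorm(50, 10, 2),   # instrumental variable
  outcome = rnorm(50),          # outcome
  dose    = rnorm(50, 5, 1)     # treatment
)

# Put outcome, treatment, and IV first; measured confounders follow
lead <- c("outcome", "dose", "dist")
dta  <- raw[, c(lead, setdiff(names(raw), lead))]

# Cutpoint range mirroring the documented default:
# (one SD of the IV, spread of the IV), assuming "range" means max - min
iv <- dta[[3]]
cutp.range <- c(sd(iv), diff(range(iv)))
```

The resulting dta and cutp.range could then be passed as the dta and cutp.range arguments of opt.nearfar.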

Details

PLEASE NOTE the required column ordering for the input data frame: the first column is the outcome, the second column is the treatment, the third column is the IV, and the fourth through last columns are measured confounders. Otherwise your results will not make sense! Additionally, if any absolute standardized difference in the measured confounders is greater than 0.2, you may want to further restrict the sink and cutpoint ranges over which to optimize.
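
The 0.2 guideline above can be checked by hand. The sketch below computes absolute standardized differences in base R for two hypothetical groups; the helper abs.std.diff and the pooled-variance formula it uses are assumptions for illustration, and the package may compute the values reported in summ differently.

```r
# Absolute standardized difference for one variable across two groups.
# Assumed formula: |mean difference| / sqrt(average of the two variances).
abs.std.diff <- function(x.enc, x.dis) {
  abs(mean(x.enc) - mean(x.dis)) / sqrt((var(x.enc) + var(x.dis)) / 2)
}

# Toy confounder values for hypothetical encouraged/discouraged groups
set.seed(2)
enc <- data.frame(age = rnorm(40, 62, 10), bmi = rnorm(40, 27, 4))
dis <- data.frame(age = rnorm(40, 60, 10), bmi = rnorm(40, 27, 4))

# One value per confounder, named after the columns of enc
asd <- mapply(abs.std.diff, enc, dis)

# Confounders exceeding the 0.2 guideline
flagged <- names(asd)[asd > 0.2]
```

If flagged is non-empty, one option is to rerun the match with a narrower sink.range or cutp.range.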

Value

n.calls

Number of calls made to the objective function

sink.range

A two-element vector of (min, max) for the range of sinks over which to optimize in the near-far match; default is (0, 0.5), so that at most 50% of observations can be removed

cutp.range

A two-element vector of (min, max) for the range of cutpoints (how far apart the IV will become) over which to optimize in the near-far match; default is (one SD of IV, range of IV)

pct.sink

Optimal percent sinks

cutp

Optimal cutpoint

maxF

Highest value of the partial F-statistic (continuous treatment) or residual deviance (binary treatment) found by the simulated annealing optimizer

match

A two column matrix where the first column is the index of an “encouraged” individual and the second column is the index of the corresponding “discouraged” individual from the pair matching

summ

A table of mean variable values for both the “encouraged” and “discouraged” groups across all variables plus absolute standardized differences for each variable
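
To illustrate how a two-column match matrix of this shape can be used, the base R sketch below builds a hypothetical stand-in for the match element, stacks the paired rows, and computes a partial F statistic for encouragement in the treatment model. The toy matrix, the indicator enc, and the assumption that encouraged individuals have the larger IV value are illustrative only; the objective the package optimizes internally may be computed differently.

```r
set.seed(3)
n   <- 20
dta <- data.frame(Y = rnorm(n), Z = rnorm(n), IV = rnorm(n, 10, 1), X = rnorm(n))

# Hypothetical stand-in for the match element: rows are split on the IV so
# that column 1 (encouraged) has the larger IV value within every pair
ord   <- order(dta$IV)
match <- cbind(encouraged = ord[(n / 2 + 1):n], discouraged = ord[1:(n / 2)])

# Stack the matched rows and label each row by group
matched <- rbind(
  cbind(dta[match[, 1], ], enc = 1),
  cbind(dta[match[, 2], ], enc = 0)
)

# Within-pair separation of the IV
iv.gap <- dta$IV[match[, 1]] - dta$IV[match[, 2]]

# Partial F for encouragement in the treatment model, adjusting for X
m0 <- lm(Z ~ X, data = matched)
m1 <- lm(Z ~ enc + X, data = matched)
partialF <- anova(m0, m1)$F[2]
```

Larger values of iv.gap and partialF correspond to pairs that are further apart on the instrument, which is the "far" part of the near-far match.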

Author(s)

Joseph Rigdon jrigdon@stanford.edu

References

Lu B, Greevy R, Xu X, Beck C (2011). Optimal nonbipartite matching and its statistical applications. The American Statistician, 65(1), 21-30.

Xiang Y, Gubian S, Suomela B, Hoeng J (2013). Generalized Simulated Annealing for Efficient Global Optimization: the GenSA Package for R. The R Journal, 5(1). URL http://journal.r-project.org/.

Examples

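#Load packages
library(MASS) #provides mvrnorm()
library(AER) #provides ivreg()
library(nearfar) #provides opt.nearfar() and eff.ratio()

#Generate data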
set.seed(81)
dta = mvrnorm(100,c(10,10,10),matrix(c(1,-0.5,0.5,-0.5,1,0.5,0.5,0.5,1),3,3))
Zstar = dta[,1] #Part of Z that is correlated with unmeas conf
X.unmeas = dta[,2] #Unmeas conf
X.meas = dta[,3] #Meas conf
IV = rnorm(100,10,1) #Instrumental variable
Z = 1+5*Zstar+3*X.meas+1*IV+rnorm(100,0,10) #Observed treatment
Y = 1+1*Z+1*X.meas+5*X.unmeas+rnorm(100,0,20) #Outcome
df.sim = data.frame(Y=Y,Z=Z,IV=IV,X=X.meas) #set up for near-far match 
#(X is measured confounder)

#Execute near-far match (just for illustration)
#The default setting is max.time.seconds=300
nf = opt.nearfar(dta=df.sim,trt.bin=FALSE,imp.var=NA,tol.var=NA,
    adjust.IV=TRUE,max.time.seconds=3)

#Look at absolute standardized difference of variables after match
nf$summ

#Illustrate inference using effect ratio
eff.ratio(dta=df.sim,match=nf$match,alpha=0.05)

#Illustrate inference using 2SLS post-near-far match
df.sim3 = df.sim[as.numeric(nf$match),]
m.post = ivreg(Y~X+Z|IV+X,data=df.sim3)
summary(m.post)$coeff