
MultiObjMatch: Multi-objective Matching for R

MultiObjMatch is a user-friendly R package for matching treated and control subjects in observational studies. It lets users form matches that achieve a specified balance among three objectives: the number of treated units matched, the total variation imbalance in the marginal distributions of key categorical variables, and the sum of within-pair distances. Researchers can thereby form matches that meet user-specified design goals. A more detailed discussion can be found in Pimentel and Kelz (2020).
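To make the three objectives concrete, the base-R sketch below evaluates them for one fixed, hand-picked match on toy data. It illustrates the quantities only and is not the package's implementation: the data, the variable names, and the choice to measure total variation between the matched treated and matched control units are all assumptions made here.

```r
# Toy treated-by-control distance matrix: 3 treated units (rows), 4 controls (columns)
dist_mat <- matrix(c(0.50, 2.00, 1.00, 3.00,
                     1.50, 0.25, 2.50, 1.00,
                     2.00, 1.00, 0.75, 1.25),
                   nrow = 3, byrow = TRUE)

# Hypothetical match: control index assigned to each treated unit; NA = unmatched
pairs <- c(1, 2, NA)

# Hypothetical categorical balance variable for treated and control units
treated_cat <- factor(c("a", "b", "b"), levels = c("a", "b"))
control_cat <- factor(c("a", "a", "b", "b"), levels = c("a", "b"))

matched_t <- which(!is.na(pairs))   # treated units that received a control
matched_c <- pairs[matched_t]       # the controls they were paired with

# Objective 1: number of treated units left unmatched
n_unmatched <- sum(is.na(pairs))

# Objective 2: total variation distance between the marginal distributions of
# the balance variable among matched treated and matched control units
p_t <- prop.table(table(treated_cat[matched_t]))
p_c <- prop.table(table(control_cat[matched_c]))
tv_imbalance <- 0.5 * sum(abs(p_t - p_c))

# Objective 3: sum of within-pair distances (matrix indexing by (row, col) pairs)
total_dist <- sum(dist_mat[cbind(matched_t, matched_c)])

c(unmatched = n_unmatched, tv = tv_imbalance, distance = total_dist)
```

Improving one of these numbers usually costs another (for example, dropping a hard-to-match treated unit lowers the distance but raises the unmatched count), which is the trade-off the package lets users navigate.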

Besides the main matching algorithm, the package also contains functions for generating numeric and graphical diagnostics.

0. Set-up

This section walks through typical usage of the package. Before running the examples, users can install the package from GitHub:

# install.packages("devtools")   # run once if devtools is not installed
library(devtools)
install_github("ShichaoHan/MultiObjMatch", ref = "main")
library(MultiObjMatch)

In the demo below, the dataset "lalonde" is loaded from the package cobalt.

library(cobalt)
data("lalonde", package="cobalt")

1. Matching

After any data pre-processing, users can call one of the two main matching functions, distBalMatch or twoDistMatch, to generate a set of candidate matches.

1.1 distBalMatch ('Basic' version)

Use distBalMatch to trade off among (1) pairwise distance, (2) the number of treated units left unmatched, and (3) the distance between the marginal distributions of specified balance variables.

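psCols <- c("age", "educ", "married", "nodegree")       # covariates for the propensity score
treatVal <- "treat"                                      # treatment indicator column
responseVal <- "re78"                                    # outcome column (not passed to distBalMatch below)
pairDistVal <- c("age", "married", "educ", "nodegree")  # covariates for the pairwise distance
exactVal <- NULL                                         # no exact-matching variables
myBalVal <- c("race")                                    # categorical variable to balance
r1s <- seq(0.1, 5, 0.5)                                  # grid of rhoExclude penalty values
r2s <- c(0.001)                                          # grid of rhoBalance penalty values

resDistBal <- distBalMatch(df = lalonde, treatCol = treatVal, myBalCol = myBalVal,
                           rhoExclude = r1s, rhoBalance = r2s,
                           distList = pairDistVal, exactlist = exactVal,
                           propensityCols = psCols, idCol = NULL,
                           maxUnMatched = 0.1, caliperOption = NULL,
                           toleranceOption = 1e-1, maxIter = 0, rho.max.f = 10)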
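The penalty grids r1s and r2s are what turn the output into a set of matches rather than a single one. As a rough sketch, assuming the objectives are combined as distance + rhoExclude * (unmatched count) + rhoBalance * (imbalance), the snippet below picks, for each penalty pair, the best of three made-up candidate matches. Choosing from a fixed list only stands in for the optimization over all possible matches that distBalMatch performs; the candidate values and the exact form of the weighted sum are assumptions.

```r
# Hypothetical candidate matches and their objective values (made up)
candidates <- data.frame(
  match     = c("A", "B", "C"),
  unmatched = c(0, 5, 12),       # treated units left unmatched
  balance   = c(40, 10, 2),      # imbalance on the balance variable
  distance  = c(400, 380, 360),  # sum of within-pair distances
  stringsAsFactors = FALSE
)

# Penalty grids, mirroring r1s / r2s above
r1s <- seq(0.1, 5, 0.5)
r2s <- c(0.001)
grid <- expand.grid(rhoExclude = r1s, rhoBalance = r2s)

# For each penalty pair, choose the candidate with the smallest weighted sum
grid$best <- apply(grid, 1, function(rho) {
  score <- candidates$distance +
    rho[["rhoExclude"]] * candidates$unmatched +
    rho[["rhoBalance"]] * candidates$balance
  candidates$match[which.min(score)]
})
grid
```

Small exclusion penalties favor C (lowest distance, most units dropped); as rhoExclude grows, the choice shifts to B and then to A, which keeps every treated unit.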

1.2 twoDistMatch ('Advanced' version)

To trade off between two different distance measures and the number of treated units left unmatched, use twoDistMatch. Users can construct their own distance matrices and pass them to the matching function.

## Data generation
set.seed(999)
x1 <- rnorm(100, 0, 0.5)
x2 <- rnorm(100, 0, 0.1)
x3 <- rnorm(100, 0, 1)

x4 <- rnorm(100, x1, 0.1)                # x4 is generated around x1, so the two are correlated

# Penalty grids, passed below as rhoExclude and rhoDistance
r1ss <- seq(0.1, 50, 10)
r2ss <- seq(0.1, 50, 10)

x <- cbind(x1, x2, x3, x4)
z <- sample(c(rep(1, 50), rep(0, 50)))   # 50 treated and 50 control units in random order
e1 <- rnorm(100, 0, 1.5)
e0 <- rnorm(100, 0, 1.5)
y1impute <- x1^2 + 0.6*x2^2 + 1 + e1     # potential outcome under treatment
y0impute <- x1^2 + 0.6*x2^2 + e0         # potential outcome under control
treat <- (z == 1)
y <- ifelse(treat, y1impute, y0impute)   # observed outcome

colnames(x) <- c("x1", "x2", "x3", "x4") # use colnames(), not names(), on a matrix
df <- data.frame(cbind(z, y, x))
df$x5 <- 1                               # constant column, used as myBalCol below

## Distance matrices: absolute differences in x1 (d1) and in x2 (d2)
d1 <- as.matrix(dist(df["x1"]))
d2 <- as.matrix(dist(df["x2"]))

idx <- 1:length(z)
treatedUnits <- idx[z == 1]
controlUnits <- idx[z == 0]

# Keep treated units as rows and controls as columns
d1 <- as.matrix(d1[treatedUnits, controlUnits])
d2 <- as.matrix(d2[treatedUnits, controlUnits])

resTwoDist <- twoDistMatch(df = df, treatCol = "z", responseCol = "y",
                           dMat = d1, dType = "User", dMat1 = d2, dType1 = "User",
                           myBalCol = c("x5"), rhoExclude = r1ss, rhoDistance = r2ss,
                           propensityCols = c("x1"), pScores = NULL, idCol = NULL,
                           maxUnMatched = 0.1, caliperOption = 0.25,
                           toleranceOption = 1e-6, maxIter = 3, rho.max.f = 10)
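Since twoDistMatch takes user-supplied matrices, d1 and d2 could be replaced by any treated-by-control distance. The hedged base-R sketch below builds a Mahalanobis distance matrix with the same layout as d1 and d2 (treated units in rows, controls in columns), using stats::mahalanobis, which returns squared distances. The toy data and the use of the full-sample covariance are assumptions; a matrix like this could presumably be passed as dMat or dMat1 with dType = "User", as in the example above.

```r
# Toy covariates and treatment indicator (made up): 4 treated, 6 controls
set.seed(42)
X <- cbind(x1 = rnorm(10), x2 = rnorm(10))
z <- c(rep(1, 4), rep(0, 6))

S <- cov(X)                 # full-sample covariance (an assumption; other choices exist)
treated  <- which(z == 1)
controls <- which(z == 0)

# For each treated unit, squared Mahalanobis distance to every control;
# sapply gives controls x treated, so transpose, then take square roots
d_mahal <- sqrt(t(sapply(treated, function(i)
  mahalanobis(X[controls, , drop = FALSE], center = X[i, ], cov = S))))
dim(d_mahal)                # 4 treated rows, 6 control columns
```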

2. Numeric Diagnostics

The primary numeric diagnostic is generateRhoObj, which reports the objective values of each match together with the corresponding penalty coefficients:

generateRhoObj(resDistBal)
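Because each penalty pair can yield a different match, a natural follow-up is to keep only the matches that are not dominated on all three objectives (Pareto-efficient matches). The sketch below does this in base R on made-up objective values; the column layout is hypothetical and is not the actual output format of generateRhoObj.

```r
# Hypothetical objective values for five candidate matches (all minimized)
obj <- data.frame(
  unmatched = c(0, 5, 5, 12, 12),
  tv        = c(0.30, 0.10, 0.20, 0.05, 0.10),
  distance  = c(400, 380, 390, 360, 365)
)

m <- as.matrix(obj)
n <- nrow(m)

# Match i is dominated if another match j is no worse on every objective
# and strictly better on at least one
dominated <- vapply(seq_len(n), function(i) {
  any(vapply(seq_len(n), function(j) {
    j != i && all(m[j, ] <= m[i, ]) && any(m[j, ] < m[i, ])
  }, logical(1)))
}, logical(1))

obj[!dominated, ]   # the Pareto-efficient matches
```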

Users can call compareMatching on specified covariates to compare covariate balance across the different matches:

compareMatching(resDistBal, covList=c("age", "educ", "race", "married", "nodegree"))
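One common balance summary is the standardized mean difference. The base-R sketch below computes it for a made-up matched sample; compareMatching may report a different statistic, and conventions differ on which standard deviation goes in the denominator (this sketch uses the pooled standard deviation within the matched sample).

```r
# Toy matched sample (made up): 4 treated and 4 control units
matched <- data.frame(
  treat = c(1, 1, 1, 1, 0, 0, 0, 0),
  age   = c(25, 30, 35, 40, 24, 29, 37, 38),
  educ  = c(10, 12, 12, 14, 9, 12, 11, 14)
)

# Standardized mean difference: difference in group means divided by the
# pooled standard deviation of the two groups
smd <- function(x, treat) {
  m1 <- mean(x[treat == 1])
  m0 <- mean(x[treat == 0])
  s  <- sqrt((var(x[treat == 1]) + var(x[treat == 0])) / 2)
  (m1 - m0) / s
}

sapply(matched[c("age", "educ")], smd, treat = matched$treat)
```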

The number and percentage of matched units can be generated automatically with the helper function getUnmatched:

getUnmatched(resDistBal)

3. Graphical Diagnostics

The primary graphical diagnostic function is visualize, which can be applied to matching results from both the basic and the advanced versions:

visualize(resTwoDist, "dist1", "exclude")

Three helper functions generate graphical diagnostics for the basic version. They work only with results from the basic version, because they require the original data frame containing all covariate information.

generateTVGraph(resDistBal)
generatePairdistanceGraph(resDistBal)
generatePairdistanceBalanceGraph(resDistBal)

4. Get Matched Data

Users can call the helper function matchedData to obtain a data frame containing only the matched treated and control units, passing in the result from the main matching function and the index of the desired match:

matchedData(resDistBal, 1)

Users can then apply an outcome analysis of their choice to the matched data.
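As one possible outcome analysis, the sketch below computes a simple effect estimate for the matched treated units as the mean within-pair difference and runs a paired t-test. It assumes a hypothetical one-row-per-pair layout with treated and control outcomes side by side, filled with made-up numbers; the data frame returned by matchedData may be laid out differently and need reshaping first.

```r
# Hypothetical matched pairs (made up): one row per pair
pairs <- data.frame(
  pair      = 1:6,
  y_treated = c(5.1, 6.3, 4.8, 7.0, 5.5, 6.1),
  y_control = c(4.2, 5.9, 4.9, 6.1, 5.0, 5.2)
)

# Effect estimate: mean of the within-pair differences
diffs <- pairs$y_treated - pairs$y_control
mean(diffs)

# Paired t-test of no average within-pair difference
res <- t.test(pairs$y_treated, pairs$y_control, paired = TRUE)
res
```

Analyzing the pairs as pairs, rather than as two independent samples, uses the matched design directly.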



ShichaoHan/MultiObjMatch documentation built on May 3, 2022, 7:24 p.m.