distBalMatch: Matching tradeoffs distance, exclusion and marginal imbalance

distBalMatch    R Documentation

Matching tradeoffs distance, exclusion and marginal imbalance

Description

distBalMatch is the main matching function: it takes the input data and generates a set of possible matches. Users pass in the data, adjust the parameters, and obtain a range of matchings. If all optional arguments are left at their defaults, the function explores the tradeoffs among three objective functions: the sum of pair-wise Mahalanobis distances, the number of treated units included in the match, and the total variation distance on the pre-specified covariates, which measures marginal balance.
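As a rough illustration of the third objective, the total variation distance between the treated and control distributions of a categorical covariate can be sketched as below. This assumes the textbook definition (half the sum of absolute differences in category proportions); the package may scale or aggregate it differently. The helper name tvDistance and the toy data are hypothetical and not part of the package.

```r
# Toy data (made-up values): treatment indicator and one categorical covariate.
treat <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
race  <- c("a", "a", "b", "c", "a", "b", "b", "b", "c", "c")

# Hypothetical helper, not a package function.
# Total variation distance: half the sum of absolute differences between
# the treated and control category proportions.
tvDistance <- function(x, treat) {
  lev <- sort(unique(x))
  pT <- as.numeric(table(factor(x[treat == 1], levels = lev))) / sum(treat == 1)
  pC <- as.numeric(table(factor(x[treat == 0], levels = lev))) / sum(treat == 0)
  0.5 * sum(abs(pT - pC))
}

tvDistance(race, treat)
```

A value of 0 means the two groups have identical category proportions; a value of 1 means they share no categories.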

Usage

distBalMatch(
  df,
  treatCol,
  myBalCol,
  rhoExclude = c(1),
  rhoBalance = c(1, 2, 3),
  distMatrix = NULL,
  distList = NULL,
  exactlist = NULL,
  propensityCols = NULL,
  pScores = NULL,
  idCol = NULL,
  responseCol = NULL,
  maxUnMatched = 0.25,
  caliperOption = NULL,
  toleranceOption = 0.01,
  maxIter = 0,
  rho.max.f = 10
)

Arguments

df

data frame that contains columns indicating treatment, outcome and covariates.

treatCol

character of name of the column indicating treatment assignment.

myBalCol

character of the column name of the variable on which we want to evaluate balance; NULL by default.

rhoExclude

(optional) numeric vector of exclusion penalty values. Default value is c(1).

rhoBalance

(optional) numeric vector of marginal balance penalty values. Default value is c(1, 2, 3).

distMatrix

(optional) a matrix that specifies the pair-wise distances between any two objects; default is NULL.

distList

(optional) character vector of the names of the variables used for calculating within-pair distance; default is NULL.

exactlist

(optional) character vector of the names of the variables that we want exact matching on; NULL by default.

propensityCols

(optional) character vector of the names of the columns used to fit a propensity score model.

pScores

(optional) character of the name of the column that contains the propensity score; default is NULL.

idCol

(optional) character of the name of the column that contains the id for each unit; default is NULL.

responseCol

(optional) character of name of the column indicating the outcome variable. NULL by default.

maxUnMatched

(optional) double of the maximum proportion of unmatched units that can be accepted; default is 0.25.

caliperOption

(optional) double of the caliper value; default is NULL, which is no caliper.

toleranceOption

(optional) double of tolerance of close match distance; default is 1e-2.

maxIter

(optional) integer of the maximum number of iterations to search for (rho1, rho2) pairs that improve the matching; default is 0.

rho.max.f

(optional) double of the scaling factor used when proposing rho values; default is 10.

Details

This is the main function that users can use to obtain possible matchings. Changing the parameters can lead to matchings with varying objective function values and levels of balance on the covariates.

  • rhoExclude corresponds to the coefficient in front of the objective function of number of treated units being excluded.

  • rhoBalance corresponds to the coefficient in front of the objective function of marginal balance (by default) or the second distance measure (if optionally specified by the users).

  • Propensity scores are fitted to impose the calipers. The matching algorithm excludes units that are "far" from each other; by default, units whose propensity scores differ by more than 0.25 standard deviations are not considered for matching. Users can specify the covariates used for fitting a propensity score model through the argument propensityCols. Users can also add a column of fitted propensity score values <pScores> to the data frame and set the argument pScores accordingly.

  • toleranceOption controls the precision of the objective function values. In the matching algorithm, precision might be lost during the data abstraction. Generally, the smaller the toleranceOption value, the higher the precision. A large tolerance value might result in an undesired optimization outcome.

  • maxIter controls the number of iterations of the automatic grid search over rho values in the multi-objective optimization problem. Users can either change the maxIter parameter or add more values to the arguments rhoExclude and rhoBalance to expand the range of matchings to be explored.

  • rho.max.f caps the coefficient in front of the objective function corresponding to the marginal balance or the second distance measure: the maximal coefficient is rho.max.f times the maximal pair-wise distance.

  • distMatrix is the matrix of pair-wise distances, with size (number of treated units, number of control units). The (i,j) element of the matrix is thus the distance between the ith treated unit and the jth control unit.
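To make the expected shape of distMatrix concrete, the sketch below builds a treated-by-control Mahalanobis distance matrix on simulated data using only base R. It is an illustration under assumptions: the toy data frame and the names toy, covs and distMat are hypothetical, and the package's internal distance (for example, whether it uses squared or rank-based Mahalanobis distance) and its expected row and column ordering may differ.

```r
set.seed(1)
# Toy data (simulated): 3 treated and 5 control units, two numeric covariates.
toy <- data.frame(
  treat = c(1, 1, 1, 0, 0, 0, 0, 0),
  age   = rnorm(8, mean = 30, sd = 5),
  educ  = rnorm(8, mean = 12, sd = 2)
)
covs <- c("age", "educ")
X  <- as.matrix(toy[, covs])
S  <- cov(X)                                   # covariance across all units
Xt <- X[toy$treat == 1, , drop = FALSE]        # treated rows
Xc <- X[toy$treat == 0, , drop = FALSE]        # control rows

# Row i holds the distances from treated unit i to every control unit.
# mahalanobis() returns squared distances, hence the sqrt().
distMat <- t(apply(Xt, 1, function(xi) {
  sqrt(mahalanobis(Xc, center = xi, cov = S))
}))

dim(distMat)  # (number of treated units, number of control units)
```

A matrix of this shape is what the distMatrix argument describes; the row and column order would need to match the order of treated and control units that the function expects.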

Value

a named list whose elements are:

  • "rhoList": list of rhos for each match

  • "matchList": list of matches indexed by number

  • "treatmentCol": character of treatment variable

  • "covs": character vector of names of the variables used for calculating within-pair distance

  • "exactCovs": character vector of names of variables that we want exact or close match on

  • "idMapping": vector of row indices for each observation in the sorted data frame, for internal use

  • "stats": data frame of important statistics (total variation distance) for variable on which marginal balance is measured

  • "b.var": character of the variable on which marginal balance is measured

  • "dataTable": data frame sorted by treatment value

  • "t": a treatment vector

  • "df": the original dataframe input by the user

  • "pair_cost1": list of pair-wise distance sum using the first distance measure

  • "pair_cost2": list of pair-wise distance sum using the second distance measure

  • "version": the version of the matching function called. This functionality is primarily designed for internal use. "Basic" indicates the matching comes from distBalMatch and "Advanced" from twoDistMatch.

  • "fDist1": a vector of values for the first objective function; it corresponds to the pair-wise distance sum according to the first distance measure.

  • "fExclude": a vector of values for the second objective function; it corresponds to the number of treated units being unmatched.

  • "fDist2": a vector of values for the third objective function; it corresponds to the marginal balance distance for the specified variable(s).
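One way to read the three objective vectors side by side is to collect them into a data frame, one row per match. The sketch below uses a hypothetical stand-in list with made-up numbers rather than a real distBalMatch result, and it assumes that fDist1, fExclude and fDist2 have equal length and line up by position with the matches; that alignment is an assumption, not something this page states.

```r
# Hypothetical stand-in for a distBalMatch() result; all numbers are made up.
mockResult <- list(
  fDist1   = c(120.4, 98.7, 75.2),   # pair-wise distance sums
  fExclude = c(0, 5, 12),            # treated units left unmatched
  fDist2   = c(0.20, 0.12, 0.05)     # marginal balance distances
)

# One row per match, assuming the vectors are aligned by position.
tradeoffs <- data.frame(
  match    = seq_along(mockResult$fDist1),
  fDist1   = mockResult$fDist1,
  fExclude = mockResult$fExclude,
  fDist2   = mockResult$fDist2
)

# Order by the number of excluded treated units to read the tradeoff.
tradeoffs[order(tradeoffs$fExclude), ]
```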

See Also

Other main matching function: twoDistMatch()

Examples

## Not run: 
data("lalonde", package="cobalt")
psCols <- c("age", "educ", "married", "nodegree")
treatVal <- "treat"
responseVal <- "re78"
pairDistVal <- c("age", "married", "educ", "nodegree")
exactVal <- c("educ") 
myBalVal <- c("race")
r1s <- c(0.1, 0.3, 0.5, 0.7, 0.9, 1, 1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7)
r2s <- c(0.01)
# Arguments are passed by name so that each value reaches the intended parameter.
matchResult <- distBalMatch(
  df = lalonde, treatCol = treatVal, myBalCol = myBalVal,
  rhoExclude = r1s, rhoBalance = r2s,
  distList = pairDistVal, exactlist = exactVal,
  propensityCols = psCols, pScores = NULL, idCol = NULL,
  responseCol = responseVal, maxUnMatched = 0.1,
  caliperOption = NULL, toleranceOption = 1e-1, maxIter = 0, rho.max.f = 10
)

## End(Not run)

ShichaoHan/MultiObjMatch documentation built on May 3, 2022, 7:24 p.m.