Genetic Matching
Description
This function finds optimal balance using multivariate matching where
a genetic search algorithm determines the weight each covariate is
given. Balance is determined by examining cumulative probability
distribution functions of a variety of standardized statistics. By
default, these statistics include t-tests and Kolmogorov-Smirnov
tests. A variety of descriptive statistics based on empirical-QQ
(eQQ) plots can also be used, as can any user-provided measure of balance.
The statistics are not used to conduct formal hypothesis tests,
because no measure of balance is a monotonic function of bias and
because balance should be maximized without limit. The object
returned by GenMatch can be supplied to the Match function (via the
Weight.matrix option) to obtain causal estimates. GenMatch uses genoud
to perform the genetic search. Using the cluster option, one may
use multiple computers, CPUs or cores to perform parallel
computations.
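To make the role of the covariate weights concrete, the following base-R sketch computes a weighted Mahalanobis-type distance of the kind described in Diamond and Sekhon (2013), in which a diagonal weight matrix rescales each component of the standardized covariate difference. The function name `gen_dist` and the toy data are hypothetical, and the code illustrates the idea only; it is not the package's internal implementation.

```r
# Hypothetical helper: weighted Mahalanobis-type distance between two rows.
# S is the covariance matrix of the covariates; w holds one non-negative
# weight per covariate.
gen_dist <- function(xi, xj, S, w) {
  A <- solve(t(chol(S)))   # A satisfies t(A) %*% A == solve(S)
  z <- A %*% (xi - xj)     # standardized, decorrelated difference
  sqrt(sum(w * z^2))       # same as sqrt(t(z) %*% diag(w) %*% z)
}

set.seed(1)
X_toy <- cbind(a = rnorm(50), b = rnorm(50), c = rnorm(50))
S <- cov(X_toy)

# With all weights equal to 1 this reduces to the ordinary Mahalanobis distance.
d_equal <- gen_dist(X_toy[1, ], X_toy[2, ], S, w = c(1, 1, 1))
d_maha  <- sqrt(mahalanobis(X_toy[1, ], X_toy[2, ], S))

# Raising the first weight makes differences in that component count for more.
d_weighted <- gen_dist(X_toy[1, ], X_toy[2, ], S, w = c(10, 1, 1))
```

In this picture, the genetic search amounts to trying many candidate weight vectors `w`, matching under each resulting distance, and keeping the weights that give the best balance.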
Usage
GenMatch(Tr, X, BalanceMatrix=X, estimand="ATT", M=1, weights=NULL,
pop.size = 100, max.generations=100,
wait.generations=4, hard.generation.limit=FALSE,
starting.values=rep(1,ncol(X)),
fit.func="pvals",
MemoryMatrix=TRUE,
exact=NULL, caliper=NULL, replace=TRUE, ties=TRUE,
CommonSupport=FALSE, nboots=0, ks=TRUE, verbose=FALSE,
distance.tolerance=1e-05,
tolerance=sqrt(.Machine$double.eps),
min.weight=0, max.weight=1000,
Domains=NULL, print.level=2,
project.path=NULL,
paired=TRUE, loss=1,
data.type.integer=FALSE,
restrict=NULL,
cluster=FALSE, balance=TRUE, ...)

Arguments
Tr 
A vector indicating the observations which are in the treatment regime and those which are not. This can either be a logical vector or a real vector where 0 denotes control and 1 denotes treatment. 
X 
A matrix containing the variables we wish to match on. This matrix may contain the actual observed covariates or the propensity score or a combination of both. 
BalanceMatrix 
A matrix containing the variables we wish
to achieve balance on. This is by default equal to X.
estimand 
A character string for the estimand. The default estimand is "ATT", the sample average treatment effect for the treated. "ATE" is the sample average treatment effect, and "ATC" is the sample average treatment effect for the controls. 
M 
A scalar for the number of matches which should be
found. The default is one-to-one matching. Also see the 
weights 
A vector the same length as 
pop.size 
Population Size. This is the number of individuals

max.generations 
Maximum Generations. This is the maximum
number of generations that 
wait.generations 
If there is no improvement in the objective
function in this number of generations, optimization will stop. The
other options controlling termination are 
hard.generation.limit 
This logical variable determines if the

starting.values 
This vector's length is equal to the number of variables in 
fit.func 
The balance metric 
MemoryMatrix 
This variable controls if

exact 
A logical scalar or vector for whether exact matching
should be done. If a logical scalar is
provided, that logical value is applied to all covariates in

caliper 
A scalar or vector denoting the caliper(s) which
should be used when matching. A caliper is the distance which is
acceptable for any match. Observations which are outside of the
caliper are dropped. If a scalar caliper is provided, this caliper is
used for all covariates in 
replace 
A logical flag for whether matching should be done with
replacement. Note that if 
ties 
A logical flag for whether ties should be handled deterministically. By
default 
CommonSupport 
This logical flag implements the usual procedure
by which observations outside of the common support of a variable
(usually the propensity score) across treatment and control groups are
discarded. The 
nboots 
The number of bootstrap samples to be run for the

ks 
A logical flag for whether the univariate bootstrap
Kolmogorov-Smirnov (KS) test should be calculated. If the ks option
is set to true, the univariate KS test is calculated for all
non-dichotomous variables. The bootstrap KS test is consistent even
for non-continuous variables. By default, the bootstrap KS test is
not used. To change this see the 
verbose 
A logical flag for whether details of each
fitness evaluation should be printed. Verbose is set to FALSE if
the 
distance.tolerance 
This is a scalar which is used to determine
if distances between two observations are different from zero. Values
less than 
tolerance 
This is a scalar which is used to determine numerical tolerances. This option is used by numerical routines such as those used to determine if a matrix is singular. 
min.weight 
This is the minimum weight any variable may be given. 
max.weight 
This is the maximum weight any variable may be given. 
Domains 
This is a 
print.level 
This option controls the level of printing. There
are four possible levels: 0 (minimal printing), 1 (normal), 2
(detailed), and 3 (debug). If level 2 is selected, 
project.path 
This is the path of the

paired 
A flag for whether the paired 
loss 
The loss function to be optimized. The default value, If the value of 
data.type.integer 
By default, floating-point weights are considered. If this option is
set to 
restrict 
A matrix which restricts the possible matches. This
matrix has one row for each restriction and three
columns. The first two columns contain the two observation numbers
which are to be restricted (for example 4 and 20), and the third
column is the restriction imposed on the observation pair.
Negative numbers in the third column imply that the two observations
cannot be matched under any circumstances, and positive numbers are
passed on as the distance between the two observations for the
matching algorithm. The most commonly used positive restriction is
Exclusion restrictions are even more common. For example, if we want
to exclude the observation pair 4 and 20 and the pair 6 and 55 from
being matched, the restrict matrix would be:
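As a sketch consistent with the column layout described above (the value -1 is an illustrative choice, since any negative number in the third column marks an exclusion):

```r
# One row per restriction: observation i, observation j, restriction value.
# A negative value in the third column forbids the pair from being matched.
restrict <- rbind(c(4, 20, -1),
                  c(6, 55, -1))
```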

cluster 
This
can either be an object of the 'cluster' class returned by one of
the 
balance 
This logical flag controls if load balancing is done
across the cluster. Load balancing can result in better cluster
utilization; however, increased communication can reduce
performance. This option is best used if each individual call to

... 
Other options which are passed on to

Value
value 
The fit
values at the solution. By default, this is a vector of p-values
sorted from the smallest to the largest. There will generally be
twice as many p-values as there are variables in

par 
A vector
of the weights given to each variable in 
Weight.matrix 
A matrix whose diagonal corresponds to the
weight given to each variable in 
matches 
A matrix where the first column contains the row
numbers of the treated observations in the matched dataset. The
second column contains the row numbers of the control
observations. And the third column contains the weight that each
matched pair is given. These objects may not correspond
respectively to the 
ecaliper 
The
size of the enforced caliper on the scale of the 
Author(s)
Jasjeet S. Sekhon, UC Berkeley, sekhon@berkeley.edu, http://sekhon.berkeley.edu/.
References
Sekhon, Jasjeet S. 2011. "Multivariate and Propensity Score Matching Software with Automated Balance Optimization." Journal of Statistical Software 42(7): 1-52. http://www.jstatsoft.org/v42/i07/
Diamond, Alexis and Jasjeet S. Sekhon. 2013. "Genetic Matching for Estimating Causal Effects: A General Multivariate Matching Method for Achieving Balance in Observational Studies." Review of Economics and Statistics 95(3): 932-945. http://sekhon.berkeley.edu/papers/GenMatch.pdf
Sekhon, Jasjeet Singh and Walter R. Mebane, Jr. 1998. "Genetic Optimization Using Derivatives: Theory and Application to Nonlinear Models." Political Analysis 7: 187-210. http://sekhon.berkeley.edu/genoud/genoud.pdf
See Also
Also see Match, summary.Match, MatchBalance, genoud, balanceUV, qqstats, ks.boot, GerberGreenImai, lalonde
Examples
data(lalonde)
attach(lalonde)
#The covariates we want to match on
X = cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74)
#The covariates we want to obtain balance on
BalanceMat <- cbind(age, educ, black, hisp, married, nodegr, u74, u75, re75, re74,
I(re74*re75))
#
#Let's call GenMatch() to find the optimal weight to give each
#covariate in 'X' so that we achieve balance on the covariates in
#'BalanceMat'. This is only an example so we want GenMatch to be quick
#so the population size has been set to be only 16 via the 'pop.size'
#option. This is *WAY* too small for actual problems.
#For details see http://sekhon.berkeley.edu/papers/MatchingJSS.pdf.
#
genout <- GenMatch(Tr=treat, X=X, BalanceMatrix=BalanceMat, estimand="ATE", M=1,
pop.size=16, max.generations=10, wait.generations=1)
#The outcome variable
Y=re78/1000
#
# Now that GenMatch() has found the optimal weights, let's estimate
# our causal effect of interest using those weights
#
mout <- Match(Y=Y, Tr=treat, X=X, estimand="ATE", Weight.matrix=genout)
summary(mout)
#
#Let's determine if balance has actually been obtained on the variables of interest
#
mb <- MatchBalance(treat ~ age + educ + black + hisp + married + nodegr + u74 + u75 +
re75 + re74 + I(re74*re75),
match.out=mout, nboots=500)
# For more examples see: http://sekhon.berkeley.edu/matching/R.
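The example above relies on attach(), which places the columns of lalonde on the search path. As an alternative sketch, the same workflow can be written with explicit column selection. The helper `covariate_matrix` and the object names ending in `2` are hypothetical (not part of the package), and the GenMatch call is guarded so that it runs only when the Matching and rgenoud packages are installed. The small pop.size is again chosen for speed only.

```r
# Hypothetical helper: build a numeric matrix from named data-frame columns.
covariate_matrix <- function(df, vars) {
  stopifnot(all(vars %in% names(df)))
  as.matrix(df[, vars, drop = FALSE])
}

vars <- c("age", "educ", "black", "hisp", "married",
          "nodegr", "u74", "u75", "re75", "re74")

if (requireNamespace("Matching", quietly = TRUE) &&
    requireNamespace("rgenoud", quietly = TRUE)) {
  library(Matching)
  data(lalonde)

  X2 <- covariate_matrix(lalonde, vars)
  BalanceMat2 <- cbind(X2, re74_re75 = lalonde$re74 * lalonde$re75)

  genout2 <- GenMatch(Tr = lalonde$treat, X = X2,
                      BalanceMatrix = BalanceMat2, estimand = "ATE", M = 1,
                      pop.size = 16, max.generations = 10,
                      wait.generations = 1, print.level = 0)

  print(genout2$par)                  # weight found for each column of X2
  print(diag(genout2$Weight.matrix))  # diagonal of the weight matrix

  mout2 <- Match(Y = lalonde$re78 / 1000, Tr = lalonde$treat, X = X2,
                 estimand = "ATE", Weight.matrix = genout2)
  summary(mout2)
}
```

Selecting columns by name keeps the matching matrix and the balance matrix tied to one data frame, which avoids accidental name clashes with other objects in the workspace.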
