Structural Matching Model to correct for sample selection bias in one-sided matching markets
Description
The function provides a Gibbs sampler for a structural matching model that corrects for sample selection bias when the selection process is a one-sided matching game; that is, group/coalition formation.
The input is individual-level data of all group members from one-sided matching markets; that is, from group/coalition formation games.
In a first step, the function generates a model matrix with characteristics of all feasible groups of the same size as the observed groups in the market.
For example, in the stable roommates problem with n=4 students {1,2,3,4} sorting into groups of 2, we have "4 choose 2" = 6 feasible groups: (1,2)(3,4) (1,3)(2,4) (1,4)(2,3).
In the group formation problem with n=6 students {1,2,3,4,5,6} sorting into groups of 3, we have "6 choose 3" = 20 feasible groups. For the same students sorting into groups of sizes 2 and 4, we have "6 choose 2" + "6 choose 4" = 30 feasible groups.
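The counts quoted above can be checked in base R. The following is a minimal sketch and not part of stabit itself; the object names are chosen for illustration only.

```r
# Count feasible groups for the examples above (base R only).
n_pairs   <- choose(4, 2)                 # 4 students, groups of 2
n_triples <- choose(6, 3)                 # 6 students, groups of 3
n_mixed   <- choose(6, 2) + choose(6, 4)  # 6 students, groups of sizes 2 and 4

# Enumerate the six candidate pairs; each row is one feasible group.
pairs <- t(combn(4, 2))

# Pair each group with the group of residual students to obtain the
# three partitions (1,2)(3,4), (1,3)(2,4), (1,4)(2,3).
partitions <- cbind(pairs[1:3, , drop = FALSE], pairs[6:4, , drop = FALSE])
print(partitions)
```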
The structural model consists of a selection and an outcome equation. The Selection Equation determines which matches are observed (D=1) and which are not (D=0).
D = 1[V in Γ] with V = Wα + η
Here, V is a vector of latent valuations of all feasible matches, i.e. observed and
unobserved, and 1[.] is the Iverson bracket.
A match is observed if its match valuation is in the set of valuations Γ
that satisfy the equilibrium condition (see Klein, 2015a). This condition differs for matching
games with transferable and non-transferable utility and can be specified using the method
argument.
The match valuation V is a linear function of W, a matrix of characteristics for
all feasible groups, and η, a vector of random errors. α is a parameter
vector to be estimated.
The Outcome Equation determines the outcome for observed matches. The dependent
variable can either be continuous or binary, depending on the value of the binary
argument. In the binary case, the dependent variable R is determined by a threshold
rule for the latent variable Y.
R = 1[Y > c] with Y = Xβ + ε
Here, Y is a linear function of X, a matrix of characteristics for observed matches, and ε, a vector of random errors. β is a parameter vector to be estimated.
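As an illustration of the threshold rule, the sketch below simulates a latent outcome and derives the binary outcome from it. The covariate, the coefficient values and the threshold value are assumptions made for this example and are not taken from the package.

```r
set.seed(1)
n    <- 200
X    <- cbind(1, rnorm(n))      # intercept plus one covariate (hypothetical)
beta <- c(0.5, 1)               # assumed coefficient values
eps  <- rnorm(n)                # outcome errors
Y    <- drop(X %*% beta) + eps  # latent outcome Y = X beta + eps
c0   <- 0                       # threshold c (assumed value)
R    <- as.integer(Y > c0)      # observed binary outcome R = 1[Y > c]
table(R)
```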
The structural model imposes a linear relationship between the error terms of both equations as ε = δη + ξ, where ξ is a vector of random errors and δ is the covariance parameter to be estimated. If δ were zero, ε and η would be independent and the selection problem would vanish. That is, the observed outcomes would be a random sample from the population of interest.
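The role of δ can be seen in a small simulation. This is a sketch under the assumption that η and ξ are independent standard normal draws; the value of delta is chosen arbitrarily for illustration.

```r
set.seed(42)
n     <- 1e5
delta <- 0.5                    # assumed value for illustration
eta   <- rnorm(n)               # selection-equation errors
xi    <- rnorm(n)               # component independent of eta
epsilon <- delta * eta + xi     # outcome-equation errors

cov_hat <- cov(epsilon, eta)    # close to delta, since Var(eta) = 1
cor_xi  <- cor(xi, eta)         # close to zero
c(cov_hat = cov_hat, cor_xi = cor_xi)
```

With delta set to zero, epsilon reduces to xi and the covariance with eta vanishes.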
Usage
stabit(x, m.id = "m.id", g.id = "g.id", R = "R", selection = NULL,
outcome = NULL, simulation = "none", seed = 123, max.combs = Inf,
method = "NTU", binary = FALSE, offsetOut = 0, offsetSel = 0,
marketFE = FALSE, censored = 0, gPrior = FALSE, dropOnes = FALSE,
interOut = 0, interSel = 0, standardize = 0, niter = 10)

Arguments
x 
data frame with individual-level characteristics of all group members including market and group identifiers. 
m.id 
character string giving the name of the market identifier variable. Defaults to "m.id".
g.id 
character string giving the name of the group identifier variable. Defaults to "g.id".
R 
dependent variable in outcome equation. Defaults to "R".
selection 
list containing variables and pertaining operators in the selection equation. The format is

outcome 
list containing variables and pertaining operators in the outcome equation. The format is

simulation 
should the values of dependent variables in selection and outcome equations be simulated? Options are 
seed 
integer setting the state for random number generation if 
max.combs 
integer (divisible by two) giving the maximum number of feasible groups to be used for generating group-level characteristics. 
method 
estimation method to be used. Either 
binary 
logical: if 
offsetOut 
vector of integers indicating the indices of columns in 
offsetSel 
vector of integers indicating the indices of columns in 
marketFE 
logical: if 
censored 
draws of the 
gPrior 
logical: if 
dropOnes 
logical: if 
interOut 
two-column matrix indicating the indices of columns in 
interSel 
two-column matrix indicating the indices of columns in 
standardize 
numeric: if 
niter 
number of iterations to use for the Gibbs sampler. 
Details
Operators for variable transformations in the selection and outcome arguments.
add
sum over all group members and divide by group size.
int
sum over all possible two-way interactions x*y of group members and divide by the number of those, given by choose(n,2).
ieq
sum over all possible two-way equality assertions 1[x=y] and divide by the number of those.
ive
sum over all possible two-way interactions of vectors of variables of group members and divide by the number of those.
inv
...
dst
sum over all possible two-way distances between players and divide by the number of those, where distance is defined as exp(xy).
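The add, int and ieq operators can be mimicked in base R for a single variable and a single group. This is a sketch of the descriptions above, not the package's internal code; the vector x and the op_* names are hypothetical.

```r
# Values of one variable for the four members of a hypothetical group.
x <- c(1, 2, 2, 3)

# All two-way combinations of group members: a 2 x choose(4, 2) matrix.
pairs <- combn(x, 2)

op_add <- sum(x) / length(x)                          # group mean
op_int <- sum(pairs[1, ] * pairs[2, ]) / ncol(pairs)  # mean pairwise product
op_ieq <- sum(pairs[1, ] == pairs[2, ]) / ncol(pairs) # share of equal pairs

c(add = op_add, int = op_int, ieq = op_ieq)
```

Here ncol(pairs) equals choose(length(x), 2), the number of two-way combinations used as the divisor.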
Values of model.list
D
vector that indicates, for all feasible groups in the market, whether a group is observed in the data (D=1) or not (D=0).
R
list of group-level outcome vectors for equilibrium groups.
W
list with data.frame W[[t]][G,] containing characteristics of group G in market t (all feasible groups).
X
list with data.frame X[[t]][G,] containing characteristics of group G in market t (equilibrium groups only).
V
vector of group valuations for all feasible groups in the market.
P
vector that gives for each group the index of the group comprised of residual individuals in the market (for 2-group markets).
epsilon
if simulation!="none", the errors in the outcome equation, given by delta*eta + xi.
eta
if simulation!="none", the standard normally distributed errors in the selection equation.
xi
if simulation!="none", the standard normally distributed component of the errors in the outcome equation that is independent of eta.
combs
partitions matrix that gives all feasible partitions of the market into groups of the observed sizes.
E
matrix that gives the indices of equilibrium group members for each group in the market. Only differs from the first two rows in combs if simulation!="none".
sigmasquareximean
variance estimate of the error term xi in the outcome equation.
Values of model.frame
SEL
data frame comprising the variables in the selection equation, with number of observations equal to the number of feasible groups.
OUT
data frame comprising the variables in the outcome equation, with number of observations equal to the number of equilibrium groups.
Values of draws
alphadraws
matrix of dimension ncol(W) x niter comprising all parameter draws for the selection equation.
betadraws
matrix of dimension ncol(X) x niter comprising all parameter draws for the outcome equation.
deltadraws
vector of length niter comprising all draws for the delta parameter.
sigmasquarexidraws
.
Values of coefs
eta
vector containing the mean of all eta draws for each observed group.
alphavcov
variance-covariance matrix of draws in alphadraws.
betavcov
variance-covariance matrix of draws in betadraws.
alpha
matrix comprising the coefficient estimates of alpha and their standard errors.
beta
matrix comprising the coefficient estimates of beta and their standard errors.
delta
coefficient estimate of delta and its standard error.
sigmasquarexi
variance estimate of the error term xi in the outcome equation and its standard error.
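The relation between a draws matrix and the summaries listed above can be sketched as follows. The matrix here is simulated as a stand-in with one row per parameter and one column per iteration, and discarding the first half of the iterations as burn-in is an assumption made for this example; whether stabit applies the same rule is not stated in this page.

```r
set.seed(7)
niter <- 1000

# Stand-in for a draws matrix (parameters in rows, iterations in columns).
draws <- rbind(rnorm(niter, mean = 1,    sd = 0.1),
               rnorm(niter, mean = -0.5, sd = 0.2))

keep <- (niter / 2 + 1):niter          # assumed burn-in: drop the first half
est  <- rowMeans(draws[, keep])        # coefficient estimates
vcov_draws <- cov(t(draws[, keep]))    # variance-covariance matrix of draws
se   <- sqrt(diag(vcov_draws))         # standard errors

cbind(estimate = est, std.error = se)
```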
Author(s)
Thilo Klein
References
Klein, T. (2015a). Does Anti-Diversification Pay? A One-Sided Matching Model of Microcredit. Cambridge Working Papers in Economics, #1521.
Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions, volume 6, pages 233–243. North-Holland, Amsterdam.
Examples
## Not run:
## --- SIMULATED EXAMPLE ---
## 1. Simulate one-sided matching data for 1,000 markets (m=1000) with 2 groups
## per market (gpm=2) and 5 individuals per group (ind=5)
## 1a. Simulate individual-level, independent variables
idata <- stabsim(m=1000, ind=5, seed=123, gpm=2)
head(idata)
## 1b. Simulate group-level variables (takes a minute to complete...)
mdata <- stabit(x=idata, simulation="NTU", method="model.frame",
                selection = list(ieq="wst"),
                outcome = list(ieq="wst"))$model.frame
head(mdata$OUT)
head(mdata$SEL)
## 2. Bias from sorting
## 2a. Naive OLS estimation
lm(R ~ wst.ieq, data=mdata$OUT)$coefficients
## 2b. epsilon is correlated with independent variables
with(mdata$OUT, cor(epsilon, wst.ieq))
## 2c. but xi is uncorrelated with independent variables
with(mdata$OUT, cor(xi, wst.ieq))
## 3. Correction of sorting bias when valuations V are observed
## 3a. 1st stage: obtain fitted value for eta
lm.sel <- lm(V ~ 1 + wst.ieq, data=mdata$SEL)
lm.sel$coefficients
eta <- lm.sel$resid[mdata$SEL$D==1]
## 3b. 2nd stage: control for eta
lm(R ~ wst.ieq + eta, data=mdata$OUT)$coefficients
## 4. Run Gibbs sampler
fit1 <- stabit(x=idata, selection = list(ieq="wst"),
               outcome = list(ieq="wst"), method="NTU",
               simulation="NTU", niter=2000)
## 5. Coefficient table
summary(fit1)
## --- REPLICATION, Klein (2015a) ---
## 1. Load data
data(baac00); head(baac00)
## 2. Run Gibbs sampler
klein15a <- stabit(x=baac00, selection = list(inv="pi",ieq="wst"),
                   outcome = list(add="pi",inv="pi",ieq="wst",
                   add=c("loan_size","loan_size2","lngroup_agei")), offsetOut=1,
                   method="NTU", binary=TRUE, gPrior=TRUE, marketFE=TRUE, niter=800000)
## 3. Marginal effects
summary(klein15a, mfx=TRUE)
## End(Not run)
