Structural Matching Model to correct for sample selection bias in one-sided matching markets

Description

The function provides a Gibbs sampler for a structural matching model that corrects for sample selection bias when the selection process is a one-sided matching game; that is, group/coalition formation.

The input is individual-level data of all group members from one-sided matching markets; that is, from group/coalition formation games.

In a first step, the function generates a model matrix with characteristics of all feasible groups of the same size as the observed groups in the market.

For example, in the stable roommates problem with n=4 students {1,2,3,4} sorting into groups of 2, we have "4 choose 2" = 6 feasible groups: (1,2)(3,4) (1,3)(2,4) (1,4)(2,3).

In the group formation problem with n=6 students {1,2,3,4,5,6} sorting into groups of 3, we have "6 choose 3" = 20 feasible groups. For the same students sorting into groups of sizes 2 and 4, we have "6 choose 2" + "6 choose 4" = 30 feasible groups.
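
The counts in these examples can be checked with base R. This is only an illustrative sketch using choose() and combn(); it is not a description of how stabit builds its model matrix internally.

```r
# Roommates example: 4 students sorting into groups of 2
n_pairs <- choose(4, 2)   # number of feasible groups
pairs   <- combn(4, 2)    # one feasible group per column
print(pairs)

# 6 students sorting into groups of 3
n_triples <- choose(6, 3)

# 6 students sorting into one group of 2 and one group of 4
n_mixed <- choose(6, 2) + choose(6, 4)

c(pairs = n_pairs, triples = n_triples, mixed = n_mixed)
```

Because the number of feasible groups grows quickly with market size, the max.combs argument can be used to cap it.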

The structural model consists of a selection and an outcome equation. The Selection Equation determines which matches are observed (D=1) and which are not (D=0).

D = 1[V in Γ] with V = Wα + η

Here, V is a vector of latent valuations of all feasible matches, i.e., observed and unobserved, and 1[.] is the Iverson bracket. A match is observed if its match valuation is in the set of valuations Γ that satisfy the equilibrium condition (see Klein, 2015a). This condition differs for matching games with transferable and non-transferable utility and can be specified using the method argument. The match valuation V is a linear function of W, a matrix of characteristics for all feasible groups, and η, a vector of random errors. α is a parameter vector to be estimated.

The Outcome Equation determines the outcome for observed matches. The dependent variable can be either continuous or binary, depending on the value of the binary argument. In the binary case, the dependent variable R is determined by a threshold rule for the latent variable Y.

R = 1[Y > c] with Y = Xβ + ε

Here, Y is a linear function of X, a matrix of characteristics for observed matches, and ε, a vector of random errors. β is a parameter vector to be estimated.
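
As a minimal sketch of the threshold rule in the binary case, the following base R code generates R from a latent Y. The design matrix, the coefficient values in beta, and the threshold c0 = 0 are hypothetical choices made for illustration only.

```r
set.seed(1)
n    <- 200
X    <- cbind(x1 = rnorm(n), x2 = rnorm(n))  # hypothetical match characteristics
beta <- c(1, -0.5)                           # hypothetical coefficients
c0   <- 0                                    # assumed threshold c

eps <- rnorm(n)                      # outcome-equation errors
Y   <- as.vector(X %*% beta + eps)   # latent outcome Y = X beta + epsilon
R   <- as.integer(Y > c0)            # observed binary outcome R = 1[Y > c]

table(R)
```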

The structural model imposes a linear relationship between the error terms of both equations as ε = δη + ξ, where ξ is a vector of random errors and δ is the covariance parameter to be estimated. If δ were zero, ε and η would be independent and the selection problem would vanish. That is, the observed outcomes would be a random sample from the population of interest.
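
The role of δ can be illustrated with a small simulation. The sketch below assumes standard normal η and ξ and δ = 0.5, which mirrors the simulation settings listed under the simulation argument; the sample size is arbitrary.

```r
set.seed(123)
n     <- 1e5
delta <- 0.5

eta <- rnorm(n)               # selection-equation errors
xi  <- rnorm(n)               # component of the outcome error independent of eta
epsilon <- delta * eta + xi   # outcome-equation errors

# With delta != 0 the two error terms are correlated;
# in theory cor(epsilon, eta) = delta / sqrt(delta^2 + 1)
cor_sel <- cor(epsilon, eta)

# Regressing epsilon on eta recovers delta
delta_hat <- unname(coef(lm(epsilon ~ eta))["eta"])

# With delta = 0 the correlation vanishes
epsilon0 <- 0 * eta + xi
cor_null <- cor(epsilon0, eta)

c(cor_sel = cor_sel, delta_hat = delta_hat, cor_null = cor_null)
```

This is the reason a control for η removes the sorting bias in the outcome equation, as in the two-stage illustration in the Examples section.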

Usage

stabit(x, m.id = "m.id", g.id = "g.id", R = "R", selection = NULL,
  outcome = NULL, simulation = "none", seed = 123, max.combs = Inf,
  method = "NTU", binary = FALSE, offsetOut = 0, offsetSel = 0,
  marketFE = FALSE, censored = 0, gPrior = FALSE, dropOnes = FALSE,
  interOut = 0, interSel = 0, standardize = 0, niter = 10)

Arguments

x

data frame with individual-level characteristics of all group members including market- and group-identifiers.

m.id

character string giving the name of the market identifier variable. Defaults to "m.id".

g.id

character string giving the name of the group identifier variable. Defaults to "g.id".

R

dependent variable in outcome equation. Defaults to "R".

selection

list containing the variables in the selection equation and the operators applied to them. The format is operation = "variable". See the Details and Examples sections.

outcome

list containing the variables in the outcome equation and the operators applied to them. The format is operation = "variable". See the Details and Examples sections.

simulation

should the values of dependent variables in selection and outcome equations be simulated? Options are "none" for no simulation, "NTU" for non-transferable utility matching, "TU" for transferable utility or "random" for random matching of individuals to groups. Simulation settings are (i) all model coefficients set to alpha=beta=1; (ii) covariance between error terms delta=0.5; (iii) error terms eta and xi are draws from a standard normal distribution.

seed

integer setting the state for random number generation if simulation!="none".

max.combs

integer (divisible by two) giving the maximum number of feasible groups to be used for generating group-level characteristics.

method

estimation method to be used. Either "NTU" or "TU" for selection correction using non-transferable or transferable utility matching as selection rule; "outcome" for estimation of the outcome equation only; or "model.frame" for no estimation.

binary

logical: if TRUE outcome variable is taken to be binary; if FALSE outcome variable is taken to be continuous.

offsetOut

vector of integers indicating the indices of columns in X for which coefficients should be forced to 1. Use 0 for none.

offsetSel

vector of integers indicating the indices of columns in W for which coefficients should be forced to 1. Use 0 for none.

marketFE

logical: if TRUE market-level fixed effects are used in outcome equation; if FALSE no market fixed effects are used.

censored

integer indicating whether draws of the delta parameter, which estimates the covariation between the error terms in the selection and outcome equations, are censored. Options are 0 for not censored, 1 for censored from below, and 2 for censored from above.

gPrior

logical: if TRUE the g-prior (Zellner, 1986) is used for the variance-covariance matrix.

dropOnes

logical: if TRUE one-group-markets are excluded from estimation.

interOut

two-column matrix indicating the indices of columns in X that should be interacted in estimation. Use 0 for none.

interSel

two-column matrix indicating the indices of columns in W that should be interacted in estimation. Use 0 for none.

standardize

numeric: if standardize>0 the independent variables will be standardized by dividing by standardize times their standard deviation. Defaults to no standardization (standardize=0).

niter

number of iterations to use for the Gibbs sampler.
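
To make the standardize and interOut/interSel descriptions concrete, here is a rough base R sketch on a hypothetical data frame. The scaling follows the description above (division by standardize times the standard deviation, with no centering). Reading an interaction as the element-wise product of the two indexed columns is an assumption, not something the argument description confirms.

```r
set.seed(42)
X <- data.frame(a = rnorm(50, sd = 3), b = rnorm(50, sd = 10))  # hypothetical data

# standardize = 2: divide each column by 2 * sd(column)
standardize <- 2
X_std <- as.data.frame(lapply(X, function(col) col / (standardize * sd(col))))
sapply(X_std, sd)   # each column now has standard deviation 1 / standardize

# interOut-style index matrix: each row names a pair of columns to interact
interOut <- matrix(c(1, 2), ncol = 2)
# ASSUMPTION: the interaction is the element-wise product of the two columns
X_int <- X[[interOut[1, 1]]] * X[[interOut[1, 2]]]
head(X_int)
```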

Details

Operators for variable transformations in selection and outcome arguments.

add

sum over all group members and divide by group size.

int

sum over all possible two-way interactions x*y of group members and divide by the number of those, given by choose(n,2).

ieq

sum over all possible two-way equality assertions 1[x=y] and divide by the number of those.

ive

sum over all possible two-way interactions of vectors of variables of group members and divide by number of those.

inv

...

dst

sum over all possible two-way distances between players and divide by number of those, where distance is defined as exp(-|x-y|).
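
The add, int, ieq and dst operators can be sketched in base R for a single variable x observed for the members of one group. This is an illustration written from the descriptions above, not the package's own code. The inv operator is left out because it is not described here, and ive is omitted for brevity.

```r
x   <- c(1, 1, 3)              # hypothetical values for a group of 3 members
idx <- combn(length(x), 2)     # all choose(n, 2) pairs of members
a   <- x[idx[1, ]]             # first member of each pair
b   <- x[idx[2, ]]             # second member of each pair

ops <- c(
  add = mean(x),                   # sum over members / group size
  int = mean(a * b),               # mean of two-way interactions x * y
  ieq = mean(a == b),              # mean of equality assertions 1[x = y]
  dst = mean(exp(-abs(a - b)))     # mean of distances exp(-|x - y|)
)
ops
```

In a call such as selection = list(ieq="wst"), the operator name is the list element's name and the variable name is its value.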

Values of model.list

D

vector that indicates – for all feasible groups in the market – whether a group is observed in the data D=1 or not D=0.

R

list of group-level outcome vectors for equilibrium groups.

W

list with data.frame W[[t]][G,] containing characteristics of group G in market t (all feasible groups).

X

list with data.frame X[[t]][G,] containing characteristics of group G in market t (equilibrium groups only).

V

vector of group valuations for all feasible groups in the market.

P

vector that gives for each group the index of the group comprised of residual individuals in the market (for 2-group markets).

epsilon

if simulation!="none", the errors in the outcome equation, given by delta*eta + xi.

eta

if simulation!="none", the standard normally distributed errors in the selection equation.

xi

if simulation!="none", the standard normally distributed component of the errors in the outcome equation that is independent of eta.

combs

partitions matrix that gives all feasible partitions of the market into groups of the observed sizes.

E

matrix that gives the indices of equilibrium group members for each group in the market. Only differs from the first two rows in combs if simulation!="none".

sigmasquareximean

variance estimate of the error term xi in the outcome equation.
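
For intuition about D and P, the following base R sketch builds both for a hypothetical market of 4 individuals in 2 groups of 2. The ordering of feasible groups follows combn() and the observed partition is made up. Neither is claimed to match the ordering or data that stabit uses internally.

```r
groups   <- combn(4, 2)               # feasible groups, one per column
observed <- list(c(1, 2), c(3, 4))    # hypothetical observed partition

# D: 1 if a feasible group is one of the observed groups, 0 otherwise
D <- as.integer(apply(groups, 2, function(g)
  any(sapply(observed, function(o) setequal(g, o)))))

# P: for each group, the index of the group formed by the residual individuals
P <- apply(groups, 2, function(g) {
  rest <- setdiff(1:4, g)
  which(apply(groups, 2, function(h) setequal(h, rest)))
})

rbind(groups, D = D, P = P)
```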

Values of model.frame

SEL

data frame comprising variables in selection equation and number of observations equal to the number of feasible groups.

OUT

data frame comprising variables in outcome equation and number of observations equal to the number of equilibrium groups.

Values of draws

alphadraws

matrix of dimension ncol(W) x niter comprising all parameter draws for the selection equation.

betadraws

matrix of dimension ncol(X) x niter comprising all parameter draws for the outcome equation.

deltadraws

vector of length niter comprising all draws for the delta parameter.

sigmasquarexidraws

.

Values of coefs

eta

vector containing the mean of all eta draws for each observed group.

alphavcov

variance-covariance matrix of draws in alphadraws.

betavcov

variance-covariance matrix of draws in betadraws.

alpha

matrix comprising the coefficient estimates of alpha and their standard errors.

beta

matrix comprising the coefficient estimates of beta and their standard errors.

delta

coefficient estimate of delta and its standard error.

sigmasquarexi

variance estimate of the error term xi in the outcome equation and its standard error.
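
The relationship between a draws matrix and the coefficient summaries can be sketched as follows. The draws here are simulated stand-ins with hypothetical row names, not sampler output. The descriptions above do not say whether an initial burn-in portion is discarded before summarising, so this sketch simply uses all draws.

```r
set.seed(7)
niter <- 1000
# Stand-in for alphadraws: one row per parameter, one column per iteration
alphadraws <- rbind(wst.ieq = rnorm(niter, mean = 1,    sd = 0.1),
                    pi.inv  = rnorm(niter, mean = -0.5, sd = 0.2))

alpha_est <- rowMeans(alphadraws)       # point estimates: mean of the draws
alpha_se  <- apply(alphadraws, 1, sd)   # standard errors: sd of the draws
alphavcov <- cov(t(alphadraws))         # variance-covariance matrix of the draws

alpha <- cbind(coef = alpha_est, se = alpha_se)
alpha
```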

Author(s)

Thilo Klein

References

Klein, T. (2015a). Does Anti-Diversification Pay? A One-Sided Matching Model of Microcredit. Cambridge Working Papers in Economics, #1521.

Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions, volume 6, pages 233–243. North-Holland, Amsterdam.

Examples

## Not run: 
## --- SIMULATED EXAMPLE ---

## 1. Simulate one-sided matching data for 1,000 markets (m=1000) with 2 groups
##    per market (gpm=2) and 5 individuals per group (ind=5)

## 1-a. Simulate individual-level, independent variables
 idata <- stabsim(m=1000, ind=5, seed=123, gpm=2)
 head(idata)
 
## 1-b. Simulate group-level variables (takes a minute to complete...)
 mdata <- stabit(x=idata, simulation="NTU", method="model.frame",
                 selection = list(ieq="wst"),
                 outcome = list(ieq="wst"))$model.frame
 head(mdata$OUT)
 head(mdata$SEL)


## 2. Bias from sorting

## 2-a. Naive OLS estimation
 lm(R ~ wst.ieq, data=mdata$OUT)$coefficients

## 2-b. epsilon is correlated with independent variables
 with(mdata$OUT, cor(epsilon, wst.ieq))
 
## 2-c. but xi is uncorrelated with independent variables
 with(mdata$OUT, cor(xi, wst.ieq))

## 3. Correction of sorting bias when valuations V are observed

## 3-a. 1st stage: obtain fitted value for eta
 lm.sel <- lm(V ~ -1 + wst.ieq, data=mdata$SEL)
 lm.sel$coefficients

## keep the residuals of observed groups only (D==1)
 eta <- residuals(lm.sel)[mdata$SEL$D==1]

## 3-b. 2nd stage: control for eta
 lm(R ~ wst.ieq + eta, data=mdata$OUT)$coefficients


## 4. Run Gibbs sampler
 fit1 <- stabit(x=idata, selection = list(ieq="wst"), 
        outcome = list(ieq="wst"), method="NTU", 
        simulation="NTU", niter=2000)


## 5. Coefficient table
 summary(fit1)


## --- REPLICATION, Klein (2015a) ---

## 1. Load data 
 data(baac00); head(baac00)
 
## 2. Run Gibbs sampler
 klein15a <- stabit(x=baac00, selection = list(inv="pi",ieq="wst"), 
        outcome = list(add="pi",inv="pi",ieq="wst",
        add=c("loan_size","loan_size2","lngroup_agei")), offsetOut=1,
        method="NTU", binary=TRUE, gPrior=TRUE, marketFE=TRUE, niter=800000)

## 3. Marginal effects
 summary(klein15a, mfx=TRUE)

## End(Not run)