# Structural Matching Model to correct for sample selection bias in two-sided matching markets

### Description

The function provides a Gibbs sampler for a structural matching model that corrects for sample selection bias when the selection process is a two-sided matching game, i.e., a matching of students to colleges.

The structural model consists of a selection and an outcome equation. The *Selection Equation*
determines which matches are observed (*D=1*) and which are not (*D=0*).

* D = 1[V ∈ Γ] with V = Wβ + η

Here, *V* is a vector of latent valuations of *all feasible* matches, i.e., observed and
unobserved, and *1[.]* is the Iverson bracket.
A match is observed if its match valuation is in the set of valuations *Γ*
that satisfy the equilibrium condition (see Sorensen, 2007).
The match valuation *V* is a linear function of *W*, a matrix of characteristics for
*all feasible* matches, and *η*, a vector of random errors. *β* is a parameter
vector to be estimated.
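To make "all feasible matches" concrete, here is a minimal sketch; it is not the package's implementation. It assumes a single market with three students and two colleges, one characteristic per side (`s1` and `c1`, mirroring the names used in the Examples), standard normal errors, and *β* = 1. All of these values are illustrative assumptions. The sketch computes valuations only; it does not determine the set *Γ*, so it does not produce the indicator *D*.

```r
set.seed(3)

# Hypothetical market: 3 students and 2 colleges, one characteristic each
students <- data.frame(s.id = 1:3, s1 = rnorm(3))
colleges <- data.frame(c.id = 1:2, c1 = rnorm(2))

# All feasible matches = every student-college pair (Cartesian product)
pairs <- merge(colleges, students, by = NULL)

# Selection equation V = W * beta + eta, with W the interaction c1:s1
beta    <- 1                      # assumed value, for illustration only
W       <- with(pairs, c1 * s1)
eta     <- rnorm(nrow(pairs))     # assumed standard normal errors
pairs$V <- W * beta + eta

pairs
```

Each row of `pairs` is one feasible match, whether or not it would be observed in equilibrium.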

The *Outcome Equation* determines the outcome for *observed* matches. The dependent
variable can be either continuous or binary, depending on the value of the `binary`
argument. In the binary case, the dependent variable *R* is determined by a threshold
rule for the latent variable *Y*.

* R = 1[Y > c] with Y = Xα + ε

Here, *Y* is a linear function of *X*, a matrix of characteristics for *observed*
matches, and *ε*, a vector of random errors. *α* is a parameter vector to
be estimated.
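The threshold rule can be sketched in a few lines. This is an illustration only: the design matrix, the values of *α*, the threshold `c0` (set to 0 here), and the standard normal errors are assumptions and are not taken from the package.

```r
set.seed(2)

n     <- 8
X     <- cbind(intercept = 1, x1 = rnorm(n))  # hypothetical characteristics of observed matches
alpha <- c(0.5, 1)                            # assumed parameter values
eps   <- rnorm(n)                             # assumed standard normal errors

# Latent outcome Y = X alpha + eps
Y <- drop(X %*% alpha + eps)

# Binary outcome R = 1[Y > c], with an assumed threshold of zero
c0 <- 0
R  <- as.integer(Y > c0)

data.frame(Y = round(Y, 2), R = R)
```

In the continuous case the outcome is simply *Y* itself, with no thresholding step.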

The structural model imposes a linear relationship between the error terms of both equations
as *ε = κη + ν*, where *ν* is a vector of random errors and *κ*
is the covariance parameter to be estimated. If *κ* were zero, *ε* and *η*
would be independent and the selection problem would vanish.
That is, the observed outcomes would be a random sample from the population of interest.
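The role of *κ* can be checked numerically. The sketch below is a simulation under stated assumptions, not part of the package: it draws *η* and *ν* as independent standard normal variables, fixes a hypothetical *κ* = 0.5, and verifies that regressing *ε* on *η* recovers *κ*. This is the same logic that motivates adding the selection-equation residual as a control in step 2-c of the Examples.

```r
set.seed(1)

n     <- 10000
kappa <- 0.5                 # hypothetical value, for illustration only
eta   <- rnorm(n)            # selection-equation error (assumed standard normal)
nu    <- rnorm(n)            # independent noise (assumed standard normal)

# Error link imposed by the structural model
eps <- kappa * eta + nu

# The covariance of eps and eta is kappa * var(eta), so the
# no-intercept regression slope of eps on eta estimates kappa
kappa_hat <- unname(coef(lm(eps ~ -1 + eta)))
kappa_hat

# With kappa = 0 the two errors are unrelated
cor(nu, eta)
```

Because *E[ε | η] = κη*, conditioning on *η* (or an estimate of it) removes the part of the outcome error that is correlated with selection.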

### Usage


### Arguments

| Argument | Description |
| --- | --- |
| `OUT` | data frame with characteristics of all observed matches, including market identifier |
| `SEL` | optional: data frame with characteristics of all observed and unobserved matches, including market identifier |
| `colleges` | character vector of variable names for college characteristics. These variables carry the same value for any college. |
| `students` | character vector of variable names for student characteristics. These variables carry the same value for any student. |
| `outcome` | formula for match outcomes. |
| `selection` | formula for match valuations. |
| `binary` | logical: if |
| `niter` | number of iterations to use for the Gibbs sampler. |
| `gPrior` | logical: if |
| `censored` | draws of the |
| `thin` | integer indicating the level of thinning in the MCMC draws. The default |
| `...` | . |
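As a rough illustration of what thinning means for stored MCMC draws, the sketch below keeps every `thin`-th element of a chain. The chain is a stand-in vector rather than output from the sampler, and which draw within each block the package retains is not stated here, so the indexing is an assumption.

```r
# Stand-in for 1000 stored draws of a single parameter
draws <- seq_len(1000)

# Hypothetical thinning level: keep every 10th draw
thin <- 10
kept <- draws[seq(from = thin, to = length(draws), by = thin)]

length(kept)
head(kept)
```

Thinning reduces storage and the autocorrelation between retained draws, at the cost of discarding information from the intermediate iterations.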

### Author(s)

Thilo Klein

### References

Sorensen, M. (2007). How Smart is Smart Money? A Two-Sided Matching Model of Venture Capital.
*Journal of Finance*, 62 (6): 2725-2762.

### Examples

```
## --- SIMULATED EXAMPLE ---
## Not run:

## 1. Simulate two-sided matching data for 20 markets (m=20) with 100 students
##    (nStudents=100) per market and 20 colleges with quotas of 5 students each
##    (nSlots=rep(5,20)). True parameters in selection and outcome equations are
##    all equal to 1.
library(matchingMarkets)

xdata <- stabsim2(m=20, nStudents=100, nSlots=rep(5,20),
  colleges = "c1",
  students = "s1",
  outcome = ~ c1:s1 + eta + nu,
  selection = ~ -1 + c1:s1 + eta
)
head(xdata$OUT)

## 2. Correction for sorting bias when match valuations V are observed

## 2-a. Bias from sorting
lm1 <- lm(y ~ c1:s1, data=xdata$OUT)
summary(lm1)

## 2-b. Cause of the bias
with(xdata$OUT, cor(c1*s1, eta))

## 2-c. Correction for sorting bias
lm2a <- lm(V ~ -1 + c1:s1, data=xdata$SEL); summary(lm2a)
etahat <- lm2a$residuals[xdata$SEL$D==1]
lm2b <- lm(y ~ c1:s1 + etahat, data=xdata$OUT)
summary(lm2b)

## 3. Correction for sorting bias when match valuations V are unobserved

## 3-a. Run Gibbs sampler (when SEL is given)
fit2 <- stabit2(OUT = xdata$OUT,
  SEL = xdata$SEL,
  outcome = y ~ c1:s1,
  selection = ~ -1 + c1:s1,
  niter=1000
)

## 3-b. Alternatively: Run Gibbs sampler (when SEL is not given)
fit2 <- stabit2(OUT = xdata$OUT,
  colleges = "c1",
  students = "s1",
  outcome = y ~ c1:s1,
  selection = ~ -1 + c1:s1,
  niter=1000
)

## 4. Implemented methods

## 4-a. Get coefficients
fit2

## 4-b. Coefficient table
summary(fit2)

## 4-c. Get marginal effects
summary(fit2, mfx=TRUE)

## 4-d. Also try the following functions
coef(fit2)
fitted(fit2)
residuals(fit2)
predict(fit2, newdata=NULL)

## 5. Plot MCMC draws for coefficients in outcome equation
## (alphadraws is transposed so that each row is one iteration)
res <- as.data.frame(t(fit2$draws$alphadraws))
res$iteration <- 1:nrow(res)

library(tidyr)
res.long <- gather(res, condition, measurement, 1:(ncol(res)-1))

library(lattice)
lattice.options(default.args=list(as.table=TRUE),
  default.theme=standard.theme(color=FALSE))
xyplot(measurement ~ iteration | factor(condition),
  data = res.long, scales=list(relation="free"),
  xlab = "iterations",
  ylab = "parameter draws", type = "l")

## End(Not run)
```
