# stabit: Matching model and selection correction for group formation

In matchingMarkets: Analysis of Stable Matchings

## Description

The function provides a Gibbs sampler for a structural matching model that estimates preferences and corrects for sample selection bias when the selection process is a one-sided matching game; that is, group/coalition formation.

The input is individual-level data of all group members from one-sided matching markets; that is, from group/coalition formation games.

In a first step, the function generates a model matrix with characteristics of all feasible groups of the same size as the observed groups in the market.

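For example, in the stable roommates problem with n=4 students {1,2,3,4} sorting into groups of 2, we have choose(4,2) = 6 feasible groups: (1,2), (3,4), (1,3), (2,4), (1,4) and (2,3). These six groups combine into three possible matchings: (1,2)(3,4), (1,3)(2,4) and (1,4)(2,3).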

In the group formation problem with n=6 students {1,2,3,4,5,6} sorting into groups of 3, we have choose(6,3) = 20 feasible groups. For the same students sorting into groups of sizes 2 and 4, we have choose(6,2) + choose(6,4) = 30 feasible groups.
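The counts above can be checked with base R. The following is a minimal sketch using only `choose()` and `combn()`; the variable names are illustrative and are not part of the package.

```r
# Count feasible groups with base R's choose()
n_roommates <- choose(4, 2)                 # groups of 2 from 4 students
n_triples   <- choose(6, 3)                 # groups of 3 from 6 students
n_mixed     <- choose(6, 2) + choose(6, 4)  # group sizes 2 and 4 from 6 students

# Enumerate the feasible groups of size 2 among students 1..4;
# each column of the combn() result is one feasible group
groups <- combn(4, 2)
group_labels <- apply(groups, 2, function(g) {
  paste0("(", paste(g, collapse = ","), ")")
})
print(group_labels)
```

The number of feasible groups grows quickly with market size, which is why the `max.combs` argument allows capping the number of groups used.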

The structural model consists of a selection and an outcome equation. The Selection Equation determines which matches are observed (D=1) and which are not (D=0).

D = 1[V in Γ] with V = Wα + η

Here, V is a vector of latent valuations of all feasible matches, i.e., observed and unobserved, and 1[.] is the Iverson bracket. A match is observed if its match valuation is in the set of valuations Γ that satisfy the equilibrium condition (see Klein, 2015a). This condition differs for matching games with transferable and non-transferable utility and can be specified using the `method` argument. The match valuation V is a linear function of W, a matrix of characteristics for all feasible groups, and η, a vector of random errors. α is a parameter vector to be estimated.

The Outcome Equation determines the outcome for observed matches. The dependent variable can be either continuous or binary, depending on the value of the `binary` argument. In the binary case, the dependent variable R is determined by a threshold rule for the latent variable Y.

R = 1[Y > c] with Y = Xβ + ε

Here, Y is a linear function of X, a matrix of characteristics for observed matches, and ε, a vector of random errors. β is a parameter vector to be estimated.

The structural model imposes a linear relationship between the error terms of both equations as ε = δη + ξ, where ξ is a vector of random errors and δ is the covariance parameter to be estimated. If δ were zero, ε and η would be independent and the selection problem would vanish. That is, the observed outcomes would be a random sample from the population of interest.
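To illustrate the error structure, here is a minimal simulation sketch in base R. It assumes η and ξ are independent standard normal draws and δ = 0.5, as in the simulation settings described under Arguments; the sample size, the single covariate `x`, the coefficient `beta` and the threshold `c0` are hypothetical choices. The sketch draws from the unselected population and does not implement the matching step.

```r
set.seed(123)
n     <- 10000   # number of simulated matches (illustrative)
delta <- 0.5     # assumed covariance parameter
beta  <- 1       # assumed outcome coefficient
c0    <- 0       # assumed threshold c for the binary outcome

eta <- rnorm(n)            # errors of the selection equation
xi  <- rnorm(n)            # independent noise in the outcome equation
eps <- delta * eta + xi    # outcome errors, linked to eta through delta

x <- rnorm(n)              # one hypothetical match characteristic
Y <- x * beta + eps        # latent outcome Y = X beta + epsilon
R <- as.integer(Y > c0)    # binary outcome via the threshold rule

# With Var(eta) = 1, Cov(eps, eta) = delta and Var(eps) = delta^2 + 1
cov_eps_eta <- cov(eps, eta)
var_eps     <- var(eps)
print(c(cov_eps_eta = cov_eps_eta, var_eps = var_eps))
```

Under these assumptions the sample covariance between `eps` and `eta` should be close to `delta`, which is why δ is referred to as the covariance parameter.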

## Usage

```r
stabit(x, m.id = "m.id", g.id = "g.id", R = "R", selection = NULL,
  outcome = NULL, simulation = "none", seed = 123, max.combs = Inf,
  method = "NTU", binary = FALSE, offsetOut = 0, offsetSel = 0,
  marketFE = FALSE, censored = 0, gPrior = FALSE, dropOnes = FALSE,
  interOut = 0, interSel = 0, standardize = 0, niter = 10,
  verbose = FALSE)
```

## Arguments

| Argument | Description |
|---|---|
| `x` | data frame with individual-level characteristics of all group members including market- and group-identifiers. |
| `m.id` | character string giving the name of the market identifier variable. Defaults to `"m.id"`. |
| `g.id` | character string giving the name of the group identifier variable. Defaults to `"g.id"`. |
| `R` | dependent variable in outcome equation. Defaults to `"R"`. |
| `selection` | list containing variables and pertaining operators in the selection equation. The format is `operation = "variable"`. See the Details and Examples sections. |
| `outcome` | list containing variables and pertaining operators in the outcome equation. The format is `operation = "variable"`. See the Details and Examples sections. |
| `simulation` | should the values of dependent variables in selection and outcome equations be simulated? Options are `"none"` for no simulation, `"NTU"` for non-transferable utility matching, `"TU"` for transferable utility or `"random"` for random matching of individuals to groups. Simulation settings are (i) all model coefficients set to `alpha=beta=1`; (ii) covariance between error terms `delta=0.5`; (iii) error terms `eta` and `xi` are draws from a standard normal distribution. |
| `seed` | integer setting the state for random number generation when data are simulated (`simulation` other than `"none"`). |
| `max.combs` | integer (divisible by two) giving the maximum number of feasible groups to be used for generating group-level characteristics. |
| `method` | estimation method to be used. Either `"NTU"` or `"TU"` for selection correction using non-transferable or transferable utility matching as selection rule; `"outcome"` for estimation of the outcome equation only; or `"model.frame"` for no estimation. |
| `binary` | logical: if `TRUE` outcome variable is taken to be binary; if `FALSE` outcome variable is taken to be continuous. |
| `offsetOut` | vector of integers indicating the indices of columns in `X` for which coefficients should be forced to 1. Use 0 for none. |
| `offsetSel` | vector of integers indicating the indices of columns in `W` for which coefficients should be forced to 1. Use 0 for none. |
| `marketFE` | logical: if `TRUE` market-level fixed effects are used in outcome equation; if `FALSE` no market fixed effects are used. |
| `censored` | whether draws of the `delta` parameter, which estimates the covariation between the error terms in selection and outcome equation, are censored: `0` = not censored, `1` = censored from below, `2` = censored from above. |
| `gPrior` | logical: if `TRUE` the g-prior (Zellner, 1986) is used for the variance-covariance matrix. |
| `dropOnes` | logical: if `TRUE` one-group-markets are excluded from estimation. |
| `interOut` | two-column matrix indicating the indices of columns in `X` that should be interacted in estimation. Use 0 for none. |
| `interSel` | two-column matrix indicating the indices of columns in `W` that should be interacted in estimation. Use 0 for none. |
| `standardize` | numeric: if `standardize>0` the independent variables will be standardized by dividing by `standardize` times their standard deviation. Defaults to no standardization `standardize=0`. |
| `niter` | number of iterations to use for the Gibbs sampler. |
| `verbose` | . |
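As an illustration of the `standardize` argument, the sketch below applies the rule as it is worded above (divide by `standardize` times the standard deviation, with no centering). The helper `standardize_by_sd` is hypothetical and is not the package's internal code.

```r
# Hypothetical helper: divide a variable by k times its standard deviation;
# k = 0 leaves the variable unchanged (no standardization)
standardize_by_sd <- function(x, k) {
  if (k > 0) x / (k * sd(x)) else x
}

x <- c(2, 4, 6, 8)
z <- standardize_by_sd(x, k = 2)

# After scaling, the standard deviation equals 1 / k
print(sd(z))
```

With `k = 2`, a one-unit change in the scaled variable corresponds to a change of two standard deviations in the original variable.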

## Details

Operators for variable transformations in `selection` and `outcome` arguments.

`add`

sum over all group members and divide by group size.

`int`

sum over all possible two-way interactions x*y of group members and divide by the number of those, given by `choose(n,2)`.

`ieq`

sum over all possible two-way equality assertions 1[x=y] and divide by the number of those.

`ive`

sum over all possible two-way interactions of vectors of variables of group members and divide by number of those.

`inv`

...

`dst`

sum over all possible two-way distances between players and divide by number of those, where distance is defined as exp(-|x-y|).
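The pairwise operators can be sketched in base R for a single group. The helpers below (`pairwise_mean`, `op_add`, `op_int`, `op_ieq`, `op_dst`) are hypothetical names written from the descriptions above; they are not the package's internal implementation, and they assume a group with at least two members.

```r
# Average a function f over all unordered pairs of group members
pairwise_mean <- function(x, f) {
  vals <- combn(seq_along(x), 2, FUN = function(idx) f(x[idx[1]], x[idx[2]]))
  sum(vals) / choose(length(x), 2)
}

op_add <- function(x) sum(x) / length(x)                                # add
op_int <- function(x) pairwise_mean(x, function(a, b) a * b)            # int
op_ieq <- function(x) pairwise_mean(x, function(a, b) as.numeric(a == b)) # ieq
op_dst <- function(x) pairwise_mean(x, function(a, b) exp(-abs(a - b))) # dst

# One illustrative group of three members
pi_  <- c(1, 2, 3)   # a continuous characteristic
wst  <- c(1, 1, 0)   # a binary characteristic

res <- c(add = op_add(pi_), int = op_int(pi_),
         ieq = op_ieq(wst), dst = op_dst(pi_))
print(res)
```

For the group above, `op_int` averages the products 1\*2, 1\*3 and 2\*3 over `choose(3,2) = 3` pairs, and `op_ieq` counts one equal pair out of three.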

## Author(s)

Thilo Klein

## References

Klein, T. (2015a). Does Anti-Diversification Pay? A One-Sided Matching Model of Microcredit. Cambridge Working Papers in Economics, #1521.

Zellner, A. (1986). On assessing prior distributions and Bayesian regression analysis with g-prior distributions, volume 6, pages 233–243. North-Holland, Amsterdam.

## Examples

```r
## Not run:
## --- SIMULATED EXAMPLE ---

## 1. Simulate one-sided matching data for 200 markets (m=200) with 2 groups
##    per market (gpm=2) and 5 individuals per group (ind=5). True parameters
##    in selection equation is wst=1, in outcome equation wst=0.

## 1-a. Simulate individual-level, independent variables
idata <- stabsim(m=200, ind=5, seed=123, gpm=2)
head(idata)

## 1-b. Simulate group-level variables
mdata <- stabit(x=idata, simulation="NTU", method="model.frame",
  selection = list(add="wst"), outcome = list(add="wst"), verbose=FALSE)
head(mdata$OUT)
head(mdata$SEL)


## 2. Bias from sorting

## 2-a. Naive OLS estimation
lm(R ~ wst.add, data=mdata$OUT)$coefficients

## 2-b. epsilon is correlated with independent variables
with(mdata$OUT, cor(epsilon, wst.add))

## 2-c. but xi is uncorrelated with independent variables
with(mdata$OUT, cor(xi, wst.add))


## 3. Correction of sorting bias when valuations V are observed

## 3-a. 1st stage: obtain fitted value for eta
lm.sel <- lm(V ~ -1 + wst.add, data=mdata$SEL)
lm.sel$coefficients

eta <- lm.sel$resid[mdata$SEL$D==1]

## 3-b. 2nd stage: control for eta
lm(R ~ wst.add + eta, data=mdata$OUT)$coefficients


## 4. Run Gibbs sampler
fit1 <- stabit(x=idata, method="NTU", simulation="NTU", censored=1,
  selection = list(add="wst"), outcome = list(add="wst"),
  niter=2000, verbose=FALSE)


## 5. Coefficient table
summary(fit1)


## 6. Plot MCMC draws for coefficients
plot(fit1)


## --- REPLICATION, Klein (2015a) ---

## 1. Load data
data(baac00); head(baac00)

## 2. Run Gibbs sampler
klein15a <- stabit(x=baac00, selection = list(inv="pi",ieq="wst"),
  outcome = list(add="pi",inv="pi",ieq="wst",
  add=c("loan_size","loan_size2","lngroup_agei")), offsetOut=1,
  method="NTU", binary=TRUE, gPrior=TRUE, marketFE=TRUE, niter=800000)

## 3. Marginal effects
summary(klein15a, mfx=TRUE)

## 4. Plot MCMC draws for coefficients
plot(klein15a)

## End(Not run)
```

matchingMarkets documentation built on Jan. 11, 2018, 3 p.m.