OOI: Outside Option Index
In OOI: Outside Option Index


View source: R/OOI.R

Calculates the 'outside option index' (defined as -∑ P(Z|X) * log(P(Z|X) / P(Z))) for workers, using employer-employee data.
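As a rough illustration of the formula above, the following sketch evaluates the expression for a single worker on a toy example with three job types. The helper `ooi_formula` and the probability vectors are hypothetical and are not part of the package, which estimates P(Z|X) / P(Z) from data rather than taking it as given.

```r
# Hypothetical helper (not part of the OOI package): evaluates
# -sum(P(Z|X) * log(P(Z|X) / P(Z))) for one worker.
ooi_formula <- function(p_z_given_x, p_z) {
  stopifnot(length(p_z_given_x) == length(p_z))
  keep <- p_z_given_x > 0  # terms with P(Z|X) = 0 contribute nothing
  -sum(p_z_given_x[keep] * log(p_z_given_x[keep] / p_z[keep]))
}

p_z <- c(0.5, 0.3, 0.2)                # made-up marginal distribution of jobs
p_concentrated <- c(0.9, 0.05, 0.05)   # worker matched mostly to one job type
p_spread <- p_z                        # worker whose options mirror the market

ooi_concentrated <- ooi_formula(p_concentrated, p_z)  # strictly negative
ooi_spread <- ooi_formula(p_spread, p_z)              # zero, the upper bound
```

Under this definition the value is never positive, and it moves toward zero as a worker's conditional distribution over jobs approaches the overall distribution of jobs.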

OOI(
  formula = NULL,
  X,
  Z = NULL,
  X.location = NULL,
  Z.location = NULL,
  wgt = rep(1, nrow(X)),
  pred = TRUE,
  method = "logit",
  sim.factor = 1,
  dist.fun = geo_dist,
  dist.order = NULL,
  seed = runif(1, 0, .Machine$integer.max)
)

`formula`	a formula describing the model to be fitted in order to estimate P(Z|X) / P(Z). This formula uses a syntax similar to Stata, so "x_" refers to all variables with the prefix "x", while "z_" refers to all variables with the prefix "z". Similarly, "d" refers to the distance polynomial (see the example below).
`X`	matrix or data frame with worker characteristics. Note that all column names should start with "x" (necessary for the inner function 'coef_reshape').
`Z`	an optional matrix or data frame with job characteristics. Note that all column names should start with "z" (necessary for the inner function 'coef_reshape').
`X.location`	an optional matrix or data frame with location for workers. Could be geographical location (i.e., geo-coordinates) or any other feature that can be used in order to measure distance between worker and job using 'dist.fun'. Currently the package supports only numeric inputs.
`Z.location`	same as 'X.location' but for jobs.
`wgt`	an optional numeric vector of weights.
`pred`	logical. If TRUE (default), predicts the ooi for the provided data.
`method`	a method for estimating P(Z|X) / P(Z). Currently not in use.
`sim.factor`	a variable that determines how much fake data to simulate (relative to real data).
`dist.fun`	a distance function to calculate the distance between X.location and Z.location. Users interested in using more than one distance metric should provide a function that returns, for each row of X.location and Z.location, a vector with all the necessary metrics. The function should also refer to columns by their index and not by their names. The default function is `geo_dist`, which is suitable for data with geo-coordinates.
`dist.order`	a numeric vector specifying for each distance metric an order of the distance polynomial.
`seed`	the seed of the random number generator.
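The requirements on `dist.fun` described above can be sketched as follows. The function `two_metric_dist` is hypothetical, and both the two-column location layout and the choice to return one column per metric are assumptions made for illustration; the exact return shape the package expects should be checked against `geo_dist` in the package source.

```r
# Hypothetical distance function returning two metrics per worker-job pair.
# Assumed layout: column 1 = a numeric position, column 2 = a numeric region code.
# Columns are accessed by index, not by name, as the documentation requires.
two_metric_dist <- function(x_loc, z_loc) {
  x_loc <- as.matrix(x_loc)
  z_loc <- as.matrix(z_loc)
  cbind(
    abs(z_loc[, 1] - x_loc[, 1]),          # metric 1: absolute distance
    as.numeric(z_loc[, 2] != x_loc[, 2])   # metric 2: 1 if the regions differ
  )
}

# Made-up locations for three workers and their three jobs
workers <- data.frame(r = c(1.5, 10.0, 4.2), region = c(1, 2, 2))
jobs    <- data.frame(l = c(21.0, 35.5, 4.2), region = c(1, 1, 2))
d <- two_metric_dist(workers, jobs)
d
```

With a function like this, `dist.order` would presumably take one entry per metric, for example `dist.order = c(2, 1)` for a quadratic in the first metric and a linear term in the second.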

An "ooi" object. This object is a list containing the following components:

`coeffs`	coefficients from the estimated logit.
`coeffs_sd`	standard errors of the coefficients.
`pseudo_r2`	McFadden's pseudo-R squared for the estimated logit.
`standardized_coeffs`	standardized coefficients.
`ooi`	the Outside Option Index.
`hhi`	the Herfindahl-Hirschman Index, an alternative measure for outside options.
`job_worker_prob`	the log probability of each worker to work at his specific job (rather than to work at a job with his specific z).
`orig_arg`	a list containing the original arguments (necessary for `predict.ooi`).
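For readers unfamiliar with two of the summary statistics listed above, the following sketch shows their textbook definitions on made-up data. It does not reproduce the package's internal computation: the helper `hhi_from_probs`, the simulated variables, and the use of `glm` are all assumptions for illustration.

```r
# Textbook Herfindahl-Hirschman Index: the sum of squared shares.
# Hypothetical helper, not part of the OOI package.
hhi_from_probs <- function(p) {
  stopifnot(all(p >= 0), isTRUE(all.equal(sum(p), 1)))
  sum(p^2)
}

hhi_even <- hhi_from_probs(rep(0.25, 4))               # evenly spread options
hhi_conc <- hhi_from_probs(c(0.97, 0.01, 0.01, 0.01))  # concentrated options

# McFadden's pseudo-R squared for a logit: 1 - logLik(model) / logLik(null)
set.seed(1)
x <- rnorm(200)
y <- rbinom(200, 1, plogis(0.5 + x))
fit  <- glm(y ~ x, family = binomial)   # model with a covariate
null <- glm(y ~ 1, family = binomial)   # intercept-only model
pseudo_r2 <- 1 - as.numeric(logLik(fit)) / as.numeric(logLik(null))
pseudo_r2
```

A higher HHI indicates that the probability mass is concentrated on fewer options, which is why it can serve as an alternative measure of outside options.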

library(OOI)
#generate data
#worker and job characteristics:
n <- 100
men <- rbinom(n, 1, 0.5)
size <- 1 + rgeom(n, 0.1)
size[men == 0] <- size[men == 0] + 2
worker_resid <- data.frame(r = round(runif(n, 0, 20), 1))
job_location <- data.frame(l = round(runif(n, 20, 40), 1))
#prepare data
#define distance function:
dist_metric <- function(x, y){abs(y - x)}
X <- data.frame(men = men)
Z <- data.frame(size = size)
#add "x" / "z" to column names:
X <- add_prefix(X, "x.")
Z <- add_prefix(Z, "z.")
#estimate P(Z|X) / P(Z) and calculate the ooi:
ooi_object <- OOI(formula = ~ x_*z_ + x_*d + z_*d, X = X, Z = Z,
                  X.location = worker_resid, Z.location = job_location,
                  sim.factor = 3, dist.fun = dist_metric, dist.order = 3)
#we can extract the ooi using predict():
ooi <- predict(ooi_object)
summary(ooi)