lcmcross: Latent class stochastic frontier using cross-section data

View source: R/lcmcross.R

lcmcross R Documentation

Description

lcmcross is a symbolic formula-based function for estimating the latent class stochastic frontier model (LCM) with cross-sectional or pooled cross-section data. The model is estimated by maximum likelihood (ML). See Orea and Kumbhakar (2004) and Parmeter and Kumbhakar (2014, p. 282).

Only the half-normal distribution is available for the one-sided error term. Nine optimization algorithms are available.

The function also accounts for heteroscedasticity in both one-sided and two-sided error terms, as in Reifschneider and Stevenson (1991), Caudill and Ford (1993), Caudill et al. (1995) and Hadri (1999).

The model can estimate up to five classes.

Usage

lcmcross(formula, uhet, vhet, thet, logDepVar = TRUE, data, subset, S = 1, 
  udist = "hnormal", start = NULL, lcmClasses = 2, method = "bfgs", hessianType = 1,
  itermax = 2000, printInfo = FALSE, tol = 1e-12, gradtol = 1e-06, stepmax = 0.1, 
  qac = "marquardt", initStart = FALSE, initAlg = "nlminb", initIter = 100,
  initFactorLB = 0.5, initFactorUB = 1.5)

Arguments

formula

A symbolic description of the model to be estimated based on the generic function formula (see section ‘Details’).

uhet

A one-part formula to account for heteroscedasticity in the one-sided error variance (see section ‘Details’).

vhet

A one-part formula to account for heteroscedasticity in the two-sided error variance (see section ‘Details’).

thet

A one-part formula to account for technological heterogeneity in the construction of the classes.

logDepVar

Logical. Informs whether the dependent variable is logged (TRUE) or not (FALSE). Default = TRUE.

data

The data frame containing the data.

subset

An optional vector specifying a subset of observations to be used in the optimization process.

S

If S = 1 (default), a production (profit) frontier is estimated: ε_i = v_i-u_i. If S = -1, a cost frontier is estimated: ε_i = v_i+u_i.

udist

Character string. Distribution specification for the one-sided error term. Only the half normal distribution "hnormal" (Aigner et al., 1977, Meeusen and Vandenbroeck, 1977) is currently implemented.

start

Numeric vector. Optional starting values for the maximum likelihood (ML) estimation.

lcmClasses

Number of classes to be estimated (default = 2). A maximum of five classes can be estimated.

method

Optimization algorithm used for the estimation. Default = "bfgs". Nine algorithms are available:

  • "bfgs", for Broyden-Fletcher-Goldfarb-Shanno (see maxBFGS)

  • "bhhh", for Berndt-Hall-Hall-Hausman (see maxBHHH)

  • "nr", for Newton-Raphson (see maxNR)

  • "nm", for Nelder-Mead (see maxNM)

  • "ucminf", implements a quasi-Newton type with BFGS updating of the inverse Hessian and soft line search with a trust region type monitoring of the input to the line search algorithm (see ucminf)

  • "mla", for general-purpose optimization based on Marquardt-Levenberg algorithm (see mla)

  • "sr1", for Symmetric Rank 1 (see trust.optim)

  • "sparse", for trust regions and sparse Hessian (see trust.optim)

  • "nlminb", for optimization using PORT routines (see nlminb)

hessianType

Integer. If 1 (default), the analytic Hessian is returned (only "hnormal" is implemented for the one-sided error term, so no distribution falls back to a numeric Hessian). If 2, the BHHH Hessian is estimated (g'g). If 3, the robust Hessian is computed (H^{-1}GH^{-1}).

itermax

Maximum number of iterations allowed for optimization. Default = 2000.

printInfo

Logical. Print information during optimization. Default = FALSE.

tol

Numeric. Convergence tolerance. Default = 1e-12.

gradtol

Numeric. Convergence tolerance for gradient. Default = 1e-06.

stepmax

Numeric. Step max for ucminf algorithm. Default = 0.1.

qac

Character. Quadratic Approximation Correction for the "bhhh" and "nr" algorithms. If qac = "stephalving", the step length is decreased but the direction is kept. If qac = "marquardt" (default), the step length is decreased while also moving closer to the pure gradient direction. See maxBHHH and maxNR.

initStart

Logical. If TRUE, the model is jump-started using an alternative algorithm ("nlminb") within certain bounds. Default = FALSE.

initAlg

Character. Algorithm used to jump-start the latent class model. Only "nlminb" is currently available.

initIter

Maximum number of iterations for the jump-start algorithm when initStart = TRUE. Default = 100.

initFactorLB

A numeric value indicating by which factor the starting value should be multiplied to define the lower bounds for the jump-start algorithm. Default = 0.5.

initFactorUB

A numeric value indicating by which factor the starting value should be multiplied to define the upper bounds for the jump-start algorithm. Default = 1.5.
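The jump-start bounds described by initFactorLB and initFactorUB can be sketched in base R as follows. This is only an illustration under assumptions: the starting values and the toy objective are hypothetical, and ordering the two products with pmin/pmax for negative starting values is a choice made for this sketch, not something the documentation states about lcmcross.

```r
# Hypothetical starting values (not taken from any fitted model)
start <- c(1.2, -0.4, 0.8)
initFactorLB <- 0.5
initFactorUB <- 1.5

# Multiply the starting values by each factor; pmin/pmax keep lower <= upper
# even when a starting value is negative (an assumption of this sketch)
lower <- pmin(start * initFactorLB, start * initFactorUB)
upper <- pmax(start * initFactorLB, start * initFactorUB)

# Toy objective standing in for the negative log-likelihood
target <- c(1, -0.5, 1)
toy_obj <- function(p) sum((p - target)^2)

# Bounded search with nlminb, capped at 100 iterations (cf. initIter)
fit <- nlminb(start, toy_obj, lower = lower, upper = upper,
              control = list(iter.max = 100))
fit$par
```

The point of the sketch is that the bounded search stays in a box around the starting values, and its result can then be passed on as improved starting values.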

Details

LCM is an estimation of a finite mixture of production functions:

y_i = α_j + x'_iβ_j + v_{i|j} - Su_{i|j}

ε_{i|j} = v_{i|j} -Su_{i|j}

where i is the observation, j is the class, y is the output (cost, revenue, profit), x is the vector of main explanatory variables (inputs and other control variables), u is the one-sided error term with variance σ_{u}^2, and v is the two-sided error term with variance σ_{v}^2.

S = 1 in the case of production (profit) frontier function and S = -1 in the case of cost frontier function.
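As a small illustration of the sign convention, the composed error can be simulated in base R for both frontier types. The sample size and standard deviations below are arbitrary values chosen for this sketch.

```r
set.seed(1)
n <- 1000
v <- rnorm(n, sd = 0.2)        # two-sided noise term
u <- abs(rnorm(n, sd = 0.5))   # one-sided (half-normal) inefficiency term

eps_prod <- v - 1 * u          # S = 1: production (profit) frontier, v - u
eps_cost <- v - (-1) * u       # S = -1: cost frontier, v + u

# Inefficiency shifts the composed error below the production frontier
# and above the cost frontier
c(production = mean(eps_prod), cost = mean(eps_cost))
```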

The contribution of observation i to the likelihood conditional on class j is defined as:

P(i|j) = \frac{2}{\sqrt{σ_{u|j}^2 + σ_{v|j}^2}} φ\left(\frac{Sε_{i|j}}{\sqrt{σ_{u|j}^2 + σ_{v|j}^2}}\right) Φ\left(\frac{μ_{i*|j}}{σ_{*|j}}\right)

where

μ_{i*|j}=\frac{- Sε_{i|j}σ_{u|j}^2}{σ_{u|j}^2 + σ_{v|j}^2}

and

σ_{*|j}^2 = \frac{σ_{u|j}^2 σ_{v|j}^2}{σ_{u|j}^2 + σ_{v|j}^2}
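The class-conditional contribution above can be written as a short base R function. This is a sketch that follows the three formulas directly; the function name cond_dens and the variance values are hypothetical and not part of sfaR.

```r
# Conditional density of the composed error for one class
# (half-normal u, normal v). sigma2_u and sigma2_v are the class variances,
# S is 1 for a production (profit) frontier and -1 for a cost frontier.
cond_dens <- function(eps, sigma2_u, sigma2_v, S = 1) {
  sigma2     <- sigma2_u + sigma2_v
  mu_star    <- -S * eps * sigma2_u / sigma2
  sigma_star <- sqrt(sigma2_u * sigma2_v / sigma2)
  2 / sqrt(sigma2) * dnorm(S * eps / sqrt(sigma2)) * pnorm(mu_star / sigma_star)
}

# Sanity check: a density should integrate to one over the real line
total_mass <- integrate(cond_dens, -Inf, Inf,
                        sigma2_u = 0.25, sigma2_v = 0.04)$value
total_mass
```

Because φ is symmetric, switching S from 1 to -1 mirrors the density around zero, which matches the change from v - u to v + u.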

The prior probability of using a particular technology can depend on some covariates (namely the variables separating the observations into classes) using a logit specification:

π(i,j) = \frac{\exp{(θ_j'Z_h)}}{∑_{m=1}^{J}\exp{(θ_m'Z_h)}}

with Z_h the covariates, θ the coefficients estimated for the covariates, and \exp(θ_J'Z_h)=1.
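A minimal base R sketch of this logit specification is given below. The function name prior_prob, the covariate matrix and the coefficients are hypothetical; the last class is taken as the reference class so that its exponential term equals 1.

```r
# Z: n x q matrix of separating variables (first column is the intercept)
# theta: q x (J - 1) matrix of coefficients for the first J - 1 classes
prior_prob <- function(Z, theta) {
  eta  <- cbind(Z %*% theta, 0)   # reference class J has index 0, exp(0) = 1
  expo <- exp(eta)
  expo / rowSums(expo)            # each row sums to one
}

Z     <- cbind(1, c(-1, 0, 2))            # intercept and one covariate
theta <- matrix(c(0.5, -1), ncol = 1)     # J = 2 classes
pi_ij <- prior_prob(Z, theta)
pi_ij
```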

The unconditional likelihood of observation i is the sum of the class-conditional likelihoods over the J classes, weighted by the prior probabilities:

P(i) = ∑_{m=1}^{J}π(i,m)P(i|m)
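The mixture can be sketched in base R by combining the class-conditional densities with the prior probabilities. All numbers below (residuals, variances, priors) are hypothetical, and cond_dens is a helper defined for this sketch only.

```r
# Class-conditional density (half-normal u, normal v), as defined in 'Details'
cond_dens <- function(eps, sigma2_u, sigma2_v, S = 1) {
  sigma2     <- sigma2_u + sigma2_v
  mu_star    <- -S * eps * sigma2_u / sigma2
  sigma_star <- sqrt(sigma2_u * sigma2_v / sigma2)
  2 / sqrt(sigma2) * dnorm(S * eps / sqrt(sigma2)) * pnorm(mu_star / sigma_star)
}

# Hypothetical inputs: 2 observations, J = 2 classes
eps_by_class <- cbind(c(-0.3, 0.1), c(-0.1, 0.2))  # residuals under each class
sigma2_u     <- c(0.25, 0.09)
sigma2_v     <- c(0.04, 0.02)
prior        <- cbind(c(0.6, 0.3), c(0.4, 0.7))    # rows sum to one

# n x J matrix of P(i|j), then the prior-weighted sum over classes
P_cond <- sapply(1:2, function(j)
  cond_dens(eps_by_class[, j], sigma2_u[j], sigma2_v[j]))
P_i    <- rowSums(prior * P_cond)
loglik <- sum(log(P_i))   # sample log-likelihood
loglik
```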

The number of classes can be selected based on information criteria (see, for instance, ic).
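As an illustration of such a comparison, the textbook AIC and BIC can be computed from a log-likelihood, the number of parameters and the number of observations. The helper info_crit and the numbers are hypothetical, and the exact criteria reported by ic are not described here.

```r
# Textbook definitions: smaller values indicate a preferred model
info_crit <- function(loglik, n_par, n_obs) {
  c(AIC = -2 * loglik + 2 * n_par,
    BIC = -2 * loglik + log(n_obs) * n_par)
}

# Hypothetical fits with two and three classes on 200 observations
ic_2 <- info_crit(loglik = -120.5, n_par = 11, n_obs = 200)
ic_3 <- info_crit(loglik = -117.9, n_par = 17, n_obs = 200)
rbind(two_classes = ic_2, three_classes = ic_3)
```

In this made-up case the gain in log-likelihood from the third class does not offset the six extra parameters, so both criteria favour two classes.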

Class assignment is based on the largest posterior probability. This probability is obtained using Bayes' rule, as follows for class j:

w\left(j|i\right)=\frac{P\left(i|j\right)π\left(i, j\right)}{∑_{m=1}^JP\left(i|m\right)π\left(i, m\right)}
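Bayes' rule and the class assignment can be sketched in base R with small matrices. The conditional likelihoods and prior probabilities below are hypothetical values chosen so that the arithmetic is easy to follow.

```r
# Rows are observations, columns are classes (J = 2)
P_cond <- rbind(c(0.8, 0.2), c(0.1, 0.9), c(0.5, 0.5))  # P(i|j)
prior  <- rbind(c(0.5, 0.5), c(0.5, 0.5), c(0.3, 0.7))  # pi(i, j)

joint     <- prior * P_cond
posterior <- joint / rowSums(joint)   # w(j|i), each row sums to one

# Assign each observation to the class with the largest posterior probability
assigned <- max.col(posterior, ties.method = "first")
cbind(posterior, assigned)
```

The third observation shows the role of the prior: its conditional likelihoods are equal across classes, so the posterior simply reproduces the prior (0.3, 0.7) and the observation is assigned to class 2.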

To accommodate heteroscedasticity in the variance parameters of the error terms, a one-part (right-hand side) formula can also be specified through uhet and vhet. To impose positivity on these parameters, the variances are modelled respectively as σ^2_{u|j} = \exp{(δ_j'Z_u)} and σ^2_{v|j} = \exp{(φ_j'Z_v)}, where Z_u and Z_v are the heteroscedasticity variables (inefficiency drivers in the case of Z_u) and δ and φ are the coefficients. In the case of heterogeneity in the truncated mean μ, it is modelled as μ=ω'Z_{μ}.
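The exponential link for the variances can be illustrated in base R. The design matrix Z_u and the coefficient vector delta_j below are hypothetical; the same construction applies to the two-sided error variance with Z_v and its coefficients.

```r
# Intercept plus one heteroscedasticity variable for three observations
Z_u     <- cbind(1, c(0, 1, 2))
delta_j <- c(-2, 0.5)            # coefficients for class j

# The exponential guarantees a strictly positive variance for any coefficients
sigma2_u <- as.vector(exp(Z_u %*% delta_j))
sigma2_u
```

A one-unit increase in the heteroscedasticity variable multiplies the variance by exp(0.5) in this example, which is how the coefficients of inefficiency drivers can be read.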

Value

lcmcross returns a list of class 'lcmcross' containing the following elements:

call

The matched call.

formula

Multi-part formula describing the estimated model.

S

The argument 'S'. See the section ‘Arguments’.

typeSfa

Character string. "Latent Class Production/Profit Frontier, e = v - u" when S = 1 and "Latent Class Cost Frontier, e = v + u" when S = -1.

Nobs

Number of observations used for optimization.

nXvar

Number of main explanatory variables.

nZHvar

Number of variables in the logit specification of the finite mixture model (i.e. number of covariates).

logDepVar

The argument 'logDepVar'. See the section ‘Arguments’.

nuZUvar

Number of variables explaining heteroscedasticity in the one-sided error term.

nvZVvar

Number of variables explaining heteroscedasticity in the two-sided error term.

nParm

Total number of parameters estimated.

udist

The argument 'udist'. See the section ‘Arguments’.

startVal

Numeric vector. Starting values for the ML estimation.

dataTable

A data frame (tibble format) containing information on data used for optimization along with residuals and fitted values of the OLS and ML estimations, and the individual observation log-likelihood.

InitHalf

Returned when start = NULL. Initial ML estimation with a half-normal distribution for the one-sided error term, used to construct the starting values for the latent class estimation. Object of class 'maxLik' and 'maxim'.

optType

The optimization algorithm used.

nIter

Number of iterations of the ML estimation.

optStatus

An optimization algorithm termination message.

startLoglik

Log-likelihood at the starting values.

nClasses

The number of classes estimated.

mlLoglik

Log-likelihood value of the ML estimation.

mlParam

Numeric vector. Parameters obtained from ML estimation.

gradient

Numeric vector. Gradient of the ML estimation with respect to each parameter.

gradL_OBS

Matrix. Individual observation gradient of the ML estimation with respect to each parameter.

gradientNorm

Numeric. Gradient norm of the ML estimation.

invHessian

The covariance matrix of the parameters obtained from the ML estimation.

hessianType

The argument 'hessianType'. See the section ‘Arguments’.

mlDate

Date and time of the estimated model.

Note

In the case of panel data, lcmcross estimates a pooled cross-section in which the prior probability of belonging to a class is not permanent (i.e., not fixed over time).

Author(s)

K Hervé Dakpo, Yann Desjeux and Laure Latruffe

References

Aigner, D., Lovell, C. A. K., and P. Schmidt. 1977. Formulation and estimation of stochastic frontier production function models. Journal of Econometrics, 6(1), 21–37.

Caudill, S. B., and J. M. Ford. 1993. Biases in frontier estimation due to heteroscedasticity. Economics Letters, 41(1), 17–20.

Caudill, S. B., Ford, J. M., and D. M. Gropper. 1995. Frontier estimation and firm-specific inefficiency measures in the presence of heteroscedasticity. Journal of Business & Economic Statistics, 13(1), 105–111.

Hadri, K. 1999. Estimation of a doubly heteroscedastic stochastic frontier cost function. Journal of Business & Economic Statistics, 17(3), 359–363.

Meeusen, W., and J. Vandenbroeck. 1977. Efficiency estimation from Cobb-Douglas production functions with composed error. International Economic Review, 18(2), 435–445.

Orea, L., and S. C. Kumbhakar. 2004. Efficiency measurement using a latent class stochastic frontier model. Empirical Economics, 29, 169–183.

Parmeter, C. F., and S. C. Kumbhakar. 2014. Efficiency analysis: A primer on recent advances. Foundations and Trends in Econometrics, 7, 191–385.

Reifschneider, D., and R. Stevenson. 1991. Systematic departures from the frontier: A framework for the analysis of firm inefficiency. International Economic Review, 32(3), 715–723.

See Also

summary for creating and printing summary results.

coef for extracting coefficients of the estimation.

efficiencies for computing (in-)efficiency estimates.

fitted for extracting the fitted frontier values.

ic for extracting information criteria.

logLik for extracting log-likelihood value(s) of the estimation.

marginal for computing marginal effects of inefficiency drivers.

residuals for extracting residuals of the estimation.

vcov for computing the variance-covariance matrix of the coefficients.

Examples

## Using data on the production (GDP) of eighty-two countries
library(sfaR)

# LCM Cobb-Douglas (production function), half-normal distribution
# Intercept and initStat used as separating variables
cb_2c_h1 <- lcmcross(formula = ly ~ lk + ll + yr, thet = ~initStat, data = worldprod)
summary(cb_2c_h1)

# summary of the initial ML model
summary(cb_2c_h1$InitHalf)

# jump-starting the estimation (thet is omitted here, so only the
# intercept is used as the separating variable)
cb_2c_h2 <- lcmcross(formula = ly ~ lk + ll + yr, data = worldprod, initStart = TRUE)
summary(cb_2c_h2)

# Only the intercept is used as the separating variable and only variable
# initStat is used as inefficiency driver
cb_2c_h3 <- lcmcross(formula = ly ~ lk + ll + yr, uhet = ~initStat, data = worldprod)
summary(cb_2c_h3)

sfaR documentation built on May 3, 2022, 3 p.m.