sfacross: Stochastic frontier estimation using cross-sectional data

View source: R/sfacross.R

sfacrossR Documentation

Stochastic frontier estimation using cross-sectional data

Description

sfacross is a symbolic formula-based function for the estimation of stochastic frontier models in the case of cross-sectional or pooled cross-sectional data, using maximum (simulated) likelihood - M(S)L.

The function accounts for heteroscedasticity in both one-sided and two-sided error terms as in Reifschneider and Stevenson (1991), Caudill and Ford (1993), Caudill et al. (1995) and Hadri (1999), but also heterogeneity in the mean of the pre-truncated distribution as in Kumbhakar et al. (1991), Huang and Liu (1994) and Battese and Coelli (1995).

Ten distributions are possible for the one-sided error term and eleven optimization algorithms are available.

The truncated normal - normal distribution with scaling property as in Wang and Schmidt (2002) is also implemented.

Usage

sfacross(
  formula,
  muhet,
  uhet,
  vhet,
  logDepVar = TRUE,
  data,
  subset,
  weights,
  wscale = TRUE,
  S = 1L,
  udist = "hnormal",
  scaling = FALSE,
  start = NULL,
  method = "bfgs",
  hessianType = 1L,
  simType = "halton",
  Nsim = 100,
  prime = 2L,
  burn = 10,
  antithetics = FALSE,
  seed = 12345,
  itermax = 2000,
  printInfo = FALSE,
  tol = 1e-12,
  gradtol = 1e-06,
  stepmax = 0.1,
  qac = "marquardt"
)

## S3 method for class 'sfacross'
print(x, ...)

## S3 method for class 'sfacross'
bread(x, ...)

## S3 method for class 'sfacross'
estfun(x, ...)

Arguments

formula

A symbolic description of the model to be estimated based on the generic function formula (see section ‘Details’).

muhet

A one-part formula to consider heterogeneity in the mean of the pre-truncated distribution (see section ‘Details’).

uhet

A one-part formula to consider heteroscedasticity in the one-sided error variance (see section ‘Details’).

vhet

A one-part formula to consider heteroscedasticity in the two-sided error variance (see section ‘Details’).

logDepVar

Logical. Informs whether the dependent variable is logged (TRUE) or not (FALSE). Default = TRUE.

data

The data frame containing the data.

subset

An optional vector specifying a subset of observations to be used in the optimization process.

weights

An optional vector of weights to be used for weighted log-likelihood. Should be NULL or numeric vector with positive values. When NULL, a numeric vector of 1 is used.

wscale

Logical. When weights is not NULL, a scaling transformation is used such that the weights sum to the sample size. Default TRUE. When FALSE no scaling is used.

S

If S = 1 (default), a production (profit) frontier is estimated: \epsilon_i = v_i-u_i. If S = -1, a cost frontier is estimated: \epsilon_i = v_i+u_i.

udist

Character string. Default = 'hnormal'. Distribution specification for the one-sided error term. 10 different distributions are available:

  • 'hnormal', for the half normal distribution (Aigner et al. 1977, Meeusen and Vandenbroeck 1977)

  • 'exponential', for the exponential distribution

  • 'tnormal' for the truncated normal distribution (Stevenson 1980)

  • 'rayleigh', for the Rayleigh distribution (Hajargasht 2015)

  • 'uniform', for the uniform distribution (Li 1996, Nguyen 2010)

  • 'gamma', for the Gamma distribution (Greene 2003)

  • 'lognormal', for the log normal distribution (Migon and Medici 2001, Wang and Ye 2020)

  • 'weibull', for the Weibull distribution (Tsionas 2007)

  • 'genexponential', for the generalized exponential distribution (Papadopoulos 2020)

  • 'tslaplace', for the truncated skewed Laplace distribution (Wang 2012).

scaling

Logical. Only when udist = 'tnormal' and scaling = TRUE, the scaling property model (Wang and Schmidt 2002) is estimated. Default = FALSE. (see section ‘Details’).

start

Numeric vector. Optional starting values for the maximum likelihood (ML) estimation.

method

Optimization algorithm used for the estimation. Default = 'bfgs'. 11 algorithms are available:

  • 'bfgs', for Broyden-Fletcher-Goldfarb-Shanno (see maxBFGS)

  • 'bhhh', for Berndt-Hall-Hall-Hausman (see maxBHHH)

  • 'nr', for Newton-Raphson (see maxNR)

  • 'nm', for Nelder-Mead (see maxNM)

  • 'cg', for Conjugate Gradient (see maxCG)

  • 'sann', for Simulated Annealing (see maxSANN)

  • 'ucminf', for a quasi-Newton type optimisation with BFGS updating of the inverse Hessian and soft line search with a trust region type monitoring of the input to the line search algorithm (see ucminf)

  • 'mla', for general-purpose optimization based on Marquardt-Levenberg algorithm (see mla)

  • 'sr1', for Symmetric Rank 1 (see trust.optim)

  • 'sparse', for trust regions and sparse Hessian (see trust.optim)

  • 'nlminb', for optimization using PORT routines (see nlminb)

hessianType

Integer. If 1 (Default), analytic Hessian is returned for all the distributions. If 2, bhhh Hessian is estimated (g'g).

simType

Character string. If simType = 'halton' (Default), Halton draws are used for maximum simulated likelihood (MSL). If simType = 'ghalton', Generalized-Halton draws are used for MSL. If simType = 'sobol', Sobol draws are used for MSL. If simType = 'uniform', uniform draws are used for MSL. (see section ‘Details’).

Nsim

Number of draws for MSL. Default 100.

prime

Prime number considered for Halton and Generalized-Halton draws. Default = 2.

burn

Number of the first observations discarded in the case of Halton draws. Default = 10.

antithetics

Logical. Default = FALSE. If TRUE, antithetics counterpart of the uniform draws is computed. (see section ‘Details’).

seed

Numeric. Seed for the random draws.

itermax

Maximum number of iterations allowed for optimization. Default = 2000.

printInfo

Logical. Print information during optimization. Default = FALSE.

tol

Numeric. Convergence tolerance. Default = 1e-12.

gradtol

Numeric. Convergence tolerance for gradient. Default = 1e-06.

stepmax

Numeric. Step max for ucminf algorithm. Default = 0.1.

qac

Character. Quadratic Approximation Correction for 'bhhh' and 'nr' algorithms. If 'stephalving', the step length is decreased but the direction is kept. If 'marquardt' (default), the step length is decreased while also moving closer to the pure gradient direction. See maxBHHH and maxNR.

x

an object of class sfacross (returned by the function sfacross).

...

additional arguments of frontier are passed to sfacross; additional arguments of the print, bread, estfun, nobs methods are currently ignored.

Details

The stochastic frontier model for the cross-sectional data is defined as:

y_i = \alpha + \mathbf{x_i^{\prime}}\bm{\beta} + v_i - Su_i

with

\epsilon_i = v_i -Su_i

where i is the observation, y is the output (cost, revenue, profit), \mathbf{x} is the vector of main explanatory variables (inputs and other control variables), u is the one-sided error term with variance \sigma_{u}^2, and v is the two-sided error term with variance \sigma_{v}^2.

S = 1 in the case of production (profit) frontier function and S = -1 in the case of cost frontier function.

The model is estimated using maximum likelihood (ML) for most distributions except the Gamma, Weibull and log-normal distributions for which maximum simulated likelihood (MSL) is used. For this latter, several draws can be implemented namely Halton, Generalized Halton, Sobol and uniform. In the case of uniform draws, antithetics can also be computed: first Nsim/2 draws are obtained, then the Nsim/2 other draws are obtained as counterpart of one (1-draw).

To account for heteroscedasticity in the variance parameters of the error terms, a single part (right) formula can also be specified. To impose the positivity to these parameters, the variances are modelled as: \sigma^2_u = \exp{(\bm{\delta}'\mathbf{Z}_u)} or \sigma^2_v = \exp{(\bm{\phi}'\mathbf{Z}_v)}, where \mathbf{Z}_u and \mathbf{Z}_v are the heteroscedasticity variables (inefficiency drivers in the case of \mathbf{Z}_u) and \bm{\delta} and \bm{\phi} the coefficients. In the case of heterogeneity in the truncated mean \mu, it is modelled as \mu=\bm{\omega}'\mathbf{Z}_{\mu}. The scaling property can be applied for the truncated normal distribution: u \sim h(\mathbf{Z}_u, \delta)u where u follows a truncated normal distribution N^+(\tau, \exp{(cu)}).

In the case of the truncated normal distribution, the convolution of u_i and v_i is:

f(\epsilon_i)=\frac{1}{\sqrt{\sigma_u^2 + \sigma_v^2}}\phi\left(\frac{S\epsilon_i + \mu}{\sqrt{ \sigma_u^2 + \sigma_v^2}}\right)\Phi\left(\frac{ \mu_{i*}}{\sigma_*}\right)\Big/\Phi\left(\frac{ \mu}{\sigma_u}\right)

where

\mu_{i*}=\frac{\mu\\\sigma_v^2 - S\epsilon_i\sigma_u^2}{\sigma_u^2 + \sigma_v^2}

and

\sigma_*^2 = \frac{\sigma_u^2 \sigma_v^2}{\sigma_u^2 + \sigma_v^2}

In the case of the half normal distribution the convolution is obtained by setting \mu=0.

sfacross allows for the maximization of weighted log-likelihood. When option weights is specified and wscale = TRUE, the weights are scaled as:

new_{weights} = sample_{size} \times \frac{old_{weights}}{\sum(old_{weights})}

For complex problems, non-gradient methods (e.g. nm or sann) can be used to warm start the optimization and zoom in the neighborhood of the solution. Then a gradient-based methods is recommended in the second step. In the case of sann, we recommend to significantly increase the iteration limit (e.g. itermax = 20000). The Conjugate Gradient (cg) can also be used in the first stage.

A set of extractor functions for fitted model objects is available for objects of class 'sfacross' including methods to the generic functions print, summary, coef, fitted, logLik, residuals, vcov, efficiencies, ic, marginal, skewnessTest, estfun and bread (from the sandwich package), lmtest::coeftest() (from the lmtest package).

Value

sfacross returns a list of class 'sfacross' containing the following elements:

call

The matched call.

formula

The estimated model.

S

The argument 'S'. See the section ‘Arguments’.

typeSfa

Character string. 'Stochastic Production/Profit Frontier, e = v - u' when S = 1 and 'Stochastic Cost Frontier, e = v + u' when S = -1.

Nobs

Number of observations used for optimization.

nXvar

Number of explanatory variables in the production or cost frontier.

nmuZUvar

Number of variables explaining heterogeneity in the truncated mean, only if udist = 'tnormal' or 'lognormal'.

scaling

The argument 'scaling'. See the section ‘Arguments’.

logDepVar

The argument 'logDepVar'. See the section ‘Arguments’.

nuZUvar

Number of variables explaining heteroscedasticity in the one-sided error term.

nvZVvar

Number of variables explaining heteroscedasticity in the two-sided error term.

nParm

Total number of parameters estimated.

udist

The argument 'udist'. See the section ‘Arguments’.

startVal

Numeric vector. Starting value for M(S)L estimation.

dataTable

A data frame (tibble format) containing information on data used for optimization along with residuals and fitted values of the OLS and M(S)L estimations, and the individual observation log-likelihood. When weights is specified an additional variable is also provided in dataTable.

olsParam

Numeric vector. OLS estimates.

olsStder

Numeric vector. Standard errors of OLS estimates.

olsSigmasq

Numeric. Estimated variance of OLS random error.

olsLoglik

Numeric. Log-likelihood value of OLS estimation.

olsSkew

Numeric. Skewness of the residuals of the OLS estimation.

olsM3Okay

Logical. Indicating whether the residuals of the OLS estimation have the expected skewness.

CoelliM3Test

Coelli's test for OLS residuals skewness. (See Coelli, 1995).

AgostinoTest

D'Agostino's test for OLS residuals skewness. (See D'Agostino and Pearson, 1973).

isWeights

Logical. If TRUE weighted log-likelihood is maximized.

optType

Optimization algorithm used.

nIter

Number of iterations of the ML estimation.

optStatus

Optimization algorithm termination message.

startLoglik

Log-likelihood at the starting values.

mlLoglik

Log-likelihood value of the M(S)L estimation.

mlParam

Parameters obtained from M(S)L estimation.

gradient

Each variable gradient of the M(S)L estimation.

gradL_OBS

Matrix. Each variable individual observation gradient of the M(S)L estimation.

gradientNorm

Gradient norm of the M(S)L estimation.

invHessian

Covariance matrix of the parameters obtained from the M(S)L estimation.

hessianType

The argument 'hessianType'. See the section ‘Arguments’.

mlDate

Date and time of the estimated model.

simDist

The argument 'simDist', only if udist = 'gamma', 'lognormal' or , 'weibull'. See the section ‘Arguments’.

Nsim

The argument 'Nsim', only if udist = 'gamma', 'lognormal' or , 'weibull'. See the section ‘Arguments’.

FiMat

Matrix of random draws used for MSL, only if udist = 'gamma', 'lognormal' or , 'weibull'.

Note

For the Halton draws, the code is adapted from the mlogit package.

References

Aigner, D., Lovell, C. A. K., and Schmidt, P. 1977. Formulation and estimation of stochastic frontier production function models. Journal of Econometrics, 6(1), 21–37.

Battese, G. E., and Coelli, T. J. 1995. A model for technical inefficiency effects in a stochastic frontier production function for panel data. Empirical Economics, 20(2), 325–332.

Caudill, S. B., and Ford, J. M. 1993. Biases in frontier estimation due to heteroscedasticity. Economics Letters, 41(1), 17–20.

Caudill, S. B., Ford, J. M., and Gropper, D. M. 1995. Frontier estimation and firm-specific inefficiency measures in the presence of heteroscedasticity. Journal of Business & Economic Statistics, 13(1), 105–111.

Coelli, T. 1995. Estimators and hypothesis tests for a stochastic frontier function - a Monte-Carlo analysis. Journal of Productivity Analysis, 6:247–268.

D'Agostino, R., and E.S. Pearson. 1973. Tests for departure from normality. Empirical results for the distributions of b_2 and \sqrt{b_1}. Biometrika, 60:613–622.

Greene, W. H. 2003. Simulated likelihood estimation of the normal-Gamma stochastic frontier function. Journal of Productivity Analysis, 19(2-3), 179–190.

Hadri, K. 1999. Estimation of a doubly heteroscedastic stochastic frontier cost function. Journal of Business & Economic Statistics, 17(3), 359–363.

Hajargasht, G. 2015. Stochastic frontiers with a Rayleigh distribution. Journal of Productivity Analysis, 44(2), 199–208.

Huang, C. J., and Liu, J.-T. 1994. Estimation of a non-neutral stochastic frontier production function. Journal of Productivity Analysis, 5(2), 171–180.

Kumbhakar, S. C., Ghosh, S., and McGuckin, J. T. 1991) A generalized production frontier approach for estimating determinants of inefficiency in U.S. dairy farms. Journal of Business & Economic Statistics, 9(3), 279–286.

Li, Q. 1996. Estimating a stochastic production frontier when the adjusted error is symmetric. Economics Letters, 52(3), 221–228.

Meeusen, W., and Vandenbroeck, J. 1977. Efficiency estimation from Cobb-Douglas production functions with composed error. International Economic Review, 18(2), 435–445.

Migon, H. S., and Medici, E. V. 2001. Bayesian hierarchical models for stochastic production frontier. Lacea, Montevideo, Uruguay.

Nguyen, N. B. 2010. Estimation of technical efficiency in stochastic frontier analysis. PhD dissertation, Bowling Green State University, August.

Papadopoulos, A. 2021. Stochastic frontier models using the generalized exponential distribution. Journal of Productivity Analysis, 55:15–29.

Reifschneider, D., and Stevenson, R. 1991. Systematic departures from the frontier: A framework for the analysis of firm inefficiency. International Economic Review, 32(3), 715–723.

Stevenson, R. E. 1980. Likelihood Functions for Generalized Stochastic Frontier Estimation. Journal of Econometrics, 13(1), 57–66.

Tsionas, E. G. 2007. Efficiency measurement with the Weibull stochastic frontier. Oxford Bulletin of Economics and Statistics, 69(5), 693–706.

Wang, K., and Ye, X. 2020. Development of alternative stochastic frontier models for estimating time-space prism vertices. Transportation.

Wang, H.J., and Schmidt, P. 2002. One-step and two-step estimation of the effects of exogenous variables on technical efficiency levels. Journal of Productivity Analysis, 18:129–144.

Wang, J. 2012. A normal truncated skewed-Laplace model in stochastic frontier analysis. Master thesis, Western Kentucky University, May.

See Also

print for printing sfacross object.

summary for creating and printing summary results.

coef for extracting coefficients of the estimation.

efficiencies for computing (in-)efficiency estimates.

fitted for extracting the fitted frontier values.

ic for extracting information criteria.

logLik for extracting log-likelihood value(s) of the estimation.

marginal for computing marginal effects of inefficiency drivers.

residuals for extracting residuals of the estimation.

skewnessTest for conducting residuals skewness test.

vcov for computing the variance-covariance matrix of the coefficients.

bread for bread for sandwich estimator.

estfun for gradient extraction for each observation.

skewnessTest for implementing skewness test.

Examples


## Using data on fossil fuel fired steam electric power generation plants in the U.S.
# Translog (cost function) half normal with heteroscedasticity
tl_u_h <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'hnormal', uhet = ~ regu, data = utility, S = -1, method = 'bfgs')
summary(tl_u_h)

# Translog (cost function) truncated normal with heteroscedasticity
tl_u_t <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'tnormal', muhet = ~ regu, data = utility, S = -1, method = 'bhhh')
summary(tl_u_t)

# Translog (cost function) truncated normal with scaling property
tl_u_ts <- sfacross(formula = log(tc/wf) ~ log(y) + I(1/2 * (log(y))^2) +
log(wl/wf) + log(wk/wf) + I(1/2 * (log(wl/wf))^2) + I(1/2 * (log(wk/wf))^2) +
I(log(wl/wf) * log(wk/wf)) + I(log(y) * log(wl/wf)) + I(log(y) * log(wk/wf)),
udist = 'tnormal', muhet = ~ regu, uhet = ~ regu, data = utility, S = -1,
scaling = TRUE, method = 'mla')
summary(tl_u_ts)

## Using data on Philippine rice producers
# Cobb Douglas (production function) generalized exponential, and Weibull 
# distributions

cb_p_ge <- sfacross(formula = log(PROD) ~ log(AREA) + log(LABOR) + log(NPK) +
log(OTHER), udist = 'genexponential', data = ricephil, S = 1, method = 'bfgs')
summary(cb_p_ge)

## Using data on U.S. electric utility industry
# Cost frontier Gamma distribution
tl_u_g <- sfacross(formula = log(cost/fprice) ~ log(output) + I(log(output)^2) +
I(log(lprice/fprice)) + I(log(cprice/fprice)), udist = 'gamma', uhet = ~ 1,
data = electricity, S = -1, method = 'bfgs', simType = 'halton', Nsim = 200,
hessianType = 2)
summary(tl_u_g)


sfaR documentation built on Oct. 29, 2024, 9:07 a.m.