sf: Stochastic Frontier Models Using Cross-Sectional and Panel...
In npsf: Nonparametric and Stochastic Efficiency and Productivity Analysis

Description Usage Arguments Details Value Author(s) References See Also Examples

sf performs maximum likelihood estimation of the parameters and technical or cost efficiencies in cross-sectional stochastic (production or cost) frontier models with half-normal or truncated normal distributional assumption imposed on inefficiency error component.

sf(formula, data, it = NULL, subset,
 prod = TRUE, model = "K1990", distribution = c("h"),
 eff.time.invariant = FALSE, 
 mean.u.0i.zero     = FALSE,
 mean.u.0i          = NULL,
 ln.var.u.0i        = NULL,
 ln.var.v.0i        = NULL,
 ln.var.v.it        = NULL,  
 simtype = c("halton", "rnorm"), halton.base = NULL, R = 500,
 simtype_GHK = c("halton", "runif"), R_GHK = 500,
 random.primes = FALSE,
 cost.eff.less.one  = FALSE, level = 95, marg.eff = FALSE,
 start.val = NULL, maxit = 199, report.ll.optim = 10, 
 reltol = 1e-8, lmtol = sqrt(.Machine$double.eps),
 digits = 4, print.level = 4, seed = 17345168,
 only.data = FALSE)

`formula`	an object of class “formula” (or one that can be coerced to that class): a symbolic description of the model. The details of model specification are given under ‘Details’.
`data`	an optional data frame containing variables in the model. If not found in data, the variables are taken from environment (`formula`), typically the environment from which `sf` is called.
`it`	vector with two character entries. E.g., c("ID", "TIME"), where "ID" defines individuals that are observed in time periods defined by "TIME". The default is `NULL`. At default, cross-sectional model will be estimated.
`subset`	an optional vector specifying a subset of observations for which technical or cost efficiencies are to be computed.
`prod`	logical. If `TRUE`, the estimates of parameters of stochastic production frontier model and of technical efficiencies are returned; if `FALSE`, the estimates of parameters of stochastic cost frontier model and of cost efficiencies are returned.
`model`	character. Five panel data models are estimated for now. "K1990" and "K1990modified" (see Kumbhakar, 1990), "BC1992" (see Battese and Coelli, 1992), "4comp" (see Badunenko and Kumbhakar (2016) and Filippini and Greene, 2016). They specify common evolution of inefficiency. Deffault is "K1990". The time functions are "( 1 + exp(bt + ct^2) )^-1", "1 + d(t-T_i) + e(t-T_i)^2", and "exp( -f*(t-T_i) )", respectively.
`distribution`	either `"h"` (half-normal), `"t"` (truncated normal), or `"e"` (exponential, only crosssectional models), specifying the distribution of inefficiency term.
`eff.time.invariant`	logical. If `TRUE`, the 1st generation of panel data models is estimated, otherwise, the 2nd generation or 4 components panel data model is estimated.
`mean.u.0i.zero`	logical. If `TRUE`, normal-half normal model is estimated, otherwise, normal-truncated model is estimated.
`mean.u.0i`	one-sided formula; e.g. `mean.u.0i ~ z1 + z2`. Specifies whether the mean of pre-truncated normal distribution of inefficiency term is a linear function of exogenous variables. In cross-sectional models, used only when `distribution = "t"`. If `NULL`, mean is assumed to be constant for all ids.
`ln.var.u.0i`	one-sided formula; e.g. `ln.var.u.0i ~ z1 + z2`. Specifies exogenous variables entering the expression for the log of variance of inefficiency error component. If `NULL`, inefficiency term is assumed to be homoskedastic, i.e. σ_u^2 = exp(γ[0]). Time invariant variables are expected.
`ln.var.v.0i`	one-sided formula; e.g. `ln.var.v.0i ~ z1 + z2`. Specifies exogenous variables entering the expression for variance of random noise error component. If `NULL`, random noise component is assumed to be homoskedastic, i.e. σ_v^2 = exp(γ[0]). Time invariant variables are expected.
`ln.var.v.it`	one-sided formula; e.g. `ln.var.v.it ~ z1 + z2`. Specifies exogenous variables entering the expression for variance of random noise error component. If `NULL`, random noise component is assumed to be homoskedastic, i.e. σ_v^2 = exp(γ[0]). Time invariant variables are expected.
`simtype`	character. Type of random deviates for the 4 components model. 'halton' draws are default. One can specify 'rnorm.'
`halton.base`	numeric. The prime number which is the base for the Halton draws. If not used, different bases are used for each id.
`R`	numeric. Number of draws. Default is 500. Can be time consuming.
`simtype_GHK`	character. Type of random deviates for use in GHK for efficiency estimating by approximation. 'halton' draws are default. One can specify 'runif.'
`R_GHK`	numeric. Number of draws for GHK. Default is 500. Can be time consuming.
`random.primes`	logical. If `TRUE`, and `halton.base = NULL`, the primes are chosen on a random basis for each ID from 100008 available prime numbers.
`cost.eff.less.one`	logical. If `TRUE`, and `prod = FALSE`, reported cost efficiencies are larger than one. Interpretation: by what factor is total cost larger than the potential total cost.
`level`	numeric. Defines level% two-sided prediction interval for technical or cost efficiencies (see Horrace and Schmidt 1996). Default is 95.
`marg.eff`	logical. If `TRUE`, unit-specific marginal effects of exogenous variables on the mean of distribution of inefficiency term are returned.
`start.val`	numeric. Starting values to be supplied to the optimization routine. If `NULL`, OLS and method of moments estimates are used (see Kumbhakar and Lovell 2000).
`maxit`	numeric. Maximum number of iterations. Default is 199.
`report.ll.optim`	numeric. Not used for now.
`reltol`	numeric. One of convergence criteria. Not used for now.
`lmtol`	numeric. Tolerance for the scaled gradient in ML optimization. Default is sqrt(.Machine$double.eps).
`digits`	numeric. Number of digits to be displayed in estimation results and for efficiency estimates. Default is 4.
`print.level`	numeric. 1 - print estimation results. 2 - print optimization details. 3 - print summary of point estimates of technical or cost efficiencies. 7 - print unit-specific point and interval estimates of technical or cost efficiencies. Default is 4.
`seed`	set seed for replicability. Default is 17345168.
`only.data`	logical. If `TRUE`, only data are returned. Default is `FALSE`

Models for sf are specified symbolically. A typical model has the form y ~ x1 + ..., where y represents the logarithm of outputs or total costs and {x1,...} is a series of inputs or outputs and input prices (in logs).

Options ln.var.u.0i and ln.var.v.0i can be used if multiplicative heteroskedasticity of either inefficiency or random noise component (or both) is assumed; i.e. if their variances can be expressed as exponential functions of (e.g. size-related) exogenous variables (including intercept) (see Caudill et al. 1995).

If marg.eff = TRUE and distribution = "h", the marginal effect of kth exogenous variable on the expected value of inefficiency term of unit i is computed as: γ[k]σ[i]/√2π, where σ_u[i] = √ exp(z[i]'γ). If distribution = "t", marginal effects are returned if either mean.u.0i or ln.var.u.0i are not NULL. If the same exogenous variables are specified under both options, (non-monotonic) marginal effects are computed as explained in Wang (2002).

See references and links below.

sf returns a list of class npsf containing the following elements:

`coef`	numeric. Named vector of ML parameter estimates.
`vcov`	matrix. Estimated covariance matrix of ML estimator.
`ll`	numeric. Value of log-likelihood at ML estimates.
`efficiencies`	data frame. Contains point estimates of unit-specific technical or cost efficiencies: exp(-E(u\|e)) of Jondrow et al. (1982), E(exp(-u)\|e) of Battese and Coelli (1988), and exp(-M(u\|e)), where M(u\|e) is the mode of conditional distribution of inefficiency term. In addition, estimated lower and upper bounds of (1-α)100% two-sided prediction intervals are returned.
`marg.effects`	data frame. Contains unit-specific marginal effects of exogenous variables on the expected value of inefficiency term.
`sigmas_u`	matrix. Estimated unit-specific variances of inefficiency term. Returned if `ln.var.u.0i` is not `NULL`.
`sigmas_v`	matrix. Estimated unit-specific variances of random noise component. Returned if `ln.var.v.0i` is not `NULL`.
`mu`	matrix. Estimated unit-specific means of pre-truncated normal distribution of inefficiency term. Returned if `mean.u.0i` is not `NULL`.
`esample`	logical. Returns TRUE if the observation in user supplied data is in the estimation subsample and FALSE otherwise.

Oleg Badunenko <oleg.badunenko@brunel.ac.uk>

Badunenko, O. and Kumbhakar, S.C. (2016), When, Where and How to Estimate Persistent and Transient Efficiency in Stochastic Frontier Panel Data Models, European Journal of Operational Research, 255(1), 272–287, doi: 10.1016/j.ejor.2016.04.049.

Battese, G., Coelli, T. (1988), Prediction of firm-level technical effiiencies with a generalized frontier production function and panel data. Journal of Econometrics, 38, 387–399.

Battese, G., Coelli, T. (1992), Frontier production functions, technical efficiency and panel data: With application to paddy farmers in India. Journal of Productivity Analysis, 3, 153–169.

Caudill, S., Ford, J., Gropper, D. (1995), Frontier estimation and firm-specific inefficiency measures in the presence of heteroscedasticity. Journal of Business and Economic Statistics, 13, 105–111.

Filippini, M. and Greene, W.H. (2016), Persistent and transient productive inefficiency: A maximum simulated likelihood approach. Journal of Productivity Analysis, 45 (2), 187–196.

Horrace, W. and Schmidt, P. (1996), On ranking and selection from independent truncated normal distributions. Journal of Productivity Analysis, 7, 257–282.

Jondrow, J., Lovell, C., Materov, I., Schmidt, P. (1982), On estimation of technical inefficiency in the stochastic frontier production function model. Journal of Econometrics, 19, 233–238.

Kumbhakar, S. (1990), Production Frontiers, Panel Data, and Time-varying Technical Inefficiency. Journal of Econometrics, 46, 201–211.

Kumbhakar, S. and Lovell, C. (2003), Stochastic Frontier Analysis. Cambridge: Cambridge University Press, doi: 10.1017/CBO9781139174411.

Wang, H.-J. (2002), Heteroskedasticity and non-monotonic efficiency effects of a stochastic frontier model. Journal of Productivity Analysis, 18, 241–253.

teradial, tenonradial, teradialbc, tenonradialbc, nptestrts, nptestind, halton, primes

 
require( npsf )

# Cross-sectional examples begin ------------------------------------------

# Load Penn World Tables 5.6 dataset

data( pwt56 )
head( pwt56 )

# Create some missing values

pwt56 [4, "K"] <- NA 

# Stochastic production frontier model with 
# homoskedastic error components (half-normal)

# Use subset of observations - for year 1965

m1 <- sf(log(Y) ~ log(L) + log(K), data = pwt56, 
         subset = year == 1965, distribution = "h")

m1 <- sf(log(Y) ~ log(L) + log(K), data = pwt56, 
         subset = year == 1965, distribution = "e")

# Test CRS: 'car' package in required for that
## Not run: linearHypothesis(m1, "log(L) + log(K) = 1")

# Write efficiencies to the data frame using 'esample':

pwt56$BC[ m1$esample ] <- m1$efficiencies$BC
## Not run: View(pwt56)

# Computation using matrices

Y1 <- as.matrix(log(pwt56[pwt56$year == 1965, 
                          c("Y"), drop = FALSE]))
X1 <- as.matrix(log(pwt56[pwt56$year == 1965,
                          c("K", "L"), drop = FALSE]))

X1 [51, 2] <- NA # create missing
X1 [49, 1] <- NA # create missing

m2 <- sf(Y1 ~ X1, distribution = "h")

# Load U.S. commercial banks dataset

data(banks05)
head(banks05)

# Doubly heteroskedastic stochastic cost frontier 
# model (truncated normal)

# Print summaries of cost efficiencies' estimates

m3 <- sf(lnC ~ lnw1 + lnw2 + lny1 + lny2, ln.var.u.0i = ~ ER, 
         ln.var.v.0i = ~ LA, data = banks05, distribution = "t", 
         prod = FALSE, print.level = 3)

m3 <- sf(lnC ~ lnw1 + lnw2 + lny1 + lny2, ln.var.u.0i = ~ ER, 
         ln.var.v.0i = ~ LA, data = banks05, distribution = "e", 
         prod = FALSE, print.level = 3)
         
# Non-monotonic marginal effects of equity ratio on 
# the mean of distribution of inefficiency term

m4 <- sf(lnC ~ lnw1 + lnw2 + lny1 + lny2, ln.var.u.0i = ~ ER,
         mean.u.0i = ~ ER, data = banks05, distribution = "t", 
         prod = FALSE, marg.eff = TRUE)

summary(m4$marg.effects)


# Cross-sectional examples end --------------------------------------------

## Not run: 

# Panel data examples begin -----------------------------------------------

# The only way to differentiate between cross-sectional and panel-data
# models is by specifying "it".
# If "it" is not specified, cross-sectional model will be estimated.
# Example is below.

# Read data ---------------------------------------------------------------

# Load U.S. commercial banks dataset

data(banks00_07)
head(banks00_07, 5)

banks00_07$trend <- banks00_07$year - min(banks00_07$year) + 1

# Model specification -----------------------------------------------------

my.prod     <- FALSE
my.it       <- c("id","year")

# my.model = "BC1992"
# my.model = "K1990modified"
# my.model = "K1990"

# translog ----------------------------------------------------------------
formu <- log(TC) ~ (log(Y1) + log(Y2) + log(W1) + log(W2) + trend)^2 +
 I(0.5*log(Y1)^2) + I(0.5*log(Y2)^2) + I(0.5*log(W1)^2) + I(0.5*log(W2)^2) +
 trend + I(0.5*trend^2)

# Cobb-Douglas ------------------------------------------------------------
# formu <- log(TC) ~ log(Y1) + log(Y2) + log(W1) + log(W2) + trend + I(trend^2)

ols <- lm(formu, data = banks00_07)
# just to mark the results of the OLS model
summary(ols)

# Components specifications -----------------------------------------------

ln.var.v.it <- ~ log(TA)
ln.var.u.0i <- ~ ER_ave
mean.u.0i_1 <- ~ LLP_ave + LA_ave
mean.u.0i_2 <- ~ LLP_ave + LA_ave - 1

# Suppose "it" is not specified
# Then it is a cross-sectional model

t0_1st_0 <- sf(formu, data = banks00_07, subset = year > 2000,
               prod = my.prod,
               ln.var.v.it = ln.var.v.it,
               ln.var.u.0i = ln.var.u.0i,
               eff.time.invariant = TRUE,
               mean.u.0i.zero = TRUE)

# 1st generation models ---------------------------------------------------

# normal-half normal ------------------------------------------------------

# the same as above but "it" is specified

t0_1st_0 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
               prod = my.prod,
               ln.var.v.it = ln.var.v.it,
               ln.var.u.0i = ln.var.u.0i, 
               eff.time.invariant = TRUE, 
               mean.u.0i.zero = TRUE)

# Note efficiencies are time-invariant

# confidence intervals for efficiencies -----------------------------------

head(t0_1st_0$efficiencies, 20)


# normal-truncated normal -------------------------------------------------

# truncation point is constant (for all ids) ------------------------------

t0_1st_1 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
               prod = my.prod,
               eff.time.invariant = TRUE,
               mean.u.0i.zero = FALSE,
               ln.var.v.it = ln.var.v.it,
               ln.var.u.0i = ln.var.u.0i,
               mean.u.0i = NULL,
               cost.eff.less.one = TRUE)



# truncation point is determined by variables -----------------------------

t0_1st_2 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
               prod = my.prod,
               eff.time.invariant = TRUE,
               mean.u.0i.zero = FALSE,
               mean.u.0i = mean.u.0i_1,
               ln.var.v.it = ln.var.v.it,
               ln.var.u.0i = ln.var.u.0i,
               cost.eff.less.one = TRUE)



# the same, but without intercept in mean.u.0i

t0_1st_3 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
               prod = my.prod,
               eff.time.invariant = TRUE,
               mean.u.0i.zero = FALSE,
               mean.u.0i = mean.u.0i_2,
               ln.var.v.it = ln.var.v.it,
               ln.var.u.0i = ln.var.u.0i,
               cost.eff.less.one = TRUE)

# 2nd generation models ---------------------------------------------------

# normal-half normal ------------------------------------------------------

# Kumbhakar (1990) model --------------------------------------------------

t_nhn_K1990 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
                  prod = my.prod,
                  eff.time.invariant = FALSE,
                  mean.u.0i.zero = TRUE, 
                  ln.var.v.it = ln.var.v.it,
                  ln.var.u.0i = ln.var.u.0i, 
                  cost.eff.less.one = FALSE)


# Kumbhakar (1990) modified model -----------------------------------------

t_nhn_K1990modified <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
                          prod = my.prod, model = "K1990modified",
                          eff.time.invariant = FALSE,
                          mean.u.0i.zero = TRUE, 
                          ln.var.v.it = ln.var.v.it,
                          ln.var.u.0i = ln.var.u.0i, 
                          cost.eff.less.one = FALSE)


# Battese and Coelli (1992) model -----------------------------------------

t_nhn_BC1992 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
                   prod = my.prod, model = "BC1992",
                   eff.time.invariant = FALSE,
                   mean.u.0i.zero = TRUE, 
                   ln.var.v.it = ln.var.v.it,
                   ln.var.u.0i = ln.var.u.0i, 
                   cost.eff.less.one = FALSE)

# normal-truncated normal -------------------------------------------------

# Kumbhakar (1990) model --------------------------------------------------

# truncation point is constant (for all ids) ------------------------------

t_ntn_K1990_0 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
                    prod = my.prod,
                    eff.time.invariant = FALSE, 
                    mean.u.0i.zero = FALSE,
                    ln.var.v.it = ln.var.v.it, 
                    ln.var.u.0i = ln.var.u.0i,
                    cost.eff.less.one = FALSE)


# truncation point is determined by variables -----------------------------

t_ntn_K1990_1 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
                    prod = my.prod,
                    eff.time.invariant = FALSE, 
                    mean.u.0i.zero = FALSE, 
                    mean.u.0i = mean.u.0i_1,
                    ln.var.v.it = ln.var.v.it, 
                    ln.var.u.0i = ln.var.u.0i,
                    cost.eff.less.one = FALSE)

# Efficiencies are tiny, since empirically truncation points are quite big.
# Try withouth an intercept in conditional mean f-n

t_ntn_K1990_2 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
                    prod = my.prod,
                    eff.time.invariant = FALSE, 
                    mean.u.0i.zero = FALSE, 
                    mean.u.0i = mean.u.0i_2,
                    ln.var.v.it = ln.var.v.it, 
                    ln.var.u.0i = ln.var.u.0i,
                    cost.eff.less.one = FALSE)

# Kumbhakar (1990) modified model -----------------------------------------

t_ntn_K1990modified <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
                          prod = my.prod, model = "K1990modified",
                          eff.time.invariant = FALSE, 
                          mean.u.0i.zero = FALSE, 
                          mean.u.0i = mean.u.0i_1,
                          ln.var.v.it = ln.var.v.it, 
                          ln.var.u.0i = ln.var.u.0i,
                          cost.eff.less.one = FALSE)

# Battese and Coelli (1992) model -----------------------------------------


t_ntn_BC1992 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
                   prod = my.prod, model = "BC1992",
                   eff.time.invariant = FALSE, 
                   mean.u.0i.zero = FALSE, 
                   mean.u.0i = mean.u.0i_1,
                   ln.var.v.it = ln.var.v.it, 
                   ln.var.u.0i = ln.var.u.0i,
                   cost.eff.less.one = FALSE)

# The next one (without "subset = year > 2000" option) converges OK

t_ntn_BC1992 <- sf(formu, data = banks00_07, it = my.it,
                   prod = my.prod, model = "BC1992",
                   eff.time.invariant = FALSE, 
                   mean.u.0i.zero = FALSE, 
                   mean.u.0i = mean.u.0i_1,
                   ln.var.v.it = ln.var.v.it, 
                   ln.var.u.0i = ln.var.u.0i,
                   cost.eff.less.one = FALSE)

# 4 component model ------------------------------------------------------

# Note, R should better be more than 200, this is just for illustration.
# This is the model that takes long to be estimated.  
# For the following example, 'mlmaximize' required 357 iterations and
# took 8 minutes.  
# The time will increase with more data and more parameters.

formu <- log(TC) ~ log(Y1) + log(Y2) + log(W1) + log(W2) + trend

t_4comp <- sf(formu, data = banks00_07, it = my.it, 
              subset = year >= 2001 & year < 2006,
              prod = my.prod, model = "4comp",
              R = 500, initialize.halton = TRUE, 
              lmtol = 1e-5, maxit = 500, print.level = 4)

# With R = 500, 'mlmaximize' required 124 iterations and
# took 7 minutes.  
# The time will increase with more data and more parameters.

formu <- log(TC) ~ log(Y1) + log(Y2) + log(W1) + log(W2) + trend

t_4comp_500 <- sf(formu, data = banks00_07, it = my.it, 
              subset = year >= 2001 & year < 2006,
              prod = my.prod, model = "4comp",
              R = 500, initialize.halton = TRUE, 
              lmtol = 1e-5, maxit = 500, print.level = 4)
              
# @e_i0, @e_it, and @e_over give efficiencies, 
# where @ is either 'c' or 't' for cost or production function.
# e.g., t_ntn_4comp$ce_i0 from last model, gives persistent cost efficiencies

# Panel data examples end -------------------------------------------------


## End(Not run)