Description Usage Arguments Details Value Author(s) References See Also Examples
sf
performs maximum likelihood estimation of the parameters and technical or cost efficiencies in cross-sectional stochastic (production or cost) frontier models with half-normal or truncated normal distributional assumption imposed on inefficiency error component.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | sf(formula, data, it = NULL, subset,
prod = TRUE, model = "K1990", distribution = c("h"),
eff.time.invariant = FALSE,
mean.u.0i.zero = FALSE,
mean.u.0i = NULL,
ln.var.u.0i = NULL,
ln.var.v.0i = NULL,
ln.var.v.it = NULL,
simtype = c("halton", "rnorm"), halton.base = NULL, R = 500,
simtype_GHK = c("halton", "runif"), R_GHK = 500,
random.primes = FALSE,
cost.eff.less.one = FALSE, level = 95, marg.eff = FALSE,
start.val = NULL, maxit = 199, report.ll.optim = 10,
reltol = 1e-8, lmtol = sqrt(.Machine$double.eps),
digits = 4, print.level = 4, seed = 17345168,
only.data = FALSE)
|
formula |
an object of class “formula” (or one that can be coerced to that class): a symbolic description of the model. The details of model specification are given under ‘Details’. |
data |
an optional data frame containing variables in the model. If not found in data, the variables are taken from environment ( |
it |
vector with two character entries. E.g., c("ID", "TIME"), where "ID" defines individuals that are observed in time periods defined by "TIME". The default is |
subset |
an optional vector specifying a subset of observations for which technical or cost efficiencies are to be computed. |
prod |
logical. If |
model |
character. Five panel data models are estimated for now. "K1990" and "K1990modified" (see Kumbhakar, 1990), "BC1992" (see Battese and Coelli, 1992), "4comp" (see Badunenko and Kumbhakar (2016) and Filippini and Greene, 2016). They specify common evolution of inefficiency. Deffault is "K1990". The time functions are "( 1 + exp(b*t + c*t^2) )^-1", "1 + d*(t-T_i) + e*(t-T_i)^2", and "exp( -f*(t-T_i) )", respectively. |
distribution |
either |
eff.time.invariant |
logical. If |
mean.u.0i.zero |
logical. If |
mean.u.0i |
one-sided formula; e.g. |
ln.var.u.0i |
one-sided formula; e.g. |
ln.var.v.0i |
one-sided formula; e.g. |
ln.var.v.it |
one-sided formula; e.g. |
simtype |
character. Type of random deviates for the 4 components model. 'halton' draws are default. One can specify 'rnorm.' |
halton.base |
numeric. The prime number which is the base for the Halton draws. If not used, different bases are used for each id. |
R |
numeric. Number of draws. Default is 500. Can be time consuming. |
simtype_GHK |
character. Type of random deviates for use in GHK for efficiency estimating by approximation. 'halton' draws are default. One can specify 'runif.' |
R_GHK |
numeric. Number of draws for GHK. Default is 500. Can be time consuming. |
random.primes |
logical. If |
cost.eff.less.one |
logical. If |
level |
numeric. Defines level% two-sided prediction interval for technical or cost efficiencies (see Horrace and Schmidt 1996). Default is 95. |
marg.eff |
logical. If |
start.val |
numeric. Starting values to be supplied to the optimization routine. If |
maxit |
numeric. Maximum number of iterations. Default is 199. |
report.ll.optim |
numeric. Not used for now. |
reltol |
numeric. One of convergence criteria. Not used for now. |
lmtol |
numeric. Tolerance for the scaled gradient in ML optimization. Default is sqrt(.Machine$double.eps). |
digits |
numeric. Number of digits to be displayed in estimation results and for efficiency estimates. Default is 4. |
print.level |
numeric. 1 - print estimation results. 2 - print optimization details. 3 - print summary of point estimates of technical or cost efficiencies. 7 - print unit-specific point and interval estimates of technical or cost efficiencies. Default is 4. |
seed |
set seed for replicability. Default is 17345168. |
only.data |
logical. If |
Models for sf
are specified symbolically. A typical model has the form y ~ x1 + ...
, where y
represents the logarithm of outputs or total costs and {x1,...}
is a series of inputs or outputs and input prices (in logs).
Options ln.var.u.0i
and ln.var.v.0i
can be used if multiplicative heteroskedasticity of either inefficiency or random noise component (or both) is assumed; i.e. if their variances can be expressed as exponential functions of (e.g. size-related) exogenous variables (including intercept) (see Caudill et al. 1995).
If marg.eff = TRUE
and distribution = "h"
, the marginal effect of kth exogenous variable on the expected value of inefficiency term of unit i is computed as: γ[k]σ[i]/√2π, where σ_u[i] = √ exp(z[i]'γ). If distribution = "t"
, marginal effects are returned if either mean.u.0i
or ln.var.u.0i
are not NULL
. If the same exogenous variables are specified under both options, (non-monotonic) marginal effects are computed as explained in Wang (2002).
See references and links below.
sf
returns a list of class npsf
containing the following elements:
coef |
numeric. Named vector of ML parameter estimates. |
vcov |
matrix. Estimated covariance matrix of ML estimator. |
ll |
numeric. Value of log-likelihood at ML estimates. |
efficiencies |
data frame. Contains point estimates of unit-specific technical or cost efficiencies: exp(-E(u|e)) of Jondrow et al. (1982), E(exp(-u)|e) of Battese and Coelli (1988), and exp(-M(u|e)), where M(u|e) is the mode of conditional distribution of inefficiency term. In addition, estimated lower and upper bounds of (1-α)100% two-sided prediction intervals are returned. |
marg.effects |
data frame. Contains unit-specific marginal effects of exogenous variables on the expected value of inefficiency term. |
sigmas_u |
matrix. Estimated unit-specific variances of inefficiency term. Returned if |
sigmas_v |
matrix. Estimated unit-specific variances of random noise component. Returned if |
mu |
matrix. Estimated unit-specific means of pre-truncated normal distribution of inefficiency term. Returned if |
esample |
logical. Returns TRUE if the observation in user supplied data is in the estimation subsample and FALSE otherwise. |
Oleg Badunenko <oleg.badunenko@brunel.ac.uk>
Badunenko, O. and Kumbhakar, S.C. (2016), When, Where and How to Estimate Persistent and Transient Efficiency in Stochastic Frontier Panel Data Models, European Journal of Operational Research, 255(1), 272–287, doi: 10.1016/j.ejor.2016.04.049.
Battese, G., Coelli, T. (1988), Prediction of firm-level technical effiiencies with a generalized frontier production function and panel data. Journal of Econometrics, 38, 387–399.
Battese, G., Coelli, T. (1992), Frontier production functions, technical efficiency and panel data: With application to paddy farmers in India. Journal of Productivity Analysis, 3, 153–169.
Caudill, S., Ford, J., Gropper, D. (1995), Frontier estimation and firm-specific inefficiency measures in the presence of heteroscedasticity. Journal of Business and Economic Statistics, 13, 105–111.
Filippini, M. and Greene, W.H. (2016), Persistent and transient productive inefficiency: A maximum simulated likelihood approach. Journal of Productivity Analysis, 45 (2), 187–196.
Horrace, W. and Schmidt, P. (1996), On ranking and selection from independent truncated normal distributions. Journal of Productivity Analysis, 7, 257–282.
Jondrow, J., Lovell, C., Materov, I., Schmidt, P. (1982), On estimation of technical inefficiency in the stochastic frontier production function model. Journal of Econometrics, 19, 233–238.
Kumbhakar, S. (1990), Production Frontiers, Panel Data, and Time-varying Technical Inefficiency. Journal of Econometrics, 46, 201–211.
Kumbhakar, S. and Lovell, C. (2003), Stochastic Frontier Analysis. Cambridge: Cambridge University Press, doi: 10.1017/CBO9781139174411.
Wang, H.-J. (2002), Heteroskedasticity and non-monotonic efficiency effects of a stochastic frontier model. Journal of Productivity Analysis, 18, 241–253.
teradial
, tenonradial
, teradialbc
, tenonradialbc
, nptestrts
, nptestind
, halton
, primes
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 319 320 321 322 323 324 325 326 327 328 329 330 331 332 333 334 |
require( npsf )
# Cross-sectional examples begin ------------------------------------------
# Load Penn World Tables 5.6 dataset
data( pwt56 )
head( pwt56 )
# Create some missing values
pwt56 [4, "K"] <- NA
# Stochastic production frontier model with
# homoskedastic error components (half-normal)
# Use subset of observations - for year 1965
m1 <- sf(log(Y) ~ log(L) + log(K), data = pwt56,
subset = year == 1965, distribution = "h")
m1 <- sf(log(Y) ~ log(L) + log(K), data = pwt56,
subset = year == 1965, distribution = "e")
# Test CRS: 'car' package in required for that
## Not run: linearHypothesis(m1, "log(L) + log(K) = 1")
# Write efficiencies to the data frame using 'esample':
pwt56$BC[ m1$esample ] <- m1$efficiencies$BC
## Not run: View(pwt56)
# Computation using matrices
Y1 <- as.matrix(log(pwt56[pwt56$year == 1965,
c("Y"), drop = FALSE]))
X1 <- as.matrix(log(pwt56[pwt56$year == 1965,
c("K", "L"), drop = FALSE]))
X1 [51, 2] <- NA # create missing
X1 [49, 1] <- NA # create missing
m2 <- sf(Y1 ~ X1, distribution = "h")
# Load U.S. commercial banks dataset
data(banks05)
head(banks05)
# Doubly heteroskedastic stochastic cost frontier
# model (truncated normal)
# Print summaries of cost efficiencies' estimates
m3 <- sf(lnC ~ lnw1 + lnw2 + lny1 + lny2, ln.var.u.0i = ~ ER,
ln.var.v.0i = ~ LA, data = banks05, distribution = "t",
prod = FALSE, print.level = 3)
m3 <- sf(lnC ~ lnw1 + lnw2 + lny1 + lny2, ln.var.u.0i = ~ ER,
ln.var.v.0i = ~ LA, data = banks05, distribution = "e",
prod = FALSE, print.level = 3)
# Non-monotonic marginal effects of equity ratio on
# the mean of distribution of inefficiency term
m4 <- sf(lnC ~ lnw1 + lnw2 + lny1 + lny2, ln.var.u.0i = ~ ER,
mean.u.0i = ~ ER, data = banks05, distribution = "t",
prod = FALSE, marg.eff = TRUE)
summary(m4$marg.effects)
# Cross-sectional examples end --------------------------------------------
## Not run:
# Panel data examples begin -----------------------------------------------
# The only way to differentiate between cross-sectional and panel-data
# models is by specifying "it".
# If "it" is not specified, cross-sectional model will be estimated.
# Example is below.
# Read data ---------------------------------------------------------------
# Load U.S. commercial banks dataset
data(banks00_07)
head(banks00_07, 5)
banks00_07$trend <- banks00_07$year - min(banks00_07$year) + 1
# Model specification -----------------------------------------------------
my.prod <- FALSE
my.it <- c("id","year")
# my.model = "BC1992"
# my.model = "K1990modified"
# my.model = "K1990"
# translog ----------------------------------------------------------------
formu <- log(TC) ~ (log(Y1) + log(Y2) + log(W1) + log(W2) + trend)^2 +
I(0.5*log(Y1)^2) + I(0.5*log(Y2)^2) + I(0.5*log(W1)^2) + I(0.5*log(W2)^2) +
trend + I(0.5*trend^2)
# Cobb-Douglas ------------------------------------------------------------
# formu <- log(TC) ~ log(Y1) + log(Y2) + log(W1) + log(W2) + trend + I(trend^2)
ols <- lm(formu, data = banks00_07)
# just to mark the results of the OLS model
summary(ols)
# Components specifications -----------------------------------------------
ln.var.v.it <- ~ log(TA)
ln.var.u.0i <- ~ ER_ave
mean.u.0i_1 <- ~ LLP_ave + LA_ave
mean.u.0i_2 <- ~ LLP_ave + LA_ave - 1
# Suppose "it" is not specified
# Then it is a cross-sectional model
t0_1st_0 <- sf(formu, data = banks00_07, subset = year > 2000,
prod = my.prod,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
eff.time.invariant = TRUE,
mean.u.0i.zero = TRUE)
# 1st generation models ---------------------------------------------------
# normal-half normal ------------------------------------------------------
# the same as above but "it" is specified
t0_1st_0 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
prod = my.prod,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
eff.time.invariant = TRUE,
mean.u.0i.zero = TRUE)
# Note efficiencies are time-invariant
# confidence intervals for efficiencies -----------------------------------
head(t0_1st_0$efficiencies, 20)
# normal-truncated normal -------------------------------------------------
# truncation point is constant (for all ids) ------------------------------
t0_1st_1 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
prod = my.prod,
eff.time.invariant = TRUE,
mean.u.0i.zero = FALSE,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
mean.u.0i = NULL,
cost.eff.less.one = TRUE)
# truncation point is determined by variables -----------------------------
t0_1st_2 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
prod = my.prod,
eff.time.invariant = TRUE,
mean.u.0i.zero = FALSE,
mean.u.0i = mean.u.0i_1,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
cost.eff.less.one = TRUE)
# the same, but without intercept in mean.u.0i
t0_1st_3 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
prod = my.prod,
eff.time.invariant = TRUE,
mean.u.0i.zero = FALSE,
mean.u.0i = mean.u.0i_2,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
cost.eff.less.one = TRUE)
# 2nd generation models ---------------------------------------------------
# normal-half normal ------------------------------------------------------
# Kumbhakar (1990) model --------------------------------------------------
t_nhn_K1990 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
prod = my.prod,
eff.time.invariant = FALSE,
mean.u.0i.zero = TRUE,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
cost.eff.less.one = FALSE)
# Kumbhakar (1990) modified model -----------------------------------------
t_nhn_K1990modified <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
prod = my.prod, model = "K1990modified",
eff.time.invariant = FALSE,
mean.u.0i.zero = TRUE,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
cost.eff.less.one = FALSE)
# Battese and Coelli (1992) model -----------------------------------------
t_nhn_BC1992 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
prod = my.prod, model = "BC1992",
eff.time.invariant = FALSE,
mean.u.0i.zero = TRUE,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
cost.eff.less.one = FALSE)
# normal-truncated normal -------------------------------------------------
# Kumbhakar (1990) model --------------------------------------------------
# truncation point is constant (for all ids) ------------------------------
t_ntn_K1990_0 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
prod = my.prod,
eff.time.invariant = FALSE,
mean.u.0i.zero = FALSE,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
cost.eff.less.one = FALSE)
# truncation point is determined by variables -----------------------------
t_ntn_K1990_1 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
prod = my.prod,
eff.time.invariant = FALSE,
mean.u.0i.zero = FALSE,
mean.u.0i = mean.u.0i_1,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
cost.eff.less.one = FALSE)
# Efficiencies are tiny, since empirically truncation points are quite big.
# Try withouth an intercept in conditional mean f-n
t_ntn_K1990_2 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
prod = my.prod,
eff.time.invariant = FALSE,
mean.u.0i.zero = FALSE,
mean.u.0i = mean.u.0i_2,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
cost.eff.less.one = FALSE)
# Kumbhakar (1990) modified model -----------------------------------------
t_ntn_K1990modified <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
prod = my.prod, model = "K1990modified",
eff.time.invariant = FALSE,
mean.u.0i.zero = FALSE,
mean.u.0i = mean.u.0i_1,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
cost.eff.less.one = FALSE)
# Battese and Coelli (1992) model -----------------------------------------
t_ntn_BC1992 <- sf(formu, data = banks00_07, it = my.it, subset = year > 2000,
prod = my.prod, model = "BC1992",
eff.time.invariant = FALSE,
mean.u.0i.zero = FALSE,
mean.u.0i = mean.u.0i_1,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
cost.eff.less.one = FALSE)
# The next one (without "subset = year > 2000" option) converges OK
t_ntn_BC1992 <- sf(formu, data = banks00_07, it = my.it,
prod = my.prod, model = "BC1992",
eff.time.invariant = FALSE,
mean.u.0i.zero = FALSE,
mean.u.0i = mean.u.0i_1,
ln.var.v.it = ln.var.v.it,
ln.var.u.0i = ln.var.u.0i,
cost.eff.less.one = FALSE)
# 4 component model ------------------------------------------------------
# Note, R should better be more than 200, this is just for illustration.
# This is the model that takes long to be estimated.
# For the following example, 'mlmaximize' required 357 iterations and
# took 8 minutes.
# The time will increase with more data and more parameters.
formu <- log(TC) ~ log(Y1) + log(Y2) + log(W1) + log(W2) + trend
t_4comp <- sf(formu, data = banks00_07, it = my.it,
subset = year >= 2001 & year < 2006,
prod = my.prod, model = "4comp",
R = 500, initialize.halton = TRUE,
lmtol = 1e-5, maxit = 500, print.level = 4)
# With R = 500, 'mlmaximize' required 124 iterations and
# took 7 minutes.
# The time will increase with more data and more parameters.
formu <- log(TC) ~ log(Y1) + log(Y2) + log(W1) + log(W2) + trend
t_4comp_500 <- sf(formu, data = banks00_07, it = my.it,
subset = year >= 2001 & year < 2006,
prod = my.prod, model = "4comp",
R = 500, initialize.halton = TRUE,
lmtol = 1e-5, maxit = 500, print.level = 4)
# @e_i0, @e_it, and @e_over give efficiencies,
# where @ is either 'c' or 't' for cost or production function.
# e.g., t_ntn_4comp$ce_i0 from last model, gives persistent cost efficiencies
# Panel data examples end -------------------------------------------------
## End(Not run)
|
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.