# semsfa: Semiparametric Estimation of Stochastic Frontier Models

## Description

Semiparametric estimation of stochastic frontier models following the two-step procedure proposed by Fan et al. (1996) and further developed by Vidoli and Ferrara (2015) and Ferrara and Vidoli (2017). In the first step, semiparametric or nonparametric regression techniques are used to relax parametric restrictions on the functional form of the frontier; in the second step, variance parameters are obtained by pseudolikelihood or method of moments estimators. Monotonicity restrictions can be imposed by means of P-splines.

## Usage

    semsfa(formula, data = list(), sem.method = "gam", var.method = "fan",
           ineffDecrease = TRUE, tol = 1e-05, n.boot = 0, ...)

## Arguments

| Argument | Description |
|---|---|
| formula | an object of class "formula": a symbolic description of the model to be fitted. The details of model specification are given under 'Details' |
| data | a data frame containing the variables in the model |
| sem.method | a character string indicating the estimation method to be used in the first step for the semiparametric or nonparametric regression; possible values are "gam" (default), "gam.mono" for monotone gam, "kernel" or "loess" |
| var.method | the estimation method to be used in the second step for the variance components: "fan" (default) for the Fan et al. (1996) approach and "mm" for method of moments |
| ineffDecrease | logical: TRUE (default) for estimating a production function, FALSE for estimating a cost function; the argument follows the usage of the frontier package for compatibility |
| tol | numeric. Convergence tolerance for pseudolikelihood estimators of variance parameters of the composed error term |
| n.boot | numeric. Number of bootstrap replicates used to calculate standard errors for the variance components; by default bootstrap standard errors are not calculated (n.boot=0) |
| ... | further arguments accepted by mgcv::gam, gamlss::gamlss, np::npreg or loess |

## Details

Parametric stochastic production frontier models, introduced by Aigner et al. (1977) and Meeusen and van den Broeck (1977), specify output in terms of a response function and a composite error term. The composite error term consists of a two-sided error representing random effects and a one-sided term representing technical inefficiency. The production stochastic frontier model can be written, in general terms, as:

$$y_i = f(x_i) + v_i - u_i, \qquad i = 1, \ldots, n,$$

where $y_i \in \mathbb{R}^{+}$ is the single output of unit $i$, $x_i \in \mathbb{R}^{p}_{+}$ is the vector of inputs, and $f(\cdot)$ defines the production frontier relationship between the inputs and the single output. Following common practice, we assume that $v$ and $u$ are each independently and identically distributed (iid), with $v \sim N(0, \sigma_v^2)$ and $u$ distributed half-normally on the non-negative part of the real line, $u \sim N^{+}(0, \sigma_u^2)$. For the estimation algorithm, the probability density function of the composite disturbance is rewritten in terms of $\lambda = \sigma_u/\sigma_v$ and $\sigma^2 = \sigma_v^2 + \sigma_u^2$.

To avoid the drawbacks of specifying a particular production function $f(\cdot)$, we estimate a semiparametric stochastic production frontier model through the two-step procedure originally proposed by Fan et al. (1996): in the first step a semiparametric or nonparametric regression technique is used to estimate the conditional expectation, while in the second step the parameters $\lambda$ and $\sigma$ are estimated by pseudolikelihood (via optimize) or by method of moments estimators (var.method argument). In the case of a cost frontier (ineffDecrease=FALSE) the composite error term is $\varepsilon = v + u$.

Vidoli and Ferrara (2015) suggest a Generalized Additive Model (GAM) framework for the first step, although any semiparametric or nonparametric technique may be used (Fan et al., 1996). The available methods for the first step are:

• sem.method="gam" invokes gam() from mgcv;

• sem.method="gam.mono" invokes gamlss() from gamlss to impose monotonicity restrictions on inputs;

• sem.method="kernel" invokes npreg() from np;

• sem.method="loess" invokes loess() from stats.
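As a rough illustration of what the first step produces, the sketch below fits a nonparametric regression with loess() from stats on simulated data and inspects the skewness of the residuals. It is not the package's internal code: the data-generating values and the names fit, res and skew are assumptions made for this example only.

```r
# Minimal sketch of the first step only (not the semsfa implementation).
# Data are simulated from a production frontier y = f(x) + v - u.
set.seed(1)
n <- 200
x <- runif(n, 1, 2)
v <- rnorm(n, 0, 1)            # two-sided noise
u <- abs(rnorm(n, 0, 2.5))     # one-sided inefficiency (half-normal)
y <- 2 + 30 * x - 5 * x^2 + v - u
dati <- data.frame(y, x)

# Nonparametric estimate of the conditional expectation E(y | x)
fit <- loess(y ~ x, data = dati)
res <- residuals(fit)          # first-step residuals

# Sample skewness of the residuals; with epsilon = v - u one would
# expect a negative value, with epsilon = v + u a positive one.
res_c <- res - mean(res)
skew <- mean(res_c^3) / mean(res_c^2)^1.5
skew
```

The fitted curve here estimates the "mean" frontier, which lies below the production frontier by the expected inefficiency; the second step is what recovers that shift.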

Since different estimation procedures from different packages may be invoked in the first step, the formula argument has to be compatible with the corresponding function. The available methods for the second step are:

• var.method="fan" pseudolikelihood;

• var.method="mm" Method of Moments.
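The two second-step estimators can be sketched in base R under the normal/half-normal assumption stated above. This is a hedged illustration of the textbook formulas, not the package's code: the residuals are simulated directly, the search interval c(0, 20) for λ is an arbitrary choice, and all object names (res, pl_sigma2, neg_loglik and so on) are hypothetical. The pseudolikelihood part expresses σ as a function of λ through the residual variance and then maximises the log-likelihood of ε = v - u over λ alone with optimize().

```r
# Hypothetical first-step residuals for a production frontier:
# simulated here as centred draws of epsilon = v - u.
set.seed(2)
n <- 500
eps <- rnorm(n, 0, 1) - abs(rnorm(n, 0, 2))
res <- eps - mean(eps)

m2 <- mean(res^2)   # second sample moment of the residuals
m3 <- mean(res^3)   # third sample moment (negative skew expected)

## Method of moments (half-normal case)
# max(0, .) sets sigma_u to zero when the residuals have the "wrong" skewness
mm_su2 <- max(0, sqrt(pi / 2) * (pi / (pi - 4)) * m3)^(2 / 3)
mm_sv2 <- max(0, m2 - (pi - 2) / pi * mm_su2)
mm_sigma <- sqrt(mm_su2 + mm_sv2)

## Pseudolikelihood: sigma is a function of lambda, search over lambda only
pl_sigma2 <- function(lambda) {
  m2 / (1 - 2 * lambda^2 / (pi * (1 + lambda^2)))
}
neg_loglik <- function(lambda) {
  sigma <- sqrt(pl_sigma2(lambda))
  mu <- sqrt(2 / pi) * sigma * lambda / sqrt(1 + lambda^2)  # E(u)
  e <- res - mu                                             # epsilon = residual - E(u)
  # density of epsilon: (2 / sigma) * dnorm(e / sigma) * pnorm(-e * lambda / sigma)
  -sum(log(2) - log(sigma) + dnorm(e / sigma, log = TRUE) +
         pnorm(-e * lambda / sigma, log.p = TRUE))
}
opt <- optimize(neg_loglik, interval = c(0, 20), tol = 1e-05)
lambda_hat <- opt$minimum
sigma_hat <- sqrt(pl_sigma2(lambda_hat))
c(lambda = lambda_hat, sigma = sigma_hat)
```

For a cost frontier (ε = v + u) the same sketch could be applied to -res, since v is symmetric around zero.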

## Value

semsfa() returns an object of class semsfa. A semsfa object is a list containing the following components:

| Component | Description |
|---|---|
| formula | the formula used |
| y | the response variable used as specified in formula |
| data | the data frame used |
| call | the matched call |
| sem.method | the type of semiparametric or nonparametric regression as given by sem.method ("gam", "gam.mono", "kernel", "loess") |
| var.method | the type of error component estimator ("fan", "mm") |
| ineffDecrease | logical, as given by ineffDecrease |
| reg | an object of class "gam", "gamlss" (monotone gam), "np" (kernel) or "loess" depending on sem.method |
| reg.fitted | fitted values on the "mean" frontier (semiparametric/nonparametric regression) |
| regkewness | asymmetry index calculated on residuals obtained in the first step |
| lambda | λ estimate |
| sigma | σ estimate |
| fitted | fitted values on the frontier |
| tol | convergence tolerance for pseudolikelihood estimators used in optimize |
| residual.df | residual degrees of freedom of the model |
| bic | 'Bayesian Information Criterion' according to the formula -2*log-likelihood + log(n)*npar, where npar represents the number of parameters in the fitted model and n the number of observations |
| n.boot | number of bootstrap replicates used (default n.boot=0) |
| boot.mat | a matrix containing λ and σ values from each bootstrap replicate (if n.boot>0) |
| b.se | bootstrapped standard errors for λ and σ (if n.boot>0) |
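The BIC formula quoted for the bic component is simple arithmetic; the lines below apply it to made-up numbers. The values of loglik, npar and n are purely hypothetical and do not come from any fitted model.

```r
# Hypothetical inputs, for illustration only
loglik <- -412.7   # maximised log-likelihood (made-up value)
npar <- 6          # number of parameters in the fitted model (made-up value)
n <- 200           # number of observations

# BIC = -2 * log-likelihood + log(n) * npar
bic <- -2 * loglik + log(n) * npar
bic
```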

## Note

The function summary (i.e. summary.semsfa) can be used to obtain a summary of the results, efficiencies.semsfa to calculate efficiency scores, and plot (i.e. plot.semsfa) to graph efficiency predictions and regression components (i.e. the first step).

You must take the natural logarithm of the response variable before fitting a stochastic frontier production or cost model.
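A minimal sketch of that preparation step, assuming a hypothetical data frame dati with a strictly positive output column y and an input column x (the simulated values and the column name ly are assumptions of this example):

```r
# Hypothetical data: strictly positive output y and a single input x
set.seed(3)
dati <- data.frame(x = runif(50, 1, 2))
dati$y <- exp(1 + 0.5 * log(dati$x) + rnorm(50, 0, 0.1))

# The natural logarithm is only defined for strictly positive values
stopifnot(all(dati$y > 0))

# Logged response, to be used on the left-hand side of the model formula
dati$ly <- log(dati$y)
head(dati)
```

The model formula would then refer to the logged response, for example ly ~ s(x) when sem.method = "gam".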

## Author(s)

Giancarlo Ferrara

## References

Aigner, D., Lovell, C.A.K., Schmidt, P., 1977. Formulation and estimation of stochastic frontier production function models. Journal of Econometrics, 6:21-37.

Fan, Y., Li, Q., Weersink, A., 1996. Semiparametric estimation of stochastic production frontier models. Journal of Business & Economic Statistics, 14:460-468.

Ferrara, G., Vidoli, F., 2017. Semiparametric stochastic frontier models: A generalized additive model approach. European Journal of Operational Research, 258:761-777.

Hastie, T., Tibshirani, R., 1990. Generalized additive models. Chapman & Hall.

Kumbhakar, S.C., Lovell, C.A.K., 2000. Stochastic Frontier Analysis. Cambridge University Press, U.K.

Meeusen, W., van den Broeck, J., 1977. Efficiency estimation from Cobb-Douglas production functions with composed error. International Economic Review, 18:435-444.

Vidoli, F., Ferrara, G., 2015. Analyzing Italian citrus sector by semi-nonparametric frontier efficiency models. Empirical Economics, 49:641-658.

## See Also

summary.semsfa, efficiencies.semsfa, plot.semsfa.

## Examples

    library(semsfa)

    set.seed(0)
    n <- 200

    x <- runif(n, 1, 2)
    v <- rnorm(n, 0, 1)          # two-sided noise
    u <- abs(rnorm(n, 0, 2.5))   # one-sided inefficiency

    # cost frontier: inefficiency is added to the frontier
    fy <- 2 + 30 * x + 5 * x^2
    y <- fy + v + u

    dati <- data.frame(y, x)

    # first step: gam, second step: fan
    o <- semsfa(y ~ s(x), dati, sem.method = "gam", ineffDecrease = FALSE)

    # first step: gam, second step: mm
    ## Not run: o<-semsfa(y~s(x),dati,sem.method="gam",ineffDecrease=FALSE,var.method="mm")

    # data, true frontier (black) and estimated frontier (green)
    plot(x, y)
    curve(2 + 30 * x + 5 * x^2, add = TRUE)
    points(sort(x), o$fitted[order(x)], col = 3, type = "l")

    # production frontier: inefficiency is subtracted from the frontier
    fy <- 2 + 30 * x - 5 * x^2
    y <- fy + v - u

    dati <- data.frame(y, x)

    # first step: gam, second step: fan
    o <- semsfa(y ~ s(x), dati, sem.method = "gam", ineffDecrease = TRUE)

    plot(x, y)
    curve(2 + 30 * x - 5 * x^2, add = TRUE)
    points(sort(x), o$fitted[order(x)], col = 3, type = "l")

    # imposing monotonicity restrictions on inputs
    set.seed(25)
    n <- 150
    x <- runif(n, 0, 3)
    u <- abs(rnorm(n, 0, 1))
    v <- rnorm(n, 0, .75 * ((pi - 2) / pi))

    # production frontier
    fy <- 10 - 5 * exp(-x)
    y <- fy + v - u

    dati <- data.frame(y, x)

    # first step: monotone gam, second step: fan
    o <- semsfa(y ~ pbm(x, mono = "up"), sem.method = "gam.mono", dati)

    plot(x, y)
    curve(10 - 5 * exp(-x), add = TRUE)
    points(sort(x), o$fitted[order(x)], col = 3, type = "l")