knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 6, fig.height = 4 ) library('adea')
Variable selection in Data Envelopment Analysis (DEA) is a crucial consideration that requires careful attention before the results of an analysis can be applied in a real-world context. This is because the outcomes can vary significantly depending on the variables included in the model. Therefore, variable selection is a fundamental step in every DEA application.
One of the methods proposed for this is what is known as the feature selection method. This method constructs a linear programming problem to maximize some objective function related to the dmu efficiencies.
fsdea
function implements the feature selection method in the article [@Benites2020]
Let's load and inspect the "tokyo_libraries" dataset using the following code:
data(tokyo_libraries) head(tokyo_libraries)
Lets't take inputs as: Area.I1
, Books.I2
, Staff.I3
and Populations.I4
variables and outputs as Regist.O1
and Borrow.O2
input <- tokyo_libraries[, 1:4] output <- tokyo_libraries[, 5:6]
First, let's do a standard DEA analysis with:"
dea <- dea(input, output) dea
Summarizing DEA, we can calculate the average efficiency:
summary(dea)
Suposse that we want a model with only 5 variables, then the following call does the job:
dea5v <- fsdea(input, output, nvariables = 5) dea5v
We can calculate the average efficiency for the new model by summarizing it:
summary(dea5v)
deasummary <- summary(dea) deaaverage <- deasummary$`Eff. Med` dea5vsummary <- summary(dea5v) dea5vaverage <- dea5vsummary$`Eff. Med`
Observe that average efficiency has decreased from r deaaverage
to r dea5vaverage
.
This could indicate that the new model is not an improvement over the previous one.
To delve deeper into the results, we can plot the efficiencies:
plot(dea$eff, dea5v$eff) abline(a = 0, b = 1)
The graph reveals that most efficiencies are very similar, a fact that is confirmed by the correlation coefficient r cor(dea$eff, dea5v$eff)
which is very close to 1.
In the previous case, reducing the number of variables led to a decrease in one input.
We achieved the same result with the call fsdea(input, output, ninputs = 3)
.
However, perhaps our goal is to reduce an output.
Let's achieve this with the following call:
dea1o <- fsdea(input, output, noutputs = 1) dea1o
dea1osummary <- summary(dea1o) dea1oaverage <- dea1osummary$`Eff. Mean`
Observe that, again, average efficiency has decreased from r deaaverage
to r dea1oaverage
.
To delve deeper into the results, we can plot the new efficiencies:
plot(dea$eff, dea1o$eff) abline(a = 0, b = 1)
In this case, the differences are more significant compared to the previous two models. This is confirmed by a lower correlation coefficient r cor(dea$eff, dea1o$eff)
.
All these could indicate that to enhance the model, it may be necessary to remove multiple variables simultaneously.
Any scripts or data that you put into this service are public.
Add the following code to your website.
For more information on customizing the embed code, read Embedding Snippets.