Variable selection in DEA requires careful attention before the results of an analysis are used in a real case, because those results can change significantly depending on the variables included in the model. Variable selection is therefore a key step in every DEA application.
A selection procedure can lead to the removal of a variable that the decision maker wants to keep in the model for political, tactical or other reasons. If the variable is simply kept, however, its contribution to the model will be negligible. The cadea function provides a way to force the contribution of a variable to the model to be at least a given value.
For more information about loads, see the help of the adea package, or see (Fernandez-Palacin, Lopez-Sanchez, and Munoz-Marquez 2018) and (Villanueva-Cantillo and Munoz-Marquez 2021).
Let’s load the adea package and have a look at the tokyo_libraries dataset with
library(adea)
data(tokyo_libraries)
head(tokyo_libraries)
#> Area.I1 Books.I2 Staff.I3 Populations.I4 Regist.O1 Borrow.O2
#> 1 2.249 163.523 26 49.196 5.561 105.321
#> 2 4.617 338.671 30 78.599 18.106 314.682
#> 3 3.873 281.655 51 176.381 16.498 542.349
#> 4 5.541 400.993 78 189.397 30.810 847.872
#> 5 11.381 363.116 69 192.235 57.279 758.704
#> 6 10.086 541.658 114 194.091 66.137 1438.746
First of all, let’s run adea with the following call:
input <- tokyo_libraries[, 1:4]
output <- tokyo_libraries[, 5:6]
m <- adea(input, output)
summary(m)
#> Model name:
#> Orientation is input
#> Inputs: Area.I1 Books.I2 Staff.I3 Populations.I4
#> Outputs: Regist.O1 Borrow.O2
#> Input loads: 0.455467 1.337169 0.9818858 1.225478
#> Output loads: 0.7639428 1.236057
#> Model load: 0.455466997834608
#> #Efficients: 6
#> Efficiencies:
#> 1 2 3 4 5 6 7 8
#> 0.3500108 0.7918292 0.5733000 0.7186833 1.0000000 1.0000000 0.6967419 0.5803315
#> 9 10 11 12 13 14 15 16
#> 1.0000000 0.7051438 0.5689146 0.7583527 0.7474946 0.7215430 0.8440736 0.5822710
#> 17 18 19 20 21 22 23
#> 1.0000000 0.7867065 1.0000000 0.8485716 0.7872304 0.7849437 1.0000000
#> Summary of efficiencies:
#> Mean sd Min. 1st Qu. Median 3rd Qu. Max.
#> 0.7759192 0.1747024 0.3500108 0.7009429 0.7849437 0.9242858 1.0000000
It shows that Area.I1 has a load under 0.6, which means its contribution to the DEA model is negligible.
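The reading of the summary above can be reproduced by hand. The following is a minimal sketch in base R that copies the loads from the printed output (rounded as shown) and flags those under 0.6. The names input_loads, output_loads, threshold and low are illustrative and not part of the adea package, and the sketch assumes, as the printout suggests, that the model load is the smallest of the variable loads.

```r
# Loads copied from the summary(m) output above (rounded as printed).
input_loads <- c(Area.I1 = 0.455467, Books.I2 = 1.337169,
                 Staff.I3 = 0.9818858, Populations.I4 = 1.225478)
output_loads <- c(Regist.O1 = 0.7639428, Borrow.O2 = 1.236057)

# Cut-off used in the text for a negligible contribution.
threshold <- 0.6

# Names of the variables whose load falls under the cut-off.
all_loads <- c(input_loads, output_loads)
low <- names(all_loads)[all_loads < threshold]
print(low)

# Assumption: the model load is the minimum over all variable loads.
model_load <- min(all_loads)
print(model_load)
```

Only the variables listed in low are candidates for a load constraint; the rest already contribute above the cut-off.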
With the following call to cadea, the contribution of Area.I1 is forced to be at least 0.6:
mc <- cadea(input, output, load.min = 0.6, load.max = 4)
summary(mc)
#> Model name:
#> Orientation is input
#> Inputs: Area.I1 Books.I2 Staff.I3 Populations.I4
#> Outputs: Regist.O1 Borrow.O2
#> Input loads: 0.6 1.164404 0.932502 1.303094
#> Output loads: 0.8001614 1.199839
#> Model load: 0.600000000000042
#> #Efficients: 6
#> Efficiencies:
#> 1 2 3 4 5 6 7 8
#> 0.3490718 0.7918292 0.5697767 0.7070362 1.0000000 1.0000000 0.6967419 0.5802858
#> 9 10 11 12 13 14 15 16
#> 1.0000000 0.7051438 0.5689146 0.7583527 0.7474946 0.7215430 0.8302530 0.5822710
#> 17 18 19 20 21 22 23
#> 1.0000000 0.7691173 1.0000000 0.8485716 0.7872304 0.7815638 1.0000000
#> Summary of efficiencies:
#> Mean sd Min. 1st Qu. Median 3rd Qu. Max.
#> 0.7737042 0.1749367 0.3490718 0.7009429 0.7691173 0.9242858 1.0000000
Note that the maximum value a variable load can take is the number of variables of its type (4 inputs and 2 outputs here), so load.max = 4 has no effect on the results.
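The bound on the loads can be checked against the printed output. The sketch below, in base R, copies the loads reported by summary(mc) and verifies that they add up, within rounding, to the number of variables of each type. It assumes that loads are non-negative, so that no single load can exceed that total. The variable names are illustrative only.

```r
# Loads copied from the summary(mc) output above (rounded as printed).
input_loads <- c(Area.I1 = 0.6, Books.I2 = 1.164404,
                 Staff.I3 = 0.932502, Populations.I4 = 1.303094)
output_loads <- c(Regist.O1 = 0.8001614, Borrow.O2 = 1.199839)

# Within each type the loads add up to the number of variables of that type
# (4 inputs, 2 outputs), up to the rounding of the printout.
sum_in <- sum(input_loads)
sum_out <- sum(output_loads)
print(c(inputs = sum_in, outputs = sum_out))

# Assumption: loads are non-negative, so each load is bounded by its type total.
# A cap of 4 is therefore never reached by any load in this model.
load_max <- 4
binding <- any(c(input_loads, output_loads) > load_max)
print(binding)
```

A value of load.max below the number of variables of a type would be needed for the upper bound to constrain anything.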
The load level now rises to the given value of 0.6, and the average efficiency decreases slightly, from 0.7759 to 0.7737.
To compare both sets of efficiencies, observe that the Spearman correlation coefficient between them is 0.9918. This can also be seen in the next plot:
All this means that the changes are small in this case. Bigger changes can be expected as load.min grows.
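The comparison between the two sets of efficiencies can be sketched in base R as follows. The scores are copied from the two printed summaries, so they are rounded to seven decimals, and the correlation obtained from them need not reproduce the figure quoted in the text exactly. The names eff_adea, eff_cadea, mean_change, rho and changed are illustrative.

```r
# Efficiencies copied from summary(m) (rounded as printed).
eff_adea <- c(0.3500108, 0.7918292, 0.5733000, 0.7186833, 1, 1,
              0.6967419, 0.5803315, 1, 0.7051438, 0.5689146, 0.7583527,
              0.7474946, 0.7215430, 0.8440736, 0.5822710, 1, 0.7867065,
              1, 0.8485716, 0.7872304, 0.7849437, 1)

# Efficiencies copied from summary(mc) (rounded as printed).
eff_cadea <- c(0.3490718, 0.7918292, 0.5697767, 0.7070362, 1, 1,
               0.6967419, 0.5802858, 1, 0.7051438, 0.5689146, 0.7583527,
               0.7474946, 0.7215430, 0.8302530, 0.5822710, 1, 0.7691173,
               1, 0.8485716, 0.7872304, 0.7815638, 1)

# Change in average efficiency after imposing load.min = 0.6.
mean_change <- mean(eff_cadea) - mean(eff_adea)
print(mean_change)

# Rank agreement between both sets of scores.
rho <- cor(eff_adea, eff_cadea, method = "spearman")
print(rho)

# DMUs whose printed score differs between the two models.
changed <- which(abs(eff_adea - eff_cadea) > 1e-7)
print(changed)
```

A scatter plot of eff_adea against eff_cadea, for instance with plot(eff_adea, eff_cadea), gives a visual version of the same comparison.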
Fernandez-Palacin, Fernando, María Auxiliadora Lopez-Sanchez, and Manuel Munoz-Marquez. 2018. “Stepwise selection of variables in DEA using contribution loads.” Pesquisa Operacional 38 (1): 31–52. http://dx.doi.org/10.1590/0101-7438.2018.038.01.0031.
Villanueva-Cantillo, Jeyms, and Manuel Munoz-Marquez. 2021. “Methodology for Calculating Critical Values of Relevance Measures in Variable Selection Methods in Data Envelopment Analysis.” European Journal of Operational Research 290 (2): 657–70. https://doi.org/10.1016/j.ejor.2020.08.021.
Universidad de Cádiz, fernando.fernandez@uca.es
Universidad de Cádiz, manuel.munoz@uca.es