nntrf hyper-parameter tuning

Ricardo Aler

2021-02-26

library(nntrf)
library(mlr)
library(mlrCPO)
library(FNN)

nntrf Hyper-parameter Tuning

nntrf has several hyper-parameters which are important in order to obtain good results. Those are:

- size: the number of neurons in the hidden layer and, therefore, the number of dimensions of the transformed space.
- repetitions: the number of times the neural network training is repeated.
- maxit: the maximum number of training iterations.
- use_sigmoid: whether the sigmoid of the hidden layer is applied as part of the transformation.
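
They can be seen in a minimal direct call to nntrf (a sketch, assuming the doughnutRandRotated dataset used throughout this vignette, where the target V11 sits in column 11):

library(nntrf)
data("doughnutRandRotated")
set.seed(0)
# Train the transformation: 4 hidden neurons, 2 training repetitions,
# at most 100 iterations
nnpo <- nntrf(formula=V11~., data=doughnutRandRotated,
              repetitions=2, size=4, maxit=100, trace=FALSE)
# Apply it: the 10 original inputs are projected onto 4 dimensions
trf_x <- nnpo$trf(x=as.matrix(doughnutRandRotated[,-11]), use_sigmoid=FALSE)
dim(trf_x)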

Machine learning pipelines usually contain two kinds of steps: pre-processing and classification/regression. Both kinds of steps have hyper-parameters, and they are optimized together. nntrf is a pre-processing step. The classifier used after pre-processing will be KNN, whose main hyper-parameter is the number of neighbors (k). Hyper-parameter tuning could be programmed from scratch, but it is more convenient to use the procedures already available in machine learning packages such as mlr or caret. In this case, mlr will be used. The code to do that is described below.

The next piece of code has nothing to do with nntrf. It just establishes that the doughnutRandRotated dataset is going to be used (with target variable “V11”), that grid search is going to be used for hyper-parameter tuning, and that an outer 3-fold crossvalidation will be used to evaluate models, while an inner 3-fold crossvalidation will be used for hyper-parameter tuning.

data("doughnutRandRotated")
doughnut_task <- makeClassifTask(data = doughnutRandRotated, target = "V11")
control_grid <- makeTuneControlGrid()
inner_desc <- makeResampleDesc("CV", iters=3)
outer_desc <- makeResampleDesc("CV", iters=3)
set.seed(0)
outer_inst <- makeResampleInstance(outer_desc, doughnut_task)

mlrCPO, a package that extends mlr, is going to be used to combine pre-processing and learning into a single pipeline. In order to do that, nntrf must be defined as a pipeline step, as follows. Basically, this defines the cpo.train and cpo.retrafo methods. The former trains the neural network and stores the hidden layer weights; the latter applies the transformation to a dataset. pSS is used to declare the main nntrf hyper-parameters. The piece of code below can just be copied for use in other scripts.

cpo_nntrf = makeCPO("nntrfCPO",
                       # Here, the hyper-parameters of nntrf are defined
                       pSS(repetitions = 1 : integer[1, ],
                           size: integer[1, ],
                           maxit = 100 : integer[1, ],
                           use_sigmoid = FALSE: logical),
                       dataformat = "numeric",
                       # cpo.train fits nntrf; its return value (the fitted
                       # nntrf object) is kept by mlrCPO as the control object
                       cpo.train = function(data, target,
                                            repetitions,
                                            size, maxit, use_sigmoid) {
                         data_and_class <- cbind(as.data.frame(data), class=target[[1]])
                         nnpo <- nntrf(repetitions=repetitions,
                                       formula=class~.,
                                       data=data_and_class,
                                       size=size, maxit=maxit, trace=FALSE)
                       },
                       # cpo.retrafo applies the stored transformation to (new) data
                       cpo.retrafo = function(data, control,
                                              repetitions,
                                              size, maxit, use_sigmoid) {
                         trf_x <- control$trf(x=data, use_sigmoid=use_sigmoid)
                         trf_x
                       })
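
Although not needed for the pipeline below, a quick way to check that the CPO works is to apply it on its own to the task (a sketch; size is given explicitly because it is the only hyper-parameter without a default):

# Transform the task with a 4-neuron hidden layer and inspect the result
trf_task <- doughnut_task %>>% cpo_nntrf(size=4)
head(getTaskData(trf_task))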

Next, the pipeline of pre-processing + classifier method (KNN in this case) is defined.

# knn is the machine learning method. The knn available in the FNN package is used
knn_lrn <- makeLearner("classif.fnn")
# Then, knn is combined with nntrf's preprocessing into a pipeline
knn_nntrf <- cpo_nntrf() %>>% knn_lrn
# Just in case, we fix the values of the hyper-parameters that we do not need to optimize
# (not necessary, because they already have default values. Just to make their values explicit)
knn_nntrf <- setHyperPars(knn_nntrf, nntrfCPO.repetitions=1, nntrfCPO.maxit=100,
                          nntrfCPO.use_sigmoid=FALSE)

# However, we are going to use 2 repetitions here, instead of 1 (the default):

knn_nntrf <- setHyperPars(knn_nntrf, nntrfCPO.repetitions=2)

Next, the hyper-parameter space for the pipeline is defined. Only two hyper-parameters will be optimized: the number of KNN neighbors (k), from 1 to 7, and the number of hidden neurons (size), from 1 to 10. The remaining hyper-parameters keep the values fixed above.

ps <- makeParamSet(makeDiscreteParam("k", values = 1:7),
                   makeDiscreteParam("nntrfCPO.size", values = 1:10)
)

Next, an mlr wrapper is used to give the knn_nntrf pipeline the ability to do hyper-parameter tuning.

knn_nntrf_tune <- makeTuneWrapper(knn_nntrf, resampling = inner_desc, par.set = ps,
                                  control = control_grid, measures = list(acc), show.info = FALSE)

Finally, the complete process is run: an inner 3-fold crossvalidation for hyper-parameter tuning, nested inside an outer 3-fold crossvalidation for model evaluation. It takes some time.

set.seed(0)
# Please note that, in order to save time, results have been precomputed
cached <- system.file("extdata", "error_knn_nntrf_tune.rda", package = "nntrf")
if(file.exists(cached)){load(cached)} else {
error_knn_nntrf_tune <- resample(knn_nntrf_tune, doughnut_task, outer_inst, 
                                 measures = list(acc), 
                                 extract = getTuneResult, show.info =  FALSE)
#save(error_knn_nntrf_tune, file="../inst/extdata/error_knn_nntrf_tune.rda")
}

Errors and optimal hyper-parameters are as follows (the 3-fold inner hyper-parameter tuning crossvalidation accuracy is also shown in acc.test.mean). nntrfCPO.size is the number of hidden neurons selected by hyper-parameter tuning. Although the actual doughnut is defined in two dimensions only (so a size of 2 should suffice in principle), hyper-parameter tuning is not able to reduce dimensionality that much in this case. But it will be shown later that the accuracy obtained by nntrf+KNN is good.

print(error_knn_nntrf_tune$extract)
#> [[1]]
#> Tune result:
#> Op. pars: k=4; nntrfCPO.size=10
#> acc.test.mean=0.9602523
#> 
#> [[2]]
#> Tune result:
#> Op. pars: k=4; nntrfCPO.size=8
#> acc.test.mean=0.9631010
#> 
#> [[3]]
#> Tune result:
#> Op. pars: k=3; nntrfCPO.size=5
#> acc.test.mean=0.9708971

The final outer 3-fold crossvalidation accuracy is displayed in the next cell. Please note that this acc.test.mean corresponds to the outer 3-fold crossvalidation, while the acc.test.mean values above correspond to the inner 3-fold crossvalidation accuracies (computed during hyper-parameter tuning).

print(error_knn_nntrf_tune$aggr)
#> acc.test.mean 
#>     0.9655999

Although not required, mlr can also display the results for the different hyper-parameter value combinations, sorted by the inner 3-fold crossvalidation accuracy, from best to worst.

library(dplyr)
results_hyper <- generateHyperParsEffectData(error_knn_nntrf_tune)
head(arrange(results_hyper$data, -acc.test.mean))
#>   k nntrfCPO.size acc.test.mean iteration exec.time nested_cv_run
#> 1 3             5     0.9708971        31     2.821             3
#> 2 7             4     0.9668467        28     2.604             3
#> 3 4             8     0.9631010        53     4.002             2
#> 4 7             8     0.9610013        56     3.876             2
#> 5 4            10     0.9602523        67     4.363             1
#> 6 5             8     0.9601016        54     4.108             2
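
Given that grid search evaluates every (k, size) combination in each of the three outer runs, the inner accuracies can also be averaged across runs to get a more stable ranking (a sketch using the dplyr functions already loaded above):

results_hyper$data %>%
  group_by(k, nntrfCPO.size) %>%
  summarise(mean_acc = mean(acc.test.mean), .groups="drop") %>%
  arrange(-mean_acc) %>%
  head()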

We can also check directly what would happen with only 4 neurons (and 5 neighbors).

knn_nntrf <- cpo_nntrf() %>>% makeLearner("classif.fnn")

knn_nntrf <- setHyperPars(knn_nntrf, nntrfCPO.repetitions=2, nntrfCPO.maxit=100,
                          nntrfCPO.use_sigmoid=FALSE, k=5, nntrfCPO.size=4)

set.seed(0)
# Please note that, in order to save time, results have been precomputed
cached <- system.file("extdata", "error_knn_nntrf.rda", package = "nntrf")
if(file.exists(cached)){load(cached)} else {
  error_knn_nntrf <- resample(knn_nntrf, doughnut_task, outer_inst, measures = list(acc), 
                            show.info =  FALSE)
#save(error_knn_nntrf, file="../inst/extdata/error_knn_nntrf.rda")
}
# First, the three evaluations of the outer 3-fold crossvalidation, one per fold:
print(error_knn_nntrf$measures.test)
#>   iter       acc
#> 1    1 0.9564956
#> 2    2 0.9741974
#> 3    3 0.9271146
# Second, their average
print(error_knn_nntrf$aggr)
#> acc.test.mean 
#>     0.9526025

Hyper-parameter tuning with PCA

In order to compare a supervised transformation method (nntrf) with an unsupervised one (PCA), it is very easy to do exactly the same pre-processing with PCA. In this case, the main hyper-parameters are k (the number of KNN neighbors) and pca.rank (the number of PCA components to be used, which is the counterpart of size, the number of hidden neurons used by nntrf).

knn_pca <- cpoPca(center=TRUE, scale=TRUE, export=c("rank")) %>>% knn_lrn

ps_pca <- makeParamSet(makeDiscreteParam("k", values = 1:7),
                       makeDiscreteParam("pca.rank", values = 1:10)
)

knn_pca_tune <- makeTuneWrapper(knn_pca, resampling = inner_desc, par.set = ps_pca,
                                control = control_grid, measures = list(acc), show.info = FALSE)
set.seed(0)
# Please note that, in order to save time, results have been precomputed
cached <- system.file("extdata", "error_knn_pca_tune.rda", package = "nntrf")
if(file.exists(cached)){load(cached)} else {
error_knn_pca_tune <- resample(knn_pca_tune, doughnut_task, outer_inst, 
                               measures = list(acc), 
                               extract = getTuneResult, show.info =  FALSE)
#save(error_knn_pca_tune, file="../inst/extdata/error_knn_pca_tune.rda")
}

It can be seen below that while nntrf was able to get a high accuracy, PCA only reaches about 0.64. Also, the number of components selected by PCA is the maximum allowed (pca.rank=10).

print(error_knn_pca_tune$extract)
#> [[1]]
#> Tune result:
#> Op. pars: k=2; pca.rank=10
#> acc.test.mean=0.6338697
#> 
#> [[2]]
#> Tune result:
#> Op. pars: k=6; pca.rank=10
#> acc.test.mean=0.6401682
#> 
#> [[3]]
#> Tune result:
#> Op. pars: k=6; pca.rank=10
#> acc.test.mean=0.6398140
print(error_knn_pca_tune$aggr)
#> acc.test.mean 
#>     0.6384994
results_hyper <- generateHyperParsEffectData(error_knn_pca_tune)
head(arrange(results_hyper$data, -acc.test.mean))
#>   k pca.rank acc.test.mean iteration exec.time nested_cv_run
#> 1 6       10     0.6401682        69     1.880             2
#> 2 6       10     0.6398140        69     1.760             3
#> 3 4       10     0.6380138        67     1.634             3
#> 4 4       10     0.6362675        67     1.763             2
#> 5 2       10     0.6347687        65     1.762             2
#> 6 2       10     0.6338697        65     1.475             1

Hyper-parameter tuning with just KNN

For completeness' sake, below are the results with no pre-processing, just KNN (they are very similar to the ones obtained with PCA):

ps_knn <- makeParamSet(makeDiscreteParam("k", values = 1:7))

knn_tune <- makeTuneWrapper(knn_lrn, resampling = inner_desc, par.set = ps_knn,
                            control = control_grid, measures = list(acc), show.info = FALSE)

set.seed(0)
# Please note that, in order to save time, results have been precomputed
cached <- system.file("extdata", "error_knn_tune.rda", package = "nntrf")
if(file.exists(cached)){load(cached)} else {
error_knn_tune <- resample(knn_tune, doughnut_task, outer_inst, measures = list(acc), 
                           extract = getTuneResult, show.info =  FALSE)
#save(error_knn_tune, file="../inst/extdata/error_knn_tune.rda")
}
print(error_knn_tune$extract)
#> [[1]]
#> Tune result:
#> Op. pars: k=6
#> acc.test.mean=0.6362696
#> 
#> [[2]]
#> Tune result:
#> Op. pars: k=6
#> acc.test.mean=0.6343180
#> 
#> [[3]]
#> Tune result:
#> Op. pars: k=4
#> acc.test.mean=0.6336634
print(error_knn_tune$aggr)
#> acc.test.mean 
#>     0.6383997
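
Finally, the three outer crossvalidation accuracies computed above can be gathered side by side, which makes the comparison explicit (a sketch using the result objects already in memory):

data.frame(method=c("nntrf+knn", "pca+knn", "knn"),
           accuracy=c(error_knn_nntrf_tune$aggr,
                      error_knn_pca_tune$aggr,
                      error_knn_tune$aggr))

The supervised transformation reaches an outer accuracy of about 0.96 on this dataset, while both PCA and plain KNN stay around 0.64.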