This vignette provides a short overview on the functionalities for model selection in the onlineforecast package. It follows up on the vignettes setup-data and setup-and-use-model, and continues the building load forecast modelling presented there. If something is introduced in the present text, but not explained, then have a look in the two preceding vignettes to find an explanation.
First Load the package, setup the model and calculate the forecasts:
# Load the package
library(onlineforecast)
Just start by taking a rather short data set:
# The data, just a rather short period to keep running times short
<- subset(Dbuilding, c("2010-12-15", "2011-02-01"))
D # Set the score period
$scoreperiod <- in_range("2010-12-22", D$t)
D#
$tday <- make_tday(D$t, 1:36) D
As a test we generate a random sequence and will use that as an input. In the model selection this should not be selected in the final model:
# Generate an input which is just random noise, i.e. should be removed in the selection
set.seed(83792)
$noise <- make_input(rnorm(length(D$t)), 1:36) D
Set up a model, which will serve as the full model in the selection:
# The full model
<- forecastmodel$new()
model # Set the model output
$output = "heatload"
model# Inputs (transformation step)
$add_inputs(Ta = "Ta",
modelnoise = "noise",
mu_tday = "fs(tday/24, nharmonics=4)",
mu = "one()")
# Regression step parameters
$add_regprm("rls_prm(lambda=0.9)")
model# Optimization bounds for parameters
$add_prmbounds(lambda = c(0.9, 0.99, 0.9999)) model
Finally, set the horizons to run (just keep a vector for later use):
# Select a model, just run optimization and score for a single horizon
$kseq <- 5 model
Now we can use the step_optim()
function for the selection. In each step new models are generated, with either one removed input or one added input, and then all the generated models are optimized and their scores compared. If any new model have an improved score compared to the currently selected model, then the new is selected and the process is repeated until no new improvement is achieved.
In addition to selecting inputs, then integer parameters can also be stepped through, e.g. the order of basis splined or the number of harmonics in Fourier series. Set which parameters to change in the step:
# The range to select the number of harmonics parameter in
<- list(mu_tday__nharmonics = c(min=2, max=6)) prm
Several stepping procedures are implemented:
The default procedure is backward selection with stepping in both directions. To make compilation of the vignette feasible some arguments were set, for real applications change the argument “control=list(maxit=1)” and “mc.cores=1”:
# Run the default selection, which is "both" and equivalent to "backwadboth"
# Note the control argument, which is passed to optim, it's now set to few
# iterations in the prm optimization
<- step_optim(model, D, prm, direction="both", control=list(maxit=1), mc.cores=1) Lboth
We now have the models selected in each step in and we see that the final model is decreased:
getse(Lboth, "model")
## $step1
##
## Output: heatload
## Inputs: Ta = Ta
## noise = noise
## mu_tday = fs(tday/24, nharmonics=4)
## mu = one()
##
##
## $step2
##
## Output: heatload
## Inputs: Ta = Ta
## noise = noise
## mu_tday = fs(tday/24, nharmonics=5)
## mu = one()
##
##
## $final
##
## Output: heatload
## Inputs: Ta = Ta
## mu_tday = fs(tday/24, nharmonics=5)
## mu = one()
Forward selection:
<- step_optim(model, D, prm, "forward", control=list(maxit=1), mc.cores=1) Lforward
getse(Lforward, "model")
## $step1
##
## Output: heatload
## Inputs: mu = one()
##
##
## $step2
##
## Output: heatload
## Inputs: Ta = Ta
## mu = one()
##
##
## $step3
##
## Output: heatload
## Inputs: Ta = Ta
## mu_tday = fs(tday/24, nharmonics=2)
## mu = one()
##
##
## $step4
##
## Output: heatload
## Inputs: Ta = Ta
## mu_tday = fs(tday/24, nharmonics=3)
## mu = one()
##
##
## $step5
##
## Output: heatload
## Inputs: Ta = Ta
## mu_tday = fs(tday/24, nharmonics=4)
## mu = one()
##
##
## $final
##
## Output: heatload
## Inputs: Ta = Ta
## mu_tday = fs(tday/24, nharmonics=5)
## mu = one()
Same model is selected, which is also the case in backwards selection:
<- step_optim(model, D, prm, "backward", control=list(maxit=1), mc.cores=1) Lbackward
getse(Lbackward, "model")
## $step1
##
## Output: heatload
## Inputs: Ta = Ta
## noise = noise
## mu_tday = fs(tday/24, nharmonics=6)
## mu = one()
##
##
## $step2
##
## Output: heatload
## Inputs: Ta = Ta
## noise = noise
## mu_tday = fs(tday/24, nharmonics=5)
## mu = one()
##
##
## $final
##
## Output: heatload
## Inputs: Ta = Ta
## mu_tday = fs(tday/24, nharmonics=5)
## mu = one()
Give a starting model. The selection will start from this model:
# Clone the model to make a starting model
<- model$clone_deep()
modelstart # Remove two inputs
$inputs[2:3] <- NULL
modelstart# Run the selection
<- step_optim(model, D, prm, modelstart=modelstart, control=list(maxit=1), mc.cores=1) L
getse(L, "model")
## $step1
##
## Output: heatload
## Inputs: Ta = Ta
## mu = one()
##
##
## $step2
##
## Output: heatload
## Inputs: Ta = Ta
## mu_tday = fs(tday/24, nharmonics=4)
## mu = one()
##
##
## $final
##
## Output: heatload
## Inputs: Ta = Ta
## mu_tday = fs(tday/24, nharmonics=5)
## mu = one()
Note, that caching can be really smart (the cache files are located in the cachedir folder (folder in current working directory, can be removed with unlink(foldername))
. See e.g. ?rls_optim
for how the caching works. Give the arguments ‘cachedir=“cache”, cachererun=FALSE)’, which will be passed on to rls_optim()
.