Ensemble Algorithms for Time Series Forecasting with Modeltime
A modeltime extension that implements ensemble forecasting methods, including model averaging, weighted averaging, and stacking. Let's go on a guided tour to kick the tires on modeltime.ensemble.
We'll perform the simplest type of ensemble forecasting: using a simple average of the forecasted models.
Note that modeltime.ensemble has capabilities for more sophisticated model ensembling, such as weighted averaging and stacking; a brief sketch of both appears near the end of this tutorial.
Load libraries to complete this short tutorial.
# Time Series ML
library(tidymodels)
library(modeltime)
library(modeltime.ensemble)
# Core
library(tidyverse)
library(timetk)
interactive <- FALSE
We'll use the m750 dataset that comes with modeltime.ensemble. We can visualize the dataset.
m750 %>%
    plot_time_series(date, value, .color_var = id, .interactive = interactive)
We’ll split into a training and testing set.
splits <- time_series_split(m750, assess = "2 years", cumulative = TRUE)

splits %>%
    tk_time_series_cv_plan() %>%
    plot_time_series_cv_plan(date, value, .interactive = interactive)
Once the data has been collected, we can move into modeling.
We’ll create a Feature Engineering Recipe that can be applied to the data to create features that machine learning models can key in on. This will be most useful for the Elastic Net (Model 3).
recipe_spec <- recipe(value ~ date, training(splits)) %>%
    step_timeseries_signature(date) %>%
    step_rm(matches("(.iso$)|(.xts$)")) %>%
    step_normalize(matches("(index.num$)|(_year$)")) %>%
    step_dummy(all_nominal()) %>%
    step_fourier(date, K = 1, period = 12)

recipe_spec %>% prep() %>% juice()
#> # A tibble: 282 x 42
#> date value date_index.num date_year date_half date_quarter date_month
#> <date> <dbl> <dbl> <dbl> <int> <int> <int>
#> 1 1990-01-01 6370 -1.72 -1.66 1 1 1
#> 2 1990-02-01 6430 -1.71 -1.66 1 1 2
#> 3 1990-03-01 6520 -1.70 -1.66 1 1 3
#> 4 1990-04-01 6580 -1.69 -1.66 1 2 4
#> 5 1990-05-01 6620 -1.67 -1.66 1 2 5
#> 6 1990-06-01 6690 -1.66 -1.66 1 2 6
#> 7 1990-07-01 6000 -1.65 -1.66 2 3 7
#> 8 1990-08-01 5450 -1.64 -1.66 2 3 8
#> 9 1990-09-01 6480 -1.62 -1.66 2 3 9
#> 10 1990-10-01 6820 -1.61 -1.66 2 4 10
#> # ... with 272 more rows, and 35 more variables: date_day <int>,
#> # date_hour <int>, date_minute <int>, date_second <int>, date_hour12 <int>,
#> # date_am.pm <int>, date_wday <int>, date_mday <int>, date_qday <int>,
#> # date_yday <int>, date_mweek <int>, date_week <int>, date_week2 <int>,
#> # date_week3 <int>, date_week4 <int>, date_mday7 <int>,
#> # date_month.lbl_01 <dbl>, date_month.lbl_02 <dbl>, date_month.lbl_03 <dbl>,
#> # date_month.lbl_04 <dbl>, date_month.lbl_05 <dbl>, ...
First, we’ll make an ARIMA model using Auto ARIMA.
model_spec_arima <- arima_reg() %>%
    set_engine("auto_arima")

wflw_fit_arima <- workflow() %>%
    add_model(model_spec_arima) %>%
    add_recipe(recipe_spec %>% step_rm(all_predictors(), -date)) %>%
    fit(training(splits))
Next, we’ll make a Prophet Model.
model_spec_prophet <- prophet_reg() %>%
    set_engine("prophet")

wflw_fit_prophet <- workflow() %>%
    add_model(model_spec_prophet) %>%
    add_recipe(recipe_spec %>% step_rm(all_predictors(), -date)) %>%
    fit(training(splits))
Third, we'll make an Elastic Net Model using glmnet.
model_spec_glmnet <- linear_reg(
    mixture = 0.9,
    penalty = 4.36e-6
) %>%
    set_engine("glmnet")

wflw_fit_glmnet <- workflow() %>%
    add_model(model_spec_glmnet) %>%
    add_recipe(recipe_spec %>% step_rm(date)) %>%
    fit(training(splits))
With the models created, we can create an Ensemble Average Model using a simple mean.
Create a Modeltime Table using the modeltime package.
m750_models <- modeltime_table(
    wflw_fit_arima,
    wflw_fit_prophet,
    wflw_fit_glmnet
)

m750_models
#> # Modeltime Table
#> # A tibble: 3 x 3
#> .model_id .model .model_desc
#> <int> <list> <chr>
#> 1 1 <workflow> ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 <workflow> PROPHET
#> 3 3 <workflow> GLMNET
Then use ensemble_average() to turn that Modeltime Table into a Modeltime Ensemble. This is a fitted ensemble specification containing everything needed to forecast future data and to be refitted on new data sets using the 3 submodels.
ensemble_fit <- m750_models %>%
    ensemble_average(type = "mean")

ensemble_fit
#> -- Modeltime Ensemble -------------------------------------------
#> Ensemble of 3 Models (MEAN)
#>
#> # Modeltime Table
#> # A tibble: 3 x 3
#> .model_id .model .model_desc
#> <int> <list> <chr>
#> 1 1 <workflow> ARIMA(0,1,1)(0,1,1)[12]
#> 2 2 <workflow> PROPHET
#> 3 3 <workflow> GLMNET
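As an aside, ensemble_average() also accepts type = "median", which can be more robust when one submodel produces outlier forecasts. A minimal variation using the same Modeltime Table:
# Alternative (optional): a median ensemble, more robust to an outlier submodel
ensemble_fit_med <- m750_models %>%
    ensemble_average(type = "median")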
To forecast, just follow the Modeltime Workflow.
# Calibration
calibration_tbl <- modeltime_table(
    ensemble_fit
) %>%
    modeltime_calibrate(testing(splits))
# Forecast vs Test Set
calibration_tbl %>%
    modeltime_forecast(
        new_data    = testing(splits),
        actual_data = m750
    ) %>%
    plot_modeltime_forecast(.interactive = interactive)
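Optionally, the same calibration table can report test-set accuracy metrics before you commit to the ensemble; a quick check using modeltime's accuracy helpers looks like this:
# Optional: test-set accuracy metrics for the ensemble
calibration_tbl %>%
    modeltime_accuracy() %>%
    table_modeltime_accuracy(.interactive = interactive)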
Once satisfied with our ensemble model, we can use modeltime_refit() to refit on the full data set and forecast forward, gaining confidence intervals in the process.
refit_tbl <- calibration_tbl %>%
    modeltime_refit(m750)

refit_tbl %>%
    modeltime_forecast(
        h = "2 years",
        actual_data = m750
    ) %>%
    plot_modeltime_forecast(.interactive = interactive)
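As noted at the start, modeltime.ensemble can also build weighted-average ensembles with ensemble_weighted() and stacked ensembles with ensemble_model_spec(). Here is a minimal sketch of both, reusing m750_models and splits from above; treat the loadings, glmnet parameters, and cross-validation settings as illustrative assumptions, and note that the stacking step additionally requires the modeltime.resample package.
# A brief, illustrative sketch of the more sophisticated ensembling options.
# The loadings, glmnet parameters, and CV settings are placeholders, not tuned values.

# 1. Weighted average: give the first two submodels more influence
ensemble_fit_wt <- m750_models %>%
    ensemble_weighted(loadings = c(2, 2, 1))

# 2. Stacking: tune a glmnet meta-learner on resampled submodel predictions
#    (requires the modeltime.resample package)
library(modeltime.resample)

resamples_tscv <- time_series_cv(
    training(splits),
    assess      = "2 years",
    initial     = "5 years",
    skip        = "2 years",
    slice_limit = 3
)

ensemble_fit_stack <- m750_models %>%
    modeltime_fit_resamples(resamples = resamples_tscv) %>%
    ensemble_model_spec(
        model_spec = linear_reg(penalty = tune(), mixture = tune()) %>%
            set_engine("glmnet"),
        kfolds  = 5,
        grid    = 6,
        control = control_grid(verbose = FALSE)
    )
The resulting ensembles can then be calibrated, forecasted, and refitted with the same Modeltime Workflow shown above.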
This was a very short tutorial on the simplest type of ensemble forecasting, but there's a lot more to learn.
Become the forecasting expert for your organization
High-Performance Time Series Course
Time series is changing. Businesses now need 10,000+ time series forecasts every day. This is what I call a High-Performance Time Series Forecasting System (HPTSF) - Accurate, Robust, and Scalable Forecasting.
High-Performance Forecasting Systems will save companies by improving accuracy and scalability. Imagine what will happen to your career if you can provide your organization a “High-Performance Time Series Forecasting System” (HPTSF System).
I teach how to build an HPTSF System in my High-Performance Time Series Forecasting Course. You will learn:
Modeltime - 30+ Models (Prophet, ARIMA, XGBoost, Random Forest, & many more)
GluonTS (Competition Winners)
Become the Time Series Expert for your organization.