In this vignette, functions in the VisitorCounts package are demonstrated using park visitation data from Yellowstone National Park.
First, we load two datasets. park_visitation stores 156 monthly observations, spanning 2005 through 2017, of flickr photo-user-days (PUD) and National Park Service (NPS) visitor counts for each of 20 popular national parks in the United States. Second, flickr_userdays stores monthly US flickr user-days for the same period.
library(VisitorCounts)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
data("park_visitation")
data("flickr_userdays")
For the purposes of this vignette, three time series are extracted from these datasets. First, log_yellowstone_pud is a time series of 156 monthly observations of log flickr photo-user-days geolocated within Yellowstone National Park. Second, log_yellowstone_nps is a time series of 156 monthly observations of log National Park Service visitation counts for the park. Third, log_flickr_userdays is a time series of 156 monthly observations of log flickr user-days taken within the United States.
yellowstone_pud <- park_visitation[park_visitation$park == "YELL",]$pud # photo-user-days
yellowstone_nps <- park_visitation[park_visitation$park == "YELL",]$nps # National Park Service counts
# monthly series starting in January 2005
yellowstone_pud <- ts(yellowstone_pud, start = 2005, frequency = 12)
yellowstone_nps <- ts(yellowstone_nps, start = 2005, frequency = 12)
log_yellowstone_pud <- log(yellowstone_pud)
log_yellowstone_nps <- log(yellowstone_nps)
log_flickr_userdays <- log(flickr_userdays)
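The conversion above relies on each park's rows being stored in chronological order (ts() simply labels the first value January 2005) and on all counts being strictly positive, since log(0) is -Inf. As a sketch under those assumptions, the hypothetical helper to_log_monthly_ts() below (not part of VisitorCounts) bundles the same steps with a positivity check, and a synthetic 156-value series confirms that 156 months starting in January 2005 end in December 2017.

```r
# Hypothetical helper (not part of VisitorCounts): build a monthly ts starting in
# January of start_year and take logs, refusing non-positive counts.
to_log_monthly_ts <- function(counts, start_year = 2005) {
  if (any(counts <= 0, na.rm = TRUE)) {
    stop("log() needs strictly positive counts; zeros would become -Inf")
  }
  log(ts(counts, start = start_year, frequency = 12))
}

# Synthetic stand-in for one park's 156 monthly values
example_series <- to_log_monthly_ts(seq_len(156))
start(example_series)  # 2005 1
end(example_series)    # 2017 12
```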
plot(log_yellowstone_pud, main = "Log Yellowstone Photo-User-Days (PUD)", ylab = "log PUD")
plot(log_yellowstone_nps, main = "Log Yellowstone National Park Service Visitation Counts (NPS)", ylab = "log NPS")
plot(log_flickr_userdays, main = "Log US Flickr user-days", ylab = "log UD")
The visitation_model() function uses social media data, such as the log flickr photo-user-days in log_yellowstone_pud, together with a measure of the social media platform's popularity, such as the log US flickr user-days in log_flickr_userdays, to model percent changes in visitation counts. By default, visitation_model() assumes that no visitation counts are available, as indicated by ref_series = FALSE.
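Because the model works with log-scale series, a difference of consecutive log values corresponds to a percent change: exp(diff(log(x))) - 1 is the exact relative change, and diff(log(x)) itself approximates it when changes are small. The base-R sketch below uses made-up visit counts and does not call visitation_model(); it only illustrates this relationship.

```r
# Made-up monthly visit counts: +10%, then -10%
visits <- c(100, 110, 99)
log_changes <- diff(log(visits))      # roughly 0.0953 and -0.1054
pct_changes <- exp(log_changes) - 1   # recovers 0.10 and -0.10 exactly
round(100 * pct_changes, 2)           # 10 -10
```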
yell_visitation_model <- visitation_model(log_yellowstone_pud,
                                          log_flickr_userdays)
## All the forecasts will be made in the log scale.
## The additive constant for the model is assumed to be equal to zero.
## If a better constant is known, change the value in the constant argument.
## Instead, the actual series may be supplied in the ref_series argument.
## When omit_trend == TRUE, popularity_proxy will not be used.
If National Park Service counts are available, they may be supplied as a reference series to assist in parameter estimation:
yell_visitation_model_nps <- visitation_model(log_yellowstone_pud,
                                              log_flickr_userdays,
                                              ref_series = log_yellowstone_nps)
## All the forecasts will be made in the log scale.
## When omit_trend == TRUE, popularity_proxy will not be used.
By default, plot.visitation_model() plots the differenced series. Typical graphical parameters, such as line width, may be passed to plot.visitation_model():
true_differences <- diff(log_yellowstone_nps)
# y-axis range covering both series, with extra room below for the legend
lower_bound <- min(c(true_differences, diff(yell_visitation_model$visitation_fit))) - 1
upper_bound <- max(c(true_differences, diff(yell_visitation_model$visitation_fit)))
plot(yell_visitation_model, ylim = c(lower_bound, upper_bound), lwd = 2)
lines(true_differences, col = "red")
legend("bottom", c("Model Fit", "True Differences"), col = c("black", "red"), lty = c(1, 1))
true_differences <- diff(log_yellowstone_nps)
# y-axis range covering both series, with extra room below for the legend
lower_bound <- min(c(true_differences, diff(yell_visitation_model_nps$visitation_fit))) - 1
upper_bound <- max(c(true_differences, diff(yell_visitation_model_nps$visitation_fit)))
plot(yell_visitation_model_nps, ylim = c(lower_bound, upper_bound),
     lwd = 2,
     main = "Fitted Values for Visitation Model (NPS assisted)")
lines(true_differences, col = "red")
legend("bottom", c("Model Fit", "True Differences"), col = c("black", "red"), lty = c(1, 1))
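The two plotting snippets repeat the same y-axis calculation: the minimum over both series minus 1 (leaving room for the legend at the bottom) and the maximum over both series. As a sketch, that calculation could be factored into a small helper; padded_ylim() and its bottom_pad argument are hypothetical names, not part of VisitorCounts.

```r
# Hypothetical helper: y-axis limits covering all supplied series, with extra
# space (bottom_pad) below the lowest value so a legend placed at "bottom" fits.
padded_ylim <- function(..., bottom_pad = 1) {
  values <- unlist(list(...), use.names = FALSE)
  c(min(values, na.rm = TRUE) - bottom_pad, max(values, na.rm = TRUE))
}

padded_ylim(c(-0.5, 0.3), c(0.2, 0.8))  # -1.5 0.8
```

With the objects above, the call would be ylim = padded_ylim(true_differences, diff(yell_visitation_model$visitation_fit)).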
Parameters can be inspected using summary.visitation_model(). Two examples follow:
summary(yell_visitation_model)
## Call: visitation_model(onsite_usage = log_yellowstone_pud, popularity_proxy = log_flickr_userdays)
##
## Parameter Estimates:
## ===============================
## Parameter: Estimate:
## ---------- ---------
## Beta: 1.3078
## Constant: 1
## Lag: 0
## Lag Criterion: cross-correlation
## Lag Criterion: MSE
## Lag Criterion: rank
## ===============================
summary(yell_visitation_model_nps)
## Call: visitation_model(onsite_usage = log_yellowstone_pud, popularity_proxy = log_flickr_userdays,
## ref_series = log_yellowstone_nps)
##
## Parameter Estimates:
## ===============================
## Parameter: Estimate:
## ---------- ---------
## Beta: 1.3931
## Constant: 101795.496692435
## Lag: 0
## Lag Criterion: cross-correlation
## Lag Criterion: MSE
## Lag Criterion: rank
## ===============================
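Both summaries list cross-correlation, MSE, and rank as lag criteria. To illustrate the first of these in general terms, the sketch below picks the lag that maximizes the sample cross-correlation between two synthetic series using stats::ccf(). It is not necessarily how visitation_model() estimates its lag; the series and the two-step delay are made up for the example.

```r
set.seed(1)
x <- rnorm(200)
y <- c(0, 0, x[1:198])  # y[t] = x[t - 2]: y lags x by two steps

# ccf(x, y) at lag k estimates the correlation between x[t + k] and y[t]
cc <- ccf(x, y, lag.max = 6, plot = FALSE)
lags <- drop(cc$lag)
best_lag <- lags[which.max(drop(cc$acf))]
best_lag  # -2
```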
Forecasts can be made using predict.visitation_model(), whose output is an object of class visitation_forecast that can be inspected with the plot() and summary() functions.
yellowstone_visitation_forecasts <- predict(yell_visitation_model, n_ahead = 12)
yellowstone_visitation_forecasts_nps <- predict(yell_visitation_model_nps, n_ahead = 12)
yellowstone_visitation_forecasts_withpast <- predict(yell_visitation_model, n_ahead = 12, only_new = FALSE)
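As the model messages note, forecasts are made on the log scale. If one has forecasts of log differences (monthly percent changes) and the last observed log value, stats::diffinv() followed by exp() turns them back into levels. This is a sketch only: it makes no assumption about how a visitation_forecast object stores its values, the helper name log_diffs_to_levels() is hypothetical, and the numbers are made up.

```r
# Hypothetical helper: cumulate forecast log-differences from the last observed
# log value, then exponentiate to return to the original count scale.
log_diffs_to_levels <- function(log_diffs, last_log_value) {
  # diffinv() returns length(log_diffs) + 1 values, starting with last_log_value
  log_levels <- stats::diffinv(log_diffs, xi = last_log_value)
  exp(log_levels[-1])
}

# Made-up example: last observed count 100, then growth of 10% and 20%
level_forecasts <- log_diffs_to_levels(log(c(1.1, 1.2)), last_log_value = log(100))
level_forecasts  # 110 132
```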
Forecasts can be plotted using plot.visitation_forecast():
plot(yellowstone_visitation_forecasts, difference = TRUE)
plot(yellowstone_visitation_forecasts_nps, main = "Forecasts for Visitation Model (NPS Assisted)")
plot(yellowstone_visitation_forecasts_withpast, difference = TRUE)
summary(yellowstone_visitation_forecasts)
## Visitation model forecasts:
##
## Parameter Estimates:
## ===============================
## Parameter: Estimate:
## ---------- ---------
## Beta: 1.308
## Constant: 1
## Lag:
## ===============================
## Criterion for Lag Estimate: cross-correlation
## Criterion for Lag Estimate: MSE
## Criterion for Lag Estimate: rank
## Number of Forecasts: 12
summary(yellowstone_visitation_forecasts_nps)
## Visitation model forecasts:
##
## Parameter Estimates:
## ===============================
## Parameter: Estimate:
## ---------- ---------
## Beta: 1.393
## Constant: 101795.497
## Lag:
## ===============================
## Criterion for Lag Estimate: cross-correlation
## Criterion for Lag Estimate: MSE
## Criterion for Lag Estimate: rank
## Number of Forecasts: 12
The automatic decomposition function, auto_decompose(), uses singular spectrum analysis (SSA), as implemented in the Rssa package, together with an automated procedure for classifying components, to decompose a time series into trend, seasonality, and noise.
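To illustrate the idea behind SSA, the base-R sketch below performs its basic steps: embed the series in a trajectory (Hankel) matrix with window length L, take its SVD, group eigentriples, and map each group back to a series by averaging anti-diagonals. It is not the Rssa implementation, and the automatic classification step of auto_decompose() is not reproduced; the function names and the synthetic trend-plus-annual-cycle series are assumptions for the example. The window length 72 matches the one auto_decompose() reports for the Yellowstone data.

```r
# Trajectory (Hankel) matrix: column j is the window x[j], ..., x[j + L - 1]
trajectory_matrix <- function(x, L) {
  K <- length(x) - L + 1
  sapply(seq_len(K), function(j) x[j:(j + L - 1)])
}

# Hankelization: average each anti-diagonal (i + j constant) into one series value
diag_average <- function(M) {
  s <- row(M) + col(M) - 1
  vapply(seq_len(max(s)), function(k) mean(M[s == k]), numeric(1))
}

# Reconstruct one series per group of eigentriple indices
ssa_reconstruct <- function(x, L, groups) {
  s <- svd(trajectory_matrix(x, L))
  lapply(groups, function(idx) {
    Xg <- s$u[, idx, drop = FALSE] %*%
      diag(s$d[idx], nrow = length(idx)) %*%
      t(s$v[, idx, drop = FALSE])
    diag_average(Xg)
  })
}

# Synthetic monthly series: linear trend plus an annual cycle (no noise)
time_index <- 1:156
x <- 0.05 * time_index + sin(2 * pi * time_index / 12)

parts <- ssa_reconstruct(x, L = 72, groups = list(signal = 1:4, all = 1:72))
max(abs(parts$all - x))     # ~0: all components together recover the series
max(abs(parts$signal - x))  # ~0: linear trend (rank 2) + sinusoid (rank 2)
```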
yell_pud_decomposition <- auto_decompose(yellowstone_pud)
Several plot options are available for examining this decomposition.
plot(yell_pud_decomposition)
plot(yell_pud_decomposition, type = "period")
plot(yell_pud_decomposition, type = "classical")
The eigenvector grouping can be examined using summary.decomposition().
summary(yell_pud_decomposition)
## Decomposition:
##
## Period or Component || Eigenvector Grouping
## =================== || ====================
## 12 || 2, 3
## 6 || 5, 6
## 4 || 9, 10
## 3 || 12, 13
## Trend || 1, 4
##
## Window Length: 72
## Number of Observations: 156
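The seasonal periods in this grouping, 12, 6, 4, and 3, are 12/k for k = 1, ..., 4, the harmonics of an annual cycle in monthly data. Each period is assigned two eigenvectors because a single sinusoid yields a rank-2 trajectory matrix. The base-R sketch below checks this on synthetic sinusoids (window length 72, as reported above); traj_rank() is a hypothetical helper that counts singular values above a relative tolerance.

```r
# Numerical rank of a trajectory matrix: singular values above tol * largest
traj_rank <- function(x, L, tol = 1e-8) {
  d <- svd(stats::embed(x, L))$d
  sum(d > tol * d[1])
}

time_index <- 1:156
annual <- sin(2 * pi * time_index / 12)
periods <- 12 / (1:4)  # 12, 6, 4, 3
harmonics <- rowSums(sapply(periods, function(p) sin(2 * pi * time_index / p)))

traj_rank(annual, 72)     # 2: one sinusoid, one pair of eigenvectors
traj_rank(harmonics, 72)  # 8: four harmonics, four pairs
```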
Forecasts can be made using predict.decomposition():
plot(predict(yell_pud_decomposition, n_ahead = 12)$forecast, main = "Decomposition 12-ahead Forecast", ylab = "Forecast Value")
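As an illustration only, the sketch below implements recurrent SSA forecasting in base R, one standard way to forecast from an SSA decomposition (Rssa provides it as rforecast()). We do not assume that predict.decomposition() uses this exact method. The chosen eigenvectors define a linear recurrence whose coefficients are applied to the reconstructed series. The name ssa_rforecast() is hypothetical, and the test series is a noise-free annual sinusoid, so the forecast should match its true continuation.

```r
# Hypothetical recurrent SSA forecast from eigentriples idx, h steps ahead
ssa_rforecast <- function(x, L, idx, h) {
  N <- length(x)
  K <- N - L + 1
  X <- sapply(seq_len(K), function(j) x[j:(j + L - 1)])
  U <- svd(X)$u[, idx, drop = FALSE]
  # Reconstructed signal: project onto span(U), then average anti-diagonals
  Xhat <- U %*% crossprod(U, X)
  s <- row(Xhat) + col(Xhat) - 1
  signal <- vapply(seq_len(max(s)), function(k) mean(Xhat[s == k]), numeric(1))
  # Linear recurrence coefficients from the eigenvectors
  pi_last <- U[L, ]           # last coordinates of the chosen eigenvectors
  nu2 <- sum(pi_last^2)       # verticality coefficient; must be below 1
  if (nu2 >= 1) stop("verticality coefficient must be below 1")
  R <- drop(U[-L, , drop = FALSE] %*% pi_last) / (1 - nu2)  # length L - 1
  # Each new value is R applied to the previous L - 1 values
  y <- signal
  for (step in seq_len(h)) {
    n <- length(y)
    y <- c(y, sum(R * y[(n - L + 2):n]))
  }
  y[(N + 1):(N + h)]
}

x <- sin(2 * pi * (1:156) / 12)
fc <- ssa_rforecast(x, L = 72, idx = 1:2, h = 12)
max(abs(fc - sin(2 * pi * (157:168) / 12)))  # ~0
```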