mlr3spatiotempcv
Package website: release | dev
Spatiotemporal resampling methods for mlr3.
This package extends the mlr3 package framework with
spatiotemporal resampling and visualization methods.
If you prefer the tidymodels ecosystem, have a look
at the {spatialsample}
package for spatial sampling methods.
Installation
CRAN version
install.packages("mlr3spatiotempcv")
Development version
remotes::install_github("mlr-org/mlr3spatiotempcv")
# R Universe Repo
install.packages('mlr3spatiotempcv', repos = 'https://mlr-org.r-universe.dev')
Get Started
See the “Get
Started” vignette for a quick introduction.
For more detailed information, including a usage example, see the “Spatiotemporal
Analysis” chapter in the mlr3book.
The article “Spatiotemporal
Visualization” shows how grids of 3D subplots can be created.
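As a quick, hedged sketch of the basic workflow (the task, learner, and resampling choice below are illustrative assumptions, not a recommendation from the vignettes):

library(mlr3)
library(mlr3spatiotempcv)

# Built-in example task (landslide data from Ecuador)
task = tsk("ecuador")

# Coordinate-based spatial cross-validation with 5 folds
resampling = rsmp("spcv_coords", folds = 5)

# Resample a simple classification tree and aggregate the error
rr = resample(task, lrn("classif.rpart"), resampling)
rr$aggregate(msr("classif.ce"))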
Citation
To cite the package in publications, use the output of citation("mlr3spatiotempcv").
Resources
This list does not claim to be comprehensive.
(Disclaimer: Because CRAN does not like DOI URLs in its automated
checks, direct linking to scientific articles is not possible…)
FAQ
Which resampling method should I use?
There is no single best resampling method. The choice depends on the
characteristics of your dataset and on what your model is supposed to
predict on. The resampling scheme should reflect the final purpose of the
model; this concept is called “target-oriented” resampling. For example,
if the model was trained on multiple forest plots and its purpose is to
predict on unknown forest stands, the resampling structure should reflect
this.
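To make this concrete, the hedged sketch below contrasts a conventional random CV with a coordinate-based spatial CV on the built-in ecuador task; the method ids, learner, and fold number are illustrative assumptions.

library(mlr3)
library(mlr3spatiotempcv)

task = tsk("ecuador")
learner = lrn("classif.rpart")

# Non-spatial (random) cross-validation
rr_nonspatial = resample(task, learner, rsmp("cv", folds = 5))
# Spatial cross-validation based on coordinate clustering
rr_spatial = resample(task, learner, rsmp("spcv_coords", folds = 5))

# The spatial estimate is typically more pessimistic, i.e. closer to the
# error expected when predicting on spatially distinct areas
rr_nonspatial$aggregate(msr("classif.ce"))
rr_spatial$aggregate(msr("classif.ce"))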
Are there more resampling methods than the ones {mlr3spatiotempcv}
offers?
{mlr3spatiotempcv} aims to offer all spatiotemporal resampling methods
that exist in R, though this does not mean it already covers every one.
If you are missing a method, feel free to open an issue.
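To check which methods your installed version actually registers, you can inspect the mlr3 resampling dictionary; the key filter below only assumes the spcv_*/sptcv_* naming convention.

library(mlr3)
library(mlr3spatiotempcv)
library(data.table)

# All registered resampling methods, filtered to the spatiotemporal ones
resamplings = as.data.table(mlr_resamplings)
resamplings[grepl("spcv|sptcv", key)]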
How can I use the “blocking” concept of the old {mlr}?
This concept is now supported via the “column roles” concept
available in {mlr3} Task objects.
See this
documentation for more information.
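As a hedged sketch of that approach (the data and the column name block_id are made up for illustration), assigning the “group” column role keeps all observations of a block in the same fold:

library(mlr3)

# Toy data: "block_id" marks observations that must stay together
data = data.frame(
  y = factor(sample(c("a", "b"), 100, replace = TRUE)),
  x1 = rnorm(100),
  block_id = rep(1:10, each = 10)
)

task = as_task_classif(data, target = "y")
# The "group" role replaces the old {mlr} blocking: resampling keeps each
# block intact instead of splitting it across train and test sets
task$set_col_roles("block_id", roles = "group")

resample(task, lrn("classif.rpart"), rsmp("cv", folds = 5))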
For the methods that offer buffering, how can an appropriate value be
chosen?
There is no easy answer to this question. Buffering the train and test
sets reduces the similarity between them. The degree of this reduction
depends on the dataset itself, and there is no general rule for choosing
an appropriate buffer size. Some studies have used the distance at which
the spatial autocorrelation levels off. This buffer distance often removes
quite a lot of observations and needs to be estimated first.
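As a rough sketch (the method id spcv_buffer, its theRange parameter in meters, and the value 1000 are assumptions to be checked against the help page of your installed version), a buffered resampling could be set up like this:

library(mlr3)
library(mlr3spatiotempcv)

task = tsk("ecuador")

# The buffer distance (here 1000 m) would ideally come from an estimated
# autocorrelation range, e.g. derived from an empirical variogram
resampling = rsmp("spcv_buffer", theRange = 1000)
resampling$instantiate(task)
resampling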