library("fundiversity")
data("traits_birds", package = "fundiversity")
data("site_sp_birds", package = "fundiversity")
Within fundiversity
the computation of most indices can be parallelized using the future
package. The indices that currently support parallelization are: FRic, FDis, FDiv, and FEve. The goal of this vignette is to explain how to toggle and use parallelization in fundiversity
.
The future
package provides a simple and general framework to allow asynchronous computation depending on the resources available for the user. The first vignette of future
gives a general overview of all its features. The main idea being that the user should write the code once and that it would run seamlessly sequentially, or in parallel on a single computer, or on a cluster, or distributed over several computers. fundiversity
can thus run on all these different backends following the user’s choice.
By default the fundiversity
code will run sequentially on a single core. To trigger parallelization the user needs to define a future::plan()
object with a parallel backend such as future::multisession
to split the execution across multiple R sessions.
# Sequential execution
<- fd_fric(traits_birds)
fric1
# Parallel execution
::plan(future::multisession) # Plan definition
future<- fd_fric(traits_birds) # The code resolve in similar fashion
fric2
identical(fric1, fric2)
#> [1] TRUE
Within the future::multisession
backend you can specify the number of cores on which the function should be parallelized over through the argument workers
, you can change it in the future::plan()
call:
::plan(future::multisession, workers = 2) # Only 2 cores are used
future<- fd_fric(traits_birds)
fric3
identical(fric3, fric2)
#> [1] TRUE
To learn more about the different backends available and the related arguments needed, please refer to the documentation of future::plan()
and the overview vignette of future
.
We can now compare the difference in performance to see the performance gain thanks to parallelization:
::plan(future::sequential)
future<- microbenchmark::microbenchmark(
non_parallel_bench non_parallel = {
fd_fric(traits_birds)
},times = 20
)
::plan(future::multisession)
future<- microbenchmark::microbenchmark(
parallel_bench parallel = {
fd_fric(traits_birds)
},times = 20
)
rbind(non_parallel_bench, parallel_bench)
#> Unit: microseconds
#> expr min lq mean median uq max neval
#> non_parallel 514.915 539.2130 623.7170 563.7615 633.5185 975.669 20
#> parallel 472.501 476.6675 504.0272 482.8980 493.9275 817.955 20
The non parallelized code runs faster than the parallelized one! Indeed, the parallelization in fundiversity
parallelize the computation across different sites. So parallelization should be used when you have many sites on which you want to compute similar indices.
# Function to make a bigger site-sp dataset
<- function(n) {
make_more_sites <- do.call(rbind, replicate(n, site_sp_birds, simplify = FALSE))
site_sp rownames(site_sp) <- paste0("s", seq_len(nrow(site_sp)))
site_sp }
For example with a dataset 5000 times bigger:
<- make_more_sites(5000)
bigger_site
::microbenchmark(
microbenchmarkseq = {
::plan(future::sequential)
futurefd_fric(traits_birds, bigger_site)
},multisession = {
::plan(future::multisession, workers = 4)
futurefd_fric(traits_birds, bigger_site)
},multicore = {
::plan(future::multicore, workers = 4)
futurefd_fric(traits_birds, bigger_site)
times = 20
},
)#> Unit: seconds
#> expr min lq mean median uq max neval
#> seq 4.954230 5.070038 5.142720 5.173176 5.198141 5.311713 20
#> multisession 6.300440 6.487385 6.549981 6.563890 6.607202 6.739610 20
#> multicore 4.880869 5.025615 5.111846 5.101368 5.202465 5.489126 20
#> seconds needed to generate this document: 343.203 sec elapsed
#> ─ Session info ──────────────────────────────────────────────────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.0.3 (2020-10-10)
#> os macOS High Sierra 10.13.6
#> system x86_64, darwin17.0
#> ui RStudio
#> language (EN)
#> collate fr_FR.UTF-8
#> ctype fr_FR.UTF-8
#> tz Europe/Paris
#> date 2021-05-12
#>
#> ─ Packages ──────────────────────────────────────────────────────────────────────────────────────────────────────────────
#> package * version date lib source
#> abind 1.4-5 2016-07-21 [1] CRAN (R 4.0.2)
#> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.0.2)
#> audio 0.1-7 2020-03-09 [1] CRAN (R 4.0.2)
#> beepr 1.3 2018-06-04 [1] CRAN (R 4.0.2)
#> cachem 1.0.4 2021-02-13 [1] CRAN (R 4.0.3)
#> callr 3.7.0 2021-04-20 [1] CRAN (R 4.0.3)
#> cli 2.5.0 2021-04-26 [1] CRAN (R 4.0.2)
#> codetools 0.2-18 2020-11-04 [1] CRAN (R 4.0.2)
#> crayon 1.4.1 2021-02-08 [1] CRAN (R 4.0.3)
#> curl 4.3.1 2021-04-30 [1] CRAN (R 4.0.2)
#> desc 1.3.0 2021-03-05 [1] CRAN (R 4.0.3)
#> devtools 2.4.1 2021-05-05 [1] CRAN (R 4.0.2)
#> digest 0.6.27 2020-10-24 [1] CRAN (R 4.0.2)
#> ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.0.3)
#> evaluate 0.14 2019-05-28 [1] CRAN (R 4.0.1)
#> fansi 0.4.2 2021-01-15 [1] CRAN (R 4.0.2)
#> fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.0.3)
#> fs 1.5.0 2020-07-31 [1] CRAN (R 4.0.2)
#> fundiversity * 0.0.2 2021-02-24 [1] local
#> future 1.21.0 2020-12-10 [1] CRAN (R 4.0.3)
#> geometry 0.4.5 2019-12-04 [1] CRAN (R 4.0.2)
#> globals 0.14.0 2020-11-22 [1] CRAN (R 4.0.3)
#> glue 1.4.2 2020-08-27 [1] CRAN (R 4.0.2)
#> htmltools 0.5.1.1 2021-01-22 [1] CRAN (R 4.0.3)
#> httr 1.4.2 2020-07-20 [1] CRAN (R 4.0.2)
#> jsonlite 1.7.2 2020-12-09 [1] CRAN (R 4.0.3)
#> knitr 1.33 2021-04-24 [1] CRAN (R 4.0.2)
#> lattice 0.20-44 2021-05-02 [1] CRAN (R 4.0.3)
#> lifecycle 1.0.0 2021-02-15 [1] CRAN (R 4.0.3)
#> listenv 0.8.0 2019-12-05 [1] CRAN (R 4.0.2)
#> magic 1.5-9 2018-09-17 [1] CRAN (R 4.0.2)
#> magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.0.2)
#> Matrix 1.3-3 2021-05-04 [1] CRAN (R 4.0.3)
#> memoise 2.0.0 2021-01-26 [1] CRAN (R 4.0.3)
#> microbenchmark 1.4-7 2019-09-24 [1] CRAN (R 4.0.2)
#> parallelly 1.25.0 2021-04-30 [1] CRAN (R 4.0.2)
#> parsedate 1.2.1 2021-04-20 [1] CRAN (R 4.0.3)
#> pillar 1.6.0 2021-04-13 [1] CRAN (R 4.0.2)
#> pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.0.3)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.0.2)
#> pkgload 1.2.1 2021-04-06 [1] CRAN (R 4.0.2)
#> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.0.2)
#> processx 3.5.2 2021-04-30 [1] CRAN (R 4.0.2)
#> ps 1.6.0 2021-02-28 [1] CRAN (R 4.0.3)
#> purrr 0.3.4 2020-04-17 [1] CRAN (R 4.0.2)
#> R6 2.5.0 2020-10-28 [1] CRAN (R 4.0.2)
#> rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.0.2)
#> rcmdcheck 1.3.3 2019-05-07 [1] CRAN (R 4.0.2)
#> Rcpp 1.0.6 2021-01-15 [1] CRAN (R 4.0.2)
#> rematch 1.0.1 2016-04-21 [1] CRAN (R 4.0.2)
#> remotes 2.3.0 2021-04-01 [1] CRAN (R 4.0.2)
#> rhub 1.1.1 2019-04-08 [1] CRAN (R 4.0.2)
#> rlang 0.4.11 2021-04-30 [1] CRAN (R 4.0.2)
#> rmarkdown 2.8 2021-05-07 [1] CRAN (R 4.0.2)
#> rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.0.2)
#> rstudioapi 0.13 2020-11-12 [1] CRAN (R 4.0.2)
#> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.0.2)
#> stringi 1.6.1 2021-05-10 [1] CRAN (R 4.0.3)
#> stringr 1.4.0 2019-02-10 [1] CRAN (R 4.0.2)
#> testthat 3.0.2 2021-02-14 [1] CRAN (R 4.0.3)
#> tibble 3.1.1 2021-04-18 [1] CRAN (R 4.0.3)
#> tictoc 1.0.1 2021-04-19 [1] CRAN (R 4.0.2)
#> urlchecker 1.0.0 2021-03-04 [1] CRAN (R 4.0.3)
#> usethis 2.0.1 2021-02-10 [1] CRAN (R 4.0.3)
#> utf8 1.2.1 2021-03-12 [1] CRAN (R 4.0.2)
#> uuid 0.1-4 2020-02-26 [1] CRAN (R 4.0.2)
#> vctrs 0.3.8 2021-04-29 [1] CRAN (R 4.0.3)
#> whoami 1.3.0 2019-03-19 [1] CRAN (R 4.0.2)
#> withr 2.4.2 2021-04-18 [1] CRAN (R 4.0.3)
#> xfun 0.22 2021-03-11 [1] CRAN (R 4.0.2)
#> xml2 1.3.2 2020-04-23 [1] CRAN (R 4.0.2)
#> xopen 1.0.0 2018-09-17 [1] CRAN (R 4.0.2)
#> yaml 2.2.1 2020-02-01 [1] CRAN (R 4.0.2)
#>
#> [1] /Library/Frameworks/R.framework/Versions/4.0/Resources/library