Version 1.4.0 brings three major changes to how galah works: dplyr-style evaluation of arguments, revised function names, and piping with galah_call().
Below we discuss each of these changes in turn. Please note, however, that these changes are by no means set in stone - it is absolutely possible to change syntax in future versions of galah if alternative names are easier to use and understand. We would appreciate any feedback from users about what works or what doesn’t work. It is our goal to create a package that is as easy and intuitive for users as possible!
dplyr
galah_ functions now evaluate arguments just like dplyr. To see what we mean, let’s look at an example of how dplyr::filter() works. Notice that dplyr::filter() and galah_filter() both take logical expressions built with comparison operators such as ==:
library(dplyr)

mtcars %>%
  filter(mpg == 21)
## mpg cyl disp hp drat wt qsec vs am gear carb
## Mazda RX4 21 6 160 110 3.9 2.620 16.46 0 1 4 4
## Mazda RX4 Wag 21 6 160 110 3.9 2.875 17.02 0 1 4 4
galah_call() %>%
galah_filter(year == 2021) %>%
atlas_counts()
## # A tibble: 1 × 1
## count
## <int>
## 1 1161557
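To make the idea of evaluating arguments "like dplyr" more concrete, here is a minimal base-R sketch of how a function can receive an expression such as mpg == 21 without evaluating it straight away, and then evaluate it with a data frame's columns in scope. The helper filter_rows() is hypothetical and written only for illustration; it is not how galah or dplyr are implemented internally.

```r
# Hypothetical helper for illustration only; not part of galah or dplyr.
filter_rows <- function(data, condition) {
  # Capture the expression the caller typed (e.g. mpg == 21) unevaluated
  expr <- substitute(condition)
  # Evaluate it with the columns of `data` visible as variables
  keep <- eval(expr, data, parent.frame())
  # Keep only rows where the condition is TRUE (treat NA as FALSE)
  data[keep & !is.na(keep), , drop = FALSE]
}

res <- filter_rows(mtcars, mpg == 21)
print(rownames(res))
```

The caller writes the column name bare, with no quotes and no mtcars$ prefix, which is the same convenience that galah_filter(year == 2021) offers for atlas fields.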
As another example, notice how galah_group_by() + atlas_counts() works very similarly to dplyr::group_by() + dplyr::count():
mtcars %>%
  group_by(vs) %>%
  count()
## # A tibble: 2 × 2
## # Groups: vs [2]
## vs n
## <dbl> <int>
## 1 0 18
## 2 1 14
galah_call() %>%
galah_group_by(biome) %>%
atlas_counts()
## # A tibble: 2 × 2
## biome count
## <chr> <int>
## 1 TERRESTRIAL 93590939
## 2 MARINE 3519480
We made this move towards tidy evaluation to make it possible to use piping for building queries to the Atlas of Living Australia. In practice, this means that data queries can be filtered just as you might filter a data.frame with the tidyverse suite of functions.
Prior to version 1.4.0, galah naming conventions had two major problems:
- Some functions used the prefix ala, but could actually query many other living atlases
- Other functions used the prefix select, but this is reserved in dplyr (and elsewhere) for operations specifically on columns

To address these concerns (and other smaller points), we have completed a rewrite of our function names to increase clarity (see table below). Deprecated function names will now produce a warning message when used, suggesting that users switch to the new syntax.
| galah 1.3.1 and earlier | galah 1.4.0 |
|---|---|
| | galah_call |
| select_taxa | galah_identify |
| select_filters | galah_filter |
| select_columns | galah_select |
| select_locations | galah_geolocate |
| | galah_group_by |
| | galah_down_to |
| ala_counts | atlas_counts |
| ala_occurrences | atlas_occurrences |
| ala_species | atlas_species |
| ala_media | atlas_media |
| ala_taxonomy | atlas_taxonomy |
| ala_citation | atlas_citation |
| select_taxa | search_taxa, search_identifiers |
| search_fields | search_fields |
| | show_all_fields |
| find_profiles | show_all_profiles |
| find_ranks | show_all_ranks |
| find_atlases | show_all_atlases |
| find_reasons | show_all_reasons |
| find_cached_files | show_all_cached_files |
| find_field_values | search_field_values |
| find_profile_attributes | search_profile_attributes |
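For readers curious how a renamed function can keep working while warning about its old name, the following is a minimal sketch using base R's .Deprecated(). The functions old_counts() and new_counts() are hypothetical stand-ins; we are not claiming this is how galah implements its own deprecation warnings.

```r
# Hypothetical function names for illustration; not galah's implementation.
new_counts <- function(x) length(x)

old_counts <- function(x) {
  # Signal a deprecation warning that points users to the new name
  .Deprecated("new_counts")
  # Then delegate to the new function so existing code still runs
  new_counts(x)
}

# Capture the warning text instead of printing it, then continue
warning_text <- NULL
result <- withCallingHandlers(
  old_counts(c("a", "b", "c")),
  warning = function(w) {
    warning_text <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)

print(result)
print(warning_text)
```

The old name still returns the same result as the new one; the only difference is the warning nudging users towards the updated syntax.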
galah_call()
Perhaps the largest change in galah 1.4.0 is the implementation of piping using galah_call().
Beginning a query with galah_call() (be sure to add the parentheses!) tells galah that you will be using pipes to construct your query. Follow this with your preferred pipe (|> from base or %>% from magrittr). You can then narrow your query line-by-line using galah_ functions. Finally, end with an atlas_ function to identify what type of data you want from your query.
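The start / narrow / finish pattern described above can be illustrated without contacting any atlas. The sketch below uses hypothetical toy_ functions (none of them exist in galah) that pass a small query object along the base pipe, which requires R 4.1 or later. It is only meant to show the shape of a piped query, not galah's internals.

```r
# Hypothetical toy functions that mimic the start / narrow / finish pattern.
# They are NOT galah functions and do not query any atlas.
toy_call <- function() {
  # Start with an empty query object
  structure(list(taxa = NULL, filters = character()), class = "toy_query")
}

toy_identify <- function(query, taxon) {
  # Record which taxon the query is about
  query$taxa <- taxon
  query
}

toy_filter <- function(query, condition) {
  # Store the condition as text rather than evaluating it
  query$filters <- c(query$filters, deparse(substitute(condition)))
  query
}

toy_describe <- function(query) {
  # Final step: turn the accumulated query into a result
  paste0("taxa: ", query$taxa,
         "; filters: ", paste(query$filters, collapse = " AND "))
}

description <- toy_call() |>
  toy_identify("perameles") |>
  toy_filter(year == 2021) |>
  toy_describe()

print(description)
```

Each step receives the query built so far as its first argument and returns an updated copy, which is what lets the steps be chained line by line.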
Old function names will be removed from future versions, but we do intend to continue supporting un-piped syntax, although piping only works with the revamped function names. If you’re new to piping, here’s a comparison against code from previous versions of galah.
Previously, if you wanted to look up the number of records of each bandicoot species every year from 2010 to 2021, you’d have had to do something like this:
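library(galah) # version 1.3.1 or earlier
library(purrr)
library(dplyr)

# Look up every species in the genus
taxa <- ala_species(taxa = select_taxa("perameles"))$species
# Restrict records to the years 2010 to 2021
years <- select_filters(year = seq(2010, 2021))

# Count records per year, one species at a time
taxa %>%
  map_dfr( ~ ala_counts(
    taxa = select_taxa(list(species = .x)),
    filters = years,
    group_by = "year"))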
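When building a range of years for a query like this, note that seq() applied to an existing vector returns positions rather than values. The short base-R check below is independent of galah and shows the difference:

```r
# seq() on a vector of length > 1 behaves like seq_along(): it returns positions
years_by_position <- seq(2010:2021)
# The colon operator on its own (or seq(2010, 2021)) returns the years themselves
years_by_value <- 2010:2021

print(years_by_position)
print(years_by_value)
```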
The older syntax was cumbersome because you had to call multiple ala_ functions and loop over species. With piping, you can now do it like this:
galah_call() %>%
galah_identify("perameles") %>%
galah_filter(year >= 2010, year <= 2021) %>%
galah_group_by(species, year) %>%
atlas_counts()
As a second example, suppose you wanted to download occurrence records of bandicoots from 2021 and include information on which records had zero coordinates. Previously you would have had to do this:
ala_occurrences(taxa = select_taxa("perameles"),
filters = select_filters(year = 2021),
columns = select_columns(group = "basic", "ZERO_COORDINATE"))
Now with piping:
galah_call() %>%
galah_identify("perameles") %>%
galah_filter(year == 2021) %>%
galah_select(group = "basic", ZERO_COORDINATE) %>%
atlas_occurrences()