Download data

Martin Westgate & Dax Kellie

2022-01-24

The atlas_ functions return data from the atlas chosen using galah_config(). They are: atlas_counts, atlas_species, atlas_occurrences, atlas_media, atlas_taxonomy and atlas_citation.

The final atlas_ function, atlas_citation, is unusual in that it does not return any new data. Instead, it provides a citation for an existing dataset (downloaded using atlas_occurrences) that has an associated DOI. The other functions are described below.
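As a rough sketch of where atlas_citation fits, the snippet below requests a DOI at download time and then asks for the matching citation. It assumes that atlas_occurrences accepts a mint_doi argument and that a registered email has already been configured; the taxon and year filter are purely illustrative.

```r
library(galah)

# Assumption: this email is registered with the ALA
galah_config(email = "your_email@email.com")

# Ask the atlas to mint a DOI for this download (mint_doi is assumed here)
occ_with_doi <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(year == 2020) |>
  atlas_occurrences(mint_doi = TRUE)

# Generate a citation for the dataset downloaded above
atlas_citation(occ_with_doi)
```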

Record counts

atlas_counts() provides summary counts on records in the specified atlas, without needing to download all the records.

# Total number of records in the ALA
atlas_counts()
## # A tibble: 1 × 1
##       count
##       <int>
## 1 102070026

In addition to the filter arguments, it has an optional group_by argument, which provides counts binned by the requested field.

galah_call() |>
  galah_group_by(kingdom) |>
  atlas_counts()
## # A tibble: 10 × 2
##    kingdom      count
##    <chr>        <int>
##  1 Animalia  76305582
##  2 Plantae   21998260
##  3 Fungi      1893778
##  4 Chromista   915235
##  5 Protista     67365
##  6 Bacteria     58156
##  7 Protozoa     23635
##  8 Archaea       1103
##  9 Eukaryota      735
## 10 Virus          472
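Filters and grouping can be combined in one piped call. The sketch below is one possible combination, using the year and stateProvince fields that appear elsewhere in this document; the cut-off year is an arbitrary choice for illustration.

```r
library(galah)

# Count records from 2010 onwards, binned by state or territory
counts_by_state <- galah_call() |>
  galah_filter(year >= 2010) |>
  galah_group_by(stateProvince) |>
  atlas_counts()

counts_by_state
```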

Species lists

A common use case of atlas data is to identify which species occur in a specified region, time period, or taxonomic group. atlas_species() is similar to search_taxa() in that it returns taxonomic information and unique identifiers in a tibble. It differs in that it only returns information at the species level, but it is more flexible because it supports filtering:

species <- galah_call() |>
  galah_identify("Rodentia") |>
  galah_filter(stateProvince == "Northern Territory") |>
  atlas_species()
  
species |> head()
## # A tibble: 6 × 10
##   kingdom  phylum   class    order    family  genus        species                     author   species_guid       vernacular_name
##   <chr>    <chr>    <chr>    <chr>    <chr>   <chr>        <chr>                       <chr>    <chr>              <chr>          
## 1 Animalia Chordata Mammalia Rodentia Muridae Mesembriomys Mesembriomys gouldii        (J.E. G… urn:lsid:biodiver… Black-footed T…
## 2 Animalia Chordata Mammalia Rodentia Muridae Zyzomys      Zyzomys argurus             (Thomas… urn:lsid:biodiver… Common Rock-rat
## 3 Animalia Chordata Mammalia Rodentia Muridae Pseudomys    Pseudomys hermannsburgensis (Waite,… urn:lsid:biodiver… Sandy Inland M…
## 4 Animalia Chordata Mammalia Rodentia Muridae Notomys      Notomys alexis              Thomas,… urn:lsid:biodiver… Spinifex Hoppi…
## 5 Animalia Chordata Mammalia Rodentia Muridae Melomys      Melomys burtoni             (Ramsay… urn:lsid:biodiver… Grassland Melo…
## 6 Animalia Chordata Mammalia Rodentia Muridae Mus          Mus musculus                Linnaeu… urn:lsid:biodiver… House Mouse
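Because the result is an ordinary tibble, it can be summarised with standard R tools. The sketch below uses a small stand-in data frame typed in from the six rows printed above (the name species_head is hypothetical), so it runs without an atlas connection; in practice the same calls would be applied to the species object itself.

```r
# Stand-in for the first rows of `species`, copied from the output above;
# in practice, use the tibble returned by atlas_species() directly.
species_head <- data.frame(
  family  = rep("Muridae", 6),
  genus   = c("Mesembriomys", "Zyzomys", "Pseudomys",
              "Notomys", "Melomys", "Mus"),
  species = c("Mesembriomys gouldii", "Zyzomys argurus",
              "Pseudomys hermannsburgensis", "Notomys alexis",
              "Melomys burtoni", "Mus musculus")
)

# Number of species recorded per family
species_per_family <- table(species_head$family)
species_per_family

# Number of distinct genera represented
n_genera <- length(unique(species_head$genus))
n_genera
```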

Occurrence data

To download occurrence data, you will need to specify your email in galah_config(). This email must be associated with an active ALA account. See the Configuring galah section for more information.

galah_config(email = "your_email@email.com", atlas = "Australia")

Download occurrence records for Eolophus roseicapilla:

occ <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(
    stateProvince == "Australian Capital Territory",
    year >= 2010,
    profile = "ALA"
  ) |>
  galah_select(institutionID, group = "basic") |>
  atlas_occurrences()
  
occ |> head()
## # A tibble: 6 × 8
##   decimalLatitude decimalLongitude eventDate              scientificName  taxonConceptID   recordID dataResourceName institutionID
##             <dbl>            <dbl> <chr>                  <chr>           <chr>            <chr>    <chr>            <chr>        
## 1           -35.9             149. ""                     Eolophus rosei… urn:lsid:biodiv… 17f46d4… eBird Australia  ""           
## 2           -35.9             149. ""                     Eolophus rosei… urn:lsid:biodiv… ef2b906… eBird Australia  ""           
## 3           -35.9             149. "2012-01-18T13:00:00Z" Eolophus rosei… urn:lsid:biodiv… 4f7cd71… BirdLife Austra… ""           
## 4           -35.9             149. ""                     Eolophus rosei… urn:lsid:biodiv… 3236c47… eBird Australia  ""           
## 5           -35.8             149. ""                     Eolophus rosei… urn:lsid:biodiv… eec91d8… eBird Australia  ""           
## 6           -35.8             149. ""                     Eolophus rosei… urn:lsid:biodiv… e340c42… eBird Australia  ""
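Since atlas_counts() accepts the same filters, one possible workflow is to check the size of a query before committing to the download. The sketch below builds the query once and reuses it; the threshold of 100000 records is a hypothetical value chosen for illustration, not a limit imposed by the ALA.

```r
library(galah)

# Assumption: this email is registered with the ALA
galah_config(email = "your_email@email.com", atlas = "Australia")

# Build the query once so the count and the download share the same filters
query <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(
    stateProvince == "Australian Capital Territory",
    year >= 2010
  )

# Check how many records match before downloading anything
n_records <- query |> atlas_counts()
n_records

# Only download if the result is a manageable size (hypothetical threshold)
if (n_records$count[1] < 100000) {
  occ_checked <- query |> atlas_occurrences()
}
```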

Media downloads

In addition to text data describing individual occurrences and their attributes, ALA stores images, sounds and videos associated with a given record. These can be downloaded to R using atlas_media() and the same set of filters as the other data download functions.

media_data <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(year == 2020) |>
  atlas_media(download_dir = "media")
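A minimal sketch of the same download with some housekeeping around it: the directory is created first in case it does not already exist, and the downloaded files are listed afterwards with base R. Whether atlas_media() creates a missing directory itself is not stated here, so creating it up front is a precaution rather than a documented requirement.

```r
library(galah)

# Create the target directory if needed (no warning if it already exists)
media_dir <- "media"
dir.create(media_dir, showWarnings = FALSE)

media_data <- galah_call() |>
  galah_identify("Eolophus roseicapilla") |>
  galah_filter(year == 2020) |>
  atlas_media(download_dir = media_dir)

# Inspect what arrived on disk
list.files(media_dir) |> head()
```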

Taxonomic trees

atlas_taxonomy builds a taxonomic tree from a chosen clade down to a lower taxonomic rank, using ALA’s internal taxonomy. Use galah_down_to to specify the rank at which the tree should stop.

classes <- galah_call() |>
  galah_identify("chordata") |>
  galah_down_to(class) |>
  atlas_taxonomy()

atlas_taxonomy is the only galah function that returns a data.tree rather than a tibble.

##                             levelName
## 1  Chordata                          
## 2   ¦--Cephalochordata               
## 3   ¦   °--Amphioxi                  
## 4   ¦--Craniata                      
## 5   ¦   °--Agnatha                   
## 6   ¦       ¦--Cephalasipidomorphi   
## 7   ¦       °--Myxini                
## 8   ¦--Tunicata                      
## 9   ¦   ¦--Appendicularia            
## 10  ¦   ¦--Ascidiacea                
## 11  ¦   °--Thaliacea                 
## 12  °--Vertebrata                    
## 13      °--Gnathostomata             
## 14          ¦--Amphibia              
## 15          ¦--Aves                  
## 16          ¦--Mammalia              
## 17          ¦--Pisces                
## 18          ¦   ¦--Actinopterygii    
## 19          ¦   ¦--Chondrichthyes    
## 20          ¦   ¦--Cephalaspidomorphi
## 21          ¦   °--Sarcopterygii     
## 22          °--Reptilia

Although the tree format is useful, converting to a data.frame is straightforward.

data.tree::ToDataFrameTypeCol(classes, type = "rank") |> head()
##   rank_phylum  rank_subphylum rank_superclass rank_informal          rank_class
## 1    Chordata Cephalochordata            <NA>          <NA>            Amphioxi
## 2    Chordata        Craniata         Agnatha          <NA> Cephalasipidomorphi
## 3    Chordata        Craniata         Agnatha          <NA>              Myxini
## 4    Chordata        Tunicata            <NA>          <NA>      Appendicularia
## 5    Chordata        Tunicata            <NA>          <NA>          Ascidiacea
## 6    Chordata        Tunicata            <NA>          <NA>           Thaliacea
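To see how this conversion works without querying the atlas, the sketch below builds a tiny tree by hand and converts it in the same way. The node names are taken from the output above, but the tree itself is a hypothetical stand-in for the object returned by atlas_taxonomy; the only assumption is that each node carries a rank field, which is what type = "rank" relies on.

```r
library(data.tree)

# Hand-built stand-in for part of the tree; each node stores its rank
root <- Node$new("Chordata", rank = "phylum")
tunicata <- root$AddChild("Tunicata", rank = "subphylum")
tunicata$AddChild("Ascidiacea", rank = "class")
tunicata$AddChild("Thaliacea", rank = "class")

# One row per leaf, one column per distinct value of `rank`
tree_df <- ToDataFrameTypeCol(root, type = "rank")
tree_df
```

Each row traces the path from the root to one leaf, which is why ranks that are skipped on a given path appear as NA in the full output above.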

Configuring galah

Various aspects of the galah package can be customized. To preserve configuration for future sessions, set profile_path to the location of a .Rprofile file.

Email

To download occurrence records, you will need to provide an email address registered with the ALA. You can create an account on the ALA website. Once an email is registered with the ALA, it should be stored in the config:

galah_config(email = "myemail@gmail.com")

Caching

galah can cache most results to local files. This means that if the same code is run multiple times, the second and subsequent iterations will be faster.

By default, this caching is session-based, meaning that the local files are stored in a temporary directory that is automatically deleted when the R session is ended. This behaviour can be altered so that caching is permanent, by setting the caching directory to a non-temporary location.

galah_config(cache_directory = "example/dir")

By default, caching is turned off. To turn caching on, run:

galah_config(caching = TRUE)
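The two caching options can be set together. The sketch below creates a cache directory first and then turns caching on; the directory name galah-cache is a hypothetical example, and creating it beforehand is a precaution in case galah_config() expects the directory to exist.

```r
library(galah)

# Hypothetical cache location, relative to the working directory
cache_dir <- "galah-cache"
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# Store cached results in that directory and switch caching on
galah_config(cache_directory = cache_dir, caching = TRUE)
```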

Setting the download reason

ALA requires that you provide a reason when downloading occurrence data (via the galah atlas_occurrences() function). The reason is set as “scientific research” by default, but you can change this using galah_config(). See show_all_reasons() for valid download reasons.

galah_config(download_reason_id = your_reason_id)
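As a sketch of how the reason id might be looked up rather than typed by hand, the snippet below prints the table of valid reasons and then passes one of its ids to galah_config(). It assumes the table returned by show_all_reasons() has a column named id; taking the first row is arbitrary and only for illustration.

```r
library(galah)

# Look up the valid download reasons for the current atlas
reasons <- show_all_reasons()
reasons

# Assumption: the table has an `id` column; choose the row that fits your use
your_reason_id <- reasons$id[1]
galah_config(download_reason_id = your_reason_id)
```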

Debugging

If things aren’t working as expected, more detail (particularly about web requests and caching behaviour) can be obtained by setting the verbose configuration option:

galah_config(verbose = TRUE)