Genesys PGR is the global database on plant genetic resources maintained ex situ in national, regional and international genebanks around the world.
genesysr uses the Genesys API to query Genesys data. The API is accessible at https://api.genesys-pgr.org.
Accessing data with genesysr is similar to downloading data in CSV or Excel format and loading it into R.
Accession passport data is retrieved with the get_accessions function.
The database is queried by providing a filter (see Filters below):
## Setup: use Genesys Sandbox environment
# genesysr::setup_sandbox()
# genesysr::setup_production() # This is initialized by default when loading genesysr
# Open a browser: login to Genesys and authorize access
genesysr::user_login()
# Retrieve first 1000 accessions for genus *Musa*
musa <- get_accessions(filters = list(taxonomy = list(genus = c('Musa'))), at.least = 1000)
# Or retrieve all accession data for genus *Musa*
musa <- get_accessions(filters = list(taxonomy = list(genus = c('Musa'))))
# Retrieve all accession data for the Musa International Transit Center, Bioversity International
itc <- get_accessions(list(institute = list(code = c('BEL084'))))
# Retrieve all accession data for the Musa International Transit Center, Bioversity International (BEL084) and the International Center for Tropical Agriculture (COL003)
some <- get_accessions(list(institute = list(code = c('BEL084','COL003'))))
genesysr provides utility functions to create filter objects using Multi-Crop Passport Descriptors (MCPD) definitions:
# Retrieve data by country of origin (MCPD)
get_accessions(mcpd_filter(ORIGCTY = c("DEU", "SVN")))
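As a rough illustration of the idea only, a helper of this kind can translate MCPD descriptor names into nested filter paths. The function toy_mcpd_filter and its mapping table below are hypothetical and are not genesysr's implementation; the descriptor-to-path pairs are inferred from the filter examples in this document.

```r
# Hypothetical helper: NOT part of genesysr, shown only to illustrate the idea.
toy_mcpd_filter <- function(...) {
  # Assumed mapping from MCPD descriptor names to nested filter paths
  paths <- list(
    GENUS    = c("taxonomy", "genus"),
    SPECIES  = c("taxonomy", "species"),
    ORIGCTY  = c("countryOfOrigin", "iso3"),
    INSTCODE = c("institute", "code")
  )
  args <- list(...)
  filters <- list()
  for (name in names(args)) {
    path <- paths[[name]]
    if (is.null(path)) stop("Unsupported descriptor: ", name)
    # Append the criterion under its parent key, creating the parent if needed
    filters[[path[1]]] <- c(filters[[path[1]]], setNames(list(args[[name]]), path[2]))
  }
  filters
}

# Builds the same nested list one would otherwise write by hand
str(toy_mcpd_filter(GENUS = "Musa", ORIGCTY = c("NGA", "CIV")))
```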
Genesys provides the data as CSV. Where a column can hold multiple values, it is split into multiple numbered columns. For example, accession STORAGE may be provided as:
… | storage1 | storage2 | storage3 |
---|---|---|---|
… | 10 | 20 | 30 |
… | 30 | 40 | NA |
… | 30 | NA | NA |
… | 10 | 20 | 30 |
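If a single column is easier to work with, the numbered columns can be collapsed after loading. The following is a minimal base R sketch on a toy data frame that mirrors the table above; the column names storage1 to storage3 come from that table, while the combined storage column and the semicolon separator are illustrative choices.

```r
# Toy data mirroring the table above
accessions <- data.frame(
  storage1 = c(10, 30, 30, 10),
  storage2 = c(20, 40, NA, 20),
  storage3 = c(30, NA, NA, 30)
)

# Find the numbered storage columns
storage_cols <- grep("^storage[0-9]+$", names(accessions), value = TRUE)

# Collapse each row's non-missing values into one semicolon-separated string
accessions$storage <- apply(accessions[storage_cols], 1, function(row) {
  paste(row[!is.na(row)], collapse = ";")
})

accessions$storage
```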
The filter object is a named list() where names match a Genesys filter and the value specifies the criteria to match.
The records returned by Genesys match all filters provided (AND operation), while individual filters allow for specifying multiple criteria (OR operation):
# (GENUS == Musa) AND (SPECIES == aa) AND ((ORIGCTY == NGA) OR (ORIGCTY == CIV))
filters <- list(taxonomy = list(genus = c('Musa'), species = c('aa')), countryOfOrigin = list(iso3 = c('NGA', 'CIV')))
# Or build the same filter step by step
filters <- list()
filters$taxonomy$genus <- c('Musa')
filters$taxonomy$species <- c('aa')
filters$countryOfOrigin$iso3 <- c('NGA', 'CIV')
# See filter object as JSON
jsonlite::toJSON(filters)
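To make the AND/OR semantics concrete, the sketch below mimics them locally on a toy data frame. It only illustrates the matching logic and says nothing about how Genesys evaluates filters on the server; the data frame and its column names are made up for the example.

```r
# Toy records (illustrative values only)
toy <- data.frame(
  genus   = c("Musa", "Musa", "Hordeum", "Musa"),
  origcty = c("NGA", "DEU", "NGA", "CIV"),
  stringsAsFactors = FALSE
)

# One entry per filter; each entry lists the accepted values
criteria <- list(genus = c("Musa"), origcty = c("NGA", "CIV"))

# %in% gives OR within one filter; Reduce(`&`, ...) gives AND across filters
per_filter <- lapply(names(criteria), function(col) toy[[col]] %in% criteria[[col]])
matches <- Reduce(`&`, per_filter)

toy[matches, ]
```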
Genesys offers a number of filtering options for retrieving data. The best way to explore how filtering works is to use the actual website https://www.genesys-pgr.org/a/overview, inspect the HTTP requests your browser sends to the API server, and then replicate them here.
taxonomy$genus filters by a list of genera.
filters <- list(taxonomy = list(genus = c('Hordeum', 'Musa')))
# Print
jsonlite::toJSON(filters)
taxonomy$species filters by a list of species.
filters <- list(taxonomy = list(genus = c('Hordeum'), species = c('vulgare')))
# Print
jsonlite::toJSON(filters)
countryOfOrigin$iso3 filters by the ISO3 code of the country of origin of the PGR material.
# Material originating from Germany (DEU) and France (FRA)
filters <- list(countryOfOrigin = list(iso3 = c('DEU', 'FRA')))
geo$latitude and geo$longitude filter by latitude/longitude (in decimal degrees) of the collecting site.
# TBD
filters <- list(geo = list(latitude = genesysr::range(-10, 30), longitude = genesysr::range(30, 50)))
institute$code filters by a list of FAO WIEWS institute codes of the holding institutes.
# Filter for ITC (BEL084) and CIAT (COL003)
list(institute = list(code = c('BEL084', 'COL003')))
institute$country$iso3 filters by a list of ISO3 codes of the countries of the holding institutes.
# Filter for genebanks in Slovenia (SVN) and Belgium (BEL)
list(institute = list(country = list(iso3 = c('SVN', 'BEL'))))
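Because filters are plain nested lists, criteria can be prepared separately and merged before querying. The sketch below uses base R's utils::modifyList, which merges nested lists recursively; the variable names by_genus, by_species and by_country are illustrative, and the filter values are taken from the examples above.

```r
# Partial filters, each covering one criterion
by_genus   <- list(taxonomy = list(genus = c('Hordeum')))
by_species <- list(taxonomy = list(species = c('vulgare')))
by_country <- list(institute = list(country = list(iso3 = c('SVN', 'BEL'))))

# modifyList merges nested lists, so both taxonomy entries end up under one key
filters <- Reduce(utils::modifyList, list(by_genus, by_species, by_country))

str(filters)
```

Note that modifyList replaces a value when the same leaf appears twice, so two partial filters that both set taxonomy$genus would not be combined into one list of genera.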
The Genesys API returns many variables for accession passport data. To reduce the amount of data to be processed and kept in memory, select the columns of interest with the fields vector:
# Fetch only accession id, storage and taxonomic data for *Musa*
musa <- genesysr::get_accessions(list(taxonomy = list(genus = c('Musa'))), fields = c("taxonomy", "storage", "id"))
To list the variable names returned by the Genesys API, fetch a sample response, inspect its names and select the columns of interest:
# fetch_accessions uses the JSON format
accn <- fetch_accessions(filters = list(), at.least = 100)
# Print names used in JSON response from Genesys
sort(unique(names(unlist(accn$content))))
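To see what that expression produces without calling the API, it can be tried on a small nested list. The structure below is a made-up stand-in for a JSON response, not the actual shape of accn$content; it shows how unlist flattens nested names with dots and numbers the elements of multi-valued fields.

```r
# Made-up stand-in for a parsed JSON response (structure is an assumption)
content <- list(
  list(id = 1, taxonomy = list(genus = "Musa"), storage = c(10, 20)),
  list(id = 2, taxonomy = list(genus = "Musa", species = "acuminata"))
)

# Nested names are joined with "."; vector elements get numeric suffixes
field_names <- sort(unique(names(unlist(content))))
field_names
```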
Let's walk through the whole process of fetching accession passport data from Genesys.
library(genesysr)
setup_sandbox()
user_login()
musa <- genesysr::get_accessions(list(taxonomy = list(genus = c('Musa'))), at.least = 1000)
names(musa)