genesysr Tutorial

Matija Obreza & Nora Castaneda

2019-11-22

Querying Genesys PGR

Genesys PGR is the global database on plant genetic resources maintained ex situ in national, regional and international genebanks around the world.

genesysr uses the Genesys API to query Genesys data. The API is accessible at https://api.genesys-pgr.org.

Accessing data with genesysr is similar to downloading data in CSV or Excel format and loading it into R.

For the impatient

Accession passport data is retrieved with the get_accessions function.

The database is queried by providing a filter (see Filters below):

## Setup: use Genesys Sandbox environment
# genesysr::setup_sandbox()
# genesysr::setup_production() # This is initialized by default when loading genesysr

# Open a browser: login to Genesys and authorize access
genesysr::user_login()

# Retrieve first 1000 accessions for genus *Musa*
musa <- get_accessions(filters = list(taxonomy = list(genus = c('Musa'))), at.least = 1000)
# Or retrieve all accession data for genus *Musa*
musa <- get_accessions(filters = list(taxonomy = list(genus = c('Musa'))))

# Retrieve all accession data for the Musa International Transit Center, Bioversity International
itc <- get_accessions(list(institute = list(code = c('BEL084'))))

# Retrieve all accession data for the Musa International Transit Center, Bioversity International (BEL084) and the International Center for Tropical Agriculture (COL003)
some <- get_accessions(list(institute = list(code = c('BEL084','COL003'))))

genesysr provides utility functions to create filter objects using Multi-Crop Passport Descriptors (MCPD) definitions:

# Retrieve data by country of origin (MCPD)
get_accessions(mcpd_filter(ORIGCTY = c("DEU", "SVN")))

Processing fetched data

The data is provided by Genesys as CSV. Where a descriptor can hold multiple values, each value is placed in its own numbered column. For example, accession STORAGE may be provided as:

storage1 storage2 storage3
10 20 30
30 40 NA
30 NA NA
10 20 30
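One way to work with values spread over numbered columns is to collapse them into a single delimited column, or to test for a value across all of them. The sketch below rebuilds the table above as a local data frame, so no Genesys call is made. The helper name has_storage and the ";" separator are illustrative choices, not part of genesysr.

```r
# Mock of the example table above; real data would come from get_accessions()
accessions <- data.frame(
  storage1 = c(10, 30, 30, 10),
  storage2 = c(20, 40, NA, 20),
  storage3 = c(30, NA, NA, 30)
)

# Columns holding the multi-valued STORAGE descriptor
storage_cols <- grep("^storage[0-9]+$", names(accessions), value = TRUE)

# Collapse the non-missing values of each row into one ";"-separated string
accessions$storage <- apply(accessions[storage_cols], 1, function(row) {
  paste(row[!is.na(row)], collapse = ";")
})

# Hypothetical helper: TRUE for rows where any storage column equals `code`
has_storage <- function(df, code) {
  rowSums(df[storage_cols] == code, na.rm = TRUE) > 0
}

accessions$storage
has_storage(accessions, 20)
```

Keeping the original storageN columns alongside the collapsed one makes it possible to go back to per-value comparisons later.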

Filters

The filter object is a named list() where names match a Genesys filter and the value specifies the criteria to match.

The records returned by Genesys match all filters provided (AND operation), while individual filters allow for specifying multiple criteria (OR operation):

# (GENUS == Musa) AND (SPECIES == aa) AND ((ORIGCTY == NGA) OR (ORIGCTY == CIV))
filters <- list(taxonomy = list(genus = c('Musa'), species = c('aa')), countryOfOrigin = list(iso3 = c('NGA', 'CIV')))

# Or build the same filter step by step
filters <- list()
filters$taxonomy$genus <- c('Musa')
filters$taxonomy$species <- c('aa')
filters$countryOfOrigin$iso3 <- c('NGA', 'CIV')

# See filter object as JSON
jsonlite::toJSON(filters)
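To make the AND/OR rule concrete, here is a minimal local sketch that applies the same logic to a small made-up data frame. It only illustrates the matching semantics described above and is not how Genesys evaluates filters on the server. The column names and the flat criteria list are assumptions chosen for the example.

```r
# Mock passport records (made-up values), standing in for Genesys data
records <- data.frame(
  genus   = c("Musa", "Musa", "Musa", "Hordeum"),
  species = c("aa", "aa", "balbisiana", "vulgare"),
  origcty = c("NGA", "DEU", "CIV", "NGA"),
  stringsAsFactors = FALSE
)

# Flat, local version of the filter: one entry per column
criteria <- list(
  genus   = c("Musa"),
  species = c("aa"),
  origcty = c("NGA", "CIV")
)

# %in% gives OR within one filter; Reduce(`&`, ...) gives AND across filters
matches <- Reduce(`&`, lapply(names(criteria), function(col) {
  records[[col]] %in% criteria[[col]]
}))

records[matches, ]
```

Only records that satisfy every filter are kept, while a record needs to match just one of the values listed inside a given filter.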

Genesys offers many more filtering options. The best way to explore them is to apply filters on the website at https://www.genesys-pgr.org/a/overview, inspect the HTTP requests your browser sends to the API server, and replicate those filters here.

Taxonomy

taxonomy$genus filters by a list of genera.

filters <- list(taxonomy = list(genus = c('Hordeum', 'Musa')))
# Print
jsonlite::toJSON(filters)

taxonomy$species filters by a list of species.

filters <- list(taxonomy = list(genus = c('Hordeum'), species = c('vulgare')))
# Print
jsonlite::toJSON(filters)

Origin of material

countryOfOrigin$iso3 filters by the ISO3 code of the country of origin of the PGR material.

# Material originating from Germany (DEU) or France (FRA)
filters <- list(countryOfOrigin = list(iso3 = c('DEU', 'FRA')))

geo$latitude and geo$longitude filter by the latitude and longitude (in decimal degrees) of the collecting site.

# TBD
filters <- list(geo = list(latitude = genesysr::range(-10, 30), longitude = genesysr::range(30, 50)))

Holding institute

institute$code filters by a list of FAO WIEWS institute codes of the holding institutes.

# Filter for ITC (BEL084) and CIAT (COL003)
list(institute = list(code = c('BEL084', 'COL003')))

institute$country$iso3 filters by a list of ISO3 codes of the countries where the holding institutes are located.

# Filter for genebanks in Slovenia (SVN) and Belgium (BEL)
list(institute = list(country = list(iso3 = c('SVN', 'BEL'))))
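Because filters are plain nested lists, separate filter fragments can be merged into one object before querying. The sketch below uses utils::modifyList from base R, which merges nested lists recursively. The fragment names (by_taxonomy and so on) are made up for the example, and whether a particular combination returns any records depends on the data in Genesys.

```r
# Filter fragments, each written in the structure shown in the sections above
by_taxonomy  <- list(taxonomy = list(genus = c("Musa")))
by_institute <- list(institute = list(code = c("BEL084", "COL003")))
by_country   <- list(institute = list(country = list(iso3 = c("BEL"))))

# modifyList() merges nested lists recursively, so institute$code and
# institute$country end up side by side under the same `institute` entry
filters <- Reduce(utils::modifyList, list(by_taxonomy, by_institute, by_country))

str(filters)
```

Note that modifyList replaces a value when the same leaf appears in two fragments, so the later fragment wins rather than the values being combined.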

Selecting columns

The Genesys API returns many variables for accession passport data. To reduce the amount of data to be processed and kept in memory, select the columns of interest with the fields vector:

# Fetch only accession id, storage and taxonomic data for *Musa*
musa <- genesysr::get_accessions(list(taxonomy = list(genus = c('Musa'))), fields = c("taxonomy", "storage", "id"))

To list the variable names returned by the Genesys API, fetch a small sample, inspect the response and then select the columns of interest:

# fetch_accessions uses the JSON format
accn <- fetch_accessions(filters = list(), at.least = 100)

# Print names used in JSON response from Genesys
sort(unique(names(unlist(accn$content))))
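The effect of names(unlist(...)) can be seen on a small nested list without contacting Genesys. The mock records below are made up and only mimic the general shape of a parsed JSON response; the actual field names returned by the API may differ.

```r
# Mock of a parsed JSON response: a list of accession records (made-up values)
content <- list(
  list(id = 1, taxonomy = list(genus = "Musa", species = "acuminata")),
  list(id = 2, taxonomy = list(genus = "Musa"), storage = c(10, 20))
)

# unlist() flattens the nesting and joins nested names with ".";
# unnamed multi-valued entries get a numeric suffix (storage1, storage2)
field_names <- sort(unique(names(unlist(content))))
field_names
```

The numeric suffixes produced here mirror the numbered columns described in the section on processing fetched data.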

Step-by-step example

Let’s walk through the whole process of fetching accession passport data from Genesys.

  1. Load genesysr
library(genesysr)
  2. Set up using user credentials
setup_sandbox()
user_login()
  3. Fetch data
musa <- genesysr::get_accessions(list(taxonomy = list(genus = c('Musa'))), at.least = 1000)
  4. Identify columns of interest
names(musa)
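Once the column names are known, the data frame can be narrowed down with base R. The sketch below uses a small mock data frame in place of musa, so it runs without a Genesys login. The column names are assumptions for illustration and should be replaced with those printed by names(musa).

```r
# Mock result standing in for `musa`; column names are illustrative only
musa_mock <- data.frame(
  id = 1:2,
  taxonomy.genus = c("Musa", "Musa"),
  taxonomy.species = c("acuminata", "balbisiana"),
  institute.code = c("BEL084", "BEL084"),
  stringsAsFactors = FALSE
)

# Keep the id column plus every column whose name starts with "taxonomy."
keep <- names(musa_mock) == "id" | startsWith(names(musa_mock), "taxonomy.")
musa_subset <- musa_mock[, keep, drop = FALSE]

names(musa_subset)
```

Selecting by prefix keeps the code short when a descriptor expands into several columns.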