eBird Status Data Products Changelog

Tom Auer

2020 Changelog

Data Version: 2020 (available June 2021)

Citation:
Fink, D., T. Auer, A. Johnston, M. Strimas-Mackey, O. Robinson, S. Ligocki, W. Hochachka, L. Jaromczyk, C. Wood, I. Davies, M. Iliff, L. Seitz. 2021. eBird Status and Trends, Data Version: 2020; Released: 2021. Cornell Lab of Ornithology, Ithaca, New York. https://doi.org/10.2173/ebirdst.2020

Data Inputs

eBird Checklists

  • CHANGED: Checklists are included for January 1, 2006 through December 31, 2020, updated from January 1, 2005 through April 15, 2020.
  • CHANGED: All species now use all data globally and are not run for spatial subsets. Previously, species occurring primarily in the Western Hemisphere were run only for that spatial extent.
  • CHANGED: Checklists using the Stationary protocol now include tracks and are used as long as the track distance for this protocol type is less than 700 meters.
  • CHANGED: The spatial location for checklists at eBird Hotspots has been changed from the user-reported location to the centroid of all tracks associated with the hotspot.
  • FIXED: Previously, some historical checklists that lacked complete effort information had been included. These have now been excluded.

Environmental Covariates

  • CHANGED: SRTM15+ ~250m elevation and bathymetry replaces the ~1 kilometer SRTM30+ elevation and bathymetry product.
  • CHANGED: The single year of Nighttime Lights has been replaced with by-year assignment for 2014-2020 using the EOG Annual VNL v2 product.
  • CHANGED: The Global Intertidal Change dataset has been updated to version 1.2 which includes a new three-year time step covering 2017 through 2019.
  • CHANGED: Continents now have unique identifiers in the island categorization. Previously, all continents were treated as the same “mainland” value.
  • ADDED: Hourly weather variables have been assigned at 30 kilometer spatial resolution using the Copernicus ERA5 reanalysis product.
  • ADDED: 90m eastness and northness (combined slope and aspect) topographic variables from Amatulli et al. 2020 are included in addition to 1 kilometer eastness and northness.
  • FIXED: Source data updated for 2017-2019 for MCD12Q1 which had reported classification errors.
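
As an illustration of the eastness and northness covariates mentioned above, the following minimal sketch combines slope and aspect into two values. The formulas used here (sine of slope multiplied by the sine or cosine of aspect, with aspect measured clockwise from north) are an assumed, commonly used definition and are not confirmed by this changelog; the function name is hypothetical.

```python
import math


def eastness_northness(slope_deg, aspect_deg):
    """Return (eastness, northness) for a slope and aspect given in degrees.

    Assumed definition (not taken from this changelog):
        eastness  = sin(slope) * sin(aspect)
        northness = sin(slope) * cos(aspect)
    with aspect measured clockwise from north.
    """
    slope = math.radians(slope_deg)
    aspect = math.radians(aspect_deg)
    return math.sin(slope) * math.sin(aspect), math.sin(slope) * math.cos(aspect)
```

Under this definition, flat terrain yields zero for both values, and a steep east-facing slope yields a large positive eastness with northness near zero.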

Workflow and Code Changes

Spatiotemporal Partitioning

  • CHANGED: The adaptive partitioning algorithm (AdaSTEM) now uses an Icosahedron Gnomonic projection that generates partitions with largely conformal stixel boundaries across the globe.
  • CHANGED: The temporal width of AdaSTEM partitions has been changed from 30.5 days to 28 days.

Model Ensemble

  • CHANGED: The percent above threshold (PAT) cutoff for 3km grid cells to be reported as present has changed from 0.1 to 0.143, to accommodate increased occurrence rates as a result of including hourly weather to account for variation in detection rates.
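
To make the effect of the cutoff change concrete, here is a minimal sketch. It assumes that PAT for a grid cell is the fraction of ensemble models whose predicted occurrence exceeds that model's own threshold, and that a cell is reported as present when PAT is at or above the cutoff. Both the function names and the inclusive comparison are assumptions for illustration, not the production implementation.

```python
def percent_above_threshold(model_probs, thresholds):
    """Fraction of ensemble models whose prediction exceeds its threshold."""
    hits = sum(p > t for p, t in zip(model_probs, thresholds))
    return hits / len(model_probs)


def is_present(model_probs, thresholds, cutoff=0.143):
    """Report a cell as present when PAT reaches the cutoff (assumed inclusive)."""
    return percent_above_threshold(model_probs, thresholds) >= cutoff
```

With seven models covering a cell, a single model above threshold gives a PAT of 1/7 (about 0.1429), which passed the old 0.1 cutoff but falls just short of 0.143; two models are now needed.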

Resident Methodology

  • CHANGED: Residents now have a suite of independent settings designed for species with strong spatiotemporal stationarity. These include the following:
    • Each stixel loads the full year of training and test data, not just the 28 day window associated with the given stixel.
    • The DAY predictor is encoded cyclically using sine and cosine transformations to allow the model to wrap the year.
    • The spatiotemporal grid sampling now seeks a maximum sample size of 65,000 checklists in a given stixel (for migrants this value is 5,000).
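
The cyclic encoding of the DAY predictor described above can be sketched as follows. The period of 366 days and the function name are assumptions made for illustration; the changelog does not state the exact units or period used.

```python
import math


def encode_day_cyclic(day_of_year, period=366.0):
    """Map a day of year onto the unit circle as (sin, cos).

    Days near the end of the year land next to days near the start,
    so a model sees December 31 and January 1 as neighbors.
    The period is an assumed value.
    """
    angle = 2.0 * math.pi * day_of_year / period
    return math.sin(angle), math.cos(angle)
```

For example, the encoded points for day 1 and day 365 are close together, while day 1 and day 183 sit on nearly opposite sides of the circle.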

Data Products

  • CHANGED: The count model prediction values for effort variables are now set at 1 hour and 1 kilometer. Previously, the effort variables used for the count model prediction were the same as those used for the occurrence model, which sought to maximize detection by optimizing the distance and duration effort variables to capture as much signal as possible, up to 12 hours (6 hours in this version) and 10 kilometers.
  • CHANGED: Zeroes in data products that are outside of the prediction area for species (also known as assumed zeroes) now require, on average, across the up-to 100 models in the ensemble, 0.5% of 3km grid cells filled with at least 1 checklist for a given week to be reported as zero. Previously, this was 0.1% of 3km grid cells. This has been adjusted to offer a more appropriately conservative representation of where absence can be assumed based on overall data volume.
  • ADDED: Locations (3km grid cells) with a mean site selection probability below 0.5% are now masked out of the final data products and reported as NA. Mean site selection probability is calculated weekly in a species-agnostic AdaSTEM workflow that estimates the probability that a location of a given habitat configuration will be visited in a given region and season.
  • ADDED: Spatial representations of predictive performance metrics and other individual model-level summaries are being generated as 27km GeoTIFFs for each week of the year. The spatialization is done by assigning the stixel-level values to every 27km grid cell within the stixel and then averaging across stixels to determine regional metrics.
  • FIXED: The Caspian Sea is now masked out of all data products.
  • CHANGED: Raw test data that does not receive model predictions has been removed from the calculation of predictive performance metrics. Previously, this type of test data was used as a form of assumed absence in the calculation of binary predictive performance metrics.
  • ADDED: Predictions to 3km grid cells now include a standardization of hourly weather within each individual model. The hourly weather values set for prediction are based on a maximization of occurrence estimates between the 80th and 90th percentiles.
  • CHANGED: Calculation of individual model partial dependencies now uses out-of-bag training data. Previously, in-bag training data was used.
  • ADDED: Predictor Importance and Partial Dependency products are now included for both occurrence rate and count models. Previously, these products were only available for the occurrence rate model.
  • CHANGED: The time covariate used in the models, calculated as the difference between the local checklist time and solar noon at the checklist location, has been changed to use the temporal midpoint of the checklist for the calculation. Previously, the time at the start of the checklist had been used for this calculation.
  • FIXED: The temporal centroid of individual models, used with predictor importance and partial dependencies, has been changed to represent the mean date of the in-bag training data. Previously, this was a mean across all training data, test data, and all four weeks of 3km grid cell location data.
  • CHANGED: Regional habitat association charts are based on a weighted summary of stixel-level predictor importance and partial dependence estimates, with the weighting determined by the proportion of the region covered by each stixel. Previously, stixel centroids were used to determine the set of stixels contributing to a given region, with crude approximations of the stixels as rectangles in lat-lon coordinates being used to determine the overlap-based weighting. Now, the exact stixel shape is used when calculating regional habitat associations, by considering the exact set of 27km grid cells falling within each stixel, to determine both the set of stixels used in habitat summarization and the overlap-based weighting for a given region.
  • CHANGED: Habitat and regional abundance and range statistical summaries are now computed for all species, globally, using the Natural Earth Data Admin 1 data for summarization.
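
As a small worked example of the time covariate change listed above, the sketch below computes the difference between checklist time and solar noon using either the checklist midpoint (current behavior) or the start time (previous behavior). It assumes times are expressed as decimal local hours and that solar noon at the checklist location is already known; the function and parameter names are hypothetical.

```python
def hours_from_solar_noon(start_hour, duration_hours, solar_noon_hour,
                          use_midpoint=True):
    """Difference between checklist time and solar noon, in hours.

    use_midpoint=True uses the temporal midpoint of the checklist;
    use_midpoint=False reproduces the earlier start-time calculation.
    Negative values fall before solar noon.
    """
    reference = start_hour
    if use_midpoint:
        reference = start_hour + duration_hours / 2.0
    return reference - solar_noon_hour
```

For a two-hour checklist starting at 06:00 with solar noon at 12:15, the midpoint version gives -5.25 hours, compared with -6.25 hours under the start-time calculation.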

Expert Review

  • CHANGED: Animations are no longer being reviewed for resident species.

2019 Changelog

Data Version: 2019 (currently available)

Citation:
Fink, D., T. Auer, A. Johnston, M. Strimas-Mackey, O. Robinson, S. Ligocki, W. Hochachka, C. Wood, I. Davies, M. Iliff, L. Seitz. 2020. eBird Status and Trends, Data Version: 2019; Released: 2020. Cornell Lab of Ornithology, Ithaca, New York. https://doi.org/10.2173/ebirdst.2019

Data Inputs

eBird Checklists

  • CHANGED: Checklists are included for January 1, 2005 through April 15, 2020, updated from January 1, 2014 through December 31, 2018.
  • ADDED: Include checklists from the International Shorebird Survey (ISS) as complete for shorebird species.
  • CHANGED: Checklists where “slashes” (representing two similar species) are non-zero now have child species set to “X” (present-only, no count info).
  • FIXED: Subspecies did not always roll up to species-level correctly.

Environmental Covariates

Workflow and Code Changes

Spatiotemporal Partitioning

  • CHANGED: The adaptive partitioning algorithm (AdaSTEM) now uses projected coordinates (sinusoidal) and meters instead of unprojected coordinates and degrees.
  • CHANGED: AdaSTEM partitions are now 1500 kilometers on a side at their largest and 187 kilometers on a side at their smallest.
  • CHANGED: AdaSTEM rules now split partitions if they contain more than 16,000 checklists or are larger than 1500 kilometers on a side.
  • CHANGED: AdaSTEM now reverts individual partitions back to the next-largest size if any of the partition children contain fewer than 500 checklists and are not mostly open water. Partitions are never allowed to revert back to partitions that are 1500 kilometers or more on a side.
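
The splitting and reverting rules above can be sketched as a simple recursive quadtree over checklist locations in projected kilometers. This is an illustrative approximation only: it assumes square cells split into four equal children, ignores the open-water exemption and the temporal dimension, and uses hypothetical names throughout.

```python
MAX_SIDE_KM = 1500.0          # partitions larger than this are always split
MIN_SIDE_KM = 187.0           # children smaller than this are not created
MAX_CHECKLISTS = 16_000       # split when a partition holds more than this
MIN_CHILD_CHECKLISTS = 500    # revert when any child holds fewer than this


def partition(x0, y0, side_km, points):
    """Return leaf cells as (x0, y0, side_km, n_checklists) tuples.

    points is a list of (x, y) checklist coordinates in kilometers that
    fall inside the half-open square [x0, x0 + side) x [y0, y0 + side).
    """
    half = side_km / 2.0
    must_split = side_km > MAX_SIDE_KM
    may_split = len(points) > MAX_CHECKLISTS and half >= MIN_SIDE_KM
    if not (must_split or may_split):
        return [(x0, y0, side_km, len(points))]

    # Assign points to the four child quadrants.
    children = []
    for dx in (0, 1):
        for dy in (0, 1):
            cx, cy = x0 + dx * half, y0 + dy * half
            pts = [(x, y) for x, y in points
                   if cx <= x < cx + half and cy <= y < cy + half]
            children.append((cx, cy, pts))

    # Revert to the parent if any child is too sparse, but never revert
    # to a partition that is 1500 km or more on a side.
    sparse = any(len(pts) < MIN_CHILD_CHECKLISTS for _, _, pts in children)
    if side_km < MAX_SIDE_KM and sparse:
        return [(x0, y0, side_km, len(points))]

    leaves = []
    for cx, cy, pts in children:
        leaves.extend(partition(cx, cy, half, pts))
    return leaves
```

Starting from a 1500 kilometer cell, three successive splits reach 187.5 kilometers, which is consistent with the smallest partition size listed above.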

Model Ensemble

  • ADDED: Individual models now report 0 for predictions if the training data set contains fewer than 10 positive observations of a species and the mean spatial coverage within the model is greater than or equal to 5%.
  • CHANGED: Range boundaries are now set weekly to have the highest level of ensemble support, between 50% and 95% of models, while including at least 99.5% of positive observations, changed from being fixed at 75% of models in previous versions.
  • CHANGED: Zeroes in data products that are outside of the prediction area for species (also known as assumed zeroes) are now based on the mean spatial coverage of checklists within those areas. For locations where species-specific models did not report zero or non-zero predictions, locations need to have, on average, across the up-to 100 models in the ensemble, 0.1% of 3km grid cells filled with at least 1 checklist for a given week to be reported as zero. Previously, these locations required 95% of models at a given location to have had at least 50 complete checklists for the given week.
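
One way to read the weekly range boundary rule above is as a search for the highest ensemble support level that still captures nearly all positive observations. The sketch below assumes each positive observation has a known support value (the fraction of ensemble models predicting presence at its location), that candidate levels step down in 5% increments, and that the lowest level is used when no level qualifies. All three are assumptions, as are the function and parameter names.

```python
def choose_support_level(obs_support, levels=None, min_included=0.995):
    """Pick the highest support level that includes enough positive observations.

    obs_support: for each positive observation, the fraction of ensemble
    models predicting presence at that location (0 to 1).
    levels: candidate support levels, highest first (assumed 5% steps).
    """
    if levels is None:
        levels = [s / 100 for s in range(95, 49, -5)]  # 0.95, 0.90, ..., 0.50
    for level in levels:
        included = sum(s >= level for s in obs_support) / len(obs_support)
        if included >= min_included:
            return level
    # Assumed fallback: use the lowest allowed level when none qualifies.
    return levels[-1]
```

If every positive observation has near-unanimous support, the boundary is drawn at 95%; a small cluster of observations with weaker support pulls the level down until at least 99.5% of them are inside.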

Seasonal Products

  • ADDED: When averaging weekly estimates to represent resident species, reviewers select a subset of weeks, as opposed to having previously averaged the entire year.

Data Products

  • ADDED: There are now 184 species modeled at a fully global extent. The overall species total is now 807.

Expert Review

  • ADDED: Expert reviewers now assign quality scores for the full year, animations, and all seasons.