## Background

Erroneous database entries and problematic geographic coordinates are a central issue in biogeography, and a set of tools is available to address different dimensions of the problem. CoordinateCleaner focuses on the fast and reproducible flagging of large numbers of records and provides additional functions to detect dataset-level and fossil-specific biases. In the R environment, the scrubr and biogeo packages offer cleaning approaches complementary to CoordinateCleaner. The scrubr package offers basic geographic cleaning (comparable to cc_dupl, cc_zero and cc_count in CoordinateCleaner) and adds options to clean taxonomic names (see also taxize) and date information. biogeo includes some basic automated geographic cleaning (similar to cc_val, cc_count and cc_outl) but focuses mainly on correcting suspicious coordinates manually, using environmental information.

Table 1. Function-by-function comparison of CoordinateCleaner, scrubr and biogeo.

Functionality | CoordinateCleaner 2.0-2 | scrubr 0.1.1 | biogeo 1.0 | Percent overlap |
---|---|---|---|---|
Missing coordinates | cc_val | coord_incomplete | missingvalsexclude | 100% |
Coordinates outside CRS | cc_val | coord_impossible | - | 100% |
Duplicated records | cc_dupl | dedup | duplicatesexclude | The aim is identical, methods differ |
0/0 coordinates | cc_zero | coord_unlikely | - | 100% |
Identical lon/lat | cc_equ | - | - | 0% |
Country capitals | cc_cap | - | - | 0% |
Political unit centroids | cc_cen | - | - | 0% |
Coordinates incongruent with additional location information | cc_count | coord_within | errorcheck, quickclean | 100% |
Coordinates assigned to GBIF headquarters | cc_gbif | - | - | 0% |
Coordinates assigned to the location of biodiversity institutions | cc_inst | - | - | 0% |
Coordinates outside natural range | cc_iucn | - | - | 0% |
Spatial outliers | cc_outl | - | outliers | 50%, biogeo uses environmental distance |
Coordinates within the ocean | cc_sea | - | - | 0% |
Coordinates in urban area | cc_urb | - | - | 0% |
Coordinate conversion error | dc_ddmm | - | - | 0% |
Rounded coordinates/rasterized collection | dc_round | - | precisioncheck | 20%, biogeo tests for predefined rasters |
Fossils: invalid age range | tc_equal | - | - | 0% |
Fossils: excessive age range | tc_range | - | - | 0% |
Fossils: temporal outlier | tc_outl | - | - | 0% |
Fossils: PyRate interface | WritePyrate | - | - | 0% |
Wrapper functions to run all tests | CleanCoordinates, CleanCoordinatesDS, CleanCoordinatesFOS | - | - | 0% |
Database of biodiversity institutions | institutions | - | - | 0% |
Taxonomic cleaning | - | tax_no_epithet | - | 0% |
Missing date | - | date_missing | - | 0% |
Add date | - | date_create | - | 0% |
Date format | - | date_standardize | - | 0% |
Reformatting coordinate annotation | - | - | a large set of functions | 0% |
Correcting coordinates using guessing and environmental distance | - | - | a large set of functions | 0% |
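
To make the simplest rows of Table 1 concrete, the following base-R sketch flags records with missing coordinates, coordinates outside the valid longitude/latitude range, 0/0 coordinates and identical longitude and latitude. It is an illustration of the general idea only, not the implementation used by CoordinateCleaner, scrubr or biogeo; the function `flag_records`, the column names `lon` and `lat`, and the example data are assumptions made for this example.

```r
# Base-R sketch of four simple coordinate checks.
# NOT package code: flag_records() and the column names are hypothetical.
records <- data.frame(
  species = c("a", "b", "c", "d", "e"),
  lon     = c(12.5, NA, 0, 200, 45),
  lat     = c(41.9, 10, 0, 10, 45)
)

flag_records <- function(x, lon = "lon", lat = "lat") {
  lo <- x[[lon]]
  la <- x[[lat]]

  # Missing coordinates
  missing <- is.na(lo) | is.na(la)
  # Comparisons involving NA return NA, so the remaining tests are
  # only evaluated as TRUE for records with both coordinates present
  out_of_range <- !missing & (abs(lo) > 180 | abs(la) > 90)
  zero         <- !missing & lo == 0 & la == 0
  equal        <- !missing & lo == la

  data.frame(
    missing      = missing,
    out_of_range = out_of_range,
    zero         = zero,
    equal        = equal,
    # A record is flagged if it fails at least one test
    flagged      = missing | out_of_range | zero | equal
  )
}

flags <- flag_records(records)
clean <- records[!flags$flagged, ]
```

Returning one logical column per test, rather than dropping records directly, keeps the flagging step reproducible and leaves the decision about which flagged records to exclude to the user.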