Despite a rich scientific heritage of environmental exploration and classification, we are still not much closer to understanding how many unique species populate our planet.
Estimates vary widely, from as few as 5 million up to 100 million, a staggering degree of difference. To reach these numbers, the majority of studies extrapolate from historic rates of species description, but how many unique species have already been described? This question is more difficult to answer than it initially appears, as our taxonomies are littered with multiple names for single species. This makes interpreting both species richness and the attached occurrence data more complicated.
What’s in a name?
Taxonomic nomenclature is based on the Linnaean system, comprising a genus, a species and the author who described the specimen. The name must be linked to a type specimen to be valid. When a species is formally described twice, either through coincidence or a misinterpretation of population-level variation, it can end up with two different names. As the names chosen are often subjective and based on a variety of social and environmental cues, the resultant names may be very different. In more recent decades, reinterpretation of existing specimens and molecular investigations have led to a proliferation of name changes. Species are now often moved to a different genus or split into multiple species, creating more synonyms.
These taxonomic synonyms are not synonyms in the literary sense. Rather, only one name can be the ‘correct’ one for a species at any time. For example, the Western Hoolock Gibbon, Hoolock hoolock, has also been placed in the genera Simia, Hylobates and Bunopithecus, but these names are now antiquated. When a species is described in parallel or with different regional names, who decides the ‘correct’ name? This depends on the taxonomic group. For example, birds are well studied and there are governing bodies to make these decisions. However, for plants there has historically been no central authority. This has led to highly variable names on a continental, country-specific and even regional level. Some herbaria and gardens have recently attempted to collate lists of known plant species and their synonyms, a famous example being Kew’s ‘The Plant List’. This database is not freely downloadable, however, making it difficult to access. Regardless, synonyms add a layer of complication to managing and interpreting biodiversity data.
Identifying the Identifiers
For the conservation of biodiversity, synonyms are an inconvenience. An ideal species identifier would not vary through space and time, whereas current taxonomic names can change rapidly. Without a coordinated central taxonomy, remnant synonyms can float around inflating species counts, creating problems for biogeographical studies. For example, anyone interested in the Western Hoolock Gibbon would have to know all four names, search for occurrences under each of them, and aggregate the data. As these untidy elements litter our taxonomic information, more time and effort are wasted sorting our data.
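The aggregation step described above can be sketched in a few lines of Python. This is only an illustration: the genus names come from the text, but the binomial combinations in the lookup table, the function name and all occurrence counts are assumptions invented for the example.

```python
from collections import defaultdict

# Hypothetical synonym table: every known name maps to the accepted name.
# Genus names are from the text; the exact combinations are assumed.
ACCEPTED_NAME = {
    "Hoolock hoolock": "Hoolock hoolock",
    "Simia hoolock": "Hoolock hoolock",
    "Hylobates hoolock": "Hoolock hoolock",
    "Bunopithecus hoolock": "Hoolock hoolock",
}


def aggregate_occurrences(records):
    """Sum occurrence counts under the accepted name.

    Names missing from the synonym table are returned separately
    so that they can be checked by hand rather than silently dropped.
    """
    totals = defaultdict(int)
    unresolved = []
    for name, count in records:
        accepted = ACCEPTED_NAME.get(name)
        if accepted is None:
            unresolved.append(name)
        else:
            totals[accepted] += count
    return dict(totals), unresolved


# Invented occurrence counts, recorded under different names.
records = [
    ("Hoolock hoolock", 12),
    ("Hylobates hoolock", 7),
    ("Bunopithecus hoolock", 3),
    ("Simia hoolock", 1),
    ("Hoolock sp.", 2),
]

totals, unresolved = aggregate_occurrences(records)
print(totals)
print(unresolved)
```

Without the lookup table, the same data would appear as four separate taxa, which is exactly how remnant synonyms inflate species counts.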
Bioinformatics (the application of technology to biodiversity data) offers a bright future to overcome this problem. For example, the Global Biodiversity Information Facility (GBIF) collates species globally from 40 taxonomic databases and occurrence data from a range of contributors, and has recently created programmatic public access to its species database. Cross-validation between different records is used to classify names as accepted, synonym or questionable. Thus, occurrence data can be grouped for a species despite being recorded against synonym taxa. However, this approach is not without its faults. Last September, a GitHub ‘paper’ criticised the level of synonym detection in GBIF. The author used a simple method of detecting when a species, author and year combination existed more than once. Although this may include false positives, it flagged taxa that may be problematic. For example, it identified the frog family Rhacophoridae, which on further examination had competing classifications from two different databases both appearing in GBIF.
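A duplicate check of this kind might look something like the Python sketch below. It is not the GitHub author's code: it assumes that ‘species’ means the species epithet, that each record is a plain dictionary with the hypothetical fields shown, and it uses the placeholder names "Aus bus" and "Cus bus" rather than real taxa.

```python
from collections import defaultdict


def find_duplicate_combinations(names):
    """Flag (epithet, author, year) combinations shared by several names.

    Each record is a dict with the assumed keys 'scientific_name',
    'epithet', 'author' and 'year'. A combination is flagged only when
    it occurs under more than one distinct scientific name.
    """
    groups = defaultdict(set)
    for rec in names:
        # Lower-case the text fields so trivial case differences still match.
        key = (rec["epithet"].lower(), rec["author"].lower(), rec["year"])
        groups[key].add(rec["scientific_name"])
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


# Invented records using placeholder names, not real taxa.
sample = [
    {"scientific_name": "Aus bus", "epithet": "bus", "author": "Smith", "year": 1900},
    {"scientific_name": "Cus bus", "epithet": "bus", "author": "Smith", "year": 1900},
    {"scientific_name": "Aus dus", "epithet": "dus", "author": "Jones", "year": 1950},
]

flagged = find_duplicate_combinations(sample)
print(flagged)
```

A match here is only a candidate synonym. The same author can legitimately reuse an epithet in unrelated genera in the same year, which is one source of the false positives mentioned above, so flagged groups still need expert review.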
Online data aggregation efforts have shown great potential to decrease uncertainty relating to the species synonym problem. I believe that open, accessible and aggregated data is key to mobilising experts, both from institutions and the armchair, to create a taxonomy with minimal species duplication. GBIF, working with other initiatives, still has a long way to go though. The Plant List, for example, has 22.8% unresolved plant names. Using cross-tabulation, we could make better informed estimates of what we have and better models of where it is, greatly enhancing our knowledge for biodiversity conservation.