New guide published on sharing DNA-derived occurrence data

A new guide released by the GBIF Secretariat, Publishing DNA-derived data through biodiversity data platforms, aims to provide holders of genomic and metagenomic information with practical considerations for resurfacing DNA-derived occurrences in biodiversity data platforms like GBIF.org.

Date published: 2021/09/20

The GBIF Secretariat has released a new guide, Publishing DNA-derived data through biodiversity data platforms, aimed at providing holders of genomic and metagenomic information with practical considerations for resurfacing DNA-derived occurrences in biodiversity data platforms like GBIF.org.

Read the full guide here.

An expert team of co-authors from Australia, Estonia, Norway, Sweden and Denmark has described principles and practices for holders of DNA-based data interested in increasing its usability beyond its initial "omics" contexts.

The use of genetic data to detect, describe, classify and quantify taxa has become widespread in molecular ecology, phylogenetics and other areas of biodiversity research. The new guide helps holders of such data realize the potential of making it accessible to a wider range of biodiversity research and policy.

The authors refrain from describing platform-specific details that are documented elsewhere and focus instead on common terms and schemas, typical pitfalls and best practices. Their general approach therefore applies to sharing DNA-derived data through any biodiversity data platform, including the many national systems used around the world.

The guide's publication will also allow GBIF network members including the Ocean Biodiversity Information System (OBIS) and the Atlas of Living Australia to align the efforts of their own networks to combine DNA-detected records with existing data from museum specimens, field surveys and monitoring, citizen-science projects and other sources.

The guide benefitted from discussions with members of several different DNA-focused communities, including the Biodiversity Information Standards (TDWG) Genomic Biodiversity Working Group and the TDWG task group on sustainable Darwin Core-MIxS interoperability. These communities' collective expertise helped clarify how best to apply terms from genomic data standards and guidelines including MIxS, GGBN and MIQE. These additions are supported through a new Darwin Core extension for DNA-derived data now in production in both the GBIF Integrated Publishing Toolkit (IPT) and GBIF.org.

While the practices detail how to share DNA-derived data in biodiversity platforms, the guide reinforces the community expectation that primary genomic data is first shared through the International Nucleotide Sequence Database Collaboration (INSDC). By resurfacing DNA-derived data through other biodiversity platforms, data publishers contribute to a more complete view of the world's biodiversity, with specimens, observations and sequences contributing to a single searchable resource. Cross-platform indexing of data helps produce richer datasets with better metadata, while GBIF's data-clustering algorithm identifies potential links between the different sources of evidence used in natural history and molecular biology.

The guide represents the next stage in GBIF's efforts to connect its data infrastructure and tools with relevant sources of genomic and metagenomic information, building on collaborations with the UNITE Community, EMBL's European Bioinformatics Institute (through both its European Nucleotide Archive and MGnify platform), and the International Barcode of Life Consortium (IBOL).

Originally published on the GBIF website on 14 September 2021.