discussion / Acoustic Monitoring  / 2 November 2022

Unifying acoustic metadata

Let's be honest, there are only a few of us who get super excited about metadata standards. However, it's perhaps ironic that the highly technical (and, to some, perhaps boring) task of creating a metadata standard and the databases/infrastructure to support it actually opens up a whole host of super exciting possibilities with significant implications for science and conservation. For example, imagine having a centralized database of acoustic studies, all of which comply with the same metadata standard - anyone could mine that database for historical studies (e.g. at the correct sample rate, location, time of year, noise floor, calibration etc.), allowing for integration of data and larger spatial/temporal scale meta-analyses that could provide far richer ecological insights than would have been possible from each individual study.
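As a rough illustration of what that kind of mining could look like, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Deployment` record, its field names, the `find_studies` helper and the example values are all hypothetical, and none of it is the Tethys schema or API.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Deployment:
    """Hypothetical metadata record; field names are illustrative only."""
    study: str
    sample_rate_hz: int
    latitude: float
    longitude: float
    start: date
    calibrated: bool


def find_studies(records, min_sample_rate_hz, lat_range, lon_range, months,
                 calibrated_only=True):
    """Return records matching a sample rate, bounding box and months of year."""
    matches = []
    for r in records:
        if r.sample_rate_hz < min_sample_rate_hz:
            continue  # sample rate too low for the analysis
        if not (lat_range[0] <= r.latitude <= lat_range[1]):
            continue  # outside the latitude band
        if not (lon_range[0] <= r.longitude <= lon_range[1]):
            continue  # outside the longitude band
        if r.start.month not in months:
            continue  # wrong time of year
        if calibrated_only and not r.calibrated:
            continue  # uncalibrated recordings excluded
        matches.append(r)
    return matches


# Made-up example records, not real studies.
records = [
    Deployment("study_a", 96_000, 56.3, -2.8, date(2018, 6, 1), True),
    Deployment("study_b", 48_000, 56.5, -2.5, date(2019, 7, 15), True),
    Deployment("study_c", 192_000, 32.7, -117.2, date(2020, 6, 20), True),
    Deployment("study_d", 96_000, 56.1, -3.0, date(2021, 7, 3), False),
]

# Calibrated summer deployments sampled at 96 kHz or above inside a bounding box.
summer_hits = find_studies(records, 96_000, (55.0, 58.0), (-4.0, -1.0), {6, 7, 8})
print([r.study for r in summer_hits])
```

The point of the sketch is simply that once every study records the same fields in the same way, a query like this can run across all of them at once.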

Unfortunately, our bioacoustics metadata standards are a bit of a wild west at the moment - but in the marine bioacoustics field this is all about to change. In 2016 a group in the US came up with a database standard called Tethys, described in this open access paper and available here. Tethys provides the basis for a unified database system, with a schema conforming to the bioacoustics metadata standards currently being finalized. However, whilst Tethys is used by NOAA and some other organizations involved in larger scale studies, there remains a fairly steep learning curve and it can only be accessed programmatically, which has so far impeded uptake. The Scottish Oceans Institute and Tethys (San Diego State University) have received a significant chunk of funding from BOEM to update Tethys and make it easier to use. PAMGuard (open source software widely used for analysis of marine bioacoustic data) will seamlessly integrate with Tethys, allowing users to easily create a Tethys database based on their processed data without writing a single line of code. That means every detector, classifier and signal processing module in PAMGuard will conform to acoustic metadata standards - providing a complete record of the acoustic workflows used in analysis and organizing the resulting data. Each PAMGuard database will then be Tethys compatible, making it super easy to upload to a centralized government or institutional database, hopefully encouraging data sharing and eventually wider scale analysis of multiple datasets.

While we are focused on marine bioacoustic data, there is no reason Tethys cannot be used in terrestrial acoustics (and indeed it has been). Everything is open source, so other bioacoustic software developers can also implement Tethys integration if they wish to do so. Whilst we have no funding to support this, the project will be ongoing over the next four years and I am happy to answer questions on Tethys from developers or anyone else (I'm learning myself, so you may have to bear with me). So this post is really just to tell the community what we are doing and, if you wish to learn more, we will do our best to help out.



Super interesting work! I wonder how accommodating Tethys is of incomplete data - for example, my research group works primarily at the soundscape level and may not be running any algorithms or detecting individual species. In contexts like this, it might make sense to populate some, but not all, of the major fields in the standard.

Hi Jamie, this is super exciting! I had not realized that PAMGuard integration was going to be part of the plan for Tethys - so thrilling! 

One quick question - when processing large datasets, I often end up with a series of binary/database files (e.g., separate runs for separate frequency bands). Does Tethys accommodate the multiple-file scenario?

Would love to give it a whirl when appropriate.