article / 5 July 2026

Deep Voice - A Free Online Platform for AI-Based Marine Mammal Sound Detection and Classification

Passive acoustic monitoring floods marine researchers with data that can take months to annotate by hand, and the AI models that could help have long required Python setup, GitHub repos, and complex config files. Funded by the WILDLABS Awards 2025, Deep Voice removes that barrier with a free, public web app that turns marine mammal sound detection into a simple drag-and-drop task.

Introduction

 

Deep Voice is a non-profit organization dedicated to building open-source AI tools for marine bioacoustic conservation. Over the past seven years, our team of more than 30 volunteer engineers, researchers, and bioacousticians has trained species-specific detection models to help partners worldwide. While passive acoustic monitoring is an incredible way to study marine populations, the sheer volume of data often leaves researchers stuck spending months manually annotating recordings.

 

We knew our AI models could save thousands of hours of manual work, but there was a catch: using them required non-technical researchers to configure Python environments, clone GitHub repositories, and learn complex configuration files. Thanks to the WILDLABS Awards 2025, we finally had the resources to tear down that barrier. The result is a completely free, public, browser-based web application where anyone can drop in their audio recordings and get automated detections, with no coding required!

Deep Voice Volunteers
 

 

Journey Through the Grant Period


Our journey started in the summer of 2025 when we launched a single-species demo on
HuggingFace Spaces to prove that our inference pipeline could run successfully in a
browser on free CPU hardware. But to make this tool genuinely useful for biologists, we
needed to listen to the people using it in the field.


Over the following months, we iterated continuously to address the workflow pain points
that researchers actually face. We implemented live runtime estimates and resolved
headaches caused by sample-rate mismatches. Now, the platform uses lenient
sample-rate handling: if you upload a 96 kHz file to a model trained for 2 kHz, the
platform automatically resamples it for you. Along the way, we also tackled technical
challenges, such as fixing upstream library-version bugs in the underlying open-source
framework, ensuring the tool runs reliably for the whole community.
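
To make the sample-rate example concrete, here is a minimal sketch of the arithmetic behind resampling a recording to a model's expected rate. It is not the platform's actual code: the function name `resample_plan` and its parameters are hypothetical, and it only works out the integer up/down factors and the resulting length, not the filtering itself.

```python
from fractions import Fraction


def resample_plan(source_hz, target_hz, n_samples):
    """Return (up, down, out_len) for converting source_hz audio to target_hz.

    Hypothetical helper for illustration only. 'up' and 'down' are the
    reduced integer factors a polyphase resampler would need.
    """
    if source_hz <= 0 or target_hz <= 0:
        raise ValueError("sample rates must be positive")
    ratio = Fraction(target_hz, source_hz)  # reduced automatically
    up, down = ratio.numerator, ratio.denominator
    # Ceiling division with integers, so long recordings stay exact.
    out_len = -(-n_samples * up // down)
    return up, down, out_len


# Ten seconds recorded at 96 kHz, prepared for a model trained at 2 kHz.
up, down, out_len = resample_plan(96_000, 2_000, n_samples=960_000)
print(up, down, out_len)
```

In this case the file is reduced by a factor of 48. After resampling to 2 kHz, only content below 1 kHz (the Nyquist limit) remains, which is why this kind of conversion suits models for low-frequency callers. Factors like these could be passed to a polyphase resampler such as SciPy's `resample_poly(x, up, down)`, which also applies the low-pass filter needed to avoid aliasing.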

 

 

Achievements and Outcomes


Today, the fully functioning web platform is live on HuggingFace. It currently supports
eight validated AI detection models spanning both low- and high-frequency species:


  ● Arctic cod fish
  ● Greater Caribbean manatee
  ● Burrunan dolphin (with models covering barks, echoes, buzzes, and whistles)
  ● Killer whale (a multi-class model covering upsweeps, downsweeps, tones,
    squeaks, and clicks)
  ● Humpback whale


How to use it:


  1. Pick a species: Select your target species or call type from the dropdown menu,
     which automatically updates to show the recommended detection threshold.
  2. Upload recordings: Drop in one or multiple .wav files (up to 500 MB per
     request).
  3. Run inference: Hit the run button and watch the live progress bar.
  4. Get results: Once inference finishes, the platform generates a CSV file of
     probability scores and a single merged, ready-to-use Raven-compatible selection
     table that you can visually inspect in Raven Pro or Raven Lite.
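
For readers curious how probability scores relate to a selection table, the sketch below shows one plausible route from one to the other. It rests on several assumptions that are not taken from the platform: the score columns (`start_s`, `end_s`, `score`), the threshold of 0.5, the frequency bounds, and the helper names are all hypothetical, and the platform's own merging logic may differ. The header row uses the standard tab-delimited Raven selection table columns.

```python
import csv
import io

# Standard column headers of a Raven selection table (tab-delimited text).
RAVEN_COLUMNS = [
    "Selection", "View", "Channel",
    "Begin Time (s)", "End Time (s)",
    "Low Freq (Hz)", "High Freq (Hz)",
]


def merge_detections(windows, threshold):
    """Keep windows scoring at or above threshold; merge touching or overlapping ones.

    windows: iterable of (start_s, end_s, score). Returns [(begin_s, end_s), ...].
    """
    kept = sorted((s, e) for s, e, score in windows if score >= threshold)
    merged = []
    for start, end in kept:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def to_raven_table(events, low_hz, high_hz):
    """Render merged events as tab-delimited selection table text."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(RAVEN_COLUMNS)
    for i, (begin, end) in enumerate(events, start=1):
        writer.writerow([i, "Spectrogram 1", 1,
                         f"{begin:.3f}", f"{end:.3f}", low_hz, high_hz])
    return out.getvalue()


# Hypothetical score file: one row per analysis window.
scores_csv = """start_s,end_s,score
0.0,1.0,0.10
1.0,2.0,0.92
2.0,3.0,0.88
3.0,4.0,0.20
4.0,5.0,0.95
"""
windows = [(float(r["start_s"]), float(r["end_s"]), float(r["score"]))
           for r in csv.DictReader(io.StringIO(scores_csv))]
events = merge_detections(windows, threshold=0.5)
table = to_raven_table(events, low_hz=0, high_hz=1000)
print(table)
```

Raising the threshold trades missed calls for fewer false positives, which is why the recommended per-model value shown in the dropdown is a sensible starting point before adjusting by eye in Raven.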


You can try the platform right now here: Deep Voice Detection Space.
A short demo is available on YouTube:

 

