Introduction
Deep Voice is a non-profit organization dedicated to building open-source AI tools for marine bioacoustic conservation. Over the past seven years, our team of over 30 volunteer engineers, researchers, and bio-acousticians has trained species-specific detection models to help partners worldwide. While passive acoustic monitoring is an incredible way to study marine populations, the sheer volume of data often leaves researchers stuck spending months manually annotating recordings.
We knew our AI models could save thousands of hours of manual work, but there was a catch: using them required non-technical researchers to set up Python environments, clone GitHub repositories, and learn complex configuration files. Thanks to the WILDLABS Awards 2025, we finally had the resources to tear down that barrier. The result is a completely free, public, browser-based web application where anyone can drop in their audio recordings and get automated detections, no coding required!

Journey Through the Grant Period
Our journey started in the summer of 2025, when we launched a single-species demo on Hugging Face Spaces to prove that our inference pipeline could run on free CPU hardware and be used entirely from a web browser. But to make this tool genuinely useful for biologists, we needed to listen to the people using it in the field.
Over the following months, we iterated continuously to address the workflow pain points that researchers actually face. We added live runtime estimates and resolved the headaches caused by sample-rate mismatches. The platform now handles sample rates leniently: if you upload a 96 kHz file to a model trained on 2 kHz audio, it transparently resamples the file for you. Along the way, we also fixed upstream library-version bugs in the underlying open-source framework so that the tool runs reliably for the whole community.
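To make the resampling step more concrete, here is a minimal Python sketch of how such a conversion could be planned. It is not the platform's code. The function name `resample_plan` is hypothetical, and the sketch only computes the reduced integer up/down factors that a polyphase resampler needs. For a 96 kHz file and a 2 kHz model, that means upsampling by 1 and downsampling by 48.

```python
from fractions import Fraction
from typing import Optional, Tuple


def resample_plan(file_rate_hz: int, model_rate_hz: int) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: return (up, down) so that file_rate_hz * up / down == model_rate_hz.

    Returns None when the file already matches the model's sample rate.
    """
    if file_rate_hz <= 0 or model_rate_hz <= 0:
        raise ValueError("sample rates must be positive integers")
    if file_rate_hz == model_rate_hz:
        return None  # rates already match; the audio can be passed through unchanged
    # Fraction reduces the ratio to lowest terms, giving the smallest integer factors.
    ratio = Fraction(model_rate_hz, file_rate_hz)
    return ratio.numerator, ratio.denominator


if __name__ == "__main__":
    print(resample_plan(96_000, 2_000))   # 96 kHz recording, 2 kHz model
    print(resample_plan(44_100, 48_000))  # a non-integer ratio is still handled exactly
    print(resample_plan(2_000, 2_000))    # no resampling needed
```

If SciPy is available, `scipy.signal.resample_poly(samples, up, down)` applies the anti-aliasing filter and the rate change in one call. Whether the platform uses this particular routine is an assumption.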

Achievements and Outcomes
Today, the fully functioning web platform is live on Hugging Face. It currently supports eight validated AI detection models covering both low- and high-frequency species:
● Arctic cod fish
● Greater Caribbean manatee
● Burrunan dolphin (with models covering barks, echoes, buzzes, and whistles)
● Killer whale (a multi-class model covering upsweeps, downsweeps, tones, squeaks, and clicks)
● Humpback whale
How to use it:
1. Pick a species: Select your target species or call type from the dropdown menu, which automatically updates to show the recommended detection threshold.
2. Upload recordings: Drop in one or more .wav files (up to 500 MB per request).
3. Run inference: Hit the run button and watch the live progress bar.
4. Get results: When the run finishes, the platform generates a CSV file of probability scores and a single merged, ready-to-use Raven-compatible selection table that you can inspect visually in Raven Pro or Raven Lite.
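The thresholding and merging in steps 1 and 4 can be sketched in Python as follows: keep the analysis windows whose score meets the chosen threshold, then write them as one tab-delimited Raven selection table. This is a minimal illustration, not the platform's implementation. The `WindowScore` record, the `to_raven_table` function, the fixed frequency band, and the extra `Begin File` and `Score` columns are all assumptions. Times in the merged table are offset as if the files were opened back to back in Raven.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class WindowScore:
    # Hypothetical record: one analysis window from a per-file scores CSV.
    file_name: str
    start_s: float  # window start, seconds from the beginning of its own file
    end_s: float    # window end, seconds from the beginning of its own file
    score: float    # model probability for the selected species or call type


def to_raven_table(scores: Iterable[WindowScore], threshold: float,
                   file_durations_s: Dict[str, float],
                   low_freq_hz: float, high_freq_hz: float) -> str:
    """Hypothetical helper: merge above-threshold windows from many files into one table."""
    # Cumulative start time of each file, in the dict's insertion order.
    offsets: Dict[str, float] = {}
    elapsed = 0.0
    for name, duration in file_durations_s.items():
        offsets[name] = elapsed
        elapsed += duration

    # Keep detections at or above the threshold, ordered along the merged timeline.
    kept = [s for s in scores if s.score >= threshold]
    kept.sort(key=lambda s: offsets[s.file_name] + s.start_s)

    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["Selection", "View", "Channel", "Begin Time (s)", "End Time (s)",
                     "Low Freq (Hz)", "High Freq (Hz)", "Begin File", "Score"])
    for number, s in enumerate(kept, start=1):
        offset = offsets[s.file_name]
        writer.writerow([number, "Spectrogram 1", 1,
                         f"{offset + s.start_s:.3f}", f"{offset + s.end_s:.3f}",
                         low_freq_hz, high_freq_hz, s.file_name, f"{s.score:.3f}"])
    return out.getvalue()


if __name__ == "__main__":
    windows = [
        WindowScore("a.wav", 0.0, 2.0, 0.91),
        WindowScore("a.wav", 2.0, 4.0, 0.12),  # below threshold, dropped
        WindowScore("b.wav", 1.0, 3.0, 0.75),
    ]
    table = to_raven_table(windows, threshold=0.5,
                           file_durations_s={"a.wav": 60.0, "b.wav": 60.0},
                           low_freq_hz=50.0, high_freq_hz=1000.0)
    print(table)
```

Merging everything onto one timeline produces a single table for the whole batch. A per-file layout would instead keep each file's original times and rely on a file-name column.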
You can try the platform now: Deep Voice Detection Space.
A short demo is available on YouTube: