Discussion / Acoustics / 16 February 2024

Labelled Terrestrial Acoustic Datasets

Hello all,

I'm working with a team to develop an on-animal acoustic monitoring collar. To save power and memory, it will have an onboard machine learning detector and classifier to decide what data to record. We are looking for labelled acoustic data from terrestrial environments to train our model, ideally with labels covering multiple species or taxa.

Does anyone have suggestions for where we could find such datasets to train our model? We would like the model to work globally, so data from any geographic region will help. Our initial deployments will be on deer in North America, so datasets from that area would be most useful right now.

Thank you,

Jesse Turner 




Lars Holst Hansen
@Lars_Holst_Hansen
Aarhus University
Biologist and Research Technician working with ecosystem monitoring and research at Zackenberg Research Station in Greenland

Hi Jesse!

Interesting approach! Are you mostly going after sounds produced by the animal (either vocally or through its behaviour), or environmental sounds (other animals, wind, rain, streams, traffic, etc.)?

What are your thoughts about rare and unknown sounds which will not trigger the detection but may still be very interesting?

Nice to see another thread about animal borne audiologgers (hint hint @StephODonnell ...)

Cheers,

Lars

Hi Jesse,

It is great to hear about another team working to improve this area of research.

Are you sure that you can get significant energy savings by processing onboard? In my (admittedly limited) experience, that processing carries its own energy cost. Also, the sounds you might want to detect can vary enormously in their qualities, and, as @Lars_Holst_Hansen has suggested, you may not even know exactly what you will want to have recorded.

I compared a smart recorder (µMoth) with a 'dumb' recorder (Lab2 S17e) for my koala project. The µMoth used about 3.5 to 4 times as much power as the simple S17e, which is a big deficit to overcome before you even ask the recorder to do something "smart". Higher power requirements also limit which battery technologies will work efficiently for you.

After you have your neatly separated recordings, you may still be asking yourself 'what did I miss?'.

I would happily share my labelled data with you if it were from your area of interest. Apologies for not being able to help with your core request. I assume that you have already raided the repository of:

 

It is not very useful for terrestrial mammals yet, but it has some good labelled sounds for birds and soundscapes.

Please tell us more about your project, particularly the hardware side.

I try to keep a list of labeled bioacoustic datasets (including pointers to other lists, or pointers to large repositories like Xeno Canto) here:

https://lila.science/otherdatasets#bioacoustics

Let me know what I'm missing!