discussion / AI for Conservation  / 27 March 2020

Open, challenging dataset for audio classification

Hi!

Do you know of an open audio dataset that could be used for audio classification? I am part of a group of students currently taking the fast.ai deep learning course. I wanted us to exchange best practices for working with audio (and to research new methods as they apply to animal audio), but the dataset I chose unfortunately proved too easy for deep learning models (the first submission from a student brought the error on the validation set down to 0.2%). You can find the repository we are collectively working on here.

If such a dataset comes to mind and you would be so kind as to share the details with me, that would be greatly appreciated!

Thank you!

Radek




Hi Radek, 

I'm sure others can help here, but check out our recent virtual meetup (it'll be posted here in about an hour). The speakers, particularly Dave Watson, shared open datasets that might be what you're looking for.

Over on Twitter, Jesse Alston is collating a Google Sheet so that people can advertise datasets that grad students can use to finish their theses. @arik's reply here might be of particular interest: 'We have been recording 24/7 soundscapes in remote US locations like Yellowstone NP and rural central Wisconsin with multiple GPS synced recorders. Our goal is to study wolf and coyote vocalisations, but if anyone can make use of these data for their own studies, drop me a line!'

Hope this helps!

Steph