
Machine learning to detect fish bomb blasts

Over the past year I have been involved in a project trying to detect illegal fish bomb blasts in Tanzania using acoustics. We have four long-term recorders deployed along the Tanzanian coastline which can listen for and triangulate the position of fish bombs. Currently I'm using PAMGuard as an initial detection stage, saving any loud low-frequency sound to a small .wav clip. We can end up with thousands of clips in a month, most of which are whales, fish or other weird reef sounds. Manually annotating all these clips is not feasible, especially on a shoestring budget, so I made an interactive program called SoundSort, similar to (or perhaps inspired by) a Google experiment, which clusters the clips using a t-SNE algorithm and displays them on a grid. That way, similar sounds should be grouped together, and a manual analyst can then navigate the grid and annotate the sounds of interest much faster. There's a more complete explanation of the project and program here

I thought I'd post on WildLabs because the program is open source and might be useful to others and, although we're already using it for analysis, it can definitely be improved. Specifically, I have two questions which the WildLabs people might be able to help with, but any other suggestions or comments are also welcome.

1) At the moment I am feeding a flattened spectrogram image to the t-SNE algorithm. There are more sophisticated feature extraction methods out there which might help optimise the clustering of sounds. Does anyone have experience of, or suggestions for, feature extraction methods that work well on predominantly biological sounds?
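For context, here is roughly what the current "flattened spectrogram" input looks like. This is a minimal sketch rather than the actual SoundSort code: the double[time][freq] array shape, the log compression and the zero-mean/unit-variance normalisation are all assumptions about a reasonable baseline.

```java
/**
 * Minimal sketch of the "flattened spectrogram" baseline (assumed shapes,
 * not the actual SoundSort code): spec[time][freq] holds linear magnitudes.
 */
public class SpectrogramFeatures {

    public static double[] flatten(double[][] spec) {
        int nTime = spec.length;
        int nFreq = spec[0].length;
        double[] features = new double[nTime * nFreq];

        // Log-compress so a few loud bins don't dominate the distance metric.
        for (int t = 0; t < nTime; t++) {
            for (int f = 0; f < nFreq; f++) {
                features[t * nFreq + f] = Math.log10(spec[t][f] + 1e-12);
            }
        }

        // Normalise to zero mean and unit variance so clips recorded at
        // different overall levels can still end up near each other.
        double mean = 0;
        for (double v : features) mean += v;
        mean /= features.length;

        double var = 0;
        for (double v : features) var += (v - mean) * (v - mean);
        double std = Math.sqrt(var / features.length) + 1e-12;

        for (int i = 0; i < features.length; i++) {
            features[i] = (features[i] - mean) / std;
        }
        return features;
    }
}
```

Any alternative (per-band energies, MFCC-style coefficients, learned embeddings) would slot in at exactly this point, producing one feature vector per clip for t-SNE to work on.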

2) The t-SNE algorithm clusters the spectrogram images and represents each as a coordinate in 2D. To make the program more user friendly and allow rapid manual annotation, it's best to display the clustered points on an evenly distributed grid. The algorithms to do this (an assignment problem) are quite processor intensive for the thousands of clips we have and can in fact take orders of magnitude more time than the t-SNE machine learning part. A great explanation of this concept is here. Other folk seem to be using the Jonker-Volgenant algorithm which, as far as I can gather, is based on a piece of 1990s C code that very few people (including me) actually understand, but which can solve this type of problem rapidly. I can find no working Java implementation of it. Does anyone have suggestions for other time-efficient algorithms (i.e. probably not the Hungarian algorithm) that might solve the problem, ideally with a Java implementation? A sketch of the kind of cheap approximation I could fall back on is below.
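For illustration, here is a minimal sketch of that greedy fallback. Everything in it is assumed for the example: points rescaled to [0, 1], and a rows × cols grid with at least as many cells as points. Each point simply grabs the nearest free cell, so it runs in O(n²) but is not optimal.

```java
/**
 * Minimal sketch: greedily snap 2D t-SNE points (rescaled to [0, 1]) onto an
 * evenly spaced grid. A cheap O(n^2) approximation, not an optimal assignment
 * like Jonker-Volgenant: points visited early can steal the best cells.
 */
public class GreedyGridAssign {

    /** Returns, for each point, the index of its grid cell (row * cols + col). */
    public static int[] assign(double[][] points, int rows, int cols) {
        int n = points.length;
        if (rows * cols < n) {
            throw new IllegalArgumentException("Grid has fewer cells than points");
        }

        // Cell centres of an evenly spaced grid over [0, 1] x [0, 1].
        double[][] cells = new double[rows * cols][2];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < cols; c++) {
                cells[r * cols + c][0] = (c + 0.5) / cols;
                cells[r * cols + c][1] = (r + 0.5) / rows;
            }
        }

        boolean[] taken = new boolean[cells.length];
        int[] assignment = new int[n];

        for (int i = 0; i < n; i++) {
            int best = -1;
            double bestDist = Double.MAX_VALUE;
            // Linear scan over the free cells; fine for thousands of clips.
            for (int j = 0; j < cells.length; j++) {
                if (taken[j]) continue;
                double dx = points[i][0] - cells[j][0];
                double dy = points[i][1] - cells[j][1];
                double d = dx * dx + dy * dy;
                if (d < bestDist) {
                    bestDist = d;
                    best = j;
                }
            }
            taken[best] = true;
            assignment[i] = best;
        }
        return assignment;
    }
}
```

The layout this produces depends on the order the points are visited in and can look noticeably worse than a proper assignment, which is why a fast exact solver like Jonker-Volgenant would still be preferable.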

Thanks! Jamie