discussion / AI for Conservation  / 17 April 2024

Faces, Flukes, Fins and Flanks: How multispecies re-ID models are transforming Wild Me's work

Hi everyone!

@LashaO and @holmbergius from the Wild Me team at ConservationX Labs gave a superb talk at last month's Variety Hour, sharing their recent breakthroughs with multi-species re-ID models.

Questions and discussion that came through during the session:

How close do the species in a multi-species model have to be (e.g. all dolphins and whales, or could you also add, say, seals)?

  • The closer the new species is to the training domain, the better the generalization you can expect. We started the multi-species work with very similar groups but are expanding diversity. More diversity means better generalization to distinctive species, and we have not hit a wall yet. The current 21-species Flukebook model would likely not generalize well to seals, but if seals were included in joint training, the model would get up to speed with much less data (an order of magnitude less) than single-species training would require. We are building a ~50-species model, the results of which should come out next month.

Which model were you using to show the graphs comparing the accuracy across species? Which types of models does it work best with?

  • We are using a pipeline developed for Wildbook, called MiewID: https://github.com/WildMeOrg/wbia-plugin-miew-id
  • The core architecture is quite simple: a CNN backbone with a metric-learning head. The model used in the graph comparison is efficientnetv2_m, the default backbone for MiewID. Experiments with various CNN and transformer backbones all yielded similarly good results.
  • We are experimenting with larger backbones as we scale up the datasets, balancing performance against scalability.
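For intuition, a metric-learning head trains embeddings so that images of the same individual score higher than images of different individuals, often by penalizing the angle between an embedding and its class prototype. Below is a minimal pure-Python sketch of an ArcFace-style additive angular margin; the margin and scale values are common defaults from the metric-learning literature, not MiewID's actual head or hyperparameters (see the repo for those):

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def arcface_logit(embedding, class_weight, margin=0.5, scale=30.0):
    """ArcFace-style logit: add an angular margin to the angle between the
    embedding and the class prototype, then rescale. Margin/scale here are
    generic defaults, not MiewID's settings."""
    e = l2_normalize(embedding)
    w = l2_normalize(class_weight)
    cos_theta = sum(a * b for a, b in zip(e, w))
    theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp for float safety
    return scale * math.cos(theta + margin)
```

The margin makes the training target strictly harder than plain cosine similarity, which is what pushes same-individual embeddings into tight, well-separated clusters.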

Do you use a specific matching / comparison network to give the predicted individual?

  • The model that we use is called MiewID https://github.com/WildMeOrg/wbia-plugin-miew-id
  • That said, you don’t have to be familiar with any ML to use it. The Wildbook platforms give you the ability to ingest your data and kick off individual search queries served by MiewID. A search query is executed by comparing the visual similarity of your query image to all the other images in your database; match candidates are returned for you to review and choose from.
  • This is what it looks like in practice: https://www.youtube.com/watch?v=sJr27BJ5J7g
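Under the hood, that kind of query can be as simple as nearest-neighbor ranking over embedding vectors. A sketch of the idea (the embeddings and names below are toy placeholders, not Wildbook's actual data structures or API):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_candidates(query_emb, database, top_k=20):
    """Rank database entries (individual ID -> embedding) by visual
    similarity to the query embedding, best match first."""
    scored = [(name, cosine(query_emb, emb)) for name, emb in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy usage: 2-d embeddings standing in for real model outputs.
catalog = {"ind_a": [1.0, 0.0], "ind_b": [0.0, 1.0], "ind_c": [0.9, 0.1]}
matches = rank_candidates([1.0, 0.0], catalog, top_k=2)
```

At Wildbook's scale the brute-force scan would typically be replaced by an approximate nearest-neighbor index, but the ranking logic is the same.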

We need an ‘ImageNet’ for animals, i.e. a large dataset in a uniform format, so that large NNs can be trained

  • Agreed. There are actually lots of publicly available datasets, some even shared by Wild Me, like this one: https://lila.science/datasets/beluga-id-2022/
  • We have found that most of the public datasets (even ones historically shared by us) have lots of issues that can cause ‘lying with statistics’ - metrics that look much better than real-world performance actually is. A lot of work has been put into mitigating this for the multi-species datasets. Having a stable, diverse benchmark that reflects real-world performance well is going to be paramount to supporting innovation. We aim to reach the point where the data used in these experiments is openly accessible - which, given the parties involved in sourcing it, is a long and painful process - but we hope to get there.

Thanks for the talk, Jason and Lasha - so cool to see! I’m really interested to see what the implications are for data labeling requirements, as that’s still one of the biggest barriers to engagement we run into on the conservation partners’ side

  • Agreed - we have observed that in the multi-species setting, 25% of the original well-rounded data requirement can get a completely new species 80% of the way there. That means ~100 individuals with 10 samples per individual, compared to the previous 500x10 requirement. We have seen success with even less, down to 20-50 individuals. The main problem there is that the data subset is so small that it is hard to validate and assure that the model is working well.
  • For species that look very similar to the training domain - say, a species that can be identified by its dorsal fin - the requirement can be as low as zero to a couple of samples.
  • Here are the results of leave-one-out experiments where the model has not been trained on the species it is scored on at all: 
  • For 80% of species, a correct match appears in the top-20 candidates 80% of the time. Again, this means the model is not autonomous, but it can make your life a hell of a lot easier when searching through match candidates.
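For concreteness, the top-20 figure quoted above is a hit-rate metric that can be computed like this (a generic sketch, not Wild Me's actual evaluation code; the function and variable names are illustrative):

```python
def topk_hit_rate(rankings, true_ids, k=20):
    """Fraction of queries whose true individual appears among the top-k
    ranked candidates. `rankings` is a list of candidate-ID lists (best
    first); `true_ids` gives the ground-truth individual per query."""
    hits = sum(1 for ranked, truth in zip(rankings, true_ids) if truth in ranked[:k])
    return hits / len(true_ids)
```

Computing this per species, then asking what fraction of species clear an 80% hit rate, gives the "80% of species, 80% of the time" summary from the leave-one-out experiments.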

If you have further questions, drop them below!