AI Animal Identification Models

David Hunter

@davidhunter | he/him

University of Colorado Boulder

PhD student exploring design and technology to connect people with nature and the environment


Hi Everyone,

I've recently joined WILDLABS and I'm getting to know the different groups. I hope I've selected the right ones for this discussion...

I am interested in using AI to identify animals from camera footage. Ideally, I want to do this on a mobile phone and eventually on an edge computing device.

I've been using ml5js with an object detection model (COCO-SSD and YOLO v2) to do real-time detection in a mobile browser, but the detection is very generic, e.g. person, banana, laptop, cup. I would like to swap the model for something more specific to wildlife, but I'm drawing a blank on finding a pre-existing model to use. I think that for the model to work with my JavaScript setup, it needs to be compatible with TensorFlow.js.

What models are people using, or are you all training your own? If you're training your own, can it be done using Google Teachable Machine, or is something more sophisticated required? Do you use transfer learning? (I am new to machine learning, so please excuse my lack of knowledge.)

I'm running before I can walk, but what boards, kits, or setups do people recommend for deploying AI-based animal identification to the edge? I am familiar with Arduino, but I imagine there are alternatives that might be better suited.

Thanks for any guidance on this! I'm happy to collaborate on such a project if anyone wants to; I just started a PhD in creative technology and design.

Many thanks, David

Lars Holst Hansen

@Lars_Holst_Hansen

Aarhus University

Biologist and Research Technician working with ecosystem monitoring and research at Zackenberg Research Station in Greenland

30 March 2023 5:25pm

Hi David,

I have no specific advice, as I am still on the sidelines where AI is concerned, but Dan Morris (@dmorris) has a ton of info that could be a great starting point.

Also, you may want to cross-post to the https://www.wildlabs.net/groups/camera-traps group, as you specifically want to work on AI for images.

Cheers, Lars

Dan Morris

@dmorris

I help conservation scientists spend less time on boring stuff.

5 April 2023 11:54pm

I don't know if I can exactly answer your question, but I think I might just be the world champion at talking people out of doing AI on the edge. :)

But seriously, my biggest pro tip for training edge models is to put a really fine point on your requirements before doing any ML. You can more or less run any model on any hardware if you're willing to wait long enough, and every bit of model-shrinking or model customization that you do will cost you a lot of engineering time and some amount of accuracy. So it benefits you to do a really thorough analysis of how long you can tolerate per image in terms of inference time... if you are running on a modern mobile phone, you can run just about any model in existence in <60 seconds without having to get into the business of model compression or deep customization. Is that OK for your scenario? And what kind of accuracy do you need? And if you're on a phone, are you *sure* you want to run your model on the device, rather than in the cloud? E.g. if 75% of your users would have connectivity, can you support just those 75% to start with? Or does your scenario fundamentally require edge inference, e.g. an app specifically designed for national parks where connectivity is unavailable more than it's available?
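Dan's advice to pin down how long you can tolerate per image can be made concrete by timing the model on the target device. Below is a minimal Python sketch, assuming your model is wrapped in a single-image `predict` callable (a hypothetical placeholder, not any particular library's API). It discards a few warm-up calls and reports median and worst-case latency.

```python
import statistics
import time
from typing import Any, Callable, Dict, Sequence


def benchmark_latency(
    predict: Callable[[Any], Any],
    inputs: Sequence[Any],
    warmup: int = 3,
) -> Dict[str, float]:
    """Time a single-image `predict` callable and summarise per-image latency.

    The first `warmup` calls are run but not timed, because early inferences
    often include one-off costs such as loading weights or allocating memory.
    """
    if len(inputs) <= warmup:
        raise ValueError("need more inputs than warm-up calls")
    for image in inputs[:warmup]:
        predict(image)

    timings_ms = []
    for image in inputs[warmup:]:
        start = time.perf_counter()
        predict(image)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
        "n": len(timings_ms),
    }


def fits_budget(stats: Dict[str, float], tolerated_ms: float) -> bool:
    """True if even the slowest timed image finished within the tolerated time."""
    return stats["max_ms"] <= tolerated_ms


def fake_predict(image: Any) -> None:
    """Hypothetical stand-in model that takes about 20 ms per image."""
    time.sleep(0.02)


if __name__ == "__main__":
    stats = benchmark_latency(fake_predict, list(range(13)))
    print(stats, fits_budget(stats, tolerated_ms=60_000))
```

Comparing against the worst case rather than the median is a conservative choice; for a passive logger that can drop the occasional frame, the median may be the more useful number.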

In terms of specific models or training data, the more details you can provide about what you want to monitor, the more folks here will be able to point you toward possible training data sources and existing models. What wildlife do you want to classify? Do you need to handle night-time images well?

I would also put some time into tinkering with the iNaturalist and Merlin apps and contrast them against your goals. I'm not saying they've solved every problem in the universe or that you shouldn't build something new. But they're both really good at what they do, and understanding how they compare to what you want to build will shed a lot of light on important details. And they're fun to play with!

-Dan

David Hunter

10 April 2023 4:56pm

Thanks Dan for the detailed response. That is really helpful.

I'm based in Boulder, Colorado, so I'm primarily thinking of birds and mammals. Is that too broad?

I'm envisioning a system that is worn on the body like a body camera (but at this stage, likely a mobile phone) and passively captures wildlife as the wearer walks. The user can then review what has been captured later, or watch the ID stream in real time if they want to.

I've loved using iNaturalist since I moved to Boulder, and I also find Merlin quite useful (it's amazing how good the suggestions are). I guess the difference is that I'm looking at real-time video/images captured passively, rather than relying on a user to take a photo and upload it to the app.

At the moment, running the model on the edge is preferable: I'm looking at funding sources around edge computing, and not relying on the cloud seems to be the direction we are all heading in (but I could be wrong).

Ideally I'd like it to analyse in real time, as the COCO-SSD and YOLO models can, so something like 10 fps, but perhaps that is fantastical with a wildlife detector model? I haven't been able to find a wildlife-focused model to test and compare against the more general object detectors I have used.
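The gap between 10 fps and slower rates is easiest to see as a per-frame time budget. A small sketch of the arithmetic, with hypothetical helper names; it ignores capture, pre-processing, and drawing overhead, so real budgets are tighter than these numbers suggest.

```python
import math


def frame_budget_ms(target_fps: float) -> float:
    """Milliseconds available per frame if every frame at target_fps is processed."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


def max_fps(latency_ms: float) -> float:
    """Upper bound on processed frames per second for a model taking latency_ms per frame."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    return 1000.0 / latency_ms


def frame_stride(camera_fps: float, latency_ms: float) -> int:
    """Process every Nth camera frame so inference keeps up with capture."""
    return max(1, math.ceil(latency_ms / frame_budget_ms(camera_fps)))


if __name__ == "__main__":
    print(frame_budget_ms(10))     # 10 fps leaves 100 ms per frame
    print(frame_budget_ms(1))      # 1 fps leaves a full second per frame
    print(max_fps(250))            # a 250 ms model tops out at 4 fps
    print(frame_stride(20, 120))   # 20 fps camera, 120 ms model: every 3rd frame
```

So dropping from 10 fps to 1 fps gives a model ten times as long per frame, which is often the difference between needing a tiny custom model and being able to use an existing one.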

At the moment, I'm only thinking of daytime.

David Hunter

13 April 2023 5:10am

OK, I have a working prototype of the web app I mentioned before. It uses a general object detector, so it's good at spotting people and cars and identifying objects in real time. Video below.

But as you can see from the second video, it can't identify a squirrel, mistaking it for a bird or a sheep! So I need to swap to a different model or customise the current one...

@dmorris, after seeing the video, any thoughts on how achievable my idea is with a different model? Are there wildlife models that can detect this quickly in the browser without the cloud?

Thanks

Dan Morris

14 April 2023 1:47am

Unfortunately, to my knowledge, speed aside, there is no existing model that checks all the boxes you want to check (an object detector that knows about, e.g., squirrels). Almost every off-the-shelf object detector is trained on COCO, whose animal classes are bear, bird, cat, cow, dog, elephant, giraffe, horse, sheep, and zebra. There aren't a *lot* of zebras or giraffes wandering around Boulder, so that's not leaving you with a lot of classes.
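One consequence of the COCO label set is that a COCO-trained detector can only ever answer with one of its fixed classes, which is why a squirrel comes back as a bird or a sheep. A minimal sketch of filtering detections down to the ten animal classes in the standard 80-class COCO label set, assuming each detection is a dict with hypothetical `label` and `confidence` keys (adapt these to whatever your detector actually returns):

```python
from typing import Dict, List

# The animal classes in the standard 80-class COCO label set.
COCO_ANIMAL_CLASSES = frozenset({
    "bird", "cat", "dog", "horse", "sheep",
    "cow", "elephant", "bear", "zebra", "giraffe",
})


def animal_detections(detections: List[Dict], min_confidence: float = 0.5) -> List[Dict]:
    """Keep only confident detections whose label is one of the COCO animal classes.

    Assumes each detection is a dict with "label" and "confidence" keys
    (hypothetical field names; adapt to your detector's output).
    """
    return [
        d for d in detections
        if d["label"] in COCO_ANIMAL_CLASSES and d["confidence"] >= min_confidence
    ]


if __name__ == "__main__":
    frame = [
        {"label": "person", "confidence": 0.93},
        {"label": "bird", "confidence": 0.71},   # possibly a mislabelled squirrel
        {"label": "sheep", "confidence": 0.32},
    ]
    print(animal_detections(frame))
```

Filtering like this narrows the output to animals, but it cannot add classes the model never learned; a squirrel will still be reported as whichever COCO class it most resembles.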

So, you're likely to be training your own model if you stick with these requirements. I'll list a few resources you might want to check out for models and training data, but a few other things I would think about first:

Do you definitely need an object detector? I.e., do you definitely need bounding boxes? If you don't care about the location of the animal within the image, and you don't care about the scenario where two different species are present, you may be OK with an image classifier, which is going to run gobs faster, and also opens a couple of opportunities to *maybe* use existing models (more on this below).
Do you definitely need 10 fps? For example, if you could only get to 0.2 fps, would that qualitatively change the scenario? Again, if you can relax your speed requirement, you *may* be able to use some existing models.
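To make the detector/classifier distinction concrete: a classifier produces one score per class for the whole image, and you typically read off the top few. This sketch assumes you already have the raw scores (logits) and a matching list of label strings; the example labels and scores are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits: Sequence[float], labels: Sequence[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs for one whole image."""
    if len(logits) != len(labels):
        raise ValueError("need exactly one logit per label")
    probs = softmax(logits)
    ranked = sorted(zip(labels, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    labels = ["fox squirrel", "red fox", "mule deer", "house finch"]  # illustrative
    logits = [2.0, 0.5, 0.1, -1.0]
    for label, p in top_k(logits, labels, k=2):
        print(f"{label}: {p:.2f}")
```

Because softmax spreads a single probability mass across all classes, two species in the same frame compete for that mass rather than each being reported with its own box; the top-k list can hint at both, but it cannot say where they are.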

Other models you may want to look at:

If you want to consider an image classifier, ImageNet-1000 has a surprising number of animal classes, and lots of pre-trained classifiers are trained on ImageNet-1000. For example, I *think* the default YOLOv8 image classifier is trained on ImageNet-1000, and it comes in a variety of sizes/speeds.
If you want an object detector but you might consider reducing your frame rate substantially, you could consider using MegaDetector to find animals. You might be able to combine this with an ImageNet-trained classifier to find, e.g., squirrels without training a custom model. Compared to most ImageNet-trained classifiers, MegaDetector is - in scientific terms - slow AF, but maybe you want to compromise on frame rate for an MVP.
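The two-stage idea (MegaDetector to find animals, then a classifier on each crop) can be sketched as below. This assumes MegaDetector-style output in which each detection has a `category` string (`"1"` for animal), a `conf` score, and a `bbox` of normalised `[x_min, y_min, width, height]`; check these against the MegaDetector output format documentation before relying on them. `classify_crop` is a hypothetical callable standing in for whatever ImageNet-trained classifier you use; in practice it would crop the image to the given pixel box (for example with Pillow's `Image.crop`) and classify the result.

```python
from typing import Callable, Dict, List, Sequence, Tuple

PixelBox = Tuple[int, int, int, int]  # (left, top, right, bottom)

# Assumption: MegaDetector-style output labels animals with category "1".
ANIMAL_CATEGORY = "1"


def to_pixel_box(bbox: Sequence[float], width: int, height: int) -> PixelBox:
    """Convert a normalised [x_min, y_min, box_w, box_h] box to clamped pixel corners."""
    x, y, w, h = bbox
    left = max(0, round(x * width))
    top = max(0, round(y * height))
    right = min(width, round((x + w) * width))
    bottom = min(height, round((y + h) * height))
    return left, top, right, bottom


def classify_animals(
    detections: List[Dict],
    image_size: Tuple[int, int],
    classify_crop: Callable[[PixelBox], Tuple[str, float]],
    min_conf: float = 0.2,  # illustrative threshold, tune for your data
) -> List[Dict]:
    """Stage 2: run a species classifier on each confident animal box from stage 1."""
    width, height = image_size
    results = []
    for det in detections:
        if det["category"] != ANIMAL_CATEGORY or det["conf"] < min_conf:
            continue  # skip people, vehicles, and low-confidence boxes
        box = to_pixel_box(det["bbox"], width, height)
        label, score = classify_crop(box)
        results.append({"box": box, "label": label, "score": score})
    return results
```

A side benefit of this split is that the classifier only ever sees a tight crop around the animal, which tends to suit ImageNet-style classifiers trained on roughly centred subjects.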

If you find yourself training a model, though, IMO your one-stop-shop for training data is likely to be the iNaturalist 2017 competition data, which includes bounding boxes. That's a pretty good fit to your problem.
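If you do train on the iNaturalist 2017 boxes, the first step is usually pulling out the annotations for the species you care about. A sketch assuming the bounding-box file uses the COCO JSON layout (`images`, `annotations`, and `categories` lists, with `bbox` as `[x, y, width, height]` in pixels); verify this against the actual download before use.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def load_coco(path: str) -> Dict:
    """Read a COCO-format annotation file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def boxes_for_species(coco: Dict, wanted_names: Iterable[str]) -> Dict[str, List[List[float]]]:
    """Map image file name -> list of [x, y, width, height] boxes for the wanted categories."""
    wanted = set(wanted_names)
    wanted_ids = {c["id"] for c in coco["categories"] if c["name"] in wanted}
    file_by_id = {img["id"]: img["file_name"] for img in coco["images"]}

    boxes: Dict[str, List[List[float]]] = defaultdict(list)
    for ann in coco["annotations"]:
        if ann["category_id"] in wanted_ids:
            boxes[file_by_id[ann["image_id"]]].append(ann["bbox"])
    return dict(boxes)
```

For example, `boxes_for_species(load_coco(path), {"Sciurus niger"})` would collect every box labelled with that (illustrative) species name, keyed by image file, ready to feed into a training pipeline.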

Good luck!

David Hunter

20 April 2023 12:53am

Thanks Dan, that is very helpful. No zebras here but I did see four deer wandering through the streets this morning. Quite wild at times here!

I am totally willing to try an image classifier if it can report the multiple objects it identifies in a scene. I will give this a go.

I think 1 fps would be quite acceptable, and in some respects actually advantageous, since it reduces how much data gets logged.
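Processing at 1 fps while the camera runs faster mostly comes down to skipping frames. A minimal sketch of a time-based throttle; the class name is hypothetical, and timestamps are assumed to be in seconds from a monotonic clock such as `time.monotonic()`.

```python
import time
from typing import Optional


class FrameThrottle:
    """Decide which incoming frames to process so at most target_fps are handled."""

    def __init__(self, target_fps: float) -> None:
        if target_fps <= 0:
            raise ValueError("target_fps must be positive")
        self.min_interval_s = 1.0 / target_fps
        self._last_processed_s: Optional[float] = None

    def should_process(self, timestamp_s: float) -> bool:
        """True if enough time has passed since the last processed frame."""
        if (self._last_processed_s is None
                or timestamp_s - self._last_processed_s >= self.min_interval_s):
            self._last_processed_s = timestamp_s
            return True
        return False


if __name__ == "__main__":
    throttle = FrameThrottle(target_fps=1.0)
    # In a capture loop you would call this once per frame and skip inference when False.
    print(throttle.should_process(time.monotonic()))
```

Using timestamps rather than counting frames keeps the rate steady even if the camera's own frame rate fluctuates.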

I tried upgrading my existing object detector to YOLOv8 following the links you sent, but I don't think it is possible to swap in that model within the framework I'm using (ml5js), so I will have to try a different framework.

Thank you.

David Hunter

11 April 2023 4:09pm

This looks like an interesting thread, I'll have to dig in...

Bas Michielsen

@Bazzz

PhD student in the field of AI for wildlife/nature conservation.

25 April 2023 2:08pm

Hi David

It appears that you have been looking for existing models; however, most existing models are trained on either COCO or some other very generic dataset. So, if you want to identify only animals, you may be better off training your own model. It seems no one in this thread has mentioned yet that you can do transfer learning on existing models, which keeps most of the "visual part" of the model as is and only changes the classification part, so that it can identify other things. This way you can take an existing model trained on COCO and, in a fraction of the time it takes to train a full model, retrain it for your animals.
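The transfer-learning idea (freeze the feature extractor, retrain only the classification head) can be illustrated without any ML framework. In this toy sketch, `frozen_backbone` is a hard-coded function standing in for the convolutional layers of a pretrained COCO or ImageNet model; only the softmax head's weights are trained, using full-batch gradient descent on cross-entropy. Real setups do the same thing with a framework's pretrained model and layer freezing.

```python
import math
from typing import List, Sequence, Tuple

Features = List[float]


def frozen_backbone(x: Sequence[float]) -> Features:
    """Toy stand-in for a pretrained feature extractor. It is never updated."""
    return [x[0], x[1], x[0] * x[1], 1.0]  # last entry acts as a bias feature


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def head_scores(weights: List[List[float]], f: Features) -> List[float]:
    """Linear classification head: one score per class."""
    return [sum(w * v for w, v in zip(row, f)) for row in weights]


def train_head(
    examples: Sequence[Tuple[Sequence[float], int]],
    n_classes: int,
    epochs: int = 300,
    lr: float = 0.1,
) -> List[List[float]]:
    """Fit only the head with full-batch gradient descent on cross-entropy."""
    # Run the frozen backbone once and cache its features; it is not trained.
    feats = [(frozen_backbone(x), y) for x, y in examples]
    n_features = len(feats[0][0])
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(epochs):
        grad = [[0.0] * n_features for _ in range(n_classes)]
        for f, y in feats:
            probs = softmax(head_scores(weights, f))
            for c in range(n_classes):
                err = probs[c] - (1.0 if c == y else 0.0)
                for j in range(n_features):
                    grad[c][j] += err * f[j] / len(feats)
        for c in range(n_classes):
            for j in range(n_features):
                weights[c][j] -= lr * grad[c][j]
    return weights


def predict(weights: List[List[float]], x: Sequence[float]) -> int:
    scores = head_scores(weights, frozen_backbone(x))
    return max(range(len(scores)), key=scores.__getitem__)
```

Because the backbone's outputs are cached once, training only touches a handful of head weights, which is where the "fraction of the time" saving comes from.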

Also, look at your requirements for the inference stage. Some models take a long time to train but are very fast at inference; others are slow in both cases but very accurate, and so on. If you want semi-real-time inference, you are probably looking at single-shot detectors (SSDs) rather than R-CNNs.

Wildlabs.net : The conservation technology network

Wildlabs.net : The conservation technology network

AI Animal Identification Models

David Hunter

University of Colorado Boulder

Groups

Lars Holst Hansen

Dan Morris

David Hunter

Bas Michielsen