discussion / Camera Traps  / 20 June 2024

Background subtraction to improve camera trap AI

Hello All,

In my work I use camera traps a lot to record videos of wildlife.

Selecting the videos that contain wildlife is time-consuming; I often end up with hundreds of videos where nothing happens.

After testing some AI software that analyzes videos to detect whether wildlife is present, I realized that even with AI this is very time-consuming: my computer usually takes a full day of computing to analyze a single card, with all the energy that requires.

Some years ago, early in my career as a computer vision engineer, I used the background subtraction feature in OpenCV a lot.

Current AI software analyzes video frame by frame, without any context from the frames before and after.

Since camera trap positions are normally fixed, it would be very easy to apply background subtraction to:

  • Determine whether an animal is present (if there is a "blob" bigger than xx pixels).
  • Reduce AI analysis time by analyzing only frames with movement (or only the part of the frame containing the "blob").

 

I wonder if anyone knows of any software that uses this fast technique to separate videos with animal presence from those without. If anybody here works on wildlife detection software, I invite you to use this technique to improve it.

Kind Regards

 

 




My take, YMMV: there may be a place for background subtraction here, but it's a tough nut to crack, and you could spend forever tuning heuristics without getting this to really work.  The essence of the issue is that the movement of many animals tends to be quite small compared to typical background movement in camera trap scenes with lots of vegetation.  Can you find elk walking by a concrete wall?  Definitely.  But a mouse or even a coyote moving among twigs and leaves?  At best, you would probably have to do a lot of tuning for each camera.  If your scenario has a stable background and animals don't linger, it might work great.

That said, the most sophisticated approach to this that I'm aware of for camera traps is:

Penn MJ, Miles V, Astley KL, Ham C, Woodroffe R, Rowcliffe M, Donnelly CA. Sherlock—A flexible, low‐resource tool for processing camera‐trapping images. Methods in Ecology and Evolution. 2024 Jan;15(1):91-102.

Definitely check that paper out, it's the closest recent work that I'm aware of to what you're describing.

Some other notes:

  1. Background subtraction works great for thermal images.
  2. This is not exactly camera traps, and it's not exactly simple background subtraction (since it still uses deep learning), but DeepMeerkat learns the background properties in video for the purposes of finding animals.
  3. Definitely camera traps, but still not exactly what you're getting at, since it still uses deep learning, but Context R-CNN takes advantage of temporal context (i.e., learns the background, among other things) in a way that most models don't.
  4. EventFinder predates the extensive use of deep learning in this field, but FWIW, it uses a non-deep-learning-based approach to background subtraction.
  5. Background subtraction and single-frame classification/detection are not entirely mutually exclusive; this work uses motion cues as an additional information channel for a more typical deep learning approach (doing stuff like "remove the red channel and use a motion channel instead") (disclosure: I'm 7000th author on that paper).
  6. The only tools that I'm aware of that have a true video-first approach to Camera Trap AI are Zamba and its cousin Zamba Cloud.  Those also aren't quite what you're asking about, because they use deep learning, but they do implicitly learn motion parameters, in the sense that they operate on whole videos.
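
As a rough illustration of the "motion channel" idea in note 5, the sketch below replaces the red channel of a frame with the absolute difference between consecutive frames. This is not the method from the cited work, only a simplified stand-in; the function name `add_motion_channel`, the RGB channel order, and the use of a plain frame difference as the motion cue are all assumptions.

```python
import numpy as np


def add_motion_channel(prev_rgb, curr_rgb):
    """Return a copy of curr_rgb whose red channel holds a simple motion cue.

    Both inputs are uint8 arrays of shape (height, width, 3) in RGB order
    (an assumption; OpenCV loads images as BGR, so swap channels first).
    """
    # Average the channels to get grayscale intensities as floats.
    prev_gray = prev_rgb.astype(np.float32).mean(axis=2)
    curr_gray = curr_rgb.astype(np.float32).mean(axis=2)
    # Motion cue: per-pixel absolute change in brightness between frames.
    motion = np.abs(curr_gray - prev_gray)
    out = curr_rgb.copy()
    out[..., 0] = np.clip(motion, 0, 255).astype(np.uint8)
    return out
```

The output keeps the usual three-channel shape, so it could in principle be fed to a standard single-frame detector that has been trained on inputs prepared the same way.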

Hope that helps!

Lars Holst Hansen
@Lars_Holst_Hansen
Aarhus University
Biologist and Research Technician working with ecosystem monitoring and research at Zackenberg Research Station in Greenland

This reminds me of this preprint:

FoxMask: a new automated tool for animal detection in camera trap images

https://www.biorxiv.org/content/10.1101/640037v1

 

Peter Bull
@pbull  | he/him
DrivenData
Engineer and AI for Good leader working on bringing machine learning tools to social impact organizations.

We evaluated background subtraction as part of our pipeline for Zamba Cloud a few years ago. It's going to depend a lot on your backgrounds, but we found that for jungle/forest scenes in videos there was too much movement/noise in the backgrounds for the usual methods to add substantial benefit to the accuracy of our classifications.

To classify frames, we run a very fast distilled version of MegaDetector to select frames from videos that are likely to have animals and then classify using a CNN. We found this to be much more computationally efficient for the accuracy gained than R-CNNs or dual-stream approaches that try to account for movement with something like optical flow.
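
The two-stage idea described above (a cheap detector picks frames, a classifier labels them) can be sketched generically as follows. This is not Zamba's actual code: `classify_video`, `detector_score`, `classifier`, `top_k`, and the averaging of per-frame predictions are all hypothetical choices made for illustration.

```python
from typing import Callable, Dict, Sequence, TypeVar

Frame = TypeVar("Frame")


def classify_video(
    frames: Sequence[Frame],
    detector_score: Callable[[Frame], float],
    classifier: Callable[[Frame], Dict[str, float]],
    top_k: int = 4,
) -> Dict[str, float]:
    """Classify a video from its top_k most animal-like frames.

    detector_score: cheap model returning a higher score for likelier animals.
    classifier: slower model returning label -> probability for one frame.
    """
    if not frames:
        return {}
    # Stage 1: score every frame once with the cheap detector, keep the best.
    ranked = sorted(
        range(len(frames)), key=lambda i: detector_score(frames[i]), reverse=True
    )
    chosen = sorted(ranked[:top_k])
    # Stage 2: run the expensive classifier only on the chosen frames.
    totals: Dict[str, float] = {}
    for i in chosen:
        for label, probability in classifier(frames[i]).items():
            totals[label] = totals.get(label, 0.0) + probability
    # Average the per-frame probabilities (an assumed aggregation rule).
    return {label: total / len(chosen) for label, total in totals.items()}
```

The point of the design is that the expensive model runs on at most `top_k` frames per video, however long the video is.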