I am no expert in the underlying machine learning models, algorithms, and AI-things used to identify animals in the various computer-vision tools out there (MegaDetector, Wildlife Insights, Deep Faune, etc.), so please excuse what may be an uninformed question and my likely incorrect use of jargon.
I've gotten to use several of these computer-vision tools over the years, and I've found that, in most cases, they fail to locate or identify small animals when those animals don't contrast much with the background, when a photo is blurred by fog, or when only part of an animal is visible (I work with a burrow-nesting seabird that nests in fog-prone Caribbean mountains). I am usually able to spot those animals (at least more often than the computer-vision models can) because I can look at several images in a flipbook way: "flipping" through them fast enough to detect the changes between photos, either moving forward through a sequence or toggling back and forth between two images.
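In case it helps to make the flipbook idea concrete, here is a rough sketch of what I imagine it amounts to in code: simple frame differencing between two consecutive photos from a fixed camera. The file names and the threshold/area values are made up for illustration, and this assumes the camera itself doesn't move between shots.

```python
import cv2

# Two consecutive photos from the same, fixed camera (hypothetical file names)
prev_img = cv2.imread("trap_0001.jpg", cv2.IMREAD_GRAYSCALE)
curr_img = cv2.imread("trap_0002.jpg", cv2.IMREAD_GRAYSCALE)

# Per-pixel absolute difference: the static background mostly cancels out,
# while anything that moved between the two frames stands out
diff = cv2.absdiff(prev_img, curr_img)

# Keep only substantial changes (the threshold of 25 is arbitrary)
_, mask = cv2.threshold(diff, 25, 255, cv2.THRESH_BINARY)

# Group the changed pixels into blobs that might be an animal
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                               cv2.CHAIN_APPROX_SIMPLE)
for c in contours:
    if cv2.contourArea(c) > 100:  # ignore tiny specks of noise
        x, y, w, h = cv2.boundingRect(c)
        print(f"possible movement at x={x}, y={y}, size {w}x{h} px")
```

That is, of course, a crude version of what my eyes do; it would be fooled by moving vegetation, fog drift, or changing light, which is presumably part of why this isn't trivial.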
I've always wondered how much of this "flipbook," sequence-based approach is implemented in computer vision. I'm guessing it is, but it's difficult for a non-expert to understand all the underlying machinery of these tools.
Also, could an approach like the "motion extraction" technique used in videos (https://www.thisiscolossal.com/2024/01/motion-extraction/) be useful for analyzing sequences of still images, if it isn't already implemented?
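From what I understand of that video, the trick is just blending a frame at 50% opacity with an inverted, time-shifted copy of itself, so that everything static cancels to flat gray and anything that moved shows up as a ghost. If I've understood it correctly, something like the sketch below (same made-up file names as above) would apply that idea to a pair of consecutive still photos:

```python
import cv2

prev_img = cv2.imread("trap_0001.jpg")  # hypothetical file names
curr_img = cv2.imread("trap_0002.jpg")

# Invert the earlier frame (255 - pixel value)
inverted_prev = cv2.bitwise_not(prev_img)

# 50/50 blend: curr*0.5 + (255 - prev)*0.5 comes out to ~127 (mid-gray)
# wherever the two frames match, so only the changes remain visible
motion = cv2.addWeighted(curr_img, 0.5, inverted_prev, 0.5, 0)

cv2.imwrite("motion_extracted.jpg", motion)
```

I have no idea whether feeding such a difference/ghost image into a detector (instead of, or alongside, the raw photo) would actually help it find low-contrast animals, but it's essentially a mechanized version of what I do by eye.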