Is Google’s Cloud Vision useful for identifying animals from camera-trap photos?

Camera traps have revolutionised wildlife research and conservation, enabling scientists to collect photographic evidence of rarely seen and often globally endangered species, with low expense, relative ease, and minimal disturbance to wildlife. However, processing the sheer volume of data produced by camera traps remains a key challenge for conservationists deploying this tool.

When Aditya Gangadharan found that Google had made some of its automated image recognition algorithms available to the public through its Cloud Vision service, he was curious to see if this off-the-shelf solution might be useful for camera trap photos.

Date published: 2016/04/20

Wildlife biologists often struggle with identifying animals from the millions of photos they collect via camera traps. Usually, this requires people to manually click through the photos one by one to distinguish between, say, photos of moving grass (boring!) and photos of elephants (hyper-charismatic gigafauna).

And so when I found that Google had made some of its automated image recognition algorithms available to the public through its (beta) Cloud Vision service, I was curious to see if this might be useful for camera trap photos. Now of course, Google hasn't designed these algorithms specifically for scientific or biological use, so it's not reasonable to expect them to identify animals down to species level. The best I could expect was therefore some kind of initial binary classification: animal present versus no animal present. But even that simple step would be a huge gain if you have, say, one million photos of which only 200,000 contain animals. It is much easier to classify those 200,000 photos down to species manually if you can safely discard the rest, knowing that they mostly don't contain animals.

So I did a quick and dirty test of Cloud Vision using a set of 713 camera trap photos of leopards, previously identified manually (why leopards? No particular reason). The photos were from camera traps that were set up singly on trails or at trail junctions; most were placed in forested areas with a lot of background detail, and some were at clearings formed by multiple intersecting trails; more than 50 locations were represented in the photo sample. You can access Cloud Vision via its application programming interface (API); I could not access the API via R, but it works well with Python.
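For readers who want to try the same thing, here is a minimal Python sketch of what a label request could look like. It is not the script used for this test. It assumes the public REST endpoint `images:annotate` with a `LABEL_DETECTION` feature, and the helper names `build_label_request` and `extract_labels` are made up for illustration. It only builds the request body and parses a reply; actually sending it needs an API key and an HTTP POST, which are left out.

```python
import base64
import json

# Assumed REST endpoint for Cloud Vision label detection (needs an API key to call).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_label_request(image_bytes, max_results=5):
    """Build the JSON body for a label-detection request on one photo."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "requests": [
            {
                "image": {"content": encoded},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": max_results}
                ],
            }
        ]
    }


def extract_labels(response_json):
    """Collect upper-cased label descriptions from a parsed reply."""
    labels = []
    for result in response_json.get("responses", []):
        for annotation in result.get("labelAnnotations", []):
            labels.append(annotation["description"].upper())
    return labels


if __name__ == "__main__":
    # Stand-in bytes; in practice, read the photo with open(path, "rb").read().
    body = build_label_request(b"not a real photo", max_results=3)
    print(json.dumps(body, indent=2))
```

The labels come back as plain text descriptions, which is why the rest of this post works with tags such as ANIMAL or FOREST.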

I divided these test photos into 4 ordinal levels, based on how difficult it would be to identify animals in them:

  • Easy: Very clear photo, with at least half the body visible; good light; not too close to, and not too far from, the camera. Unquestionably, blindingly and indubitably obvious to any human viewer.
  • Medium: A bit worse than the above in terms of how much of the body is visible, how clear the picture is, how far from (or excessively close to) the camera the leopard is, or how much foliage blocks the animal.
  • Difficult: Even worse than the above.
  • Very difficult: Easy to miss for an untrained or careless human viewer; for example, only the tip of the tail is visible, or animal is highly obscured by vegetation.

Here are some examples of photos from each difficulty level, displayed in order from easy to very difficult (photos are cropped to prevent easy identification of sites):

Google then examines each photo, attaches a tag of what it thinks is in the picture, and sends it back to you. I divided these automatic tags from Google into 3 ordinal levels:

  • Animal: These photos are clearly identified as some kind of animal (in a very broad sense). The tags I placed under this category are: ANIMAL, JAGUAR, MAMMAL, TIGER, WILDLIFE. Note that it never identified any photo as a ‘leopard’!
  • Animal-ish: These are (mostly) also animals, though not the large animals that I am interested in. Basically, Google identifies some object in the photo which is not forest or trees or rocks. The tags I placed under this category are: AMPHIBIAN, AUTOMOBILE, SPORTS UTILITY VEHICLE, REPTILE, INVERTEBRATE, SUPERNATURAL CREATURE, FISH. (Yes, it allegedly identified a supernatural creature and a sports utility vehicle!)
  • Non-animal: These are no longer identified as any “animal” of any kind, but are given geological or tree-related tags, such as: GEOLOGICAL PHENOMENON, FOREST, TRAIL etc.
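The grouping above can be sketched in a few lines of Python. The tag sets are copied from the two lists above; the function name `categorise` and the rule that the most animal-like tag wins when a photo carries several tags are my assumptions for this sketch, not something Cloud Vision defines.

```python
# Tags grouped exactly as in the lists above.
ANIMAL_TAGS = {"ANIMAL", "JAGUAR", "MAMMAL", "TIGER", "WILDLIFE"}
ANIMALISH_TAGS = {
    "AMPHIBIAN",
    "AUTOMOBILE",
    "SPORTS UTILITY VEHICLE",
    "REPTILE",
    "INVERTEBRATE",
    "SUPERNATURAL CREATURE",
    "FISH",
}


def categorise(tags):
    """Map the tags returned for one photo to one of the three levels.

    Assumption: if a photo has several tags, the most animal-like one wins.
    Anything not in the two sets above falls through to 'Non-animal'.
    """
    normalised = {tag.strip().upper() for tag in tags}
    if normalised & ANIMAL_TAGS:
        return "Animal"
    if normalised & ANIMALISH_TAGS:
        return "Animal-ish"
    return "Non-animal"


if __name__ == "__main__":
    print(categorise(["forest", "jaguar"]))           # Animal
    print(categorise(["Sports Utility Vehicle"]))     # Animal-ish
    print(categorise(["GEOLOGICAL PHENOMENON"]))      # Non-animal
```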

Based on the above categorizations, the best we can expect from any generic automated method like this is that ‘easy’ and some ‘medium’ photos should be classified as ‘animal’ or ‘animal-ish’. On the other hand, we would expect that animals in the ‘difficult’ and ‘very difficult’ photos would be missed by the algorithms, and hence placed into the ‘Non-animal’ category.

Here are the results:

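Difficulty level | No. photos | Classified as Animal | Classified as Animal-ish | Classified as Non-animal | % classified as Animal or Animal-ish
-----------------|------------|----------------------|--------------------------|--------------------------|-------------------------------------
Easy             | 331        | 264                  | 5                        | 62                       | 81%
Medium           | 165        | 51                   | 0                        | 114                      | 31%
Difficult        | 93         | 16                   | 6                        | 71                       | 24%
Very difficult   | 124        | 11                   | 3                        | 110                      | 11%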

So, if photos are easy, then 81% are classified as some kind of animal, very broadly speaking. For medium, difficult and very difficult photos, the success rate drops drastically, to 31%, 24% and 11% respectively.
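If you want to check these percentages, or run the same tally on your own photos, the arithmetic is simple. This is a small Python sketch using the counts from the results table; the names `RESULTS` and `detection_rate` are made up for illustration.

```python
# Counts from the results table: (total, animal, animal-ish, non-animal).
RESULTS = {
    "Easy": (331, 264, 5, 62),
    "Medium": (165, 51, 0, 114),
    "Difficult": (93, 16, 6, 71),
    "Very difficult": (124, 11, 3, 110),
}


def detection_rate(total, animal, animalish, non_animal):
    """Percentage of photos tagged as Animal or Animal-ish."""
    if animal + animalish + non_animal != total:
        raise ValueError("category counts do not add up to the total")
    return 100.0 * (animal + animalish) / total


if __name__ == "__main__":
    for level, counts in RESULTS.items():
        print(f"{level}: {detection_rate(*counts):.0f}%")
    # Easy: 81%
    # Medium: 31%
    # Difficult: 24%
    # Very difficult: 11%
```

The check inside `detection_rate` also confirms that the three category counts in each row add up to the number of photos at that difficulty level.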

So, how useful does the current version of Cloud Vision seem to be for camera trap photos? Here's my expectation:

  1. It will probably work for you if you have a lot of ‘easy’ photos, but not if you tend to have more difficult photos.
  2. You will still miss close to 20% of the animals even if most of the photos are easy, but that's probably not a problem if the species in question is relatively widespread and if false negatives are random.
  3. If your cameras are paired, it is likely more useful since there is a greater chance of getting a good photo with two cameras than one.
  4. If your cameras are on trails, you are more likely to avoid false negatives compared to cameras set at big trail junctions and clearings. This is because animals tend to be fairly close to the camera, and hence fill the frame a bit more. Also, less confusing background.
  5. This may also work better for larger animals such as tigers (more obvious to the algorithm that there is an animal present). But perhaps not too large, since an elephant's legs could be confused for tree trunks!

I’ll play around a bit more with both Cloud Vision and other image analysis methods, and write another blog post when I’m done.

I acknowledge the help provided by Solange Gagnebin and Akhshy Thiagarajan for this.


About the Author

Aditya Gangadharan is a conservation scientist, and he helps endangered species to persist in landscapes that are shared with people. Aditya is currently part of a collaborative project with industry and government to prevent grizzly bears from getting hit by trains in Banff, Canada. His previous work includes another collaborative project with industry to estimate landings by migratory birds in mine tailing ponds in the Oil Sands region, Canada, and a project to identify critical habitat and corridors for elephants, tigers and 22 other mammals in a multiple-use landscape in the Western Ghats biodiversity hotspot, India. 

This post originally appeared on Aditya's Blog and was published here with permission. 

Aditya is hosting a conversation in the Camera Trap group about this article. Join the thread to add your voice to the discussion.