Overview: Depth Sensing Technologies for Camera Traps

Andrew Quitmeyer

@hikinghack

DIY electronics for behavioral field biology

Hi, I am cross-posting a conversation I had with some people from the Global Open Science Hardware group, and figured y'all were the experts on this stuff: https://forum.openhardware.science/t/depth-sensing-technologies-for-camera-traps/3236

My friend Pen asked me for a quick review of technologies that could be used to add depth sensing to camera traps. The idea is that if camera trappers can get decent depth information from their cameras, they can automate a lot more with high precision (like estimating the size of passing animals more accurately).

I figured I might as well cross post this quick little list I made in case it inspires anyone, or if anyone has other ideas to toss into this arena!

Reminder also that there are lots of fun ideas for new camera traps out there, but a huge difficulty always seems to be making good cases that can deal with lots of abuse from people, transportation, weather, and animals.

Here's a quick and dirty list of technologies and possible ideas I talked about with my other friends Marc Juul and Matt Flagg:

**Active:**

TOF arrays (e.g. this 8x8 array from sparkfun https://www.sparkfun.com/products/18642 )
* Autonomous Low-power mode with interrupt programmable threshold to wake up the host
* Up to 400 cm ranging
* 60 Hz frame rate capability
* Emitter: 940 nm invisible light vertical cavity surface emitting laser (VCSEL) and integrated analog driver
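
To get a feel for what an 8x8 array can resolve at camera-trap distances, here is a rough sketch of how wide a patch each zone covers. The 45 degree square field of view is an assumption for illustration (check the datasheet of the sensor you buy), and `zone_footprint_m` is just a name made up for this example.

```python
import math

def zone_footprint_m(distance_m, fov_deg=45.0, zones_per_side=8):
    """Approximate width of the patch covered by one zone at a given distance.

    fov_deg is an ASSUMED full field of view across one side of the array.
    """
    full_width = 2.0 * distance_m * math.tan(math.radians(fov_deg) / 2.0)
    return full_width / zones_per_side

for d in (0.5, 1.0, 2.0, 4.0):
    print(f"{d:.1f} m -> each zone spans about {zone_footprint_m(d) * 100:.0f} cm")
```

Under that assumption each zone is already tens of centimetres wide near the 400 cm limit, so a small animal far away may fill only one or two zones.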

IR pattern projection (e.g. Kinect, Realsense)
- Limits - some have difficulty in direct sunlight

calibrated Laser Speckle projection

- Could flash really bright laser speckle and photograph it
- could be visible in daylight, or have filters for specific channels
- could be very sensitive to vibration if the laser shifts and decalibrates

Structured light projection
- limits - very slow, can't really work for moving things

LIDAR scanners
- limits - VERY expensive (like $600+)

**AI Prediction Based**

single view depth prediction (e.g. https://www.cs.cornell.edu/projects/megadepth/)
Results are simply an inference of machine learning, not actual depth sensing. Would require lots of calibrated training.

**Passive:**

*Photogrammetry*
Personally, the passive methods of depth estimation make me the most excited. Just using 2D camera images doesn't add much new hardware into the mix, and it helps future-proof designs, since photogrammetric techniques (like https://colmap.github.io/) can keep improving and still use old 2D images.

Pre-calibrated Stereo Depth

- Passive stereo depth (no active illumination): accuracy depends on adequate lighting and on the texture of objects/scenes. Typical accuracy is approximately 3% of distance, but it varies with the object and the actual distance.
- Accuracy drops as the distance increases.
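
Why accuracy drops with distance can be sketched with the standard pinhole stereo relation, depth = focal length x baseline / disparity. A fixed matching error in pixels then becomes a depth error that grows with the square of the distance. The focal length (450 px), baseline (7.5 cm) and quarter-pixel matching error below are assumed illustrative values, not measurements from any particular camera.

```python
def stereo_depth_m(disparity_px, focal_px, baseline_m):
    """Depth of a point from its disparity between the two views."""
    return focal_px * baseline_m / disparity_px

def depth_error_m(depth_m, focal_px, baseline_m, disparity_error_px=0.25):
    """First-order depth uncertainty caused by a disparity matching error."""
    return (depth_m ** 2) / (focal_px * baseline_m) * disparity_error_px

FOCAL_PX = 450.0    # ASSUMED focal length in pixels
BASELINE_M = 0.075  # ASSUMED distance between the two cameras

for z in (1.0, 2.0, 4.0, 8.0):
    err = depth_error_m(z, FOCAL_PX, BASELINE_M)
    print(f"{z:.0f} m: +/- {err * 100:.1f} cm ({err / z * 100:.1f}% of distance)")
```

With these made-up numbers the relative error doubles every time the distance doubles, which is the same pattern as the bullet above. A wider baseline or a longer focal length pushes the curve out.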

*Off the shelf kits*

OpenCV AI Kit Lite - stereo grayscale cameras +

Min depth perception: ~18 cm (using extended disparity search stereo mode)
Max depth perception: ~18 m
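
The minimum distance of a stereo pair is set by the largest disparity the matcher will search: the closer the object, the bigger the shift between the two images. A minimal sketch, where the focal length, baseline and both search ranges are assumed illustrative values rather than figures from the kit's datasheet:

```python
def min_depth_m(focal_px, baseline_m, max_disparity_px):
    """Closest distance at which the disparity still fits in the search range."""
    return focal_px * baseline_m / max_disparity_px

FOCAL_PX = 450.0    # ASSUMED focal length in pixels
BASELINE_M = 0.075  # ASSUMED distance between the two cameras

standard = min_depth_m(FOCAL_PX, BASELINE_M, max_disparity_px=95)   # ASSUMED range
extended = min_depth_m(FOCAL_PX, BASELINE_M, max_disparity_px=190)  # ASSUMED doubled range
print(f"standard search: {standard * 100:.0f} cm, extended search: {extended * 100:.0f} cm")
```

Doubling the search range halves the minimum distance, at the cost of more matching work per frame.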

Multi-camera arrays
(This is my favorite idea, so i even drew some pictures in the original thread: https://forum.openhardware.science/t/depth-sensing-technologies-for-camera-traps/3236 )

- There are SUPER cheap ESP32 cameras available for like $6-$20 USD
Like this one for $14 which has a display we wouldn't need https://www.tindie.com/products/lilygo/lilygo-ttgo-t-camera/ or this one for $20 that even has a case and nicer specs https://www.tindie.com/products/lilygo/lilygo-ttgo-t-camera-mini/

- You can put these ESP32 boards into "hibernation mode" which requires just like 3-5 microamps to stay on (meaning they could last months)
- Get 5-10 of these cameras that you could set up as an array (this could cost about the same as a single off-the-shelf camera trap)
- The array could be all connected to a single unit that is connected to a tree with telescoping arms
- or several cameras could be independently connected around an area the animal might go through

- then the cameras could be woken from hibernation by simple PIR motion detectors, grab images, and transfer them to a central node camera

- finally the array of photos could be processed through something like COLMAP to get 3D reconstruction of each shot
- a person may need to walk through the target area after setting up the cameras with something like a chessboard for calibration to make the 3D reconstruction easier
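
Whether a board really lasts months depends mostly on how often it wakes up, not on the hibernation current. Here is a back-of-the-envelope sketch. The battery capacity, active current and seconds awake per trigger are all assumed numbers for illustration, and it ignores battery self-discharge, regulator losses and the PIR sensor's own draw.

```python
def battery_life_days(capacity_mah, sleep_ua, wakes_per_day, active_ma, active_s):
    """Estimate runtime from average daily charge used while asleep and awake."""
    active_hours = wakes_per_day * active_s / 3600.0
    sleep_hours = 24.0 - active_hours
    mah_per_day = (sleep_ua / 1000.0) * sleep_hours + active_ma * active_hours
    return capacity_mah / mah_per_day

# ASSUMED: 2000 mAh cell, 5 uA hibernation, 150 mA for 10 s per trigger
for wakes in (0, 20, 200):
    days = battery_life_days(2000, 5, wakes, 150, 10)
    print(f"{wakes:3d} triggers/day -> about {days:,.0f} days")
```

With these assumptions, 20 triggers a day still gives several months, but a sensor that false-triggers hundreds of times a day cuts that to weeks.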

Other camera modules are also available if you want fancier optics than the 2MP:
https://www.uctronics.com/arducam-mini-module-camera-shield-5mp-plus-ov5642-camera-module-for-arduino-uno-mega-2560-board.html

https://www.aliexpress.com/item/32999908472.html

Akiba

@Freaklabs

Freaklabs

I'm an engineer and product designer working on wildlife conservation technology.

10 November 2021 10:30pm

Hi Andrew.

We've looked at LIDAR and also are working on the TOF array, VL53L5CX.

Here are some notes on our efforts working with the TOF array. The VL53L5CX is interesting because you can actually get a low resolution depth image, but the current software requires that you use an ARM Cortex M4 class chip. For us, the work is porting that over to the Arduino platform and also to the standard AVR chip. There's an Arduino port, but it's for the ARM Cortex M4 using the STM32-duino flavor of the platform.

The choke point is that the VL53L5CX is actually a microcontroller plus a depth-sensing array, so whatever chip you use to interface with it, you're actually communicating with the MCU on the VL53L5CX. At startup you also have to load the firmware onto the array, and the binary image is around 80 kB. As of this moment, the only software we've seen for it uses the STM32 ARM Cortex M4 microcontrollers (the sensor is an ST part too). We're interested in porting the software off ST and making it available for Arduino AVR microcontrollers as well. Unfortunately, the 80 kB binary firmware image is more than the standard Arduino AVR (ATmega328P) can hold, since it maxes out at 32 kB. The flavor we use, the ATmega1284P, can fit it, but that doesn't leave much room for anything else.

So the approach we're taking is to use an external SPI flash to store the firmware for the VL53L5CX. Long story short, it required an unexpected custom hardware modification, so we tossed the project onto the pile of other projects that need custom hardware and are currently working down that queue. Some other pressing projects came up, so the TOF array is on the back burner for the moment. But we're happy to discuss the little we know about it.
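
For readers following along, the external-flash idea boils down to copying the image across in small pieces so the host never has to hold all 80 kB at once. This is only a sketch of that pattern and not Freaklabs' code: `stream_firmware`, `read_flash` and `write_sensor` are hypothetical names, and plain Python objects stand in for the SPI flash and the sensor bus.

```python
def stream_firmware(read_flash, write_sensor, total_bytes, chunk_size=256):
    """Copy a firmware image from external flash to the sensor in small chunks."""
    sent = 0
    while sent < total_bytes:
        n = min(chunk_size, total_bytes - sent)
        write_sensor(sent, read_flash(sent, n))  # only one chunk is in RAM at a time
        sent += n
    return sent

# Stand-ins for the real hardware (HYPOTHETICAL, for illustration only)
image = bytes(range(256)) * 320   # 81,920 bytes, roughly the 80 kB mentioned above
received = bytearray()

count = stream_firmware(lambda offset, n: image[offset:offset + n],
                        lambda offset, data: received.extend(data),
                        len(image))
print(count, bytes(received) == image)
```

On a real microcontroller the chunk size would be chosen to fit the available RAM and the bus transfer limits.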

In regards to LIDAR, there's a bit of confusion because some people refer to laser ranging systems as LIDAR whereas others refer to a laser ranging system that also scans as LIDAR. This was a bit of a tripping point for us also. We now refer to LIDAR as a laser ranging system that scans in at least one direction.

There are a lot of cheap LIDARs coming online because laser ranging system pricing is dropping through the floor. However, we're actually interested in the laser ranging system itself, because most of the commercial LIDARs coming online are designed for autonomous vehicles and robotics, so they only scan horizontally and not vertically.

If you don't care about time, we think it's possible to use a servo type mechanism to scan horizontally and vertically to build a 360 degree 3D depth image. This would be cost-effective and really interesting for us to map how quickly foliage changes over say the course of a year. We're using a pan-tilt stage which has 2 servos (the pan and the tilt). We're hoping to make a LIDAR based depth-imaging time lapse but that's also one of the projects on our "to-do pile". A variation on this would be to use the 2-D circular scanning LIDARs which would then only require mounting on a single servo (the tilt) to obtain the full 3D image.
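
A pan-tilt scan like this produces one range reading per (pan, tilt) pose, and turning those into a 3D point cloud is a spherical-to-Cartesian conversion. The sketch below assumes the ranger sits exactly where the two servo axes cross (a real mount will have offsets to calibrate out), and the step counts and time per point are made-up numbers.

```python
import math

def polar_to_xyz(range_m, pan_deg, tilt_deg):
    """Convert one range reading at a pan/tilt pose into x, y, z in metres."""
    pan = math.radians(pan_deg)
    tilt = math.radians(tilt_deg)
    horizontal = range_m * math.cos(tilt)  # distance projected onto the ground plane
    return (horizontal * math.cos(pan),
            horizontal * math.sin(pan),
            range_m * math.sin(tilt))

def scan_time_s(pan_steps, tilt_steps, seconds_per_point):
    """Total time for a full raster scan, one reading per pose."""
    return pan_steps * tilt_steps * seconds_per_point

print(polar_to_xyz(2.0, 90.0, 0.0))
# ASSUMED: 1 degree steps over 360 x 90 degrees, 50 ms per point (move + measure)
print(f"{scan_time_s(360, 90, 0.05) / 60:.0f} minutes per scan")
```

That is slow for animals but fine for a time lapse of foliage, which matches the use case described above.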

FYI we're using the TF-Luna which has 8m range and is low cost as a proof-of-concept. If the results are okay, we'd move up to the TF02 which has 20m of range but is pricier.

Anyways, just thought I'd add a bit of our experiences so far on depth imaging. It's quite new for us, but there's a lot of interesting areas to explore.

Akiba

Pen-Yuan Hsing

10 November 2021 10:30pm

Thanks @Freaklabs ! I'm the one who initiated the conversation with @hikinghack , sorry I'm a bit late to the conversation. :p I posted to the GOSH forums a response which I will repeat below, and hopefully we can have a conversation about making something happen:

I just read @hikinghack 's summary again, and think it might be helpful to think about this from the following angles.

0. Which ecological/conservation questions might depth-sensing data answer?

Some off the top of my head:

- Estimating wildlife populations - This is a big one, I know there are existing mathematical models that can make use of animal observation data only if there's a good way to get depth-sensing info. Right now it's a super labor intensive process that's not practical (I can explain more if there's interest).
- Measuring the size of animals - This can be a proxy for age, which provides demographic information about the species in question.
- Movement speed - If you can take images in burst mode (e.g. 3-4 photos per second) with depth data, you can estimate how fast an animal is moving.
- What else?
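
To make the size and speed ideas concrete, here is a minimal sketch using the pinhole camera model: once you know the depth at a pixel, you can turn pixel measurements into metres. The focal length, image centre and pixel coordinates are assumed example values, and the function names are made up for this illustration.

```python
import math

def pixel_to_xyz(u, v, depth_m, focal_px, cx, cy):
    """Back-project a pixel with known depth into camera coordinates (metres)."""
    return ((u - cx) * depth_m / focal_px,
            (v - cy) * depth_m / focal_px,
            depth_m)

def size_m(extent_px, depth_m, focal_px):
    """Real-world size of something spanning extent_px pixels at a given depth."""
    return extent_px * depth_m / focal_px

def speed_m_per_s(p0, p1, dt_s):
    """Average speed between two 3D positions taken dt_s seconds apart."""
    return math.dist(p0, p1) / dt_s

FOCAL_PX, CX, CY = 1000.0, 640.0, 360.0  # ASSUMED camera intrinsics

print(f"height: {size_m(200, 3.0, FOCAL_PX):.2f} m")   # 200 px tall at 3 m
a = pixel_to_xyz(400, 360, 3.0, FOCAL_PX, CX, CY)      # first burst frame
b = pixel_to_xyz(880, 360, 3.0, FOCAL_PX, CX, CY)      # frame 0.3 s later
print(f"speed: {speed_m_per_s(a, b, 0.3):.1f} m/s")
```

Without depth, the same 200 px could be a fox at 3 m or a deer at 9 m, which is why the depth channel matters for both measurements.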

1. What ecological/conservation questions can each tech help answer?

For example, since structured light projection is slow and can't work on moving things, that constrains the type of data you get. Or, LIDAR scanners are powerful but expensive (though I'm glad to hear prices are dropping), so it might be hard to deploy a large number of them in an array, which means you don't get as much spatially distributed data. What are the implications of each?

2. Common evaluation criteria for each tech

Such as:

- Resolution
- Range
- Power needs
- Response time
- Spatial scalability (i.e. how feasible to deploy an array of these devices in the field)
- Cost $$$
- Can it sync with or replace existing camera trap images
- Technical complexity for building a camera trap out of it

----------------------------------------

How does the above sound? Is there a better approach?

Eventually, I think it would be super cool to develop an open source hardware camera trap. But like @hikinghack said even the case would be a challenge, not to mention other things like a quick triggering system, power requirements, etc. But I think it's a worthwhile endeavor especially if we can bring new technology to the table like depth-sensing, something multiple ecologists have dreamed about but don't have the ability to create.

Maybe a first step is to see if @Freaklabs 's BoomBox system can be adapted?? I.e. replace the speakers with a depth sensor?

James Hughes

10 November 2021 10:30pm

We have developed a general data collection system based on a Raspberry Pi Zero. One issue we found early on with using just a motion sensor to trigger photos (or other responses) is that in the wild the motion sensor gets set off almost continuously, which very quickly drains both power and storage space. We have found that a second variable is required to limit triggering, so our software is designed to allow multiple criteria to be met before a response. The simplest is to add a time delay component, but we found that quite often this just resulted in a photo being taken every time the delay elapsed. We have also integrated a light sensor to avoid taking photos in the dark. One project incorporated a load cell to confirm a bird was present in a nest before taking photos. We have also used an RFID tag sensor to confirm the presence of the subject of interest.
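
The multiple-criteria idea can be sketched as a small gate that every trigger has to pass. This is not HM Solutions' software, just an illustration of the pattern described above. The thresholds and the name `should_trigger` are assumptions.

```python
def should_trigger(pir_active, lux, now_s, last_shot_s,
                   min_lux=10.0, cooldown_s=30.0):
    """Take a photo only if motion, light level and cooldown all agree."""
    if not pir_active:
        return False                      # no motion reported
    if lux < min_lux:
        return False                      # too dark for a useful photo
    if last_shot_s is not None and now_s - last_shot_s < cooldown_s:
        return False                      # still inside the time delay
    return True

print(should_trigger(True, 200.0, now_s=100.0, last_shot_s=None))  # True
print(should_trigger(True, 200.0, now_s=100.0, last_shot_s=90.0))  # False
print(should_trigger(True, 2.0, now_s=100.0, last_shot_s=None))    # False
```

A load cell or RFID reading would slot in as one more condition in the same function.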

Our case is completely waterproof and solar recharged. Any external 6 V to 24 V source will power/recharge the system. The system is designed to be modular, so you can add any type of sensor you require and set the amount of data you want to collect.

We do custom designs, as we are a small company and like to find solutions to difficult problems. We could configure a system that has multiple cameras and takes a series of photos. One issue is the speed of writing to an SD card or USB storage: multiple photos cannot be stored at the same instant to the same storage. We have found that 8 MP photos max out at roughly 0.5 seconds between photos. For a four-camera array, there would be about two seconds between the first and last photo. Depending on the subject being photographed, that may be fast enough. The other option would be to set up the array as separate mini camera/storage devices and use one main system to control when the photos are triggered.
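
Whether that gap is fast enough depends on how far the animal moves while the cameras take turns. A quick sketch of the arithmetic, assuming the cameras fire strictly one after another. It reports both the gap between the first and last exposure and the time until the last photo has been written, and the animal speeds are example values only.

```python
def capture_skew_s(num_cameras, seconds_per_photo):
    """Return (first-to-last exposure gap, time until the last photo is stored)."""
    first_to_last_exposure = (num_cameras - 1) * seconds_per_photo
    until_last_write = num_cameras * seconds_per_photo
    return first_to_last_exposure, until_last_write

def displacement_m(speed_m_per_s, elapsed_s):
    """How far a subject moves in the given time at constant speed."""
    return speed_m_per_s * elapsed_s

gap, total = capture_skew_s(4, 0.5)
print(f"exposure gap {gap} s, all stored after {total} s")
for speed in (0.1, 1.0, 5.0):  # ASSUMED speeds: browsing, walking, running
    print(f"{speed} m/s -> moves {displacement_m(speed, gap):.2f} m between first and last photo")
```

For multi-view 3D reconstruction the subject needs to be nearly still across all views, which is the argument for separate camera/storage units triggered together.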

For depth reading, a simple ultrasonic sensor would give a rough distance for under $10. Step up to IR for under $20 for more accurate measurement, or go all the way to LIDAR for $150 for very accurate measurement, depending on what data you are trying to collect. Post-processing photos can also give very accurate results: we have taken drone photos of a multi-acre property and processed 300+ photos to produce an elevation survey accurate to about 3 inches.
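
For the ultrasonic option, the distance comes from timing an echo: the pulse travels out and back, so the distance is half the round trip times the speed of sound. A small sketch using the common linear approximation for the speed of sound in air. The echo time and temperatures are example inputs.

```python
def speed_of_sound_m_per_s(air_temp_c):
    """Linear approximation for the speed of sound in dry air."""
    return 331.3 + 0.606 * air_temp_c

def ultrasonic_distance_m(echo_time_s, air_temp_c=20.0):
    """Distance to the target from the round-trip echo time."""
    return speed_of_sound_m_per_s(air_temp_c) * echo_time_s / 2.0

for temp in (0.0, 20.0, 35.0):
    d = ultrasonic_distance_m(0.010, temp)  # a 10 ms echo
    print(f"{temp:4.1f} C: {d:.3f} m")
```

The same echo reads several centimetres differently between a cold morning and a hot afternoon, which is one reason the result is only a rough distance unless you also log temperature.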

James Hughes

HM Solutions (hms9.net)

Pen-Yuan Hsing

10 November 2021 10:30pm

Hi @JHughes , thanks so much for your feedback. I'd be very interested in a deeper chat about these technologies! Let me know if you have a preferred way to get in touch, or we can continue the chat here.

Yeah I had the feeling the triggering mechanism is going to be a challenge, which seems to be confirmed by your experience. Did you use passive infrared sensors (PIRs) as part of your camera triggering mechanism?

Regarding the multi-camera system, I think a good starting point would be "just" two camera units side by side in order to get a stereoscopic image to construct a depth map. I think software libraries like OpenCV can help with this? An additional benefit is that you get the actual photographs, too, which you would need to identify the animal. After that, we can increase the number of cameras for variations on 3D imaging to maybe even get a "bullet time" image of a wild animal...

Because some animals move fast, a critical requirement for camera traps is response time. So we need to think about the time lag between triggering and an actual exposure being taken.

I once considered an ultrasonic sensor to get distance data, but from what I learned it has to be pointed at the animal of interest. Is this correct? This would be in contrast to a full depth map where you can pick out the animal from it post hoc.

I'm very interested in better understanding the state of the art for these different technologies. E.g. for a particular tech (infrared projection/LIDAR/time-of-flight sensor/stereo camera/etc.), what is a representative off-the-shelf product you can buy right now; does it make only point measurements or can it be used to make a full depth map; power requirements; size/sturdiness for incorporation into a camera trap-sized weatherproof enclosure, etc.

Given all of the above, I think we could start by focusing on the depth sensor first, i.e. making a module that "plugs" into an existing off-the-shelf camera trap. This way, we utilise existing triggering mechanisms just like @Freaklabs 's BoomBox.

Anyways, let me know what you all think!

James Hughes

10 November 2021 10:30pm

The best way for us to have an in-depth discussion about solutions would be to email me directly. [email protected].

We do use a PIR sensor on our base system. It does the job, but it is overly sensitive to all motion, even the shadows of moving branches. We are developing the integration of a very low resolution thermal sensor (8x8) to see whether it can add enough data to tell the difference between a branch moving and the body heat of an animal. We have tested a high-resolution FLIR camera to "see" birds on tree branches. Unfortunately, there was very little thermal difference between the birds and the tree. We expect working with mammals to go better.
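
One simple way an 8x8 thermal frame could be used as that second check is to count pixels that are clearly warmer than the scene's background. This is a sketch of the idea, not the system described above: the 2 degree margin, the 3-pixel minimum and the function names are assumptions that would need tuning in the field.

```python
from statistics import median

def warm_pixel_count(frame_c, delta_c=2.0):
    """Count pixels at least delta_c warmer than the frame's median temperature."""
    pixels = [t for row in frame_c for t in row]
    background = median(pixels)
    return sum(1 for t in pixels if t - background >= delta_c)

def looks_like_animal(frame_c, delta_c=2.0, min_pixels=3):
    """True if enough warm pixels are present to suggest body heat."""
    return warm_pixel_count(frame_c, delta_c) >= min_pixels

# Synthetic frames in degrees C: an empty scene, and one with a warm 2x2 blob
empty = [[18.0] * 8 for _ in range(8)]
with_animal = [row[:] for row in empty]
for r in (3, 4):
    for c in (3, 4):
        with_animal[r][c] = 24.0

print(looks_like_animal(empty), looks_like_animal(with_animal))
```

As the bird test above suggests, this only helps when the subject really is warmer than its surroundings.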

We don't usually try to modify other equipment; we try to develop custom solutions. We have found that we can usually offer a more reliable and customizable solution. Often, once equipment is tested in the field, adjustments or additions are required, and with a custom solution we can handle those quickly. Our current base unit is significantly more expensive than a single trail cam, but it would allow us to develop a custom solution for your requirements, as well as collect other data while in the field: light levels, temperature, audio clips. Essentially any type of sensor can be tied into our main system.

Please email me directly if you would like to discuss developing a custom solution for your requirements. All of our products have come from finding unique solutions for clients: 1.5 km long-range trap releases, the WORMS modular in-field data collection system, eDNA water sample collection, and a drone for bird flock counting. All of these were developed to solve a specific problem, then updated and modified as real-life use showed us what could be done better.

Pen-Yuan Hsing

10 November 2021 10:30pm

Thanks @JHughes for your insight. Very informative! It's interesting how sensitive your PIR sensors are, because apparently "off-the-shelf" camera traps have some way of being less sensitive even with PIR sensors. I wonder how they do it?

Yes, in my experience mammals have much more thermal contrast with the background compared to other animals.

Two reasons I didn't explicitly state earlier for suggesting a BoomBox-like depth sensing add-on module for camera traps are that (1) the depth map and camera trap photo will be synchronised, which helps compare them; and (2) maybe it would be easier and quicker to focus on the depth sensing tech without having to figure out everything all at once, i.e. triggering system, IR flash, housing and power. By piggy-backing on a working camera trap, we might be able to arrive at a proof of concept quicker.

That said, I would definitely LOVE to work on a full camera trap with depth sensing abilities, and truly believe it would be of great help to ecologists who want to estimate wildlife populations.

I'll shoot you an email, though I'll try to keep everyone updated here so that we can learn together.

Looking forward to feedback/input from others, too!

Andrew Quitmeyer

@hikinghack

DIY electronics for behavioral field biology

11 November 2021 5:07pm

Thanks for sharing all this! Fascinating and fantastic work as always! Great to know about that TOF chip!

Doug Osborn

@DougOsborn

5 December 2021 2:08am

Thank you very much for sharing, this is great information.

I'm looking at an application where we can better manage conflict between off-leash dog walkers and birders in a peri-urban wetland, which includes saltmarsh and shorebird breeding grounds in its biodiversity values.

As usual, there is anecdotal evidence, including assaults, but nothing systematic. I've been thinking about a 'dog off-leash' detector and had thought mmWave might be a solution (measuring the distance between human and dog to infer "off-leash"), but I see it didn't make the list here. Any thoughts/reviews?
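
Whatever sensor ends up supplying the positions, the inference step itself could be quite simple: flag "off-leash" when the dog stays further from the person than a leash allows for several frames in a row. A hedged sketch, where the leash length, margin, frame count and the function name `off_leash` are all assumptions for illustration:

```python
import math

def off_leash(track, leash_m=2.0, margin_m=1.0, min_consecutive=5):
    """track: list of ((human_x, human_y), (dog_x, dog_y)) in metres, one per frame."""
    run = 0
    for human, dog in track:
        if math.dist(human, dog) > leash_m + margin_m:
            run += 1
            if run >= min_consecutive:
                return True   # far apart for long enough to rule out noise
        else:
            run = 0           # back within leash range, start counting again
    return False

on_lead = [((0.0, float(i)), (1.5, float(i))) for i in range(10)]
roaming = [((0.0, float(i)), (5.0, float(i))) for i in range(10)]
print(off_leash(on_lead), off_leash(roaming))
```

Requiring several consecutive frames keeps a single bad range reading from producing a false report.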

Cheers,
Doug

Roland Kays

@Rolandisimo

Prof at NC State University and Scientist at NC Museum of Natural Sciences

29 January 2022 12:50pm

Someone once told me they thought you could get distance information with sensors that measure the polarization of light. I can't remember exactly how it would work, but maybe that makes sense to smart people like @hikinghack.

-Roland