Hi, I'm cross-posting a conversation I had with some people from the Global Open Science Hardware group and figured y'all were the experts on this stuff: https://forum.openhardware.science/t/depth-sensing-technologies-for-camera-traps/3236
My friend Pen had asked me about a quick review of potential technologies that could be used to incorporate depth sensing capabilities into camera traps. The idea is that if camera trappers can have decent depth information from their cameras they can automatically do a lot more stuff with high precision (like estimate the size of the animals passing by with greater accuracy).
I figured I might as well cross post this quick little list I made in case it inspires anyone, or if anyone has other ideas to toss into this arena!
Reminder also that there are lots of fun ideas for new camera traps out there, but a huge difficulty always seems to be making good cases (enclosures) that can deal with lots of abuse from people, transportation, weather, and animals.
Here's a quick and dirty list of technologies and possible ideas I talked about with my other friends Marc Juul and Matt Flagg:
TOF arrays (e.g. this 8x8 array from sparkfun https://www.sparkfun.com/products/18642 )
* Autonomous Low-power mode with interrupt programmable threshold to wake up the host
* Up to 400 cm ranging
* 60 Hz frame rate capability
* Emitter: 940 nm invisible light vertical cavity surface emitting laser (VCSEL) and integrated analog driver
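The "wake up the host" feature above is nice for camera traps: the sensor ranges autonomously and only interrupts the MCU when something crosses a distance threshold. A toy host-side sketch of that logic (the real sensor does this on-chip, and the threshold value here is made up):

```python
# Hypothetical sketch of the VL53L5CX threshold wake-up idea. The sensor
# performs this check autonomously in hardware; this just illustrates it.

WAKE_THRESHOLD_MM = 2000  # wake the host if any zone sees something within 2 m (made-up value)

def should_wake(frame_8x8):
    """frame_8x8: 64 per-zone distance readings in mm from the 8x8 TOF array."""
    return any(d < WAKE_THRESHOLD_MM for d in frame_8x8)

# Example: empty scene except one zone seeing an animal at 1.5 m
frame = [4000] * 64
frame[27] = 1500
print(should_wake(frame))  # True
```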
IR pattern projection (e.g. Kinect, Realsense)
- Limits - some have difficulty in direct sunlight
Calibrated laser speckle projection
- Could flash really bright laser speckle and photograph it
- could be visible in daylight, or have filters for specific channels
- could be very sensitive to vibration if the laser shifts and decalibrates
Structured light projection
- Limits - very slow, so it doesn't really work for moving subjects
- Limits - VERY expensive (like $600+)
**AI Prediction Based**
single view depth prediction (e.g. https://www.cs.cornell.edu/projects/megadepth/)
Results are just a machine-learning inference, not an actual depth measurement, and would require lots of calibrated training data.
Personally, the passive methods of depth estimation excite me the most: just using 2-D camera images doesn't add much new hardware into the mix, and it helps future-proof designs, since photogrammetric techniques (like https://colmap.github.io/) can keep improving and still be applied to old 2D images.
Pre-calibrated Stereo Depth
- passive stereo depth (no active illumination); accuracy depends on adequate lighting and on the texture of objects/scenes. Typical accuracy is approximately 3% of distance, but it varies with the object and the actual distance.
- Accuracy drops as the distance increases.
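That falloff comes straight from the stereo geometry: depth is Z = f·B/d (focal length times baseline over disparity), so a fixed disparity error produces a depth error that grows roughly with Z². A quick sketch with made-up camera parameters (not any specific kit's specs):

```python
# Why stereo accuracy degrades with distance: Z = f*B/d, so a fixed
# disparity error dd (in pixels) gives a depth error growing ~quadratically:
#   dZ ~= (Z**2 / (f * B)) * dd
# All parameter values below are illustrative assumptions.

f_px = 450.0   # focal length in pixels (assumed)
B_m = 0.075    # stereo baseline in meters (assumed)
dd = 0.25      # subpixel disparity matching error in pixels (assumed)

def depth_error(Z):
    """Approximate depth uncertainty (m) at distance Z (m)."""
    return (Z ** 2 / (f_px * B_m)) * dd

for Z in (1.0, 5.0, 10.0):
    print(f"at {Z:4.1f} m: +/-{depth_error(Z):.3f} m "
          f"({100 * depth_error(Z) / Z:.1f}% of distance)")
```

With these numbers the error is under 1% at 1 m but several percent by 5-10 m, which matches the "~3% of distance, worse farther out" rule of thumb above.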
*Off the shelf kits*
OpenCV AI Kit (OAK-D) Lite - stereo grayscale cameras plus a color camera
Min depth perception: ~18 cm (using extended disparity search stereo mode)
Max depth perception: ~18 m
(This is my favorite idea, so i even drew some pictures in the original thread: https://forum.openhardware.science/t/depth-sensing-technologies-for-camera-traps/3236 )
- There are SUPER cheap ESP32 cameras available for like $6-$20 USD
Like this one for $14 which has a display we wouldn't need https://www.tindie.com/products/lilygo/lilygo-ttgo-t-camera/ or this one for $20 that even has a case and nicer specs https://www.tindie.com/products/lilygo/lilygo-ttgo-t-camera-mini/
- You can put these ESP32 boards into "hibernation mode", which draws only about 3-5 microamps (meaning they could idle for months on a battery)
- Get 5-10 of these cameras to set up as an array (this could cost about the same as a single off-the-shelf camera trap)
- The array could be all connected to a single unit that is connected to a tree with telescoping arms
- or several cameras could be independently connected around an area the animal might go through
- then the cameras could be woken from hibernation by simple PIR motion detectors, grab images, and transfer them to a central node camera
- finally the array of photos could be processed through something like COLMAP to get 3D reconstruction of each shot
- a person may need to walk through the target area after setting up the cameras with something like a chessboard for calibration to make the 3D reconstruction easier
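To back up the "last months" claim above, here's a back-of-envelope battery estimate for a PIR-woken ESP32 node. Every number except the 3-5 µA hibernation draw is an assumption picked for illustration (cell capacity, active current, trigger rate):

```python
# Back-of-envelope battery life for a PIR-woken ESP32 camera node.
# Only the hibernation current comes from the ESP32 specs; the rest
# are assumed values for illustration, not measurements.

capacity_mah = 2500.0   # e.g. one 18650 cell (assumed)
sleep_ua = 5.0          # ESP32 hibernation draw, ~3-5 uA
active_ma = 160.0       # draw while awake capturing/sending a photo (assumed)
wakeups_per_day = 20    # PIR triggers per day (assumed)
awake_s = 15.0          # seconds awake per trigger (assumed)

awake_mah_per_day = active_ma * wakeups_per_day * awake_s / 3600.0
sleep_mah_per_day = (sleep_ua / 1000.0) * 24.0
total_per_day = awake_mah_per_day + sleep_mah_per_day

print(f"awake: {awake_mah_per_day:.2f} mAh/day, sleep: {sleep_mah_per_day:.3f} mAh/day")
print(f"estimated life: {capacity_mah / total_per_day / 30.4:.1f} months")
```

The takeaway is that sleep current is negligible; battery life is dominated by how often the PIR fires and how long each wake-up lasts.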
Other camera modules are also available if you want fancier optics than the stock 2 MP sensor.
10 November 2021 10:30pm
We've looked at LIDAR and also are working on the TOF array, VL53L5CX.
Here are some notes on our efforts working with the TOF array. The VL53L5CX is interesting because you can actually get a low resolution depth image, but the current software requires that you use an ARM Cortex M4 class chip. For us, the work is porting that over to the Arduino platform and also to the standard AVR chip. There's an Arduino port, but it's for the ARM Cortex M4 using the STM32-duino flavor of the platform.
The choke point is that the VL53L5CX is actually a microcontroller + depth-sensing sensor array, so whatever chip you interface it with, you're actually communicating with the MCU on the VL53L5CX. At startup, you also have to load the firmware onto the array, and that binary image is around 80 kB. As of this moment, the only software we've seen for it targets the STM32 ARM Cortex M4 microcontrollers (the sensor is an ST part as well). We're interested in porting the software off ST and having it available for Arduino AVR microcontrollers too. Unfortunately, the 80 kB firmware binary is more than the standard Arduino AVR (ATMega328P) can hold, since it maxes out at 32 kB of flash; the flavor we use, the ATMega1284P, can fit it, but that doesn't leave much room for anything else.
So the approach we're taking is to use an external SPI flash to store the firmware for the VL53L5CX. Long story short, it required an unexpected custom hardware modification, so we tossed the project onto the pile of other projects that need custom hardware and are currently working down that queue. Some other pressing projects came up, so the TOF array is on the back burner for the moment. But happy to discuss the little we know about it.
In regards to LIDAR, there's a bit of confusion because some people refer to laser ranging systems as LIDAR whereas others refer to a laser ranging system that also scans as LIDAR. This was a bit of a tripping point for us also. We now refer to LIDAR as a laser ranging system that scans in at least one direction.
There are a lot of cheap LIDARs coming online because laser ranging system pricing is dropping through the floor. However, we're actually interested in the bare laser ranging system, because most of the commercial LIDARs coming online are designed for autonomous vehicles and robotics. Hence they only scan horizontally, not vertically.
If you don't care about time, we think it's possible to use a servo type mechanism to scan horizontally and vertically to build a 360 degree 3D depth image. This would be cost-effective and really interesting for us to map how quickly foliage changes over say the course of a year. We're using a pan-tilt stage which has 2 servos (the pan and the tilt). We're hoping to make a LIDAR based depth-imaging time lapse but that's also one of the projects on our "to-do pile". A variation on this would be to use the 2-D circular scanning LIDARs which would then only require mounting on a single servo (the tilt) to obtain the full 3D image.
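Assembling a depth image from that kind of sweep is just a spherical-to-Cartesian conversion: each (pan angle, tilt angle, range) sample becomes one 3D point. A minimal sketch, with a fixed stand-in range reading instead of a real TF-Luna and made-up sweep step sizes:

```python
# Turning (pan, tilt, range) samples from a single-point laser ranger on a
# pan-tilt servo stage into 3D points (spherical -> Cartesian conversion).
import math

def to_xyz(pan_deg, tilt_deg, range_m):
    """pan: rotation about the vertical axis; tilt: elevation above horizontal."""
    pan = math.radians(pan_deg)
    tilt = math.radians(tilt_deg)
    x = range_m * math.cos(tilt) * math.cos(pan)
    y = range_m * math.cos(tilt) * math.sin(pan)
    z = range_m * math.sin(tilt)
    return (x, y, z)

# Sweep pattern: step both servos and record one range reading per pose.
# 5.0 m stands in for a real TF-Luna reading; step sizes are arbitrary.
cloud = [to_xyz(p, t, 5.0)
         for p in range(0, 360, 10)    # pan: full 360 degrees, 10-degree steps
         for t in range(-30, 61, 10)]  # tilt: -30 to +60 degrees
print(len(cloud))  # 36 pan steps * 10 tilt steps = 360 points
```

A 2-D circular-scanning LIDAR on a single tilt servo would produce the same kind of point list, just with the pan loop replaced by the scanner's own rotation.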
FYI we're using the TF-Luna which has 8m range and is low cost as a proof-of-concept. If the results are okay, we'd move up to the TF02 which has 20m of range but is pricier.
Anyways, just thought I'd add a bit of our experience so far on depth imaging. It's quite new for us, but there are a lot of interesting areas to explore.