Monocular Depth Estimation in Python using Monodepth2 and Manydepth
Estimating 3D depth from a single camera is a long-standing, ill-posed problem. It also has wide applications in robotics and self-driving vehicles, because traditional depth sensors such as LiDAR, radar, sonar and even stereo vision have drawbacks: high cost, a sparse data signal and a need for regular calibration. 3D object detection is harder on the sparse signal from LiDAR, radar or sonar than on the rich, dense signal from a camera. Stereo cameras and camera-LiDAR setups also require regular calibration to maintain their accuracy.
Monocular depth estimation with neural networks offers a simple and elegant solution to the cost, sparsity and calibration problems of these traditional approaches: a neural network predicts depth from a single image or a sequence of images.
Monodepth2 [1] is a self-supervised monocular depth estimation technique that takes in one frame at a time and predicts its depth map. Monodepth2 runs at about 100 ms per frame on most modern NVIDIA GPUs. To install it and run the webcam demo, run the following:
pip install monodepth2
python -m monodepth2 # Run the webcam demo
Manydepth [2] is a follow-up to Monodepth2. It is a self-supervised monocular depth estimation technique that takes in two adjacent frames at a time and predicts the depth map. Manydepth runs at about 300 ms per frame on most modern NVIDIA GPUs. To install it and run the webcam demo, run the following:
pip install manydepth
python -m manydepth # Run the webcam demo
To use either model in your robotics project, import it and pass it frames as follows:
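As a minimal sketch of the frame-feeding pattern only, the code below assumes the loaded model has been wrapped in a plain Python callable. The names `run_single_frame`, `run_adjacent_frames` and `predict` are hypothetical and are not part of either package's API; check each package's documentation for the actual import and call. The first helper mirrors Monodepth2's one-frame-at-a-time input, and the second mirrors Manydepth's use of two adjacent frames.

```python
from typing import Any, Callable, Iterable, Iterator

Frame = Any     # e.g. an image array grabbed from a camera
DepthMap = Any  # e.g. a per-pixel depth array returned by the model


def run_single_frame(
    predict: Callable[[Frame], DepthMap],
    frames: Iterable[Frame],
) -> Iterator[DepthMap]:
    """Monodepth2-style loop: one frame in, one depth map out.

    `predict` is a hypothetical wrapper around the loaded model.
    """
    for frame in frames:
        yield predict(frame)


def run_adjacent_frames(
    predict: Callable[[Frame, Frame], DepthMap],
    frames: Iterable[Frame],
) -> Iterator[DepthMap]:
    """Manydepth-style loop: each prediction sees the previous and current frame.

    `predict` is a hypothetical wrapper around the loaded model.
    The first frame alone produces no output because it has no predecessor.
    """
    iterator = iter(frames)
    try:
        previous = next(iterator)
    except StopIteration:
        return  # empty stream: nothing to predict
    for current in iterator:
        yield predict(previous, current)
        previous = current
```

Both helpers are generators, so they can consume a live camera stream frame by frame without buffering it. In a real project, `frames` would be whatever yields camera images, and `predict` would call the installed model.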
References
[1] Monodepth2: https://pypi.org/project/monodepth2/
[2] Manydepth: https://pypi.org/project/manydepth/