Vision Based Depth for Autonomous Machines

Aditya NG
4 min read · May 30, 2021

With our ever-growing dependence on robots and autonomous machines, research into depth perception systems such as vision and LiDAR is on the rise. Depth estimation and 3D object detection let an autonomous system estimate its own state and build a richer picture of its external environment. Interested in autonomous driving, our Formula Student team, Vega Racing Electric, set out to learn more about these technologies.

The DQN agent driving in FSDS with LiDAR only

One thing that stuck out like a sore thumb and felt somewhat counterintuitive is that so many hobbyists, researchers and companies rely on LiDAR sensors to give their robots depth information about their environment. LiDAR is a time-of-flight sensor: it measures the time taken for a laser pulse fired from the sensor to reflect off a surface and return. Nobody drives like this. Humans and many other animals get much of their depth perception from stereoscopic vision, which gives them two readings of the environment at every timestamp. Birds, whose eyes sit on opposite sides of their heads, tend to bob their heads back and forth to emulate this stereoscopic vision. And even with one eye closed, the brain uses many monocular cues to infer depth. More importantly, autonomous driving requires the system to truly understand its environment, not just know the physical positions of nearby objects. To quote Andrej Karpathy,

LiDAR is really a shortcut. It sidesteps the fundamental problems, the important problem of visual recognition that is necessary for autonomy; and so it gives a false sense of progress and is ultimately a crutch.

Andrej Karpathy at Tesla Autonomy Day

Extensive testing of our vision-based depth system on the KITTI dataset

With the goal of showing that vision-only autonomous driving is very much possible, we got to work on building an autonomy stack within the Formula Student Driverless Simulator (FSDS). Our first step was to choose a stereo-vision disparity estimation algorithm and calibrate the in-simulator cameras. We optimised the system for frame rate by running large portions of the code in parallel across the GPU and CPU. Before moving into the simulator, we tested the setup extensively on the KITTI dataset, as sketched below.
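For readers who want a concrete feel for the disparity step, here is a minimal sketch of a stereo depth pipeline using OpenCV's semi-global block matcher. It is illustrative only: the focal length, baseline and matcher parameters are placeholders rather than our calibrated FSDS values, and it assumes the stereo pair is already rectified.

```python
import cv2
import numpy as np

# Illustrative calibration values -- the real numbers come from calibrating
# the cameras (focal length in pixels, baseline in metres).
FOCAL_LENGTH_PX = 720.0
BASELINE_M = 0.54  # roughly the KITTI stereo baseline

# Semi-global block matching is one common choice for disparity estimation.
stereo = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=128,   # must be a multiple of 16
    blockSize=5,
    P1=8 * 3 * 5 ** 2,
    P2=32 * 3 * 5 ** 2,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2,
)

def depth_from_stereo(left_bgr: np.ndarray, right_bgr: np.ndarray) -> np.ndarray:
    """Return a per-pixel depth map (metres) from a rectified stereo pair."""
    left = cv2.cvtColor(left_bgr, cv2.COLOR_BGR2GRAY)
    right = cv2.cvtColor(right_bgr, cv2.COLOR_BGR2GRAY)

    # SGBM returns fixed-point disparities scaled by 16.
    disparity = stereo.compute(left, right).astype(np.float32) / 16.0
    disparity[disparity <= 0] = np.nan  # mask invalid matches

    # For a rectified pair: depth = focal_length * baseline / disparity
    return FOCAL_LENGTH_PX * BASELINE_M / disparity
```

In practice the matcher parameters and calibration are what make or break the depth quality, which is why camera calibration was our first step.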

In order to compare LiDAR with vision-based depth, or pseudo-LiDAR, we got to work on training a DQN agent that takes a point cloud as input and drives through the track. Initially we trained the agent with nothing but LiDAR, and it achieved impressive lap times. We then swapped out LiDAR for our vision-based depth perception system to see how it would fare. Even without retraining the DQN agent for the new input, and by only running appropriate signal processing and noise filtering on the pseudo-LiDAR point cloud, it posted lap times similar to what it achieved using LiDAR.
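Producing a pseudo-LiDAR cloud amounts to back-projecting every valid depth pixel through the pinhole camera model and then cleaning up the result. The sketch below shows that idea; the intrinsics, range cut-off and outlier rule are illustrative placeholders, not the exact filtering we ran before feeding the agent.

```python
import numpy as np

# Illustrative pinhole intrinsics (fx, fy, cx, cy) -- placeholders, not the
# calibrated FSDS values.
FX, FY, CX, CY = 720.0, 720.0, 620.0, 188.0

def depth_to_pseudo_lidar(depth: np.ndarray) -> np.ndarray:
    """Back-project a depth map into an (N, 3) point cloud in the camera frame."""
    h, w = depth.shape
    us, vs = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (us - CX) * z / FX
    y = (vs - CY) * z / FY
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[np.isfinite(points).all(axis=1)]  # drop masked pixels

def filter_cloud(points: np.ndarray, max_range: float = 30.0) -> np.ndarray:
    """Simple noise filtering: drop far-away points and crude outliers."""
    points = points[points[:, 2] < max_range]
    # Keep points within three standard deviations of the cloud centroid.
    dists = np.linalg.norm(points - points.mean(axis=0), axis=1)
    return points[dists < dists.mean() + 3 * dists.std()]
```

The key point is that the agent only ever sees a point cloud, so as long as the pseudo-LiDAR cloud is filtered into a similar shape and density, the same policy can drive on either sensor.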

Color-space-based image segmentation in 2D (left) being used to extract the cones' positions in 3D (right)
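The figure above captures the idea: threshold the image in a color space to find the cones in 2D, then read their 3D positions off the depth map at those pixels. A rough sketch of that approach follows; the HSV bounds, blob-size cutoff and intrinsics are illustrative placeholders rather than our tuned values.

```python
import cv2
import numpy as np

# Same illustrative pinhole intrinsics as in the back-projection sketch above.
FX, FY, CX, CY = 720.0, 720.0, 620.0, 188.0

# Illustrative HSV bounds for yellow cones -- real thresholds are tuned to the
# simulator's lighting.
YELLOW_LO = np.array([20, 100, 100])
YELLOW_HI = np.array([35, 255, 255])

def cone_positions_3d(image_bgr: np.ndarray, depth: np.ndarray) -> list:
    """Segment cones by colour in 2D, then look up each cone's 3D position."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, YELLOW_LO, YELLOW_HI)

    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    positions = []
    for contour in contours:
        if cv2.contourArea(contour) < 50:  # ignore tiny blobs
            continue
        # Centroid of the blob, then read depth at that pixel.
        m = cv2.moments(contour)
        u, v = int(m["m10"] / m["m00"]), int(m["m01"] / m["m00"])
        z = depth[v, u]
        if np.isfinite(z):
            x = (u - CX) * z / FX
            y = (v - CY) * z / FY
            positions.append((x, y, z))
    return positions
```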

We are continuing to work on the system to fully understand the upper limits and advantages of using vision over LiDAR. To read more about the hardware acceleration work that made it possible to run the simulation in real time, do check out our article “Go Fast or Go Home ! — Thinking Parallelly”.

To read more about how we containerised FSDS using Docker, do check out our article “Formula Student Driverless Simulator on Docker”.
