How Foundation Models Perceive the World with VQ-VAEs
In the rapidly evolving landscape of artificial intelligence, foundation models have emerged as powerful tools that are reshaping our…
Feb 1

How to build your Foundation Model’s Perception system
Large vision models like LLaVA, AV models like D³Nav, GAIA-1, and image generation models like DALL-E, are built on vast amounts of data…
Oct 14, 2024

Demystifying Video Generation Models
Have you ever wondered how AI can create videos from scratch? Or how it can predict what happens next in a video clip? Enter D³Nav and…
Oct 3, 2024

D³NavZero: AV Navigation with AI-Guided Graph Search
D³NavZero proposes a navigation system that integrates MyuZero-style intelligent graph search with D³Nav [1] and TrajNet [2]. This fusion…
Jul 21, 2024

Real-Time Semantic and Occupancy Prediction: A Deep Dive into SOccDPT
SOccDPT Project Page: https://adityang.github.io/SOccDPT
Jun 9, 2024

Model Predictive Control for Dummies
Controlling a robot, whether in simulation or real life, often requires a robust control strategy. Traditional approaches like PID and…
Jun 9, 2024

DriveLLaVA: Using Large Vision Models as Driving Agents for AVs
DriveLLaVA in action
Apr 8, 2024

Boosting in Vision based Depth Estimation
Boosting has remained a popular approach to squeeze out a few more fractions of performance. Chasing the long tail of nines in the…
Apr 8, 2024

Using Thermal Cameras and End-to-End Learning for Nighttime Navigation in Autonomous Vehicles
Ever wondered how autonomous vehicles (AVs) will navigate in low-light conditions? Traditional sensors like LiDAR and RADAR struggle in the…
Apr 6, 2024

Monocular Depth Estimation in Python using Monodepth2 and Manydepth
Estimating 3D depth from a single camera has remained an ill-posed problem for a long time. This also has vast applications in the field of…
Feb 7, 2022