HE-Drive: Human-Like End-to-End Driving with Vision Language Models

1 Horizon Robotics | 2 The University of Hong Kong | 3 University of the Chinese Academy of Sciences | 4 Beijing Jiaotong University

* denotes equal contribution |
denotes corresponding authors



To the best of our knowledge, HE-Drive is the first human-like-centric end-to-end autonomous driving system, ensuring high performance while guaranteeing efficiency and comfort.

Comparison of Performance (i.e., collision rate), Efficiency (i.e., FPS), and Comfort Metrics

Abstract

In this paper, we propose HE-Drive: the first human-like-centric end-to-end autonomous driving system to generate trajectories that are both temporally consistent and comfortable. Recent studies have shown that imitation learning-based planners and learning-based trajectory scorers can effectively generate and select accurate trajectories that closely mimic expert demonstrations. However, such trajectory planners and scorers still generate temporally inconsistent and uncomfortable trajectories.

To solve these problems, HE-Drive first extracts key 3D spatial representations through sparse perception. These representations then serve as conditional inputs for a motion planner based on conditional denoising diffusion probabilistic models (DDPMs), which generates temporally consistent, multi-modal trajectories. A trajectory scorer guided by vision-language models (VLMs) subsequently selects the most comfortable trajectory from these candidates to control the vehicle, ensuring human-like end-to-end driving.

Experiments show that HE-Drive not only achieves state-of-the-art performance (i.e., reduces the average collision rate by 71% compared with VAD) and efficiency (i.e., runs 1.9× faster than SparseDrive) on the challenging nuScenes and OpenScene datasets, but also provides the most comfortable driving experience on real-world data.


HE-Drive Overview


HE-Drive Key Components

Contribution 1: Diffusion-based Motion Planner
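
To make the idea of a conditional DDPM planner more concrete, the following is a minimal sketch of the standard DDPM reverse (denoising) loop applied to 2D waypoints. It is not HE-Drive's implementation. Everything here is an assumption for illustration: the linear noise schedule, the 6-waypoint horizon, the use of a goal point as the condition, and above all `toy_noise_predictor`, a hypothetical stand-in for a trained noise-prediction network that simply assumes the clean trajectory is a straight line to the goal.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Return (betas, alpha_bars) for a linear DDPM noise schedule."""
    betas = [beta_start + (beta_end - beta_start) * i / (num_steps - 1)
             for i in range(num_steps)]
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return betas, alpha_bars

def toy_noise_predictor(x_t, alpha_bar_t, goal):
    """Hypothetical stand-in for a trained network eps_theta(x_t, t, c).

    Assumes the clean trajectory is a straight line from the origin to
    `goal` (the condition) and returns the noise that maps it to x_t.
    """
    last = len(x_t) - 1
    scale, spread = math.sqrt(alpha_bar_t), math.sqrt(1.0 - alpha_bar_t)
    eps = []
    for i, (px, py) in enumerate(x_t):
        cx, cy = goal[0] * i / last, goal[1] * i / last  # clean waypoint
        eps.append(((px - scale * cx) / spread, (py - scale * cy) / spread))
    return eps

def sample_trajectory(goal, num_waypoints=6, num_steps=50, rng=None):
    """Run the DDPM reverse process from Gaussian noise to one trajectory."""
    if rng is None:
        rng = random.Random(0)
    betas, alpha_bars = linear_beta_schedule(num_steps)
    x = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(num_waypoints)]
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(x, alpha_bars[t], goal)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        inv_sqrt_alpha = 1.0 / math.sqrt(1.0 - betas[t])
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the final step
        x = [((px - coef * ex) * inv_sqrt_alpha + sigma * rng.gauss(0, 1),
              (py - coef * ey) * inv_sqrt_alpha + sigma * rng.gauss(0, 1))
             for (px, py), (ex, ey) in zip(x, eps)]
    return x

# One candidate per hypothetical goal: straight ahead, left, and right.
rng = random.Random(0)
goals = [(10.0, 0.0), (8.0, 4.0), (8.0, -4.0)]
candidates = [sample_trajectory(goal, rng=rng) for goal in goals]
```

Because the toy predictor is an oracle for a known clean trajectory, each sample ends exactly on its straight line. A learned predictor conditioned on perception features would instead produce varied candidates, which is where a downstream scorer becomes necessary.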


Contribution 2: VLM-guided Trajectory Scorer
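
As a rough illustration of how a scorer can pick one trajectory from several candidates, the sketch below ranks candidates by a weighted cost and returns the cheapest one. This page does not specify HE-Drive's cost terms, so every detail is an assumption: mean squared jerk as the comfort proxy, straight-line progress as the efficiency term, the 0.5 s waypoint spacing, and the two hand-written weight dictionaries, which stand in for driving-style weights that a VLM might supply.

```python
def finite_difference(values, dt):
    """First-order finite difference of a sequence sampled every dt seconds."""
    return [(b - a) / dt for a, b in zip(values, values[1:])]

def mean_squared_jerk(trajectory, dt):
    """Comfort proxy (assumption): mean squared third derivative of position.

    Needs at least four waypoints so that one jerk sample exists.
    """
    squared = []
    for axis in (0, 1):
        series = [point[axis] for point in trajectory]
        for _ in range(3):  # position -> velocity -> acceleration -> jerk
            series = finite_difference(series, dt)
        squared.extend(j * j for j in series)
    return sum(squared) / len(squared)

def progress(trajectory):
    """Straight-line distance between the first and last waypoint."""
    (x0, y0), (x1, y1) = trajectory[0], trajectory[-1]
    return ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5

def select_trajectory(candidates, weights, dt=0.5):
    """Return the index of the candidate with the lowest weighted cost."""
    def cost(traj):
        return (weights["comfort"] * mean_squared_jerk(traj, dt)
                - weights["progress"] * progress(traj))
    return min(range(len(candidates)), key=lambda i: cost(candidates[i]))

# Two hypothetical candidates: constant speed vs. stop-and-go but longer.
smooth = [(0.0, 0.0), (2.0, 0.0), (4.0, 0.0), (6.0, 0.0), (8.0, 0.0), (10.0, 0.0)]
jerky = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
candidates = [smooth, jerky]

# Hypothetical style weights; in a VLM-guided scorer these would not be fixed.
comfort_style = {"comfort": 1.0, "progress": 0.1}
assertive_style = {"comfort": 0.0, "progress": 1.0}

comfortable_choice = select_trajectory(candidates, comfort_style)
assertive_choice = select_trajectory(candidates, assertive_style)
```

With the comfort-weighted style the smooth candidate wins, while the progress-only style prefers the longer stop-and-go candidate. Changing only the weights, and not the candidates, is one simple way to express different driving styles.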

Experiments on nuScenes

nuScenes: Left Turn Results

nuScenes: Right Turn Results

nuScenes: Trajectory Scorer Driving Style Adjustment

nuScenes: Quantitative Comparison of Performance and Efficiency


nuScenes: Qualitative Results

nuScenes: Quantitative Comparison of Comfort Indicators

Experiments on Real-World data

1s | 2s | 3s | 4s | 5s | 6s

Real-World: Overtaking

Real-World: Turning

Real-World: Quantitative Comparison of Comfort Indicators

Experiments on OpenScene

OpenScene: VQA Qualitative Results