CVPR 2024

aiMotive

Controllable Neural Reconstruction for Autonomous Driving

TL;DR: We introduce an automated pipeline that trains neural reconstruction models from sensor streams recorded by a data collection vehicle. Our simulator, aiSim, then generates a controllable virtual counterpart of the real-world environment, so that scenes can be replayed in a closed-loop fashion.

Abstract

Neural scene reconstruction is gaining importance in autonomous driving, especially for closed-loop simulation of real-world recordings. This demonstration introduces an automated pipeline designed for the training of neural reconstruction models, utilizing sensor streams captured by a data collection vehicle. Subsequently, these models are deployed to replicate a virtual counterpart of the actual world. Additionally, the scene can be replayed or manipulated in a controlled manner. To achieve this, our simulator is employed to augment the recreated static environment with dynamic agents, managing occlusion and lighting. The simulator's versatility allows for the adjustment of various parameters, including dynamic agent behavior and weather conditions.

Method

The neural rendering models, based on NeRF and 3D Gaussian Splatting, reconstruct the static scene. Dynamic objects are detected by our in-house 3D object detector and masked out during training. After training, the external renderer in aiSim is enabled and uses the neural rendering model to render the background. Dynamic agents generated by aiSim are then combined with the model-produced background image through image composition and depth testing. This approach produces digital twins of intricate real-world environments without the laborious, time-intensive work usually required of 3D artists.
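The composition and depth-testing step can be illustrated with a minimal NumPy sketch. It is not aiSim's implementation: the function name, the argument layout, and the assumption that the simulator supplies a per-pixel alpha and depth for its agents are all hypothetical, chosen only to show how a per-pixel depth test lets the reconstructed background occlude inserted agents and vice versa.

```python
import numpy as np

def composite_with_depth_test(bg_rgb, bg_depth, fg_rgb, fg_depth, fg_alpha):
    """Blend simulator-rendered agents over a neural background (hypothetical helper).

    bg_rgb, fg_rgb: float arrays of shape (H, W, 3), values in [0, 1]
    bg_depth, fg_depth: float arrays of shape (H, W), distance from the camera
    fg_alpha: float array of shape (H, W), agent coverage in [0, 1]
    """
    # An agent pixel is visible only where it is covered by an agent
    # and lies closer to the camera than the reconstructed background.
    visible = (fg_depth < bg_depth) & (fg_alpha > 0)

    # Use the agent's alpha where it passes the depth test, zero elsewhere.
    weight = np.where(visible, fg_alpha, 0.0)[..., None]

    out_rgb = weight * fg_rgb + (1.0 - weight) * bg_rgb
    out_depth = np.where(visible, fg_depth, bg_depth)
    return out_rgb, out_depth


# Tiny 1x3 example: black background at depth 10, white agent pixels.
bg_rgb = np.zeros((1, 3, 3))
bg_depth = np.full((1, 3), 10.0)
fg_rgb = np.ones((1, 3, 3))
fg_depth = np.array([[5.0, 20.0, np.inf]])  # in front, behind, no agent
fg_alpha = np.array([[1.0, 1.0, 0.0]])

rgb, depth = composite_with_depth_test(bg_rgb, bg_depth, fg_rgb, fg_depth, fg_alpha)
```

In this example only the first pixel takes the agent color; the second is occluded by the reconstructed background and the third has no agent coverage. The sketch assumes both depth maps use the same metric convention, which in practice means the depth rendered by the neural model has to be aligned with the simulator's depth buffer.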

Limitations

While aiSim is a mature product, its neural reconstruction method is still in research and development and has known limitations. Road surface reconstruction quality is limited around the ego vehicle because of camera obstruction masks. Accurately spawning virtual objects requires a precise road mesh. To obtain one, we rely on HD maps, which are not available for every location. Automatic generation of a suitable road surface is therefore left as future work.

Additional results

The environment is rendered with our neural renderer using 3D Gaussian Splatting. The scene is extended with vulnerable road users rendered by aiSim.

Novel view synthesis from a new camera position and variable camera rotation.

Comparison of a real-world recording and its reconstruction with dynamic object masking.

Neural reconstruction of a scene using a 3D Gaussian Splatting model.