CVPR 2025

The IEEE/CVF Conference on Computer Vision and Pattern Recognition 2025

Hybrid Rendering for Multimodal Autonomous Driving: Merging Neural and Physics-Based Simulation

Authors: Máté Tóth, Péter Kovács, Zoltán Bendefy, Zoltán Hortsin, Tamás Matuszka, Balázs Teréki

TL;DR: We developed a method that combines neural reconstruction with traditional physics-based rendering, enhancing both techniques to support autonomous driving development. Since our solution is integrated into aiSim, our simulator, it can be tested interactively in real time, making it ideal for demonstrations.

Abstract

Neural reconstruction has advanced significantly in the past year, and dynamic models are becoming increasingly common. However, these models are limited to handling in-domain objects that closely follow their original trajectories. This demonstration presents a hybrid approach that combines the advantages of neural reconstruction and physics-based rendering. First, we remove dynamic objects from the scene and reconstruct the static environment using a neural reconstruction model. Then, we populate the reconstructed environment with dynamic objects in aiSim. This approach effectively mitigates the drawbacks of both methods, such as the domain gap of traditional simulation and out-of-domain object rendering in neural reconstruction.
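To make the hybrid idea concrete, below is a minimal sketch of how a neural-rendered static background and a physics-rendered dynamic-object layer could be merged per pixel using their depth buffers. This is not the aiSim implementation; the function name, buffer layout, and compositing rule are illustrative assumptions.

```python
import numpy as np

def composite_hybrid_frame(neural_rgb, neural_depth, sim_rgb, sim_depth, sim_mask):
    """Merge a neural static-background render with a physics-based render
    of dynamic objects by comparing per-pixel depth (illustrative sketch).

    neural_rgb:   (H, W, 3) float32, static scene from the reconstruction model
    neural_depth: (H, W)    float32, metric depth of the static scene
    sim_rgb:      (H, W, 3) float32, dynamic objects rendered by the simulator
    sim_depth:    (H, W)    float32, metric depth of the dynamic objects
    sim_mask:     (H, W)    bool,    pixels actually covered by a dynamic object
    """
    # A dynamic object wins a pixel only where it exists and is closer
    # to the camera than the reconstructed static geometry.
    dynamic_in_front = sim_mask & (sim_depth < neural_depth)
    out_rgb = np.where(dynamic_in_front[..., None], sim_rgb, neural_rgb)
    out_depth = np.where(dynamic_in_front, sim_depth, neural_depth)
    return out_rgb, out_depth

# Example with dummy buffers just to show the expected shapes.
H, W = 4, 6
rgb_bg, d_bg = np.random.rand(H, W, 3).astype(np.float32), np.full((H, W), 30.0, np.float32)
rgb_dyn, d_dyn = np.random.rand(H, W, 3).astype(np.float32), np.full((H, W), 12.0, np.float32)
mask_dyn = np.zeros((H, W), bool)
mask_dyn[1:3, 2:5] = True
frame, depth = composite_hybrid_frame(rgb_bg, d_bg, rgb_dyn, d_dyn, mask_dyn)
```

Depth-aware compositing of this kind is one simple way to keep correct occlusion between reconstructed static geometry and inserted dynamic agents.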

Method

We train our 3D Gaussian Splatting (3DGS) and NeRF-based models using synchronized data collected from vehicles equipped with RGB cameras, GNSS devices, and LiDAR sensors. The reconstructed environment allows for the virtual placement of dynamic agents at arbitrary locations, adjustments to environmental conditions, and rendering from novel camera viewpoints. Our novel training method (NeRF2GS) significantly improves novel view synthesis quality, particularly for road surfaces and lane markings, while maintaining interactive frame rates, which enhances its applicability to autonomous driving tasks. NeRF2GS combines the superior generalization capability of NeRF-based methods with the real-time rendering speed of 3DGS: a customized NeRF model is trained on the original images with depth regularization derived from a noisy LiDAR point cloud, and is then used as a teacher model for 3DGS training, providing accurate depth, surface normal, and appearance supervision. Additionally, our method supports multiple sensor modalities (LiDAR, radar target lists), different camera models (e.g., fisheye), and accounts for camera exposure mismatches. It can also predict segmentation masks, surface normals, and depth maps, even for large-scale reconstructions (>100 000 m²), using our block-based training parallelization approach.
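As an illustration of the teacher-student idea behind NeRF2GS, the sketch below shows one possible distillation objective in PyTorch, where a trained NeRF teacher provides depth and surface-normal targets for a 3DGS render of the same view while the real image supplies the photometric term. The function name, loss weights, and dictionary-based render interface are assumptions for illustration, not the released training code.

```python
import torch
import torch.nn.functional as F

def nerf2gs_distillation_loss(gs_render, teacher, gt_rgb,
                              w_rgb=1.0, w_depth=0.1, w_normal=0.05):
    """One possible distillation objective for training a 3DGS student
    against a NeRF teacher (illustrative sketch, not the aiSim code).

    gs_render / teacher: dicts with 'rgb' (H, W, 3), 'depth' (H, W),
                         'normal' (H, W, 3) rendered for the same camera.
    gt_rgb:              (H, W, 3) ground-truth image for the photometric term.
    """
    # Photometric term against the captured image keeps the student
    # faithful to the real appearance.
    loss_rgb = F.l1_loss(gs_render["rgb"], gt_rgb)

    # Depth supervision from the teacher, which was itself regularized
    # with (noisy) LiDAR depth during NeRF training.
    loss_depth = F.l1_loss(gs_render["depth"], teacher["depth"])

    # Normal supervision (1 - cosine similarity) encourages flat,
    # well-oriented Gaussians on road surfaces and lane markings.
    cos = F.cosine_similarity(gs_render["normal"], teacher["normal"], dim=-1)
    loss_normal = (1.0 - cos).mean()

    return w_rgb * loss_rgb + w_depth * loss_depth + w_normal * loss_normal

# Shape-only usage example with dummy tensors.
H, W = 8, 8
rand_map = lambda c: torch.rand(H, W, c) if c > 1 else torch.rand(H, W)
student = {"rgb": rand_map(3), "depth": rand_map(1),
           "normal": F.normalize(torch.randn(H, W, 3), dim=-1)}
teacher = {"rgb": rand_map(3), "depth": rand_map(1),
           "normal": F.normalize(torch.randn(H, W, 3), dim=-1)}
loss = nerf2gs_distillation_loss(student, teacher, gt_rgb=rand_map(3))
```

For large scenes, the same objective could be applied independently per spatial block, in the spirit of the block-based training parallelization mentioned above.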

Limitations

While aiSim is a well-developed product, its neural reconstruction method is still in the research and development phase and has known limitations that leave room for open scientific discussion. Shadows cast by dynamic objects can lead to unrealistic reconstructions, an issue we aim to address with a shadow segmentation model. Accurately placing virtual objects requires a precise road mesh, which we currently generate from HD maps; these are not available for all locations. Consequently, developing an automated method for generating suitable road surfaces remains future work.

Qualitative Results

Our method works in different operational design domains. Urban environment (San Francisco, CA).

Parked cars and the static environment are provided by the neural reconstruction model, while dynamic objects are added by aiSim (Sunnyvale, CA).

Rotating LiDAR sensor simulation within aiSim supported by our hybrid rendering method. Colors indicate LiDAR intensity.

Neural radar target reconstruction where the static environment is reconstructed using NeRF, and another neural network predicts the radar target list (colors indicate distance from the ego vehicle).

Novel view synthesis with 3DGS on a proving ground (ZalaZone, Hungary).

Novel view synthesis and dynamic object removal with NeRF on a 2 km-long highway section (M0, Hungary).

NeRF novel view synthesis from Waymo Open Dataset with camera model change in extreme conditions (top row: RGB/normal, bottom row: depth/segmentation).

NeRF novel view synthesis from Waymo Open Dataset with camera model change in urban environments (top row: RGB/normal, bottom row: depth/segmentation).

Novel view synthesis from an extreme viewpoint with 3DGS (top row: RGB/depth, bottom row: normal/segmentation by a pretrained Mask2Former overlaid on the 3DGS RGB output). The area of the reconstruction is approximately 165 000 m².

Our hybrid rendering approach can also be applied to public datasets like Waymo.