CVPR 2025
Abstract
Neural reconstruction has advanced significantly in the past year, and dynamic models are becoming increasingly common. However, these models are limited to handling in-domain objects that closely follow their original trajectories. This demonstration presents a hybrid approach that combines the advantages of neural reconstruction with physics-based rendering. First, we remove dynamic objects from the scene and reconstruct the static environment using a neural reconstruction model. Then, we populate the reconstructed environment with dynamic objects in aiSim. This approach mitigates the drawbacks of both techniques: the domain gap of traditional simulation and the out-of-domain object rendering of neural reconstruction.
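As a rough illustration of how such a hybrid frame can be assembled, the sketch below composites a neural-rendered static background with physics-rendered dynamic objects using per-pixel depth. This is a minimal sketch under our own assumptions; the function and array names are illustrative and do not correspond to the aiSim API.

```python
import numpy as np

def composite_hybrid_frame(static_rgb, static_depth, dyn_rgb, dyn_depth, dyn_alpha):
    """Depth-aware compositing of a neural-rendered static background with
    rasterized dynamic objects (illustrative sketch, not the aiSim pipeline).

    static_rgb, dyn_rgb:     (H, W, 3) float arrays
    static_depth, dyn_depth: (H, W) depth maps in meters
    dyn_alpha:               (H, W) coverage of the dynamic-object layer
    """
    # A dynamic pixel wins only where its layer is opaque enough and it is
    # closer to the camera than the reconstructed static background.
    dyn_visible = (dyn_alpha > 0.5) & (dyn_depth < static_depth)

    rgb = np.where(dyn_visible[..., None], dyn_rgb, static_rgb)
    depth = np.where(dyn_visible, dyn_depth, static_depth)
    return rgb, depth
```

In practice, the simulator also has to keep shadows, reflections, and lighting consistent between the two layers, which a per-pixel depth test alone does not capture.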
Method
We train our 3D Gaussian Splatting (3DGS) and NeRF-based models on synchronized data collected from vehicles equipped with RGB cameras, GNSS receivers, and LiDAR sensors. The reconstructed environment allows dynamic agents to be placed at arbitrary locations, environmental conditions to be adjusted, and scenes to be rendered from novel camera viewpoints.

Our novel training method, NeRF2GS, significantly improves novel view synthesis quality, particularly for road surfaces and lane markings, while maintaining interactive frame rates, which makes it well suited to autonomous driving tasks. NeRF2GS combines the stronger generalization of NeRF-based methods with the real-time rendering speed of 3DGS: a customized NeRF model is trained on the original images with depth regularization derived from a noisy LiDAR point cloud, and then serves as a teacher for 3DGS training, providing accurate depth, surface normal, and appearance supervision. In addition, our method supports multiple sensor modalities (LiDAR, radar target lists) and different camera models (e.g., fisheye), and compensates for camera exposure mismatches. It can also predict segmentation masks, surface normals, and depth maps, even for large-scale reconstructions (>100 000 m²), using our block-based training parallelization approach.
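To make the teacher-student step concrete, below is a minimal sketch of how the 3DGS student could be supervised by appearance, depth, and surface-normal maps rendered by the depth-regularized NeRF teacher. The loss weights and the `student.render()` interface are our own assumptions for illustration, not the exact NeRF2GS formulation.

```python
import torch
import torch.nn.functional as F

def gs_distillation_loss(student, teacher_rgb, teacher_depth, teacher_normal,
                         w_rgb=1.0, w_depth=0.1, w_normal=0.05):
    """Sketch of teacher-supervised 3DGS training in the spirit of NeRF2GS.

    teacher_rgb:    (H, W, 3) image rendered by the NeRF teacher
    teacher_depth:  (H, W) teacher depth map
    teacher_normal: (H, W, 3) unit surface normals from the teacher
    The loss weights and the student interface are illustrative assumptions.
    """
    # Hypothetical renderer API: the 3DGS student rasterizes the same view.
    pred_rgb, pred_depth, pred_normal = student.render()

    loss_rgb = F.l1_loss(pred_rgb, teacher_rgb)
    loss_depth = F.l1_loss(pred_depth, teacher_depth)
    # Compare normals by angle: 1 - cosine similarity, averaged over pixels.
    loss_normal = (1.0 - F.cosine_similarity(pred_normal, teacher_normal, dim=-1)).mean()

    return w_rgb * loss_rgb + w_depth * loss_depth + w_normal * loss_normal
```

In this setup the noisy LiDAR is used only to regularize the teacher, while the student receives dense supervision from the teacher's rendered maps.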
Qualitative results:
Our method works across different operational design domains; here, an urban environment (San Francisco, CA).
Parked cars and the static environment are provided by the neural reconstruction model, while dynamic objects are added by aiSim (Sunnyvale, CA).
Rotating LiDAR sensor simulation within aiSim supported by our hybrid rendering method. Colors indicate LiDAR intensity.
Neural radar target reconstruction where the static environment is reconstructed using NeRF, and another neural network predicts the radar target list (colors indicate distance from the ego vehicle).
Novel view synthesis with 3DGS on a proving ground (ZalaZone, Hungary).
Novel view synthesis and dynamic object removal with NeRF on a 2 km-long highway section (M0, Hungary).
NeRF novel view synthesis from the Waymo Open Dataset with camera model change in extreme conditions (top row: RGB/normal, bottom row: depth/segmentation).
NeRF novel view synthesis from the Waymo Open Dataset with camera model change in urban environments (top row: RGB/normal, bottom row: depth/segmentation).
Novel view synthesis from an extreme viewpoint with 3DGS (top row: RGB/depth; bottom row: normal/segmentation by a pretrained Mask2Former overlaid on the 3DGS RGB). The reconstructed area is about 165 000 m².
Our hybrid rendering approach can also be applied to public datasets such as the Waymo Open Dataset.