ICCV 2025
Abstract
Neural reconstruction has advanced significantly in the past year, and dynamic models are becoming increasingly common. However, these models are limited to handling in-domain objects that closely follow their original trajectories. This demonstration presents a hybrid approach that integrates the advantages of neural reconstruction with physics-based rendering. First, we remove dynamic objects from the scene and reconstruct the static environment using a neural reconstruction model. Then, we populate the reconstructed environment with dynamic objects in aiSim. This approach effectively mitigates the drawbacks of both methods—such as domain gaps in traditional simulation and out-of-domain object rendering in neural reconstruction. In addition, our NeRF2GS method enables high-quality novel view synthesis even after extreme viewpoint changes.
Method
We train our 3D Gaussian Splatting (3DGS) and NeRF-based models using synchronized data collected from vehicles equipped with RGB cameras, GNSS devices, and LiDAR sensors. The reconstructed environment allows for the virtual placement of dynamic agents at arbitrary locations, adjustments to environmental conditions, and rendering from novel camera viewpoints. We have significantly improved novel view synthesis quality—particularly for road surfaces and lane markings—by incorporating surface normal regularization, which enhances its applicability for autonomous driving tasks. Additionally, our method supports multiple sensor modalities (LiDAR, radar target lists), different camera models (e.g., fisheye), and accounts for camera exposure mismatches. It can also predict segmentation masks, surface normals, and depth maps. Our simulator has been validated through downstream tasks, HiL experiments, and closed-loop ADAS/AD tests.
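The surface normal regularization mentioned above can be illustrated with a minimal sketch: penalize disagreement between rendered normals and normals derived from the rendered depth map. This is a generic consistency term under assumed pinhole intrinsics, not the authors' actual training code; all function names here are hypothetical.

```python
import numpy as np

def normals_from_depth(depth, fx=1.0, fy=1.0):
    """Estimate camera-space surface normals from a depth map via
    finite differences (assumes a pinhole camera with focals fx, fy)."""
    dz_dx = np.gradient(depth, axis=1) * fx
    dz_dy = np.gradient(depth, axis=0) * fy
    # The surface normal is orthogonal to both tangents: (-dz/dx, -dz/dy, 1)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)

def normal_consistency_loss(rendered_normals, depth):
    """Mean (1 - cosine similarity) between rendered normals and
    normals recovered from the rendered depth map."""
    target = normals_from_depth(depth)
    cos = np.sum(rendered_normals * target, axis=-1)
    return float(np.mean(1.0 - cos))
```

On a planar region (e.g. a road surface) the depth-derived normals are constant, so this term pushes the rendered normals toward that plane, which is consistent with the improvement reported for road surfaces and lane markings.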
Qualitative results:
Our method works in different operational design domains. Urban environment (San Francisco, CA).
Parked cars and the static environment are provided by the neural reconstruction model, while dynamic objects are added by aiSim (Sunnyvale, CA).
Our method can be used to resimulate real-world recordings.
Our hybrid rendering approach can also be applied to public datasets such as the Waymo Open Dataset.
Our AD software driving in the neurally rendered environment.
Novel view synthesis using equirectangular camera model with 3DGS.
Rotating LiDAR sensor simulation within aiSim supported by our hybrid rendering method. Colors indicate LiDAR intensity.
Neural radar target reconstruction where the static environment is reconstructed using NeRF, and another neural network predicts the radar target list (colors indicate distance from the ego vehicle).
Novel view synthesis with 3DGS on a proving ground (ZalaZone, Hungary).
Novel view synthesis and dynamic object removal with NeRF on a 2 km-long highway section (M0, Hungary).
NeRF novel view synthesis from Waymo Open Dataset with camera model change in extreme conditions (top row: RGB/normal, bottom row: depth/segmentation).
NeRF novel view synthesis from Waymo Open Dataset with camera model change in urban environments (top row: RGB/normal, bottom row: depth/segmentation).
A third-party model’s detections on images rendered using our hybrid method.
Comparison of learned segmentation with the Mask2Former model after an extreme viewpoint change (3 meters away from the training trajectory). Green regions correspond to matches between the two models, while blue and orange regions are classified only by the learned segmentation or only by Mask2Former, respectively.
Diffusion model-enhanced novel view synthesis from extremely novel viewpoints (up to 9 meters away from the trajectory of the recording vehicle).