Occupancy forecasting
Optimized for Neural Processing Units (NPUs) with convolutional acceleration, CCLSTM delivers state-of-the-art (SOTA) performance in the 2024 Waymo Occupancy and Flow Prediction Challenge, while maintaining real-time efficiency and full end-to-end trainability from camera input to future motion prediction.
Why CCLSTM?
Predicting the future motion of dynamic agents is a cornerstone capability in autonomous driving. CCLSTM approaches this task using Occupancy Flow Fields—a rich, scalable representation that captures motion, spatial extent, and multi-modal futures in a unified framework.
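To make the representation concrete, the sketch below shows one plausible tensor layout for an occupancy flow field. The backward-flow convention follows the Waymo challenge definition, but the grid size, horizon, and variable names are illustrative assumptions, not CCLSTM's actual interface.

```python
import numpy as np

# Illustrative Occupancy Flow Fields layout (shapes are assumptions).
H, W = 256, 256   # BEV grid resolution
T = 8             # number of future timesteps (waypoints)

# Occupancy: per-timestep probability that each BEV cell is occupied.
occupancy = np.zeros((T, H, W), dtype=np.float32)

# Flow: per-timestep 2D displacement per cell. In the Waymo convention this
# is a backward flow, pointing from each occupied cell at time t back to its
# source location at time t-1.
flow = np.zeros((T, H, W, 2), dtype=np.float32)
```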
Unlike traditional detection-and-tracking pipelines or transformer-based approaches, CCLSTM is:
- FULLY CONVOLUTIONAL – Built entirely from convolution operations, making it ideal for deployment on modern NPUs (e.g., aiWare).
- RECURRENT AND AUTOREGRESSIVE – Recursively encodes history with unlimited lookback and forecasts arbitrary horizons autoregressively.
- END-TO-END TRAINABLE – Integrates seamlessly with bird’s-eye view (BEV) encoders, requiring no intermediate heuristics or separate modules.
- EXPLAINABLE AND CONTROLLABLE – Preserves semantic richness and enables dynamic behavior control, such as planning with different driving styles.
An overview of CCLSTM: rasterized input grids are concatenated along the channel dimension and encoded via a CNN. The encoded features are aggregated by the accumulator CLSTM, whose hidden and cell states are used to initialize the forecasting CLSTM. The forecasting CLSTM is then called autoregressively to predict encoded future states, and these future hidden states are passed to a CNN decoder to produce occupancy and flow grids.
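As a rough PyTorch sketch of the data flow just described: a CNN encodes each rasterized input, an accumulator ConvLSTM aggregates history, its states seed a forecasting ConvLSTM that is rolled out autoregressively, and a CNN decoder maps each future hidden state to occupancy and flow. The single-layer encoder/decoder, the channel counts, and feeding the hidden state back as the next input are simplifying assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """A plain convolutional LSTM cell (simplified)."""
    def __init__(self, in_ch, hidden_ch, k=3):
        super().__init__()
        self.hidden_ch = hidden_ch
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class CCLSTMSketch(nn.Module):
    def __init__(self, in_ch=16, feat_ch=64):
        super().__init__()
        self.encoder = nn.Conv2d(in_ch, feat_ch, 3, padding=1)   # stand-in CNN encoder
        self.accumulator = ConvLSTMCell(feat_ch, feat_ch)        # recursively encodes history
        self.forecaster = ConvLSTMCell(feat_ch, feat_ch)         # autoregressive forecasting
        self.decoder = nn.Conv2d(feat_ch, 1 + 2, 3, padding=1)   # 1 occupancy + 2 flow channels

    def forward(self, history, n_future):
        # history: list of rasterized input grids, each of shape (B, in_ch, H, W)
        B, _, H, W = history[0].shape
        h = history[0].new_zeros(B, self.accumulator.hidden_ch, H, W)
        c = torch.zeros_like(h)
        for grid in history:                    # unlimited lookback: one step per frame
            h, c = self.accumulator(self.encoder(grid), (h, c))
        x, outputs = h, []                      # accumulator states seed the forecaster
        for _ in range(n_future):               # arbitrary horizon: one step per waypoint
            h, c = self.forecaster(x, (h, c))
            x = h                               # feed the predicted state back in
            outputs.append(self.decoder(h))     # per-step occupancy and flow grids
        return torch.stack(outputs, dim=1)      # (B, n_future, 3, H, W)
```

Note that every operation in this sketch is a convolution or an elementwise gate, which is what makes the whole graph map naturally onto NPU convolution engines.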
The Problem with Existing Methods
Most conventional solutions fall into two categories:
- HEURISTIC-BASED PIPELINES – Relying on hand-crafted rules and simple object tracking, these systems lose semantic depth and generalization.
- MULTISTAGE DEEP LEARNING MODELS – While more sophisticated, these solutions typically separate detection, tracking, and forecasting, limiting performance and end-to-end optimization.
CCLSTM bridges this gap by offering a single, unified model that is both interpretable and trainable as a whole, enabling safer, more adaptable decision-making in autonomous systems.
aiWare is perfect for aiDrive with CCLSTM
CCLSTM was designed with aiWare in mind. Because occupancy forecasting is framed as an image-to-image task, the model relies heavily on convolution – aiWare’s core strength. This synergy results in unmatched efficiency and real-time performance, showcasing the power of software and hardware co-design.
Results
CCLSTM achieves state-of-the-art (SOTA) performance across all metrics in the 2024 Waymo Occupancy and Flow Prediction Challenge, and leads in three out of seven metrics in the 2022 edition of the same benchmark. In contrast to contemporary approaches, CCLSTM attains these results without relying on large transformer architectures, model ensembles, or test-time augmentations, making it far more practical for real-time deployment.
Technical Report:
The details of CCLSTM can be found in our technical report, available on arXiv:
PDF: https://arxiv.org/pdf/2506.06128
HTML: https://arxiv.org/html/2506.06128v1