Occupancy forecasting
Optimized for Neural Processing Units (NPUs) with convolutional acceleration, CCLSTM delivers state-of-the-art (SOTA) performance in the 2024 Waymo Occupancy and Flow Prediction Challenge, while maintaining real-time efficiency and full end-to-end trainability from camera input to future motion prediction.
Why CCLSTM?
Predicting the future motion of dynamic agents is a cornerstone capability in autonomous driving. CCLSTM approaches this task using Occupancy Flow Fields—a rich, scalable representation that captures motion, spatial extent, and multi-modal futures in a unified framework.
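To make the representation concrete, the sketch below shows one plausible tensor layout for an occupancy flow field. The backward-flow convention follows the Waymo challenge definition, but the grid size, horizon, and variable names are illustrative assumptions, not CCLSTM's actual interface.

```python
import numpy as np

# Illustrative Occupancy Flow Fields layout (shapes are assumptions).
H, W = 256, 256   # BEV grid resolution
T = 8             # number of future timesteps (waypoints)

# Occupancy: per-timestep probability that each BEV cell is occupied.
occupancy = np.zeros((T, H, W), dtype=np.float32)

# Flow: per-timestep 2D displacement per cell. In the Waymo convention this
# is a backward flow, pointing from each occupied cell at time t back to its
# source location at time t-1.
flow = np.zeros((T, H, W, 2), dtype=np.float32)
```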
Unlike traditional detection-and-tracking pipelines or transformer-based approaches, CCLSTM is:
- FULLY CONVOLUTIONAL – Built entirely from convolution operations, making it ideal for deployment on modern NPUs (e.g., aiWare).
- RECURRENT AND AUTOREGRESSIVE – Recursively encodes history with unlimited lookback and forecasts arbitrary horizons autoregressively.
- END-TO-END TRAINABLE – Integrates seamlessly with bird’s-eye view (BEV) encoders, requiring no intermediate heuristics or separate modules.
- EXPLAINABLE AND CONTROLLABLE – Preserves semantic richness and enables dynamic behavior control, such as planning with different driving styles.
An overview of CCLSTM: rasterized input grids are concatenated along the channel dimension and encoded via a CNN. The encoded features are aggregated by the accumulator CLSTM, whose hidden and cell states are used to initialize the forecasting CLSTM. The forecasting CLSTM is then called autoregressively to predict encoded future states, and these future hidden states are passed to a CNN decoder to produce occupancy and flow grids.
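As a rough PyTorch sketch of the data flow just described: a CNN encodes each rasterized input, an accumulator ConvLSTM aggregates history, its states seed a forecasting ConvLSTM that is rolled out autoregressively, and a CNN decoder maps each future hidden state to occupancy and flow. The single-layer encoder/decoder, the channel counts, and feeding the hidden state back as the next input are simplifying assumptions, not the reference implementation.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """A plain convolutional LSTM cell (simplified)."""
    def __init__(self, in_ch, hidden_ch, k=3):
        super().__init__()
        self.hidden_ch = hidden_ch
        self.gates = nn.Conv2d(in_ch + hidden_ch, 4 * hidden_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = torch.chunk(self.gates(torch.cat([x, h], dim=1)), 4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c


class CCLSTMSketch(nn.Module):
    def __init__(self, in_ch=16, feat_ch=64):
        super().__init__()
        self.encoder = nn.Conv2d(in_ch, feat_ch, 3, padding=1)   # stand-in CNN encoder
        self.accumulator = ConvLSTMCell(feat_ch, feat_ch)        # recursively encodes history
        self.forecaster = ConvLSTMCell(feat_ch, feat_ch)         # autoregressive forecasting
        self.decoder = nn.Conv2d(feat_ch, 1 + 2, 3, padding=1)   # 1 occupancy + 2 flow channels

    def forward(self, history, n_future):
        # history: list of rasterized input grids, each of shape (B, in_ch, H, W)
        B, _, H, W = history[0].shape
        h = history[0].new_zeros(B, self.accumulator.hidden_ch, H, W)
        c = torch.zeros_like(h)
        for grid in history:                    # unlimited lookback: one step per frame
            h, c = self.accumulator(self.encoder(grid), (h, c))
        x, outputs = h, []                      # accumulator states seed the forecaster
        for _ in range(n_future):               # arbitrary horizon: one step per waypoint
            h, c = self.forecaster(x, (h, c))
            x = h                               # feed the predicted state back in
            outputs.append(self.decoder(h))     # per-step occupancy and flow grids
        return torch.stack(outputs, dim=1)      # (B, n_future, 3, H, W)
```

Note that every operation in this sketch is a convolution or an elementwise gate, which is what makes the whole graph map naturally onto NPU convolution engines.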
The Problem with Existing Methods
Most conventional solutions fall into two categories:
- HEURISTIC-BASED PIPELINES – Relying on hand-crafted rules and simple object tracking, these systems lose semantic depth and generalization.
- MULTISTAGE DEEP LEARNING MODELS – While more sophisticated, these solutions typically separate detection, tracking, and forecasting, limiting performance and end-to-end optimization.
CCLSTM bridges this gap by offering a single, unified model that is both interpretable and trainable as a whole, enabling safer, more adaptable decision-making in autonomous systems.
aiWare is perfect for aiDrive with CCLSTM
CCLSTM was designed with aiWare in mind. Because occupancy forecasting is framed as an image-to-image task, the model relies heavily on convolution – aiWare’s core strength. This synergy results in unmatched efficiency and real-time performance, showcasing the power of software and hardware co-design.
Results
CCLSTM achieves state-of-the-art (SOTA) performance across all metrics in the 2024 Waymo Occupancy and Flow Prediction Challenge, and leads in three out of seven metrics in the 2022 edition of the same benchmark. In contrast to contemporary approaches, CCLSTM attains these results without relying on large transformer architectures, model ensembles, or test-time augmentations, making it far more practical for real-time deployment.
Technical Report:
The details of CCLSTM can be found in our technical report, available on arXiv:
PDF: https://arxiv.org/pdf/2506.06128
HTML: https://arxiv.org/html/2506.06128v1