Written by aiMotive / Posted at 5/27/21
aiMotive launches aiWare4, featuring advanced wavefront processing, upgraded safety and low-power features
The 4th generation of aiWare™ automotive NPU hardware IP delivers up to 64 TOPS per core, leveraging innovative wavefront-processing algorithms and an upgraded memory architecture to deliver dramatically improved PPA (see Note 1) and improved built-in ISO 26262 safety support
Budapest, Hungary, 27th May 2021 – aiMotive, one of the world’s leading suppliers of scalable modular automated driving technologies, today announced the latest release of its award-winning aiWare NPU hardware IP. Featuring substantial upgrades to on-chip memory architecture, innovative new wavefront-processing algorithms and enhanced ISO 26262-compliant safety features, aiWare4 delivers the ultimate scalable solution from the most challenging single-chip edge applications to the highest performance central processing platforms for automotive AI. With aiWare4, many key metrics have been further improved, including TOPS/mm², effective TOPS/W and the range of CNN topologies that run at high efficiency.
Upgraded capabilities for aiWare4 include:
Scalability: up to 64 TOPS per core (up from 32 TOPS for aiWare3) and up to 256 TOPS per multi-core cluster, with greater configurability of on-chip memory, hardware safety mechanisms and external/shared memory support
Safety: Enhanced standard hardware features and related documentation to simplify compliance with ISO 26262 ASIL B and higher for both SEooC (Safety Element out of Context) and in-context safety element applications
PPA (see Note 1): 8-10 effective TOPS/W for typical CNNs (theoretical peak up to 30 TOPS/W) using a 5nm or smaller process node; up to 98% efficiency for a wider range of CNN topologies; more flexible power domains enabling dynamic power management that can respond to real-time context changes without needing to restart
Processing: Innovative Wavefront RAM (WFRAM) leverages aiWare’s latest wavefront-processing and interleaved multi-tasking scheduling algorithms, enabling more parallel execution, better multi-tasking capability and substantial reductions in memory bandwidth compared to aiWare3 for CNNs requiring access to significant external memory resources
aiWare4 continues to deliver industry-leading NPU efficiency (see Note 2), enabling superior performance using less silicon. These latest upgrades also enable aiWare4 to execute a wide range of CNN workloads using only on-chip SRAM for single-chip edge AI or more highly optimized ASIC or SoC applications.
“aiWare4 builds on the extensive experience we gained from working with our silicon and automotive partners, as well as insights from our aiDrive™ team into the latest trends and techniques driving the latest thinking in CNNs for automotive applications,” says Marton Feher, SVP hardware engineering for aiMotive. “We are proud that we offer the industry’s most efficient NPU for automotive inference and have now extended aiWare’s capabilities to achieve new levels of safety, flexibility, low-power operation and performance under the most demanding automotive operating environments.”
aiMotive will be shipping aiWare4 RTL to lead customers starting Q3 2021.
Note 1: PPA: Power, Performance and Area
Note 2: The latest aiWare3 benchmark demonstrates up to 98% efficiency measured on Nextchip’s Apache5 SoC. NPU efficiency measures the percentage of claimed TOPS that is usable to execute the theoretical GMACS of a CNN workload. Additional benchmark data is available on request.
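As an illustration of the NPU efficiency metric described in Note 2, the following minimal Python sketch computes the share of claimed TOPS that does useful CNN work. The exact formula is not given in this release, so the sketch assumes the common convention of counting one MAC as two operations. The function name, its parameters and the example numbers are hypothetical and are not aiMotive benchmark data.

```python
def npu_efficiency(gmacs_per_inference, inferences_per_second,
                   claimed_tops, ops_per_mac=2):
    """Return the fraction of claimed TOPS spent on useful CNN work.

    Assumption: one MAC counts as `ops_per_mac` operations (2 by
    convention); the press release does not state the exact formula.
    """
    # Theoretical operations per second the workload needs, in TOPS
    useful_tops = (gmacs_per_inference * 1e9 * ops_per_mac
                   * inferences_per_second) / 1e12
    return useful_tops / claimed_tops


# Hypothetical workload: 10 GMACs per inference at 100 inferences/s
# on an NPU rated at 4 TOPS.
efficiency = npu_efficiency(10, 100, 4)
print(f"NPU efficiency: {efficiency:.0%}")
```

Under these assumptions, a figure of 98% would mean that almost all of the rated TOPS is spent on the workload's theoretical MACs rather than lost to stalls or idle compute.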