Hardware IP for Automotive AI

NN Acceleration for Automotive AI

The aiWare NPU (Neural Processing Unit) is developed by engineers working side by side with our automated driving teams, creating a unique solution targeting high-performance, automotive-grade, real-time AI inference for L2-L4 AD/ADAS. The latest aiWare4+ IP delivers up to 1024 TOPS and industry-leading efficiency across a wide range of NN workloads, including CNNs, LSTMs, RNNs and transformer networks.


Industry-leading efficiency

Up to 98% efficiency for a wide range of automotive NNs


Scalable solution

Up to 1024 TOPS per chip using standard libraries and memories


The aiWare hardware has achieved ISO 26262 ASIL B certification. To download the safety certificate, click here


Production-proven hardware IP

Implemented in Nextchip's Apache5 automotive production SoC; this strong partnership has led Nextchip to license aiWare4 for its next-generation Apache6 SoC


Highly deterministic solution

Hardware determinism helps with certification for production safety environments; it also enables high-accuracy offline performance estimation within 5% of final silicon, months before chips are available


Easy integration

Designed for ease of layout, simple software interfacing, and advanced tools for system integration and validation

Efficient & Scalable

Up to 1024 TOPS; up to 98% efficiency

High efficiency and a patented dataflow deliver 2x to 5x higher performance than other NPUs with the same claimed TOPS across a wide range of workloads, minimizing system power consumption. Operation up to 3 GHz in 5 nm across the full AEC-Q100 Grade 2 temperature range.

Tools & Solutions

Unique SDK from system prototyping to production optimization

aiWare Studio tools are widely acclaimed by OEMs and Tier 1s, thanks to a unique approach focused on optimization of NNs by AI engineers. There is no need for low-level software engineers to hand-optimize NPU code: that work is done by our advanced offline compiler.

Automotive Grade

Powering the most demanding L2-L4 AD applications

Designed from the ground up to be integrated into ASIL-B and higher certified solutions covering a broad range of automotive use cases. From ultra-small edge sensor enhancement up to high-performance central processors, aiWare is the ideal solution for L2 to L4 automotive applications with the most demanding performance requirements.

Driving your development

Features & benefits

Our products are designed to accelerate the realization of your automated driving goals. Click the button below to download our latest aiWare benchmark document and scroll down to see what we can bring to the table. 

Download the benchmark document

High Performance, Efficiency and Scalability

The aiWare hardware IP core delivers high performance at low power, thanks to its class-leading efficiency and highly optimized hardware architecture. With up to 256 effective TOPS per core (not just claimed TOPS), aiWare is highly scalable: it can be used in applications requiring anything from 1-2 TOPS up to 1024 TOPS in multi-core configurations. Thanks to a design focused on optimizing every data transfer on every clock cycle, aiWare makes full use of highly distributed on-chip local memory plus dense on-chip RAM to keep data on-chip as much as possible, minimizing or eliminating external memory traffic. The innovative dataflow design minimizes external memory bandwidth while sustaining high efficiency regardless of input data size or NN depth.
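The distinction between claimed and effective TOPS comes down to simple arithmetic. The sketch below is illustrative only: the 98% figure for aiWare comes from this page, while the 30% efficiency for the comparison NPU is a hypothetical value chosen to show how the 2x-5x advantage arises.

```python
# Illustrative arithmetic only: effective throughput = claimed TOPS x efficiency.
# The 30% figure for the comparison NPU is a hypothetical assumption, not a
# measurement of any specific product.

def effective_tops(claimed_tops: float, efficiency: float) -> float:
    """Usable throughput after accounting for hardware utilization."""
    return claimed_tops * efficiency

aiware = effective_tops(256, 0.98)  # one aiWare core at 98% efficiency
other = effective_tops(256, 0.30)   # hypothetical NPU, same claimed TOPS

print(f"aiWare: {aiware:.1f} effective TOPS")
print(f"other:  {other:.1f} effective TOPS")
print(f"advantage: {aiware / other:.1f}x")  # within the 2x-5x range cited above
```

The same arithmetic explains the power argument: delivering a given effective throughput with a lower-efficiency NPU requires proportionally more claimed TOPS of silicon, and hence more power.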

Unique SDK plus optimized aiDrive-based AI

The aiWare Studio SDK has been acclaimed by OEMs and Tier 1s around the world for its innovative approach to embedded NN optimization, which focuses on iterating the NN itself, not just the NPU code used to execute it. This gives NN designers far more flexibility when implementing their NNs for production hardware platforms than other NPU solutions offer. The offline performance estimator within aiWare Studio is another highlight, delivering performance estimates within 5% of final silicon on any desktop PC. And thanks to the comprehensive portfolio of aiDrive modular software for automotive AI, a wide range of software solutions fully optimized for aiWare is also available.
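To make the "within 5% of final silicon" claim concrete, the snippet below shows what such a tolerance check means in practice. It is a minimal sketch: the function name and the sample latency numbers are illustrative, not part of aiWare Studio's actual API.

```python
# Illustrative only: what "estimate within 5% of final silicon" means as a check.

def within_tolerance(estimated_ms: float, measured_ms: float,
                     tol: float = 0.05) -> bool:
    """True if the offline estimate is within tol (relative) of the
    latency later measured on silicon."""
    return abs(estimated_ms - measured_ms) / measured_ms <= tol

print(within_tolerance(9.7, 10.0))   # 3% error: inside the 5% bound
print(within_tolerance(11.0, 10.0))  # 10% error: outside the bound
```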

Powering the most demanding L2-L4 automated driving applications

The aiWare hardware IP core delivers the features needed for production automotive deployment, not just high theoretical benchmark results in the lab. The underlying technology has been engineered from the ground up for automotive production deployment. With all design processes fully audited to automotive standards, aiWare is designed to be integrated into ASIL-B and higher certified solutions, supported by comprehensive documentation and safety analysis. The RTL is fully characterized for operation under AEC-Q100 Grade 2 conditions and is complemented by comprehensive integration documentation to help you achieve an optimal PPA layout quickly and easily.

Want to know more about the technical details?

Request additional NN workload files

Efficiency is the best way to assess any NPU's real capability: download the additional benchmark data to see how aiWare4 achieves industry-leading efficiencies of up to 98% over a wide range of automotive CNN workloads.

Request additional benchmark data

Interested in aiWare's hardware innovations?

Don't hesitate
to contact us!

Our team is always ready to work with exciting and ambitious clients. If you're ready to start your partnership with us, get in touch.

Contact Us

Interested in more of aiWare's capabilities?

aiWare™ specification

aiWare is a state-of-the-art NPU for automotive inference, with many features built-in to maximize performance for a wide range of automotive AI applications. 


The aiWare4+ RTL is fully synthesizable, delivering up to 256 TOPS at 3 GHz in 5 nm using standard libraries and memory compilers. The RTL is designed for easy integration, with no large buses and with tiled computation modules that enable straightforward implementation at high clock speeds.

Up to 98%

Since aiWare was conceived to power multi-camera, heterogeneous-sensor L2-L4 automated driving applications, it delivers optimal performance with large inputs such as 2M-8M pixel cameras, with no upper limit. NN depth is also unconstrained, while efficiency remains as high as 98% on real workloads.

Wide range of built-in functions

One goal of aiWare was to minimize host CPU overhead: point aiWare to the start location of the input data, tell it which NN workload to execute, and it generates an interrupt when done. That's it. In addition to highly optimized convolution, aiWare natively supports a wide range of activation and pooling functions, enabling many NN workloads to be executed 100% within the aiWare NPU with no host CPU intervention.
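The three-step host flow described above (point to the input, name the workload, wait for the interrupt) can be sketched as follows. Everything here is hypothetical: the `AiWareDevice` class, its method names, the address, and the workload name are invented for illustration and model no real aiWare driver API; the completion interrupt is stood in for by a `threading.Event`.

```python
# Hedged sketch of the minimal host/NPU handshake described above.
# AiWareDevice, submit(), wait(), the address and the workload name are
# all hypothetical; the "interrupt" is modeled as a threading.Event.
import threading

class AiWareDevice:
    """Toy model: the host does no work between submit and completion."""

    def __init__(self) -> None:
        self.done = threading.Event()  # stands in for the completion interrupt

    def submit(self, input_addr: int, workload_id: str) -> None:
        # Step 1: point the NPU at the input data.
        # Step 2: tell it which NN workload to execute.
        def run() -> None:
            # ...the NPU executes the whole network with no CPU involvement...
            self.done.set()  # Step 3: raise the completion "interrupt"
        threading.Thread(target=run).start()

    def wait(self, timeout: float = 1.0) -> bool:
        return self.done.wait(timeout)

npu = AiWareDevice()
npu.submit(input_addr=0x8000_0000, workload_id="lane_detect")
print(npu.wait())  # host is free until the "interrupt" fires
```

The point of the sketch is that the host's entire involvement is the `submit` call and the completion notification; because activation and pooling run natively on the NPU, no per-layer CPU callbacks are needed.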