aiWare™ specifications

Hardware IP for Automotive AI

 

Key features

Performance

256 TOPS per core @ 2GHz

Scalability

From 1 TOPS up to 1000+ TOPS (using multiple cores)

MACs/Cycle

Up to 65,536 MACs/core (BF16 or INT32 internal accuracy)
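The headline throughput follows from the MAC count and clock by simple arithmetic. A minimal sketch, assuming each MAC counts as two operations (one multiply, one accumulate); the quoted 256 TOPS is consistent with counting ops in binary (1024-based) multiples:

```python
def peak_tops(macs_per_cycle, clock_ghz, ops_per_mac=2):
    """Theoretical peak throughput in decimal TOPS (10^12 ops/s)."""
    ops_per_second = macs_per_cycle * ops_per_mac * clock_ghz * 1e9
    return ops_per_second / 1e12

# 65,536 MACs/cycle at 2 GHz
print(peak_tops(65536, 2.0))  # 262.144 decimal TOPS (= 256 x 1024 GOPS)
```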

Support functions

Wide range of activation, pooling, unary, binary, tensor-shaping, attention and linear operations, ensuring 100% NN execution (DNNs, vision transformers, SSMs, LLMs and more) within the aiWare NPU with no host CPU intervention

Configurability

•    Number of MACs
•    Size of on-chip local tightly-coupled SRAM & WFRAM
•    Safety features (ASIL-B as standard, configurable up to ASIL-D)
•    Generic interfaces for both host CPU and local LPDDR or shared memory

NN depth and graph format

No limit on depth or graph format. Excellent support for multi-headed and multi-input NNs

Quantization

State-of-the-art quantization algorithms, continuously updated and shipped with the SDK. The SDK also enables the application of proprietary quantization schemes and strategies. The underlying arithmetic ensures very low accuracy loss.

Data types

Native INT8 or FP8
32-bit internal precision with dynamic per-layer scaling
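Dynamic per-layer scaling can be illustrated with a generic symmetric INT8 scheme (a sketch of the general technique, not aiWare's actual algorithm): each layer gets its own scale, values are quantized to 8 bits, and products are accumulated at full width before a single rescale at the end:

```python
def quantize(values, num_bits=8):
    """Symmetric per-layer quantization: one scale per tensor."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for INT8
    scale = max(abs(v) for v in values) / qmax
    q = [round(v / scale) for v in values]  # integers in [-127, 127]
    return q, scale

# INT8 dot product with wide (32-bit-style) accumulation
a = [0.8, -1.2, 0.4]
b = [1.5, 0.5, -2.0]
qa, sa = quantize(a)
qb, sb = quantize(b)
acc = sum(x * y for x, y in zip(qa, qb))  # integer accumulator
result = acc * sa * sb                    # rescale once at the end
print(result)  # close to the float dot product, -0.2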

 

ISO 26262 safety

Compliance

aiWare is the only NPU certified for ISO 26262 compliance as a Safety Element out of Context (SEooC), rather than the process-only compliance offered by competitors. This delivers unmatched functional safety and significantly reduces integration effort for ADAS-targeted silicon.

Hardware

Configurable safety mechanisms up to ASIL-D, enabling a balance between silicon overhead and functional safety requirements and objectives

Software

Tools and runtime support developed using ISO 26262-compliant processes

 

Memory

Core SRAM

Up to 16 MB per core (configurable)

Wavefront SRAM

1-64 MB per core (configurable)

External Memory

Dedicated off-chip DRAM or shared SoC memory

Bandwidth reduction

On-chip compression
Wavefront-based scheduling that optimizes on-chip memory usage per cycle and per layer
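The bandwidth benefit of keeping intermediate activations on-chip can be shown with a back-of-the-envelope model (a generic sketch of layer fusion, not aiWare's actual scheduler): when layers are processed as a fused chain, only the first input and last output touch DRAM; otherwise every intermediate tensor is written out and read back.

```python
def dram_traffic_mb(activation_sizes_mb, fused):
    """Estimate DRAM activation traffic for a chain of layers.

    activation_sizes_mb[i] is the size of the i-th activation tensor
    along the chain; the last entry is the final output.
    """
    if fused:
        # Intermediates stay in on-chip SRAM: read input, write output.
        return activation_sizes_mb[0] + activation_sizes_mb[-1]
    # Layer by layer: every intermediate is written out and read back.
    inner = sum(activation_sizes_mb[1:-1])
    return activation_sizes_mb[0] + activation_sizes_mb[-1] + 2 * inner

sizes = [8, 16, 16, 4, 2]  # MB per activation tensor along the chain
print(dram_traffic_mb(sizes, fused=False))  # 82 MB
print(dram_traffic_mb(sizes, fused=True))   # 10 MB
```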

Main interface

AXI4 to LPDDR
AXI4 to host

 

Neural network development frameworks

Frameworks supported

Caffe/Caffe2, TensorFlow, PyTorch, ONNX, Khronos NNEF

Inference deployment

Binary compiled offline using aiWare Studio or command-line tools
Single binary contains one or multiple NNs, weights and all scheduling info

Software runtime

Minimal host CPU management required during execution. A simple, generic, portable runtime API runs on any RTOS or HLOS; wrappers for popular APIs are available on request
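A minimal host-side flow might look like the following. All names here (`Runtime`, `load_binary`, `infer`) are hypothetical stand-ins, not aiWare's actual API; the sketch only illustrates the deploy-once, run-many pattern that a thin runtime enables:

```python
class Runtime:
    """Illustrative stand-in for a thin NPU runtime (hypothetical API)."""

    def load_binary(self, path_or_blob):
        # A real runtime would hand the precompiled binary (networks,
        # weights, schedule) to the accelerator; here we just store it.
        self.binary = path_or_blob

    def infer(self, network_name, input_tensor):
        # A real runtime would enqueue the input and wait for completion;
        # the host CPU does no per-layer work in between.
        return {"network": network_name, "shape": (len(input_tensor),)}

rt = Runtime()
rt.load_binary("multi_net.bin")  # one binary may contain several NNs
out = rt.infer("camera_perception", [0.0] * 16)
print(out["shape"])  # (16,)
```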

Development Tools

aiWare Studio provides comprehensive tools to import, analyse and optimize any NN through an easy-to-use interactive UI

Evaluation Tools

aiWare Studio features an offline performance estimator accurate to within 5% of final silicon
CPU- and GPU-based emulators
FPGA implementations also available

Application validation tools

 

Target applications

Automotive Inference for automated driving  
High performance automotive multi-camera perception  
Large camera NN processing (no upper limit on input resolution)  
High data rate heterogeneous multi-sensor fusion

Interested in aiWare's hardware innovations?

Get in touch with us

Our team is always ready to work with exciting and ambitious clients. If you're ready to start your partnership with us, get in touch.

Contact Us