aiWare™ specifications

Hardware IP for Automotive AI


Key features

Performance: 256 TOPS per core @ 2GHz
Scalability: From 1 TOPS up to 1000+ TOPS (using multiple cores)
MACs/Cycle: Up to 65,536 MACs/core (FP16 or INT32 internal accuracy)
Efficiency: Up to 98% for NNs such as VGG16 or YOLO; >85% for a wide range of automotive CNNs
Convolution support: Up to 100% efficiency achievable. MAC arrays optimized for convolution and deconvolution.
Support functions: Wide range of Activation, Pooling, Unary, Binary, Tensor Introduction/Shaping and Linear operations to ensure 100% NN execution within aiWare NPU with no host CPU intervention
Configurability:
•    Number of MACs
•    Size of on-chip local tightly-coupled SRAM & WFRAM
•    Safety features (ASIL-B standard)
•    Generic interfaces for both host CPU and local LPDDR or shared memory
CNN depth: No limits
Quantization: State-of-the-art and constantly updated quantization algorithms shipped with SDK. SDK enables the application of proprietary quantization schemes/strategies. Underlying arithmetic ensures very low accuracy loss.
Data types: INT8 or FP8 native; 32-bit internal precision and dynamic per-layer scaling
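
The throughput, MACs/cycle, efficiency and scalability figures above are linked by simple arithmetic, which the following Python sketch makes explicit. It assumes the common convention that one multiply-accumulate counts as two operations; that convention, and all function names here, are illustrative assumptions and not part of any aiWare tool or API.

```python
import math

# Assumption: one MAC (multiply + accumulate) is counted as two operations.
OPS_PER_MAC = 2


def peak_tops(macs_per_cycle: int, clock_hz: float) -> float:
    """Peak throughput in TOPS (10**12 operations per second)."""
    return macs_per_cycle * OPS_PER_MAC * clock_hz / 1e12


def implied_clock_ghz(tops: float, macs_per_cycle: int) -> float:
    """Clock frequency (GHz) at which a MAC array reaches the given peak TOPS."""
    return tops * 1e12 / (macs_per_cycle * OPS_PER_MAC) / 1e9


def effective_tops(peak: float, efficiency: float) -> float:
    """Sustained throughput for a network running at the given efficiency (0..1)."""
    return peak * efficiency


def cores_needed(target_tops: float, tops_per_core: float) -> int:
    """Smallest number of cores whose combined peak meets the target."""
    return math.ceil(target_tops / tops_per_core)


if __name__ == "__main__":
    macs = 65_536        # MACs per cycle per core, from the table
    quoted_peak = 256.0  # TOPS per core, from the table

    print(f"Implied clock: {implied_clock_ghz(quoted_peak, macs):.2f} GHz")
    for efficiency in (0.98, 0.85):
        sustained = effective_tops(quoted_peak, efficiency)
        print(f"At {efficiency:.0%} efficiency: {sustained:.1f} TOPS sustained")
    print(f"Cores for 1000 TOPS peak: {cores_needed(1000, quoted_peak)}")
```

Under these assumptions, 256 TOPS from 65,536 MACs/cycle corresponds to a clock of roughly 1.95 GHz, and four cores are enough to exceed 1000 TOPS of peak throughput.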

ISO26262 safety

Compliance: ASIL-B and higher for both SEooC (Safety Element out of Context) and in-context safety element applications; compliance features and documentation customized for each customer to align with their own ISO26262 compliance plans
Hardware: Configurable safety mechanisms for up to ASIL-D, enabling a balance between silicon overhead and functional safety requirements and objectives
Software: Tools and runtime support developed using ISO26262-compliant processes

Memory

Core SRAM: Up to 16MBytes per core (configurable)
Wavefront SRAM: 1-64MBytes per core (configurable)
External Memory: Dedicated off-chip DRAM or shared SoC memory
Bandwidth reduction: On-chip compression; wavefront-based scheduling optimizing on-chip memory usage per-cycle and per-layer
Main interface: AXI4 to LPDDR; AXI4 to host
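
When choosing among the configurable SRAM sizes above, a rough first step is to compare tensor footprints against the available capacity. The Python sketch below is sizing arithmetic only: it assumes "MBytes" means 2**20 bytes and one byte per element (as for INT8 data), uses hypothetical helper names, and says nothing about how aiWare actually allocates, compresses or schedules data on-chip.

```python
# Assumption: "MBytes" in the table is read as 2**20 bytes.
BYTES_PER_MBYTE = 1024 * 1024


def tensor_bytes(height: int, width: int, channels: int,
                 bytes_per_element: int = 1) -> int:
    """Uncompressed size of a dense H x W x C tensor (1 byte/element for INT8)."""
    return height * width * channels * bytes_per_element


def fits(nbytes: int, capacity_mbytes: int) -> bool:
    """True if a buffer of nbytes fits entirely in the given capacity."""
    return nbytes <= capacity_mbytes * BYTES_PER_MBYTE


if __name__ == "__main__":
    # Hypothetical example input: one 3840 x 2160 RGB frame stored as INT8.
    frame = tensor_bytes(height=2160, width=3840, channels=3)
    print(f"Frame size: {frame / BYTES_PER_MBYTE:.1f} MBytes")
    for capacity in (16, 64):  # upper limits quoted in the table
        print(f"Fits in {capacity} MBytes: {fits(frame, capacity)}")
```

For this hypothetical frame the uncompressed footprint is about 23.7 MBytes, which is larger than the 16MByte maximum for core SRAM but smaller than the 64MByte maximum for wavefront SRAM.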

Neural network development frameworks

Frameworks supported: Caffe/Caffe2, TensorFlow, PyTorch, ONNX, Khronos NNEF
Inference deployment: Binary compiled offline using aiWare Studio or command-line tools; a single binary contains one or multiple NNs, weights and all scheduling info
Software runtime: Minimal host CPU management required during execution. Simple, generic, portable runtime API runs on any RTOS or HLOS; wrappers to popular APIs available on request
Development Tools: aiWare Studio provides comprehensive tools to import, analyse and optimize any NN with an easy-to-use interactive UI
Evaluation Tools: aiWare Studio features an offline performance estimator accurate to within 5% of final silicon; FPGA implementations also available

Target applications

Automotive:
•    Inference for automated driving
•    High performance automotive multi-camera perception
•    Large camera NN processing (no upper limit on input resolution)
•    High data rate heterogeneous multi-sensor fusion