aiWare™ specifications

Hardware IP for Automotive AI

 

Key features

Performance Up to 256 TOPS @ 2 GHz
Scalability From 1 TOPS (zero DRAM) up to 256 TOPS
MACs/Cycle Up to 65,536 INT8 MACs (32-bit internal accuracy)
Efficiency Up to 98% for NNs such as VGG16 or YOLO
>85% for a wide range of automotive CNNs
Convolution support Up to 100% efficiency achievable. MAC arrays optimized for 2D and 3D convolution and deconvolution. Matrix multipliers are not used, so no Winograd or other transforms are needed
Support functions Wide range of Activation, Pooling, Unary, Binary, Tensor Introduction/Shaping and Linear operations, ensuring 100% CNN execution within the aiWare NPU with no host CPU intervention
Configurability • Size of on-chip local tightly-coupled SRAM
• Size of on-chip WFRAM
• Safety features (ASIL-B as standard)
• Generic interfaces for both host CPU and local LPDDR or shared memory
CNN depth No limit for any INT8 CNN
Quantization Advanced coarse-grain and fine-grain FP32 to INT8 conversion, often with zero loss of accuracy
Data types INT8 Native
32-bit internal precision and dynamic per-layer scaling
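The headline throughput figures above can be cross-checked with simple arithmetic: counting each MAC as two operations (one multiply, one accumulate), 65,536 MACs per cycle at 2 GHz gives a theoretical peak of roughly 262 TOPS, of which the quoted "up to 256 TOPS" is a rounded headline figure. A minimal sketch of that calculation (illustrative only; the constants are the maximum configuration quoted in this table):

```python
# Peak-throughput cross-check for the figures quoted above.
MACS_PER_CYCLE = 65_536   # INT8 MACs per cycle (maximum configuration)
CLOCK_HZ = 2e9            # 2 GHz
OPS_PER_MAC = 2           # one multiply + one accumulate

peak_ops = MACS_PER_CYCLE * OPS_PER_MAC * CLOCK_HZ
peak_tops = peak_ops / 1e12
print(f"Theoretical peak: {peak_tops:.1f} TOPS")   # ~262 TOPS

# At the quoted 98% efficiency for networks such as VGG16:
print(f"At 98% efficiency: {peak_tops * 0.98:.1f} TOPS")
```

This also shows why the efficiency figures matter: sustained throughput is the peak MAC rate multiplied by how well the scheduler keeps the MAC arrays busy.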
 

ISO 26262 safety

Compliance ASIL-B and higher for both SEooC (Safety Element out of Context) and in-context safety element applications; compliance features and documentation are customized for each customer to align with their own ISO 26262 compliance plans
Hardware Configurable safety mechanisms for up to ASIL-D, enabling balance between silicon overhead and functional safety requirements and objectives
Software Tools and runtime support developed using ISO 26262-compliant processes
 

Memory

Core SRAM Up to 16 MBytes per core (configurable)
Wavefront SRAM 1 to 32 MBytes per core (configurable)
External Memory Dedicated off-chip DRAM or shared SoC memory
Bandwidth reduction On-chip compression
Wavefront-based scheduling optimizes on-chip memory usage per cycle and per layer
Main interface AXI4 to LPDDR
AXI4 to host
 

Neural network development frameworks

Frameworks supported Caffe/Caffe2, TensorFlow, PyTorch, ONNX, Khronos NNEF
Inference deployment Binary compiled offline using aiWare Studio or command-line tools
A single binary contains one or more NNs, their weights and all scheduling information
Software runtime Minimal host CPU management required during execution. A simple, generic, portable runtime API runs on any RTOS or HLOS; wrappers for popular APIs are available on request
Development Tools aiWare Studio provides comprehensive tools to import, analyze and optimize any NN through an easy-to-use interactive UI
Evaluation Tools aiWare Studio features an offline performance estimator accurate to within 5% of final silicon
FPGA implementations also available
 

Target applications

Automotive Inference for automated driving
High-performance automotive multi-camera perception
Large camera NN processing (no upper limit on input resolution)
High data rate heterogeneous multi-sensor fusion