aiWare specifications
Key features
| Feature | Details |
| --- | --- |
| Performance | 256 TOPS per core @ 3 GHz |
| Scalability | From 1 TOPS up to 1000+ TOPS (using multiple cores) |
| MACs/Cycle | Up to 65,536 MACs/core (FP16 or INT32 internal accuracy) |
| Efficiency | Up to 98% for NNs such as VGG16 or YOLO; >85% for a wide range of automotive CNNs (see the throughput sketch after this table) |
| Convolution support | Up to 100% efficiency achievable; MAC arrays optimized for convolution and deconvolution |
| Support functions | Wide range of Activation, Pooling, Unary, Binary, Tensor Introduction/Shaping and Linear operations to ensure 100% NN execution within the aiWare NPU with no host CPU intervention |
| Configurability | • Number of MACs • Size of on-chip local tightly-coupled SRAM & WFRAM • Safety features (ASIL-B standard) • Generic interfaces for both host CPU and local LPDDR or shared memory |
| CNN depth | No limits |
| Quantization | State-of-the-art, continually updated quantization algorithms shipped with the SDK; the SDK also enables applying proprietary quantization schemes/strategies. Underlying arithmetic ensures very low accuracy loss (see the scaling sketch after this table). |
| Data types | INT8 or FP8; native 32-bit internal precision and dynamic per-layer scaling |
ISO 26262 safety

| Aspect | Details |
| --- | --- |
| Compliance | ASIL-B and higher for both SEooC (Safety Element out of Context) and in-context safety element applications; compliance features and documentation customized for each customer to align with their own ISO 26262 compliance plans |
| Hardware | Configurable safety mechanisms for up to ASIL-D, enabling a balance between silicon overhead and functional safety requirements and objectives |
| Software | Tools and runtime support developed using ISO 26262-compliant processes |
Memory
| Feature | Details |
| --- | --- |
| Core SRAM | Up to 16 MB per core (configurable) |
| Wavefront SRAM | 1-64 MB per core (configurable) |
| External memory | Dedicated off-chip DRAM or shared SoC memory |
| Bandwidth reduction | On-chip compression; wavefront-based scheduling optimizing on-chip memory usage per cycle and per layer (see the sketch after this table) |
| Main interfaces | AXI4 to LPDDR; AXI4 to host |
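The bandwidth-reduction row comes down to keeping each layer's working set in on-chip SRAM whenever possible. Below is an illustrative fit check with made-up layer shapes; it is not aiWare's scheduler, and `fits_in_sram` is a hypothetical helper:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Does one activation tensor (or a tile of it) fit in local SRAM? */
static bool fits_in_sram(uint64_t h, uint64_t w, uint64_t c,
                         uint64_t bytes_per_elem, uint64_t sram_bytes) {
    return h * w * c * bytes_per_elem <= sram_bytes;
}

int main(void) {
    const uint64_t sram = 16ull << 20;  /* 16 MB core SRAM (max configuration) */

    /* A 1920x1080x64 INT8 feature map is ~127 MB: too large, so it must
     * be tiled or streamed through external LPDDR. */
    printf("full layer fits:    %s\n",
           fits_in_sram(1920, 1080, 64, 1, sram) ? "yes" : "no");

    /* A 1920x1080x4 channel slice (~7.9 MB) fits, so a compiler can
     * stream channel groups wavefront-style and cut off-chip traffic. */
    printf("channel slice fits: %s\n",
           fits_in_sram(1920, 1080, 4, 1, sram) ? "yes" : "no");
    return 0;
}
```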
Neural network development frameworks
| Feature | Details |
| --- | --- |
| Frameworks supported | Caffe/Caffe2, TensorFlow, PyTorch, ONNX, Khronos NNEF |
| Inference deployment | Binary compiled offline using aiWare Studio or command-line tools; a single binary contains one or multiple NNs, weights and all scheduling info |
| Software runtime | Minimal host CPU management required during execution; simple, generic, portable runtime API runs on any RTOS or HLOS, with wrappers to popular APIs available on request (a hypothetical usage sketch follows this table) |
| Development tools | aiWare Studio provides comprehensive tools to import, analyse and optimize any NN with an easy-to-use interactive UI |
| Evaluation tools | aiWare Studio features an offline performance estimator accurate to within 5% of final silicon; FPGA implementations also available |
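To make the deployment and runtime rows concrete: the host loads one offline-compiled binary (NNs + weights + schedule), submits buffers, and collects results. Every `npu_*` name below is invented for illustration and is not the actual aiWare runtime API; the stubs exist only so the sketch compiles:

```c
#include <stdio.h>
#include <stdlib.h>

/* --- Hypothetical runtime API (invented names, not aiWare's) ------- */
typedef struct { const char *binary; } npu_ctx;

static npu_ctx *npu_load_binary(const char *path) {
    /* A real runtime would map the offline-compiled binary (one or more
     * NNs, weights and all scheduling info) for the NPU. */
    npu_ctx *c = malloc(sizeof *c);
    if (c) c->binary = path;
    return c;
}

static int npu_run(npu_ctx *c, const void *in, void *out) {
    /* A real runtime would hand the buffers to the NPU and wait for
     * completion; host CPU involvement stays minimal. */
    (void)c; (void)in; (void)out;
    return 0;
}

static void npu_release(npu_ctx *c) { free(c); }
/* ------------------------------------------------------------------- */

int main(void) {
    npu_ctx *ctx = npu_load_binary("perception.bin"); /* one binary, many NNs */
    unsigned char frame[640] = {0}, detections[64] = {0};

    if (ctx && npu_run(ctx, frame, detections) == 0)
        printf("inference submitted via %s\n", ctx->binary);

    npu_release(ctx);
    return 0;
}
```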
Target applications
- Automotive inference for automated driving
- High-performance automotive multi-camera perception
- Large camera NN processing (no upper limit on input resolution)
- High data-rate heterogeneous multi-sensor fusion