aiWare Specifications | High-Performance Automotive AI Processing
Key features
| Performance | 256 TOPS per core @ 2GHz |
| Scalability | From 1 TOPS up to 1000+ TOPS (using multiple cores) |
| MACs/cycle | Up to 65,536 MACs per core per cycle (FP16 or INT32 internal accuracy) |
| Efficiency | Up to 98% for NNs such as VGG16 or YOLO; >85% for a wide range of automotive CNNs |
| Convolution support | Up to 100% efficiency achievable. MAC arrays optimized for convolution and deconvolution. |
| Support functions | Wide range of Activation, Pooling, Unary, Binary, Tensor Introduction/Shaping and Linear operations to ensure 100% NN execution within aiWare NPU with no host CPU intervention |
| Configurability | • Number of MACs • Size of on-chip local tightly-coupled SRAM & WFRAM • Safety features (ASIL-B standard) • Generic interfaces for both host CPU and local LPDDR or shared memory |
| CNN depth | No limits |
| Quantization | State-of-the-art and constantly updated quantization algorithms shipped with SDK. SDK enables the application of proprietary quantization schemes/strategies. Underlying arithmetic ensures very low accuracy loss. |
| Data types | INT8 or FP8; native 32-bit internal precision with dynamic per-layer scaling |
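The dynamic per-layer scaling mentioned above can be illustrated with a minimal sketch of symmetric per-layer INT8 quantization. `quantize_per_layer` and `dequantize` are hypothetical helpers for illustration only, not part of the aiWare SDK:

```python
def quantize_per_layer(weights, num_bits=8):
    # One scale per layer, chosen from that layer's dynamic range (hypothetical
    # helper illustrating per-layer dynamic scaling; not the aiWare SDK API).
    qmax = 2 ** (num_bits - 1) - 1                     # 127 for INT8
    scale = max(abs(w) for w in weights) / qmax
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    # Reconstruct approximate FP values from INT8 codes and the layer scale.
    return [v * scale for v in q]

layer = [0.5, -1.27, 0.003, 0.9]
q, s = quantize_per_layer(layer)
approx = dequantize(q, s)
```

Because the scale tracks each layer's own range, the worst-case rounding error per value is half the scale step, which is one source of the low accuracy loss the spec claims.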
ISO26262 safety
| Compliance | ASIL-B and higher for both SEooC (Safety Element out of Context) and in-context safety element applications; compliance features and documentation customized for each customer to align with their own ISO26262 compliance plans |
| Hardware | Configurable safety mechanisms for up to ASIL-D, enabling balance between silicon overhead and functional safety requirements and objectives |
| Software | Tools and runtime support developed using ISO26262-compliant processes |
Memory
| Core SRAM | Up to 16MBytes per core (configurable) |
| Wavefront SRAM | 1-64MBytes per core (configurable) |
| External Memory | Dedicated off-chip DRAM or shared SOC memory |
| Bandwidth reduction | On-chip compression; wavefront-based scheduling optimizing on-chip memory usage per cycle and per layer |
| Main interface | AXI4 to LPDDR; AXI4 to host |
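A rough sizing illustration (the resolution and channel count are assumptions, not from the spec) shows why wavefront scheduling and external LPDDR matter: even a single INT8 feature map at full-HD resolution with 32 channels exceeds the largest 16 MByte core SRAM configuration.

```python
def tensor_bytes(h, w, c, bytes_per_elem=1):
    # INT8 activations: 1 byte per element.
    return h * w * c * bytes_per_elem

CORE_SRAM = 16 * 1024 * 1024          # 16 MBytes per core, the spec's upper limit

# Hypothetical layer: 1080x1920 feature map, 32 channels, INT8.
fmap = tensor_bytes(1080, 1920, 32)   # ~63 MBytes
fits = fmap <= CORE_SRAM              # False: must be tiled/streamed, not held whole
```

This is exactly the case where wavefront-based scheduling processes the tensor in slices so that only the active working set occupies on-chip SRAM, with the remainder streamed via AXI4 to LPDDR.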
Neural network development frameworks
| Frameworks supported | Caffe/Caffe2, TensorFlow, PyTorch, ONNX, Khronos NNEF |
| Inference deployment | Binary compiled offline using aiWare Studio or command-line tools; a single binary contains one or more NNs, their weights and all scheduling info |
| Software runtime | Minimal host CPU management required during execution. Simple generic portable runtime API runs on any RTOS or HLOS; wrappers to popular APIs available on request |
| Development Tools | aiWare Studio provides comprehensive tools to import, analyze and optimize any NN with an easy-to-use interactive UI |
| Evaluation Tools | aiWare Studio features an offline performance estimator accurate to within 5% of final silicon; FPGA implementations are also available |
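The deployment model above, one offline-compiled binary bundling multiple networks plus weights and scheduling info, driven by a thin host runtime, can be sketched as follows. All class and method names here are illustrative stand-ins, not the real aiWare runtime API:

```python
class CompiledBinary:
    # Stand-in for an offline-compiled aiWare binary holding one or more NNs.
    # Real binaries also carry weights and all scheduling info; here each NN is
    # mocked as a plain callable.
    def __init__(self, networks):
        self.networks = networks          # name -> callable

class Runtime:
    # Minimal host-side runtime: the host only selects a network and submits
    # inputs; no per-layer intervention during execution.
    def __init__(self, binary):
        self._nets = binary.networks

    def infer(self, net_name, inputs):
        return self._nets[net_name](inputs)

# Usage: one binary, two networks, selected by name at run time.
binary = CompiledBinary({
    "detector": lambda x: [v * 2 for v in x],
    "segmenter": lambda x: [v + 1 for v in x],
})
rt = Runtime(binary)
out = rt.infer("detector", [1, 2, 3])
```

Keeping the host API this small is what makes the runtime portable across any RTOS or HLOS, as the spec notes.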
Target applications
- Automotive inference for automated driving
- High-performance automotive multi-camera perception
- Large camera NN processing (no upper limit on input resolution)
- High-data-rate heterogeneous multi-sensor fusion