aiWare specifications
Key features
Performance | 256 TOPS per core @ 2 GHz |
Scalability | From 1 TOPS up to 1000+ TOPS (using multiple cores) |
MACs/Cycle | Up to 65,536 MACs per cycle per core (FP16 or INT32 internal precision) |
Efficiency | Up to 98% for NNs such as VGG16 or YOLO; >85% for a wide range of automotive CNNs |
Convolution support | Up to 100% efficiency achievable. MAC arrays optimized for convolution and deconvolution. |
Support functions | Wide range of Activation, Pooling, Unary, Binary, Tensor Introduction/Shaping and Linear operations to ensure 100% NN execution within aiWare NPU with no host CPU intervention |
Configurability | • Number of MACs • Size of on-chip local tightly-coupled SRAM & WFRAM • Safety features (ASIL-B standard) • Generic interfaces for both host CPU and local LPDDR or shared memory |
CNN depth | No limits |
Quantization | State-of-the-art and constantly updated quantization algorithms shipped with SDK. SDK enables the application of proprietary quantization schemes/strategies. Underlying arithmetic ensures very low accuracy loss. |
Data types | INT8 or FP8 inputs; native 32-bit internal precision with dynamic per-layer scaling |
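The dynamic per-layer scaling noted above can be illustrated with a minimal sketch of symmetric INT8 quantization. This is generic textbook arithmetic, not aiWare's shipped quantization algorithm, and all function names are illustrative:

```python
import random

def quantize_per_layer(values):
    """Illustrative symmetric INT8 quantization with one scale per layer."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the signed 8-bit range
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map INT8 codes back to floating point using the layer scale."""
    return [v * scale for v in q]

# Round-trip a synthetic layer of weights and check the worst-case error
rng = random.Random(0)
weights = [rng.gauss(0.0, 0.1) for _ in range(1024)]
q, scale = quantize_per_layer(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# max_err is bounded by half a quantization step, i.e. scale / 2
```

With a shared scale per layer, the worst-case round-trip error is half a quantization step; per-layer (rather than per-network) scales keep that step small for layers with narrow dynamic range, which is what limits accuracy loss.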
ISO26262 safety
Compliance | ASIL-B and higher for both SEooC (Safety Element out of Context) and in-context safety element applications; compliance features and documentation customized for each customer to align with their own ISO26262 compliance plans |
Hardware | Configurable safety mechanisms for up to ASIL-D, enabling balance between silicon overhead and functional safety requirements and objectives |
Software | Tools and runtime support developed using ISO26262-compliant processes |
Memory
Core SRAM | Up to 16MBytes per core (configurable) |
Wavefront SRAM | 1-64MBytes per core (configurable) |
External Memory | Dedicated off-chip DRAM or shared SOC memory |
Bandwidth reduction | On-chip compression; wavefront-based scheduling optimizing on-chip memory usage per-cycle and per-layer |
Main interface | AXI4 to LPDDR; AXI4 to host |
Neural network development frameworks
Frameworks supported | Caffe/Caffe2, TensorFlow, PyTorch, ONNX, Khronos NNEF |
Inference deployment | Binary compiled offline using aiWare Studio or command-line tools; a single binary contains one or multiple NNs, their weights and all scheduling info |
Software runtime | Minimal host CPU management required during execution. Simple, generic, portable runtime API runs on any RTOS or HLOS; wrappers for popular APIs available on request |
Development Tools | aiWare Studio provides comprehensive tools to import, analyze and optimize any NN with an easy-to-use interactive UI |
Evaluation Tools | aiWare Studio features an offline performance estimator accurate to within 5% of final silicon; FPGA implementations also available |
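aiWare Studio's performance estimator is proprietary, but the kind of first-order arithmetic behind any such estimate can be sketched from the figures in this table (65,536 MACs per cycle per core at 2 GHz, ≥85% efficiency). The function name and the 4×10⁹-MAC workload below are hypothetical, illustrative values only:

```python
def estimate_latency_ms(model_macs, macs_per_core=65_536, clock_hz=2e9,
                        efficiency=0.85, cores=1):
    """First-order latency estimate for one inference.

    Hypothetical sketch: cycles needed = total MACs in the model divided
    by the MACs the array can usefully perform per cycle at the assumed
    utilization. Real estimators also model memory bandwidth and
    per-layer scheduling, which this deliberately ignores.
    """
    if model_macs == 0:
        return 0.0
    usable_macs_per_cycle = macs_per_core * cores * efficiency
    cycles = model_macs / usable_macs_per_cycle
    return cycles / clock_hz * 1e3  # seconds -> milliseconds

# A hypothetical network needing ~4e9 MACs per frame, on one core:
lat = estimate_latency_ms(4.0e9)
# ≈ 0.036 ms per frame at 85% utilization
```

A compute-bound model like this also shows why the efficiency figures in this table matter: at 85% versus 98% utilization the same core delivers roughly 15% different effective throughput, before any memory-bandwidth effects.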
Target applications
Automotive inference for automated driving
High-performance automotive multi-camera perception
Large camera NN processing (no upper limit on input resolution)
High-data-rate heterogeneous multi-sensor fusion