The aiWare hardware IP core is highly customizable. It can be deployed as an on-chip accelerator within an SoC or as a stand-alone NN accelerator. On-chip and external memory sizes are configurable to optimize performance for customer requirements. aiWare maximizes host CPU offload by using on-chip SRAM and external DRAM to keep execution and dataflow within the core. aiWare was designed with automotive real-time inference in mind. As a result, all implementations can be used in ASIL-B compliant subsystems, and ASIL-D compliant configurations are in development. aiWare delivers outstanding power efficiency within a tight thermal window for the most demanding embedded NN inference applications.
The aiWare IP core is fully synthesizable RTL that needs no special libraries, enabling neural network acceleration cores from 1 TMAC/s to 50+ TMAC/s. Optimized for efficiency at low clock speeds, the aiWare IP core can operate anywhere from 100 MHz to 1 GHz. The hardware IP core is also highly deterministic to increase safety, removing the complexity of caches or programmable cores. aiWare delivers more than 2 TMAC/s per W (4 TOP/s per W, estimated for 7 nm) while sustaining >95% efficiency under continuous operation. The IP core offers a range of ASIL-B to ASIL-D compliant implementation options, either on-chip with a host CPU SoC or as a dedicated NN accelerator.
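The relationship between these figures can be checked with a little arithmetic. The sketch below is illustrative only: it assumes the common convention that one MAC counts as two operations (one multiply plus one add), and the function names are hypothetical rather than part of any aiWare tool.

```python
# Back-of-envelope throughput arithmetic for the figures quoted above.
# Assumption: 1 MAC = 2 operations (one multiply + one add).
OPS_PER_MAC = 2
TERA = 1e12


def macs_to_ops(mac_per_s: float) -> float:
    """Convert a MAC/s rate into an OP/s rate."""
    return mac_per_s * OPS_PER_MAC


def macs_per_cycle(mac_per_s: float, clock_hz: float) -> float:
    """MACs that must complete every clock cycle to reach a given rate."""
    return mac_per_s / clock_hz


def sustained_rate(peak_mac_per_s: float, efficiency: float) -> float:
    """Rate actually delivered at a given fraction of peak."""
    return peak_mac_per_s * efficiency


if __name__ == "__main__":
    # 2 TMAC/s per W expressed in TOP/s per W
    print(macs_to_ops(2 * TERA) / TERA)
    # Parallel MACs per cycle for a 1 TMAC/s core clocked at 1 GHz
    print(macs_per_cycle(1 * TERA, 1e9))
    # A 50 TMAC/s core sustaining 95% efficiency, in TMAC/s
    print(sustained_rate(50 * TERA, 0.95) / TERA)
```

Under these assumptions, 2 TMAC/s per W corresponds to the quoted 4 TOP/s per W, and a 1 TMAC/s core at 1 GHz needs on the order of a thousand MACs in flight every cycle.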
All implementations and aiWare-based solutions are supported by the same software APIs and SDK. The aiWare Network Compiler accepts any NN as input via the NNEF™ standard (available) or the ONNX standard (Q2 2019). NNs are converted into a binary ready for execution on the aiWare core. Support for these standards enables the acceleration of NNs from a wide range of frameworks such as Caffe, TensorFlow or PyTorch. The SDK also includes tools to translate CNNs based on FP32, FP16 or INT16 into INT8 with little or no loss of precision. This robust toolflow supports maximum reuse of existing hardware and software designs when implementing aiWare-based systems.
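To give a feel for what an FP32-to-INT8 translation involves, here is a minimal sketch of symmetric per-tensor quantization. It is a generic textbook technique shown for illustration, not the method used by the aiWare SDK, and all function names and sample values are hypothetical.

```python
# Generic symmetric per-tensor INT8 quantization (illustrative only;
# not the aiWare SDK's algorithm).

def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero tensor, where any scale would do.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    weights = [0.42, -1.27, 0.003, 0.9]  # made-up sample weights
    q, scale = quantize_int8(weights)
    restored = dequantize(q, scale)
    max_err = max(abs(a - b) for a, b in zip(weights, restored))
    print(q, scale, max_err)
```

With this scheme the round-trip error of each value is bounded by half the scale, which is why well-distributed weights typically lose little precision. Production tools usually add calibration data and per-channel scales on top of this basic idea.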
AImotive offers aiWare Evaluation Systems to enable benchmarking of the hardware IP core. The FPGA-based evaluation platform delivers up to 200 GMAC/s (400-500 GOPS). Our proof-of-concept silicon implementation of aiWare, created with VeriSilicon™ and GlobalFoundries™, provides up to 1.6 TMAC/s (>3 TOPS). All evaluation systems can run our partners' own neural networks as well as AImotive's benchmarks using the aiWare SDK. Like the aiWare SDK, the evaluation systems rely on NNEF for flexibility. Thus, our partners can independently verify our benchmark results, which are run on our closely specified benchmark framework, while gaining insight into the performance they can expect from their technologies when using aiWare.