aiWare™ specifications

Hardware IP for Automotive AI

K E Y   F E A T U R E S

 

aiWare3P

Performance: Up to 24 TOPS per core @ 1.5 GHz
Scalability: From 1 TOPS to 128 TOPS
MACs/cycle: Up to 8,192 INT8 (32-bit internal accuracy)
Efficiency: Up to 98% for NNs such as VGG16 or YOLO; >85% for most automotive CNNs
Convolution support: Up to 100% efficiency achievable. MAC arrays optimized for 2D and 3D convolution and deconvolution. Matrix multipliers are not used, so no Winograd or other transforms are needed
Support functions: Wide range of Activation, Pooling, Unary, Binary, Tensor Introduction/Shaping and Linear operations to ensure 100% CNN execution within the aiWare core with no host CPU intervention
Configurability:
  • Size of core SRAM
  • Safety features
  • Generic interfaces for both host CPU and local LPDDR or shared memory
CNN depth: No limit for any INT8 CNN
Quantization: Advanced coarse-grain and fine-grain FP32-to-INT8 conversion, often with zero loss of accuracy
Data types: INT8 native; 32-bit internal precision and dynamic per-layer scaling
Sparsity: Not needed for well-optimized NNs
Multicore capability: Up to 4 cores per NoC recommended (no upper limit)
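The peak-TOPS figures follow directly from the MAC count and clock, since each MAC counts as two operations (one multiply, one accumulate). A quick sanity check of the quoted numbers for both generations, plus the sustained rate implied by the >85% efficiency figure:

```python
def peak_tops(macs_per_cycle: int, freq_ghz: float) -> float:
    """Peak throughput in TOPS: each MAC is 2 ops (multiply + accumulate)."""
    return macs_per_cycle * 2 * freq_ghz / 1000.0  # GOPS -> TOPS

# aiWare3P: 8,192 INT8 MACs/cycle @ 1.5 GHz -> 24.576 TOPS ("up to 24 TOPS")
aiware3p_peak = peak_tops(8192, 1.5)

# aiWare4: 16,384 INT8 MACs/cycle @ 2 GHz -> 65.536 TOPS ("up to 64 TOPS")
aiware4_peak = peak_tops(16384, 2.0)

# Sustained throughput at the quoted >85% efficiency for typical automotive CNNs
aiware3p_sustained = aiware3p_peak * 0.85
```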
 

aiWare4

Performance: Up to 64 TOPS per core @ 2 GHz
Scalability: From 1 TOPS (zero DRAM) up to 1,024 TOPS
MACs/cycle: Up to 16,384 INT8 (32-bit internal accuracy)
Efficiency: Up to 98% for NNs such as VGG16 or YOLO; >85% for most automotive CNNs
Convolution support: Up to 100% efficiency achievable. MAC arrays optimized for 2D and 3D convolution and deconvolution. Matrix multipliers are not used, so no Winograd or other transforms are needed
Support functions: Wide range of Activation, Pooling, Unary, Binary, Tensor Introduction/Shaping and Linear operations to ensure 100% CNN execution within the aiWare core with no host CPU intervention
Configurability:
  • Size of core SRAM
  • Size of WFRAM
  • Safety features (ASIL-B standard)
  • Generic interfaces for both host CPU and local LPDDR or shared memory
CNN depth: No limit for any INT8 CNN
Quantization: Advanced coarse-grain and fine-grain FP32-to-INT8 conversion, often with zero loss of accuracy
Data types: INT8 native; 32-bit internal precision and dynamic per-layer scaling
Sparsity: Not needed for well-optimized NNs
Multicore capability: Up to 4 cores per NoC recommended (no upper limit)
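Both generations execute networks in INT8 after FP32-to-INT8 conversion with per-layer scaling. A minimal sketch of the basic idea behind scale-based INT8 mapping (symmetric, per-layer; the actual aiWare coarse-/fine-grain flow is more sophisticated, and these weight values are illustrative):

```python
def quantize_int8(weights):
    """Symmetric per-layer INT8 quantization: one scale per layer,
    chosen so the largest-magnitude weight maps to +/-127."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate FP32 values from INT8 codes and the layer scale."""
    return [v * scale for v in q]

# Illustrative layer weights (assumed values, not from any real network)
w = [0.02, -0.5, 0.37, -0.91]
q, s = quantize_int8(w)
w_hat = dequantize(q, s)
# Per-weight reconstruction error is bounded by scale/2
```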

I S O 2 6 2 6 2   S A F E T Y

 

aiWare3P

Compliance: ASIL-B and higher for both SEooC (Safety Element out of Context) and in-context safety element applications; compliance features and documentation are customized for each customer to align with their own ISO 26262 compliance plans
Hardware: Configurable safety mechanisms for up to ASIL-B, enabling a balance between silicon overhead and functional safety requirements and objectives
Software: Tools and runtime support developed using ISO 26262-compliant processes
 

aiWare4

Compliance: ASIL-B and higher for both SEooC (Safety Element out of Context) and in-context safety element applications; compliance features and documentation are customized for each customer to align with their own ISO 26262 compliance plans
Hardware: Configurable safety mechanisms for up to ASIL-D, enabling a balance between silicon overhead and functional safety requirements and objectives
Software: Tools and runtime support developed using ISO 26262-compliant processes

M E M O R Y

 

aiWare3P

Core SRAM: Up to 32 MB per core (configurable)
Wavefront SRAM: None
External memory: Dedicated off-chip DRAM or shared SoC memory
Bandwidth reduction: On-chip compression
Main interface: AXI4 to LPDDR; AXI4 to host
 

aiWare4

Core SRAM: Up to 16 MB per core (configurable)
Wavefront SRAM: 1–32 MB per core (configurable)
External memory: Dedicated off-chip DRAM or shared SoC memory
Bandwidth reduction: On-chip compression; WFRAM scheduling
Main interface: AXI4 to LPDDR; AXI4 to host

N E U R A L   N E T W O R K   
D E V E L O P M E N T   F R A M E W O R K S

 

aiWare3P

Frameworks supported: Caffe/Caffe2, TensorFlow, PyTorch, ONNX, Khronos NNEF
Inference deployment: Binary compiled offline using aiWare Studio or command-line tools; a single binary contains one or multiple NNs, weights and all scheduling information
Software runtime: Minimal host CPU management required during execution. Simple, generic, portable runtime API runs on any RTOS or HLOS; wrappers for popular APIs available on request
Development tools: aiWare Studio provides comprehensive tools to import, analyse and optimize any NN with an easy-to-use interactive UI
Evaluation tools: aiWare Studio features an offline performance estimator accurate to within 5% of final silicon; FPGA implementations also available
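The estimator claim above refers to aiWare Studio's offline performance model. A first-order approximation of what such an estimate involves, purely to show the arithmetic (the per-frame MAC count and efficiency below are illustrative assumptions, not tool output, and the real estimator models far more than this):

```python
def estimate_fps(network_macs: float, macs_per_cycle: int,
                 freq_hz: float, efficiency: float) -> float:
    """First-order frame-rate estimate: cycles = MACs / (MACs/cycle * efficiency),
    then frames/s = clock / cycles. Ignores memory and scheduling effects."""
    cycles = network_macs / (macs_per_cycle * efficiency)
    return freq_hz / cycles

# Illustrative: a VGG16-class workload assumed at ~15.5e9 MACs per frame,
# on an aiWare3P-class core (8,192 MACs/cycle @ 1.5 GHz, 98% efficiency)
fps = estimate_fps(15.5e9, 8192, 1.5e9, 0.98)  # roughly 777 frames/s here
```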
 

aiWare4

Frameworks supported: Caffe/Caffe2, TensorFlow, PyTorch, ONNX, Khronos NNEF
Inference deployment: Binary compiled offline using aiWare Studio or command-line tools; a single binary contains one or multiple NNs, weights and all scheduling information
Software runtime: Minimal host CPU management required during execution. Simple, generic, portable runtime API runs on any RTOS or HLOS; wrappers for popular APIs available on request
Development tools: aiWare Studio provides comprehensive tools to import, analyse and optimize any NN with an easy-to-use interactive UI
Evaluation tools: aiWare Studio features an offline performance estimator accurate to within 5% of final silicon; FPGA implementations also available

T A R G E T   A P P L I C A T I O N S

 

aiWare3P

Automotive:
  • Inference for automated driving
  • High-performance automotive multi-camera perception
  • Large camera NN processing (no upper limit on input resolution)
  • High data-rate heterogeneous multi-sensor fusion
 

aiWare4

Automotive:
  • Inference for automated driving
  • High-performance automotive multi-camera perception
  • Large camera NN processing (no upper limit on input resolution)
  • High data-rate heterogeneous multi-sensor fusion