aiWare Studio
1. Startup
Explanation of main areas of the tool
First, let's explore the tool itself. There are three main areas: the Device area, where we define the aiWare configurations we are going to analyse; the Networks area, where we select which networks to analyse; and the Results area, where a tab appears for each analysis run.
2. First measurement
Performing a measurement
Let’s now run a performance estimation using Yolo v2. To do this, we select the network we want to use and the aiWare configuration. We then click the Measure button to configure the analysis. Here we can set many options, including input resolution, memory bandwidth, and features such as BWO or EMO. If hardware is connected, we can also choose between running an estimation and reading the profiling information from the device itself; both use exactly the same user interface. For now, we’ll just run a simple estimation.
3. Results
Exploring the results generated
Once the analysis has finished, which usually takes only a few minutes, the results appear in a new tab. Within this tab there are a number of views, starting with the Summary. In this view we see the key performance metrics, including IPS and efficiency, as well as the configuration details of both aiWare and the network we are analysing. We can examine the network in Graph view and the timeline in Timeline view, then explore GMACs per layer and external memory bandwidth per layer in detail. The LAM & IO Utilization view gives an overview of how the network performed.
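As a rough illustration of how the Summary metrics relate to each other, here is a minimal Python sketch. It assumes that IPS means inferences per second and that efficiency is the achieved MAC rate divided by the NPU's peak MAC rate; aiWare Studio may define both differently. The function names and all numbers are hypothetical and are not taken from the tool.

```python
def inferences_per_second(latency_s: float) -> float:
    """Throughput for back-to-back inferences (assumed meaning of IPS)."""
    return 1.0 / latency_s


def efficiency(gmacs_per_inference: float, ips: float, peak_gmacs_per_s: float) -> float:
    """Achieved GMAC/s divided by peak GMAC/s (assumed definition of efficiency)."""
    achieved_gmacs_per_s = gmacs_per_inference * ips
    return achieved_gmacs_per_s / peak_gmacs_per_s


# Hypothetical figures, chosen only to make the arithmetic easy to follow.
latency_s = 0.01            # one inference takes 10 ms
gmacs_per_inference = 30.0  # work in one forward pass
peak_gmacs_per_s = 4000.0   # peak rate of the configured NPU

ips = inferences_per_second(latency_s)
eff = efficiency(gmacs_per_inference, ips, peak_gmacs_per_s)
print(f"IPS: {ips:.1f}, efficiency: {eff:.0%}")
```

Under these assumptions, a lower latency raises both IPS and efficiency, while a larger NPU configuration raises the peak rate and so lowers efficiency unless latency drops in proportion.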
4. Exploring resolution
Quick exploration of the performance impact of different input resolutions
Once we’ve got our first analysis, we can move on to explore how the NPU performs under different conditions. Let’s change the input resolution to see how the network performs. We simply click the Measure button, select the resolution we want, then run the estimation. When it has finished, we can switch between the Summary tabs of the different runs to quickly compare the results.
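To get a feel for why resolution matters so much, recall that the work in a convolution layer grows with the number of output pixels. The sketch below is generic arithmetic, not aiWare Studio's estimator; the layer shape and the two resolutions are illustrative assumptions.

```python
def conv_macs(height: int, width: int, in_channels: int,
              out_channels: int, kernel: int, stride: int = 1) -> int:
    """Multiply-accumulates for one 2D convolution with 'same' padding."""
    out_h = -(-height // stride)  # ceiling division
    out_w = -(-width // stride)
    return out_h * out_w * in_channels * out_channels * kernel * kernel


# Hypothetical first layer: 3x3 kernel, 3 input channels, 32 output channels.
low_res = conv_macs(416, 416, 3, 32, 3)
high_res = conv_macs(608, 608, 3, 32, 3)

# The MAC count scales with the pixel count, so the ratio matches (608 / 416) ** 2.
print(f"MACs at 416x416: {low_res}")
print(f"MACs at 608x608: {high_res}")
print(f"Ratio: {high_res / low_res:.2f}")
```

The same scaling applies to every convolution layer in a fully convolutional network, so compute grows roughly with the square of the linear resolution. Actual IPS need not follow exactly, because memory traffic and utilization also change.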
5. Different CNNs
More complex CNN (Inception v4); exploring impact of changing memory bandwidth vs using BWO (external bandwidth optimization)
Let’s now take a look at a more complex network, Inception v4. As you can see, it has many more layers, and its topology is more complex than Yolo's. After running an analysis, we can look at the results to see where the bottlenecks are. We can then explore two options for improving performance: increasing memory bandwidth, or using the BWO (Bandwidth Optimization) hardware. It looks like BWO is a great solution here!
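The trade-off between the two options can be sketched with a simple roofline-style model, in which each layer takes as long as the slower of its compute and its external memory transfers. This is not how aiWare Studio computes its estimates, and the way BWO works internally is not described here; the sketch simply assumes that BWO reduces external traffic by some factor. Every name and number below is hypothetical.

```python
def layer_time_s(macs, ext_bytes, peak_macs_per_s, bandwidth_bytes_per_s):
    """Layer time is limited by compute or external memory, whichever is slower."""
    compute_s = macs / peak_macs_per_s
    memory_s = ext_bytes / bandwidth_bytes_per_s
    return max(compute_s, memory_s)


def network_time_s(layers, peak_macs_per_s, bandwidth_bytes_per_s, traffic_scale=1.0):
    """Sum of layer times; traffic_scale < 1 models reduced external traffic (assumption)."""
    return sum(
        layer_time_s(macs, ext_bytes * traffic_scale, peak_macs_per_s, bandwidth_bytes_per_s)
        for macs, ext_bytes in layers
    )


# Hypothetical layers as (MACs, external bytes moved); the second is heavily memory-bound.
layers = [(2.0e9, 4.0e6), (0.5e9, 64.0e6), (1.0e9, 8.0e6)]
PEAK_MACS_PER_S = 4.0e12  # hypothetical NPU peak
BANDWIDTH = 16.0e9        # hypothetical external memory bandwidth in bytes/s

baseline = network_time_s(layers, PEAK_MACS_PER_S, BANDWIDTH)
more_bandwidth = network_time_s(layers, PEAK_MACS_PER_S, 2 * BANDWIDTH)
less_traffic = network_time_s(layers, PEAK_MACS_PER_S, BANDWIDTH, traffic_scale=0.25)

print(f"baseline:        {baseline * 1e3:.2f} ms")
print(f"2x bandwidth:    {more_bandwidth * 1e3:.2f} ms")
print(f"0.25x traffic:   {less_traffic * 1e3:.2f} ms")
```

In this toy model, only memory-bound layers benefit from either option, which is why it helps to find the bottleneck layers first. The 0.25 traffic factor is arbitrary, so the comparison between the two options here says nothing about real BWO gains; those come from the tool's own results.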