Powering autonomy

Written by László Kishonti / Posted at 7/2/19

Changing Traditions

Tesla’s announcement of its Full Self-Driving chip, known as the FSD, has raised questions about the future of automotive hardware platforms. In creating the FSD, the ever-pioneering company has done something that traditional OEMs are less inclined to do: invest massive amounts in internal development to control the supply chain.

By contrast, automotive OEMs have traditionally relied on Tier 1s and system integrators to supply the ECUs they require. In a recent study, Shiv Patel of ABI Research argues that this approach will remain mostly unchanged. Patel states: “ABI Research also finds it highly unlikely that other OEMs will follow Tesla’s lead and move to develop their own chips in-house.”

The majority of OEMs simply lack the in-house expertise and resources to create a chip. We know. We’ve been developing our neural network accelerator hardware IP core, aiWare, for over three years. Our hardware team has worked side-by-side with our self-driving development teams throughout this period. This has given them invaluable insight into the unique demands that real automated driving workloads place on hardware. The only other company to develop software and hardware in this manner? Tesla. And while Tesla’s solution is proprietary, and will most likely not be available for licensing, aiWare is. But why is this important?

The Fall of the GPU

Almost all automated driving systems currently in development or production rely on GPUs for processing. GPUs are easy to get hold of, diverse, programmable, and powerful. However, the very qualities that make them so enticing for prototyping eventually turn into their main shortcomings.

GPUs offer a platform that provides developers and engineers the flexibility to experiment with different solutions and find the best possible fit for a given use case, limited only by the performance of the silicon in the box. They are perfect for the first steps. However, as projects near production, the demands of the automotive industry catch up. ECUs must be powerful, consume as little power as possible, generate as little heat as possible, and on top of all this, they have to fit in a car.

If GPUs aren’t the solution, then what is? Simple: NN accelerators. At AImotive, we have always believed that only dedicated solutions can meet the demands of self-driving. That’s the premise behind aiWare. Unsurprisingly, the very same train of thought led to the Tesla FSD.

NN accelerators can be implemented either on-chip within a SOC or as external accelerators. The FSD is an example of the former, while aiWare supports both implementation options.

Nevertheless, I should note that, while the benefits of NN accelerators are manifold, they are not a replacement for CPUs and GPUs. As Márton Fehér stressed in his recent blog, self-driving systems require a lot of general-purpose computing resources – alongside NN computation – to operate efficiently.

Optimized for Driving Automation

What do automotive NN accelerators do differently to overcome the shortcomings of GPUs? There are several approaches, and on a high level, aiWare shares fundamental characteristics with the FSD:

Dedicated hardware platform with no programmable elements or caches;
Optimized for convolution;
On-chip SRAM maximized and external memory access minimized;
Data stored and processed in INT8 to maximize the efficiency of calculations.

The core aspect of designing solutions from the ground up is optimizing them for a particular set of tasks specific to the use case. As a result, all programmable elements of the hardware can be removed. This limits the flexibility of the platform but makes it orders of magnitude more efficient at the tasks it is built to perform. Programmability and flexibility are extremely power hungry.

At the Autonomy Investor Day event, Pete Bannon, Tesla’s VP Chip Engineering, went into the details of how the FSD has been optimized. The rationale behind optimizing for convolutions is obvious in light of Tesla’s workloads. Bannon stated that “98.1% is convolution; 1.6% deconvolution; 0.1% ReLU; 0.2% pooling. 99.7% of operations are Multiply-Add.” The workloads are similar for aiDrive, and as a result, aiWare has also been optimized for convolution.
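To make the multiply-add point concrete, here is a minimal Python sketch. It is illustrative only and does not describe how the FSD or aiWare is implemented: the function names and the example layer dimensions are assumptions, and only the operation percentages come from the quote above.

```python
# Operation mix as quoted by Pete Bannon (percent of operations).
OP_MIX = {"convolution": 98.1, "deconvolution": 1.6, "relu": 0.1, "pooling": 0.2}

# Convolution and deconvolution both reduce to multiply-adds.
multiply_add_share = round(OP_MIX["convolution"] + OP_MIX["deconvolution"], 1)


def conv2d_single_output(patch, kernel):
    """Compute one output value of a convolution as a chain of multiply-adds."""
    acc = 0
    for patch_row, kernel_row in zip(patch, kernel):
        for p, k in zip(patch_row, kernel_row):
            acc += p * k  # one multiply-add
    return acc


def conv2d_macs(out_h, out_w, out_ch, in_ch, k_h, k_w):
    """Count the multiply-adds in one dense 2D convolution layer."""
    return out_h * out_w * out_ch * in_ch * k_h * k_w


# Hypothetical layer: 3x3 kernel, 64 input channels, 128 output channels,
# 56x56 output feature map.
layer_macs = conv2d_macs(56, 56, 128, 64, 3, 3)
print(multiply_add_share, layer_macs)
```

Even this single, modest layer needs more than 230 million multiply-adds per frame, which is why hardware built around that one operation pays off.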

Since the early days of our research, it has been evident to our hardware engineers that NN computation is a question of data and memory more than anything else. Accessing external memory is not only a potential bottleneck for an accelerator; it’s another power-hungry process eating away at scarce resources. This is why our team built aiWare to maximize on-chip SRAM and minimize external DRAM bandwidth. As Bannon reiterated: “SRAM on-chip consumes 100x less energy than external DRAM.”

Image of the Tesla FSD
The area of SRAM on the Tesla FSD emphasizes the importance of on-chip memory. (Source)
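The effect of that 100x ratio can be estimated with a back-of-the-envelope model. The sketch below is an assumption-laden illustration, not a measurement of any real chip: it uses arbitrary energy units, a hypothetical function name, and takes only the 100x SRAM-to-DRAM ratio from the quote above.

```python
SRAM_COST = 1.0    # relative energy per on-chip access (arbitrary unit)
DRAM_COST = 100.0  # external access, 100x SRAM as quoted above


def relative_memory_energy(accesses, sram_fraction):
    """Estimate memory energy when a given fraction of accesses stays on-chip."""
    if not 0.0 <= sram_fraction <= 1.0:
        raise ValueError("sram_fraction must be between 0 and 1")
    sram_accesses = accesses * sram_fraction
    dram_accesses = accesses - sram_accesses
    return sram_accesses * SRAM_COST + dram_accesses * DRAM_COST


# Compare designs that serve 0%, 90% and 100% of accesses from SRAM.
for fraction in (0.0, 0.9, 1.0):
    print(fraction, relative_memory_energy(1000, fraction))
```

Under these assumptions, the 10% of accesses that still go to DRAM account for over 90% of the memory energy, which illustrates why external bandwidth is worth minimizing.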

There are several ways of optimizing hardware performance without making changes to the chip design itself. One of these is the format in which data is stored and processed. It is common knowledge that certain calculations are performed more efficiently on data stored in INT8 than on data stored in INT32 or FP32. Thus, computing neural networks in INT8 is another way of reducing the power draw of the platform. This is why the aiWare SDK contains a tool to convert data in other formats to INT8 with little or even no loss of efficiency.
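For readers unfamiliar with INT8 conversion, the sketch below shows one common textbook approach, symmetric per-tensor quantization. It is not the aiWare SDK tool, whose method is not described here; the function names and sample values are assumptions made for illustration.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    if not values:
        return [], 1.0
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the INT8 representation."""
    return [q * scale for q in quantized]


# Hypothetical weights, for illustration only.
weights = [-1.0, -0.25, 0.0, 0.3, 1.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(q_weights, scale, max_error)
```

With this scheme, the rounding error on each value is at most half the scale, so tensors with a narrow value range lose very little precision.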

It is evident from what Tesla showed at the Autonomy Investor Day event that the FSD is an impressive piece of engineering, especially considering how quickly the whole solution was created. Based on what Bannon said at the event, many of its performance gains rest on the same foundations as those of aiWare. However, aiWare has been under development for the past three years, meaning many of its solutions are more mature than those utilized by Tesla.

Taking the Next Step to Self-Driving

It seems automotive OEMs are facing a problem. They lack the expertise needed to create their own chipsets, yet the hardware solutions available to them limit the automated driving offerings they can deploy. The solution is collaboration. As we have stressed repeatedly over the past months, self-driving is a complex challenge, and no one will be able to solve it alone. To overcome the limitations of processing platforms, OEMs should look to work with engineering teams who have direct experience in designing hardware platforms, including NN accelerators, for automated vehicles. aiWare was created specifically for this purpose.