Paolo Burgio
Enabling predictable computing on reconfigurable embedded accelerators
The end of the Moore era, and the need for power-efficient performance in modern Cyber-Physical Systems, have made data-crunching accelerators a preferred choice for next-generation autonomous systems, such as Self-Driving Cars (SDCs) and drones (UAVs). Among the possible architectures, reconfigurable logic is an appealing choice, because it lets engineers create hardware extension blocks nearly on the fly, as opposed to the long development process of ASICs. This means that we can achieve high power efficiency for data-crunching workloads, such as computer vision and machine learning, by building custom hardware IPs and deploying them onto the programmable fabric in a relatively short time. Today's approach is to place such accelerators side by side with "traditional" ISA-based cores, to let data-parallel and control workloads co-exist on the same chip and to reduce inter-node communication. However, this means that tens to hundreds of compute engines on the same die must share, at the very least, the interconnect and memory blocks. The resulting hardware design is so complex that it is barely analyzable with traditional approaches, and this ultimately harms system predictability, making such a powerful architecture of little or no practical use in production systems, e.g., in vehicle ECUs. We propose to tackle this with proper HW/SW co-design methodologies, by instantiating custom IP architectures on the reconfigurable logic to mitigate the main threat to predictability, that is, contention on the shared memory hierarchy. In my talk, I will show why this problem is so critical, and how we were able to rapidly prototype the foundational hardware blocks of a fully functional autonomous vehicle on an FPGA-based computer, together with our custom overlay architecture that reduces memory interference.
Our approach enables robust and safe autonomous driving on embedded heterogeneous SoCs, because it makes it possible to consolidate multiple critical and non-critical applications on the same chip.