Speakers
Description
The AMD Versal™ adaptive SoCs combine programmable logic (PL), processing system (PS), and AI Engines with leading-edge memory and interfacing technologies to deliver powerful heterogeneous acceleration for any application. The hardware and software are designed to be programmed and optimized by data scientists and by software and hardware developers. A host of tools, software, libraries, IP, middleware, and frameworks enable Versal adaptive SoCs to support all industry-standard design flows.
AI Engines are an array of very-long instruction word (VLIW) processors with single instruction multiple data (SIMD) vector units that are highly optimized for compute-intensive applications, specifically digital signal processing (DSP), 5G wireless applications, and artificial intelligence (AI) technology such as machine learning (ML).
AI Engines are hardened blocks that provide multiple levels of parallelism, including instruction-level and data-level parallelism. Instruction-level parallelism allows a scalar operation, up to two moves, two vector reads (loads), one vector write (store), and one vector instruction to execute in a single 7-way VLIW instruction per clock cycle. Data-level parallelism is achieved through vector-level operations, where multiple sets of data are operated on per clock cycle. Each AI Engine contains both a vector and a scalar processor, dedicated program memory, and local data memory, and can access adjacent local memory in any of three neighboring directions. It also has access to DMA engines and AXI4 interconnect switches to communicate via streams with other AI Engines, the PL, or the DMA.
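To make the data-level parallelism concrete, here is a minimal sketch of what an AI Engine kernel written with the AIE API might look like. The kernel name, lane count, data types, and buffer sizes are illustrative assumptions, not details from the presentation.

```cpp
// Sketch of a vectorized AI Engine kernel using the AIE API (assumed names:
// vec_mul, 32-lane int16 vectors, 256-sample buffers). Each loop iteration
// issues vector loads, a vector multiply, and a vector store, exercising the
// instruction- and data-level parallelism described above.
#include <adf.h>
#include <aie_api/aie.hpp>

void vec_mul(adf::input_buffer<int16>& a,
             adf::input_buffer<int16>& b,
             adf::output_buffer<int16>& out)
{
    auto pa = aie::begin_vector<32>(a);   // 32 int16 lanes per vector
    auto pb = aie::begin_vector<32>(b);
    auto po = aie::begin_vector<32>(out);

    for (unsigned i = 0; i < 256 / 32; ++i) {
        aie::vector<int16, 32> va = *pa++;        // vector load
        aie::vector<int16, 32> vb = *pb++;        // vector load
        auto acc = aie::mul(va, vb);              // element-wise vector multiply
        *po++ = acc.to_vector<int16>(0);          // vector store, no output shift
    }
}
```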
The AIE-ML block, a variant of the AI Engine block targeted primarily at machine learning inference, delivers among the industry's best performance per watt across a wide range of inference applications.
As an application developer, you can use either a white box or a black box flow to run an ML inference application on the AIE-ML variants.
With the white box flow, you can integrate custom kernels and dataflow graphs using the AIE-ML variant's C++ programming environment available within the AMD Vitis™ design flow.
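As a rough illustration of the white box flow, the sketch below shows what a small ADF dataflow graph wrapping a single kernel (the hypothetical vec_mul above) might look like in the Vitis environment. The port names, file names, PLIO width, buffer sizes, and runtime ratio are assumptions for illustration only.

```cpp
// Sketch of an ADF dataflow graph connecting PL-side I/O to one AI Engine
// kernel. All values are illustrative assumptions.
#include <adf.h>
using namespace adf;

void vec_mul(input_buffer<int16>& a, input_buffer<int16>& b, output_buffer<int16>& out);

class mul_graph : public graph {
public:
    input_plio  in_a, in_b;
    output_plio out;
    kernel      k;

    mul_graph() {
        // PLIO ports move data between the programmable logic and the AI Engine array.
        in_a = input_plio::create("a_in",  plio_32_bits, "data/a.txt");
        in_b = input_plio::create("b_in",  plio_32_bits, "data/b.txt");
        out  = output_plio::create("c_out", plio_32_bits, "data/c.txt");

        k = kernel::create(vec_mul);
        source(k) = "vec_mul.cc";
        runtime<ratio>(k) = 0.9;          // share of one AI Engine budgeted to this kernel

        connect(in_a.out[0], k.in[0]);
        connect(in_b.out[0], k.in[1]);
        connect(k.out[0],    out.in[0]);

        dimensions(k.in[0])  = {256};     // assumed 256 int16 samples per invocation
        dimensions(k.in[1])  = {256};
        dimensions(k.out[0]) = {256};
    }
};
```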
The black box flow uses performance-optimized Neural Processing Unit (NPU) IP from AMD to accelerate ML workloads on the AIE-ML variants. AMD Vitis™ AI serves as the front-end tool: it parses the neural network graph, performs graph optimization and quantization, and generates compiled code that runs on the AIE-ML variant hardware.
In the second part of the presentation, we explore the integration of the AI Engine into Ryzen AI processors, highlighting the architecture and operation of the NPU. The NPU processing flow is described, from receiving inputs to generating predictions, with autonomous access to DDR or HBM memory. Next-generation AI PC architectures are also presented, featuring up to 12-core CPUs, advanced GPUs, and NPUs reaching 55 TOPS. The efficiency of the NPU is analyzed, showing up to a 31.6× performance improvement over the CPU and a significant reduction in power consumption. The section concludes with an overview of the unified AI software stack for Ryzen AI, ONNX model support, benchmarking tools, and a comparison of CPU and NPU efficiency.
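For the ONNX side of the Ryzen AI stack, below is a minimal sketch of how an application might dispatch an ONNX model to the NPU through ONNX Runtime's Vitis AI execution provider. The provider name string, the config_file option, and the file paths are assumptions based on AMD's Ryzen AI documentation as recalled here and may vary by software version.

```cpp
// Sketch: loading an ONNX model in ONNX Runtime and requesting the Vitis AI
// execution provider so supported operators run on the NPU. Provider name,
// options, and paths are assumptions and may differ across releases.
#include <onnxruntime_cxx_api.h>
#include <string>
#include <unordered_map>

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "ryzen-ai-demo");
    Ort::SessionOptions so;

    // Route supported subgraphs to the NPU; unsupported ops fall back to the CPU provider.
    std::unordered_map<std::string, std::string> vai_options{
        {"config_file", "vaip_config.json"}   // assumed provider configuration file
    };
    so.AppendExecutionProvider("VitisAI", vai_options);

    Ort::Session session(env, ORT_TSTR("model.onnx"), so);
    // ... create Ort::Value input tensors and call session.Run(...) as usual ...
    return 0;
}
```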