# Introduction to FPGA

### **Corso INFN "Tecniche Di Machine Learning Con Dispositivi FPGA per Gli Esperimenti Di Fisica Delle Particelle"**

**Riccardo Travaglini - 02/11/2022** 



**Disclaimer:** This presentation is intended only for the personal use of the participants of this course. Do not distribute, do not modify. Use only for educational purposes.

# Outline

- Digital circuits: combinational, sequential and synchronous logic
- FPGA hardware: Xilinx devices as an example
- How to design

1. How do you rate your overall knowledge of FPGAs? (1: poor, 5: expert)

6. Which one of the following topics is completely unknown to you?





## Field Programmable Gate Array **Gate? Gate Array? Field Programmable?**

- Data => bits; Boolean algebra => bit operations
- Logic gates are the basic building blocks of Boolean expressions (AND, OR, NAND, XOR, ...)
- Digital circuits: electrical circuits where I/O voltages (rarely currents) take only two values (high and low), thus representing bits
- Gates are circuits implementing logic operations
- Digital circuits + gates => combinational logic (electronic circuits implementing Boolean logic)
- FPGA: plenty of gates (an array in a broad sense)
- No function at power-up: an FPGA must be programmed
- Program vs. Configure
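As a rough illustration of how Boolean expressions are built from gates, the sketch below models three gates as functions on single bits and combines them into a full adder. The function and type names (`and_gate`, `full_adder`, `SumCarry`, ...) are illustrative choices, not taken from the slides.

```cpp
// Each gate maps 1-bit inputs (0 or 1) to a 1-bit output.
constexpr unsigned and_gate(unsigned a, unsigned b) { return a & b & 1u; }
constexpr unsigned or_gate(unsigned a, unsigned b)  { return (a | b) & 1u; }
constexpr unsigned xor_gate(unsigned a, unsigned b) { return (a ^ b) & 1u; }

struct SumCarry {
    unsigned sum;
    unsigned carry;
};

// Full adder: adds three bits using only the gates defined above.
constexpr SumCarry full_adder(unsigned a, unsigned b, unsigned cin) {
    return { xor_gate(xor_gate(a, b), cin),
             or_gate(and_gate(a, b), and_gate(cin, xor_gate(a, b))) };
}
```

Chaining several such adders, carry to carry, gives a multi-bit adder, which is still purely combinational.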





### **Combinational Logic Basic gates in modern FPGA**



nn = in\_0 \* weight\_0 + in\_1 \* weight\_1 + bias

The output is a function of the current inputs only (memoryless)
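In software terms, a combinational block behaves like a pure function. The sketch below writes the two-input neuron expression that way; the parameter names follow the expression above, while the use of plain `int` operands is an assumption made for simplicity.

```cpp
// Combinational view of the two-input neuron: the result depends only on
// the current arguments; nothing is remembered between calls.
constexpr int neuron(int in_0, int in_1, int weight_0, int weight_1, int bias) {
    return in_0 * weight_0 + in_1 * weight_1 + bias;
}
```

Calling `neuron` twice with the same arguments always gives the same result, which is exactly the memoryless property.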

Look-up tables (LUTs): the function generators can implement:

- Any arbitrarily defined six-input Boolean function.
- Two arbitrarily defined five-input Boolean functions, as long as these two functions share common inputs.
- Two arbitrarily defined Boolean functions of three and two inputs or less.

Source: Xilinx, UltraScale Architecture Configurable Logic Block User Guide (UG574)
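A six-input LUT can be pictured as a 64-entry truth table addressed by its inputs. The following is a minimal software model of that idea, not the vendor primitive: the names `lut6`, `make_init` and `parity6` are hypothetical, and the 64-bit word `init` stands in for the configuration bits.

```cpp
#include <cstdint>

// A 6-input LUT is a 64-entry truth table: the inputs form an address and
// the configuration word `init` holds the output bit for each address.
constexpr unsigned lut6(std::uint64_t init, unsigned inputs) {
    return static_cast<unsigned>((init >> (inputs & 0x3Fu)) & 1u);
}

// Build the configuration word for any Boolean function of six inputs
// by evaluating it once per input combination.
template <typename F>
std::uint64_t make_init(F f) {
    std::uint64_t init = 0;
    for (unsigned addr = 0; addr < 64; ++addr) {
        if (f(addr)) init |= (std::uint64_t{1} << addr);
    }
    return init;
}

// Example function: 6-input parity (XOR of all input bits).
inline bool parity6(unsigned addr) {
    unsigned p = 0;
    for (unsigned i = 0; i < 6; ++i) p ^= (addr >> i) & 1u;
    return p != 0;
}
```

With `make_init(parity6)` as configuration, `lut6` returns 1 exactly when an odd number of inputs is high. Changing only the configuration word changes the function; the lookup logic stays the same.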

Other combinational building blocks: MUX and carry logic

## Sequential logic and synchronous design Add memory

- Sequential logic: the output is a function of the inputs and of an internal state (~memory of past inputs)
- Memory component: flip-flop
- Clock signal
- Fully synchronous design -> FPGA
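The points above can be sketched as a small cycle-based model: a register that updates only on a clock edge, fed by a combinational adder. The names `Register8`, `Accumulator` and `tick` are illustrative, and the 8-bit width is an arbitrary assumption.

```cpp
#include <cstdint>

// A register (a bank of D flip-flops): `q` changes only on a clock edge,
// when it captures whatever value is present on `d`.
struct Register8 {
    std::uint8_t d = 0;  // input, driven by combinational logic
    std::uint8_t q = 0;  // output, the stored state
    void clock_edge() { q = d; }
};

// Synchronous accumulator: a combinational adder plus a register.
struct Accumulator {
    Register8 state;

    // Advance one clock cycle with the given input; returns the new output.
    std::uint8_t tick(std::uint8_t in) {
        state.d = static_cast<std::uint8_t>(state.q + in);  // combinational part
        state.clock_edge();                                  // sequential part
        return state.q;
    }
};
```

Unlike a combinational block, the same input can give different outputs on different cycles, because the result also depends on the stored state.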







# Latency - Throughput

- Latency: time from inputs to outputs
  - In clock cycles
  - In absolute time (clock cycles \* clock period)

- Throughput: outputs per unit time
  - Number of output data per clock cycle
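The two definitions translate into simple arithmetic. In this sketch the function names and the `initiation_interval` parameter (cycles between successive inputs, a term borrowed from HLS tools) are assumptions added for illustration.

```cpp
// Absolute latency in nanoseconds = latency in cycles * clock period.
constexpr double latency_ns(unsigned cycles, double clock_mhz) {
    return cycles * (1000.0 / clock_mhz);
}

// Throughput in results per second for a block that accepts a new input
// every `initiation_interval` clock cycles.
constexpr double throughput_per_s(double clock_mhz, unsigned initiation_interval) {
    return clock_mhz * 1e6 / initiation_interval;
}
```

For example, a block with a latency of 10 cycles at 200 MHz (5 ns period) has an absolute latency of 50 ns; if it accepts one input per cycle, it produces 200 million results per second.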





## Pipelining What about latency and throughput?
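One way to approach the question is to count clock cycles. The sketch below assumes an idealised datapath split into `stages` steps of one clock cycle each, with no stalls; both function names are hypothetical.

```cpp
// Cycles needed to process n items when each item must leave the datapath
// before the next one enters (no overlap).
constexpr unsigned long cycles_sequential(unsigned long n, unsigned long stages) {
    return n * stages;
}

// Same datapath, fully pipelined: the first result appears after `stages`
// cycles (latency in cycles is unchanged), then one result per cycle.
constexpr unsigned long cycles_pipelined(unsigned long n, unsigned long stages) {
    return n == 0 ? 0 : stages + (n - 1);
}
```

Under these assumptions, 100 items through 4 stages take 400 cycles without pipelining and 103 cycles with it: the latency of a single item stays at 4 cycles, while the throughput approaches one result per cycle.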









# Outline

- Digital circuits: combinational, sequential and synchronous logic
- FPGA hardware: Xilinx devices as an example



## **Configurable Logic Block** LUT, MUX, Carry Logic, Flipflops



Programming an FPGA means configuring gates (e.g. LUTs) and enabling selected connections among gates and other elements
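To make "configuration" concrete, here is a toy model of a single logic cell, far simpler than a real CLB: a two-input LUT, one flip-flop and an output multiplexer. The `CellConfig` and `ToyCell` types and their fields are invented for this sketch and do not correspond to any actual bitstream format.

```cpp
#include <cstdint>

// Hypothetical configuration of one toy logic cell: which 2-input function
// the LUT implements and whether the output is taken from the flip-flop.
struct CellConfig {
    std::uint8_t lut_init;  // 4-entry truth table, bit i = output for address i
    bool registered;        // true: output comes from the flip-flop
};

struct ToyCell {
    CellConfig cfg;
    unsigned ff = 0;  // flip-flop state

    explicit ToyCell(CellConfig c) : cfg(c) {}

    // Evaluate one clock cycle for inputs a and b (each 0 or 1).
    unsigned cycle(unsigned a, unsigned b) {
        const unsigned addr = ((b & 1u) << 1) | (a & 1u);
        const unsigned lut_out = (cfg.lut_init >> addr) & 1u;
        const unsigned out = cfg.registered ? ff : lut_out;  // output MUX
        ff = lut_out;  // the flip-flop captures the LUT output at the clock edge
        return out;
    }
};
```

The same cell acts as a combinational AND gate with `CellConfig{0x8, false}` and as a registered XOR (output delayed by one cycle) with `CellConfig{0x6, true}`: only the configuration data differs.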






### Memory and processing Hard silicon resources

Several kinds of RAM:

- BRAM
- Distributed RAM
- Ultra RAM



FIFOs
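For readers less familiar with FIFOs, the following is a minimal software model of a fixed-depth first-in first-out buffer. It only illustrates the behaviour (ordering, full and empty conditions); the class name `Fifo` and its interface are assumptions, not a description of the hardware FIFO primitives.

```cpp
#include <array>
#include <cstddef>

// Fixed-depth FIFO implemented as a circular buffer.
// Depth and data type are illustrative template parameters.
template <typename T, std::size_t Depth>
class Fifo {
public:
    bool empty() const { return count_ == 0; }
    bool full() const { return count_ == Depth; }

    // Returns false (and stores nothing) when the FIFO is full.
    bool push(const T& value) {
        if (full()) return false;
        data_[write_] = value;
        write_ = (write_ + 1) % Depth;
        ++count_;
        return true;
    }

    // Returns false (and leaves `out` untouched) when the FIFO is empty.
    bool pop(T& out) {
        if (empty()) return false;
        out = data_[read_];
        read_ = (read_ + 1) % Depth;
        --count_;
        return true;
    }

private:
    std::array<T, Depth> data_{};
    std::size_t read_ = 0, write_ = 0, count_ = 0;
};
```

A producer calls `push` and a consumer calls `pop`; data always comes out in the order it went in, and the `full`/`empty` flags tell each side when it must wait.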

### Digital signal processing slices

**Xilinx UltraScale DSP slice**
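The core operation of a DSP slice is a multiply-accumulate, optionally preceded by a pre-adder. The sketch below is a simplified functional model under that assumption: `dsp_mac` and `dot` are hypothetical names, and the operand widths of the real slice are not enforced.

```cpp
#include <cstdint>

// Simplified model of a DSP-slice operation: pre-add, multiply, accumulate.
// Real slices have fixed operand widths, which are ignored here.
constexpr std::int64_t dsp_mac(std::int64_t a, std::int64_t d, std::int64_t b,
                               std::int64_t acc) {
    return acc + (a + d) * b;
}

// Dot product as a chain of multiply-accumulate steps, one per element.
inline std::int64_t dot(const std::int32_t* x, const std::int32_t* w, int n) {
    std::int64_t acc = 0;
    for (int i = 0; i < n; ++i) acc = dsp_mac(x[i], 0, w[i], acc);
    return acc;
}
```

The weighted sums at the heart of neural-network layers map naturally onto this operation, which is why the number of DSP slices matters for machine-learning designs.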



# Input/Output Interface

- Configure I/O: map design I/O to physical chip pins (depends on PCB design and chip package)
- Several electrical standards are supported (LVCMOS, LVDS, ...)
- High-speed transceivers (up to tens of Gbit/s) to interface with:
  - Fast buses (PCIe, ...)
  - Optical driver/receiver
  - High bandwidth memory (DDRx, HBMx)

# **Xilinx Families**



The combination of TSMC's 16nm FinFET process with new UltraRAM and SmartConnect technologies enables AMD Xilinx to continue delivering 'More than Moore's Law' value to the market.





AMD Xilinx's enhanced FPGA architecture provides a step-function increase in both the amount of connectivity resources and the associated inter-die bandwidth in this second-generation 3D IC architecture.



Continuous innovation on the process node enables new devices with optimal performance at lower power across product families to meet requirements for key applications.







# Xilinx Virtex Ultrascale+

|                                             | Foundation |       |       |       |       |        |       |
|---------------------------------------------|------------|-------|-------|-------|-------|--------|-------|
| Device Name                                 | VU3P       | VU5P  | VU7P  | VU9P  | VU11P | VU13P  | VU19P |
| System Logic Cells (K)                      | 862        | 1,314 | 1,724 | 2,586 | 2,835 | 3,780  | 8,938 |
| CLB Flip-Flops (K)                          | 788        | 1,201 | 1,576 | 2,364 | 2,592 | 3,456  | 8,172 |
| CLB LUTs (K)                                | 394        | 601   | 788   | 1,182 | 1,296 | 1,728  | 4,086 |
| Max. Dist. RAM (Mb)                         | 12.0       | 18.3  | 24.1  | 36.1  | 36.2  | 48.3   | 58.4  |
| Total Block RAM (Mb)                        | 25.3       | 36.0  | 50.6  | 75.9  | 70.9  | 94.5   | 75.9  |
| UltraRAM (Mb)                               | 90.0       | 132.2 | 180.0 | 270.0 | 270.0 | 360.0  | 90.0  |
| DSP Slices                                  | 2,280      | 3,474 | 4,560 | 6,840 | 9,216 | 12,288 | 3,840 |
| Peak INT8 DSP (TOP/s)                       | 7.1        | 10.8  | 14.2  | 21.3  | 28.7  | 38.3   | 10.4  |
| PCIe® Gen3 x16                              | 2          | 4     | 4     | 6     | 3     | 4      | 0     |
| PCIe Gen3 x16/Gen4 x8 / CCIX <sup>(1)</sup> | -          | -     | -     | -     | -     | -      | 8     |
| 150G Interlaken                             | 3          | 4     | 6     | 9     | 6     | 8      | 0     |
| 100G Ethernet w/ KR4 RS-FEC                 | 3          | 4     | 6     | 9     | 9     | 12     | 0     |
| Max. Single-Ended HP I/Os                   | 520        | 832   | 832   | 832   | 624   | 832    | 1,976 |
| Max. Single-Ended HD I/Os                   | 0          | 0     | 0     | 0     | 0     | 0      | 96    |
| GTY 32.75Gb/s Transceivers                  | 40         | 80    | 80    | 120   | 96    | 128    | 80    |
| GTM 58Gb/s PAM4 Transceivers                | -          | -     | -     | -     | -     | -      | -     |

# System on Chip

- PS (Processing System): a complete, configurable and autonomous processing system
  - Runs custom software applications (even Linux!)
- PL (Programmable Logic, also called the fabric)
  - Custom peripheral interfaces
  - Accelerators

PS and PL are connected via AXI buses



### Put them in action Ad-hoc board, Evaluation & Kits, System on Modules



ATLAS IBL Read Out Driver. Form factor: VME 9U. Xilinx FPGAs: 1 Virtex-5, 2 Spartan-6



Avnet MiniZed - Zynq-7000



Zynq UltraScale+ RFSoC ZCU216 Evaluation Kit



Xilinx Kria System-on-Modules

## **Classification by location** Edge, on premise, on cloud

- Edge computing: move processing close to the data "producers", typically sensors, IoT devices, ...
- On premise: dedicated resources on proprietary servers (e.g. FPGA accelerators)
- On cloud: dedicated resources accessible through cloud computing (e.g. Amazon AWS)



Ultra96 Avnet - Zynq







## **Persistent vs. Volatile programming** Technology dependent behaviour

- AMD/Xilinx and Intel FPGAs are SRAM-based: the configuration is volatile, so the device is not programmed at power-on
  - Often a companion non-volatile memory is used to program the FPGA quickly at power-on

- Other technologies offer persistent programming, best suited for specific applications (e.g. rad-hard)
  - Microchip (formerly Actel): flash-based
  - Fuse/antifuse (e.g. QuickLogic)



# Outline

- Digital circuits: combinational, sequential and synchronous logic
- FPGA hardware: Xilinx devices as an example
- How to design

4. How proficient are you with VHDL/Verilog? (1: not at all, 5: expert)

5. How proficient are you with High-Level Synthesis? (1: not at all, 5: expert)

7. Which of the following programming languages are you familiar with?



# Traditional design strategy

- Describe algorithms with a Hardware Description Language (VHDL, Verilog)
- Behavioural simulation
- Synthesise (convert the description into basic logic elements: gates, memories, DSPs, ...)
- Place & Route driven by constraints
  - Place: map logic elements to the chosen FPGA
  - Route: enable/disable connections to route signals between logic elements
- Create programming file
- Load programming file to FPGA








### IP CORES Intellectual Property

- Reusable code
- Can be used as a "black box"
- Can be encrypted
- Xilinx provides a GUI (IP Integrator) that works as a schematic editor for IP-based designs



## High-level synthesis At a glance





### A subset of C/C++ with synthesis directives

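```
// Element-wise vector addition; the "vadd" label names the loop in the tool reports
vadd: for (int i = 0; i < len; i++) {
// Ask the HLS tool to pipeline the loop body
#pragma HLS PIPELINE
    c[i] = a[i] + b[i];
}
```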

Focus on:

- Loops (pipelining, unrolling)
- Arrays (memories)
- Arbitrary-precision data types (integer and fixed-point)
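To give an idea of what a fixed-point type does, the sketch below emulates one with plain integers so that it compiles with any C++ compiler. In Vitis HLS such types are provided by the tool (e.g. `ap_fixed`, `ap_int`); the names `fix16`, `FRAC_BITS`, `to_fix` and `fix_mul` are illustrative, and the 8.8 format is an arbitrary assumption. Overflow handling and rounding modes are deliberately left out.

```cpp
#include <cstdint>

// Signed fixed-point value with 8 fractional bits stored in 16 bits
// (resolution 1/256). Overflow is not checked in this sketch.
using fix16 = std::int16_t;
constexpr int FRAC_BITS = 8;

constexpr fix16 to_fix(double x) {
    return static_cast<fix16>(x * (1 << FRAC_BITS));
}

constexpr double to_double(fix16 x) {
    return static_cast<double>(x) / (1 << FRAC_BITS);
}

// Multiply in a wider type, then drop the extra fractional bits.
// Assumes an arithmetic right shift for negative values, as on common compilers.
constexpr fix16 fix_mul(fix16 a, fix16 b) {
    return static_cast<fix16>((static_cast<std::int32_t>(a) * b) >> FRAC_BITS);
}
```

For instance, `to_fix(1.5)` is stored as the integer 384 and `to_fix(2.25)` as 576; their fixed-point product converts back to 3.375. Choosing the narrowest format that preserves the required accuracy directly reduces the LUT and DSP resources used.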

# Accelerators and platforms **Based on dynamic programming**

"For AMD Xilinx accelerator cards on premises or in the cloud, the Vitis target platform automatically configures the PCIe interfaces that connect and manage communication between your FPGA accelerators and x86 application code; you don't need to implement any connection details!"

Software Framework

Hardware Framework

|                                        | Xilinx Device                                                                 |
|----------------------------------------|-------------------------------------------------------------------------------|
| User<br>Application<br>Code            | Dynamic Region                                                                |
| Platform<br>(Hardware and<br>Software) | Static Region: CMP, CMC, ERT, HIF, DMA, CRI                                   |

# **Overlays**

From PYNQ\_Workshop for v2.6 - AMD/Xilinx University Program: https://github.com/Xilinx/PYNQ_Workshop

Overlays are generic FPGA designs that target multiple users with new design abstractions and tools.

### Overlay characteristics

- Post-bitstream programmable via software APIs
- Typically optimized for given application domains
- Encourages the use of open source tools & fast compilation
- Enables productivity by re-using pre-optimized designs
- Makes benefits of FPGAs accessible to new users

