# A Pattern Recognition Mezzanine based on Associative Memory and FPGA technology for Level 1 Track Triggers for the HL-LHC upgrade

D. Magalotti<sup>a,f,\*</sup>, L. Alunni<sup>a</sup>, N. Biesuz<sup>b</sup>, G.M. Bilei<sup>a</sup>, S. Citraro<sup>e</sup>, F. Crescioli<sup>c</sup>, L. Fanò<sup>a</sup>, G. Fedi<sup>b</sup>, G. Magazzù<sup>b</sup>, L. Servoli<sup>a</sup>, L. Storchi<sup>a</sup>, F. Palla<sup>b</sup>, P. Placidi<sup>a,d</sup>, A. Papi<sup>a</sup>, Y. Piadyk<sup>c</sup>, E. Rossi<sup>b</sup>, A. Spiezia<sup>g</sup> and L. Servoli<sup>b</sup>.

- <sup>a</sup> INFN Sezione di Perugia, Via A. Pascoli, Italy
- <sup>b</sup> INFN Sezione di Pisa, Italy
- <sup>c</sup> LPNHE, Paris, France
- <sup>d</sup> Dipartimento di Ingegneria, Perugia, Via G. Duranti, Italy
- <sup>e</sup> Università di Pisa, Pisa, Italy
- <sup>f</sup> UNIMORE, Modena, Via Università, Italy
- <sup>g</sup> IHEP, China

E-mail: daniel.magalotti@pg.infn.it

ABSTRACT: The increase in luminosity at the HL-LHC will require the experiments to introduce tracker information into the Level-1 trigger system in order to maintain an acceptable trigger rate for selecting interesting events, despite the order-of-magnitude increase in minimum bias interactions. Dedicated hardware is needed to extract the track information within the required latency. We propose a prototype system, the Pattern Recognition Mezzanine, as the core of pattern recognition and track fitting for HL-LHC experiments, combining the power of both a custom Associative Memory ASIC and modern Field Programmable Gate Array (FPGA) devices.

KEYWORDS: Track trigger; FPGA; Associative Memory.

<sup>\*</sup> Corresponding author.

# Contents

- 1. Introduction
- 2. The Pattern Recognition Mezzanine prototype
  - 2.1 The data distribution
- 3. The experimental setup
- 4. Conclusion

# 1. Introduction

The increase of the Large Hadron Collider luminosity to  $5 \times 10^{34}$  cm<sup>-2</sup> s<sup>-1</sup>, foreseen for the High Luminosity phase in 2025, will bring the number of minimum bias interactions per bunch crossing up to about 140. Unless the thresholds are raised, the resulting background will cause an unmanageable increase in the trigger rate of the two general-purpose detectors, ATLAS and CMS.

The tracker information, which allows a more precise identification of particles and events, is the key to solving this problem, provided it can be processed within the required latency of a few microseconds; this is too fast for the implementations currently used at higher trigger levels. A dedicated hardware processor is hence needed to select interesting configurations at the 40 MHz bunch-crossing rate.

Currently this class of processor is provided by the Associative Memory (AM) [1] technology, already adopted in the CDF experiment [2] and more recently in the Fast Tracker processor [3] of the ATLAS Level-2 trigger, where a longer latency is allowed. CMS is pursuing a vigorous R&D programme to demonstrate the feasibility of this approach with state-of-the-art AM technology embedded in boards housed in ATCA crates; a similar approach is also being studied for the ATLAS upgrade.

# 2. The Pattern Recognition Mezzanine prototype

The Pattern Recognition Mezzanine (PRM) is a custom mezzanine that performs track reconstruction by combining Application Specific Integrated Circuit (ASIC) and Field Programmable Gate Array (FPGA) technologies (Figure 1). The PRM measures  $14.9 \times 14.9 \text{ cm}^2$  and hosts:

- two FMC connectors;
- a Xilinx Kintex 7 FPGA (XC7K355T);
- sixteen Associative Memory (AM05) devices [1];
- the area for the power regulators;
- an external DDRII memory.



Figure 1: The Pattern Recognition Mezzanine.

The two FMC connectors, compliant with the VITA 57.1 standard, interface the PRM with the Pulsar IIb [4]. They carry both power and signal pins. The Pulsar IIb provides the 3.3 V and 12 V voltage rails, with a maximum available power of 150 W. The power regulators generate on board the voltage rails (1 V, 1.2 V, 1.8 V and 2.5 V) used by the FPGA, the AM devices and the memory.

Regarding the signal lines, twelve high-speed serial links are used to send data to and retrieve data from the PRM, with a total input and output bandwidth of up to 96 Gbps. Moreover, sixty-eight additional LVDS pairs are used for slow configuration and monitoring of the PRM and to provide additional bandwidth for data transfer.

The Kintex-7 (XC7K355T) FPGA manages all the data distribution inside the mezzanine. It receives the full-resolution hits coming from the tracker layers over the high-speed serial links; a dual-clock FIFO bridges the two clock domains, namely the clock recovered by the FPGA transceiver and the internal system clock.

A Data Organizer (DO) has been implemented inside the FPGA. It consists of a smart database containing the Super Strip Identification (SSID) and Road ID (address of the matched pattern) values, according to the pattern bank written in the AM devices. It works in two different modes [5]:

- *write mode*: it stores the incoming full resolution hits according to their SSID value;
- *read mode*: it retrieves the full resolution hit according to the Road ID obtained by the AM chip.
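
As an illustration only, the two modes can be modelled in software as a lookup table keyed by layer and SSID. The sketch below is not the firmware implementation: the class `DataOrganizer`, the `ssid_shift` parameter (an SSID obtained by dropping the low-order bits of the hit coordinate) and the layout of `pattern_bank` are assumptions made for the example.

```python
from collections import defaultdict


class DataOrganizer:
    """Toy software model of the Data Organizer (hypothetical interface)."""

    def __init__(self, pattern_bank, ssid_shift=4):
        # pattern_bank: road_id -> tuple of SSIDs, one per layer
        # (a copy of the bank loaded into the AM devices).
        self.pattern_bank = pattern_bank
        # Assumption: the SSID is the hit coordinate without its low bits.
        self.ssid_shift = ssid_shift
        self.hits_by_ssid = defaultdict(list)

    def write(self, layer, hit):
        """Write mode: store a full-resolution hit under its SSID."""
        ssid = hit >> self.ssid_shift
        self.hits_by_ssid[(layer, ssid)].append(hit)
        return ssid  # the SSID is what would be sent to the AM devices

    def read(self, road_id):
        """Read mode: return, per layer, the hits belonging to a road."""
        ssids = self.pattern_bank[road_id]
        return [list(self.hits_by_ssid.get((layer, ssid), []))
                for layer, ssid in enumerate(ssids)]


# Usage: a two-layer bank with a single road (values are illustrative).
do = DataOrganizer({7: (0x1, 0x2)})
for layer, hit in [(0, 0x12), (0, 0x1F), (1, 0x25), (1, 0x35)]:
    do.write(layer, hit)
print(do.read(7))
```

Keying the table by SSID means that the read step is a direct lookup per layer rather than a search over all stored hits.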

The generated SSIDs are sent in the same clock cycle to all the AM devices. The results of the pattern matching are collected inside the FPGA, and all the possible hit combinations that can form actual tracks are produced. For each combination a fast linear fit is performed, computing the  $\chi^2$  value and the track parameters. Tracks with a  $\chi^2$  value below a predetermined threshold are considered good-quality fits.
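
The combinatorial step and the $\chi^2$ cut can be sketched as follows. The fit actually implemented in the FPGA is not detailed here, so an ordinary least-squares straight line stands in for it; the function names, the uniform resolution `sigma` and the layer coordinates are hypothetical.

```python
from itertools import product


def fit_line(points, sigma=1.0):
    """Least-squares fit of y = a + b*x; returns (a, b, chi2)."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    det = n * sxx - sx * sx  # non-zero when the layers sit at distinct x
    b = (n * sxy - sx * sy) / det
    a = (sy - b * sx) / n
    chi2 = sum(((y - (a + b * x)) / sigma) ** 2 for x, y in points)
    return a, b, chi2


def fit_road(layer_x, hits_per_layer, chi2_max, sigma=1.0):
    """Fit every one-hit-per-layer combination of a road and apply the cut."""
    good = []
    for combo in product(*hits_per_layer):
        a, b, chi2 = fit_line(list(zip(layer_x, combo)), sigma)
        if chi2 < chi2_max:
            good.append((combo, a, b, chi2))
    return good


# Usage: four layers, with two candidate hits on the second layer.
layer_x = [1.0, 2.0, 3.0, 4.0]
hits = [[1.0], [2.0, 5.0], [3.0], [4.0]]
for combo, a, b, chi2 in fit_road(layer_x, hits, chi2_max=0.5):
    print(combo, a, b, chi2)
```

The number of combinations is the product of the hit multiplicities per layer, which is why the roads are filtered by the AM devices before any fit is attempted.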

The FPGA implements several functions to control the AM devices (configuring and programming the bank) and also several tools for monitoring and debugging the internal data flow.

The sixteen AM devices are of the AM05 version [6], which stores 2 kpatterns per device, for a total capacity of up to 32 kpatterns per PRM. The pattern matching algorithm looks for a given sequence of data constituting a predefined pattern and is implemented with custom Associative Memory (AM) devices. The AM device has characteristics similar to those of a Content-Addressable Memory (CAM) [7]. Each pattern is stored in 8 independent 16-bit words, which hold the coordinates of the positions where the particle crosses the silicon detector (hits).

The power of this device lies in the independent and simultaneous comparison of each bus with the input data; every time a match occurs, all the matched patterns are read out.
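
A software emulation can make the matching logic concrete. In the sketch below the loop over patterns stands in for comparisons that the AM chip performs in parallel; the class name, the requirement that all 8 words match and the example bank are assumptions, not a description of the AM05 internals.

```python
class AssociativeMemoryModel:
    """Toy emulation of pattern matching on 8 buses of 16-bit words."""

    N_BUSES = 8
    WORD_MAX = 0xFFFF

    def __init__(self, patterns):
        for pattern in patterns:
            if len(pattern) != self.N_BUSES or any(
                    not 0 <= word <= self.WORD_MAX for word in pattern):
                raise ValueError("each pattern needs 8 words of 16 bits")
        self.patterns = [tuple(p) for p in patterns]
        self.reset()

    def reset(self):
        """Clear the per-bus match flags before a new event."""
        self.match_bits = [[False] * self.N_BUSES for _ in self.patterns]

    def feed(self, bus, word):
        """Compare one input word with the same bus of every stored pattern."""
        for road_id, pattern in enumerate(self.patterns):
            if pattern[bus] == word:
                self.match_bits[road_id][bus] = True

    def matched_roads(self):
        """Return the addresses (Road IDs) of the fully matched patterns."""
        return [road_id for road_id, bits in enumerate(self.match_bits)
                if all(bits)]


# Usage: two stored patterns that differ only in the last word.
am = AssociativeMemoryModel([(1, 2, 3, 4, 5, 6, 7, 8),
                             (1, 2, 3, 4, 5, 6, 7, 9)])
for bus, word in enumerate((1, 2, 3, 4, 5, 6, 7, 8)):
    am.feed(bus, word)
print(am.matched_roads())
```

Each pattern keeps its own match flags, so words can arrive on the buses in any order and the readout only has to collect the patterns whose flags are all set.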

The external memory provides up to 18 Mbit of storage for a copy of the AM pattern bank, which is used to generate the SSIDs and to complete the track fitting operation.

## 2.1 The data distribution

This section describes the architecture and the data distribution inside the PRM. The PRM hosts 16 AM chips that are logically divided into two groups of 8 chips, each served by independent lines from the FPGA. For the input data distribution, 8 independent serial links are replicated to all eight AM chips of a group by a cascade of two 1-to-4 fan-out buffers (Figure 2 (a)). The output bus of each AM chip, by contrast, is connected directly to the FPGA with a dedicated line (Figure 2 (b)). The latency introduced by the hardware design is thus minimized: it is due only to the fan-out buffers, which introduce less than 380 ps on each bus. The AM chips are configured through JTAG daisy chains, with the chips divided into four groups of four.

The connections to the FPGA are arranged so that the upper and the lower parts of the mezzanine can be controlled in parallel; event processing can therefore be parallelized inside the mezzanine itself, with the two halves working on different events.



Figure 2: (a) The logic distribution of the input data to the FPGA and to the AM chips for one half of the mezzanine; (b) the connections between the outputs of the AM devices and the FPGA.

# 3. The experimental setup

Figure 3 shows the test stand used to fully validate the design of the system. It consists of:

- a Xilinx Kintex UltraScale FPGA KCU105 evaluation board [8] that provides the power, the LVDS connections and the high-speed serial connections to the PRM;
- the interface board that compensates for the spatial mismatch between the FMC connectors;
- the PRM mezzanine.

The PRM is tested through the evaluation board, which is connected to a remote PC via Ethernet. We used the IPbus system [9] to access the evaluation board and execute the different tests.



Figure 3: The test stand for the Pattern Recognition Mezzanine.

These tests will be used at production time to validate the PRMs before they undergo the more complete tests executed in the ATCA crate, where they interface with the Pulsar IIb. The serial links of the FPGA and of all the AM chips available on the PRM were tested with this test stand.

The JTAG connection to all four groups was tested to verify that the AM chips can be configured and programmed. Moreover, the quality of all 8 input buses of each AM chip has been evaluated: Pseudo Random Bit Sequences (PRBS) are used to test the links between the FPGA and the AM chips. The input/output links between the Kintex-7 FPGA and the FMC connectors were tested with a loopback card and the IBERT tool provided by Xilinx, which measures the Bit Error Rate (BER) and the eye diagrams.
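
For reference, a PRBS-7 stream and a bit-error counter can be sketched in a few lines. The generator polynomial $x^7 + x^6 + 1$ is the one commonly used for PRBS-7; the seed, the function names and the assumption that the checker is already aligned with the received stream are choices made for this example and do not describe the IBERT internals.

```python
from itertools import islice


def prbs7(seed=0x7F):
    """Yield the bits of a PRBS-7 sequence (x^7 + x^6 + 1, period 127)."""
    if not 0 < seed < 128:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    while True:
        # Feedback bit: XOR of the two most significant register bits.
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        state = ((state << 1) | new_bit) & 0x7F
        yield new_bit


def count_bit_errors(received, seed=0x7F):
    """Count mismatches against a locally generated, aligned reference."""
    reference = prbs7(seed)
    return sum(rx != next(reference) for rx in received)


# Usage: send two periods and flip one bit to emulate a link error.
sent = list(islice(prbs7(), 254))
received = list(sent)
received[10] ^= 1
print(count_bit_errors(sent), count_bit_errors(received))
```

Dividing the error count by the number of bits checked gives the measured BER; a hardware checker would first have to synchronize its reference to the incoming stream.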

The signal integrity up to 8 Gbps has been tested by measuring the BER and the corresponding eye scans on the receiver links of the PRM (Figure 4(a)) and on the receiver links of the evaluation board (Figure 4(b)). The BER of the links was measured using a PRBS-7 sequence and was found to be less than  $2 \times 10^{-15}$  on both the up and the down links.
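
A BER limit obtained from an error-free run depends on the number of bits transferred. Assuming independent bit errors, zero observed errors in $N$ bits give an upper limit of $-\ln(1 - CL)/N$ at confidence level $CL$. The sketch below evaluates this standard relation; the 95% confidence level and the function names are assumptions, since the statistical treatment behind the quoted limits is not specified here.

```python
import math


def ber_upper_limit(bits_sent, confidence=0.95):
    """Upper limit on the BER after bits_sent bits with zero errors."""
    return -math.log(1.0 - confidence) / bits_sent


def error_free_time_s(ber_limit, line_rate_bps, confidence=0.95):
    """Error-free running time needed to claim BER < ber_limit."""
    bits_needed = -math.log(1.0 - confidence) / ber_limit
    return bits_needed / line_rate_bps


# Usage: time needed at 8 Gbps to reach a 2e-15 limit (95% CL assumed).
hours = error_free_time_s(2e-15, 8e9) / 3600.0
print(f"{hours:.1f} h of error-free running")
```

The required time scales inversely with both the target limit and the line rate, so a tighter limit at a lower line rate calls for a proportionally longer run.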



Figure 4: Eye scans of one of the GTX lines running at 8 Gbps in both directions: (a) from the Kintex-7 FPGA to the UltraScale FPGA and (b) vice versa.

The serial links between the FPGA and the AM chips were tested with a serial data analyzer by sending a PRBS-7 sequence on the links. The up links (Figure 5) and the down links (Figure 6) were successfully tested. To measure the BER of the links, the same PRBS-7 sequence was sent through each link for 8 h and no errors were observed, leading to a BER <  $8 \times 10^{-15}$  on the up and the down links.



Figure 5: Eye diagram of one of the GTX lines from FPGA to AM chip running at 2 Gbps.



Figure 6: Eye diagram of one of the GTX lines from AM chip to FPGA running at 2 Gbps.

These tests will guide the design of a next-generation board targeting more powerful FPGA devices, which will allow us to increase the AM data transfer rate, reduce the power consumption and further reduce the latency.

The power consumption of the board has been measured by powering and configuring the four groups of the AM chips one at a time. Table 1 summarizes the power consumption for different configurations of the PRM.

Table 1: The power consumption of the PRM with the configuration of the different components.

| Configuration                      | Power (W) |
|------------------------------------|-----------|
| FPGA programmed                    | 27        |
| One set of 4 AM chips configured   | 30        |
| Two sets of 4 AM chips configured  | 32        |
| Three sets of 4 AM chips configured | 34       |
| Four sets of 4 AM chips configured | 36        |
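
The increments in Table 1 can be read off with a short calculation. The sketch below uses only the tabulated values; attributing each increment entirely to the newly configured group of four AM chips is an assumption, since the 27 W baseline may already include the static consumption of unconfigured chips.

```python
# Measured board power (W) versus number of configured groups of 4 AM chips.
power_w = {0: 27, 1: 30, 2: 32, 3: 34, 4: 36}

# Extra power drawn each time one more group is configured.
increments_w = [power_w[n] - power_w[n - 1] for n in range(1, 5)]

# Total increase with all 16 AM chips configured, and its per-chip average.
am_total_w = power_w[4] - power_w[0]
per_chip_w = am_total_w / 16

print("increment per group (W):", increments_w)
print("total AM contribution (W):", am_total_w)
print("average per configured chip (W):", per_chip_w)
```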

# 4. Conclusion

The proposed processor for the L1 track trigger provides high computing power by combining the Associative Memory and FPGA technologies. It also implements the Data Organizer and Track Fitter stages to minimize the latency of the whole processing chain. The mezzanine board has been validated and is ready to be integrated into the L1 tracking trigger system for demonstration purposes.

# Acknowledgments

This project has received funding from the European Union's Seventh Framework Programme for Research, Technological Development and Demonstration under Grant agreement no. 31744, and from the Progetto PRIN MIUR DM 28.12.2012 n.957.

# References

- [1] A. Annovi et al., Associative Memory for L1 Track Triggering in LHC environment, IEEE Transactions on Nuclear Science, vol. 60, no. 5, pp. 3627–3632, Oct. 2013.
- [2] J. Adelman et al., The Silicon Vertex Trigger upgrade at CDF, Nuclear Instruments and Methods in Physics Research A 572 (2007) 361–364.
- [3] F. Alberti et al., Performance of the AMBFTK board for the FastTracker processor for the ATLAS detector upgrade, 2013 JINST 8 C01040.
- [4] J. Olsen et al., A full mesh ATCA-based general purpose data processing board, 2014 JINST 9 C01041.
- [5] C. Gentsos et al., Future evolution of the Fast TracKer (FTK) processing unit, Proceedings of Science, TIPP2014, 209.
- [6] A. Andreani et al., Characterisation of an Associative Memory Chip for high-energy physics experiments, Instrumentation and Measurement Technology Conference (I2MTC) Proceedings, 2014 IEEE, pp. 1487 – 149, DOI: 10.1109/I2MTC.2014.6860993.
- [7] K. Pagiamtzis and A. Sheikholeslami, Content-addressable memory (CAM) circuits and architectures: A tutorial and survey, IEEE Journal of Solid-State Circuits, vol. 41, no. 3, pp. 712–727, Mar. 2006, DOI: 10.1109/JSSC.2005.864128.
- [8] http://www.xilinx.com/products/boards-and-kits/kcu105.html
- [9] https://svnweb.cern.ch/trac/cactus