# INFN

# BROOKHAVEN NATIONAL LABORATORY

# **Radiation testing campaign results for understanding the suitability of FPGAs in detector electronics**

<u>Alessandra Camplani<sup>(1)</sup></u>, Matthew Cannon<sup>(2)</sup>, Hucheng Chen<sup>(3)</sup>, Kai Chen<sup>(3)</sup>, Mauro Citterio<sup>(1)</sup>, Binwei Deng<sup>(4)</sup>, Chonghan Liu<sup>(4)</sup>, Tiankuan Liu<sup>(4)</sup>, Chiara Meroni<sup>(1)</sup>, James Kierstead<sup>(3)</sup>, Helio Takai<sup>(3)</sup>, Michael Wirthlin<sup>(2)</sup>, Jingbo Ye<sup>(4)</sup>.

<sup>(1)</sup>INFN Milan, Milan, Italy; <sup>(2)</sup>Brigham Young University, Provo, USA;

<sup>(3)</sup>Brookhaven National Laboratory, Upton, NY, USA; <sup>(4)</sup>Southern Methodist University, Dallas, USA.

# Introduction

SRAM-based Field Programmable Gate Arrays (FPGAs) have rarely been used in High Energy Physics (HEP) because of their sensitivity to radiation. The latest generation of commercial FPGAs, based on a 28 nm feature size and on Silicon On Insulator (SOI) technologies, is tolerant enough to radiation that its use in front-end electronics is now feasible. FPGAs provide re-programmability, high-speed computation and fast data transmission through embedded serial transceivers. They could replace custom application-specific integrated circuits in front-end electronics in locations with a moderate radiation field. The use of an FPGA in HEP experiments is limited only by our ability to mitigate single event effects induced by the high-energy hadrons present in the radiation field.

# Radiation environment in HEP experiments

The radiation background is a mixed field of hadrons, electrons and photons. The expected background for the ATLAS Liquid Argon (LAr) calorimeter electronics in the Phase II run is shown in Figure 1 and in Table 1.

Radiation-induced failures in electronics are tested in facilities:

- with particle energy spectra similar to the expected HEP environment
- at high rates, to find Single Event Effects (SEE) with small cross sections

Figure 2 shows the irradiation facilities used in the present study.

|                      | Simulation<br>(one year)                | Safety<br>Factor | Test Target*<br>(10 years)              |
|----------------------|-----------------------------------------|------------------|-----------------------------------------|
| Ionizing<br>Dose     | 3.0 rad                                 | 10               | 100 krad                                |
| 1 MeV<br>eq. Neutron | 6.0 x 10 <sup>11</sup> cm <sup>-2</sup> | 2                | 1.2 x 10 <sup>13</sup> cm <sup>-2</sup> |
| Hadrons<br>(>20 MeV) | 8.5 x 10 <sup>10</sup> cm <sup>-2</sup> | 2                | 2 x 10 <sup>12</sup> cm <sup>-2</sup>   |









Radiation effects on electronics:

- Single event effects
- Total ionizing dose
- Displacement damage

# **FPGA Kintex 7**

Our experiments evaluate the performance of the Xilinx Kintex-7 XC7K325T chip under radiation. The Kintex 7 FPGA delivers high signal processing capability and low power consumption:

- 326K of logic cells
- 16 Mb of BRAM
- 400K User FFs
- 840 DSP Slices
- BRAM Built in Error Correcting Code (ECC)
- 1.0 V core voltage
- 16 GTX transceivers (MGTs)

The transceivers operate from 500 Mb/s to 12.5 Gb/s, for a maximum aggregate bandwidth of 200 Gb/s.

The experiments were performed using a Kintex 7 Evaluation Board (Figure 3).

Figure 3 - Xilinx Kintex 7 board.



Table 1 - The radiation background in the ATLAS LAr Barrel electronics.

# How to provide radiation hardness

FPGAs are sensitive to Single Event Upset (SEU):

- Defined as a change in the logic state of a cell
- Non-destructive SEE
- Mitigated using Triple Modular Redundancy (TMR) and scrubbing

TMR is implemented by (Figure 4):

- triplicating hardware resources
- using majority vote on hardware outputs

TMR tolerates:

- single faults and many multiple fault combinations

TMR is almost always coupled with scrubbing.

A scrubbing architecture:

- Continuously monitors the configuration memory
- Repairs SEU within the memory in real-time
- Performs fault repair

Figure 4 - Triple Modular Redundancy technique.

Figure 1 - Simulated spectrum of particles for the ATLAS Liquid Argon calorimeter electronics.

Figure 2 - Comparison between the ATLAS LAr neutron spectrum with those from test facilities.
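The majority voting used by TMR and the repair loop of a scrubber can be sketched in software as follows. This is only an illustration of the two ideas, not the firmware used on the DUT: the names `majority_vote` and `scrub` are hypothetical, and the scrubber is assumed to have a golden copy of the memory to compare against, whereas a real scrubber may rely on ECC or CRC readback instead.

```python
from typing import List


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority of three redundant outputs."""
    return (a & b) | (a & c) | (b & c)


def scrub(memory: List[int], golden: List[int]) -> int:
    """Compare each word with the golden copy, repair mismatches in place,
    and return the number of repaired words."""
    repaired = 0
    for index, (word, reference) in enumerate(zip(memory, golden)):
        if word != reference:
            memory[index] = reference
            repaired += 1
    return repaired


# A single upset in one replica is masked by the voter.
good = 0b1011_0010
upset = good ^ 0b0000_1000
assert majority_vote(good, upset, good) == good

# The same bit upset in two replicas defeats the voter,
# which is why upsets must be repaired before they accumulate.
assert majority_vote(upset, upset, good) != good

# Scrubbing restores a corrupted configuration word.
golden = [0xDEADBEEF, 0x12345678, 0x00000000]
memory = list(golden)
memory[1] ^= 1 << 7
assert scrub(memory, golden) == 1
assert memory == golden
```

The second assertion shows why the two techniques are paired: TMR masks a fault only while the other two replicas are still correct, and scrubbing is what keeps them correct over time.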

# **Parameters**

Rad-hard techniques must prevent:

- Build-up of configuration errors in CRAM
- Errors that "break" the SEC/DED code in BRAM
- Corruption of transmitted data
- Transmitter/receiver de-synchronization

Parameters measured during the experiments are:

- FPGA supply voltages and currents
- Configuration RAM (CRAM) cross section
- Block RAM (BRAM) cross section
- Data link failure rates
- Single Event Functional Interrupts (SEFIs)
- Single Event Latch-up (SEL)
# H4IRRAD results

The H4IRRAD test area is located near the H4 beam line at CERN.

- One of the few mixed-field (hadrons and neutrons) test areas available
- Attenuated primary 400 GeV/c proton beam from the CERN SPS, or secondary 280 GeV/c proton beam, over a Cu target (1 m long, 7.5 cm diameter)

Large experimental area with difficult access to the Device Under Test (DUT), Figure 5.

- 50 hours of testing (1.8 x 10<sup>9</sup> hadrons)
- Large uncertainties on total particle fluence at DUT position
- Ionization dose deposited by hadrons not well known
- No current monitoring during the test

The estimated static cross sections are shown in Table 2.

Figure 5 - H4IRRAD test facility

|         | CRAM (cm<sup>2</sup>/bit) | BRAM (cm<sup>2</sup>/bit) |
|---------|---------------------------|---------------------------|
| H4IRRAD | $1.50 \times 10^{-14}$    | $1.40 \times 10^{-14}$    |

Table 2 - Static cross section measurements for the H4IRRAD facility.

# **Neutron results**

Tests were performed at LANSCE - WNR (Los Alamos), with a maximum energy of 800 MeV, and at TSL (Uppsala), with a maximum energy of 200 MeV. Both provide a wide neutron spectrum similar to the cosmic ray background. Over 30 hours of neutron testing were accumulated (> $5.7 \times 10^{10}$ neutrons). The parameters under test were BRAM and CRAM. Results are shown in Table 3.

Mitigation strategies implemented on the DUT:

- TMR
- Multi-level (internal and external) scrubbing

No SEFI and no SEL were observed during the experiment.

Some preliminary tests on MGT data transmission:

- Two lanes tested at 5 Gbps
- Approximately 6 link failures observed
- Failures not correlated with error types

MGT failures were studied in more detail with the proton tests.

Figure 6 - LANSCE test facility

|        | CRAM (cm<sup>2</sup>/bit) | BRAM (cm<sup>2</sup>/bit) |
|--------|---------------------------|---------------------------|
| LANSCE | $6.89 \times 10^{-15}$    | $6.15 \times 10^{-15}$    |
| TSL    | $6.55 \times 10^{-15}$    | -                         |

Table 3 - Cross section measurements for LANSCE and TSL facilities

# **Proton results**

Two different experiments were performed at The Svedberg Laboratory (TSL) in Uppsala, Sweden, with 180 MeV protons, to:

- Re-measure the CRAM and BRAM cross sections
- Evaluate the performance of the GTX transceivers of a Kintex 7 FPGA

A proton beam is useful to simulate the complex radiation environment expected at the LHC accelerator. Protons deposit ionization dose and induce displacement damage.

In the first experiment, the parameters under test were BRAM and CRAM. The results, shown in Table 4, confirm those obtained with neutrons.

|     | CRAM (cm<sup>2</sup>/bit) | BRAM (cm<sup>2</sup>/bit) | MGT Configuration Error* (cm<sup>2</sup>/bit) |
|-----|---------------------------|---------------------------|-----------------------------------------------|
| TSL | $8.29 \times 10^{-15}$    | $8.19 \times 10^{-15}$    | BRAM x 400                                    |

Table 4 - Cross section measurements for TSL facility

\*Approximately 400 configuration bits used per lane

In the second experiment, 13 bidirectional lanes were tested with configuration scrubbing but without TMR (Figure 7).

Events of interest identified during the experiment are (Table 5):

- Configuration Error: due to the circuitry surrounding the MGT and not a problem with the actual MGT
- Lane Error: caused by an MGT failure
- DUT Error: caused all 13 lanes to fail simultaneously, which suggests that some form of global failure (SEFI) is occurring

Half of the MGT errors could probably be prevented by applying common mitigation techniques, such as TMR, to the circuit.

| Error Type         | Cross Section<br>(errors/lane)/(p/cm <sup>2</sup> ) |
|--------------------|-----------------------------------------------------|
| Config. Error      | 3.81 x 10 <sup>-12</sup>                            |
| Lane Error         | 2.51 x 10 <sup>-12</sup>                            |
| DUT Error          | 6.12 x 10 <sup>-13</sup>                            |
| Composite<br>Error | 7.27 x 10 <sup>-12</sup>                            |

Table 5 - Different error type cross section
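A per-lane cross section can be turned into an expected error count by multiplying it by the fluence the lane receives. The sketch below does this for the Lane Error cross section, using the one-year simulated hadron fluence from Table 1. Applying a cross section measured with 180 MeV protons to the hadron (>20 MeV) fluence is an assumption, and the lane count and live time are hypothetical placeholders, so this does not attempt to reproduce any system-level estimate quoted by the authors.

```python
def errors_per_lane(cross_section_cm2: float, fluence_per_cm2: float) -> float:
    """Expected number of errors on one lane for a given particle fluence."""
    return cross_section_cm2 * fluence_per_cm2


def mean_minutes_between_errors(
    per_lane_per_year: float, n_lanes: int, live_minutes_per_year: float
) -> float:
    """Mean time between errors anywhere in a system of n_lanes lanes."""
    return live_minutes_per_year / (per_lane_per_year * n_lanes)


SIGMA_LANE_ERROR = 2.51e-12  # (errors/lane)/(p/cm^2), from Table 5
HADRON_FLUENCE_1Y = 8.5e10   # cm^-2, one-year simulation value from Table 1

# Hypothetical system parameters, for illustration only.
N_LANES = 1000
LIVE_MINUTES_PER_YEAR = 365 * 24 * 60  # assumes continuous, uniform exposure

per_lane = errors_per_lane(SIGMA_LANE_ERROR, HADRON_FLUENCE_1Y)
interval = mean_minutes_between_errors(per_lane, N_LANES, LIVE_MINUTES_PER_YEAR)

print(f"expected lane errors per lane per year: {per_lane:.3f}")
print(f"mean time between lane errors, {N_LANES} lanes: {interval:.0f} minutes")
```

The interval scales inversely with both the lane count and the flux, so a realistic estimate depends entirely on the actual number of links and on how the yearly fluence is distributed over beam time.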

A modest increase in the core internal logic current, experienced during the test, is shown in Table 6. The final power consumption is however within the component specification. No significant functional errors were observed after exposing the board to $1.55 \times 10^{13}$ proton/cm<sup>2</sup> of fluence. Ionization dose deposited during the experiment exceeded 300 krad.

Figure 7 - Experimental setup at TSL in Uppsala.

|       | Current Increase |
|-------|------------------|
| Day 1 | 7.64%            |
| Day 2 | 17.89%           |
| Day 3 | 21.07%           |

Table 6 - Current increase during the experiment

# **Cross section comparison**

|                    | CRAM (cm <sup>2</sup> /bit) | BRAM (cm <sup>2</sup> /bit) | Fluence (particle/cm <sup>2</sup> ) |
|--------------------|-----------------------------|-----------------------------|-------------------------------------|
| H4IRRAD* - Hadron  | $1.50 \times 10^{-14}$      | $1.40 \times 10^{-14}$      | 1.8 x 10 <sup>9</sup>               |
| LANSCE** - Neutron | 6.89 x 10 <sup>-15</sup>    | 6.15 x 10 <sup>-15</sup>    | 5.7 x 10 <sup>10</sup>              |
| TSL - Neutron      | 6.55 x 10 <sup>-15</sup>    | -                           | >5.7 x 10 <sup>10</sup>             |
| TSL*** - Proton    | 8.29 x 10 <sup>-15</sup>    | 8.19 x 10 <sup>-15</sup>    | 1.3 x 10 <sup>13</sup>              |

\*H4 normalization has an error of 50%

\*\*LANSCE normalization has an error of 18%

\*\*\*During proton irradiations more than ~300 krad of ionizing dose was deposited on the DUT
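One way to read the comparison is to ask whether the CRAM cross sections agree once the quoted normalization errors are taken into account. The sketch below treats each normalization error as a symmetric relative uncertainty on the cross section itself. That is a simplification (the error is on the fluence, which enters the cross section as a divisor), and the helper names `interval` and `overlaps` are hypothetical.

```python
from typing import Tuple

Interval = Tuple[float, float]


def interval(value: float, rel_error: float) -> Interval:
    """Symmetric interval value * (1 +/- rel_error)."""
    return value * (1.0 - rel_error), value * (1.0 + rel_error)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two closed intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# CRAM cross sections (cm^2/bit) and normalization errors from the table notes.
h4irrad = interval(1.50e-14, 0.50)
lansce = interval(6.89e-15, 0.18)

print(f"H4IRRAD: {h4irrad[0]:.2e} to {h4irrad[1]:.2e} cm^2/bit")
print(f"LANSCE:  {lansce[0]:.2e} to {lansce[1]:.2e} cm^2/bit")
print("intervals overlap:", overlaps(h4irrad, lansce))
```

Under this simplified treatment the H4IRRAD and LANSCE intervals overlap, so the higher H4IRRAD value is not necessarily in tension with the neutron result.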

# **Conclusion and Outlook**

The results presented summarize a two-year study that is still ongoing.

No permanent operational failures were observed up to the radiation levels to which the FPGA was exposed.

CRAM and BRAM cross section results indicate that TMR plus multi-level scrubbing are essential to mitigate SEU in Kintex 7 FPGAs.

The estimated MGT lane error rate suggests that the configurable logic interfacing with the MGT is the most sensitive part of the FPGA. However, mitigation methods that were not applied during our tests could significantly reduce this sensitivity. Based on the results obtained in these experiments, we estimate approximately one lane failure every 6.5 minutes in the full-scale ATLAS LAr calorimeter system. Future experiments are planned with improved TMR and scrubbing mitigation strategies to further reduce the present error rates.

We are also working towards installing a test board in ATLAS to gather more realistic data.