

INFN Training Course: Introduction to Trigger and Data Acquisition Techniques in Physics Experiments





#### Introduction to Radiation Effects in FPGAs and Mitigation Techniques

Raffaele Giordano **University of Naples "Federico II" and INFN Sezione di Napoli, I-80126, Italy**

Email: rgiordano@na.infn.it

# **Outline**

- Basics of radiation effects in microelectronics
- Introduction to radiation hardening by design in digital circuits
- Configuration scrubbing in reconfigurable logic
	- Focus on SRAM-based FPGAs








#### **BASICS OF RADIATION EFFECTS IN MICROELECTRONICS**

#### Radiation

- Radiation is the transfer of energy by means of particles (including photons)
	- For photons, usually only energies above the UV range (tens of eV) are considered
- Radiation effects in electronics can happen in
	- Artificially generated radiation environments [e.g. at particle accelerators (protons, neutrons, ions, electrons, gammas…)]
	- Space (mainly protons, alphas, heavier ions, electrons)
	- Earth Atmosphere (atmospheric neutrons)
- If high reliability is required, atmospheric neutrons are a concern even at sea level
	- Medical Applications, Automotive, Data Centers…







#### Effects of Radiation on Electronics

- Total Dose (TD) effects, *i.e.* cumulative effects that build up over time
	- Consequences: degradation of device characteristics in microelectronics, but also in Si-based detectors and optical components
	- Can be related to ionizing & non-ionizing energy loss
- Single Event Effects (SEEs)
	- Related to the passage of a single particle
	- Can be destructive or non-destructive
	- Consequences: data corruption, system shutdown, recoverable or non-recoverable malfunction, device destruction

# Effects of Radiation on Electronics (2)





### Effects of Radiation on Electronics (3)

- Energy deposited by ionization may cause
	- charge build-up in insulating layers (cumulative effect)
	- charge injection into sensitive nodes (single ionizing event effects)
- Energy deposited by displacing atoms from their lattice sites may cause
	- accumulation of damage to lattice/bulk (cumulative effects)
		- crystal structure damage
		- introduction of trap states and mid-band states

#### Radiation Dose

$$
D=\frac{E}{m}
$$

- Dose is the energy deposited per unit mass
	- Important for cumulative effects
	- Measurement units
		- Gy (1 Gy = 1 J/kg) and rad: 1 Gy = 100 rad, 1 rad = 1 cGy
	- Sometimes the material which the dose refers to is specified [e.g. Gy(Si) or Gy(H<sub>2</sub>O)]
- Two main types of dose:
	- Total Ionizing Dose (TID)
	- Non-Ionizing Energy Loss (NIEL) Dose also called Displacement Damage Dose (DDD)
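A minimal sketch of the dose arithmetic above, $D = E/m$ plus the Gy/rad conversion. The function names and the example numbers are illustrative only.

```python
def dose_gy(energy_j: float, mass_kg: float) -> float:
    """Dose D = E / m in gray (1 Gy = 1 J/kg)."""
    return energy_j / mass_kg

def gy_to_rad(d_gy: float) -> float:
    """1 Gy = 100 rad."""
    return d_gy * 100.0

def rad_to_gy(d_rad: float) -> float:
    """1 rad = 1 cGy = 0.01 Gy."""
    return d_rad / 100.0

# Example: 1 mJ deposited in 1 g of silicon gives 1 Gy(Si), i.e. 100 rad(Si)
d = dose_gy(1e-3, 1e-3)
print(d, gy_to_rad(d))
```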

#### Fluence and Flux

• Fluence (uppercase Φ) is the number of impinging particles per unit area

$$
\Phi = \frac{N}{S} \qquad \text{[particles/cm}^2 \text{ or cm}^{-2}\text{]}
$$

• Flux (lowercase φ) is the fluence per unit time, i.e. the fluence rate

$$
\varphi = \frac{\Phi}{\Delta t} \qquad \text{[particles/(cm}^2\,\text{s) or cm}^{-2}\,\text{s}^{-1}\text{]}
$$
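A short sketch of the two definitions above, plus the irradiation time needed to reach a target fluence at constant flux (a typical planning question for beam tests). The numbers are illustrative, not from a real campaign.

```python
def fluence(n_particles: float, area_cm2: float) -> float:
    """Fluence Phi = N / S [cm^-2]."""
    return n_particles / area_cm2

def flux(fluence_cm2: float, dt_s: float) -> float:
    """Flux phi = Phi / dt [cm^-2 s^-1]."""
    return fluence_cm2 / dt_s

def time_to_fluence(target_fluence: float, flux_cm2_s: float) -> float:
    """Irradiation time [s] needed to reach target_fluence at constant flux."""
    return target_fluence / flux_cm2_s

phi = fluence(4e10, 4.0)      # 1e10 particles/cm^2
rate = flux(phi, 1000.0)      # 1e7 particles/(cm^2 s)
print(phi, rate, time_to_fluence(1e11, rate))
```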



# TID Effects: V<sub>t</sub> shift in MOSFETs

- Charge can get trapped in
	- the oxide (oxide trapped charge) and at the Si/SiO<sub>2</sub> interface (interface trapped charge)
- Charge in the oxide is always positive
- The sign of the charge at the interface depends on the MOSFET type (PMOS or NMOS)
	- Positive charge in PMOS, negative in NMOS
- Trapped charge acts as a series voltage source and shifts the threshold voltage of the MOSFET

$$
\Delta V_{OT} = -\frac{Q_{OT}}{C_{ox}}
$$

$$
\Delta V_{IT} = -\frac{Q_{IT}}{C_{ox}}
$$



After, J. R. Schwank *et al*., doi: 10.1109/TNS.2008.2001040

- $C_{ox}$  = oxide capacitance per unit area
- $Q_{OT}$  = charge trapped in the oxide per unit area
- $Q_{IT}$  = charge trapped at the SiO<sub>2</sub>/Si interface per unit area

# TID Effects: V<sub>t</sub> shift in MOSFETs (2)

$$
\Delta V_t = \Delta V_{IT} + \Delta V_{OT} = -\frac{1}{C_{ox}} (Q_{IT} + Q_{OT}) = -\frac{t_{ox}}{\epsilon_{ox}} (Q_{IT} + Q_{OT})
$$

- $\epsilon_{ox}$  = dielectric constant (permittivity) of the oxide
- $t_{ox}$  = oxide thickness
- In the simplest model $Q_{OT} \propto t_{ox}$, so $\Delta V_{OT} \propto t_{ox}^2$
- At low doses $Q_{OT}$ dominates, at higher doses $Q_{IT}$ dominates
- NMOS $V_t$ shifts non-monotonically, while PMOS $V_t$ always shifts down
- Holes trapped in SiO<sub>2</sub> are not stable: they disappear (anneal) over time scales ranging from milliseconds to years
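A minimal numerical sketch of $\Delta V_t = -(t_{ox}/\epsilon_{ox})(Q_{IT}+Q_{OT})$, assuming a SiO<sub>2</sub> relative permittivity of 3.9. It also checks the $t_{ox}^2$ scaling of the simplest model $Q_{OT} = k\,t_{ox}$, where the constant `K` is a hypothetical value chosen only for illustration.

```python
EPS0 = 8.854e-14        # vacuum permittivity [F/cm]
EPS_OX = 3.9 * EPS0     # SiO2 permittivity [F/cm] (assumed relative permittivity 3.9)

def delta_vt(q_ot: float, q_it: float, t_ox: float) -> float:
    """Threshold-voltage shift [V].

    q_ot, q_it: signed oxide- and interface-trapped charge per unit area [C/cm^2]
    t_ox: oxide thickness [cm]
    """
    c_ox = EPS_OX / t_ox            # oxide capacitance per unit area [F/cm^2]
    return -(q_ot + q_it) / c_ox

def delta_v_ot_simple(k: float, t_ox: float) -> float:
    """Oxide-trapped contribution in the simplest model Q_OT = k * t_ox."""
    return delta_vt(k * t_ox, 0.0, t_ox)

K = 1e-3                    # hypothetical trapped charge per unit volume [C/cm^3]
thick, thin = 10e-7, 5e-7   # 10 nm and 5 nm oxides, in cm
# Halving t_ox reduces Delta V_OT by about 4x (t_ox^2 scaling)
print(delta_v_ot_simple(K, thick) / delta_v_ot_simple(K, thin))
```

This is also why thinner gate oxides in scaled technologies show milder V<sub>t</sub> shifts.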



#### Single Event Effects



- The cross section (σ) for SEEs is  $\sigma = \frac{N_{SEE}}{\Phi}$, where $N_{SEE}$ is the number of observed events and Φ the fluence
- Can be plotted against LET (e.g. for ions) or vs energy (e.g. for protons or neutrons)
- A fit with a Weibull function can be performed

$$
\sigma(LET) = \sigma_{sat} [1 - e^{-\left(\frac{LET - LET_{th}}{W}\right)^{S}}]
$$

- W and S are fit parameters
- $LET_{th}$ = threshold LET for the SEE
- $\sigma_{sat}$ = saturation cross section
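A sketch of the measured cross section and of the Weibull fit function above, with the curve set to zero at or below the threshold. The parameter values used in the loop are hypothetical; in practice $\sigma_{sat}$, $LET_{th}$, W and S come from fitting irradiation data.

```python
import math

def measured_cross_section(n_see: int, fluence_cm2: float) -> float:
    """Experimental cross section sigma = N_SEE / Phi [cm^2]."""
    return n_see / fluence_cm2

def weibull_sigma(let: float, sigma_sat: float, let_th: float, w: float, s: float) -> float:
    """Weibull fit of the SEE cross section vs LET; zero at or below the threshold."""
    if let <= let_th:
        return 0.0
    return sigma_sat * (1.0 - math.exp(-((let - let_th) / w) ** s))

# Hypothetical fit parameters: sigma_sat = 1e-7 cm^2, LET_th = 1, W = 10, S = 1.5
for let in (0.5, 2.0, 10.0, 40.0, 100.0):
    print(let, weibull_sigma(let, 1e-7, 1.0, 10.0, 1.5))
```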



# Single Event Effects (2)

- Considering the particle energies in space and terrestrial radiation environments, SEEs in CMOS devices are mainly induced
	- Directly by heavy ions (Z>1)
	- Indirectly by protons and neutrons, through nuclear reactions with Si, dopants (e.g. n + <sup>10</sup>B → <sup>4</sup>He + <sup>7</sup>Li), oxygen or other atoms in the device (the secondaries produce the ionization)
- Neutrons cannot induce SEEs directly since they are neutral
- Protons *generally* cannot induce SEEs directly: their LET (< 0.1 MeV·cm<sup>2</sup>/mg) is too low
- For E > 50 MeV, secondaries from protons and neutrons produce almost identical effects
- Electrons and muons *generally* cannot induce SEEs directly (LET too low) nor indirectly (they cannot induce nuclear reactions)

#### Proton Secondaries in Silicon

• The contribution of secondaries to the overall LET depends strongly on the primary proton energy



Fig. 2. LET versus energy of nuclei produced by protons in silicon.

Fig. 8. Contribution to LET spectrum as a function of atomic number for 50 MeV protons.

Fig. 10. Contribution to LET spectrum as a function of atomic number for 200 MeV protons.

After, D. M. Hiemstra and E. W. Blackmore, doi: 10.1109/TNS.2003.821811

#### Single Event Transient

- In a MOS transistor in the OFF state, the drain-substrate junction is a reverse-biased PN junction
- Charges generated by the particle diffuse and drift (due to the electric field) in the semiconductor and are collected at the drain
	- This motion results in a current pulse and a voltage transient between drain and source
- NMOS transistors are more sensitive than PMOS (electron mobility is higher than hole mobility)



Ion track. After A. Chugg, «SET Generation & Definition – Overview»



#### Single Event Upset

- SETs may occur in memory elements (e.g. latches, flip-flops, SRAM cells…)
- In this case they might be captured by the feedback loop of the element and become persistent errors
- They can be cleared by rewriting (refreshing) the correct memory content
- An SEU happens only if the collected charge exceeds a minimum critical charge ($Q_c$)



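$$
Q_c = \frac{L_{th}\, t\, \rho\, q}{E_{pair}}
$$

- $L_{th}$ = threshold LET, $t$ = depth of the sensitive volume, $\rho$ = density of Si, $q$ = elementary charge
- $E_{pair}$ = 3.6 eV in Si (energy needed to create one electron-hole pair)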
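A worked version of the critical-charge formula, assuming a Si density of 2330 mg/cm<sup>3</sup> and expressing LET in MeV·cm<sup>2</sup>/mg and the depth of the sensitive volume in µm. It reproduces the common rule of thumb that 1 MeV·cm<sup>2</sup>/mg deposits about 10 fC per µm of silicon; the inverse function gives the threshold LET for a given $Q_c$.

```python
Q_E = 1.602176634e-19   # elementary charge [C]
E_PAIR_EV = 3.6         # energy per electron-hole pair in Si [eV]
RHO_SI = 2330.0         # Si density [mg/cm^3]

def collected_charge_fc(let: float, depth_um: float) -> float:
    """Charge [fC] generated along depth_um of Si by a particle with LET [MeV cm^2/mg].

    With let = L_th this is the critical charge Q_c of the formula above.
    """
    depth_cm = depth_um * 1e-4
    energy_ev = let * RHO_SI * depth_cm * 1e6      # MeV -> eV
    n_pairs = energy_ev / E_PAIR_EV
    return n_pairs * Q_E * 1e15                    # C -> fC

def threshold_let(qc_fc: float, depth_um: float) -> float:
    """LET [MeV cm^2/mg] needed to deposit qc_fc within depth_um (inverse of the above)."""
    return qc_fc / collected_charge_fc(1.0, depth_um)

print(collected_charge_fc(1.0, 1.0))   # about 10.4 fC
```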

R. Giordano - INFN TDAQ Course 2023 - RHBD

#### Impact of Scaling on Radiation Effects

- Generally newer technologies scale to smaller feature sizes
- Scaling of MOSFETs makes some of the LET-related (single-event) effects more important
- A single ion track may span drain, channel and source, or even more than one transistor
- The critical charge for SEUs becomes lower => higher cross section, more MBUs
- Gate oxides become thinner => milder V<sub>t</sub> shift



S. DasGupta et al. , "Effects of Well and Substrate Potential Modulation on Single Event Pulse Shape in Deep Submicron CMOS", IEEE Trans. Nucl. Sci, vol.54, n. 6 Dec. 2007.

#### Multiple Bit Upsets

- In modern CMOS technologies (65 nm and below), ion-track sizes are comparable to transistor feature sizes
- A single track can deposit charge in several devices and upset multiple memory elements
- The effect grows as feature size shrinks in planar technologies
	- e.g. in 28 nm CMOS SRAMs up to 25% of SEUs are MBUs
- Newer non-planar devices, such as FinFETs, can behave differently



#### Radiation Hardening Techniques

- In order to make a device sufficiently radiation hard for an application, it is possible to pursue one or more of the following approaches
	- Hardening by process
		- modifying one or more steps of the fabrication process (e.g. SOI, removal of boron, limiting oxygen in Si)
	- Hardening by layout
		- modifying the device geometry (e.g. enclosed layout transistors)
	- Hardening by design
		- modifying the overall circuit or system (usually by means of some form of redundancy)










#### **RADIATION HARDENING BY DESIGN IN DIGITAL CIRCUITS**

#### RHBD Techniques in Digital ICs

- Triple Modular Redundancy
	- Several flavours
- "Safe" Finite State Machines
- Error Correcting Codes
	- Storage or transmission

#### Triple Modular Redundancy (TMR)



- **Key concept: minimize the number of parts of the system that can cause an overall failure [e.g. single points of failure (SPOFs)]**
- Generate three replicas of the same circuit (module) and majority vote their outputs
- Can mask an error in a single replica (A0, A1 or A2)
- More than 3x area/power overhead (three replicas plus the voter)
- This is a flushable design: faults (e.g. SETs and SEUs) propagate through the logic and are flushed out, while the voter masks them
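A minimal software model of the voter: the bitwise 2-out-of-3 majority is `(a & b) | (b & c) | (a & c)`, the same expression a hardware voter implements per bit. The `tmr` helper and the example function are hypothetical, used only to show that a corrupted output in any single replica is masked.

```python
def majority(a: int, b: int, c: int) -> int:
    """Bitwise 2-out-of-3 majority of three words."""
    return (a & b) | (b & c) | (a & c)

def tmr(f, x, fault=None):
    """Run three replicas of f on x and vote; fault=(replica, mask) flips bits in one output."""
    outs = [f(x) for _ in range(3)]
    if fault is not None:
        idx, mask = fault
        outs[idx] ^= mask            # emulated SET/SEU in one replica
    return majority(*outs)

square_8bit = lambda x: (x * x) & 0xFF
print(tmr(square_8bit, 7, fault=(1, 0x5A)))   # 49: the faulty replica is outvoted
```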

#### TMR in Non-flushable Designs



- In general a design is not flushable
- A feedback loop may keep errors in the module
- Voting only the module outputs masks errors in a single module, but does not correct them

#### Logic Loop Voting



- aka Distributed TMR in Synplify synthesis tools
- Majority-vote the replica outputs inside the feedback loops, with one voter per replica
- Errors in each module are not only masked, but also removed by the voters in each module
- Higher area penalty w.r.t. plain TMR, but more reliable
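A small cycle-based sketch contrasting output-only voting with logic loop voting on a tripled 8-bit counter (the counter register is the feedback loop). The upset cycle, replica and bit mask are hypothetical parameters. Both schemes produce correct voted outputs, but only loop voting brings the upset replica back in sync.

```python
def majority(a, b, c):
    return (a & b) | (b & c) | (a & c)

def run(cycles, loop_voting, upset_cycle=2, upset_replica=1, mask=0x4, width=8):
    """Simulate three replicated counters; an SEU flips `mask` in one register.

    Returns (voted output per cycle, final register contents of the three replicas).
    """
    regs = [0, 0, 0]
    voted = []
    for cyc in range(cycles):
        if cyc == upset_cycle:
            regs[upset_replica] ^= mask
        v = majority(*regs)
        voted.append(v)
        if loop_voting:
            # a voter in each replica's feedback path: next state computed from the voted value
            regs = [(v + 1) % (1 << width) for _ in range(3)]
        else:
            # output-only voting: each replica feeds back its own (possibly wrong) state
            regs = [(r + 1) % (1 << width) for r in regs]
    return voted, regs

print(run(6, loop_voting=False))   # error masked at the output but still stored in replica 1
print(run(6, loop_voting=True))    # error removed from replica 1
```

With output-only voting the latent error means a second upset in another replica would defeat the voter.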

#### Connections Between Tripled Modules



- When two tripled modules are interconnected, the voters providing input to the next module should also be tripled (otherwise they are a SPOF)
- When a tripled module drives a non-tripled module a single voter must be used (that is a SPOF!)
- Ideally tripling should continue up to the output pins of the integrated circuit



# High-fan-out Networks

- A permanent or transient fault in a high-fan-out network (HFN) (e.g. clock, reset) may affect several elements
- TMR schemes might be ineffective against such faults
- To mitigate this issue the entire HFN can be replicated



P. Hao and S. Chen, doi: 10.1109/TDMR.2017.2733218

#### Safe Finite State Machines

- Error detection and reset of the FSM (or transition to a default state)
- "Safe-encoding" FSMs available in standard synthesis tools include additional logic to detect transitions to **invalid** states (effective for one-hot encoded FSMs)
- Dedicated high-reliability synthesis tools can implement safer FSMs capable of detecting also invalid transitions to **valid** states (also when using area-efficient binary and Gray coding)
- Error correcting codes (e.g. Hamming) on the state word are normally used to address this issue
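A sketch of the "safe" one-hot idea for a hypothetical three-state FSM: any code that is not a valid one-hot state sends the machine back to the default state. It also shows why one-hot encoding suits this check: a single bit flip of a one-hot code always yields an invalid code (zero or two bits set), whereas in binary encoding a single flip can land on another valid state.

```python
IDLE, RUN, DONE = 0b001, 0b010, 0b100   # hypothetical one-hot state codes
VALID = {IDLE, RUN, DONE}

def next_state(state: int, start: bool, finished: bool) -> int:
    if state not in VALID:        # invalid code, e.g. after an SEU in the state register
        return IDLE               # safe recovery: go to the default state
    if state == IDLE:
        return RUN if start else IDLE
    if state == RUN:
        return DONE if finished else RUN
    return IDLE                   # DONE -> IDLE

print(next_state(RUN ^ IDLE, start=False, finished=False))   # 1 (IDLE)
```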



#### Error Correcting Codes

- Cyclic redundancy checks (only error detecting)
	- Calculate the remainder of polynomial division between data and a predefined polynomial
	- Long sequence of shift and XOR operations, very unlikely to get the correct remainder if data gets altered
- Hamming codes
	- Parity-based code providing single error correction (SEC); adding one overall parity bit gives single error correction and double error detection (SECDED)
	- m data bits require n parity bits, where n is the smallest integer satisfying  $2^n \geq m+n+1$
	- Overhead grows logarithmically with m
	- E.g. 11 data bits require 4 parity bits, 1000 data bits require 10 parity bits
- Reed-Solomon *(n, k)*
	- Considers the original data as k symbols (each symbol may consist of multiple bits) and generates n symbols (n>k)
	- Can correct up to  $\lfloor (n-k)/2 \rfloor$  of the n symbols
	- A single incorrect bit in a symbol makes the whole symbol incorrect
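A sketch of the numbers quoted above: the Hamming parity-bit count from $2^n \geq m+n+1$, the Reed-Solomon correction capability, and a bitwise CRC as a sequence of shifts and XORs. The CRC-8 generator 0x07 ($x^8+x^2+x+1$) is an example choice, not one prescribed by the slides.

```python
def hamming_parity_bits(m: int) -> int:
    """Smallest n with 2**n >= m + n + 1 (single-error-correcting Hamming code)."""
    n = 0
    while 2 ** n < m + n + 1:
        n += 1
    return n

def rs_correctable_symbols(n: int, k: int) -> int:
    """Maximum number of erroneous symbols an RS(n, k) code can correct."""
    return (n - k) // 2

def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8, MSB first, initial value 0: remainder of data * x^8 divided by the generator."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc

print(hamming_parity_bits(11), hamming_parity_bits(1000))   # 4 10
print(rs_correctable_symbols(255, 223))                      # 16
```

Appending the CRC to the data makes the remainder of the whole message zero, which is how a receiver checks integrity.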



#### Hamming
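A minimal sketch of single error correction with the classic Hamming(7,4) code: parity bits sit at positions 1, 2 and 4, and the syndrome (the XOR of the positions holding a 1) directly gives the position of a single flipped bit. This is the plain SEC variant; SECDED would add one overall parity bit.

```python
def hamming74_encode(d):
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword (positions 1..7)."""
    c = [0] * 8                          # index 0 unused
    c[3], c[5], c[6], c[7] = d           # data bits at non-power-of-two positions
    c[1] = c[3] ^ c[5] ^ c[7]            # parity over positions with bit 0 set
    c[2] = c[3] ^ c[6] ^ c[7]            # parity over positions with bit 1 set
    c[4] = c[5] ^ c[6] ^ c[7]            # parity over positions with bit 2 set
    return c[1:]

def hamming74_decode(code):
    """Return (corrected data bits, syndrome); syndrome = flipped position, 0 if none."""
    c = [0] + list(code)
    syndrome = 0
    for pos in range(1, 8):
        if c[pos]:
            syndrome ^= pos
    if syndrome:
        c[syndrome] ^= 1                 # correct the single erroneous bit
    return [c[3], c[5], c[6], c[7]], syndrome

word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                             # upset at position 5 (list index 4)
print(hamming74_decode(word))            # ([1, 0, 1, 1], 5)
```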





### Synthesis Tools for High Reliability

- Are the mentioned mitigation techniques meant to be implemented manually?
- Rarely. Usually, the safest way is to use tools for automated high-reliability synthesis
	- E.g. Some commercially available tools for FPGAs are XTMR (Xilinx only, US only), Synopsys Synplify Premier, Mentor Precision Hi-Rel
- Be careful because sometimes tools have limitations for TMR on IP blocks (e.g. Synplify does not support black boxes)




#### Remarks About TMR

- In general, TMR
	- Increases area & power
	- Reduces frequency performance (the voters consume part of the timing budget)
- To be effective, replicas should have independent failure probabilities
	- Physical separation of replicas is a good practice
- The designer should carefully assess the reliability benefit before adopting it
	- Sometimes applying it only to critical functions of the circuit is a fair trade-off between reliability, power, area and performance








### **CONFIGURATION (SELF-)SCRUBBING IN FPGAS**

#### Field Programmable Gate Arrays

- FPGAs are arrays of configurable logic resources
	- Configurable Logic Blocks
	- Programmable Interconnect
	- Input Output Blocks
	- Block RAMs (few Mb), multiply and accumulate for DSP, high-speed transceivers (up to 58 Gbps per differential pair)
- All programmable features are determined by the content of memory cells, collectively known as "configuration memory"
- Widely used in scientific and industrial applications for real-time data processing and transfer





### Configuration Memory Types in FPGAs

- All FPGAs presently on the market are fabricated with CMOS processes
- The technology for the configuration memory can vary



#### FPGAs Vs Radiation



#### SRAM-based Field Programmable Gate Arrays

- Many devices are TD-tolerant and latch-up free
	- e.g. Xilinx 7-Series withstand TD > 1 kGy
- Highest performance among FPGAs – LVDS up to 1.6 Gbps, SerDes up to 112 Gbps
- Always remember the configuration is the circuit!
- SEUs in configuration RAM might alter functionality and must be corrected
- Radiation-hardened FPGAs exist, but they are mostly aimed at space applications and their price can be as high as 50 k\$ per device! (e.g. Virtex-5QV)
- We refer to AMD (Xilinx) devices in this lecture



#### Programmable Interconnect


- Programmable routing is a key element of an FPGA
- An interconnection is realized by properly opening or closing switches between predefined metal lines
	- Programmable Interconnection Points (PIPs)
	- Switch matrices (arrays of PIPs)
- Typically ~ 90% of configuration bits determine routing! (excluding BRAMs)



Xilinx XC3000

# Configuration Memory in Xilinx FPGAs



- Configuration RAM is divided into rows
	- Horizontal slices along the device
- Each row is divided into columns – vertical slices within a row
- Columns configure different resources (CLB, routing, IO, etc.)
- Each column is divided into frames
- The frame is the minimum accessible element of configuration memory
	- In 7-Series a frame is 3232 bits (101 32-bit words) long
- The number of frames in a column depends on the type of resources configured by that column
	- In 7-Series a column contains a few tens of frames (28-128)
- In Xilinx devices one row is exactly as tall as one clock region
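A sketch of how a frame is addressed in 7-Series devices. It assumes the Frame Address Register (FAR) layout as we recall it from the 7-Series configuration user guide (UG470): block type [25:23], top/bottom [22], row [21:17], column [16:7], minor (frame within the column) [6:0]. Check the field widths against the guide for the actual device before relying on them.

```python
FRAME_WORDS = 101                # 7-Series frame length in 32-bit words
FRAME_BITS = FRAME_WORDS * 32    # = 3232 bits

def decode_far(far: int) -> dict:
    """Split a 7-Series frame address into its fields (assumed UG470 layout)."""
    return {
        "block_type": (far >> 23) & 0x7,   # e.g. 0: CLB/IO/CLK, 1: BRAM content
        "top_bottom": (far >> 22) & 0x1,   # 0: top half, 1: bottom half of the device
        "row":        (far >> 17) & 0x1F,
        "column":     (far >> 7) & 0x3FF,
        "minor":      far & 0x7F,          # frame index within the column
    }

def encode_far(block_type: int, top_bottom: int, row: int, column: int, minor: int) -> int:
    """Inverse of decode_far."""
    return (block_type << 23) | (top_bottom << 22) | (row << 17) | (column << 7) | minor

print(hex(encode_far(0, 0, 1, 12, 5)), FRAME_BITS)
```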

#### Bitstream

- Access to the configuration is managed by reading/writing dedicated registers
- Before and after the frame data proper, commands must be supplied to initialize these registers according to the needed behavior
	- Read/write configuration frames
	- Activate IO/interconnect
	- FPGA shutdown/start
	- CRC check
	- IDCODE verification
	- Dynamic elements masking in readback
- Collectively, these commands and data are called the bitstream
- A bitstream is generated by CAD tools and can be loaded through several interfaces
	- JTAG, SelectMAP, ICAP, BPI, SPI
- How to look in a bitstream?
	- <http://torc-isi.sourceforge.net/> supports up to Virtex-6
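A minimal bitstream-walking sketch for 7-Series devices. It assumes the packet format as we recall it from UG470: a sync word 0xAA995566, then type-1 headers with header type [31:29] = 001, opcode [28:27] (00 NOP, 01 read, 10 write), register address starting at bit 13, word count [10:0]; type-2 headers (010) carry a larger word count for the register of the preceding type-1 packet. Only a subset of register addresses is listed, and the IDCODE value in the example stream is a placeholder.

```python
import itertools

SYNC_WORD = 0xAA995566
# Subset of 7-Series configuration register addresses (assumed, per UG470)
REGS = {0x00: "CRC", 0x01: "FAR", 0x02: "FDRI", 0x03: "FDRO", 0x04: "CMD", 0x0C: "IDCODE"}
OPS = {0: "NOP", 1: "READ", 2: "WRITE", 3: "RESERVED"}

def parse_packets(words):
    """Yield (opcode, register, payload) for each packet following the sync word."""
    it = iter(words)
    if SYNC_WORD not in it:           # consumes words up to and including the sync word
        return
    last_reg = None
    for w in it:
        header_type = w >> 29
        op = OPS[(w >> 27) & 0x3]
        if header_type == 1:
            last_reg = (w >> 13) & 0x3FFF    # register address field
            count = w & 0x7FF                # 11-bit word count
        elif header_type == 2:
            count = w & 0x7FFFFFF            # register taken from the previous type-1 packet
        else:
            continue                         # not a header handled by this sketch
        if op == "NOP":
            continue
        payload = list(itertools.islice(it, count))
        name = REGS.get(last_reg, hex(last_reg)) if last_reg is not None else None
        yield op, name, payload

stream = [
    0xFFFFFFFF,              # dummy word
    SYNC_WORD,
    0x20000000,              # type-1 NOP
    0x30018001, 0x12345678,  # write 1 word to IDCODE (placeholder value)
    0x30008001, 0x00000007,  # write 1 word to CMD
]
for pkt in parse_packets(stream):
    print(pkt)
```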



#### E.g. Xilinx 7-Series



#### Configuration Registers



#### TMR & Configuration Scrubbing

- Reliability  $R(t)$  is the probability of correct operation in the time interval $[0, t]$, assuming correct operation at $t=0$, i.e. $R(0)=1$
- Let us suppose we want to estimate the reliability of a TMR system (with an ideal voter) where the simplex reliability decays exponentially, $R_{simplex}(t) = e^{-\lambda t}$; the system works as long as at least two modules work:

$$
R_{TMR} = 3R_{simplex}^2 - 2R_{simplex}^3
$$

- The mean time to failure of the TMR system ( $MTTF_{TMR} = \frac{5}{6\lambda}$ ) is worse than the MTTF of each module ( $MTTF_{simplex} = \frac{1}{\lambda}$ )
- For TMR to be beneficial,  $R_{simplex}$  must be > 0.5
- Configuration repair (aka "scrubbing") can restore R(t)
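A numerical check of the statements above, assuming an ideal voter and exponential simplex reliability: $R_{TMR}$ exceeds $R_{simplex}$ only while $R_{simplex} > 0.5$ (i.e. for $t < \ln 2/\lambda$), and integrating $R_{TMR}(t)$ over time gives $MTTF_{TMR} = 5/(6\lambda)$, less than the simplex $1/\lambda$. Scrubbing helps precisely because it keeps the system in the early, high-reliability region.

```python
import math

def r_simplex(t: float, lam: float) -> float:
    return math.exp(-lam * t)

def r_tmr(r: float) -> float:
    """Ideal-voter TMR works if at least 2 of 3 modules work: 3R^2 - 2R^3."""
    return 3 * r**2 - 2 * r**3

def mttf_tmr(lam: float, steps: int = 100_000, t_max_factor: float = 40.0) -> float:
    """MTTF = integral of R_TMR(t) dt, computed with the midpoint rule."""
    t_max = t_max_factor / lam
    dt = t_max / steps
    return sum(r_tmr(r_simplex((i + 0.5) * dt, lam)) for i in range(steps)) * dt

print(r_tmr(0.9), r_tmr(0.4))   # TMR better than simplex only when R > 0.5
print(mttf_tmr(1.0))            # close to 5/6 for lambda = 1
```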



#### SEU Accumulation Vs Power Consumption



- Irradiation @ 5 Gy/min (Si)
- 5174 SEUs in 484 s (time to failure)
- 641 SEU/min => 128 SEU/Gy (Si)
- Total current increase 15 mA => 2.8 µA/SEU
- 62-MeV proton irradiation of Virtex-5 LX50T FPGA
- Accumulation of SEUs in the configuration leads to current increase
	- Possibly due to clashes (contention) on programmable routing
- Dynamic and static core currents exhibit very similar trends => the increase is due to the quiescent current
- Reconfiguration partially reduces the current increase

#### External Vs Internal Scrubbing

- Configuration scrubbing can be performed in several ways
	- **Blind scrubbing**, i.e. overwriting the configuration with an image retrieved from a "golden" (radiation-hardened) memory (simple and fast, but no information about upsets is gathered)
	- **Readback scrubbing**, configuration is read back and compared to the "golden" image, only in case of differences it is written back
	- More complex readback scrubbing methods based on error correcting codes on frame data, or even tripled configuration
- The **scrubber can be internal or external** to the device
	- If internal, it is itself subject to SEUs in the FPGA, but the system is more compact
	- There are vendor-provided, ready-made solutions
- Scrubbers normally cannot correct errors in BRAMs, distributed RAM and flip-flops, whose content changes during operation
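A sketch of blind vs readback scrubbing on a hypothetical in-memory model of the configuration: a list of frames, each a list of 32-bit words, with a golden copy. Real scrubbers access frames through a configuration port (e.g. ICAP or JTAG) and must mask dynamic bits; neither is modeled here.

```python
def blind_scrub(config, golden):
    """Overwrite every frame with the golden copy; no upset information is gathered."""
    for i, frame in enumerate(golden):
        config[i] = list(frame)

def readback_scrub(config, golden):
    """Compare each frame to the golden copy and rewrite only corrupted frames.

    Returns a log of (frame_index, word_index, flipped_bits) for each corrupted word.
    """
    log = []
    for i, ref in enumerate(golden):
        diffs = [(w, got ^ exp) for w, (got, exp) in enumerate(zip(config[i], ref)) if got != exp]
        if diffs:
            log.extend((i, w, flipped) for w, flipped in diffs)
            config[i] = list(ref)       # write back the golden frame
    return log

golden = [[i * 16 + w for w in range(4)] for i in range(3)]
config = [list(f) for f in golden]
config[1][2] ^= 1 << 5                  # emulated SEU
print(readback_scrub(config, golden))   # [(1, 2, 32)]
```

The readback variant costs an extra read per frame but yields an upset log, which is useful both for monitoring and for estimating the in-situ upset rate.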



# The Soft Error Mitigation Controller

- Xilinx Soft Error Mitigation (SEM) controller
	- Internal scrubber
	- **not radiation-tolerant**!
	- designed for very low upset rate environments (e.g. Earth atmosphere)
	- Accesses configuration via the internal configuration access port (ICAP)
	- Hamming codes at frame level and CRC at device level
	- Log/control via UART
	- Ready-to-use IP, just select options and generate core



#### Table 2-5: Non-SSI: Max Error Correction Latency (100 MHz) No Throttling on Monitor Interface

#### Notes:

1. BFR is an error condition due to a multi-bit upset in an enhanced repair checksum stored in block RAM.

#### **SEM resource occupation**



#### TMR-based External Scrubbing



- Redundant FPGAs with identical bitstreams and operating on the same data
- Pros
	- TMR also for functionality
	- Simple to implement
	- Very high correction capability
	- Low repair latency
- Cons
	- TMR to generate redundant configuration
	- 3x cost and power
	- need for a radiation-tolerant voter

#### TMR-based Internal Scrubbing



J. Tonfat et al., IEEE Trans. Nucl. Sci., vol. 62, no. 6, pp. 3080–3087, Dec. 2015

- Redundant modules in the same FPGA
- Pros
	- TMR for configuration and functionality
	- Very high correction capability
	- Low repair latency
	- No additional devices (memories or redundant FPGAs)
- Cons
	- 3x dynamic power w.r.t. the unprotected design
	- Requires some strategy for generating identical layout (i.e. identical configuration) for the tripled modules
	- A possible tool for this is RapidSmith (rapidsmith.sourceforge.net)

#### Example of Custom ECC-based Scrubbing



- Clustering frames and adding parity bits to each frame for error detection
- Erasure code for error correction in clusters
	- Wrong frame gets "erased" (i.e. ignored) and replaced by means of code calculation
- Pros:
	- Minimal power consumption increase
	- Low memory overhead
- Cons:
	- Repair requires readback of whole cluster, latency depends on cluster size
	- Correction capability depends on number of clusters (1 frame per cluster can be corrected)
	- Efficiency of code depends on device

P. M. B. Rao, M. Ebrahimi, R. Seyyedi, and M. B. Tahoori, "Protecting SRAM-based FPGAs against multiple bit upsets using erasure codes," in Proc. 51st Annu. Des. Autom. Conf., San Francisco, CA, USA, Jun. 2014, pp. 1–6.
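A sketch of the cluster idea using the simplest possible erasure code: one parity bit per frame for detection and one XOR frame per cluster, so a single erased frame can be rebuilt from the others. This is a stand-in chosen for clarity; it does not reproduce the specific code construction of the cited work, and a single per-frame parity bit misses an even number of flips within a frame.

```python
from functools import reduce

def parity_bit(frame):
    """Parity of the total number of 1 bits in a frame (detects an odd number of flips)."""
    return bin(reduce(lambda a, b: a ^ b, frame, 0)).count("1") & 1

def xor_frames(frames):
    """Word-wise XOR of a list of equal-length frames."""
    return [reduce(lambda a, b: a ^ b, words, 0) for words in zip(*frames)]

def make_cluster(frames):
    """Per-frame parity bits plus one XOR 'erasure' frame for the cluster."""
    return [parity_bit(f) for f in frames], xor_frames(frames)

def repair_cluster(frames, parities, xor_frame):
    """Detect a bad frame by its parity, erase it and rebuild it from the others.

    Only one bad frame per cluster can be corrected; returns the bad frame indices.
    """
    bad = [i for i, f in enumerate(frames) if parity_bit(f) != parities[i]]
    if len(bad) == 1:
        i = bad[0]
        others = [f for j, f in enumerate(frames) if j != i]
        frames[i] = xor_frames(others + [xor_frame])
    return bad

frames = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
parities, xf = make_cluster(frames)
corrupted = [list(f) for f in frames]
corrupted[1][0] ^= 1
print(repair_cluster(corrupted, parities, xf), corrupted)
```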

# Redundant-configuration-based Scrubbing



1. Generate the FPGA bitstream and identify the used configuration frames
2. Replicate the used frames and keep track of redundant and remaining empty frames
3. During FPGA operation, **majority vote** redundant frames for error detection and correction and **check** empty frames through ICAP

- Self-contained, no need for external memories
- Robust, can correct all the bits in a frame
	- Assuming no errors in homologous redundant bits
- Configuration replicas do not receive a clock => *no additional dynamic power consumption*
- Approach portable to several FPGA families
- Drawbacks:
	- need to analyze the bitstream
	- need an at least 3x bigger device

R. Giordano, "Method for generating redundant configuration in FPGAs," PCT Application no. PCT/IB2018/060461, Dec. 20, 2018.

R. Giordano et al., "Configuration Self-repair in Xilinx FPGAs," in IEEE Trans. on Nucl. Sci., vol. 65, no. 10, pp. 2691-2698, Sept. 2017. **Open Access**  <https://ieeexplore.ieee.org/document/8456573>
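A sketch of step 3 on a hypothetical model where the configuration is a dict from frame address to a list of 32-bit words: replicas of each used frame are voted bit by bit and rewritten if they disagree, and unused frames are expected to be all zeros. The addresses and data layout are illustrative; the actual method works on real frame addresses through ICAP.

```python
def majority(a, b, c):
    return (a & b) | (b & c) | (a & c)

def scrub_redundant(config, triplets, empty):
    """Vote redundant frames and clear unexpected bits in empty frames.

    config:   dict frame_address -> list of 32-bit words (hypothetical model)
    triplets: list of (addr0, addr1, addr2) holding replicas of one used frame
    empty:    addresses of unused frames, expected to contain only zeros
    Returns the addresses that were rewritten.
    """
    fixed = []
    for addrs in triplets:
        voted = [majority(*words) for words in zip(*(config[a] for a in addrs))]
        for a in addrs:
            if config[a] != voted:
                config[a] = list(voted)
                fixed.append(a)
    for a in empty:
        if any(config[a]):
            config[a] = [0] * len(config[a])
            fixed.append(a)
    return fixed

config = {0: [5, 6], 1: [5, 6], 2: [5, 6], 3: [0, 0]}
config[1][1] ^= 0x10        # upset in one replica
config[3][0] = 1            # upset in an empty frame
print(scrub_redundant(config, [(0, 1, 2)], [3]))   # [1, 3]
```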

# On-the-fly Self-Scrubbing at System Level

- Example from the Belle II Aerogel Ring Imaging Cherenkov (ARICH) detector
- Star read-out topology
- Front-end board (FEB) FPGAs are programmed with the same bitstream => redundancy at system level



- Parallel readback of FEB (Spartan-6) configuration from Merger (Virtex-5)
- Real-time 4-out-of-6 bitwise majority voting on JTAG streams (TDOs) for error detection
- Quick single frame reconfiguration for error correction
- This readout topology is widely used in DAQ systems, so the solution is easily exportable
- R. Giordano, doi: 10.1109/TNS.2021.3127446
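A sketch of the 4-out-of-6 decision rule applied to TDO bit streams read back in parallel from six FEBs: a bit value is accepted only if at least 4 streams agree, and boards disagreeing with it are flagged for single-frame reconfiguration. In the real system this runs in the Merger FPGA logic in real time; the Python below only models the voting rule, and the stream contents are made up.

```python
def vote_k_of_n(bits, k):
    """Return the value backed by at least k votes, or None if neither value reaches k."""
    ones = sum(bits)
    if ones >= k:
        return 1
    if len(bits) - ones >= k:
        return 0
    return None

def check_tdo_streams(streams, k=4):
    """streams: equal-length lists of TDO bits, one per board.

    Returns ({board: [positions disagreeing with the vote]}, [undecided positions]).
    """
    errors = {i: [] for i in range(len(streams))}
    undecided = []
    for pos, bits in enumerate(zip(*streams)):
        value = vote_k_of_n(bits, k)
        if value is None:
            undecided.append(pos)
            continue
        for i, b in enumerate(bits):
            if b != value:
                errors[i].append(pos)
    return errors, undecided

streams = [[1, 0, 1, 1] for _ in range(6)]
streams[2][1] = 1                       # upset read back from board 2
print(check_tdo_streams(streams))
```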



#### Mean Fluence Before Failure





- $\Phi_b$  = proton fluence to failure for Benchmark
- TMR on the Benchmark improves  $\Phi_{b}$  by 6.5x (no scrubber) and 2.4x (Scrubber B)
- Scrubber B increases  $\Phi_b$  by 3.9x without TMR and by 1.3x with TMR
- Scrubber D increases  $\Phi_b$  by 1.5x
- $\Phi_{s}$  = proton fluence to failure for Scrubber

#### Power Consumption



- Experimental results during 62-MeV proton irradiation
- When scrubbing is not active, upsets accumulate and the power consumption at the core (1.0 V) and auxiliary (1.8 V) power supplies increases with fluence
- When scrubbing is active, the behavior differs before and after the failure of the scrubber
	- Before: upsets are detected and removed, power consumption stable ( $\Delta I < 1$ mA)
	- After: upsets begin to accumulate, power consumption at core grows, eventually benchmark fails

### (TMR) Domain Crossing Errors

- When TMR domains share resources (interconnect or CLBs), single- and multiple-bit upsets can affect more than one TMR domain
- For instance, SEUs in switch matrices (SMs) may corrupt more than one TMR domain by
	- bridging between TMR domains
	- disconnecting global logic constants (GLCs)
- Experimental studies on FPGAs show that TMR can be defeated even by single-bit upsets
- Routing is a concern
	- A Virtex-II accelerator test showed that 48% of critical SEUs involved routing

H. Quinn et al. doi: 10.1109/TNS.2007.910870.











#### Placement and Routing Hardening

- Reliability-oriented place-and-route algorithms minimize SPOFs; they are optimized to
	- avoid placing logic belonging to different domains in the same CLB
	- reduce the number of PIPs
	- reduce the number of switch matrices that mix TMR domains
	- avoid routing GLCs through switch matrices
	- achieve complete module isolation
- Examples
	- Xilinx Isolation Design Flow (available in Vivado and ISE)
	- RoRA academic tool designed by PoliTo\*
- \*L. Sterpone, M. S. Reorda and M. Violante, doi: 10.1109/RME.2005.1543031.



#### Xilinx Isolation Design Flow

# Fault-injection Testing

- We have seen that configuration access ports (SelectMAP, JTAG, ICAP, etc.) can be used to repair configuration
- They can also be used to intentionally alter the configuration to emulate SEUs and become a tool for testing
- This makes it possible to probe the sensitivity of the circuit to upsets before performing irradiation tests
	- Limitations of this approach:
		- SEUs cannot be injected into the configuration access port itself (no SEFIs)
		- SETs are not emulated
	- Examples of fault injectors: Xilinx SEM\*, FLIPPER\*\*
	- It is also possible to inject faults via JTAG by means of dedicated scripts (see the configuration user guide)

\*www.xilinx.com/video/fpga/seu-integration-test-by-error-injection.html
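A toy fault-injection campaign on a hypothetical single 4-input LUT rather than a real device: each configuration bit is flipped in turn (an emulated SEU), the test vectors are applied, and bits whose upset changes any output are reported as critical. The LUT implements a 2-input XOR with the other two inputs tied to 0, so only 4 of its 16 bits matter, which mirrors why only a fraction of a real bitstream is sensitive.

```python
def lut_eval(config_bits: int, inputs: tuple) -> int:
    """Evaluate a 4-input LUT: the output is the config bit indexed by (i3 i2 i1 i0)."""
    index = sum(bit << n for n, bit in enumerate(inputs))
    return (config_bits >> index) & 1

def injection_campaign(golden: int, test_vectors) -> list:
    """Flip each config bit in turn and return the bits whose upset changes any output."""
    reference = [lut_eval(golden, v) for v in test_vectors]
    critical = []
    for bit in range(16):
        faulty = golden ^ (1 << bit)               # emulated SEU in the configuration
        if [lut_eval(faulty, v) for v in test_vectors] != reference:
            critical.append(bit)
    return critical

# LUT configured as XOR of i0 and i1; i2 and i3 are tied to 0 in this hypothetical design
XOR_LUT = 0b0110
vectors = [(a, b, 0, 0) for a in (0, 1) for b in (0, 1)]
print(injection_campaign(XOR_LUT, vectors))   # [0, 1, 2, 3]
```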





# Thank you!

#### Questions?



Backup







# Single Event Latch-up

- CMOS devices include a parasitic PNPN structure
- Normal operation: neither BJT conducts (base of Q1 at  $V_{DD}$ and base of Q2 at  $V_{SS}$ )
- There is a current loop path between Q1 and Q2 (in a BJT,  $I_C = \beta I_B$ )
- If  $\beta_1\beta_2 > 1$, charge injected into the loop might trigger a self-amplifying current
- The resulting low-impedance path between V<sub>SS</sub> and V<sub>DD</sub> may cause a permanent failure of the component
- This potentially destructive effect is called single event latch-up (SEL)
- Countermeasures include reducing the well and substrate resistances  $R_w$  and  $R_s$, introducing STIs, or using SOI technologies



#### Shallow Trench Isolation

#### Parasitic pnpn structure





# Other Single Event Effects



#### Configuration-Redundant Self-scrubbing Circuits

- An internal scrubber can access the configuration memory by means of the internal configuration access port (ICAP in Xilinx devices)
- The scrubber is a critical module
	- If it malfunctions, it can **damage** the circuit instead of fixing it
	- Its MTTF must be sufficiently high in the given radiation environment or its benefits might be marginal



R. Giordano et al. "Configuration Self-repair in Xilinx FPGAs," in IEEE Trans. on Nucl. Sci., vol. 65, no. 10, pp. 2691-2698, Sept. 2017. **Open Access**  <https://ieeexplore.ieee.org/document/8456573>

# Design Flow for FPGAs

**Design Verification** 

- Description of logic with hardware description languages and requirements with constraints
- Synthesis of logic to device primitives (CLBs, BRAMs, etc.)
- Placement and routing
- Verification of functionality at several intermediate steps
- Static timing analysis
- Generation of the bitstream



#### Layout Hardening

- In NMOS devices, positive charge trapped in shallow trench isolation (STI) oxides activates parasitic paths at the sides of the channel
- Enclosed layout transistors have no STI around the channel
- Typical area overhead >2-4x, power overhead > 2x
- Example
	- DARE90U library from IMEC for UMC 90nm CMOS supports up to 3 kGy TID
	- https://dare.imec-int.com/technologies



After L. Ratti, Total Dose Effects in Electronic Devices and Circuits – Legnaro, April 13th 2011

#### DARE90 Enclosed Layout Transistor

After G. Thys, Microelectronics Presentation Days 2010



#### Redundant Configuration at System Level

- Case study: Aerogel Ring Imaging Cherenkov (ARICH) detector of Belle II
	- 420 aerogel tiles read by Hybrid APDs (HAPDs), each HAPD read by one Spartan-6 FPGA
	- Spartan-6 (45 nm CMOS process) uses boron for p-type doping, hence it is sensitive to thermal neutrons (25 meV)
	- 3.3 kSEU/h expected in the FEB FPGAs at the collider design luminosity





• Merger boards aggregate data from the FEBs onto a high-speed link and manage the FPGA configuration via JTAG