## L1 Tracking Triggers for the High Luminosity ATLAS and CMS Trackers

Fabrizio Palla INFN Pisa

on behalf of ATLAS and CMS



# Why a Track Trigger at L1



- HL-LHC physics goals require excellent trigger selectivity on basic objects (leptons, jets, taus, b-jets, MET)
  - This might be jeopardized by the increased level of pileup (140 events on average)
    - Huge rate of $\mu$ from heavy-flavor decays $\Rightarrow$ use the better p<sub>T</sub> resolution of the tracker
    - Prompt electrons at L1 must be separated from the huge $\gamma$ background $\Rightarrow$ match with tracker tracks
    - High-$E_T$ jets from (many) different primary vertices $\Rightarrow$ jet-vertex association
    - Photon isolation in the calorimeters is compromised by the large pileup $\Rightarrow$ use tracks





F. Palla INFN Pisa



## The challenge and the way out





Istituto Nazionale di Fisica Nucleare




# Trigger architectures

### PUSH path (CMS)

Reduced Tracker information readout at 40 MHz and then combined with calorimeter & muon at L1

Trigger objects made from tracking, calorimeter & muon inside a Global Trigger module

### PULL path (ATLAS)

Use calorimeter & muon detectors to produce a "Level-0" to request tracking information in specific regions

Tracker sends out information from regions of interest to form a new combined L1 trigger








### The L0+L1 scheme

Level-0:

- Coarse calorimeter and muon data
- Rate 40 MHz $\rightarrow$ 500 kHz
- Latency < 6.4 $\mu$s
- Defines Regions of Interest (RoIs) for L1

Level-1:

- Tracker data only from RoIs
- Refined information from calorimeters and muons
- Rate 500 kHz $\rightarrow$ 200 kHz
- Latency < 20 $\mu$s

#### Issues for FTK to be used in Phase 2

- The larger pileup (x2.5), rate (x5) and granularity
  - ~One order of magnitude more patterns; no $p_T$ filtering $\Rightarrow$ raise the $p_T$ threshold
- The need to cope with a shorter latency (20 $\mu$s instead of 200 $\mu$s)








### **ATLAS readout**



Figure: ATLAS Tracker for HL-LHC (axis: z (m))



### ATLAS L0 and Regional Readout Requests (R3) implementations



L0 trigger accept rate: 500 kHz

- On an L0 accept, copy the data from the primary to the secondary buffer
- Identify the "regions of interest" (1-10% of the detector on each L0 accept)
- Generate a "Regional Readout Request" (R3)
  - Since only ~10% of the Tracker data are read out, the total bandwidth with the Track Trigger is only 50% larger than without it.
- To reduce the latency, a prioritization scheme using a dedicated R3 buffer is envisaged
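The prioritization idea can be illustrated with a toy model of a front-end link that keeps R3 fragments in a dedicated buffer and always sends them before ordinary L1 readout data. This is a minimal Python sketch under our own assumptions: the class and method names (`FrontEndReadout`, `request_r3`, `request_l1`, `send_next`) are hypothetical, and the real scheme would be implemented in front-end ASIC/FPGA logic rather than software.

```python
from collections import deque


class FrontEndReadout:
    """Hypothetical front-end link with a dedicated R3 buffer.

    Fragments requested by a Regional Readout Request (R3) are always
    transmitted before ordinary L1-accept readout fragments.
    """

    def __init__(self):
        self.r3_buffer = deque()  # dedicated, high-priority R3 buffer
        self.l1_buffer = deque()  # ordinary L1 readout buffer

    def request_r3(self, fragment):
        self.r3_buffer.append(fragment)

    def request_l1(self, fragment):
        self.l1_buffer.append(fragment)

    def send_next(self):
        """Return the next fragment to transmit (R3 first), or None if idle."""
        if self.r3_buffer:
            return self.r3_buffer.popleft()
        if self.l1_buffer:
            return self.l1_buffer.popleft()
        return None


link = FrontEndReadout()
link.request_l1("L1 evt 7")
link.request_r3("R3 evt 12")  # arrives later but jumps the queue
link.request_l1("L1 evt 8")
sent = [link.send_next() for _ in range(4)]
print(sent)  # ['R3 evt 12', 'L1 evt 7', 'L1 evt 8', None]
```

A strict-priority rule keeps the R3 latency independent of the L1 readout backlog; a real design would also have to bound how long L1 data can be starved.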






### Simulation results - ATLAS





# Select only hits from "high-p<sub>T</sub>" tracks

Select "high-p<sub>T</sub>" tracks (>2 GeV) by correlating hits in two closely spaced sensors (a "stub")
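The stub selection rests on a simple geometric fact: a track of transverse momentum $p_T$ in a field $B$ has radius $R = p_T/(0.3B)$ ($R$ in m, $p_T$ in GeV, $B$ in T), so between two sensors separated by $\Delta r$ at radius $r$ its hits are offset by roughly $\Delta x \approx \Delta r\, r/(2R)$. A $p_T$ cut therefore becomes a window on $\Delta x$. The sketch below assumes B = 3.8 T (the CMS solenoid), a module at r = 0.5 m and a 2 mm sensor gap; the module position, gap and function names are illustrative only, and real front-end logic works with strip indices and lookup tables rather than floating point.

```python
B_TESLA = 3.8  # CMS solenoid field


def bend_offset_mm(pt_gev, r_m, sensor_gap_mm, b_tesla=B_TESLA):
    """Approximate r-phi offset (mm) between the two hits of a track
    crossing two sensors sensor_gap_mm apart at radius r_m (small-angle)."""
    radius_m = pt_gev / (0.3 * b_tesla)  # radius of curvature in metres
    return sensor_gap_mm * r_m / (2.0 * radius_m)


def find_stubs(inner_hits_mm, outer_hits_mm, r_m, sensor_gap_mm,
               pt_min_gev=2.0, b_tesla=B_TESLA):
    """Pair inner/outer hits whose offset is compatible with pT >= pt_min_gev."""
    window = bend_offset_mm(pt_min_gev, r_m, sensor_gap_mm, b_tesla)
    return [(x_in, x_out)
            for x_in in inner_hits_mm
            for x_out in outer_hits_mm
            if abs(x_out - x_in) <= window]


stubs = find_stubs([10.0], [10.057, 10.57], r_m=0.5, sensor_gap_mm=2.0)
print(stubs)  # [(10.0, 10.057)]
```

With these numbers the 2 GeV window is 0.285 mm (about three 100 $\mu$m strips): the hit offset by 0.057 mm (roughly a 10 GeV track) forms a stub, while the one offset by 0.57 mm (roughly 1 GeV) is rejected.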



- In the end-caps, the required sensor spacing depends on the location of the module
  - End-cap configurations typically require wider spacing (up to ~4 mm)

$$\Delta z = \Delta R / \tan\theta$$
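In the end-caps the two sensors are separated along z, and a track crossing them moves radially by $\Delta R = \Delta z \tan\theta$, so the same radial lever arm requires $\Delta z = \Delta R/\tan\theta$. Since $\tan\theta = 1/\sinh\eta$, this can also be written $\Delta z = \Delta R \sinh\eta$: the needed z separation grows quickly with pseudorapidity, which is why the end-cap spacing depends on the module location. A small Python sketch (the $\Delta R$ = 1 mm value and the $\eta$ points are arbitrary examples):

```python
import math


def theta_from_eta(eta):
    """Polar angle theta (rad) from pseudorapidity eta."""
    return 2.0 * math.atan(math.exp(-eta))


def endcap_sensor_gap_mm(delta_r_mm, eta):
    """z separation giving the same radial lever arm delta_r_mm for a
    track at pseudorapidity eta: dz = dR / tan(theta) = dR * sinh(eta)."""
    return delta_r_mm / math.tan(theta_from_eta(eta))


for eta in (1.2, 1.6, 2.0, 2.4):
    print(eta, round(endcap_sensor_gap_mm(1.0, eta), 2))
```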





# CMS 2S modules







### Electronics







## **CMS PS modules**



#### P(ixel)-S(trip) module

- Strips: 100 $\mu$m x 2.4 cm
- Pixels: 100 $\mu$m x 1.5 mm
- Pixels are logically OR-ed to find coincidences in the r-$\phi$ plane, while the precise z coordinate is retained in the pixel storage and provided to the trigger processors.

Figure: PS module front-end block diagram (2 x 8 SSA and 2 x 8 MPA chips with trigger logic for stub finding @ BX (40 MHz); L1 data pipeline and pixel/strip memory, read out @ L1 (1 MHz); concentrator chip (200 mW) with data compression; GBT & opto package; power converter; output @ 160 MHz)



## Similar method in ATLAS



### ATLAS considering the same approach

Needs a larger sensor separation, though with a smaller pitch (75 $\mu$m instead of …)



# Data organization and dispatch



Example (CMS): 8 (r-$\phi$) x 6 (r-z) = 48 trigger sectors (with ~10% overlap)

Each sector receives ~200 stubs per event on average, with tails up to ~500 stubs/event for 140 pileup events + $t\bar{t}$ (to be compared with ~2000 in ATLAS Phase 1)

About 600 Gb/s per trigger tower
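The dispatch of stubs to overlapping sectors can be sketched as a lookup that returns every sector whose widened boundaries contain the stub, so stubs in an overlap region are sent to both neighbours. This Python example assumes equal-width sectors in $\phi$, six equal $\eta$ bins over a hypothetical $|\eta| < 2.4$ acceptance and a 5% widening on each side (so neighbouring sectors share ~10% of their width); the actual CMS sector boundaries may be defined differently and are not reproduced here.

```python
import math

N_PHI, N_ETA = 8, 6    # 8 (r-phi) x 6 (r-z) = 48 sectors
ETA_MAX = 2.4          # hypothetical acceptance, split into 6 equal eta bins
HALF_OVERLAP = 0.05    # each sector widened by 5% of its width per side (assumption)


def phi_sectors(phi):
    """Indices of the phi sectors whose widened range contains phi (rad)."""
    width = 2 * math.pi / N_PHI
    result = []
    for i in range(N_PHI):
        lo = (i - HALF_OVERLAP) * width
        hi = (i + 1 + HALF_OVERLAP) * width
        shifted = (phi - lo) % (2 * math.pi) + lo  # bring phi into [lo, lo + 2*pi)
        if shifted < hi:
            result.append(i)
    return result


def eta_sectors(eta):
    """Indices of the eta sectors whose widened range contains eta."""
    width = 2 * ETA_MAX / N_ETA
    return [j for j in range(N_ETA)
            if -ETA_MAX + (j - HALF_OVERLAP) * width
            <= eta < -ETA_MAX + (j + 1 + HALF_OVERLAP) * width]


def trigger_sectors(phi, eta):
    """All (phi_index, eta_index) sectors that must receive this stub."""
    return [(i, j) for i in phi_sectors(phi) for j in eta_sectors(eta)]


print(trigger_sectors(0.1, 0.4))  # [(0, 3)]
print(trigger_sectors(0.1, 0.0))  # [(0, 2), (0, 3)]: eta = 0 lies in an overlap
```

Duplicating stubs in overlap regions costs some extra bandwidth, but lets each sector processor find complete tracks without talking to its neighbours.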



Send data to Track-finding processors

### Full-mesh ATCA shelves

- Capable of a "40G" full-mesh backplane on 14 slots = 7.2 Tb/s
- Several options are being investigated; all include time-multiplexed data transfer from a set of receiving processor boards to the pattern recognition and track finding engines
- O(10) time multiplexing at the shelf level
- Keep the latency < 5 $\mu$s, including pattern recognition and track fitting
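The 7.2 Tb/s figure follows from the full-mesh topology: each of the 14 slots has a dedicated 40 Gb/s link to each of the other 13. The sketch below reproduces that arithmetic (counting each direction once) and also shows what an O(10) time-multiplexing factor means for a single processing node, assuming the 25 ns LHC bunch spacing; the function names are hypothetical.

```python
BX_NS = 25  # LHC bunch spacing in ns


def full_mesh_bandwidth_gbps(n_slots, link_gbps):
    """Aggregate transmit bandwidth of a full mesh: every slot has a
    dedicated link of link_gbps to each of the other n_slots - 1 slots."""
    return n_slots * (n_slots - 1) * link_gbps


def event_interval_ns(tm_factor, bx_ns=BX_NS):
    """With time multiplexing by tm_factor, each processing node receives
    a new event only every tm_factor bunch crossings."""
    return tm_factor * bx_ns


print(full_mesh_bandwidth_gbps(14, 40))  # 7280 (Gb/s), i.e. ~7.3 Tb/s
print(event_interval_ns(10))             # 250 (ns between events per node)
```

The result, 7.28 Tb/s, is consistent with the ~7.2 Tb/s quoted above.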

Number of connections to trigger processors







### **Tower interconnections**







40G full-mesh backplane



# Pattern recognition and Track fit



Associative Memories (pattern recognition) + FPGA (track fit)

- CMS trigger sectors need ~1M patterns: only 8 state-of-the-art AM06 chips
  - Higher I/O speed needed (currently 2 Gb/s/layer) to reduce time multiplexing
- Alternative approaches under study, purely FPGA based (Hough Transform, Retina)
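As an illustration of the FPGA-friendly alternatives, a generic r-$\phi$ Hough transform fits in a few lines: for a track from the beamline, $\phi(r) \approx \phi_0 - C\,r\,(q/p_T)$ with $C = 0.3B/2$, so each stub maps to a straight line in the $(q/p_T, \phi_0)$ plane and stubs from the same track meet in one cell. This is a minimal Python sketch, not the CMS implementation: the binning, the 5-layer threshold and the sign convention are assumptions, and each stub is sampled only at the centre of each $q/p_T$ bin.

```python
import math
from collections import defaultdict

B_TESLA = 3.8              # CMS solenoid field
C = 0.3 * B_TESLA / 2.0    # phi0 - phi = C * r * (q/pT), r in m, pT in GeV


def hough_rphi(stubs, n_qpt=32, n_phi0=64, qpt_max=0.5,
               phi0_range=(0.0, math.pi / 4), min_layers=5):
    """stubs: list of (layer, r_m, phi_rad) in one phi sector.

    Each stub is the line phi0 = phi + C * r * (q/pT) in the (q/pT, phi0)
    plane; cells crossed by stubs from >= min_layers distinct layers are
    returned as track candidates {(i_qpt, i_phi0): set_of_layers}.
    """
    phi_lo, phi_hi = phi0_range
    dphi0 = (phi_hi - phi_lo) / n_phi0
    dqpt = 2 * qpt_max / n_qpt
    acc = defaultdict(set)
    for layer, r, phi in stubs:
        for i in range(n_qpt):
            qpt = -qpt_max + (i + 0.5) * dqpt  # centre of q/pT bin i
            phi0 = phi + C * r * qpt
            j = math.floor((phi0 - phi_lo) / dphi0)
            if 0 <= j < n_phi0:
                acc[(i, j)].add(layer)
    return {cell: layers for cell, layers in acc.items()
            if len(layers) >= min_layers}
```

Counting distinct layers per cell, rather than stubs, keeps several stubs from a single layer from faking a track.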







## Expected performance

- Pattern recognition efficiency ~99%
- Excellent track parameter resolutions





### Usage of L1 Tracks - ATLAS



Rate vs. tau-finding efficiency curves for taus from the decay of a 120 GeV Higgs boson, for the inclusive tau trigger at $7\times10^{34}$ cm<sup>-2</sup> s<sup>-1</sup>, for different track multiplicity and minimum track p<sub>T</sub> requirements.

The bands show the rate vs. efficiency parameterized for different L1 cluster  $E_T$  thresholds, shown as the small numbers next to the corresponding points on each band.



Without tracks: a 20 kHz rate requires $E_T$ > 76 GeV $\Rightarrow$ ~10% efficiency.



### Usage of L1 Tracks - CMS





Matching Drift Tube trigger primitives with L1 tracks gives a large rate reduction: more than a factor 10 for thresholds above ~14 GeV (normalized to the present trigger at 10 GeV). It also removes the flattening of the rate at high $p_T$.



Rate reduction brought by matching L1 e/$\gamma$ candidates to L1Track stubs for $|\eta| < 1$. Red: with the current (5x5 crystal) L1 calorimeter granularity. Green: single-crystal position resolution improves the matching.
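Track matching for e/$\gamma$ can be sketched as a window cut in $\Delta\eta$ and $\Delta\phi$ between each L1 calorimeter cluster and the L1 tracks; finer calorimeter position resolution allows narrower windows and hence more rejection. The Python example below uses hypothetical window sizes and $p_T$ cut, and it compares the track direction at the vertex directly with the cluster position; a realistic matching would first extrapolate the track through the magnetic field to the calorimeter surface.

```python
import math


def delta_phi(a, b):
    """Signed phi difference wrapped into (-pi, pi]."""
    d = (a - b) % (2 * math.pi)
    return d - 2 * math.pi if d > math.pi else d


def match_egamma(clusters, tracks, deta_max=0.03, dphi_max=0.05, pt_min=10.0):
    """Keep the L1 e/gamma clusters that have at least one L1 track with
    pT >= pt_min inside the (deta_max, dphi_max) window."""
    matched = []
    for c in clusters:
        if any(t["pt"] >= pt_min
               and abs(t["eta"] - c["eta"]) < deta_max
               and abs(delta_phi(t["phi"], c["phi"])) < dphi_max
               for t in tracks):
            matched.append(c)
    return matched


clusters = [{"eta": 0.50, "phi": 3.13}, {"eta": -0.20, "phi": 1.00}]
tracks = [{"eta": 0.51, "phi": -3.14, "pt": 25.0},  # matches across the phi = pi edge
          {"eta": -0.20, "phi": 1.00, "pt": 3.0}]   # fails the pT cut
print(match_egamma(clusters, tracks))  # [{'eta': 0.5, 'phi': 3.13}]
```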



# **CMS Gains from Track Trigger**



Preliminary simulation studies show that adding an L1 tracking trigger provides significant rate reductions with good efficiency for physics objects. Note that these results are "work in progress".

| Trigger,<br>threshold | Algorithm | Rate reduction | Efficiency at the plateau | Comments |
|---|---|---|---|---|
| Single muon,<br>20 GeV | Improved $p_T$ via<br>track matching | ~13<br>($\lvert\eta\rvert < 1$) | ~90% | Tracker isolation may help further. |
| Single electron,<br>20 GeV | Match with cluster | > 6 (current granularity)<br>> 10 (crystal granularity)<br>($\lvert\eta\rvert < 1$) | 90% | Tracker isolation can bring an additional factor of up to 2. |
| Single tau,<br>40 GeV | CaloTau-track matching<br>+ tracker isolation | O(5) | O(50%)<br>(for 3-prong decays) | |
| Single photon,<br>20 GeV | Tracker isolation | 40% | 90% | Probably hard to do much better. |
| Multi-jets,<br>HT | Require that jets come<br>from the same vertex | | | Performance depends strongly on the trigger and threshold. |



### Pros and cons



#### CMS (based on push architecture)

Pros:

- Only a fraction of the tracker data is read out ($p_T$ filtering)
  - ~200 stubs/sector, 48 sectors
- Large flexibility to use tracks in the Global Trigger (including MET)

Cons:

- Readout of tracker (trigger) data at 40 MHz

#### ATLAS (based on pull architecture)

Pros:

- Only portions of the tracker data are read out, at 500 kHz

Cons:

- Hits from low-$p_T$ tracks are not filtered: a problem for pattern recognition
  - Phase 1: ~200-500 hits/sector in 64 sectors
  - Phase 2: increase by a factor ~10 $\Rightarrow$ larger data rate
- Improvements limited to a few "objects" ($\mu$, e/$\gamma$, $\tau$, jets) with L0A-compliant rates



#### INFN interests:

**CMS**

- Electronics of the PS modules (Pv)
- L1 track finding (Fi, Pd, Pi, Pg, Ts)

**ATLAS**

- Electronics (Bo, Ge, Mi)
- Mechanics (Ge, Mi)
- L1 track finding (LNF, Mi, Pi, Pv)

L1 track finding INFN (see A. Annovi's talk)

**CMS + ATLAS**

- Tracking algorithms (PCA and retina) simulation and firmware development
- FMC fabrication (includes AM procurements)
- ATCA and DAQ development
- New AM chip developments


# Collaborations and related projects



### Collaborations and related projects on L1 Track finder

- On-going collaborations
  - AM chip: LPNHE, IMEC
  - ATCA: FNAL, Karlsruhe, Lyon, Northwestern
  - Simulation: LPNHE, Lyon, FNAL, Karlsruhe, Northwestern, UCL, Uppsala, Purdue, Cornell, CERN, India
- On-going projects
  - FP7-PEOPLE-2012-IAPP: P. Giannetti
  - PRIN 2012: H-TEAM: G. Tonelli
  - ANR (LPNHE, IPNL, Lyon)
  - FP7-PEOPLE-ITN INFIERI: F. Palla
- Future applications for funds
  - SIR, ERC, Pillar II (ICT-4)



# Conclusions



The tracker drastically reduces the rate of uninteresting events at L1

- Several trigger architectures explored
  - On-detector data reduction using $p_T$ modules (CMS), L0 pre-trigger (ATLAS)
  - Studies of the implications for the tracker layouts are ongoing
- Demonstrators being built to validate the full chain
- Large gains from combining tracking with other subdetectors: electrons, muons, jets and MET





## ATLAS Fast Track - AM based

### FTK: Fast TracK processor - 20

# Uses Associative Memory (AM) approach

1. Pattern recognition: using coarse resolution with AMs





3. Fit tracks using high-resolution hits in FPGAs: linear approximation (instead of a full helix fit) with pre-computed constants








# **R&D Topics: Trigger**



- Increased rates to be read out, from Level-0 up to the HLT
  - Absolute rates & balance between levels
  - L1 complexity vs. HLT input rates
- L1 Trigger latency
  - Which latency is needed when, & consequences for the electronics
- L1 track trigger
  - Associative Memories
  - Study techniques: sharpen $p_T$ threshold, e- & $\mu$ ID, isolation, primary vertex for jets, multi-object triggers, possibility of pixel b-tag.
  - Interplay with tracker design
- Interconnects
- New packaging & interconnect technologies
- FPGAs in L1 Trigger





### Increased bandwidth (~x12) due to

- the larger pileup (x2.5)
- the higher rate (x5)

### Implications

- Replace the DF using an ATCA-based system
- Increase the lower $p_T$ threshold from 1 to 2 GeV
- Increase the number of patterns by ~one order of magnitude
- Increase the processing speed to cope with the shorter latency



# AM working principle



Final chip (AM06) to be submitted by mid-2014:

- 128k patterns, 8 layers
- 100 MHz clock frequency
- Serial bus
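The AM principle can be mimicked in software: each stored pattern is a list of superstrip addresses, one per layer, and a pattern (a "road") fires when enough of its layers see a hit in the expected superstrip. The AM chip performs this comparison for all stored patterns in parallel as the hits stream in; the Python loop below is only a functional sketch, and the 7-out-of-8 majority threshold is an assumption. With 128k patterns per chip, 8 chips hold 8 x 128k ≈ 1M patterns, the number quoted for a CMS trigger sector.

```python
def am_match(patterns, hits, min_layers=7):
    """Software model of associative-memory pattern matching.

    patterns: sequence of tuples; element k is the superstrip id expected on layer k.
    hits: iterable of (layer, superstrip) pairs from one event.
    Returns the indices of patterns ("roads") matched on >= min_layers layers.
    """
    fired = set(hits)  # superstrips with at least one hit
    roads = []
    for idx, pattern in enumerate(patterns):
        n_matched = sum((layer, ss) in fired for layer, ss in enumerate(pattern))
        if n_matched >= min_layers:
            roads.append(idx)
    return roads


patterns = [(1, 2, 3, 4, 5, 6, 7, 8),
            (1, 2, 3, 4, 5, 6, 9, 9),
            (0, 0, 0, 0, 0, 0, 0, 0)]
hits = [(layer, ss) for layer, ss in enumerate((1, 2, 3, 4, 5, 6, 7, 8))]
print(am_match(patterns, hits))                # [0]
print(am_match(patterns, hits, min_layers=6))  # [0, 1]
```

Only the hits inside the fired roads are then passed, at full resolution, to the track-fitting stage.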



### The pattern



#### $\rightarrow$ Superstrip definition:



 $\rightarrow$  A superstrip is simply a bunch of strips in one module of the tracking detector.

 $\rightarrow$  The superstrip address is the information sent to the AM board. It is encoded on a number of bits that depends on the superstrip resolution.

Superstrip encoding (generic superstrip address definition):

| Field | Content | Strip tracker module example |
|---|---|---|
| Z module | module position along Z | |
| Z inner | superstrip position inside the module along Z | Z segment (1 bit) |
| φ module | module position in φ | |
| φ inner | superstrip position inside the module in φ | 5 bits for a 3.2 mm superstrip pitch |

 $\rightarrow$  The encoding is divided into 4 parts, giving module and intra-module SS position in Z and  $\phi$  direction (*R is not necessary*)

 $\rightarrow$  We are not using pixel info yet, so our Z intramodule encoding is very basic for the moment.
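The four-field address can be built by bit-packing. In this Python sketch only the 1-bit Z segment and the 5-bit φ field come from the strip-module example; the widths of the Z-module and φ-module fields (4 and 6 bits) and the field order are assumptions chosen for illustration. R is not encoded, as in the scheme described above.

```python
# Field order and widths: only z_inner (1 bit) and phi_inner (5 bits) come
# from the strip-module example; the module-field widths are hypothetical.
FIELDS = (("z_module", 4), ("z_inner", 1), ("phi_module", 6), ("phi_inner", 5))


def encode_superstrip(**values):
    """Pack the four fields into one integer address (first field = MSBs)."""
    address = 0
    for name, width in FIELDS:
        v = values[name]
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        address = (address << width) | v
    return address


def decode_superstrip(address):
    """Inverse of encode_superstrip."""
    values = {}
    for name, width in reversed(FIELDS):
        values[name] = address & ((1 << width) - 1)
        address >>= width
    return values


addr = encode_superstrip(z_module=3, z_inner=1, phi_module=10, phi_inner=17)
print(addr, decode_superstrip(addr))
```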



### Electronics







# Track fitting - high quality helix parameters and $\chi^2$



Principal component analysis (other techniques under consideration):
Over a narrow region of the detector, equations linear in the local silicon hit coordinates give a resolution nearly as good as a time-consuming helical fit.

Nucl.Instrum.Meth.A623:540-542,2010 doi:10.1016/j.nima.2010.03.063

$$p_i = \sum_{j=1}^{14} a_{ij} x_j + b_i$$



- $p_i$ are the helix parameters and the $\chi^2$ components.
- $x_j$ are the hit coordinates in the silicon layers.
- $a_{ij}$ and $b_i$ are pre-stored constants determined from full simulation or real data tracks.
- The range of validity of the linear fit is a "sector", which consists of a single silicon module in each detector layer.
- This is VERY fast in FPGA DSPs.

### ~3000 fitting engines/trigger sector for CMS
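The constants $a_{ij}$, $b_i$ can be obtained by a least-squares regression of the true parameters on the hit coordinates of training tracks, and a $\chi^2$ from the principal components of the hit-coordinate covariance with the smallest variance (directions in which genuine tracks leave only measurement noise). This NumPy sketch shows the idea on a toy "sector" with 14 coordinates and 5 parameters; the function names are hypothetical, and it is a generic linearized fit rather than a reproduction of the referenced implementation.

```python
import numpy as np


def train_constants(x_train, p_train):
    """Least-squares constants for p = a @ x + b.
    x_train: (N, n) hit coordinates; p_train: (N, m) true track parameters."""
    design = np.hstack([x_train, np.ones((len(x_train), 1))])
    coeffs, *_ = np.linalg.lstsq(design, p_train, rcond=None)
    return coeffs[:-1].T, coeffs[-1]  # a: (m, n), b: (m,)


def train_chi2_constants(x_train, n_params):
    """Principal components of the coordinate covariance: genuine tracks span
    ~n_params directions; the n - n_params low-variance directions are constraints."""
    mean = x_train.mean(axis=0)
    eigval, eigvec = np.linalg.eigh(np.cov(x_train, rowvar=False))  # ascending order
    n_constraints = x_train.shape[1] - n_params
    return mean, eigvec[:, :n_constraints], eigval[:n_constraints]


def linear_fit(x, a, b, mean, vecs, vals):
    """Track parameters and chi^2, each from a single matrix product."""
    params = a @ x + b
    constraints = vecs.T @ (x - mean) / np.sqrt(vals)
    return params, float(np.sum(constraints ** 2))


# Toy sector: 14 coordinates linear in 5 parameters, plus 0.01 Gaussian noise.
rng = np.random.default_rng(0)
G = rng.normal(size=(14, 5))
p_tr = rng.normal(size=(5000, 5))
x_tr = p_tr @ G.T + 0.01 * rng.normal(size=(5000, 14))
a, b = train_constants(x_tr, p_tr)
mean, vecs, vals = train_chi2_constants(x_tr, n_params=5)

p_true = rng.normal(size=5)
params, chi2 = linear_fit(p_true @ G.T + 0.01 * rng.normal(size=14), a, b, mean, vecs, vals)
print(np.round(params - p_true, 3), round(chi2, 1))
```

For well-measured tracks the $\chi^2$ built this way averages to the number of constraints (here 14 - 5 = 9), which is what makes it usable as a fake-track rejection cut.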



### CMS: toward a demonstrator







Each board can receive data from up to 48 modules at 3.25 Gbps, for a total of 156 Gbps per board/RTM. Eight boards can receive up to 384 modules (one trigger tower's worth).

The input data are then divided into 4 time slices; each slice is sent to 1 of 4 Pattern Recognition boards over the 40 Gbps full mesh (4 x 40 = 160 Gbps > 156 Gbps).

Each Pattern Recognition board receives up to 8 x 40 Gbps = 320 Gbps of input data over the full-mesh backplane. The events can then be time multiplexed on the board among its mezzanines (x1, x2 or x4, flexibly). For each time slice, each board sends its output from the RTM to the next stage, and it also communicates with boards in other crates to share data in $\phi$ and $\eta$.
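The link budget quoted above can be checked with a few lines of Python; the constants are the numbers from the text, and the check confirms that the four 40 Gbps time-slice outputs (160 Gbps) exceed the 156 Gbps board input.

```python
LINK_GBPS = 3.25          # per-module input link
MODULES_PER_BOARD = 48
BOARDS_PER_TOWER = 8
TIME_SLICES = 4           # one Pattern Recognition board per time slice
MESH_GBPS = 40            # one full-mesh channel

board_input = MODULES_PER_BOARD * LINK_GBPS            # 156 Gb/s per board/RTM
tower_modules = MODULES_PER_BOARD * BOARDS_PER_TOWER   # 384 modules = one tower
board_output = TIME_SLICES * MESH_GBPS                 # 160 Gb/s over the backplane
pr_board_input = BOARDS_PER_TOWER * MESH_GBPS          # 320 Gb/s into each PR board

assert board_output > board_input, "backplane output must exceed board input"
print(board_input, tower_modules, board_output, pr_board_input)  # 156.0 384 160 320
```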

# First ideas on Pixel@L1 at CMS

### Considering using 3<sup>rd</sup>-generation pixel detectors in the trigger

- 65 nm chip, 2x2 cm<sup>2</sup>: output bandwidth ~3 Gbps
- The main problem is the bandwidth (~1 GHz/cm<sup>2</sup> hit rate)
  - Needs an L0 trigger and an on-chip clustering algorithm.
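A back-of-the-envelope estimate shows why the bandwidth is the main problem. Assuming, purely for illustration, 16 bits per hit, a 2x2 cm<sup>2</sup> chip at ~1 GHz/cm<sup>2</sup> would produce 64 Gb/s of raw hit data, about 21 times the ~3 Gbps output:

```python
def required_reduction(hit_rate_hz_per_cm2, area_cm2, bits_per_hit, link_gbps):
    """Ratio between the raw hit data rate and the available output bandwidth."""
    raw_gbps = hit_rate_hz_per_cm2 * area_cm2 * bits_per_hit / 1e9
    return raw_gbps / link_gbps


# 2x2 cm^2 chip, ~1 GHz/cm^2, 3 Gb/s output; 16 bits/hit is an assumption.
factor = required_reduction(1e9, 2 * 2, 16, 3)
print(round(factor, 1))  # 21.3
```

Clustering reduces the number of words per particle and an L0 trigger reduces the number of events read out; together they must close a gap of this order.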





### Current involvement in CMS

#### Electronics R&D

- FE ASICs: UK, France, Italy, CERN
- MPA assembly, including TSV: CERN and US
- Module design and assembly: CERN, Germany, UK, US
- Mechanics: CERN, Germany, France, US
- DAQ: CERN, Germany, France, UK, India
- L1 Track finding: CERN, <u>Italy</u>, France, UK, US
- L1 track finding (conveners: F. Palla, T. Liu)
  - Simulations (both at High and Low Level): <u>INFN</u>, FNAL, Lyon, Cornell, RAL, KIT
  - Hardware demonstrator (see Annovi's talk): <u>INFN</u>, FNAL, Lyon, Cornell, Northwestern
    - INFN Interests
      - Tracking algorithms (PCA and retina) simulation and firmware development
      - FMC fabrication (includes AM procurements)
      - ATCA development




### ATLAS & CMS Triggered vs. Triggerless Architectures



### 1 MHz (Triggered):

Network:

1 MHz with ~5 MB: aggregate ~40 Tbps

Links: Event Builder-cDAQ: ~500 links of 100 Gbps

Switch: almost possible today; no problem by 2022

HLT computing:

General purpose computing: 10(rate) x 3(PU) x 1.5(energy) x 200 kHS06 (CMS)

Factor ~50 with respect to today, maybe for ~the same cost

Specialized computing (GPU or other): possible

### **40** MHz (Triggerless):

Network:

40 MHz with ~5 MB: aggregate ~2000 Tbps

Event Builder links: ~2,500 links of 400 Gbps

Switch: has to grow by a factor ~25 in 10 years; difficult

Front-end electronics:

Readout cables: copper tracker! Show stopper

HLT computing:

General purpose computing: 400(rate) x 3(PU) x 1.5(energy) x 200 kHS06 (CMS)
Factor ~2000 with respect to today, but too pessimistic, since events are easier to reject without L1

This factor looks impossible with a realistic budget

Specialized computing (GPU or ...)

Could possibly provide this ...
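The headline numbers of this comparison follow from simple products: the event-builder throughput is rate x event size, and the HLT estimate scales today's capacity by the rate, pileup and energy factors. A Python sketch reproducing them (1 MB taken as 10<sup>6</sup> bytes; function names are hypothetical):

```python
def event_builder_tbps(rate_hz, event_size_mb):
    """Aggregate event-building throughput in Tb/s (1 MB = 1e6 bytes)."""
    return rate_hz * event_size_mb * 1e6 * 8 / 1e12


def hlt_scale_factor(rate_factor, pileup_factor=3, energy_factor=1.5):
    """Naive HLT CPU scaling with respect to today (rate x PU x energy)."""
    return rate_factor * pileup_factor * energy_factor


print(event_builder_tbps(1e6, 5), event_builder_tbps(40e6, 5))  # 40.0 1600.0
print(hlt_scale_factor(10), hlt_scale_factor(400))              # 45.0 1800.0
```

The results (40 and 1600 Tb/s for the network, factors 45 and 1800 for the HLT) match the rounded ~40 Tbps, ~2000 Tbps, ~50 and ~2000 quoted for the triggered and triggerless options.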