













## STRATEGIES

F. Pastore (Royal Holloway, University of London), francesca.pastore@cern.ch

INFN School "Introduction to Trigger and Data Acquisition Techniques in Physics Experiments", Naples, 12/10/2023

#### THE CONTENTS OF THIS LECTURE

- → Triggering and data taking at the LHC
- → Strategies for the future High-Luminosity LHC
- → Four experiments, four different approaches and developments of TDAQ architectures
- → And a few interesting examples



# TRIGGER AND DATA TAKING AT THE LHC

TDAQ for large discovery experiments



#### LHC ENGINE AND ITS CHALLENGES



 $E_{cms} = 14 \text{ TeV}$   $L = 10^{34} / \text{cm}^2 \text{ s}$ BC clock = 40 MHz

Search for rare events overwhelmed by abundant low-energy particles

#### Three major challenges for T/DAQ

- Face High Luminosity:
  - → fast electronics, to resolve in time
  - → fine granularity detector, to resolve in space → high data volume
- **→** Search for rare physics:
  - high rejection or large data collection
- Be radiation resistant:
  - → very costly for electronics → survive up to 100 Mrad = 1 MGy

#### LHC DATA DELUGE

p-p collisions  $E_{cms} = 13-14 \text{ TeV}$   $L = 10^{34} / \text{cm}^2 \text{ s}$  BC clock = 40 MHz



- High Luminosity with collisions close in time and space (1 collision/25ns)
  - abundant data in time and space
- Search for rare physics from hadronic collisions:
  - to store all the possibly relevant data is UNREALISTIC and often UNDESIRABLE
- Three approaches are possible:
  - Reduce the amount of data (packing and/or filtering)
  - Have faster data transmission and processing
  - Both!

#### LHC BECOMING IMPRESSIVELY LUMINOUS

European Council (2014): "CERN is the strong European focal point for particle physics in next 20 years"

#### LHC / HL-LHC Plan





⇒ Experiments go beyond design specifications (1x10³⁴/cm²s) and need upgrades as well, to improve or at least maintain the design performance

#### READOUT AND DAQ THROUGHPUTS

$$R_{DAQ} = R_T^{max} \times S_E$$



more channels, more complex events

As the data volumes and rates increase, new architectures need to be developed
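The throughput relation $R_{DAQ} = R_T^{max} \times S_E$ can be checked with a few lines of Python. This is a minimal sketch: the function name is illustrative, and the rates and event sizes are the ones quoted elsewhere in this lecture for ATLAS/CMS (100 kHz, 1 MB) and LHCb (1 MHz, 60 kB).

```python
KB, MB, GB = 1e3, 1e6, 1e9  # decimal units, as used on the slides


def daq_throughput(max_trigger_rate_hz: float, event_size_bytes: float) -> float:
    """R_DAQ = R_T^max * S_E, returned in bytes per second."""
    return max_trigger_rate_hz * event_size_bytes


# Numbers quoted in this lecture for the readout networks
atlas_cms = daq_throughput(100e3, 1 * MB)  # 100 kHz x 1 MB
lhcb = daq_throughput(1e6, 60 * KB)        # 1 MHz x 60 kB

print(f"ATLAS/CMS readout: {atlas_cms / GB:.0f} GB/s")
print(f"LHCb readout:      {lhcb / GB:.0f} GB/s")
```

Either factor can drive the throughput: ATLAS/CMS have large events at a moderate rate, LHCb has small events at a high rate.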




#### LOOKING FOR MORE DATA IN THE FUTURE



#### RECAP ON T/DAQ SYSTEMS AND SCALING



- **→** More Rate **→** More buffers
- → More channels → Parallelism → Segmented Readout (and trigger)
- → More Front-end elements → Multiple processing units (local data)
  - → Decouple storage from processing unit (PU) → Data collection
- **→** Extend trigger latency **→** Multi-level trigger
- → Avoid dead-time and back-pressure → Dataflow control

#### MANY PLAYERS, COMPLEX TDAQ ARCHITECTURES

#### Buffering and parallelism

#### Maximum 1-2% deadtime



[Figure labels: Level-1, readout buffers, high speed electronics, readout links and buffering]

#### **Level-1 triggers**

- → Set max Readout rate
- Hardware, synchronous
- Readout parallelism
- → Latency ~ µsec/event

[Figure labels: Readout, L1/Readout, **event building**, **event filtering**, petabyte archive, switch network, computing services, large data network with dedicated technology, **dedicated PC farms**, DAQ, HLT/DAQ]

#### **Higher level triggers**

- Set max storage rate
- Software, asynchronous
- Event parallelism
- → Latency < 1 sec/event


#### LEVEL-1 TRIGGER REQUIREMENTS





- Full synchronisation at 40 MHz (LHC clock)
  - ➤ large optical time distribution system
- → Synchronous: pipeline processing (at fixed latency)
- **→** Low latency (fast processing and high speed links)
- → Scalable
- **→** Massively parallel
- **→** Bunch Crossing identification capability

| ALICE | No pipeline |
|-------|-------------|
| ATLAS | 2.5 μs      |
| CMS   | 3 μs        |
| LHCb  | 4 μs        |
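With a fixed Level-1 latency, every front-end channel has to hold its data for that long, so the latency translates directly into a minimum pipeline depth in bunch crossings. A minimal sketch using the latencies from the table above and the 25 ns bunch-crossing period (the function name is illustrative):

```python
BC_PERIOD_NS = 25.0  # one bunch crossing at the 40 MHz LHC clock

# Level-1 latencies from the table above, in microseconds
l1_latency_us = {"ATLAS": 2.5, "CMS": 3.0, "LHCb": 4.0}


def pipeline_depth(latency_us: float, bc_period_ns: float = BC_PERIOD_NS) -> int:
    """Minimum number of pipeline cells needed to cover the trigger latency."""
    return round(latency_us * 1000.0 / bc_period_ns)


depths = {exp: pipeline_depth(t) for exp, t in l1_latency_us.items()}
for exp, depth in depths.items():
    print(f"{exp}: {depth} bunch crossings")
```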

**GLOBAL TRIGGER** 

#### Fast, robust electronics

Latency dominated by cable/transmission delay

#### HLT/DAQ REQUIREMENTS



- **→** Robustness and redundancy
- **⇒** Scalability to adapt to Luminosity, detectors,...
- → Flexibility (10-years experiments)
- **→** Based on commercial products
- **→ Limited cost**

Prefer use of PCs (linux based), Ethernet protocols, standard LAN, configurable devices

- PC farms on networks for Event Building and Event Filtering (HLT)
  - ➤ farm processing: one event per processor (larger latency, but scalable)
  - additional networks regulate the CPU assignment and traffic





## HOW WE PREPARE FOR HIGH LUMINOSITY

What about ... tomorrow?



#### ONE EVENT AT HIGH-LUMINOSITY (L = 7.5 x 10<sup>34</sup> /cm<sup>2</sup>/s)



- → 200 collisions per bunch crossing (every 25 ns)
- → ~ 10 000 particles per event
- → Mostly low p<sub>T</sub> particles due to low energy-transfer interactions



Physics program for the future is towards more rare processes at the same energy scale



HL-LHC $t\bar{t}$ event in ATLAS ITk at $\langle\mu\rangle = 200$

#### WHAT DO YOU EXPECT FOR THE FUTURE?






#### ADDITIONAL COMPLICATION AT HL-LHC

#### Luminosity x10, complexity x100: we cannot simply scale current approach

#### x10 higher Luminosity means...

- **→** More interactions per BC (pile-up)
  - Less rejection power (worse pattern recognition and resolution)
  - → Larger event size
- → Larger data rates:
  - → FE readout rate @L1: 0.1 → 1 MHz
  - → DAQ throughput: 1 → 50 Tbps

ATLAS/CMS numbers
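The two pairs of numbers above imply an average event size, since throughput = rate × event size. The sketch below only inverts that relation for the quoted order-of-magnitude values; the function name is illustrative and the results are not official event sizes.

```python
def implied_event_size_mb(throughput_tbps: float, readout_rate_mhz: float) -> float:
    """Average event size (MB) implied by a DAQ throughput and a readout rate."""
    bytes_per_second = throughput_tbps * 1e12 / 8   # Tbit/s -> byte/s
    events_per_second = readout_rate_mhz * 1e6
    return bytes_per_second / events_per_second / 1e6


size_today = implied_event_size_mb(throughput_tbps=1, readout_rate_mhz=0.1)
size_hl_lhc = implied_event_size_mb(throughput_tbps=50, readout_rate_mhz=1.0)

print(f"today:  ~{size_today:.2f} MB/event")
print(f"HL-LHC: ~{size_hl_lhc:.2f} MB/event")
```

The x50 growth in throughput therefore combines a x10 higher readout rate with a roughly x5 larger event.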

#### But cannot...

- **→** Increase trigger thresholds
  - → Need to maintain physics acceptance
- **→** Scale dataflow with Luminosity
  - → H/W: more parallelism more links more material and cost
  - → S/W: processing time does not scale linearly with L





Tension between TDAQ architecture and FE complexity

| | What we do? | How? | Example |
|---|---|---|---|
| **Trigger-less DAQ** | high detector granularity | readout, high speed electronics/links | R&D on detectors Front-End |
| **High performance farms** | refine calibrations, as offline | large buffers, long latency | tight: offline=online (LHCb, ALICE); soft: decouple trigger/DAQ (ATLAS, CMS) |
| **Triggering detectors** | complex ASIC logic | trigger-driven design | hardware track trigger (CMS) |

LHCP-2022

#### THE REAL-TIME ADVENTURE





#### TRENDS: COMBINED TECHNOLOGY









- Multicore processors
- GPUs\*: Nvidia GPUs, 3.5 B transistors
- **FPGAs**: Virtex-7 FPGA, 6.8 B transistors

(\*) Access to the nVIDIA® GPUs through the CUDA and CUBLAS toolkit/library using the NI LabVIEW GPU Computing framework.



The right choice can be combining the best of both worlds by analysing which strengths of FPGA, GPU and CPU best fit the different demands of the application

#### **GENERAL TDAQ TRENDS**

#### Use COTS for network .... and processing



#### → Deal with dataflow instead of latency

- **→ decouple** DAQ from High Level Triggers
- decouple dataflow from storage, with temporary buffers

#### → Use networks as soon as possible

- toward commercial bidirectional point-to-multipoint architecture
- → Increase data aggregation at the Event Building
  - reducing request rates on DAQ software
  - per time-frame, per orbit instead of per-event



#### EVOLUTION OF PROCESSING POWER TO BREAK WALLS



- CPU frequencies are plateauing
- Local memory/core is decreasing
- Number of cores is increasing
  - → Exploiting CPU h/w, with more complicated programming
    - → Vectorisation, low-level memory...
  - → Multithreading processing
    - → To reduce memory footprint
  - **→** Use of co-processors:
    - → High Performance Computing (HPC) often employs GPU architectures to achieve record-breaking results!
  - **→** Examples in LHC experiments:
    - → data reduction (ALICE & LHCb)
    - → trigger selection (<u>CMS & ATLAS</u>)

This requires fundamental re-write/optimization of our software

[Figure: microprocessor trend data versus year, 1970 to 2020. Data source: https://github.com/karlrupp/microprocessor-trend-data]

Read: HPC computing

### FOUR EXPERIMENTS COMPARED

How to maximise physics acceptance



#### LHC EXPERIMENTS FOR A DISCOVERY MACHINE

#### Goal: explore TeV energy scale to find New Physics beyond Standard Model



3000

Proposed: 1992, Approved: 1996, Started: 2009


#### DIFFERENT PHYSICS SEARCHES

.... and LHC operations

- ◆ ATLAS/CMS: p-p collisions at full Luminosity
  - search in high energy scale
- LHCb: p-p collisions at reduced Luminosity
  - search complex topologies of b-quark decays
- ALICE: heavy-ion collisions
  - ~2000 mb
  - search in high energy density



- → Expected rates and S/B ratio
- **→** Signal topology and complexity
- ⇒ Size of event (number of channels, particle multiplicity)



#### ENHANCED TRIGGER SELECTIONS



- → ATLAS/CMS: Trigger power: reducing the data-flow at the earliest stage
- → ALICE/LHCb: Large data-flow: low trigger selectivity due to large irreducible background

#### **COMPARING BY NUMBERS**

LHC experiments share the same CERN budget for computing resources, which is the constraint that balances trigger and DAQ power

Allowed storage and processing resources





# ATLAS AND CMS

Studying the Standard Model at the high energy frontier



#### ATLAS/CMS TRIGGER STRATEGY





Search in high-energy scale

- → Discover large mass particles through their <u>high-energy</u> products
- → Discovery = inclusive selections

$$\frac{\textit{everything}}{\textit{Higgs}} = \frac{\sigma_{tot}}{\sigma_{H(500\,\mathrm{GeV})}} \approx \frac{100\,mb}{1\,pb} \approx 10^{11}$$

approximately $10^{6}$ rejection

- ⇒ Easy selection of high-energy leptons over background ==> @L1
  - → Against thousands of particles/collisions (typically low momentum jets)
- → Remember: 90M readout channels and full Luminosity ==> 1 MB/event
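The cross-section ratio above becomes more tangible when converted to event rates with rate = σ × L. A minimal sketch using the cross-sections from the formula above and the design luminosity $L = 10^{34}$ /cm² s quoted earlier in the lecture; the variable names are illustrative.

```python
MB_TO_CM2 = 1e-27  # 1 millibarn in cm^2
PB_TO_CM2 = 1e-36  # 1 picobarn in cm^2

LUMINOSITY = 1e34                # cm^-2 s^-1, LHC design value
sigma_tot = 100 * MB_TO_CM2      # total p-p cross-section used on the slide
sigma_higgs = 1 * PB_TO_CM2      # H(500 GeV) cross-section used on the slide

total_rate_hz = sigma_tot * LUMINOSITY      # all interactions
higgs_rate_hz = sigma_higgs * LUMINOSITY    # signal of interest
ratio = sigma_tot / sigma_higgs

print(f"interaction rate: {total_rate_hz:.1e} Hz")
print(f"Higgs rate:       {higgs_rate_hz:.1e} Hz (one every {1 / higgs_rate_hz:.0f} s)")
print(f"everything/Higgs: {ratio:.0e}")
```

With these inputs the signal appears roughly once every 100 seconds, on top of about $10^{9}$ interactions per second.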

#### ATLAS & CMS DESIGN PRINCIPLES





#### Same physics plans, different competitive approaches for detectors and DAQ

→ Same trigger strategy and data rates

#### 1 MB \* 100 kHz= 100 GB/s readout network



inclusive trigger selections

- → Different DAQ architectures
  - → ATLAS: minimise data flow bandwidth with multiple levels and regional readout
  - → CMS: large bandwidth, invest in commercial technologies for processing and communication



#### ATLAS/CMS: HLT/DAQ REQUIREMENTS



#### Final storage and processing resources (at Tier0) allow order of few GB/s output

Evolved from 1GB/s to current almost 5GB/s



#### **Network and Farm size**

- 1MB/event at 100kHz for O(100ms) **HLT latency** 
  - Network: 1MB\*100kHz = **100GB/s**
  - HLT farm: 100kHz\*100ms = O(10<sup>4</sup>) CPU cores
- Can add intermediate steps (level-2) to reduce resources, at cost of complexity (at ms scale)
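The sizing rules above can be written as two one-line functions: network bandwidth is event size times rate, and the number of busy cores is input rate times per-event latency (one event per core). This is a sketch with illustrative function names; the Run 2 line uses the 2 MB event size quoted for CMS later in the lecture.

```python
def network_bandwidth_gb_s(event_size_mb: float, l1_rate_khz: float) -> float:
    """Event-building network bandwidth in GB/s."""
    return event_size_mb * 1e6 * l1_rate_khz * 1e3 / 1e9


def hlt_cores_needed(l1_rate_khz: float, hlt_latency_ms: float) -> float:
    """Cores kept busy on average: input rate x processing time per event."""
    return l1_rate_khz * 1e3 * hlt_latency_ms / 1e3


bw_run1 = network_bandwidth_gb_s(event_size_mb=1, l1_rate_khz=100)
bw_run2 = network_bandwidth_gb_s(event_size_mb=2, l1_rate_khz=100)
cores = hlt_cores_needed(l1_rate_khz=100, hlt_latency_ms=100)

print(f"network, 1 MB events: {bw_run1:.0f} GB/s")
print(f"network, 2 MB events: {bw_run2:.0f} GB/s")
print(f"HLT farm: ~{cores:.0f} cores")
```

The core count is an average occupancy; a real farm needs headroom for fluctuations in processing time.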

See S.Cittolin, DOI: 10.1098/rsta.2011.0464

# CMS: 2-STAGE EVENT BUILDING IN RUN 1





## **NETWORK EVOLUTION**

#### (Run 1: 100 GB/s network)

# Myrinet widely used when DAQ-1 was designed

- → high throughput, low overhead
- direct access to OS
- → flow control included
- new generation supporting 10GBE

#### Run 2: 200 GB/s network

- → Increased event size to 2MB
- → Technology allows single EB network (56 Gbps FDR Infiniband)
- → Myrinet —>10/40 Gbps Ethernet



**Choose the best price per bit/s!**

## EVOLUTION FROM RUN-1 TO RUN-2







**CMS DAQ 1**

- 100 kHz L1 rate
- Myrinet and 1 Gb/s Ethernet, 100 GB/s, 8 slices
- Filter farm: 13000 cores, 1260 hosts
- max. 1.2 GB/s to storage

**CMS DAQ 2**

- 100 kHz L1 rate, event size up to 2 MB
- Filter farm: 16000+ cores, 900 hosts
- ~ 3-6 GB/s to storage

# ATLAS: REGION OF INTEREST (ROI) DATAFLOW



HLT selections based on <u>regional readout and reconstruction</u>, seeded by L1 trigger objects (RoI)



**RoI = Region of Interest** 

- → Total amount of RoI data is minimal: a few % of the Level-1 throughput
  - one order of magnitude smaller readout network ...
  - ... at the cost of a higher control traffic and reduced scalability

#### ATLAS REGIONAL TDAQ ARCHITECTURE



Overall network bandwidth:  $\sim 10 \text{ GB/s}$  (x10 reduced by regional readout)



complex data router to forward different parts of the detector data, based on the trigger type

# TRACK-TRIGGER IS KEY FOR HL-LHC (RUN 4)





Silicon tracking systems provide incredibly high resolution, crucial for controlling rates









#### Tracking challenges

- Readout ~800M channels, ~50 Tbps
- Combinatorics (10<sup>4</sup> hits/BC)

#### combinatorics scales like L<sup>N</sup>

L=luminosity, N=number of layers

Track reconstruction is not feasible at 40 MHz, nor within a few microseconds
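The $L^N$ scaling can be illustrated with a naive count of hit combinations: if a track candidate takes one hit from each of N layers, the number of candidates is (hits per layer)^N. This is a toy sketch, assuming the number of hits per layer grows linearly with luminosity; real track finders prune most combinations early, and the hit counts below are placeholders.

```python
def naive_combinations(hits_per_layer: int, n_layers: int) -> int:
    """Candidate tracks when one hit is taken from each layer, with no pruning."""
    return hits_per_layer ** n_layers


def growth_factor(luminosity_ratio: int, n_layers: int) -> int:
    """Increase in combinations when hits per layer scale with luminosity."""
    return luminosity_ratio ** n_layers


base = naive_combinations(hits_per_layer=10, n_layers=4)
doubled = naive_combinations(hits_per_layer=20, n_layers=4)

print(f"10 hits/layer, 4 layers: {base} combinations")
print(f"20 hits/layer, 4 layers: {doubled} combinations")
print(f"x2 luminosity -> x{growth_factor(2, 4)} combinations")
```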





#### stubs in CMS p<sub>T</sub> modules



# LHCb, THE B-MESON OBSERVATORY

The lightest experiment to study the heavy b-quark

http://lhcb-public.web.cern.ch/lhcb-public/



## LHCB DESIGN PRINCIPLES



- → Precision measurements and rare decays in the B system
  - → Large production ( $\sigma_{BB}$ ~500 µb), but still  $\sigma_{BB}$ / $\sigma_{Tot}$  ~ 5x10<sup>-3</sup>
  - → Interesting B decays are quite <u>rare</u> (BR ~ 10<sup>-5</sup>)





- → Single-arm spectrometer and low L → reduced event size
- → Selection of B mesons → search for B-decay topologies
  - → related to high mass and long lifetime of the b-quark



# LHCB TRIGGER STRATEGY





40 MHz bunch crossing rate



L0 Hardware Trigger: 1 MHz readout, high $E_T/P_T$ signatures

- 450 kHz h<sup>±</sup>
- 400 kHz μ/μμ
- 150 kHz e/γ

**Software High Level Trigger** 

- 29000 logical CPU cores
- Offline reconstruction tuned to trigger time constraints
- Mixture of exclusive and inclusive selection algorithms

5 kHz (0.3 GB/s) to storage

- 2 kHz inclusive topological
- 2 kHz inclusive/exclusive charm
- 1 kHz muon and dimuon

#### **Low input rate and occupancy**

- Limited acceptance: 10 MHz
- **→** Limited Luminosity =2 x 10<sup>32</sup>cm<sup>-2</sup>s<sup>-1</sup>
- Select Bs in hadronic triggers
- Reject complex/busy events
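The LHCb numbers can be tied together with rate = σ × L. The sketch below uses the cross-section, the $\sigma_{BB}/\sigma_{Tot}$ ratio and the branching ratio from the design-principles slide, together with the limited luminosity quoted above. It ignores detector acceptance and efficiencies, so the results are order-of-magnitude production rates only.

```python
UB_TO_CM2 = 1e-30  # 1 microbarn in cm^2

LUMINOSITY = 2e32              # cm^-2 s^-1, LHCb limited luminosity
sigma_bb = 500 * UB_TO_CM2     # b-bbar production cross-section
bb_fraction = 5e-3             # sigma_bb / sigma_tot
branching_ratio = 1e-5         # typical interesting B decay

bb_rate_hz = sigma_bb * LUMINOSITY
interaction_rate_hz = bb_rate_hz / bb_fraction
rare_decay_rate_hz = bb_rate_hz * branching_ratio

print(f"b-bbar pairs:         {bb_rate_hz:.1e} Hz")
print(f"all interactions:     {interaction_rate_hz:.1e} Hz")
print(f"one rare decay mode:  {rare_decay_rate_hz:.1f} Hz")
```

Even at reduced luminosity, b-quark pairs are produced at about 100 kHz, while a single rare decay mode shows up about once per second before any acceptance or efficiency is applied.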

60kB \* 1MHz= 60 GB/s readout network

Multitude of exclusive selections

# SCHEME EVOLUTION





40 MHz bunch crossing rate



L0 Hardware Trigger: 1 MHz readout, high E<sub>T</sub>/P<sub>T</sub> signatures

450 kHz h<sup>±</sup>, 400 kHz μ/μμ, 150 kHz e/γ

**Software High Level Trigger** 

Partial event reconstruction, select displaced tracks/vertices and dimuons

150 kHz

Buffer events to disk, perform online detector calibration and alignment

Full offline-like event selection, mixture of inclusive and exclusive triggers

12.5 kHz Rate to storage

Can we increase the efficiency on B-hadrons? YES, by using more precision!



#### **Synchronous with DAQ**

◆ Use <u>tracks</u> for selections on B-decay vertices (in 35ms)

# Split with a large buffer (4PB)!

#### **Deferred Processing**

★ Reconstruct with offline-like calibrations (in 350ms), becoming real-time physics analysis

# **UPGRADES FOR RUN 3**





Can we increase the luminosity x10?
Can we increase the b-hadron efficiency x2?

#### YES, remove the 1 MHz readout limit of L0!



Allow detector readout and reconstruction at unprecedented rate: 30MHz !!

# TRIGGER-LESS?





From Run 1 to Run 3, the TDAQ system evolved to handle a higher readout rate

Key strategy: reduce data size at FE and suppress pileup with tracking



- ♦ Run2: ~ 100k cores < 6 ms
- Run3: modern CPU & co-processors (FPGA/ GPU)



[Figure: online tracking sequence. Labels: Velo tracking; Velo-UT tracking ($p_T > 200 \text{ MeV}, \delta p/p \sim 15\%$); forward tracking ($p_T > 500 \text{ MeV}, \delta p/p \sim 0.5\%$); PV finding; muon identification; simplified Kalman fit; particle identification; rate reducing cuts, output < 1 MHz]

arXiv:2105.04031

# LHCB IN RUN3: NETWORK IS DATAFLOW



Readout @ 30 MHz, event size ~ 150 kB

- → Data reduction with custom FPGA-card (PCIe40), also used in ALICE
  - Data-packing for sub-detectors (zero-suppression, clustering)
- → Data pushed to the Event Building with massive link usage:
  - → ~10,000 GBT (4.8 Gb/s, rad-hard)

DAQ network < 40 Tbit/s, record rate < 100 kHz
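The figures on this slide are mutually consistent, which a short calculation makes explicit. A minimal sketch using only the numbers above (30 MHz, ~150 kB, ~10,000 GBT links at 4.8 Gb/s); the variable names are illustrative.

```python
readout_rate_hz = 30e6       # trigger-less readout rate
event_size_bytes = 150e3     # ~150 kB per event

# Traffic into the event builder, in Tbit/s
eb_input_tbit_s = readout_rate_hz * event_size_bytes * 8 / 1e12

# Aggregate raw capacity of the front-end links, in Tbit/s
n_gbt_links = 10_000
gbt_link_gbit_s = 4.8
link_capacity_tbit_s = n_gbt_links * gbt_link_gbit_s / 1e3

print(f"event-builder input: {eb_input_tbit_s:.0f} Tbit/s")
print(f"GBT link capacity:   {link_capacity_tbit_s:.0f} Tbit/s")
```

The event-building traffic of about 36 Tbit/s sits just below the quoted 40 Tbit/s DAQ network limit, and below the aggregate capacity of the links.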



- PCIe-gen3: simple protocol, large bandwidth
- PCIe: maximum flexibility in later networking choice

Ref for PCIe40

# NETWORK TRAFFIC COMPARISON



Same data volume as ATLAS/CMS HL-LHC upgrades! But earlier and for less money



# ALICE: THE SMALL BIG-BANG

Recording heavy ion collisions

http://alice-daq.web.cern.ch



# **ALICE STRATEGIES**





# DESIGNED FOR HEAVY ION COLLISIONS



- → 19 different detectors
- → With high-granularity and timing information
  - in particular the Time Projection Chamber (**TPC**) has very high occupancy and slow response
- → Large event size (> 40MB)
  - → TPC producing 90% of data
- Complex event topology
  - → low trigger rate: max 3.5 kHz



$E_{cms} = 5.5 \text{ TeV}$ per nucleon pair, Pb-Pb collisions at $L = 10^{27}$ cm<sup>-2</sup>s<sup>-1</sup>

#### → Challenges for TDAQ design:

- detector readout: up to ~50 GB/s
- → storage: 1.2 GB/s (Pb-Pb)

#### **UPGRADING TO RUN 3**



#### → LHC heavy ion programme: extend statistics by x100!

- Increase detector granularity (===> increase event size!)
- Increase storage bandwidth x O(100)
  - Offline reconstruction also challenging due to combinatorics
- Increase readout rates ~kHz → 50 kHz (===> need new and faster electronics)
  - Rate very close to TPC readout !!

#### New TDAQ challenges!

| | RORC 1 (Run 1) | C-RORC (Run 2) | CRU |
|---|---|---|---|
| Links and bus | 2 ch @ 2 Gb/s, PCIe gen.1 x4 (1 GB/s) | 12 ch @ up to 6 Gb/s, PCIe gen.2 x8 (4 GB/s) | 24 ch @ 5 Gb/s, PCIe gen.3 x16 (16 GB/s) |
| Protocol | Custom DDL protocol | Custom DDL protocol (same protocol but faster) | GBT |
| Functions | Protocol handling, TPC Cluster Finder | Protocol handling, TPC Cluster Finder | Protocol handling, TPC Cluster Finder, Common-Mode correction, Zero suppression |

New Common Readout Unit (CRU), based on the PCIe40 card: ~3 TB/s detector readout

#### INCREASING THROUGHPUTS WITH COTS



- → Data compression in GPUs and FPGAs ==> x2 readout rate
- → Network evolution: 2.5GB/s (2010)  $\Rightarrow$  6GB/s (2015) ==> x2 DAQ throughput



Tracking processing based on GPUs since Run1!

## RUN 3 DAQ: ONLINE RECONSTRUCTION



#### Higher rates with smaller data?

#### Store reconstruction, discard raw data

#### Very heterogeneous system

- Synchronous, with continuous data
  - → Data compression in FPGA/CPU
  - 30s to analyse 20ms-time frame
- Asynchronous, reconstruction in GPUs
  - ⇒ 250 EPN servers with 8 GPU-cards
  - Require large-memory GPUs!
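The processing time and time-frame length quoted above fix how many time frames must be in flight at once: if each 20 ms time frame takes 30 s to analyse, the farm has to work on 30 s / 20 ms of them concurrently to keep up with continuous data. The sketch below compares that with the number of GPU cards quoted above, for scale only; how time frames are actually mapped onto CPUs and GPUs is not stated here, so that comparison is an assumption.

```python
time_frame_ms = 20            # length of one time frame
processing_time_ms = 30_000   # 30 s to analyse one time frame

# Time frames that must be processed concurrently to keep up
frames_in_flight = processing_time_ms / time_frame_ms

# Size of the GPU farm quoted above
n_epn_servers = 250
gpus_per_server = 8
n_gpus = n_epn_servers * gpus_per_server

print(f"time frames in flight: {frames_in_flight:.0f}")
print(f"GPU cards in the farm: {n_gpus}")
```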



- Common online/offline software
  - Same calibrations and resources

Data flow (labels from the figure):

- **Detectors electronics**: 3.4 TB/s (over 8500 GBT links)
- **Data reduction** (CRU/FPGA, CPU): readout, baseline correction, zero suppression, local data processing, data aggregation (Calibration 0), 500 GB/s
- **Data aggregation** and synchronous global data processing (GPU, CPU): reconstruction (**Calibration 1**), 90 GB/s
- Data storage (60 PB): 1 year of compressed data, write 170 GB/s, read 270 GB/s, 20 GB/s
- Asynchronous (hours) event reconstruction with final calibration (Calibration 2)

## SUMMARY OF THE SUMMARIES

- → LHC experiments are among the largest and most complex TDAQ systems in HEP, to cope with a very difficult environment (always top LHC Luminosity)
- Continuous upgrade following the LHC luminosity, with different approaches
  - → ATLAS/CMS high-rate readout and Event Building, based on robust trigger selections
  - → LHCb pioneers online-offline merging with large data throughputs
  - → ALICE drives the GPU evolution and data compression
- → With a general trend towards higher bandwidths and commodity HW
  - Scalability not obvious. Challenge remains for front-end and back-end technologies and efficient (cost, time, power) computing farms
  - → Moore's law still valid for processors but needs more effort to be exploited
- → Each experiment trying to gain advantage from others' developments
  - → joint efforts already started for hardware/software
  - → sometimes stealing ideas ("... but we can do better than that...")

# BACK-UP SLIDES



#### A 2-DIM FOLDED EVENT BUILDING



#### Large farm of equal nodes with 8 PCIe40 boards, specialised by firmware



- → EB network is oversized: able to manage 64Tb/s (320 network cards x 200Gb/s)
- → Large rejection at HLT1: use O(200) GPU! throughput at ~100kHz
- → Storage Buffer HLT1-HLT2 = 40 PB (3000 hard-disks) enough for days
  - → SSDs are faster but have a short lifetime under high read-write rates, so hard disks are preferred
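The sizing quoted in the bullets above follows from simple products. A minimal sketch that reproduces the 64 Tb/s figure, compares it with the < 40 Tbit/s DAQ network load quoted for LHCb in Run 3, and derives the average capacity per disk (decimal units assumed; the variable names are illustrative).

```python
n_network_cards = 320
card_gbit_s = 200
eb_capacity_tbit_s = n_network_cards * card_gbit_s / 1000

daq_network_load_tbit_s = 40   # upper bound quoted for the Run 3 DAQ network
headroom = eb_capacity_tbit_s / daq_network_load_tbit_s

buffer_pb = 40
n_disks = 3000
tb_per_disk = buffer_pb * 1000 / n_disks

print(f"EB network capacity: {eb_capacity_tbit_s:.0f} Tb/s (x{headroom:.1f} the load)")
print(f"average disk size:   {tb_per_disk:.1f} TB")
```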


# LHC: THE SOURCE

#### The clock source

- → ~3600 bunches in 27km
- → distance between bunches: 27km/3600 = 7.5m
- → distance between bunches in time: 7.5m/c = 25ns



At full Luminosity, every 25ns, ~23 superimposed p-p interaction events
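The clock arithmetic above, and the order of magnitude of the pile-up, can be reproduced in a few lines. This is a rough sketch: the pile-up estimate uses the 100 mb total cross-section and the design luminosity quoted elsewhere in this lecture, and assumes every 25 ns slot is filled, so it lands near, not exactly on, the ~23 quoted above.

```python
SPEED_OF_LIGHT_M_S = 299_792_458

ring_length_m = 27_000
n_bunch_slots = 3600

bunch_spacing_m = ring_length_m / n_bunch_slots
bunch_spacing_ns = bunch_spacing_m / SPEED_OF_LIGHT_M_S * 1e9

# Rough pile-up: interaction rate divided by bunch-crossing rate
sigma_tot_cm2 = 100 * 1e-27    # 100 mb
luminosity = 1e34              # cm^-2 s^-1
bc_rate_hz = 40e6
pileup = sigma_tot_cm2 * luminosity / bc_rate_hz

print(f"bunch spacing: {bunch_spacing_m:.1f} m = {bunch_spacing_ns:.1f} ns")
print(f"interactions per crossing: ~{pileup:.0f}")
```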



# PIPELINED TRIGGERS

- → Allow trigger decision longer than clock tick (and no deadtime)
  - → Execute trigger selection in defined clocked steps (fixed latency)
  - → Intermediate storage in stacked buffer cells
  - → R/W pointers are moved by clock frequency
- → Tight design constraints for trigger/FE
- → Analog/digital pipelines
  - → Analog: built from switching capacitors
  - → Digital: registers/FIFO/...
- **→** Full digitisation before/after L1A
  - → Fast ADCs (power consumption!)
- → Additional complication: synchronisation
  - BC counted and reset at each LHC turn
  - → large optical time distribution system
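The pipeline idea can be sketched as a circular buffer clocked once per bunch crossing. This is a toy model, not any experiment's front-end: the class and variable names are hypothetical, the latency is shortened to 4 bunch crossings, and a single pointer serves as both read and write position because, at fixed latency, the cell about to be overwritten is exactly the one the trigger decision refers to.

```python
class TriggerPipeline:
    """Fixed-depth circular buffer: one cell is written per bunch crossing."""

    def __init__(self, depth: int):
        self.depth = depth
        self.cells = [None] * depth
        self.pointer = 0  # moved by the clock, shared by read and write

    def clock(self, sample, l1_accept: bool):
        """Advance one bunch crossing.

        The cell at the pointer holds the sample written `depth` ticks ago.
        It is read out if the Level-1 accept arrives now, then overwritten.
        """
        oldest = self.cells[self.pointer]
        read_out = oldest if l1_accept else None
        self.cells[self.pointer] = sample
        self.pointer = (self.pointer + 1) % self.depth
        return read_out


LATENCY_BC = 4          # toy value; real systems use ~100 or more
accepted_bcs = {2, 5}   # bunch crossings the trigger decides to keep

pipeline = TriggerPipeline(depth=LATENCY_BC)
read_out = []
for tick in range(12):
    # The decision for bunch crossing n arrives LATENCY_BC ticks later
    l1a = (tick - LATENCY_BC) in accepted_bcs
    data = pipeline.clock(sample=f"BC{tick}", l1_accept=l1a)
    if data is not None:
        read_out.append(data)

print(read_out)
```

No bunch crossing is lost while the trigger decision is pending, which is why this scheme avoids dead time as long as the decision always arrives at the same fixed latency.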



# LOCAL TIMING AND ADJUSTMENTS





- **→** Common optical system: TTC
  - → radiation resistance
  - → single high power laser
- **→** Large distribution
  - → experiments with ~10<sup>7</sup> channels

- → Align readout & trigger to (better than) 25 ns and correct for
  - → time of flight (25 ns  $\approx$  7.5m)
  - → cable delays (10cm/ns)
  - → processing delays (~100 BCs)

# TRIGGERS FOR MUONS





#### Dedicated detectors:

- → low occupancy for fast pattern recognition
- optimal time-resolution for BC-identification

#### → L1 processing (40 MHz)

- pattern matching with patterns stored in buffers
- simplified fit of track segments

#### → High level processing (100 kHz)

- → full detector resolutions
- → match segments with tracks in the ID
- → isolation



#### **EVOLUTION OF THE FILTER FARM**



# Full readout, but <u>regional reconstruction</u> in HLT seeded by L1 trigger objects



#### Integrated Cloud capability (New!)

→ Added ability to run WLCG grid jobs in FUs during stops/interfill



#### File-based communication

- → HLT and DAQ completely decoupled
- Network filesystem used as transport (and resource arbitration) protocol (LUSTRE FS)

# CMS: LOW-PT TRACK FILTERING



# Track filtering (low p<sub>T</sub>)

# Track finding options

Reduce readout 40 → 1 MHz by detector coincidences

**→** Special outer tracker modules

[Figure: a "stub" in a stacked module; labels: 1 mm, 40 MHz]

- → two layers of silicon at few mm
- using cluster width and stacked trackers
- **→** Design tracker to have coherent p<sub>T</sub> threshold in the full volume
  - → exploiting strong magnetic field of CMS
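The geometry behind a p<sub>T</sub> threshold from two closely spaced layers can be sketched with the standard relation R [m] = p<sub>T</sub> [GeV] / (0.3 B [T]): a low-p<sub>T</sub> track bends more, so its hits in the two layers are further apart. The sketch below is illustrative only and is not the CMS stub-finding logic; the field value (3.8 T), module radius, sensor gap and 2 GeV threshold are all assumed here.

```python
import math


def bend_mm(pt_gev: float, radius_m: float, b_tesla: float, gap_mm: float) -> float:
    """Transverse offset between the hits in two layers separated by gap_mm."""
    curvature_radius_m = pt_gev / (0.3 * b_tesla)
    sin_phi = radius_m / (2.0 * curvature_radius_m)
    if sin_phi >= 1.0:
        raise ValueError("track curls up before reaching this radius")
    # phi is the angle between the track and the radial direction at the module
    return gap_mm * math.tan(math.asin(sin_phi))


def stub_passes(pt_gev: float, pt_threshold_gev: float = 2.0,
                radius_m: float = 0.5, b_tesla: float = 3.8,
                gap_mm: float = 1.0) -> bool:
    """Accept the hit pair if its bend is no larger than that of a threshold track."""
    max_bend = bend_mm(pt_threshold_gev, radius_m, b_tesla, gap_mm)
    return bend_mm(pt_gev, radius_m, b_tesla, gap_mm) <= max_bend


for pt in (1.0, 2.0, 10.0):
    offset = bend_mm(pt, radius_m=0.5, b_tesla=3.8, gap_mm=1.0)
    print(f"pT = {pt:4.1f} GeV: bend = {offset:.3f} mm, passes = {stub_passes(pt)}")
```

Because the bend at fixed p<sub>T</sub> depends on the module radius and sensor gap, keeping the threshold coherent over the full tracker volume means tuning the acceptance window, or the gap, from region to region.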




**Hough Transform** 



**Associative Memories** 

- Data rates > 50-100 Tbps
- Three R&D efforts: FPGA/ASIC

# LHC COMPUTING TOWARDS NEW PARADIGMS



#### Run1 + Run2

- → Data storage
  - → 339 PB on tapes, 173 PB on disks
- → Global CPU time delivered by Worldwide LHC Computing Grid (WLCG)
  - ⇒ about 900,000 cores

#### Run 3

➡ Evolution of current technologies and current (flat) funding is ok

#### Run 4

- **→** Linear increase of digitisation time
- Factorial increase of reconstruction time
- → Larger events, much more memory



see [Ref]

- Need factor 2-3 more storage and computing resources for HL-LHC
  - new developments and R&D projects for data management and processing, SW multithreading, new computing models and data compression