## My activities

#### Roberto Ammendola

#### Istituto Nazionale di Fisica Nucleare, Sezione Roma Tor Vergata

#### December 2018


- 2002: Graduated in Physics at Tor Vergata
- 2003-2005: INFN Scholarship
- 2005-2009: INFN Technological Research Fellowship
- 2010-2018: Technologist, fixed-term contract (Tecnologo TD)
- 2018: Technologist, permanent contract (Tecnologo TI)

Keywords: FPGA, HPC, Interconnection Networks, Computing Architectures


## The legacy of APE research

- The Array Processor Experiment (APE) is an INFN project more than 25 years old.
- Developing fully custom and hybrid parallel computing machines.
- Research advances in floating point engines, interconnection networks, system integration, compilers, software libraries, ...
- Since 2003 we have been exploring ways to use commodity hardware (a.k.a. clusters) with a custom interconnect.
- Programmable hardware is the key technology for this work.








## APENet: a custom interconnect for HPC

Research topics:

- Custom network topologies  $\Rightarrow$  multi-dimensional torus
- Offloading communication tasks  $\Rightarrow$  RDMA, address translation, ...
- Enhancing communication with special devices  $\Rightarrow$  "GPU Direct"
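As an illustration of the torus topology mentioned above, here is a minimal sketch of neighbour addressing and hop counting with wrap-around links. The function names and the 4x4x4 size in the usage note are assumptions made for the example, not details of the APENet design.

```python
from typing import List, Tuple

Coord = Tuple[int, ...]


def torus_neighbours(node: Coord, dims: Coord) -> List[Coord]:
    """Return the neighbours of `node` in a torus of size `dims`.

    Every axis contributes a +1 and a -1 neighbour; the modulo implements
    the wrap-around link that closes each ring.
    (For an axis of size 2 both directions reach the same node.)
    """
    neighbours = []
    for axis, size in enumerate(dims):
        for step in (1, -1):
            coord = list(node)
            coord[axis] = (coord[axis] + step) % size
            neighbours.append(tuple(coord))
    return neighbours


def torus_hops(a: Coord, b: Coord, dims: Coord) -> int:
    """Minimal hop count between two nodes, going the shorter way round each ring."""
    return sum(min((x - y) % n, (y - x) % n) for x, y, n in zip(a, b, dims))
```

In a hypothetical 4x4x4 torus, `torus_neighbours((0, 0, 0), (4, 4, 4))` lists six neighbours, and `torus_hops((0, 0, 0), (3, 2, 1), (4, 4, 4))` takes the wrap-around link on the first axis instead of crossing three hops.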





## NaNet (2014-2018): real-time stream processing at NA62

- investigate the use of heterogeneous computing devices in HEP DAQ and low-level trigger systems
- bring the power and flexibility of modern heterogeneous computing devices, such as GPUs and high-end FPGAs, close to the data source to improve low-level trigger performance
- meet the real-time and throughput requirements of target systems
- heterogeneous processing pipeline capable of reconstructing online the geometry (center and radius) of the Cherenkov rings in the NA62 RICH detector
- the level-0 (L0) trigger can use these refined primitives to tag or veto different decay channels
- demonstrated processing latency of 260 µs, compatible with the time budget of the L0 trigger
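To make "center and radius" reconstruction concrete, the sketch below fits a circle to a set of hit coordinates with a generic algebraic least-squares (Kåsa-style) fit. This is an assumption chosen for illustration: the slides do not say which algorithm the pipeline uses, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_ring(hits: Sequence[Tuple[float, float]]) -> Tuple[float, float, float]:
    """Fit x^2 + y^2 + D*x + E*y + F = 0 to the hits; return (cx, cy, r)."""
    n = len(hits)
    if n < 3:
        raise ValueError("at least three hits are needed")
    sx = sy = sxx = syy = sxy = sz = sxz = syz = 0.0
    for x, y in hits:
        z = x * x + y * y
        sx += x; sy += y
        sxx += x * x; syy += y * y; sxy += x * y
        sz += z; sxz += x * z; syz += y * z
    # Normal equations A * (D, E, F)^T = b of the linear least-squares problem.
    a = [[sxx, sxy, sx], [sxy, syy, sy], [sx, sy, float(n)]]
    b = [-sxz, -syz, -sz]
    d = det3(a)
    if abs(d) < 1e-12:
        raise ValueError("hits are (nearly) collinear")
    # Cramer's rule: replace column k of A with b.
    sol: List[float] = []
    for k in range(3):
        ak = [[b[i] if j == k else a[i][j] for j in range(3)] for i in range(3)]
        sol.append(det3(ak) / d)
    cx, cy = -sol[0] / 2.0, -sol[1] / 2.0
    # max() guards against tiny negative values caused by rounding.
    r = math.sqrt(max(cx * cx + cy * cy - sol[2], 0.0))
    return cx, cy, r
```

The fit reduces to accumulating a few sums and solving one 3x3 system, which is why this family of methods maps naturally onto parallel devices; a production trigger would additionally need pattern recognition for multiple rings and noise hits.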



## Patent (2015): speedup of collective communications

Use programmable logic resources to offload host-to-network communication tasks:

- descriptor-based host-to-network interaction
- send requests are notified using a TX ring buffer
- hardware-initiated DMAs
- TX done and RX done are notified to Software using an event queue ring buffer
- dedicated hardware resources to store payload and destination descriptors
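The descriptor and ring-buffer interaction listed above can be modelled in software as follows. This is only a toy sketch: the `Descriptor` fields, the `Ring` class and `nic_service` (a stand-in for the hardware side) are hypothetical names, and the patented mechanism itself is not described in the slides.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Descriptor:
    dest: int    # destination node id (hypothetical field)
    addr: int    # offset of the payload in host memory (hypothetical field)
    length: int  # payload length in bytes


class Ring:
    """Fixed-size single-producer/single-consumer ring buffer."""

    def __init__(self, size: int) -> None:
        self.slots: List[Any] = [None] * size
        self.head = 0   # next slot to write
        self.tail = 0   # next slot to read
        self.count = 0  # number of occupied slots

    def push(self, item: Any) -> bool:
        if self.count == len(self.slots):
            return False  # ring full: the producer must retry later
        self.slots[self.head] = item
        self.head = (self.head + 1) % len(self.slots)
        self.count += 1
        return True

    def pop(self) -> Optional[Any]:
        if self.count == 0:
            return None
        item = self.slots[self.tail]
        self.tail = (self.tail + 1) % len(self.slots)
        self.count -= 1
        return item


def nic_service(tx_ring: Ring, event_queue: Ring, host_memory: bytes) -> List[bytes]:
    """Drain TX descriptors, 'DMA' each payload, post one TX-done event per send."""
    sent = []
    # Stop when there is no work left or no room to report completions.
    while tx_ring.count and event_queue.count < len(event_queue.slots):
        desc = tx_ring.pop()
        sent.append(host_memory[desc.addr:desc.addr + desc.length])
        event_queue.push(("TX_DONE", desc.dest, desc.length))
    return sent
```

Software only writes descriptors and later reads completion events; the data movement itself is initiated by the device model, which is the point of the offload.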



## Collaboration in European Projects

#### ExaNeST-EuroEXA

- Design and prototyping of 10<sup>18</sup> floating-point operations per second (exascale) class supercomputers
- ExaNeST: 2015-2019 timespan, 8 M€ budget, 12 partners
- EuroEXA: 2018-2020 timespan, 20 M€ budget, 16 partners
- Our role is to design the network topology and architecture
- Key features: power-efficient ARM processors, unified low-latency interconnect, NVM storage and liquid cooling system
- WaveScales experiment within the Human Brain Project (SGA1 and SGA2)
  - goal: match experimental measurements and simulations of slow waves during deep sleep and anesthesia, and of the transition to other brain states
  - focus on the development of a dedicated large-scale parallel/distributed simulation application: the Distributed and Plastic Spiking Neural Networks (DPSNN) simulation engine
  - performance study on low-power ARM-based architectures
  - energy-to-solution analysis


### Turbonet

- 2018-2019 project in CSN5
- developing new architectures for 3D FFT-based large scale simulations
- e.g. in pseudo-spectral Navier-Stokes simulations of turbulent fluid dynamics, 90% of the computational time is spent in FFTs
- double precision is a scientific requirement
- the aim is to demonstrate that a fully pipelined architecture is key to reducing cost and time
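A back-of-the-envelope reading of the 90% figure, using Amdahl's law: if only the FFT part is accelerated, the rest of the code bounds the overall gain. The sketch below is an illustration built on that single number; the speedup values in the usage note are hypothetical, not project results.

```python
def overall_speedup(fft_fraction: float, fft_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when only the FFT part is accelerated.

    fft_fraction: share of the original run time spent in FFTs (0.9 in the slide)
    fft_speedup:  factor by which the FFT part alone gets faster (hypothetical)
    """
    if not 0.0 <= fft_fraction <= 1.0:
        raise ValueError("fft_fraction must be between 0 and 1")
    if fft_speedup <= 0.0:
        raise ValueError("fft_speedup must be positive")
    return 1.0 / ((1.0 - fft_fraction) + fft_fraction / fft_speedup)
```

With `fft_fraction=0.9`, a tenfold faster FFT gives `overall_speedup(0.9, 10)` of roughly 5.3, and no FFT accelerator alone can push the whole application beyond a factor of 10.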



### Other collaborations and activities

- NA62, CTA, CSES-Limadou, ...
- Deputy technology transfer (TT) contact person
- RUP (procurement officer) for CAEN and IT equipment purchases

#### REMEMBER: SKILLS MAPPING SURVEY

### Thanks for your patience


