CaloChallenge Workshop

Europe/Rome
Villa Mondragone

Michele Faucci Giannelli (Istituto Nazionale di Fisica Nucleare)
Description

The workshop is organised to discuss the results of the #calochallenge, this year's ML competition in high energy physics focussing on fast calorimeter simulation using generative models.

Every contributor will have the opportunity to present the method developed. Plenty of time will be devoted to discussions about the benefits and limitations of the different approaches.

The event will take place in Villa Mondragone in the hills above Frascati, near Rome. Participants are encouraged to stay in Frascati, from where a bus to and from the Villa will be organised. A social dinner is scheduled for May 30th at 8 PM at "Ristorante Al Fico" in Grottaferrata. Transport will be provided.

The event will be hybrid, allowing for virtual participation.  If you would like to participate remotely, please register and indicate the remote option.  We expect all speakers to be in person.

Local Organizing Committee:
Michele Faucci Giannelli (INFN - Roma2)
Marco Vanadia (INFN - Roma2)
Umberto De Sanctis (Universita' Roma2 Tor Vergata)

International Advisory Committee:
Ben Nachman (LBNL)
David Shih (Rutgers)
Michele Faucci Giannelli (INFN - Roma2)
Claudius Krause (University of Heidelberg)
Dalila Salamani (CERN)
Gregor Kasieczka (University of Hamburg)
Anna Zaborowska (CERN)

Participants
  • Ali Kaan Güven
  • Aman Desai
  • Anatolii Korol
  • Andres Vargas Hernandez
  • Anna Zaborowska
  • Benjamin Nachman
  • Benno Käch
  • Bingzhi Li
  • Claudius Krause
  • Dalila Salamani
  • Daniele Dal Santo
  • David Shih
  • Dirk Kruecker
  • Federico Andrea Guillaume Corchia
  • Florian Rehm
  • Henry Day-Hall
  • Hosein Hashemi
  • Humberto Alonso Reyes Gonzalez
  • Humberto Reyes-González
  • Ian Pang
  • Jaydip Singh
  • Jesse Cresswell
  • Jichang Ryu
  • Kevin Pedro
  • Kristina Jaruskova
  • Lorenzo Rinaldi
  • Luigi Favaro
  • Marco Letizia
  • Matteo Franchini
  • Michele Faucci Giannelli
  • Moritz Scham
  • Oz Amram
  • Piyush Raikwar
  • Qibin LIU
  • Rui Zhang
  • Sandro Wenzel
  • Serena Palazzo
  • Sergey Korpachev
  • Shivam Raj
  • Simon Schnake
  • Thorsten Lars Henrik Buss
  • Tong Qiu
  • Umberto De Sanctis
  • Vinicius Mikuni
    • 1
      Bus at piazza Marconi

      Bus will leave at 9:00 for Villa Mondragone

    • 2
      Registration
    • 3
      Introduction
      Speaker: Michele Faucci Giannelli (Istituto Nazionale di Fisica Nucleare)
    • 4
      Welcome from INFN Tor Vergata
      Speaker: Anna Di Ciaccio (Istituto Nazionale di Fisica Nucleare)
    • 5
      Welcome from Fellini
      Speaker: Laura Bandiera (Istituto Nazionale di Fisica Nucleare)
    • 6
      ML in industry
      Speaker: Serena Palazzo (CS)
    • 10:35
      Coffee
    • GANs
    • 13:00
      Lunch
    • VAEs
      • 11
        Latent Generative Models for Calo Simulation with VQ-VAE

        Simulation of calorimeter response is important for modern high energy physics experiments. With increasingly large and high granularity design of calorimeters, the computational cost of conventional MC-based simulation of each particle-material interaction is becoming a major bottleneck.

        We propose a new two-stage generative model, similar to recent latent diffusion models. The first stage only compresses the calorimeter response into a regularized latent space using a Vector Quantized Variational Autoencoder (VQ-VAE). The second stage handles generative sampling in the latent representation, conditional on phase space parameters such as pT. This second stage can be trained independently of the first, and we demonstrate prior models based on RNNs and diffusion.

        For the Calo Challenge Dataset 1, our demonstration model achieves a speedup of more than 10^4 over GEANT4. The modeling of energy deposition and shower shape is comparable to other approaches such as CaloFlow, with substantially fewer parameters and a factor of ~2 speedup.

        For Dataset 2, we have designed a fully convolutional architecture, employing novel operations which maintain equivariance w.r.t. the cylindrical geometry of the data. This, combined with the two-stage modeling approach, shows promising generation quality with substantially faster training and approximately 2x faster inference relative to CaloScore.

        In addition to cylindrical convolution and two-stage VQ-VAE training, we also introduce a novel, self-supervised proxy model that can be used as a perceptual loss function to help with training any AutoEncoder-based model. While some challenges remain in achieving ultra high fidelity suitable for certain physics applications, we present several innovative techniques that may help solve the puzzle of fast and accurate calorimeter simulation.

        Speaker: Qibin LIU (TDLI, Shanghai Jiao Tong University)
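The quantization step at the heart of a VQ-VAE can be illustrated with a minimal sketch. It is not the authors' implementation: the codebook values, the latent vectors and the function names `quantize` and `dequantize` are hypothetical, and a real model would learn the codebook jointly with the encoder and decoder.

```python
def quantize(latents, codebook):
    """Return, for each latent vector, the index of its nearest codebook vector."""
    indices = []
    for z in latents:
        # Squared Euclidean distance from z to every codebook entry.
        dists = [sum((zi - ci) ** 2 for zi, ci in zip(z, c)) for c in codebook]
        indices.append(dists.index(min(dists)))
    return indices


def dequantize(indices, codebook):
    """Look the discrete codes back up in the codebook."""
    return [codebook[i] for i in indices]


# Hypothetical 2-dimensional codebook with three entries.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
# Hypothetical encoder outputs for three latent positions.
latents = [[0.1, -0.2], [0.9, 1.2], [-0.8, 1.7]]

codes = quantize(latents, codebook)
reconstructed = dequantize(codes, codebook)
print(codes)
```

The second-stage model described in the abstract would then be trained on sequences of such discrete codes rather than on the raw calorimeter response.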
      • 12
        CERN-Geneva VAE
        Speaker: Dalila Salamani (CERN)
      • 13
        CaloMan: Fast generation of calorimeter showers with density estimation on learned manifolds

        The efficient simulation of particle propagation and interaction within the detectors of the Large Hadron Collider (LHC) is of primary importance for precision measurements and new physics searches. The most computationally expensive simulations involve calorimeter showers, which will become ever more costly and high-dimensional as the LHC moves into its High Luminosity era. The computational costs can be heavily reduced by replacing (parts of) the simulation pipeline with generative networks. We thus propose to model calorimeter showers by first learning a lower-dimensional manifold structure with an auto-encoder and then performing density estimation on this manifold with a normalising flow. Our approach relies on the notion that the seemingly high-dimensional data of HEP experiments actually has a much lower intrinsic dimensionality. In machine learning, this is known as the manifold hypothesis, which states that high-dimensional data is supported on low-dimensional manifolds. By reducing the dimensionality of the data, we enable fast training and generation without compromising accuracy.

        Speaker: Humberto Alonso Reyes Gonzalez (Istituto Nazionale di Fisica Nucleare)
      • 14
        Discussion
    • 16:00
      Coffee
    • Diffusion
      • 15
        Score-based Generative Models for Calorimeter Shower Simulation

        Diffusion generative models are a new class of generative algorithms that have been shown to produce realistic images even in high dimensional spaces, currently surpassing other state-of-the-art models for different benchmark categories and applications. In this work we introduce CaloScore, a score-based generative model for collider physics applied to calorimeter shower generation. Three different diffusion models are investigated using the Fast Calorimeter Simulation Challenge 2022 dataset. CaloScore is the first application of a score-based generative model in collider physics and is able to produce high-fidelity calorimeter images for all datasets, providing an alternative paradigm for calorimeter shower simulation.

        Speaker: Vinicius Mikuni (LBNL)
      • 16
        Denoising Diffusion Models for High Fidelity Calorimeter Simulation

        Denoising diffusion models have recently become state of the art in the ML community because of their stable training procedure and ability to generate high quality images in reasonable computation times. We employ diffusion models for the task of generating calorimeter showers within the context of the CaloChallenge. Our diffusion models are based on 3D cylindrical convolutional networks, which take advantage of symmetries of the underlying data representation. For dataset 1, which has a basic cylindrical geometry but irregular binning between the different layers, we employ a differentiable embedding procedure that learns a reversible mapping from the original data format to a regular geometry on which cylindrical convolutions can be applied. We find our diffusion approach is able to generate high quality showers for all three datasets, achieving classifier AUC scores of ~0.7 or better.

        Speaker: Oz Amram (Fermilab)
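The abstract does not say how the cylindrical convolutions are implemented. One common way to respect the periodicity of the angular coordinate is to let the kernel wrap around that axis, as in the sketch below for a single ring of angular bins. The energies, the kernel and the helper names `circular_conv1d` and `roll` are illustrative assumptions.

```python
def circular_conv1d(ring, kernel):
    """Cross-correlate `ring` with `kernel`, wrapping around the angular axis."""
    n, k = len(ring), len(kernel)
    half = k // 2
    out = []
    for i in range(n):
        # Indices wrap modulo n, so the last and first bins are neighbours.
        out.append(sum(kernel[j] * ring[(i + j - half) % n] for j in range(k)))
    return out


def roll(ring, shift):
    """Rotate the ring by `shift` angular bins."""
    n = len(ring)
    return [ring[(i - shift) % n] for i in range(n)]


ring = [5.0, 1.0, 0.0, 0.0, 0.0, 2.0]  # hypothetical energies in 6 angular bins
kernel = [0.25, 0.5, 0.25]             # hypothetical smoothing kernel

smoothed = circular_conv1d(ring, kernel)

# Equivariance check: rotating then convolving equals convolving then rotating.
rotate_then_conv = circular_conv1d(roll(ring, 2), kernel)
conv_then_rotate = roll(smoothed, 2)
print(rotate_then_conv == conv_then_rotate)
```

A zero-padded convolution would fail this check at the seam between the last and the first angular bin, which is the motivation for the wrap-around indexing.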
      • 17
        CaloClouds: Fast Geometry-Independent Highly-Granular Calorimeter Simulation

        Simulating showers of particles in highly-granular detectors is a key frontier in the application of machine learning to particle physics.
        Achieving high accuracy and speed with generative machine learning models would enable them to augment traditional simulations and alleviate a major computing constraint.
        This work achieves a major breakthrough in this task by directly generating, for the first time, a point cloud of $O(1000)$ space points with energy depositions in the detector in 3D space, without relying on a fixed-grid structure. This is made possible by two key innovations: i) building on recent improvements in generative modelling, we apply a diffusion model, and ii) we use an initial, even higher-resolution point cloud of up to $40000$ so-called GEANT4 steps, which is subsequently down-sampled to the desired number of up to $6000$ space points.
        We showcase the performance of this approach using the specific example of simulating photon showers in the planned electromagnetic calorimeter of the International Large Detector (ILD) and achieve overall good modelling of physically relevant distributions.

        Speaker: Anatolii Korol (Deutsches Elektronen-Synchrotron (DESY))
      • 18
        Discussion
    • 19
      Bus to dinner
    • 20
      Bus at piazza Marconi
    • Normalising Flows
      • 21
        (inductive) CaloFlow

        We apply CaloFlow to GEANT4 showers of Dataset 1, producing high-fidelity samples with a sampling time of less than 0.1 ms per shower. We validated the fidelity of the samples using multiple metrics, including a classifier metric. To generalize CaloFlow to the higher dimensional Datasets 2 and 3, we propose a new approach called Inductive CaloFlow. This approach involves training the flow on the pattern of energy deposition in both the current and previous layer of a GEANT4 event. Inductive CaloFlow can efficiently generate new events for large calorimeter geometries and reproduces GEANT4-like events with high fidelity. With both approaches, a teacher-student pairing was used to increase sampling speed without significant loss of sample quality.

        Speaker: Ian Pang (Rutgers University)
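The abstract does not specify which classifier metric was used. A common choice is the AUC of a binary classifier trained to separate GEANT4 showers from generated ones, where 0.5 means the two samples are indistinguishable to the classifier and 1.0 means they are perfectly separable. The sketch below computes the AUC from classifier scores; the scores and the function name `auc` are hypothetical.

```python
def auc(scores_real, scores_generated):
    """Probability that a random real sample scores higher than a random
    generated sample, counting ties as one half."""
    wins = 0.0
    for s_real in scores_real:
        for s_gen in scores_generated:
            if s_real > s_gen:
                wins += 1.0
            elif s_real == s_gen:
                wins += 0.5
    return wins / (len(scores_real) * len(scores_generated))


# Hypothetical classifier outputs (higher means "looks more like GEANT4").
scores_geant4 = [0.9, 0.8, 0.6, 0.55]
scores_model = [0.7, 0.5, 0.4, 0.3]

separable = auc(scores_geant4, scores_model)
indistinguishable = auc([0.5, 0.5], [0.5, 0.5])
print(separable, indistinguishable)
```

The double loop is quadratic in the sample size, which is fine for illustration; a rank-based computation would be preferable for large samples.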
      • 22
        Generating Accurate Showers in Highly Granular Calorimeters Using Normalizing Flows

        Normalizing flows are a type of generative model that can be trained directly by minimizing the negative log-likelihood. It has been shown that flows can accurately model showers in low complexity calorimeters. We show how normalizing flows can be improved and adapted to accurately model showers in calorimeters with significantly higher complexity. One of the key points here is to move away from dense layers to convolutional layers. We show our results on datasets 2 and 3 of the CaloChallenge.

        Speaker: Thorsten Buss (University of Hamburg)
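Training by minimizing the negative log-likelihood can be illustrated with the simplest possible flow: a one-dimensional affine map x = mu + exp(log_sigma) * z with a standard normal base distribution. This is a toy sketch and not the architecture of the talk; the data values and the function name `nll` are assumptions.

```python
import math


def nll(data, mu, log_sigma):
    """Mean negative log-likelihood of `data` under the affine flow
    x = mu + exp(log_sigma) * z with z ~ N(0, 1)."""
    total = 0.0
    for x in data:
        # Inverse transform: map the data point back to the base space.
        z = (x - mu) * math.exp(-log_sigma)
        # Log-density of the standard normal base distribution at z.
        log_base = -0.5 * (z * z + math.log(2.0 * math.pi))
        # Change of variables: log |dz/dx| = -log_sigma.
        log_det = -log_sigma
        total -= log_base + log_det
    return total / len(data)


data = [1.8, 2.1, 2.4, 1.9, 2.3]  # hypothetical one-dimensional samples

untrained = nll(data, 0.0, 0.0)
fitted = nll(data, 2.1, math.log(0.23))
print(untrained, fitted)
```

Parameters that match the data give a lower value, and gradient descent on this quantity is what "trained directly by minimizing the negative log-likelihood" refers to. A real flow stacks many invertible layers and sums their log-determinants in the same way.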
      • 10:30
        Coffee
      • 23
        CaloPointFlow - Generating Calorimeter Showers as Point Clouds

        In particle physics, precise simulations of the interaction processes in calorimeters are essential for scientific discovery. However, accurate simulations using GEANT4 are computationally very expensive and pose a major challenge for the future of particle physics. In this study, we apply the CaloPointFlow model, a novel generative model based on normalizing flows, to fast and high-fidelity calorimeter shower generation. We use the CaloPointFlow model, an adapted version of the PointFlow model for 3D shape generation, to generate calorimeter showers using point clouds that exploit the sparsity and leverage the geometry of the data. We preprocess the voxelized datasets of the Fast Calorimeter Simulation Challenge 2022 to point clouds and apply the CaloPointFlow model to all three datasets without any adaptation. Furthermore, we evaluate the performance of our model on metrics such as energy resolution, longitudinal and transverse shower profiles, and shower shapes, and compare it with GEANT4. We demonstrate that our model can produce realistic and diverse samples with a sampling time of around 30 million single 4D points per minute. However, the model also has some limitations, such as its inability to capture the point-to-point correlation and its generation of multiple points per cell, which are in contradiction to the data. To address these issues, we propose a novel method that uses a second sampling step to compute the marginal likelihoods of each cell being hit and sample the energies accordingly. We also discuss some ideas on how to handle the point-to-point correlations in future work. The main strengths of our model are its ability to handle diverse datasets, its fast and stable convergence, and its highly efficient point production.

        Speaker: Simon Schnake (DESY / RWTH Aachen)
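The conversion from a voxelized shower to a point cloud can be sketched as follows. It assumes each point carries a layer index, a radial index, an angular index and an energy, which is one plausible reading of "4D points"; the axis ordering, the toy shower and the function names are hypothetical.

```python
def voxels_to_points(shower, threshold=0.0):
    """Convert shower[layer][radial][angular] into a list of
    (layer, radial, angular, energy) points, keeping only hit cells."""
    points = []
    for z, layer in enumerate(shower):
        for r, ring in enumerate(layer):
            for a, energy in enumerate(ring):
                if energy > threshold:
                    points.append((z, r, a, energy))
    return points


def points_to_voxels(points, shape):
    """Fill a zero-initialised grid from points, summing energies per cell."""
    n_z, n_r, n_a = shape
    grid = [[[0.0] * n_a for _ in range(n_r)] for _ in range(n_z)]
    for z, r, a, energy in points:
        grid[z][r][a] += energy  # several points in one cell are merged
    return grid


# Hypothetical sparse shower: 2 layers, 2 radial bins, 3 angular bins.
shower = [
    [[0.0, 1.5, 0.0], [0.0, 0.0, 0.2]],
    [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]],
]

points = voxels_to_points(shower)
restored = points_to_voxels(points, (2, 2, 3))
print(len(points), restored == shower)
```

Only 3 of the 12 cells are stored, which is the sparsity the abstract refers to. Summing energies in `points_to_voxels` is one simple way to handle a generator that emits several points in the same cell.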
      • 24
        Discussion
    • 25
      Going into production, FastCalo in the bigger picture
      Speaker: Michele Faucci Giannelli (Istituto Nazionale di Fisica Nucleare)
    • 26
      CaloChallenge result
      Speaker: Claudius Krause (Heidelberg University)
    • 27
      Discussion
    • 13:30
      Lunch
    • 28
      Bus departs for Frascati