CaloChallenge Workshop
Villa Mondragone
The workshop is organised to discuss the results of the #calochallenge, this year's ML competition in high energy physics focussing on fast calorimeter simulation using generative models.
Every contributor will have the opportunity to present the method they developed. Plenty of time will be devoted to discussing the benefits and limitations of the different approaches.
The event will take place at Villa Mondragone, in the hills above Frascati, near Rome. Participants are encouraged to stay in Frascati, from where a bus will be organised to and from the Villa. A social dinner is scheduled for May 30th at 8 PM at "Ristorante Al Fico" in Grottaferrata. Transport will be provided.
The event will be hybrid, allowing for virtual participation. If you would like to participate remotely, please register and indicate the remote option. We expect all speakers to be in person.

Local Organizing Committee:
Michele Faucci Giannelli (INFN - Roma2)
Marco Vanadia (INFN - Roma2)
Umberto De Sanctis (Università Roma2 Tor Vergata)

International Advisory Committee:
Ben Nachman (LBNL)
David Shih (Rutgers)
Michele Faucci Giannelli (INFN - Roma2)
Claudius Krause (University of Heidelberg)
Dalila Salamani (CERN)
Gregor Kasieczka (University of Hamburg)
Anna Zaborowska (CERN)

- Bus at piazza Marconi: the bus will leave at 9:00 for Villa Mondragone
- Registration
- Introduction
  Speaker: Michele Faucci Giannelli (Istituto Nazionale di Fisica Nucleare)
- Welcome from INFN Tor Vergata
  Speaker: Anna Di Ciaccio (Istituto Nazionale di Fisica Nucleare)
- Welcome from Fellini
  Speaker: Laura Bandiera (Istituto Nazionale di Fisica Nucleare)
- 10:35 Coffee

GANs
- CaloShowerGAN
  Speakers: Michele Faucci Giannelli (Istituto Nazionale di Fisica Nucleare), Rui Zhang
- MDMA-GAN
  Speaker: Benno Käch (Deutsches Elektronen-Synchrotron (DESY))
- BolognaGAN: a containerized solution
  Speaker: Federico Andrea Guillaume Corchia (Istituto Nazionale di Fisica Nucleare)
- Discussion
 
- 13:00 Lunch

VAEs
- Latent Generative Models for Calo Simulation with VQ-VAE
  Simulation of the calorimeter response is important for modern high energy physics experiments. With increasingly large and highly granular calorimeter designs, the computational cost of conventional MC-based simulation of each particle-material interaction is becoming a major bottleneck. We propose a new two-stage generative model, similar to recent latent diffusion models. The first-stage model aims only to compress the calorimeter response into a regularized latent space based on Vector Quantized Variational Autoencoders (VQ-VAE). The second-stage model handles generative sampling in the latent representation, conditional on phase space parameters such as pT. This second stage can be trained independently of the first, and we demonstrate prior models based on RNNs and diffusion. For CaloChallenge Dataset 1, our demonstration model achieves a speedup of more than 10^4 over GEANT4. The modeling of energy deposition and shower shape is comparable to other approaches such as CaloFlow, with substantially fewer parameters and a factor of ~2 speedup. For Dataset 2, we have designed a fully convolutional architecture, employing novel operations which maintain equivariance w.r.t. the cylindrical geometry of the data. This, combined with the two-stage modeling approach, shows promising generation quality with substantially faster training and approximately 2x faster inference relative to CaloScore. In addition to cylindrical convolution and two-stage VQ-VAE training, we also introduce a novel, self-supervised proxy model that can be used as a perceptual loss function to help with training any autoencoder-based model. While some challenges remain in achieving the ultra-high fidelity required for certain physics applications, we present several innovative techniques that may help solve the puzzle of fast and accurate calorimeter simulation.
  Speaker: Qibin Liu (TDLI, Shanghai Jiao Tong University)
- CaloMan: Fast generation of calorimeter showers with density estimation on learned manifolds
  The efficient simulation of particle propagation and interaction within the detectors of the Large Hadron Collider (LHC) is of primary importance for precision measurements and new physics searches. The most computationally expensive simulations involve calorimeter showers, which will become ever more costly and high-dimensional as the LHC moves into its High Luminosity era. The computational costs can be heavily reduced by replacing (parts of) the simulation pipeline with generative networks. We thus propose to model calorimeter showers first by learning a lower-dimensional manifold structure with an auto-encoder and then by performing density estimation on this manifold with a normalising flow. Our approach relies on the notion that the seemingly high-dimensional data of HEP experiments actually has a much lower intrinsic dimensionality. In machine learning, this is known as the manifold hypothesis, which states that high-dimensional data is supported on low-dimensional manifolds. By reducing the dimensionality of the data, we enable fast training and generation without compromising accuracy.
  Speaker: Humberto Alonso Reyes Gonzalez (Istituto Nazionale di Fisica Nucleare)
- Discussion
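
The VQ-VAE abstract in this session describes compressing the calorimeter response into a discrete latent space. As a rough illustration of the vector-quantization step only, the sketch below maps each encoder output to its nearest codebook entry. The function name `quantize`, the two-dimensional codebook, and all numerical values are assumptions made for this example; they are not taken from the speakers' model.

```python
def quantize(vector, codebook):
    """Return (index, code) of the codebook entry closest to `vector`."""
    def sq_dist(a, b):
        # Squared Euclidean distance between two equal-length vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical 2D codebook and encoder outputs (illustrative values only).
codebook = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
encoder_outputs = [(0.2, -0.1), (0.9, 1.3), (3.5, 0.4)]

# Each continuous vector is replaced by the index of its nearest code.
indices = [quantize(v, codebook)[0] for v in encoder_outputs]
print(indices)
```

In a two-stage setup of the kind the abstract outlines, a second model would then be trained to generate such index sequences; that part is not shown here.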
 
- 16:00 Coffee

Diffusion
- Score-based Generative Models for Calorimeter Shower Simulation
  Diffusion generative models are a new class of generative algorithms that have been shown to produce realistic images even in high dimensional spaces, currently surpassing other state-of-the-art models for different benchmark categories and applications. In this work we introduce CaloScore, a score-based generative model for collider physics applied to calorimeter shower generation. Three different diffusion models are investigated using the Fast Calorimeter Simulation Challenge 2022 dataset. CaloScore is the first application of a score-based generative model in collider physics and is able to produce high-fidelity calorimeter images for all datasets, providing an alternative paradigm for calorimeter shower simulation.
  Speaker: Vinicius Mikuni (LBNL)
- Denoising Diffusion Models for High Fidelity Calorimeter Simulation
  Denoising diffusion models have recently become state of the art in the ML community because of their stable training procedure and ability to generate high quality images in reasonable computation times. We employ diffusion models for the task of generating calorimeter showers within the context of the CaloChallenge. Our diffusion models are based on 3D cylindrical convolutional networks, which take advantage of symmetries of the underlying data representation. For dataset 1, which has a basic cylindrical geometry but irregular binning between the different layers, we employ a differentiable embedding procedure that learns a reversible mapping from the original data format to a regular geometry on which cylindrical convolutions can be applied. We find our diffusion approach is able to generate high quality showers for all three datasets, achieving classifier AUC scores of ~0.7 or better.
  Speaker: Oz Amram (Fermilab)
- CaloClouds: Fast Geometry-Independent Highly-Granular Calorimeter Simulation
  Simulating showers of particles in highly-granular detectors is a key frontier in the application of machine learning to particle physics. Achieving high accuracy and speed with generative machine learning models would enable them to augment traditional simulations and alleviate a major computing constraint. This work achieves a major breakthrough in this task by directly generating, for the first time, a point cloud of $O(1000)$ space points with energy depositions in the detector in 3D space without relying on a fixed-grid structure. This is made possible by two key innovations: i) using recent improvements in generative modelling, we apply a diffusion model, and ii) we start from an initial, even higher-resolution point cloud of up to $40000$ so-called GEANT4 steps, which is subsequently down-sampled to the desired number of up to $6000$ space points. We showcase the performance of this approach using the specific example of simulating photon showers in the planned electromagnetic calorimeter of the International Large Detector (ILD) and achieve overall good modelling of physically relevant distributions.
  Speaker: Anatolii Korol (Deutsches Elektronen-Synchrotron (DESY))
- Discussion
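
Two abstracts on this page mention cylindrical convolutions that respect the symmetry of the calorimeter geometry. The sketch below is a minimal one-dimensional illustration of the idea, assuming the angular bins of a single layer form a closed ring: the kernel wraps around the ends, so rotating the input simply rotates the output. The helper names `circular_conv` and `rotate` and the numbers are hypothetical, and the operation is a convolution in the deep-learning sense (no kernel flip). The 3D networks described in the talks are considerably more involved.

```python
def circular_conv(values, kernel):
    """Slide `kernel` over a ring of angular bins, wrapping around the ends."""
    n, k = len(values), len(kernel)
    half = k // 2
    return [
        sum(kernel[j] * values[(i + j - half) % n] for j in range(k))
        for i in range(n)
    ]


def rotate(values, shift):
    """Rotate the ring by `shift` bins (a cyclic shift of the list)."""
    shift %= len(values)
    return values[-shift:] + values[:-shift]


# Illustrative energies in six angular bins and a small smoothing kernel.
ring = [1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
kernel = [0.25, 0.5, 0.25]

smoothed = circular_conv(ring, kernel)

# Equivariance check: rotating then convolving equals convolving then rotating.
equivariant = circular_conv(rotate(ring, 2), kernel) == rotate(smoothed, 2)
print(smoothed, equivariant)
```

With ordinary zero padding the check would fail for showers that straddle the first and last angular bin, which is the motivation for wrapping the kernel.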
 
- Bus to dinner
 
- Bus at piazza Marconi

Normalizing Flows
- (inductive) CaloFlow
  We apply CaloFlow to GEANT4 showers of Dataset 1, producing high-fidelity samples with a sampling time of less than 0.1 ms per shower. We validated the fidelity of the samples using multiple metrics, including a classifier metric. To generalize CaloFlow to the higher dimensional Datasets 2 and 3, we propose a new approach called Inductive CaloFlow. This approach involves training the flow on the pattern of energy deposition in both the current and previous layer of a GEANT4 event. Inductive CaloFlow can efficiently generate new events for large calorimeter geometries and reproduces GEANT4-like events with high fidelity. With both approaches, a teacher-student pairing was used to increase sampling speed without significant loss of sample quality.
  Speaker: Ian Pang (Rutgers University)
- Generating Accurate Showers in Highly Granular Calorimeters Using Normalizing Flows
  Normalizing flows are a type of generative model that can be trained directly by minimizing the negative log-likelihood. It has been shown that flows can accurately model showers in low complexity calorimeters. We show how normalizing flows can be improved and adapted to accurately model showers in calorimeters with significantly higher complexity. One of the key points here is to move away from dense layers to convolutional layers. We show our results on datasets 2 and 3 of the CaloChallenge.
  Speaker: Thorsten Buss (University of Hamburg)
- 10:30 Coffee
- CaloPointFlow - Generating Calorimeter Showers as Point Clouds
  In particle physics, precise simulations of the interaction processes in calorimeters are essential for scientific discovery. However, accurate simulations using GEANT4 are computationally very expensive and pose a major challenge for the future of particle physics. In this study, we apply the CaloPointFlow model, a novel generative model based on normalizing flows, to fast and high-fidelity calorimeter shower generation. We use the CaloPointFlow model, an adapted version of the PointFlow model for 3D shape generation, to generate calorimeter showers using point clouds that exploit the sparsity and leverage the geometry of the data. We preprocess the voxelized datasets of the Fast Calorimeter Simulation Challenge 2022 to point clouds and apply the CaloPointFlow model to all three datasets without any adaptation. Furthermore, we evaluate the performance of our model on metrics such as energy resolution, longitudinal and transverse shower profiles, and shower shapes, and compare it with GEANT4. We demonstrate that our model can produce realistic and diverse samples with a sampling time of around 30 million single 4D points per minute. However, the model also has some limitations, such as its inability to capture the point-to-point correlation and its generation of multiple points per cell, which are in contradiction to the data. To address these issues, we propose a novel method that uses a second sampling step to compute the marginal likelihoods of each cell being hit and sample the energies accordingly. We also discuss some ideas on how to handle the point-to-point correlations in future work. The main strengths of our model are its ability to handle diverse datasets, its fast and stable convergence, and its highly efficient point production.
  Speaker: Simon Schnake (DESY / RWTH Aachen)
- Discussion
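
One abstract in this session notes that normalizing flows can be trained directly by minimizing the negative log-likelihood. The sketch below works this out for the simplest possible case, a one-dimensional affine flow x = shift + exp(log_scale) * z with a standard normal base distribution, using the change-of-variables formula log p(x) = log N(z) + log |dz/dx|. The function name `affine_flow_nll` and the sample values are assumptions for illustration; the flows presented at the workshop are far higher-dimensional and conditional.

```python
import math


def affine_flow_nll(samples, shift, log_scale):
    """Mean negative log-likelihood under x = shift + exp(log_scale) * z, z ~ N(0, 1)."""
    total = 0.0
    for x in samples:
        z = (x - shift) * math.exp(-log_scale)               # inverse transform x -> z
        log_base = -0.5 * (z * z + math.log(2.0 * math.pi))  # log N(z; 0, 1)
        log_det = -log_scale                                 # log |dz/dx|
        total -= log_base + log_det
    return total / len(samples)


# Made-up "energy" samples clustered around 2.1 (illustrative only).
data = [1.8, 2.1, 2.4, 1.9, 2.3]

nll_untrained = affine_flow_nll(data, shift=0.0, log_scale=0.0)
nll_fitted = affine_flow_nll(data, shift=2.1, log_scale=math.log(0.25))
print(nll_untrained, nll_fitted)
```

Parameters that place the density where the samples actually lie give a lower value, which is the quantity a training loop would minimize by gradient descent.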
 
- Going into production, FastCalo in the bigger picture
  Speaker: Michele Faucci Giannelli (Istituto Nazionale di Fisica Nucleare)
- CaloChallenge result
  Speaker: Claudius Krause (Heidelberg University)
- Discussion
- 13:30 Lunch
- Bus departs for Frascati
 


