Artificial intelligence and modern physics: a two-way connection

Name: Artificial intelligence and modern physics: a two-way connection
Start: 2024-09-29T08:00:00+02:00
End: 2024-10-05T19:00:00+02:00
Location: Monopoli (BA)

29 Sept 2024, 08:00 → 5 Oct 2024, 19:00 Europe/Rome

Monopoli (BA)

Resort Porto Giardino

Description

Artificial intelligence is revolutionizing scientific discovery, enabling the creation, analysis, and manipulation of ever-growing and highly complex datasets. The Ph.D. school “Artificial Intelligence and Modern Physics: a two-way Connection” is an interdisciplinary program that integrates machine learning methodologies into various aspects of modern physics research, bridging the gap between fundamental physics and advanced data science techniques to equip participants with the skills to tackle complex problems in astrophysics, particle physics, and medical physics. Supported by international lecturers at the forefront of AI and modern physics research, school participants will develop their foundational knowledge in AI through lectures. During hands-on sessions, participants will experiment with cutting-edge tools for exploring the cosmos, probing the fundamental building blocks of matter, and advancing medical techniques.

Contact

aiphy@unimib.it

Sunday 29 September
- 16:00 → 19:00
  
  Registration 3h
- 19:00 → 19:30
  
  Welcome aperitif 30m
Monday 30 September
- 08:45 → 09:15
  
  Welcome and introduction 30m
  
  2024-09-30-AIPHY.pdf
- 09:20 → 13:00
  Lecture
  - 09:20
    
    Data access and preparation 1h 40m
    
    Speaker: Andrea Beschi
  - 11:20
    
    Data access and preparation 1h 40m
    
    Speaker: Andrea Beschi
- 13:00 → 14:00
  
  Lunch 1h
- 14:00 → 18:00
  Hands-on session
  - 14:00
    
    Hands-on session 1 (data access and preparation) 2h
    
    Colab notebook (make your copy!)
    
    data_preparation_practice.zip
  - 16:00
    
    Hands-on session 2 (data access and preparation) 2h
    
    data_preparation_practice_solution.zip
Tuesday 1 October
- 09:20 → 13:00
  Lecture
  - 09:20
    
    Bayesian statistics 1h 40m
    
    Speaker: Christopher J. Moore
    
    ChristopherMoore_AIPHY_Monopoli2024-2.pdf
    
    NestedSampling_DemoVisualisation
  - 11:20
    
    ML elements 1h 40m
    
    Speaker: Dr Riccardo Finotello (CEA Paris-Saclay)
    
    ML elements - GitHub repository
    
    ML elements - slides
- 13:00 → 14:20
  
  Lunch 1h 20m
- 14:20 → 18:00
  Hands-on session
  - 14:20
    Hands-on session 3 (Bayesian statistics) 1h 40m
    
    1-Fragment_C_Hole_Measurements.csv
    
    General description of the task
    
    https://docs.google.com/document/d/1JeT1CdxG5VSRESSRxoZkfJUpRHfK3J2AbkGvLn1S198/edit?usp=sharing
    
    Some example code to help you get started if you are struggling
    
    https://colab.research.google.com/drive/1sYLQ_w0V6qFi_G3Edez0UAs8h6S-qPqx?usp=sharing
  - 16:00
    
    Hands-on session 4 (ML elements) 1h 40m
Wednesday 2 October
- 09:20 → 13:00
  Lecture
  - 09:20
    
    ML elements 1h 40m
    
    Speaker: Riccardo Finotello (CEA LIST)
    
    ML elements - GitHub repository
  - 11:20
    
    ML tools 1h 40m
    
    Speaker: Stefano Giagu (Sapienza Università di Roma and Istituto Nazionale di Fisica Nucleare)
    
    DNN_ModelsForSparseInteractions_CNN_and_GNN.pdf
- 13:00 → 14:20
  
  Lunch 1h 20m
- 14:20 → 18:00
  Hands-on session
  - 14:20
    
    Hands-on session 5 (ML elements) 1h 45m
  - 16:20
    
    Hands-on session 6 (ML tools) 1h 40m
    
    GitHub repository for the exercises
Thursday 3 October
- 09:20 → 13:00
  Lecture
  - 09:20
    
    ML tools 1h 40m
    
    Speaker: Stefano Giagu (Sapienza Università di Roma and Istituto Nazionale di Fisica Nucleare)
    
    DNN_UncertaintyQuantification.pdf
  - 11:20
    
    Bayesian statistics 1h 40m
    
    Speaker: Christopher J. Moore
- 13:00 → 14:20
  
  Lunch 1h 20m
- 14:20 → 16:00
  Lecture
  
  Convener: Guido Sanguinetti (SISSA)
  - 14:20
    
    ML applications to biomedical data 1h 40m
    
    Speaker: Guido Sanguinetti (SISSA)
    
    Monopoli_lecture-3.pdf
- 16:20 → 18:20
  Hackathon
  
  hackathon.pdf
  - 16:20
    
    Hackathon session 1 1h 45m
- 20:00 → 22:00
  
  Social dinner 2h
  
  Terrace above the reception building
  
  group_picture.jpeg
Friday 4 October
- 09:20 → 11:00
  Hackathon
  
  hackathon.pdf
  - 10:40
    
    Anomaly detection for new physics searches in HEP 20m
    
    Identify rare new-physics processes through an anomaly detection technique based on a deep neural network (Graph Neural Network architecture).
    
    Material for the exercise (datasets and examples) has been copied to the Leonardo cluster and is available at:
    /leonardo/home/usertrain/a08trb55/anomalyDetection/LHCO
  - 10:40
    
    Classification of Order and Species of Mosquitoes 20m
    
    Conventional manual counting methods for the monitoring of mosquito species and populations can hinder the accurate determination of the optimal timing for pest control in the field. This exercise requires training a deep learning-based automated image analysis algorithm for a two-fold task: classifying the order and the species of mosquitoes, based on a professionally made dataset of mosquito photographs from multiple species.
    
    Dataset
    
    Jupyter example
    
    Presentazione _50524_Sarleti.pptx
    
    s41598-021-92891-9.pdf
  - 10:40
    
    Dual Intelligence: Tackling Classification and Regression Challenges 20m
    
    We propose a hackathon divided into two challenges. In the first, participants will tackle a regression problem using the well-known 'Asteroid Dataset', where they need to estimate the diameter of various types of asteroids. In the second, participants will face a classification problem aimed at reconstructing the diagnosis of diabetes for different types of patients based on survey responses.
    
    Materials
  - 10:40
    
    Exploitation of jet features to tag VBF-like events in high energy particle physics 20m
    
    The goal of the project is to develop a tagger that distinguishes VBF-like signal events from background events using properties of the two leading jets (tag jets) and possibly the third one. The key idea is to exploit the color flow patterns in signal events, where the QCD activity is greater between each jet and the beam spot than in the region between the two tag jets. Relevant features for the tagger may include jet kinematic properties, jet substructure variables, and color flow observables. Signal (VBF) and background (Drell-Yan) samples will be provided for training and validation; the study will be done at parton level.
    
    github repo
    
    tree_DY_10k.root
    
    tree_Zjj_10k.root
  - 10:40
    
    Integrating Spliced and Unspliced Gene Expression Data for Improved Cell Type Annotation in Single-Cell RNA Sequencing 20m
    
    Cell type annotation is one of the primary tasks in single-cell RNA sequencing analysis, and it is both highly important and difficult in computational biology. The difficulty arises from two primary factors. First, gene expression levels typically exist on a continuum rather than being discrete, making it harder to draw clear boundaries between different cell types. Second, variations in gene expression don't always correlate with functional differences at the cellular level, adding complexity to the classification process.
    Given these challenges, a more precise and reliable method for cell type annotation is always welcome. We propose an approach that integrates additional layers of biological data, mainly spliced and unspliced expression, to look at gene expression estimates from a different perspective. By combining these pseudo-multi-modal layers of information, we might achieve more accurate cell type classification and better capture the functional nuances of different cell populations.
    
    Dataset and introductory notebook
  - 10:40
    
    STARFINDER - Machine Learning Techniques for Evaluating Globular Cluster’s Stars Membership Probabilities 20m
    
    Abstract
    
    Globular clusters (GCs), spheroidal conglomerations of stars tightly bound together by gravity, are among the oldest objects in our galaxy. A key characteristic of these objects is their high density, significantly greater than the average galactic star density (between $\sim10^4$ and $\sim10^6$ stars within a spheroid of radius up to $\sim100\,\text{pc}$, in stark contrast to the local average stellar density of about $1$-$2\,\text{stars}\,\text{pc}^{-3}$), so that they can be considered collisional systems. ESA's Gaia (Global Astrometric Interferometer for Astrophysics) mission, which has mapped nearly 2 billion stars in our galaxy up to its third data release, provides the largest set of high-resolution data available, enabling the detailed study of GCs' internal dynamics.
    
    However, the high density of these regions presents a challenge for Gaia's 1.45-meter primary mirror, often resulting in compromised data quality and insufficient resolution. Consequently, accurately associating stars with clusters becomes difficult due to poor estimates and high errors in the parameters.
    
    Machine Learning (ML) algorithms offer a promising solution to this problem. As demonstrated in [1], techniques inspired by ML such as Mixture Modelling, which uses Markov-Chain Monte Carlo, Extreme Deconvolution and Maximum Likelihood Estimation, can be employed to infer the general distribution properties of the cluster and distinguish them from those of the field stars. Enhancing these methodologies with neural networks such as Generative Adversarial Networks, which could be used to simulate stellar populations based on observational data, would allow for the assignment of membership probabilities to each source in the sample, increasing the number of sources available by up to a factor of $10^2$ and thereby enhancing the statistical robustness of subsequent astrophysical analyses.
    
    References
    
    [1] Vasiliev, Baumgardt (2021). Gaia EDR3 view on Galactic globular clusters. MNRAS 505, 5978–6002
    
    Catalogue.xlsx
    
    Hackathon project - GC_ML.ipynb
    
    o_cen_data.csv
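The mixture-modelling idea in the STARFINDER abstract above can be illustrated with a small sketch. It assumes a two-component Gaussian mixture (cluster plus field) in proper-motion space, fitted with expectation-maximisation on synthetic data. This is not the method of [1], which also accounts for measurement errors and spatial information, and it does not use the provided `o_cen_data.csv`. The function names and all numerical values below are hypothetical and chosen only for illustration.

```python
import numpy as np


def gaussian_pdf(x, mean, cov):
    """Density of a 2D Gaussian evaluated at each row of x."""
    diff = x - mean
    inv = np.linalg.inv(cov)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    # Squared Mahalanobis distance of every point from the mean
    maha = np.einsum("ni,ij,nj->n", diff, inv, diff)
    return norm * np.exp(-0.5 * maha)


def fit_two_component_mixture(x, n_iter=100):
    """EM fit of a narrow (cluster) + broad (field) Gaussian mixture.

    Returns the cluster membership probability of each point, the
    fitted cluster mean, and the fitted cluster fraction.
    """
    # Start the cluster component at the densest histogram cell
    hist, xe, ye = np.histogram2d(x[:, 0], x[:, 1], bins=30)
    i, j = np.unravel_index(np.argmax(hist), hist.shape)
    peak = np.array([0.5 * (xe[i] + xe[i + 1]), 0.5 * (ye[j] + ye[j + 1])])

    total_cov = np.cov(x.T)
    means = [peak, x.mean(axis=0)]
    covs = [0.05 * total_cov, total_cov.copy()]
    weights = np.array([0.5, 0.5])

    for _ in range(n_iter):
        # E-step: responsibility of each component for each point
        dens = np.stack(
            [w * gaussian_pdf(x, m, c) for w, m, c in zip(weights, means, covs)],
            axis=1,
        )
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: responsibility-weighted means, covariances, fractions
        for k in range(2):
            r = resp[:, k]
            means[k] = (r[:, None] * x).sum(axis=0) / r.sum()
            diff = x - means[k]
            covs[k] = np.einsum("n,ni,nj->ij", r, diff, diff) / r.sum()
            covs[k] += 1e-6 * np.eye(2)  # keep the covariance invertible
        weights = resp.mean(axis=0)

    return resp[:, 0], means[0], weights[0]


# Synthetic proper motions (arbitrary values, not real Gaia data):
# 300 "cluster" stars with a small dispersion, 700 "field" stars.
rng = np.random.default_rng(0)
cluster = rng.normal(loc=[-3.0, -6.5], scale=0.3, size=(300, 2))
field = rng.normal(loc=[0.0, -2.0], scale=3.0, size=(700, 2))
pm = np.vstack([cluster, field])

p_member, cluster_mean, cluster_weight = fit_two_component_mixture(pm)
print("fitted cluster mean:", cluster_mean)
print("fitted cluster fraction:", cluster_weight)
print("stars with p > 0.9:", int((p_member > 0.9).sum()))
```

The returned `p_member` array plays the role of the membership probabilities mentioned in the abstract: a downstream analysis could either cut on it or use it as a per-star weight. A realistic treatment would also convolve each component with the per-star measurement uncertainties, which this sketch ignores.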
- 11:20 → 13:00
  Hands-on session
  - 11:20
    
    Hands-on session 7 (ML tools) 1h 40m
- 13:00 → 14:20
  
  Lunch 1h 20m
- 14:20 → 18:00
  Hackathon
  
  hackathon.pdf
  - 14:20
    
    Hackathon session 3 1h 40m
  - 16:20
    
    Hackathon session 4 1h 10m
  - 17:30
    
    Hackathon: final reports 20m
    
    Each group should prepare 3 slides:
    1) Problem statement
    2) Chosen architecture and strategy
    3) Results (showing the figure of merit, FOM, indicated in the challenge)
    Send us the slides by email at aiphy@unimib.it
