Artificial intelligence and modern physics: a two-way connection

Europe/Rome
Monopoli (BA)

Resort Porto Giardino
Description

Artificial intelligence is revolutionizing scientific discovery, enabling the creation, analysis, and manipulation of ever-growing and highly complex datasets. The Ph.D. school “Artificial Intelligence and Modern Physics: a two-way Connection” is an interdisciplinary program that integrates machine learning methodologies into various aspects of modern physics research, bridging the gap between fundamental physics and advanced data science techniques to equip participants with the skills to tackle complex problems in astrophysics, particle physics, and medical physics. Supported by international lecturers at the forefront of AI and modern physics research, school participants will develop their foundational knowledge in AI through lectures. During hands-on sessions, participants will experiment with cutting-edge tools for exploring the cosmos, probing the fundamental building blocks of matter, and advancing medical techniques.

Contact
    • 16:00
      Registration
    • 19:00
      Welcome aperitif
    • Lecture
      • 14
        ML tools
        Speaker: Stefano Giagu (Sapienza Università di Roma and Istituto Nazionale di Fisica Nucleare)
      • 15
        Bayesian statistics
        Speaker: Christopher J. Moore
    • 13:00
      Lunch
    • Lecture
      Convener: Guido Sanguinetti (SISSA)
    • Hackathon
      • 17
        Hackathon session 1
    • 18
      Social dinner

      Terrace above the reception building

    • Hackathon
      • 19
        Anomaly detection for new physics searches in HEP

        Identify rare new-physics processes with an anomaly detection technique based on a deep neural network (Graph Neural Network architecture).

        Material for the exercise (datasets and examples) has been copied to the Leonardo cluster and is available at:
        /leonardo/home/usertrain/a08trb55/anomalyDetection/LHCO
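
The exercise itself uses a Graph Neural Network. The general pattern of scoring events and cutting on the score can still be sketched with a much simpler stand-in. The block below is a minimal sketch, not the exercise's method: it swaps in a per-feature z-score distance as the anomaly score and sets the threshold at a quantile of the background scores. The feature values, the 99% quantile, and all function names are illustrative assumptions.

```python
import math
import random


def fit_reference(background):
    """Per-feature mean and standard deviation of background-like events."""
    n = len(background)
    dims = len(background[0])
    means = [sum(ev[d] for ev in background) / n for d in range(dims)]
    stds = [
        math.sqrt(sum((ev[d] - means[d]) ** 2 for ev in background) / n)
        for d in range(dims)
    ]
    return means, stds


def anomaly_score(event, means, stds):
    """Root of the summed squared z-scores; larger means less background-like."""
    return math.sqrt(sum(((x - m) / s) ** 2 for x, m, s in zip(event, means, stds)))


def threshold_at_quantile(scores, quantile=0.99):
    """Score value below which roughly `quantile` of the given scores fall."""
    ordered = sorted(scores)
    index = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[index]


# Synthetic two-feature "background" sample (assumption, not the LHCO data).
random.seed(0)
background = [[random.gauss(0.0, 1.0), random.gauss(5.0, 2.0)] for _ in range(2000)]

means, stds = fit_reference(background)
scores = [anomaly_score(ev, means, stds) for ev in background]
cut = threshold_at_quantile(scores, quantile=0.99)

# A hypothetical event far from the bulk of the background.
candidate = [6.0, 15.0]
is_anomalous = anomaly_score(candidate, means, stds) > cut
```

In a learned approach, `anomaly_score` would be replaced by the network's output (for example a reconstruction error), while the quantile-based cut would stay the same.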

      • 20
        Classification of Order and Species of Mosquitoes

        Conventional manual counting methods for monitoring mosquito species and populations can hinder the accurate determination of the optimal timing for pest control in the field. This exercise requires training a deep-learning-based automated image analysis algorithm for a two-fold task: classifying both the species and the order of mosquitoes, using a professionally made dataset of mosquito photographs from multiple species.

      • 21
        Dual Intelligence: Tackling Classification and Regression Challenges

        We propose a hackathon divided into two challenges. In the first, participants will tackle a regression problem using the well-known 'Asteroid Dataset', estimating the diameter of various types of asteroids. In the second, participants will face a classification problem: reconstructing the diabetes diagnosis of different types of patients from survey responses.

      • 22
        Exploitation of jet features to tag VBF-like events in high-energy particle physics

        The goal of the project is to develop a tagger that distinguishes vector boson fusion (VBF)-like signal events from background events using properties of the two leading jets (the tag jets) and, possibly, the third jet. The key idea is to exploit the color flow patterns in signal events, where QCD activity is greater between each tag jet and the beam spot than in the region between the two tag jets. Relevant features for the tagger may include jet kinematic properties, jet substructure variables, and color flow observables. Signal (VBF) and background (Drell-Yan) samples will be provided for training and validation. The study will be done at parton level.
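
As an illustration of what "jet kinematic properties" might look like as tagger inputs, the following is a minimal sketch, not the project's prescribed feature set. It assumes jets are given as massless `(pt, eta, phi)` tuples and computes the dijet invariant mass, the pseudorapidity separation of the tag jets, and a centrality-style variable locating the third jet relative to the gap between the tag jets. All names are hypothetical.

```python
import math


def four_vector(pt, eta, phi, mass=0.0):
    """Convert (pt, eta, phi, mass) to Cartesian (E, px, py, pz)."""
    px = pt * math.cos(phi)
    py = pt * math.sin(phi)
    pz = pt * math.sinh(eta)
    energy = math.sqrt(px * px + py * py + pz * pz + mass * mass)
    return energy, px, py, pz


def vbf_features(jet1, jet2, jet3=None):
    """Build candidate tagger inputs from jets given as (pt, eta, phi) tuples."""
    e1, px1, py1, pz1 = four_vector(*jet1)
    e2, px2, py2, pz2 = four_vector(*jet2)
    m2 = (e1 + e2) ** 2 - (px1 + px2) ** 2 - (py1 + py2) ** 2 - (pz1 + pz2) ** 2
    delta_eta = abs(jet1[1] - jet2[1])
    features = {
        "mjj": math.sqrt(max(m2, 0.0)),  # invariant mass of the tag-jet pair
        "delta_eta": delta_eta,  # pseudorapidity gap between the tag jets
        "opposite_hemispheres": jet1[1] * jet2[1] < 0,
    }
    if jet3 is not None and delta_eta > 0:
        # Below 0.5 the third jet lies between the tag jets in pseudorapidity;
        # above 0.5 it lies outside the gap, towards the beam.
        midpoint = 0.5 * (jet1[1] + jet2[1])
        features["jet3_centrality"] = abs(jet3[1] - midpoint) / delta_eta
    return features


# Hypothetical event: two forward tag jets plus a third jet outside the gap.
example = vbf_features((100.0, 2.0, 0.0), (80.0, -2.0, math.pi), jet3=(30.0, 3.0, 1.0))
```

Given the color flow argument above, a third jet outside the gap (centrality above 0.5) would be the more signal-like configuration. Substructure and color flow observables would need additional inputs not modelled here.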

      • 23
        Integrating Spliced and Unspliced Gene Expression Data for Improved Cell Type Annotation in Single-Cell RNA Sequencing

        Cell type annotation is one of the primary tasks in single-cell RNA sequencing analysis, and it is both important and difficult in computational biology. The difficulty arises from two primary factors. First, gene expression levels typically exist on a continuum rather than being discrete, making it harder to draw clear boundaries between different cell types. Second, variations in gene expression don't always correlate with functional differences at the cellular level, adding complexity to the classification process.
        Given these challenges, a more precise and reliable method for cell type annotation is always welcome. We propose an approach that integrates additional layers of biological data, mainly spliced and unspliced expression, to look at gene expression estimates from a different perspective. By combining these pseudo-multi-modal layers of information, we might achieve more accurate cell type classification and better capture the functional nuances of different cell populations.
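
One simple way to combine the two layers is to normalise each separately and concatenate them into a single feature vector per cell. The sketch below assumes per-cell raw count lists for the same genes in both layers. The library-size scaling to 10,000 counts followed by `log1p` is a common preprocessing convention chosen here as an assumption, not something specified by the project, and the function names are hypothetical.

```python
import math


def log_normalise(counts, target_sum=1e4):
    """Scale one cell's counts to a fixed total, then apply log(1 + x)."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [math.log1p(c * target_sum / total) for c in counts]


def combine_layers(spliced, unspliced):
    """Concatenate normalised spliced and unspliced counts for one cell."""
    if len(spliced) != len(unspliced):
        raise ValueError("both layers must cover the same genes")
    return log_normalise(spliced) + log_normalise(unspliced)


# Hypothetical cell with three genes measured in both layers.
cell_features = combine_layers(spliced=[10, 0, 90], unspliced=[5, 5, 0])
```

The resulting vector has twice as many entries as there are genes and can be passed to any classifier. Normalising the layers separately keeps the usually sparser unspliced counts from being swamped by the spliced ones.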

      • 24
        STARFINDER - Machine Learning Techniques for Evaluating Membership Probabilities of Globular Cluster Stars

        Abstract

        Globular clusters (GCs), spheroidal conglomerations of stars tightly bound together by gravity, are among the oldest objects in our galaxy. A key characteristic of these objects is their high density, significantly greater than the average galactic star density (between $\sim10^4$ and $\sim10^6$ stars within a spheroid of radius up to $\sim100\,\text{pc}$, in stark contrast to the local average stellar density of about $1$ to $2\,\text{stars}\,\text{pc}^{-3}$), so that they can be considered collisional systems. ESA's Gaia (Global Astrometric Interferometer for Astrophysics) mission, which had mapped nearly 2 billion stars in our galaxy as of its third data release, provides the largest set of high-resolution data available, enabling the detailed study of GCs' internal dynamics.

        However, the high density of these regions presents a challenge for Gaia's 1.45-meter primary mirror, often resulting in compromised data quality and insufficient resolution. Consequently, accurately associating stars with clusters becomes difficult due to poor estimates and high errors in the parameters.

        Machine Learning (ML) algorithms offer a promising solution to this problem. As demonstrated in [1], ML-inspired techniques such as Mixture Modelling, which uses Markov-Chain Monte Carlo, Extreme Deconvolution and Maximum Likelihood Estimation, can be employed to infer the general distribution properties of a cluster and distinguish them from those of the field star distribution. Enhancing these methodologies with neural networks such as Generative Adversarial Networks, which could be used to simulate stellar populations based on observational data, would allow membership probabilities to be assigned to each source in the sample. This could increase the number of usable sources by up to a factor of $10^2$, enhancing the statistical robustness of subsequent astrophysical analyses.
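
To make the notion of a membership probability concrete, the following is a minimal sketch of the final step of a two-component mixture model. It is not the method of [1]: it assumes the mixture parameters are already known, uses a single observable (for example one proper-motion component) with Gaussian cluster and field distributions, and adds the measurement error in quadrature to each component's width. All parameter values and names are illustrative assumptions.

```python
import math


def gaussian_pdf(x, mean, sigma):
    """Density of a normal distribution with the given mean and width."""
    z = (x - mean) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def membership_probability(x, cluster_fraction, cluster, field, error=0.0):
    """Posterior probability that a star with value x belongs to the cluster.

    `cluster` and `field` are (mean, intrinsic_sigma) pairs; `error` is the
    star's measurement uncertainty, added in quadrature to both widths.
    """
    sigma_c = math.hypot(cluster[1], error)
    sigma_f = math.hypot(field[1], error)
    p_cluster = cluster_fraction * gaussian_pdf(x, cluster[0], sigma_c)
    p_field = (1.0 - cluster_fraction) * gaussian_pdf(x, field[0], sigma_f)
    return p_cluster / (p_cluster + p_field)


# Hypothetical parameters: a narrow cluster peak on top of a broad field.
cluster, field, fraction = (-2.0, 0.3), (0.0, 5.0), 0.2

p_clean = membership_probability(-2.0, fraction, cluster, field)
p_noisy = membership_probability(-2.0, fraction, cluster, field, error=3.0)
p_far = membership_probability(5.0, fraction, cluster, field)
```

With these numbers a well-measured star at the cluster peak receives a high probability, while the same position measured with a large error is much less conclusive. This is the effect of crowding-degraded data described above.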

        References

        [1] Vasiliev, Baumgardt (2021). Gaia EDR3 view on Galactic globular clusters. MNRAS 505, 5978–6002

    • Hands-on session
      • 25
        Hands-on session 7 (ML tools)
    • 13:00
      Lunch
    • Hackathon
      • 26
        Hackathon session 3
      • 27
        Hackathon session 4
      • 28
        Hackathon: final reports

        Each group should prepare 3 slides:
        1) Problem statement
        2) Chosen architecture and strategy
        3) Results (showing the figure of merit, FOM, indicated in the challenge)
        Send the slides by email to aiphy@unimib.it