16–20 Jun 2025
THotel, Cagliari, Sardinia, Italy
Europe/Rome timezone

✨ Event Tokenization and Next-Token Prediction for Anomaly Detection at the LHC

18 Jun 2025, 17:22
3m
T1a+T1b

Poster Session B Patterns & Anomalies 🔀 Simulations & Generative Models

Speaker

Ambre Visive (Nikhef - University of Amsterdam)

Description

Advances in machine learning, particularly large language models (LLMs), enable more efficient interaction with complex datasets through tokenization and next-token prediction. This talk presents and compares several ways of structuring particle physics data as token sequences, so that LLM-inspired models can learn event distributions and detect anomalies via next-token (or masked-token) prediction. The model is trained only on background events, so it learns to reconstruct the expected physics processes. At inference, both background and signal events are processed, and reconstruction scores identify deviations from the learned patterns, flagging potential anomalies. Beyond anomaly detection, this event tokenization strategy is a potential new approach to training a foundation model at the LHC. The method is tested on simulated proton-proton collision data from the Dark Machines Collaboration and applied to a four-top-quark search replicating ATLAS conditions during LHC Run 2 ($\sqrt{s} = 13 \text{ TeV}$). Results are compared with other anomaly detection strategies.
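As a rough illustration of the idea (not the authors' implementation), the sketch below turns each event into a token sequence and scores it by the mean negative log-likelihood of its next-token predictions. Everything in it is an assumption made for the example: the bin edges, the token format, the pT ordering of objects, and above all the model, where a smoothed bigram counter stands in for the transformer-style models the abstract refers to.

```python
import math
from collections import defaultdict

# Hypothetical bin edges, chosen only for illustration.
PT_EDGES = [0, 50, 100, 200, 400]        # GeV
ETA_EDGES = [-2.5, -1.0, 0.0, 1.0, 2.5]


def bin_index(value, edges):
    """Index of the bin containing value, clamped to the first/last bin."""
    for i in range(len(edges) - 1):
        if value < edges[i + 1]:
            return i
    return len(edges) - 2


def tokenize_event(objects):
    """Map (type, pt, eta) objects to tokens, ordered by decreasing pT."""
    tokens = ["<bos>"]
    for obj_type, pt, eta in sorted(objects, key=lambda o: -o[1]):
        pt_bin = bin_index(pt, PT_EDGES)
        eta_bin = bin_index(eta, ETA_EDGES)
        tokens.append(f"{obj_type}|pt{pt_bin}|eta{eta_bin}")
    tokens.append("<eos>")
    return tokens


class BigramModel:
    """Count-based next-token model with add-alpha smoothing (toy stand-in)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.counts = defaultdict(lambda: defaultdict(int))
        self.vocab = set()

    def fit(self, sequences):
        for seq in sequences:
            self.vocab.update(seq)
            for prev, nxt in zip(seq, seq[1:]):
                self.counts[prev][nxt] += 1

    def log_prob(self, prev, nxt):
        row = self.counts.get(prev, {})
        total = sum(row.values())
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen tokens
        return math.log((row.get(nxt, 0) + self.alpha)
                        / (total + self.alpha * vocab_size))

    def anomaly_score(self, seq):
        """Mean negative log-likelihood per predicted token (higher = more anomalous)."""
        nll = [-self.log_prob(p, n) for p, n in zip(seq, seq[1:])]
        return sum(nll) / len(nll)


# Toy "background-only" training sample: dijet-like events.
background = [
    [("j", 120.0, 0.3), ("j", 80.0, -0.5)],
    [("j", 150.0, 0.8), ("j", 60.0, -0.2)],
    [("j", 110.0, 0.1), ("j", 70.0, -0.9)],
]
# Toy "signal-like" event with leptons and b-jets, never seen in training.
signal = [("e", 300.0, 0.1), ("m", 250.0, -1.5), ("b", 150.0, 2.0), ("b", 90.0, 0.5)]

model = BigramModel()
model.fit([tokenize_event(ev) for ev in background])

bkg_score = model.anomaly_score(tokenize_event(background[0]))
sig_score = model.anomaly_score(tokenize_event(signal))
print(f"background score: {bkg_score:.3f}, signal score: {sig_score:.3f}")
```

In this toy setup the signal-like event contains tokens and transitions absent from the background sample, so it receives the higher score. A real analysis would replace the bigram counter with a trained sequence model and choose the score threshold from the background distribution.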

AI keywords anomaly detection; tokenization; large language model; transformers; next-token prediction

Primary author

Ambre Visive (Nikhef - University of Amsterdam)

Co-authors

Dr Clara Nellist (Nikhef - University of Amsterdam); Mrs Polina Moskvitina (Nikhef - Radboud University); Dr Roberto Ruiz de Austri (Valencia University, IFIC); Dr Sascha Caron (Nikhef - Radboud University)
