Description
Deep generative models have become powerful tools for alleviating the computational burden of traditional Monte Carlo generators in producing high-dimensional synthetic data. However, validating these models remains challenging, especially in scientific domains requiring high precision, such as particle physics. Two-sample hypothesis testing offers a principled framework to address this task. We propose a robust methodology to assess the performance and computational efficiency of various metrics for two-sample testing, with a focus on high-dimensional datasets. Our study examines tests based on univariate integral probability metrics, namely the sliced Wasserstein distance, the mean of the Kolmogorov-Smirnov statistics, and the sliced Kolmogorov-Smirnov statistic. Additionally, we consider the unbiased Fréchet Gaussian Distance and the Maximum Mean Discrepancy. Finally, we include the New Physics Learning Machine, an efficient classifier-based test leveraging kernel methods. Experiments on both synthetic and realistic data show that the one-dimensional projection-based tests achieve good sensitivity at low computational cost. In contrast, the classifier-based test offers higher sensitivity at the expense of greater computational demands.
This analysis provides practical guidance for selecting the appropriate approach, depending on whether efficiency or accuracy is the priority. More broadly, our methodology provides a standardized and efficient framework for model comparison and serves as a benchmark for evaluating other two-sample tests.
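To make the projection-based statistics named above concrete, the following is a minimal sketch, not the implementation used in the study. It assumes NumPy and SciPy, uses SciPy's one-dimensional Wasserstein-1 distance, draws projection directions uniformly on the unit sphere, and takes the "mean of the Kolmogorov-Smirnov statistics" to be the average over the coordinate marginals. The function names, the default of 50 directions, and the fixed seed are illustrative choices; the abstract does not specify them.

```python
import numpy as np
from scipy.stats import ks_2samp, wasserstein_distance


def random_directions(n_dirs, dim, rng):
    """Draw n_dirs unit vectors uniformly on the sphere in `dim` dimensions."""
    v = rng.normal(size=(n_dirs, dim))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def sliced_wasserstein(x, y, n_dirs=50, seed=0):
    """Average 1D Wasserstein-1 distance over random projections (assumed variant)."""
    rng = np.random.default_rng(seed)
    dirs = random_directions(n_dirs, x.shape[1], rng)
    return float(np.mean([wasserstein_distance(x @ d, y @ d) for d in dirs]))


def mean_ks(x, y):
    """Average two-sample KS statistic over the coordinate marginals."""
    stats = [ks_2samp(x[:, j], y[:, j]).statistic for j in range(x.shape[1])]
    return float(np.mean(stats))


def sliced_ks(x, y, n_dirs=50, seed=0):
    """Average two-sample KS statistic over random projections."""
    rng = np.random.default_rng(seed)
    dirs = random_directions(n_dirs, x.shape[1], rng)
    return float(np.mean([ks_2samp(x @ d, y @ d).statistic for d in dirs]))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    reference = rng.normal(size=(2000, 5))           # stand-in for "true" data
    generated = rng.normal(loc=0.5, size=(2000, 5))  # stand-in for a biased generator
    print("sliced Wasserstein:", sliced_wasserstein(reference, generated))
    print("mean KS:           ", mean_ks(reference, generated))
    print("sliced KS:         ", sliced_ks(reference, generated))
```

Each function returns only a test statistic. Turning it into a test requires the statistic's distribution under the null hypothesis, for example estimated by repeatedly evaluating it on pairs of samples drawn from the reference distribution; that calibration step is omitted here.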
AI keywords
Two sample test; Models evaluation; Simulation-based inference