High Dimensional Data: theory and applications (A synergy between physics, statistics and economics)

Room "B. Pontecorvo" (LNGS)


Nicola Rossi (INFN-LNGS)

The importance of high-dimensional data lies in their ability to store a wide range of complex information, which is pivotal in fields such as genetics, economics, and meteorology. The inherent complexity of these datasets calls for new statistical methods and advanced algorithms to manage and interpret them effectively. This workshop will highlight these challenges and present state-of-the-art methodologies and theoretical advances tailored specifically to high-dimensional time series.

Marc Hallin (Université Libre de Bruxelles), Marco Lippi (Einaudi Institute for Economics and Finance (EIEF)), Tommaso Proietti (University of Rome Tor Vergata), Alessandro Razeto (INFN - LNGS)

Carlo Bucci, Alfredo Cocco, Nicola D'Ambrosio, Alessandro Giovannelli, Marcello Messina, Nicola Rossi

Martina Buontempo, Fausto Chiarizia, Mara Pace


    • 9:30 AM
      Welcome coffee
    • 1
      Speakers: Alessandro Giovannelli (UnivAQ), Nicola Rossi (INFN - LNGS)
    • 2
      High-Dimensional Dynamic Factor Models: Theory and Applications to Forecasting and Macroeconomic Analysis

      High-dimensional time series may well be the most common type of dataset in the so-called “big data” revolution, and have entered current practice in many areas, including meteorology, genomics, chemometrics, connectomics, complex physics simulations, biological and environmental research, finance and econometrics. The analysis of such datasets poses significant challenges, from both a statistical and a numerical point of view. The most successful procedures so far have been based on dimension reduction techniques and, more particularly, on high-dimensional factor models. Those models have been developed, essentially, within time series econometrics, and deserve to be better known in other areas.

      45 min plus 5 min discussion

      Speaker: Marco Lippi (Einaudi Institute for Economics and Finance (EIEF))
    • 3
      Factor Models for High-Dimensional Functional Time Series

      In this talk, we set up the theoretical foundations for a high-dimensional functional factor model approach in the analysis of large cross-sections (panels) of functional time series (FTS). We first establish a representation result stating that, under mild assumptions on the covariance operator of the cross-section, we can represent each FTS as the sum of a common component driven by scalar factors loaded via functional loadings, and a mildly cross-correlated idiosyncratic component. Our model and theory are developed in a general Hilbert space setting that allows for mixed panels of functional and scalar time series. We then turn to the identification of the number of factors, and the estimation of factors, their loadings, and the common components. We provide a family of information criteria for identifying the number of factors and prove their consistency. We provide average error bounds for the estimators of the factors, loadings, and common components; our results encompass the scalar case, for which they reproduce and extend, under weaker conditions, well-established similar results. Our consistency results, in the asymptotic regime where the number $N$ of series and the number $T$ of time points diverge, thus extend to the functional context the "blessing of dimensionality" that explains the success of factor models in the analysis of high-dimensional (scalar) time series. We provide numerical illustrations that corroborate the convergence rates predicted by the theory and provide a finer understanding of the interplay between $N$ and $T$ for estimation purposes. We conclude with an application to forecasting mortality curves, where we demonstrate that our approach outperforms existing methods.

      45 min plus 5 min discussion

      Speaker: Marc Hallin (Université Libre de Bruxelles)
    • 11:40 AM
      Coffee break
    • 4
      Regularized Estimation and Prediction of the El Niño/Southern Oscillation Cycle

      The El Niño/Southern Oscillation (ENSO) phenomenon is one of the most important sources of interannual climate variability. This paper focuses on the prediction of the sea surface temperatures in the four Niño regions, which represent the oceanic manifestation of ENSO. The series are the components of a large-dimensional dynamic system constituted by 15 time series that include the atmospheric component (sea level pressure in Darwin and Tahiti), trade and zonal wind anomalies, as well as series related to the ocean heat content. We propose a prediction method based on a novel regularized multivariate Durbin-Levinson algorithm, which performs the projection of the series onto its past, avoiding the curse of dimensionality. The regularization concerns both the lag and the cross-sectional dimensions: we taper and threshold the partial canonical correlations computed on a mixture sample cross-covariance matrix that shrinks the traditional estimator towards a seemingly unrelated system. Using a rolling forecast experiment, we show that the cyclical properties of the series are essential for multi-step-ahead forecasting, and that the use of cross-sectional information leads to significant forecasting gains with respect to traditional vector autoregressive modelling.

      45 min plus 5 min discussion

      Speaker: Tommaso Proietti (University of Rome Tor Vergata)
    • 5
      Data lifecycle for particle experiments

      The life cycle of data for experiments in particle physics follows a complicated path that starts with the design of the detector, continues with signal acquisition, and ends with the analysis. Each step involves significant processing, with techniques that range from digital signal processing to statistical modelling of the processes at play, in order to extract the quantities of interest for the measurement. This keynote will introduce these processes and then focus on the typical inference techniques used by physicists to obtain their results.

      45 min plus 5 min discussion

      Speaker: Alessandro Razeto (INFN - LNGS)
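
As background for talk 4, whose abstract refers to a regularized multivariate Durbin-Levinson algorithm, the following is a minimal sketch of the classical univariate, unregularized Durbin-Levinson recursion. It is not the speaker's method: the function name `durbin_levinson` and the AR(1) test case are assumptions made here for illustration only. Given autocovariances, the recursion projects a stationary series onto its own past one lag at a time, producing predictor coefficients, partial autocorrelations and innovation variances.

```python
def durbin_levinson(gamma):
    """Classical univariate Durbin-Levinson recursion (illustrative sketch).

    gamma: autocovariances [gamma(0), ..., gamma(p)] of a stationary series.
    Returns (phi, pacf, v) where
      phi  : coefficients of the best linear predictor of X[t] from
             X[t-1], ..., X[t-p] (phi[0] multiplies X[t-1]),
      pacf : partial autocorrelations at lags 1..p,
      v    : innovation (prediction error) variances for orders 0..p.
    """
    if gamma[0] <= 0:
        raise ValueError("gamma(0) must be strictly positive")
    phi, pacf, v = [], [], [gamma[0]]
    for n in range(1, len(gamma)):
        # Covariance between X[t] and the part of X[t-n] not yet explained
        # by the order-(n-1) predictor.
        num = gamma[n] - sum(phi[k] * gamma[n - 1 - k] for k in range(n - 1))
        kappa = num / v[-1]  # partial autocorrelation at lag n
        # Update the first n-1 coefficients, then append the new one.
        phi = [phi[k] - kappa * phi[n - 2 - k] for k in range(n - 1)] + [kappa]
        pacf.append(kappa)
        v.append(v[-1] * (1.0 - kappa ** 2))
    return phi, pacf, v


# Hypothetical check: AR(1) with coefficient 0.5 and unit innovation variance,
# whose autocovariances are gamma(h) = 0.5**h / (1 - 0.5**2).
gamma = [0.5 ** h / (1 - 0.25) for h in range(4)]
phi, pacf, v = durbin_levinson(gamma)
print(phi, pacf, v)
```

For this AR(1) example the recursion should return a first coefficient close to 0.5, higher-order coefficients close to zero, and an innovation variance close to 1 from order 1 onwards. A regularized variant, as described in the abstract, would additionally taper or threshold the partial correlations before they enter the update; that step is not shown here.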