Statistical Physics

Random matrices and analysis of correlations in data

by Dario Villamaina (Institut de Physique Théorique Philippe Meyer, Paris)

Aula 2 (Dip. di Fisica - Edificio E. Fermi)

Description
Determining correlations among variables from a set of observations is a common problem in statistics. In such cases one works with estimators of the population correlation matrix, which are affected by finite-sampling effects. One of the most common techniques for this kind of problem is principal component analysis (PCA), in which one usually retains only the components corresponding to the largest eigenvalues of the sample correlation matrix, since these are regarded as the most informative. However, the part that this procedure usually discards (namely, the eigenvectors associated with the smaller eigenvalues) does not merely reflect sampling noise. Using a combination of random matrix and information-theoretic tools, I will show that all the eigenvectors of a sample correlation matrix carry information about the principal components (namely, the eigenvectors associated with the large eigenvalues) of the population correlation matrix. This extra information can be used to improve standard data-cleaning procedures efficiently.
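The finite-sampling effect described above can be illustrated with a minimal Python/NumPy sketch. It is not the speaker's method: it assumes Gaussian data and a hypothetical equicorrelated population correlation matrix (all function names are illustrative), draws a finite sample, and measures the squared overlap of each sample eigenvector with the population principal component. The `pca_truncate` helper is the plain PCA step mentioned in the abstract, keeping only the k components with the largest eigenvalues.

```python
import numpy as np


def equicorrelated_population(n: int, rho: float) -> np.ndarray:
    """Hypothetical population correlation matrix: unit diagonal, constant off-diagonal rho.

    Its top eigenvector is (1, ..., 1) / sqrt(n), with eigenvalue 1 + (n - 1) * rho.
    """
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))


def sample_correlation(pop_corr: np.ndarray, t: int, rng: np.random.Generator) -> np.ndarray:
    """Draw t Gaussian observations with correlation pop_corr; return their sample correlation matrix."""
    chol = np.linalg.cholesky(pop_corr)
    z = rng.standard_normal((t, pop_corr.shape[0]))
    x = z @ chol.T  # each row is one observation of the n variables
    return np.corrcoef(x, rowvar=False)


def squared_overlaps(sample_corr: np.ndarray, u: np.ndarray) -> np.ndarray:
    """Squared overlaps <v_i, u>^2 between every sample eigenvector v_i and a unit vector u.

    Ordered by decreasing sample eigenvalue; since the v_i form an orthonormal basis,
    the overlaps sum to |u|^2 = 1.
    """
    _, vecs = np.linalg.eigh(sample_corr)  # eigh returns eigenvalues in ascending order
    vecs = vecs[:, ::-1]  # put the eigenvector of the largest eigenvalue first
    return (vecs.T @ u) ** 2


def pca_truncate(sample_corr: np.ndarray, k: int) -> np.ndarray:
    """Plain PCA step: keep only the k eigen-components with the largest eigenvalues."""
    vals, vecs = np.linalg.eigh(sample_corr)
    top = np.argsort(vals)[::-1][:k]
    # Sum of lambda_i * v_i v_i^T over the retained components.
    return (vecs[:, top] * vals[top]) @ vecs[:, top].T


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, t, rho = 100, 200, 0.05  # n variables, t observations, so n / t = 0.5
    pop = equicorrelated_population(n, rho)
    u = np.ones(n) / np.sqrt(n)  # population principal component
    overlaps = squared_overlaps(sample_correlation(pop, t, rng), u)
    print("squared overlap with the top sample eigenvector:", overlaps[0])
    print("squared overlap carried by all other eigenvectors:", overlaps[1:].sum())
```

With a finite ratio n / t, the top sample eigenvector typically recovers most, but not all, of the population principal component; the remaining squared overlap is spread over the other sample eigenvectors, which a plain PCA truncation discards.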