Speaker
Dr
Geert Verdoolaege
(Ghent University)
Description
Any measurement process involves sampling from a latent probability distribution. This distribution contains all information about the quantity of interest, and learning algorithms can benefit considerably from the inclusion of this probabilistic information. The field of information geometry, which is well established theoretically, describes families of probability distributions using the framework of differential geometry. The Fisher information serves as a metric tensor and allows the calculation of geodesic distances (GDs) between distributions. Equipped with a genuine distance measure, machine learning techniques can be adapted to operate on probabilistic manifolds, enabling regression, classification and dimensionality reduction. In this work we concentrate on classification and dimensionality reduction, with an application to the public international database (PID) of tokamak confinement data. We model each physical variable with a Gaussian distribution and show that, using the GD, this improves classification according to confinement mode compared to classifying only the measured values in a Euclidean space. We furthermore compare a k-nearest-neighbor classifier with a support vector machine algorithm using a geodesic kernel. We also consider projecting the data onto the tangent plane at the geodesic centroid of the data cloud, followed by classification in this hyperplane. Finally, we show the advantage of performing dimensionality reduction of the confinement data on a Gaussian probabilistic manifold, respecting the inherent probabilistic nature of both the original and the reduced space for dimensionality estimation.
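As an illustration of the geodesic-distance idea described above, the following is a minimal sketch, not the authors' implementation. It assumes each variable is modelled by an independent univariate Gaussian, for which the Fisher-Rao geodesic distance has a well-known closed form, and it combines the per-variable distances as on a product manifold (root of the sum of squares). The function names, the plain majority-vote k-nearest-neighbor rule, and the toy numbers are all hypothetical and are not taken from the confinement database.

```python
import numpy as np


def gaussian_geodesic_distance(mu1, sigma1, mu2, sigma2):
    """Fisher-Rao geodesic distance between N(mu1, sigma1^2) and N(mu2, sigma2^2).

    Closed form for univariate Gaussians (hyperbolic geometry of the
    (mu, sigma) half-plane). Inputs may be scalars or broadcastable arrays.
    """
    mu1, sigma1, mu2, sigma2 = (
        np.asarray(a, dtype=float) for a in (mu1, sigma1, mu2, sigma2)
    )
    arg = 1.0 + ((mu1 - mu2) ** 2 + 2.0 * (sigma1 - sigma2) ** 2) / (
        4.0 * sigma1 * sigma2
    )
    return np.sqrt(2.0) * np.arccosh(arg)


def product_geodesic_distance(mu1, sigma1, mu2, sigma2):
    """Distance between sets of independent Gaussians (assumption: product manifold).

    The last axis indexes the variables; squared per-variable distances add.
    """
    d = gaussian_geodesic_distance(mu1, sigma1, mu2, sigma2)
    return np.sqrt(np.sum(d ** 2, axis=-1))


def knn_predict(query_mu, query_sigma, train_mu, train_sigma, train_labels, k=3):
    """Majority-vote k-nearest-neighbor classification using the geodesic distance."""
    dists = product_geodesic_distance(query_mu, query_sigma, train_mu, train_sigma)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(np.asarray(train_labels)[nearest], return_counts=True)
    return labels[np.argmax(counts)]


# Synthetic toy data (hypothetical): 4 training points, 2 variables each,
# every variable described by a mean and a standard deviation.
train_mu = np.array([[0.0, 0.0], [0.1, 0.2], [5.0, 5.0], [5.2, 4.9]])
train_sigma = np.array([[1.0, 1.0], [1.1, 0.9], [1.0, 1.2], [0.9, 1.0]])
train_labels = np.array([0, 0, 1, 1])

predicted = knn_predict(
    np.array([0.05, 0.1]), np.array([1.0, 1.0]),
    train_mu, train_sigma, train_labels, k=3,
)
```

For two Gaussians with equal means the distance reduces to sqrt(2) times the absolute log-ratio of the standard deviations, which gives a quick sanity check of the formula. Unlike a Euclidean distance on the means alone, the geodesic distance also depends on the measurement uncertainty: the same difference in means counts for less when the standard deviations are large.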
Primary author
Dr
Geert Verdoolaege
(Ghent University)
Co-authors
Mr
Giorgos Karagounis
(Ghent University)
Prof.
Guido Van Oost
(Ghent University)