26–30 May 2025
Hotel Hermitage - Isola d'Elba
Europe/Rome timezone

Exploring the latent space of transcriptomic data with topic modeling

26 May 2025, 18:00
15m
Sala Maria Luisa (Hotel Hermitage - Isola d'Elba)

La Biodola 57037 Portoferraio (Li) Tel. +39.0565 9740 http://www.hotelhermitage.it/
Oral presentation
Theoretical and experimental computing

Speaker

Filippo Valle (Istituto Nazionale di Fisica Nucleare)

Description

The availability of high-dimensional transcriptomic datasets is increasing at a tremendous pace, together with the need for suitable computational tools. Clustering and dimensionality reduction are the usual go-to methods for identifying basic structure in these datasets. At the same time, a variety of topic modeling techniques have been developed to organize the deluge of natural language data by exploiting its latent topical structure.
This paper leverages the statistical analogies between text and transcriptomic datasets to compare different topic modeling methods applied to gene expression data. Specifically, we test their accuracy in the tasks of discovering and reconstructing the tissue structure of the human transcriptome and of distinguishing healthy from cancerous tissues. We examine the properties of the latent space recovered by each method and highlight the differences between methods, along with their pros and cons across tasks. We focus in particular on how different statistical priors can affect the results and their interpretability.
Finally, we show that the latent topic space can be a useful low-dimensional embedding space, where a basic neural network classifier can annotate transcriptomic profiles with high accuracy.
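
As a rough illustration of the pipeline described above (topic model first, then a small neural network classifier on the topic space), here is a minimal Python sketch. It is not the authors' method: the abstract does not name the topic models compared, so scikit-learn's `LatentDirichletAllocation` is used as one well-known stand-in, and the count matrix, tissue labels, and all sizes are synthetic and hypothetical.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)

# Synthetic stand-in for a samples-by-genes count matrix (NOT real data):
# three hypothetical "tissues", each over-expressing its own block of genes.
n_tissues, samples_per_tissue, n_genes = 3, 60, 90
block = n_genes // n_tissues
labels = np.repeat(np.arange(n_tissues), samples_per_tissue)
rates = np.full((labels.size, n_genes), 1.0)
for t in range(n_tissues):
    rates[labels == t, t * block:(t + 1) * block] = 10.0
counts = rng.poisson(rates)

# Hold out some samples before fitting anything.
c_train, c_test, y_train, y_test = train_test_split(
    counts, labels, test_size=0.3, stratify=labels, random_state=0
)

# Samples play the role of documents and genes the role of words.
# LDA is an assumption here; any topic model with a per-sample
# topic mixture could take its place.
lda = LatentDirichletAllocation(n_components=n_tissues, max_iter=50, random_state=0)
z_train = lda.fit_transform(c_train)  # shape: (n_train_samples, n_topics)
z_test = lda.transform(c_test)

# A basic neural network classifier trained on the low-dimensional topic space.
clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
clf.fit(z_train, y_train)
accuracy = clf.score(z_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

The classifier never sees the 90-dimensional counts, only the 3-dimensional topic mixtures, which is the sense in which the topic space acts as an embedding. The number of topics is set equal to the number of synthetic tissues purely for convenience.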

Primary authors

Filippo Valle (Istituto Nazionale di Fisica Nucleare)
Matteo Osella
Michele Caselle (Istituto Nazionale di Fisica Nucleare)
