Big data at CERN

by Zbigniew Baranowski (CERN)

Aula C. Voci (INFN Padova)


Data generation rates at CERN are growing rapidly for database workloads as the LHC goes into Run 2 and beyond. This growth is expected in particular for data coming from controls, logging and monitoring systems. Storing, administering and accessing large data sets in a relational database system can quickly become a hard technical challenge as the size of the active data set and the number of concurrent users increase. To cope with this problem, CERN has adopted modern Big Data solutions based on Apache Hadoop and its ecosystem. Notably, technologies such as Apache Spark, Impala and Parquet offer a rapidly developing set of solutions for deploying and managing very large data warehouses on commodity hardware with open source software. They also enable new, flexible interfaces for data processing, including machine learning.
This presentation will also describe the infrastructure currently deployed at CERN and the most interesting projects running on top of it.

Organized by

Tommaso Dorigo