Workshop di CCR: La Biodola, 3 - 7 giugno 2019

Name: Workshop di CCR: La Biodola, 3 - 7 giugno 2019
Start: 2019-06-03T09:00:00+02:00
End: 2019-06-07T23:30:00+02:00
Location: Hotel Hermitage - Isola d'Elba

Web-based interactive data analysis for HEP with Spark and ROOT DataFrame

6 Jun 2019, 09:00

30m

Sala Maria Luisa (Hotel Hermitage - Isola d'Elba)

La Biodola 57037 Portoferraio (Li) Tel. +39.0565 9740 http://www.hotelhermitage.it/

Computing infrastructure and cloud

Enric Tejedor Saavedra (CERN), Javier Cervantes Villanueva (CERN), Piotr Mrowczynski (CERN), Prasanth Kothuri (CERN), Vincenzo Eduardo Padulano

This talk shares our recent experience in providing a data analytics platform based on Apache Spark for High Energy Physics (HEP), building applications on top of it, and exploring its use in several use-case scenarios.
Apache Spark is an analytics framework aimed at processing big data, with strong traction and the support of a large user base. At CERN, the problem of distributing large-scale computations has been tackled in two ways: by deploying Hadoop services on on-premise clusters, with Spark running on YARN, and by using OpenStack cloud infrastructure, with Spark running on Kubernetes.
Meanwhile, CERN provides a web platform called SWAN (Service for Web-based ANalysis), where users can write and run their analyses as Jupyter notebooks, seamlessly accessing the data and software they need without having them on their own machines.
The first part of the presentation covers the integration of Spark with Hadoop and YARN on one side and with OpenStack and Kubernetes on the other, which together form a unified analytics platform enabling scalable, distributed HEP data processing. We will also discuss how SWAN has become the interface to this platform, providing submission and monitoring capabilities for Spark computations.
The second part will focus on new ways of exploiting this analytics infrastructure, namely recent developments in the ROOT analytics framework: Distributed RDataFrame and PyRDF. Through SWAN, these allow interactive, parallel and distributed analysis of large physics datasets stored on EOS, in a form that can be easily monitored and shared with others.
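As a rough illustration of the idea behind distributed analysis of a large dataset, the sketch below splits a dataset into entry ranges, fills one partial histogram per range, and merges the partial results. It is plain Python and does not use ROOT, PyRDF or Spark; all function names are hypothetical, and local threads merely stand in for remote workers. It is a minimal sketch of a map-reduce pattern under these assumptions, not the implementation presented in the talk.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def split_ranges(n_entries, n_partitions):
    """Split [0, n_entries) into contiguous, near-equal (start, stop) ranges."""
    base, extra = divmod(n_entries, n_partitions)
    ranges, start = [], 0
    for i in range(n_partitions):
        stop = start + base + (1 if i < extra else 0)
        if stop > start:  # skip empty ranges when partitions outnumber entries
            ranges.append((start, stop))
        start = stop
    return ranges


def fill_partial_histogram(values, entry_range, bin_width):
    """Map step (hypothetical): histogram only the entries in one range."""
    start, stop = entry_range
    return Counter(int(v // bin_width) for v in values[start:stop])


def merge_histograms(left, right):
    """Reduce step (hypothetical): add the bin contents of two partial histograms."""
    return left + right


def distributed_histogram(values, n_partitions, bin_width):
    """Run the map step on each range in a thread pool, then merge the results."""
    ranges = split_ranges(len(values), n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partials = list(
            pool.map(lambda r: fill_partial_histogram(values, r, bin_width), ranges)
        )
    return reduce(merge_histograms, partials, Counter())
```

Because each range is processed independently and the merge is a simple sum, the result does not depend on how many partitions are used, which is the property that lets this kind of computation be spread over many workers.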

Vincenzo Eduardo Padulano

web based interactive data analysis.pdf
