Computing on Low-Power Architectures (COLA)
from Thursday, 25 February 2016 (08:15) to Friday, 26 February 2016 (18:00)
Thursday, 25 February 2016
08:15
Registration and Welcome Coffee
08:15 - 08:45
Room: 1
08:45
Welcome and Opening
Lorenzo Pareschi (University of Ferrara)
08:45 - 09:00
Room: 1
09:00
Computing on Low-Power Architectures
Filippo Mantovani (BSC)
09:00 - 09:30
Room: 1
09:30
ARM in HPC: Software and Tools
Geraint North (ARM)
09:30 - 09:55
Room: 1
I will briefly update the audience on the state and availability of ARM tools for HPC, covering compilers, debuggers and profilers, and briefly summarise hardware availability. What then follows is a presentation of what I perceive to be the main issues that ARM needs to address from a software tools perspective in HPC. Some of these are specific to ARM, and some are industry-wide issues, where ARM could potentially differentiate itself by leading the way to an innovative solution.
10:00
Porting HPC Libraries and Applications to ARM's 64-bit Architecture
George Lander (ARM)
10:00 - 10:25
Room: 1
Last year at SuperComputing 2015 ARM announced the ARM Performance Libraries providing BLAS, LAPACK and FFT routines optimised for the ARM AArch64 architecture. Alongside this announcement ARM gave out a list of open source HPC libraries and applications that it would be shipping. This talk will go over some of the issues faced along the road to porting and providing all of these packages.
10:30
Coffee Break
10:30 - 11:00
Room: 1
11:00
The INFN COSA project
Daniele Cesini (INFN - CNAF)
11:00 - 11:25
Room: 1
The embedded and high-performance computing sectors have in the past been very isolated and unaware of each other’s needs and technologies. Similar isolation has existed between HPC and the mobile/tablet commodity markets. We are now experiencing a very important convergence between these markets, both in constraints and needs and in technologies. High computational demands, power consumption limits, parallelism, heterogeneous computing and cost effectiveness are now the driving constraints of both the HPC and embedded sectors. This convergence opens the way to performing scientific computation on low-power architectures originally developed for the embedded or mobile world. In this talk, we present the panorama of low-power architectures suitable for scientific computation. The INFN experience in building a low-power cluster based on System-on-Chips (SoCs) is discussed, together with the performance results, in terms of power ratio and energy consumption, obtained on that cluster. The applications used in the tests range from synthetic benchmarks to real-life use cases. Results are compared to those obtained on traditional HPC architectures.
11:30
Energy to Solution vs Time to Solution, towards energy-aware HPC applications
Enrico Calore (UNIFE and INFN)
11:30 - 11:55
Room: 1
Energy efficiency is quickly gaining importance in the HPC field. High-end processors are evolving towards more advanced power-saving and power-monitoring technologies, while low-power processors, designed for the mobile market, are gaining interest in the HPC area thanks to their increasing computing capabilities, in conjunction with their competitive pricing. On the other hand, from the software point of view, HPC applications are still optimized mainly for performance, often neglecting energy considerations, despite the fact that data-centers in the near future may start to account for consumed energy, instead of running time. In this work we explore how HPC applications may become more energy-aware; in particular we explore energy-profile benchmarks of actual HPC applications on different architectures, in order to compare their energy performance, but also to identify different available software strategies to tune energy consumption.
12:00
Low Power processor in HEP
Michele Michelotto (INFN)
12:00 - 12:25
Room: 1
High Energy Physics benefits from an implicit parallelism at the level of the single physics event. Each event can be processed independently, which makes it very easy to distribute events across a cluster of independent computing nodes. The problem is the huge number of events, which requires thousands of power-hungry worker nodes. The HEP community is starting to look at an even bigger number of smaller but energy-efficient processors. The talk will concentrate on the reference benchmark for HEP, called HS06, and the relative performance of present processors in terms of HS06/Watt.
12:30
Low Power Computing in Gamma-Ray Astronomy
Denis Bastieri (UNIPD and INFN)
12:30 - 12:55
Room: 1
Gamma-Ray Astronomy is an optimal test-ground for Low-Power Computing and High-Throughput Computing. On the one hand, ground-based detectors for Gamma-Ray Astronomy are the prototypes for distributed experiments, as single detectors may be scattered over an area of a few square kilometres, and the capability of each unit to process, at least partially, its own data before sending them to the central data acquisition provides a key advantage. On the other hand, satellite-borne detectors need low-power computing on board and huge computing facilities for ground processing. The talk will present some applications in the field of Gamma-Ray Astronomy, ranging from a GPU chain to build the model that best represents the data acquired in space (by evaluating the Maximum Likelihood Ratio), to an FPGA/ARM architecture used to process images collected by Cherenkov telescopes on the ground.
13:00
Lunch
13:00 - 14:00
Room: 1
14:00
Near-threshold Scalable Computing - The PULP experience
Luca Benini (ETHZ and UNIBO)
14:00 - 14:25
Room: 1
The “internet of everything” envisions trillions of connected objects loaded with high-bandwidth sensors requiring massive amounts of local signal processing, fusion, pattern extraction and classification. Higher level intelligence, requiring local storage and complex search and matching algorithms, will come next. From the computational viewpoint, the challenge is formidable and can be addressed only by pushing computing fabrics toward massive parallelism and brain-like energy efficiency levels. We believe that CMOS technology can still take us a long way toward this vision. Our recent results with the PULP (parallel ultra-low power) open computing platform demonstrate that pJ/OP (GOPS/mW) computational efficiency is within reach in today’s 28nm CMOS FDSOI technology. In the longer term, looking toward the next 1000x of energy efficiency improvement, we will need to fully exploit the flexibility of heterogeneous 3D integration, stop being religious about analog vs. digital, Von Neumann vs. “new” computing paradigms, and seriously look into relaxing traditional “hardware-software contracts” such as numerical precision and error-free permanent storage.
14:30
Energy-Aware Scheduling at the Leibniz Supercomputing Centre
Daniele Tafani (Leibniz Supercomputing Center)
14:30 - 14:55
Room: 1
Due to rising energy prices and an increasing carbon footprint, it is commonly accepted that the main constraint for future, sustainable many-Peta to Exascale HPC systems will be power consumption. Along with the design of more energy-efficient hardware and cooling infrastructures, a viable way of addressing this challenge is offered by energy-aware scheduling. This presentation explains the approach adopted by the Leibniz Supercomputing Centre to reduce power consumption by employing energy-aware management software and thorough power consumption monitoring. Specifically, we will describe the energy-aware scheduling feature of IBM LoadLeveler, the resource and management system adopted in SuperMUC, one of the fastest supercomputers in the world. This feature makes it possible to select the most "energy-efficient" CPU frequency for a large fraction of SuperMUC's application portfolio and, therefore, contributes to substantially reducing the overall energy consumption of the system.
15:00
Climbing Mont Blanc - a Prototype System for Training in Energy Efficient Programming
Lasse Natvig (Norwegian University of Science and Technology)
15:00 - 15:25
Room: 1
Climbing Mont Blanc (CMB) is an open online judge used for training in energy-efficient programming of state-of-the-art heterogeneous multicores. It is based on an Odroid-XU3 board with an Exynos Octa processor and integrated power sensors. The system currently accepts C and C++ programs, with support for OpenCL v1.1, OpenMP 4.0 and Pthreads. Programs submitted using the graphical user interface are evaluated with respect to performance and energy efficiency. A small and varied set of problems is available. We are not aware of any other online programming judge that reports energy efficiency. The talk will present some early experience from using the CMB system, potential opportunities for collaboration and future work. Our long-term goal is to learn more about energy-efficient computing on handheld devices from submissions to the system.
15:30
Execution of the DPSNN spiking neural network simulator on the nVIDIA Jetson TK1 platform
Alessandro Lonardo (INFN)
15:30 - 15:55
Room: 1
Fast simulation of spiking neural network models plays a dual role: it contributes to the solution of a scientific grand-challenge – i.e. the comprehension of brain activity – and, when included in embedded systems, it can enhance applications like autonomous navigation, surveillance and robotics. The DPSNN is a spiking neural network simulator developed at the INFN APE lab. It is coded as a network of C++ processes, and it is designed to generate spiking behaviors and synaptic connectivity that do not change when the number of processing nodes is varied, easing the quantitative study of scalability. We used the DPSNN as a benchmark for the ARM-based nVIDIA Jetson TK1 platform, measuring instantaneous power, total energy consumption, execution time and energetic cost per synaptic event. Results will be presented and compared against those obtained on an Intel Xeon platform.
16:00
Coffee Break
16:00 - 16:30
Room: 1
16:30
Quantum ESPRESSO community code and the Exascale Challenge
Carlo Cavazzoni (CINECA)
16:30 - 16:55
Room: 1
QUANTUM ESPRESSO builds upon electronic-structure codes that have been developed and tested by some of the original authors of novel electronic-structure algorithms and applied in the last twenty years by some of the leading materials modeling groups worldwide. Innovation and efficiency are its main focus, with special attention paid to massively parallel architectures, and in the exascale challenge it has been selected by many HPC centers and technology providers worldwide as one of the applications worth porting to new architectures. In the talk, the refactoring effort and the porting strategies toward exascale will be presented and discussed, together with preliminary results on new highly parallel chips.
17:00
High throughput data acquisition with InfiniBand on low power architectures
Matteo Manzali (UNIFE)
17:00 - 17:25
Room: 1
The LHCb experiment is preparing a major upgrade, during Long Shutdown 2 in 2018, of both the detector and the data acquisition system. A system composed of about 500 nodes and capable of transporting up to 50 Tbps of data will be required; this can only be achieved in a manageable way using a readout system based on commodity hardware and high-bandwidth data-centre switches. Several studies are ongoing to investigate different network and hardware technologies, with the aim of reducing the purchase and maintenance costs of the system. In this presentation we will introduce InfiniBand and show preliminary tests with this network technology and low-power x86 architectures. We will also describe how optimisations, such as the use of core affinity, can affect the performance of such systems.
17:30
Exploration of Future Computing Platforms at CMS
David Abdurachmanov (CERN)
17:30 - 17:55
Room: 1
An overview of various efforts at the Compact Muon Solenoid (CMS) experiment at CERN on emerging general-purpose computing platforms for High Throughput Computing (HTC). We report our experience with software porting, performance and energy efficiency, and with building a demonstrator Worldwide LHC Computing Grid (WLCG) Tier-3 computing site at Princeton University based on ARMv8 64-bit Server-on-Chip.
18:00
Wrap Up Day1
18:00 - 18:30
Room: 1
20:30
Social Dinner at "Trattoria Noemi" via Ragno 31 (phone: 0532769070)
20:30 - 22:30
Friday, 26 February 2016
09:00
GPU programming for complex fluids
Pinaki Kumar (Technische Universiteit Eindhoven)
09:00 - 09:25
Room: 1
In this contribution we will discuss issues related to the optimisation of a Lattice Boltzmann multicomponent flow solver, used to study the physics of soft glassy systems, on multi-GPU platforms.
09:30
Structured Parallel Programming on multi-core wireless sensor networks
Stefano Chessa (UNIPI)
09:30 - 09:55
Room: 1
Wireless sensor network (WSN) platforms are now experiencing the same evolution that high performance computing (HPC) went through when it moved from single-core to multi-core architectures. Multi-core sensor platforms are expected to grow, especially in application domains that require complex processing of the sensed data, such as image processing, data encryption, network coding and data fusion. The shift from single-core to multi-core sensor platforms also affects the WSN programming models: it introduces the need for high-level abstractions to support parallel and distributed programming and models in WSN. This has recently suggested the adoption in WSN of methodologies such as skeletons, which are widely used in the programming of parallel and distributed systems. Our work addresses the use of skeletons in the context of WSN, with particular attention to multi-core sensors. In particular, leveraging the fact that some meaningful WSN applications are characterised by known programming patterns (for example, in visual sensor networks, the stencil skeleton fits object tracking applications well), we aim at defining suitable models of computation for the most promising skeletons for WSN, and at combining the concepts of structured parallel programming and real-time sensing.
10:00
Porting and testing the Einstein Toolkit on the generation of low-power architectures.
Roberto De Pietri (UNIPR and INFN)
10:00 - 10:25
Room: 1
Low-power architectures are the subject of much interest as viable alternatives to traditional HPC platforms. In this talk we will focus on the performance that can now be obtained by porting a large simulation toolkit (the Einstein Toolkit), widely used in numerical astrophysics to simulate matter coupled to Einstein’s equations, to low-power architectures. We considered multi-core, multi-node clusters based on ARM and Intel low-power processors, and we compared the results with a traditional HPC cluster, the Galileo system at CINECA. The work has been performed using the resources currently available to the INFN-COSA project.
10:30
Coffee Break
10:30 - 11:00
Room: 1
11:00
Experience with Beignet OpenCL on low power Intel SoC
Felice Pantaleo (CERN)
11:00 - 11:25
Room: 1
This presentation will focus on our first-hand experience in running benchmarks using Beignet, an open-source OpenCL implementation, on both the GPU and CPU of a low-power Intel Skylake SoC.
11:30
Experience running codes on ARM64+GPU platforms
Filippo Spiga
11:30 - 11:55
Room: 1
The presentation will focus on first-hand experience in running CUDA-accelerated applications on ARM64 platforms with NVIDIA Kepler GPU cards. The talk will underline the challenges, difficulties, weaknesses and strengths of a heterogeneous platform.
12:00
Squeezing Deep Learning onto a Phone
Carlo Fantozzi (UNIPD)
12:00 - 12:25
Room: 1
Deep learning has recently emerged as one of the most promising techniques for classification, with breakthrough results in fields such as image recognition and natural language processing. However, deep learning calls for a tremendous amount of resources, chiefly in the training phase, but also during the inference phase. This may not be an issue when "Google-scale" computing facilities are available, but it hampers the applicability of deep learning in several fields where computing power, or memory capacity, or energy, are constrained. The talk will focus on one such field: mobile computing, where it is clear that client-side machine learning algorithms will play a key role in the next generation of applications, but nontrivial progress is required to tailor such algorithms to a limited computing or power budget. The talk will present preliminary observations and measurements taken from an image recognition application with convolutional neural networks. As HPC is also being infiltrated by mobile technologies, with ARM processors expected to appear in the Green500 list any time soon, such observations acquire a more general significance.
12:30
Wrap Up Day2
12:30 - 13:00
Room: 1
13:00
Lunch
13:00 - 14:00
Room: 1
14:00
Closed session (TPC meeting, ...)
14:00 - 18:00
Room: 1