Sep 17 – 21, 2018
Department of Physics and Geology
Europe/Rome timezone
School on Open Science Cloud

Coffee Seminar: Becoming a Data Scientist

Sep 20, 2018, 3:35 PM
Physics Building (Department of Physics and Geology)


via Pascoli, snc 06123 - Perugia (IT)


Dr Valentin Kuznetsov (Cornell University (US))


In this talk we'll cover the material and knowledge base needed to become a world-class Data Scientist. We'll use a real Kaggle competition dataset and build a set of ML models for it. During the session we'll introduce advanced concepts of ML training, such as feature embedding, data transformation, and ML model fine-tuning. We'll discuss how to work with large datasets, techniques for avoiding RAM limitations, and how to create an ensemble of multiple ML models. We'll use the XGBoost library and the Keras ML framework to build our models and embedding matrix. Even though the material in this session is quite advanced, we expect students to listen and follow up on the concepts introduced.

Primary author

Dr Valentin Kuznetsov (Cornell University (US))
