Description
We propose a hackathon divided into two challenges. In the first, participants will tackle a regression problem using the well-known "Asteroid Dataset", where they need to estimate the diameter of various types of asteroids. In the second, participants will face a classification problem aimed at reconstructing the diabetes diagnosis of different types of patients based on survey responses.
Project proposal: general context
Astronomy and Health Physics.
Input dataset
- Asteroid dataset: 130,000 samples, 40 features (CSV file).
- Diabetes dataset: 400,000 samples, 30 features (CSV file).
The data storage solution has not been chosen yet; we expect to settle this shortly.
Project proposal: description of the problem
We are organizing a hackathon featuring two distinct challenges. The first challenge focuses on a regression task, where participants will use the renowned "Asteroid Dataset" to predict the diameter of asteroids based on various characteristics such as composition and orbital parameters. This problem will test the participants' ability to handle numerical and categorical data and build accurate predictive models.
The second challenge is centered on classification, where the goal is to determine the likelihood of a diabetes diagnosis for a diverse set of patients. Using responses from a comprehensive health survey, participants will need to analyze and classify the data, applying machine learning techniques to predict whether a patient has diabetes based on lifestyle factors, medical history and other relevant features.
Goal and FOM
Regression task: coefficient of determination, R^2.
Classification task: F1 score.
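As a reference for how the two figures of merit are defined, here is a minimal sketch in plain Python. The function names `r_squared` and `f1_binary` are illustrative only, and the F1 version assumes a binary target with a single positive class; the proposal does not state which averaging scheme will be used for scoring.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def f1_binary(y_true, y_pred, positive=1):
    """F1 score: harmonic mean of precision and recall for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall cannot both be positive,
        # so the score is reported as 0 by convention (an assumption here).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A model that always predicts the mean of the target gets R^2 = 0, and a perfect model gets R^2 = 1. F1 rewards a balance between precision and recall, which matters when the positive class is rare.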
Machine learning methods
Repeated cross validation, feature selection algorithms (Boruta), classical statistical tools, data manipulation techniques, data augmentation techniques (SMOTE), Linear Regression, Logistic Regression, Random Forest Classifier/Regressor, XGBoost Classifier/Regressor.
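To illustrate the first method in the list, the sketch below generates train/test index splits for repeated k-fold cross validation using only the standard library. The function name `repeated_kfold_indices` and its default parameters are assumptions made for illustration; in practice participants would more likely rely on an existing library implementation.

```python
import random


def repeated_kfold_indices(n_samples, n_splits=5, n_repeats=3, seed=0):
    """Yield (train_idx, test_idx) pairs for repeated k-fold cross validation.

    Each repeat reshuffles the sample indices and partitions them into
    n_splits folds; every fold serves once as the test set in that repeat.
    """
    rng = random.Random(seed)
    base, extra = divmod(n_samples, n_splits)
    # The first `extra` folds get one additional sample each.
    fold_sizes = [base + (1 if i < extra else 0) for i in range(n_splits)]
    for _ in range(n_repeats):
        indices = list(range(n_samples))
        rng.shuffle(indices)
        start = 0
        for size in fold_sizes:
            test_idx = indices[start:start + size]
            train_idx = indices[:start] + indices[start + size:]
            yield train_idx, test_idx
            start += size
```

Averaging a score over all `n_splits * n_repeats` splits gives a less noisy estimate than a single k-fold run, at the cost of proportionally more training time.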