# Project - Day 4 - MLFlow training of your model

## Insert MLFlow parameters
The following cell is tagged `parameters`. It is a convenient place to define the values you want to vary between MLFlow runs when experimenting with the CNN, such as the batch size and the number of epochs.

In [12]:
batch_size = 10
n_epochs = 5

## Exercise

Based on the Training step of the project done on day 3:

- train a model and store the metrics of the training process in MLFlow, e.g.:
```python
!pip install -q tqdm
from tqdm import trange

with mlflow.start_run(tags={"mlflow.runName": "train"}) as mlrun:

    losses = []      # (fractional epoch, training loss) for every batch
    val_losses = []  # (epoch, validation loss) at the end of every epoch

    # n_epochs comes from the `parameters` cell
    # Each block of the training arrays is used as one batch
    n_blocks = y_train.numblocks[0]

    for epoch in trange(n_epochs):
        for X, y in zip(X_train.blocks, y_train.blocks):
            losses.append(
                (len(losses) / n_blocks, classifier.train_on_batch(X.compute(), y.compute()))
            )
        # `ls` is the validation loss at the end of this epoch
        ls = classifier.test_on_batch(X_valid, y_valid)
        val_losses.append((len(losses) / n_blocks, ls))
        mlflow.log_metric("loss", ls, step=len(losses) // n_blocks)
```

- store the model in MLFlow for use in the next step of the pipeline, e.g.:

```python
    classifier.save("classifier.keras")
    mlflow.log_artifact("classifier.keras")
    prds = classifier.predict(X_valid.compute())
    signature = infer_signature(X_valid.compute(), prds)
    mlflow.tensorflow.log_model(classifier, "model", registered_model_name="CYGNO_CNN", signature=signature)
```

- store any additional plot that you find useful to track as an MLFlow artifact

## SOLUTIONS

In [13]:
%%bash

## Download the training dataset from an INFN archive
mkdir -p $HOME/data
wget https://pandora.infn.it/public/269d22/dl/training_set.zip -qO $HOME/data/training_set.zip

## Install the unzip utility 
#apt-get -qy install unzip

## Extract the archive
cd $HOME/data/
unzip -qn $HOME/data/training_set.zip

In [14]:
import warnings

warnings.filterwarnings('ignore')

from glob import glob
filenames = glob("/home/jovyan/data/data/export/*/*/*/*.png")
print(f"Found {len(filenames)} filenames")

import mlflow
from mlflow.models import infer_signature

Found 612 filenames
