# Project - Day 4 - MLFlow evaluate and fit

## Set parameters
The cell below is already tagged as `parameters`. Use it to declare any papermill parameter that would be useful to change at MLFlow run time (e.g. the location of the models trained in the previous step).

In [1]:
model_run_uri = "dummy"
threshold = 0.9
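
Since papermill overrides these defaults at run time, a quick sanity check right after the parameters cell can catch a run that was launched with the placeholder values. The following is only a minimal sketch: `check_parameters` is a hypothetical helper that is not part of the course material, and it assumes that `threshold` is meant to be a score in the range [0, 1].

```python
def check_parameters(model_run_uri: str, threshold: float) -> None:
    """Fail early if the notebook still runs with placeholder parameters.

    Hypothetical helper: the [0, 1] range for `threshold` is an assumption.
    """
    if not model_run_uri or model_run_uri == "dummy":
        raise ValueError(
            "model_run_uri was not set: pass the artifact URI of the trained model"
        )
    if not 0.0 <= threshold <= 1.0:
        raise ValueError(f"threshold must be in [0, 1], got {threshold}")


# Illustrative values only: the URI below is a made-up placeholder
check_parameters("runs:/<run_id>/model", 0.9)
```

Calling the check with the defaults of the cell above (`"dummy"`, `0.9`) raises a `ValueError`, which makes a misconfigured pipeline step fail before any model is loaded.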

## Loading libraries, data and model

### Loading libraries and model from MLFlow

In [5]:
import warnings

warnings.filterwarnings('ignore')
## We will be using Numpy, Pyplot and Tensorflow as our scientific tool box
import numpy as np 
import matplotlib.pyplot as plt
import tensorflow as tf

## BytesIO for defining in-memory file-like objects
from io import BytesIO

## Dask and in particular dask array for defining OOM pipelines
import dask
import dask.array as da

## Progress bars
from tqdm import tqdm

import mlflow


2024-12-05 14:54:59.327027: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-12-05 14:54:59.367685: E tensorflow/compiler/xla/stream_executor/cuda/cuda_dnn.cc:9342] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-12-05 14:54:59.367735: E tensorflow/compiler/xla/stream_executor/cuda/cuda_fft.cc:609] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-12-05 14:54:59.367756: E tensorflow/compiler/xla/stream_executor/cuda/cuda_blas.cc:1518] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2024-12-05 14:54:59.374712: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2024-12-05 14:54:59.375294: I tensorflow/core/platform/cpu_feature_guard.cc:182] This Tens

### Reproduce the final result plot based on the new model trained from the pipeline

You should now be able to reproduce the steps of the Day-3 model deployment and adapt them to the MLFlow pipeline:

- load the model from the artifact location of the previous step
  - hint: `mlflow.artifacts.download_artifacts(artifact_uri=model_run_uri, dst_path="./models")`
- evaluate the model and fit the results, storing the resulting plot as an MLFlow artifact
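
The two steps above could be wired together roughly as follows. This is a sketch under stated assumptions, not the reference solution: `fetch_model_dir` and `log_result_plot` are hypothetical helper names, the artifact file name `evaluation/result.png` is made up for illustration, and `mlflow.log_figure` is assumed to be available in the installed MLflow version and to be called while a run is active.

```python
def fetch_model_dir(model_run_uri: str, dst_path: str = "./models") -> str:
    """Download the model artifacts of the previous step; return the local path."""
    if model_run_uri == "dummy":
        # "dummy" is the default of the parameters cell, not a real artifact URI
        raise ValueError("model_run_uri still has its placeholder value")
    import mlflow  # imported here so the check above works without MLflow

    return mlflow.artifacts.download_artifacts(
        artifact_uri=model_run_uri, dst_path=dst_path
    )


def log_result_plot(fig, artifact_file: str = "evaluation/result.png") -> None:
    """Store a matplotlib figure among the artifacts of the active MLflow run."""
    import mlflow

    mlflow.log_figure(fig, artifact_file)
```

In the notebook, `fetch_model_dir(model_run_uri)` would be followed by whichever loader matches the format used when the model was saved in the previous step (possibly `tf.keras.models.load_model` for a Keras model), which this page does not specify.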


## SOLUTIONS

In [3]:
! wget https://pandora.infn.it/public/cdf340/dl/soscdata.zip
! rm -fr input
! mkdir -p input && cd input && unzip ../soscdata.zip

object_names = []
with open('../object_list.csv') as f:
    object_names = [x.strip("\n") for x in f.readlines()]

--2024-12-05 14:54:44--  https://pandora.infn.it/public/cdf340/dl/soscdata.zip
Resolving pandora.infn.it (pandora.infn.it)... 131.154.52.50
Connecting to pandora.infn.it (pandora.infn.it)|131.154.52.50|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 404408328 (386M) [application/force-download]
Saving to: ‘soscdata.zip.1’


2024-12-05 14:54:46 (267 MB/s) - ‘soscdata.zip.1’ saved [404408328/404408328]

Archive:  ../soscdata.zip
  inflating: data-chunk-2023-10-18T18:28:27.642050.npz  
  inflating: data-chunk-2023-10-18T18:28:39.359952.npz  
  inflating: data-chunk-2023-10-18T18:28:40.722029.npz  
  inflating: data-chunk-2023-10-18T18:28:43.152314.npz  
  inflating: data-chunk-2023-10-18T18:28:45.133516.npz  
  inflating: data-chunk-2023-10-18T18:28:47.170518.npz  
  inflating: data-chunk-2023-10-18T18:28:48.838282.npz  
  inflating: data-chunk-2023-10-18T18:28:51.503922.npz  
  inflating: data-chunk-2023-10-18T18:28:53.465447.npz  
  inflating: data-chunk-2023-1
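
Before loading any chunk, it can help to confirm that every name in the object list has a matching file in the unzipped folder. The sketch below assumes that the list contains one file name per line, as the cell above does; `read_object_names` and `missing_objects` are hypothetical helpers introduced only for illustration.

```python
from pathlib import Path


def read_object_names(list_path):
    """Return the non-empty lines of the object list, stripped of whitespace."""
    with open(list_path) as f:
        return [line.strip() for line in f if line.strip()]


def missing_objects(object_names, input_dir="input"):
    """Return the listed names that have no matching file in `input_dir`."""
    return [
        name for name in object_names
        if not (Path(input_dir) / name).is_file()
    ]
```

For example, `missing_objects(read_object_names('../object_list.csv'))` would return an empty list when the archive and the list agree.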

In [6]:
def load_npz_from_minio(object_name):
    """Load an npz object from the local `input/` folder (unzipped above)"""
    return np.load("input/" + object_name)


def inspect_np(np_file):
    """Display key, shape and dtype of the arrays in a npz file"""
    keys = np_file.keys()
    print("Keys in file: ", ", ".join(keys))
    for key in keys:
        array = np_file[key]
        print(
            f" - {key:<15s}"
            f"   shape: {str(array.shape):<20s}"
            f"   dtype: {array.dtype}"
        )

npz_file = load_npz_from_minio(object_names[-1])
print(npz_file)
inspect_np(npz_file)

NpzFile 'input/data-chunk-2023-10-18T20:44:35.884254.npz' with keys: image, tstamp
Keys in file:  image, tstamp
 - image             shape: (10, 128, 128)         dtype: uint8
 - tstamp            shape: (10,)                  dtype: datetime64[us]
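
The same layout can also be read from bytes held in memory, which is what `BytesIO` is imported for. The sketch below assumes the bytes were fetched from an object store by some client that is not shown here; `load_npz_from_bytes` is a hypothetical helper, and the chunk is a synthetic stand-in filled with zeros that only mimics the keys, shapes and dtypes printed above.

```python
from io import BytesIO

import numpy as np


def load_npz_from_bytes(payload: bytes):
    """Load an npz archive from raw bytes held in memory."""
    return np.load(BytesIO(payload))


# Build a synthetic chunk in memory (zeros only, not real data)
buffer = BytesIO()
np.savez(
    buffer,
    image=np.zeros((10, 128, 128), dtype=np.uint8),
    tstamp=np.zeros(10, dtype="datetime64[us]"),
)

chunk = load_npz_from_bytes(buffer.getvalue())
print(chunk["image"].shape, chunk["image"].dtype, chunk["tstamp"].dtype)
```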
