
Machine learning lifecycle management using MLflow

Machine-learning tasks are repetitive in nature, and managing the model lifecycle plays a big role in keeping them organized. While working on a machine-learning project, we often need to change datasets, algorithms, and hyperparameters to achieve maximum accuracy. In this process, we need to keep a record of all the algorithms, trained models, and their metrics. Tracking all the changes in a project over time can be cumbersome. This is where MLflow comes in handy. This article discusses managing the entire lifecycle of a machine-learning project using MLflow.

What is machine learning lifecycle management?

There are several steps in a machine learning project, such as data cleaning, model training, and model deployment. The machine-learning lifecycle includes all the steps used to develop, test, and deploy machine-learning models.

In a machine-learning project, we perform a subset of the following tasks.

  1. Data collection: The first step in any machine learning or data science project is to collect and preprocess the data. In data collection, we identify the sources of data, collect it, clean it, and prepare it for analysis.
  2. Data preparation: Data preparation helps us convert the preprocessed data into a format that we can use to train a machine-learning model. It involves data preprocessing techniques, such as data normalization, feature selection, feature extraction, and data transformation.
  3. Model selection: In model selection, we select the appropriate machine-learning model for our use case. The choice of model depends on the nature of the problem, the size of the data, and the type of data.
  4. Model training: After selecting a machine-learning model, we train it using the prepared data. While training the model, we use different samples of training data, as well as hyperparameters to optimize the model's parameters and achieve the desired level of accuracy.
  5. Model testing: Once the model is trained, we test it on a separate test dataset to evaluate its performance. The test dataset is used to measure the model's accuracy and generalization performance. Each model trained with different data samples and hyperparameters is tested for accuracy and generalization.
  6. Model evaluation: After testing the trained models, we evaluate their accuracy. The evaluation process helps identify shortcomings in the model and assess its overall performance.
  7. Model deployment: Once the model is trained and evaluated, we can deploy it in a production environment. This step involves integrating the model into the production system and testing its performance under real-world conditions.
  8. Monitoring and maintenance: Once the model is deployed, it needs to be monitored and maintained to ensure that it continues to perform accurately. This involves monitoring the model's performance, retraining it periodically, and updating it as needed.
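
The core of steps 2 through 6 can be sketched with scikit-learn, which we will also use later in this tutorial. The dataset and model choice here are illustrative assumptions, not part of the MLflow workflow itself:

```python
# A minimal sketch of data preparation, model selection, training,
# testing, and evaluation (steps 2-6 above) using scikit-learn.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Data collection and preparation: load a sample dataset, split it,
# and normalize the features.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Model selection and training: fit a K-nearest neighbors classifier.
model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)

# Model testing and evaluation: measure accuracy on the held-out test set.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Test accuracy: {accuracy:.2f}")
```

In a real project, each change to the scaler, the model, or its hyperparameters produces a new candidate model, and MLflow is what keeps track of all of them.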

What is MLOps in machine learning?

MLOps is the practice of applying development operations (DevOps) principles to machine-learning lifecycle management. MLOps is a framework for managing the entire machine-learning lifecycle from development to deployment and maintenance. It involves the integration of various tools and practices to streamline the machine-learning workflow and enable the automation of key processes in the entire project lifecycle.

MLOps covers a wide range of activities, including data management, model development, testing, deployment, monitoring, and maintenance.

Machine learning engineering with MLflow using the sklearn module

We can use MLflow to work with different software modules. Machine learning engineering with MLflow enables us to efficiently track experiments, manage models, and deploy them to production. In this tutorial, we will demonstrate the functions of MLflow using the sklearn module in Python.

Install and set up MLflow

To run the code in this tutorial, you will need the scikit-learn and MLflow packages in Python. You can install both using the following command.

pip3 install mlflow scikit-learn

You will also need a database management system supported by MLflow's SQLAlchemy-based backend store, such as MySQL, PostgreSQL, or SQLite. I will use MySQL for this tutorial. As a prerequisite, create a separate database in your DBMS to store data related to this tutorial. I have created a database named mlflowdb as shown below.

Create a database in MySQL

After creating the database, we need to start MLflow Server. For this, we need to specify the folder where the artifacts from MLflow runs will be stored. We will use the following directory to hold the artifacts folder.

Folder before starting the server

Start MLflow Server using a command prompt

Before we start working on the machine-learning project, we need to start MLflow Server using the following command.

mlflow server --backend-store-uri location-of-database --default-artifact-root location-of-directory-for-storing-artifacts
  • You need to pass the address of the database in place of the location-of-database variable with a specified username and password. I have used "mysql+pymysql://Aditya:Mysql1234#@localhost/mlflowdb". Note that the mysql+pymysql URI scheme requires the pymysql driver, which you can install with pip3 install pymysql.

    • "Aditya" is my username for logging in to the MySQL database.

    • "Mysql1234#" is the password for logging into MySQL.

    • I have the MySQL database installed on my computer. That's why the address is specified as "localhost".

    • "mlflowdb" is the name of the database.

  • In place of location-of-directory-for-storing-artifacts, you need to specify the location where artifacts need to be saved. I will save them to the "/home/aditya1117/HoneyBadger/honeybadger-codes/mlruns" directory.

The entire command looks as follows.

mlflow server --backend-store-uri mysql+pymysql://Aditya:Mysql1234#@localhost/mlflowdb --default-artifact-root /home/aditya1117/HoneyBadger/honeybadger-codes/mlruns

After executing the above command, MLflow Server will be started on your system at port 5000. You can observe this in the following image.

Start MLflow Server

After starting the server, a folder named mlruns will be created in the "/home/aditya1117/HoneyBadger/honeybadger-codes/" directory, which previously contained only two folders.

Folder after starting server

The MLflow server runs on port 5000. Hence, you can open localhost:5000 or 127.0.0.1:5000 in your browser. You will see the following output on the screen:

GUI after starting server

Since we haven't executed any code, the MLflow server shows no data. Once we start experiments, we can observe the output in the GUI.

Track machine-learning models using MLflow Tracking

To track machine-learning models using MLflow Tracking, we will first create an MLflow experiment. Then, we will run the experiment and log all the metrics, data, and parameters to the MLflow server. Finally, we will go to the MLflow server to track different models. Next, we’ll discuss each step.

Create an MLflow experiment

To create an experiment in MLflow, we will first import the necessary modules into our program. As we will be using the K-nearest neighbors (KNN) classification algorithm to demonstrate the functions in MLflow, let us import the required modules using import statements.

import mlflow
import mlflow.sklearn
from sklearn.neighbors import KNeighborsClassifier

After importing the modules, we will create an MLflow experiment. For this, we first need to specify the address of the tracking server for the experiment.

tracking_server_uri = "http://localhost:5000"
mlflow.set_tracking_uri(tracking_server_uri)

After specifying the tracking server address, we create the experiment using create_experiment(), which takes the experiment name and returns an experiment ID if no experiment with that name exists (otherwise it raises an error).

experiment_name = 'KNNUsingMLFlow'
try:
    exp_id = mlflow.create_experiment(name=experiment_name)
except Exception:
    # The experiment already exists; reuse its ID.
    exp_id = mlflow.get_experiment_by_name(experiment_name).experiment_id

Once we create an experiment, it will appear on the MLflow Tracking server. You can go to the browser and refresh the tracking server URL. The screen gets updated as shown below.

GUI After Starting Experiment

In the above image, you can observe that an experiment with the name "KNNUsingMLFlow" has been created. You can select the experiment to see its details as shown below.

Experiment Selected in GUI

As we haven't started the experiment, the above screen shows no data.

Now, let us start the experiment and record different parameters and metrics on the MLflow server.

Start the MLflow experiment

To start a run of the experiment, we use the start_run() function. This function takes the experiment ID through its experiment_id parameter and starts a run within that experiment. Inside the run, we can record parameters and metrics on the MLflow server.

You can observe all the steps in the following code:

with mlflow.start_run(experiment_id=exp_id):
    #Create a list of data points
    data_points=[(2,10),(2, 6),(11,11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9),(10, 12),(7, 5),(9, 11),(4, 6), (3, 10), (3, 8),(6, 11)]
    #Create a list of class labels
    class_labels=["C2","C1","C3", "C2","C1","C1","C2","C2","C3","C1","C3","C1","C2","C2","C2"]
    #Create an untrained model
    n_neighbors=4
    untrained_model=KNeighborsClassifier(n_neighbors=n_neighbors, metric="euclidean")
    #Train the model using the fit method
    trained_model=untrained_model.fit(data_points,class_labels)
    #Log parameters, metrics, and models to the MLflow server
    mlflow.log_param('n_neighbors', n_neighbors)
    mlflow.log_param('data_points', data_points)
    mlflow.log_param('class_labels', class_labels)
    mlflow.log_metric('number_of_classes', 3)
    mlflow.sklearn.log_model(untrained_model, "untrained_model")
    mlflow.sklearn.log_model(trained_model, "trained_model")
    #The with block ends the run automatically; no explicit end_run() is needed

This code uses KNN classification, which predicts a label for a point by finding its K nearest neighbors and selecting the majority class among them. We create a KNeighborsClassifier() with the n_neighbors and metric parameters, train it on 15 sample data points using fit(), and log the parameters, metric, and models to the MLflow server. Alternatively, you can use mlflow.sklearn.autolog() to automatically log all parameters, models, and metrics.

with mlflow.start_run(experiment_id=exp_id):
    mlflow.sklearn.autolog()
    data_points=[(2,10),(2, 6),(11,11), (6, 9), (6, 5), (1, 2), (5, 10), (4, 9),(10, 12),(7, 5),(9, 11),(4, 6), (3, 10), (3, 8),(6, 11)]
    #Create a list of class labels
    class_labels=["C2","C1","C3", "C2","C1","C1","C2","C2","C3","C1","C3","C1","C2","C2","C2"]
    #Create an untrained model
    untrained_model=KNeighborsClassifier(n_neighbors=3, metric="euclidean")
    #Train the model; autolog records the parameters and model automatically
    trained_model=untrained_model.fit(data_points,class_labels)

The above code will also record all the metrics and parameters in the MLflow server.

After executing an experiment, if you go to the browser and refresh the URL of the MLflow server, you will observe that the experiment is recorded in the MLflow server with all the models, metrics, and parameters.

Machine learning lifecycle management GUI after running an experiment

The binary files of the machine-learning models are saved in the directory that we specified while starting the MLflow server. As we have specified the mlruns folder to store artifacts, you will observe that a new folder is created in the mlruns directory, as shown below.

Inside mlruns after an experiment

Here, 1 is the experiment ID. For each experiment, a separate folder will be created with the experiment ID as its name.

Inside the experiment ID folder, you will find another folder with a long alphanumeric name. There can be multiple folders in the experiment ID folder for a given experiment. Each time we run an experiment, a separate folder is created for each run, with the run ID as the name of the folder.

Inside expid after an experiment

Inside the run ID folder, you will see a folder named artifacts as shown below. This folder contains all the models saved during the experiment.

Inside run after experiment

Inside the artifacts folder, you will see a separate folder for each saved model in a single run for the experiment. As we have saved both the trained and untrained models, you will see two folders, as shown below.

Inside artifacts folder after an experiment

Inside the directory of a model, you will get configuration files, a binary file of the model, a YAML file describing the environment, and the requirements.txt file describing the dependencies. You can observe them in the following image.

Inside model after an experiment

The requirements.txt file looks as follows.

Requirements.txt file

The file describing the execution environment looks as follows.

python_env.yaml file

The conda.yaml file looks as follows.

conda.yaml file

You can observe that only the machine-learning models are saved in the file system, so where are the metrics and parameters stored?

They are stored in the database connected to the MLflow server.

As you can see in the following image, the database connected to MLflow doesn't contain any tables before the experiment.

Database before an experiment

Once we execute the experiment, different tables are created for storing metrics, parameters, the location of models, information regarding registered models, etc., as shown below.

Database after an experiment

You can track and visualize all the metrics and parameters in the GUI of the MLflow server. For this, go to any experiment and click on any run of the experiment. You will see the following screen with all the parameters, metrics, and models from the particular run of the experiment.

Tracking in GUI after an experiment

If you have run the same experiment at different times with different or the same parameters, each run is recorded in the MLflow server. You can observe this in the following image.

Compare models

To compare metrics and parameters in different runs, you can select two runs as shown in the above image. Then, click on the compare button. You will see the following output on the screen.

Compare models

In the above image, you can observe that the n_neighbors parameter is plotted against the metric number_of_classes for each run. You can also observe the scatter plot by clicking on the Scatter Plot button above the visualization, as shown below.

Compare models

You can also compare the metrics and parameters for each run side-by-side using the Box Plot button above the visualization, as shown below.

Compare models

At this point, we have discussed all the steps to track machine-learning models using the MLflow module.

Now, let us discuss how to work with trained machine-learning models using the MLflow Models component.

Working with trained machine-learning models using MLflow models

You can work with trained models using the MLflow Models component. Here, we can load an existing model from the file system and use it to make predictions. We can also deploy the model using the MLflow server.

Load trained models into a Python program using MLflow

To load a trained machine-learning model from the file system into our program, we can use the load_model() function. This function takes the URI of the model and loads it into the program. We can then use this model to predict class labels for new data points using the predict() method. The predict() method, when invoked on the trained machine-learning model, takes a list of data points to classify. After execution, it returns an array containing a class label for each data point. You can observe this in the following example.

logged_model = 'runs:/c23f123eb48f4e45b4f08b506d7a2b80/trained_model'
loaded_model = mlflow.pyfunc.load_model(logged_model)
loaded_model.predict([(2,10)])

Output:

array(['C2'], dtype='<U2')

The model assigns class label C2 to the data point (2, 10) by finding its three nearest neighbors ((2, 10), (3, 10), and (4, 9)), which all have the C2 label, making C2 the majority class. You can predict multiple data points by passing them all to the predict() method in a single list. To load a registered model, simply use its name and stage instead of the full runs:/ URI, as shown below.

#load registered model
registered_model="models:/TrainedKNNModel/production"
loaded_model = mlflow.pyfunc.load_model(registered_model)
loaded_model.predict([(2,10)])

Output:

array(['C2'], dtype='<U2')

Deploy models in MLflow Server

To deploy a model in MLflow Server, we use the following syntax.

mlflow models serve -m location-of-the-model -p port_number --env-manager=local

Here is an explanation of the above code:

  • location-of-the-model is the location of the directory containing the model. I will set it to /home/aditya1117/HoneyBadger/honeybadger-codes/mlruns/1/8e3ec00972d94c8d851c043379a337ed/artifacts/trained_model.
  • port_number is the port at which we want to serve the model. I will set it to 1234.
  • I have set the --env-manager option to local because the model will be deployed in the local environment.

The complete command looks as follows.

mlflow models serve -m /home/aditya1117/HoneyBadger/honeybadger-codes/mlruns/1/8e3ec00972d94c8d851c043379a337ed/artifacts/trained_model -p 1234 --env-manager=local

After executing the above command, the model will be deployed at localhost:1234 as shown below.

Deploy model

Now we can send requests to localhost:1234/invocations or 127.0.0.1:1234/invocations with the input data, and the model will return the output. For instance, let us use the curl command to ask the model to classify the point (2, 10). Recent MLflow versions (2.x and later) expect the inputs wrapped in a JSON object, as follows.

curl -d '{"inputs": [[2, 10]]}' -H 'Content-Type: application/json' 127.0.0.1:1234/invocations
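
The same request can be made from Python. A minimal sketch using only the standard library; the port number matches the deployment above, and the wrapped {"inputs": ...} payload is the format recent MLflow versions expect:

```python
import json
import urllib.request

# JSON payload in the format the MLflow 2.x scoring server expects.
payload = json.dumps({"inputs": [[2, 10]]}).encode()

def score(url="http://127.0.0.1:1234/invocations"):
    """POST the payload to the running scoring server and return the parsed response."""
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# With the model served as above, score() returns the predicted labels,
# e.g. a response containing "C2" for the point (2, 10).
```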

We will see the output in the terminal as follows.

Curl command

You can observe that the model has returned the class label C2 after classifying the point (2, 10). Hence, the model is deployed correctly at the specified port number.

Where do you go from here?

In this article, we explored machine learning lifecycle management using MLflow. We created a model, trained it over multiple runs, compared the results, and deployed a model.

I suggest that you execute the code in this article on your own system and experiment with the code and MLflow UI to better understand the concepts. This will help you easily manage your machine-learning projects.

I hope you enjoyed reading this article. Stay tuned for more informative articles.

Happy learning!


Written by

Aditya Raj

A machine learning engineer with a knack for writing.