Introduction to MLOps and GitHub Actions

MLOps and GitHub Actions series #1 | Learn the ins and outs of MLOps

#mlops #AI #GitHub #Devops #Machine Learning

Jul 2, 2024 — 4 min read

How many of you have seen the growth in the use of AI over the last few years? Probably a lot.

In recent years, we have observed tremendous growth in AI usage. Companies across various industries have been leveraging the power of AI to provide innovative solutions. As models get larger and more complex, efficient deployment and monitoring have become crucial to the success of AI usage.

In this series of articles, I will introduce you to the concept of MLOps (Machine Learning Operations) and its practical implementation by addressing these questions:

What is MLOps?
Why do we need MLOps?
What are the best practices of MLOps to build and manage scalable models for enterprises?
How can we build an MLOps pipeline using GitHub Actions?
What are the key components of a solid MLOps pipeline?

Prerequisites

This series aims to be very beginner-friendly and low-code, for those who just want to learn MLOps principles and an easy application with GitHub Actions. Before we get started, the following are recommended so you can follow along:

A GitHub account (if you don't have one, you can sign up on GitHub)
A code editor such as Visual Studio Code

What is MLOps?

MLOps, short for Machine Learning Operations, is a set of practices and principles used to streamline the ML process, from data cleaning to model deployment and management.

If you are familiar with the term DevOps, MLOps is essentially DevOps for - you guessed it - machine learning.

Why do we need MLOps?

MLOps is used to bridge the gaps between data science, machine learning and operations teams so they can efficiently build powerful and scalable models.

Practicing and implementing MLOps in an enterprise brings the following advantages:

Efficient collaboration across different functions
Automation of many processes that would otherwise be high-friction
Continuously deploying up-to-date ML models
Continuously monitoring model performance

GitHub Actions: Overview

At a glance, GitHub Actions is a workflow automation tool built into GitHub. It is free for public repositories, and private repositories get a monthly allowance of free minutes.

GitHub Actions allows developers to automate tasks and build CI/CD pipelines directly in their GitHub repositories. This makes it very convenient for building an MLOps pipeline, which this series will cover in a step-by-step tutorial in later articles.

For more details, feel free to refer to my GitHub Actions series here:

https://lo-victoria.com/series/github-actions

Understanding the ML Lifecycle

To get started with building an MLOps pipeline with GitHub Actions, let us first understand the stages of a typical ML model life cycle.

This is essential for understanding which stages we can automate using GitHub Actions.

Steps of the Machine Learning Life Cycle

1. Data Collection/Preparation

First, gather relevant data from various sources to prepare it for analysis and processing.

2. Data Cleaning/Processing

In this stage, data is cleaned and formatted so that it is suitable for training. Feature engineering also happens here, to define the specific features the model will train on.
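To make this stage a little more concrete, here is a minimal sketch in plain Python. The records, the field names (`age`, `income`) and the scaled features are all made up for illustration. A real project would more likely use a library such as pandas, but the idea is the same: drop incomplete rows, then turn the raw values into features a model can train on.

```python
# Hypothetical raw records; None marks a missing value.
raw_rows = [
    {"age": 25, "income": 40000},
    {"age": None, "income": 52000},  # incomplete row
    {"age": 40, "income": 80000},
    {"age": 55, "income": 60000},
]


def drop_incomplete(rows):
    """Cleaning: keep only rows where every field has a value."""
    return [row for row in rows if all(value is not None for value in row.values())]


def min_max_scale(values):
    """Processing: rescale numbers to the 0-1 range so features are comparable."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


def add_scaled_features(rows):
    """Feature engineering: add scaled copies of age and income to each row."""
    ages = min_max_scale([row["age"] for row in rows])
    incomes = min_max_scale([row["income"] for row in rows])
    return [
        {**row, "age_scaled": age, "income_scaled": income}
        for row, age, income in zip(rows, ages, incomes)
    ]


clean_rows = drop_incomplete(raw_rows)
features = add_scaled_features(clean_rows)
print(features)
```

Min-max scaling is only one possible choice here; which transformations make sense depends on the data and the model.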

3. Model Training/Evaluation

Once the data is prepared, ML models are trained to find patterns and make predictions based on the data. The data is usually split into training, validation and test sets.

After training, the models are evaluated on the test data using metrics such as accuracy, precision and F1 score.
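Here is a rough sketch of the split and the metrics mentioned above, written in plain Python so it runs without extra installs. The 70/15/15 ratio, the function names and the sample labels are my own assumptions for illustration; in practice you would typically reach for a library such as scikit-learn.

```python
import random


def split_data(rows, train=0.7, validation=0.15, seed=42):
    """Shuffle rows and split them into training, validation and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    train_end = round(len(shuffled) * train)
    validation_end = train_end + round(len(shuffled) * validation)
    return (
        shuffled[:train_end],
        shuffled[train_end:validation_end],
        shuffled[validation_end:],  # whatever is left becomes the test set
    )


def evaluate(y_true, y_pred):
    """Compute accuracy, precision and F1 score for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
    false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)
    false_neg = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "f1": f1}


train_set, validation_set, test_set = split_data(range(100))
print(len(train_set), len(validation_set), len(test_set))

# Made-up labels and predictions, just to exercise the metrics.
print(evaluate(y_true=[1, 0, 1, 1, 0, 1], y_pred=[1, 0, 0, 1, 1, 1]))
```

The validation set is what you would use to compare models or tune settings during training, while the test set stays untouched until the final evaluation.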

4. Model Deployment

Once the evaluation results are satisfactory, the model can be exported and deployed. In other words, it can now be used in production to make predictions on real data.
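"Exporting" usually means saving the trained model to a file that a production service can load later. The sketch below shows that round trip with Python's standard library only. `ThresholdModel` is a made-up stand-in for a trained model, and saving its single parameter as JSON is a simplification I chose for the example; real models are normally saved in whatever format their ML framework provides.

```python
import json
import tempfile
from pathlib import Path


class ThresholdModel:
    """Hypothetical stand-in for a trained model: predicts 1 above a cutoff."""

    def __init__(self, cutoff):
        self.cutoff = cutoff

    def predict(self, values):
        return [1 if value > self.cutoff else 0 for value in values]


def export_model(model, path):
    """Export: write the model's learned parameters to a JSON file."""
    Path(path).write_text(json.dumps({"cutoff": model.cutoff}))


def load_model(path):
    """Deployment side: rebuild the model from the exported file."""
    params = json.loads(Path(path).read_text())
    return ThresholdModel(cutoff=params["cutoff"])


# A temporary folder stands in for wherever the model file would really live.
with tempfile.TemporaryDirectory() as folder:
    model_path = Path(folder) / "model.json"
    export_model(ThresholdModel(cutoff=0.5), model_path)
    production_model = load_model(model_path)

print(production_model.predict([0.2, 0.9]))
```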

5. Monitoring/Maintenance

When a model is in production use, it is crucial to continuously monitor its performance. Checking that its predictions remain accurate and that latency stays low are just a few of the many quality checks involved in good maintenance.

If the model's quality drops, we need to retrain, update and deploy a new model. The cycle then repeats, starting from gathering the most up-to-date and relevant data.
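As a simple illustration of what such checks could look like, here is a minimal sketch in plain Python. The function names, the 0.05 accuracy drop and the 200 ms latency limit are assumptions I picked for the example, not recommended values; suitable thresholds depend entirely on your use case.

```python
from statistics import mean


def needs_retraining(recent_accuracy, baseline_accuracy, max_drop=0.05):
    """Return True when average recent accuracy is more than max_drop below the baseline."""
    return baseline_accuracy - mean(recent_accuracy) > max_drop


def latency_ok(latencies_ms, limit_ms=200):
    """Return True when the slowest recent prediction stayed within limit_ms."""
    return max(latencies_ms) <= limit_ms


# Made-up monitoring data: accuracy measured at deployment vs. in recent weeks.
baseline = 0.92
healthy_weeks = [0.92, 0.91, 0.93]
degraded_weeks = [0.91, 0.88, 0.84, 0.82]

print(needs_retraining(healthy_weeks, baseline))
print(needs_retraining(degraded_weeks, baseline))
print(latency_ok([120, 95, 180]))
```

A check like this could run on a schedule, and a `True` result from `needs_retraining` would be the signal to start the cycle again from data collection.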

Conclusion

That's it for article #1 in this series! I'm sure this is already a lot to take in, so take your time digesting these concepts.

Do check out the section below for more reading as well. In the next article, we will look at how to set up a GitHub repository for MLOps, along with some best practices for project structure and common MLOps principles.

Stay tuned. Cheers!