Advanced MLOps Techniques with GitHub Actions

MLOps and GitHub Actions series #4 | Learn the ins and outs of MLOps

Welcome back to another article in the Introduction to MLOps with GitHub Actions series! This series aims to be beginner-friendly and low-code, for those who just want to learn MLOps principles and apply them simply with GitHub Actions.

Please make sure you have read the previous articles in this series first.

In our previous articles, we covered the basics of:

  1. MLOps principles: when do we need them and why?

  2. Setting up your GitHub repository for MLOps

  3. Implementing CI/CD pipelines with GitHub Actions to automate tasks such as testing and model training.

In this article, we'll explore advanced MLOps techniques and how to leverage GitHub Actions to implement them effectively.

Conditionally Deploy the Model

It is common practice to set conditions that determine whether a model is ready for deployment. After all, we do not want to automatically deploy an ML model with low accuracy and precision! GitHub Actions provides an if attribute we can add to a step or job so that it only runs when certain conditions are met.

To keep this tutorial simple, let's assume that after training the model, its accuracy is saved in the repository as accuracy.txt. Our condition to deploy the model is that its accuracy is above 90% (i.e., above 0.9). Here is how we can write our deploy-model job.

Example of Conditionally Deploying the Model

jobs:
  deploy-model:
    runs-on: ubuntu-latest
    needs: test-model

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'  # quoted so YAML keeps it a string (3.10 would otherwise parse as 3.1)

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Read accuracy
        id: read-accuracy
        # the older ::set-output syntax is deprecated; write to GITHUB_OUTPUT instead
        run: echo "accuracy=$(cat accuracy.txt)" >> "$GITHUB_OUTPUT"

      - name: Deploy model # only deploy if accuracy is above 90%
        if: steps.read-accuracy.outputs.accuracy > 0.9
        run: python deploy.py
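
The workflow above assumes an earlier step (for example, in the test-model job) has already evaluated the model and written its accuracy to accuracy.txt. That script is not part of this article, but a minimal sketch might look like the following; the artifact names and the use of scikit-learn are assumptions for illustration.

# evaluate.py -- hypothetical sketch: evaluate the trained model and
# write its accuracy to accuracy.txt for the workflow to read
import pickle

from sklearn.metrics import accuracy_score

# assumed artifact names; adjust to match your repository layout
with open("model.pkl", "rb") as f:
    model = pickle.load(f)
with open("test_data.pkl", "rb") as f:
    X_test, y_test = pickle.load(f)

accuracy = accuracy_score(y_test, model.predict(X_test))

# write a bare number (e.g. 0.95) so the workflow's cat + comparison works
with open("accuracy.txt", "w") as f:
    f.write(f"{accuracy:.4f}")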

Periodic Model Retraining and Updating

Models deployed in production often need to be retrained and updated periodically to maintain performance and adapt to changing data distributions.

We can use GitHub Actions to automate the process of monitoring model performance and retraining models. For example, we can use the schedule event with a cron expression under the on attribute to trigger a workflow periodically.

Let's say we want to run the workflow twice a month. The cron expression would be written as: 0 0 1,15 * *

Here's a breakdown of how the cron expression works:

  • 0 (minute): At minute 0

  • 0 (hour): At hour 0 (midnight)

  • 1,15 (day of month): On the 1st and 15th day of the month

  • * (month): Every month

  • * (day of week): Every day of the week

So, this schedule triggers the workflow at midnight UTC on the 1st and 15th of each month.

Example of Periodic Model Retraining

name: Model Retraining

on:
  schedule: # At midnight on the 1st and 15th of each month
    - cron: '0 0 1,15 * *'  

jobs:
  retraining:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: 3.x

      - name: Install Dependencies
        run: pip install -r requirements.txt

      - name: Retrain Model
        run: python src/retrain.py
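
The retrain.py script itself is beyond the scope of this article, but a minimal hypothetical sketch could look like this; the CSV path, label column, and model choice are illustrative assumptions.

# src/retrain.py -- hypothetical sketch: reload the latest data,
# retrain the model, and save the updated artifact
import pickle

import pandas as pd
from sklearn.linear_model import LogisticRegression

# assumed data layout: a "label" column plus feature columns
data = pd.read_csv("data/latest.csv")
X, y = data.drop(columns=["label"]), data["label"]

model = LogisticRegression(max_iter=1000)
model.fit(X, y)

with open("model.pkl", "wb") as f:
    pickle.dump(model, f)

print(f"Model retrained on {len(data)} rows")

In a real pipeline, you would also persist the new model somewhere the deployment job can reach it, for example by uploading it with actions/upload-artifact or pushing it to a model registry.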

A/B Testing and Model Experimentation

A/B testing is a common technique used to evaluate the performance of different versions of a model in production. We can use GitHub Actions to automate the process of deploying and monitoring multiple model versions simultaneously, enabling A/B testing and experimentation to optimize model performance and user experience.

We can do this by creating two jobs that run in parallel (jobs in a workflow run in parallel by default unless one declares needs on another). For example, we create a job called deploy_model_a to represent model A's deployment and deploy_model_b for model B. Below is how we can implement it.

Example for A/B Testing a Model

name: A/B Testing

on:
  push: # triggers whenever there's a push to the main branch
    branches:
      - main

jobs:
  deploy_model_a: # represents deploying model A
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: 3.x

      - name: Install Dependencies
        run: pip install -r requirements.txt

      - name: Deploy Model A
        run: python src/deploy_model_a.py

  deploy_model_b: # represents deploying model B
    runs-on: ubuntu-latest

    steps:
      - name: Checkout Repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: 3.x

      - name: Install Dependencies
        run: pip install -r requirements.txt

      - name: Deploy Model B
        run: python src/deploy_model_b.py
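
The two deploy scripts stand in for however you actually ship each variant. A common companion piece is a deterministic traffic splitter in the serving layer, so each user is routed to the same model on every request. Here is a minimal hypothetical sketch; the hashing scheme and the 50/50 split are assumptions.

# ab_router.py -- hypothetical sketch: deterministically assign each
# user to model A or model B so their experience stays consistent
import hashlib

def assign_variant(user_id: str, split: float = 0.5) -> str:
    """Return 'A' or 'B' based on a stable hash of the user id."""
    digest = hashlib.sha256(user_id.encode()).hexdigest()
    # map the hash to a number in [0, 1] and compare to the split point
    bucket = int(digest[:8], 16) / 0xFFFFFFFF
    return "A" if bucket < split else "B"

if __name__ == "__main__":
    for uid in ["alice", "bob", "carol"]:
        print(uid, "-> model", assign_variant(uid))

Because the assignment depends only on the user id, the split stays stable across sessions, which keeps the A/B comparison clean.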

Conclusion

By leveraging GitHub Actions, data scientists and machine learning engineers can implement advanced MLOps techniques such as conditional deployment, model retraining, and A/B testing seamlessly within their GitHub repositories.

Automation and integration are key to achieving efficiency, scalability, and reliability in machine learning workflows. I hope you have found this article helpful in exploring some MLOps techniques via GitHub Actions!

In the next article, we'll delve into best practices and optimization tips for MLOps with GitHub Actions. Stay tuned for more insights and practical examples as we continue our journey into the world of MLOps with GitHub Actions. Cheers!

