Best Practices and Optimization Tips for MLOps with GitHub Actions

MLOps and GitHub Actions series #5 | Learn the ins and outs of MLOps


Jul 30, 2024

Welcome back to another article in the Introduction to MLOps with GitHub Actions series! This series aims to be beginner-friendly and low-code, for those who just want to learn MLOps principles and apply them simply with GitHub Actions.

If you have missed the previous parts, please read them here.

In our journey of implementing MLOps with GitHub Actions, we've covered various aspects of MLOps:

MLOps principles: when do we need MLOps and why?
Setting up your GitHub repository for MLOps
Implementing CI/CD pipelines with GitHub Actions to automate tasks such as testing and model training
Implementing advanced techniques like conditional deployment, periodic retraining, and A/B testing

Now, let's delve into some best practices and optimization tips to enhance the efficiency, scalability, and reliability of your MLOps workflows.

1. Modularize Workflows

Break down your workflows into smaller, modular components that focus on specific tasks or stages of the machine learning lifecycle, such as training the model, deploying the model, etc.

This allows for easier maintenance, reusability, and scalability of your workflows. For example, our .github/workflows folder could be organized into multiple smaller workflows.

.github/
├── workflows/
│   ├── main.yml
│   ├── setup-environment.yml
│   ├── train-model.yml
│   ├── test-model.yml
│   └── deploy-model.yml

Each of these files can be a reusable workflow (one that declares on: workflow_call) that encapsulates a common task and promotes consistency across projects. Then, you can have main.yml call these modular workflows in order.

name: Example MLOps Pipeline with modular workflows

on:
  push:
    branches:
      - main
  schedule:
    - cron: '0 0 * * 1'  # Runs every Monday at 00:00 UTC (scheduled workflows always use UTC)

jobs:
  setup-environment:
    uses: ./.github/workflows/setup-environment.yml

  train-model:
    needs: setup-environment
    uses: ./.github/workflows/train-model.yml

  test-model:
    needs: train-model
    uses: ./.github/workflows/test-model.yml

  deploy-model:
    needs: test-model
    uses: ./.github/workflows/deploy-model.yml
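The needs keys above turn the pipeline into a dependency graph: by default a job starts only after every job it needs has succeeded, and jobs with no dependency between them can run in parallel. As an illustration only (this is not GitHub's scheduler), the Python sketch below uses the standard library's graphlib module (Python 3.9+) to group jobs into stages that could run concurrently. The PIPELINE mapping is copied by hand from the YAML, and the function name execution_stages is hypothetical.

```python
import graphlib

# Each job maps to the set of jobs named in its `needs:` key (copied from main.yml above).
PIPELINE = {
    "setup-environment": set(),
    "train-model": {"setup-environment"},
    "test-model": {"train-model"},
    "deploy-model": {"test-model"},
}


def execution_stages(needs):
    """Group jobs into stages; jobs in the same stage don't depend on each other."""
    sorter = graphlib.TopologicalSorter(needs)
    sorter.prepare()  # Raises graphlib.CycleError if the `needs` graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # Every job whose dependencies are all done
        stages.append(ready)
        sorter.done(*ready)
    return stages


if __name__ == "__main__":
    for number, stage in enumerate(execution_stages(PIPELINE), start=1):
        print(f"stage {number}: {', '.join(stage)}")
```

For this linear pipeline every stage holds a single job. If you split training into two independent jobs that both need setup-environment, they land in the same stage, which is where GitHub Actions can run them concurrently.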

2. Parallelize and Cache

Wherever possible, parallelize tasks to reduce overall execution time. GitHub Actions runs jobs concurrently by default unless a job declares a dependency on another job with the needs keyword, so splitting independent work into separate jobs speeds up the CI/CD process. We covered this in our A/B testing implementation in the previous article.

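Additionally, we can use caching to store dependencies and other reusable files between workflow runs. Every job starts on a fresh runner, so packages installed in one job are not available to the next; caching pip's download cache lets later jobs and later runs restore it instead of downloading every package again. For example, our setup-environment.yml can cache dependencies as shown below. Note that the cache step must come before the install step, because the cache is restored at the point where the step runs.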

name: Setup Environment

on: workflow_call

jobs:
  setup:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'

      - name: Cache pip # Restore before installing; on a cache miss, a post-job step saves a new cache
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
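To see why this key only changes when requirements.txt changes, here is a rough Python model of the cache lookup, written as a sketch from the behavior described in the GitHub Actions and actions/cache documentation. It approximates hashFiles() (a SHA-256 per matched file, then a SHA-256 over those hashes), so the hex values will not match the ones GitHub computes. The saved dictionary is a hypothetical stand-in for the repository's existing caches, and all function names are illustrative.

```python
import hashlib
import tempfile
from pathlib import Path


def approx_hash_files(root, pattern="**/requirements.txt"):
    """Approximate hashFiles(): SHA-256 each matched file, then hash the digests.

    Byte-level details differ from GitHub's runner, so only the behavior
    (any edit to a matched file changes the result) carries over.
    """
    files = sorted(Path(root).glob(pattern))
    if not files:
        return ""  # hashFiles() returns an empty string when nothing matches
    combined = hashlib.sha256()
    for path in files:
        combined.update(hashlib.sha256(path.read_bytes()).digest())
    return combined.hexdigest()


def cache_key(runner_os, root):
    # Mirrors: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
    return f"{runner_os}-pip-{approx_hash_files(root)}"


def find_cache(key, restore_keys, saved):
    """Simplified actions/cache lookup; `saved` maps cache keys to creation times.

    An exact match on `key` is a cache hit. Otherwise each restore key is tried
    in order as a prefix, and the newest matching cache is restored (still a
    miss, but pip only downloads what changed).
    """
    if key in saved:
        return key, True
    for prefix in restore_keys:
        matches = [k for k in saved if k.startswith(prefix)]
        if matches:
            return max(matches, key=saved.__getitem__), False
    return None, False


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        requirements = Path(repo) / "requirements.txt"
        requirements.write_text("scikit-learn\n")
        old_key = cache_key("Linux", repo)
        requirements.write_text("scikit-learn\npandas\n")
        new_key = cache_key("Linux", repo)
        # The new key has no exact match, so the restore-keys prefix falls back to old_key
        print(find_cache(new_key, ["Linux-pip-"], {old_key: 1}))
```

This is why the restore-keys line matters: after editing requirements.txt, the first run still restores the previous cache instead of starting from nothing.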

3. Use Matrix Builds

Matrix builds allow you to define multiple configurations for your workflows using a matrix strategy. For example, we can train and test models across different environments, platforms, or parameter settings in parallel, reducing redundancy and improving workflow flexibility.

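To implement this, add a matrix under the job's strategy key. When the job calls a reusable workflow, pass each matrix value to it with with:, otherwise every matrix job runs the called workflow with identical settings.

name: MLOps Pipeline with matrix strategy

on:
  push:
    branches:
      - main

jobs:
  setup-environment:
    uses: ./.github/workflows/setup-environment.yml

  train-model:
    needs: setup-environment
    strategy:
      matrix:
        # Quote versions so YAML keeps them as strings (unquoted 3.10 would become 3.1)
        python-version: ['3.9', '3.10', '3.11']
    uses: ./.github/workflows/train-model.yml
    with:
      # Parallelize training with different Python versions; train-model.yml must
      # declare a matching python-version input under on.workflow_call.inputs
      python-version: ${{ matrix.python-version }}

  test-model:
    needs: train-model
    strategy:
      fail-fast: false  # Let the other OS runs finish even if one fails
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    uses: ./.github/workflows/test-model.yml
    with:
      # Parallelize testing on different OS; test-model.yml must declare an os input
      os: ${{ matrix.os }}

  deploy-model:
    needs: test-model
    uses: ./.github/workflows/deploy-model.yml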
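Under the hood, a matrix is a Cartesian product: GitHub Actions creates one job for every combination of the listed values, drops any combination matched by an exclude entry, and allows at most 256 jobs per matrix per workflow run. The sketch below models that documented behavior in Python for reasoning about job counts. It is not GitHub's implementation, it deliberately leaves out include, and expand_matrix and the combined example matrix are hypothetical.

```python
from itertools import product

MAX_MATRIX_JOBS = 256  # GitHub's documented per-matrix limit per workflow run


def expand_matrix(matrix, exclude=()):
    """Return one dict of values per job the matrix would create.

    `matrix` maps each key to its list of values; an `exclude` entry removes
    every combination that matches all of the key/value pairs it lists
    (so a partial entry can remove several combinations).
    """
    keys = list(matrix)
    jobs = []
    for values in product(*(matrix[key] for key in keys)):
        combo = dict(zip(keys, values))
        if any(all(combo.get(k) == v for k, v in rule.items()) for rule in exclude):
            continue  # Dropped by an exclude rule
        jobs.append(combo)
    if len(jobs) > MAX_MATRIX_JOBS:
        raise ValueError(f"matrix would create {len(jobs)} jobs; the limit is {MAX_MATRIX_JOBS}")
    return jobs


if __name__ == "__main__":
    # Hypothetical matrix combining both axes from the example into a single job
    combined = {
        "os": ["ubuntu-latest", "windows-latest", "macos-latest"],
        "python-version": ["3.9", "3.10", "3.11"],
    }
    jobs = expand_matrix(combined, exclude=[{"os": "macos-latest", "python-version": "3.9"}])
    print(len(jobs))  # 8: 3 x 3 combinations minus 1 excluded
```

Counting jobs this way before pushing helps keep runner usage in check, since each combination is a separate job that consumes its own runner minutes.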

4. Implement Retries and Timeouts

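In GitHub Actions, we can set timeouts to stop hung or long-running work gracefully: timeout-minutes works at both the job level (the default is 360 minutes) and the step level. There is no built-in retry setting, so retries for transient failures, such as a network error during pip install, are usually added with a shell loop, a small wrapper script, or a third-party action.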

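Let's review our test-model.yml and add a job-level timeout plus step-level timeouts. The runner OS is also declared as an input, so a calling workflow can run these tests across operating systems with a matrix. Note that fail-fast is a strategy setting that only affects matrix jobs: to let the remaining runs finish when one fails, set fail-fast: false next to the matrix in the calling workflow rather than in test-model.yml.

name: Test Model

on:
  workflow_call:
    inputs:
      os:
        description: Runner to test on (passed in by the calling workflow's matrix)
        type: string
        default: ubuntu-latest

jobs:
  test:
    runs-on: ${{ inputs.os }}
    timeout-minutes: 30  # Cancel the whole job if it runs longer than 30 minutes
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4

      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'

      - name: Install dependencies
        timeout-minutes: 10  # Step-level timeout for a hung install
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt

      - name: Download accuracy artifact
        uses: actions/download-artifact@v4  # Requires the artifact to be uploaded with upload-artifact@v4
        with:
          name: accuracy

      - name: Test model
        timeout-minutes: 15  # Fail early if testing hangs instead of waiting for the job timeout
        run: python test.py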
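Because GitHub Actions has no built-in retry setting, one option is a small wrapper script that re-runs a flaky command with a per-attempt timeout and exponential backoff. This is a minimal sketch, assuming you save it as a hypothetical retry.py in the repository and call it from a step such as run: python retry.py pip install -r requirements.txt. The function name and default values are illustrative, not an established tool.

```python
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=3, timeout_s=600.0, backoff_s=5.0):
    """Run `cmd`, retrying on a non-zero exit code or a timeout.

    Waits backoff_s, 2*backoff_s, 4*backoff_s, ... between attempts and
    returns the CompletedProcess of the first successful attempt.
    """
    for attempt in range(1, attempts + 1):
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_s)
            if result.returncode == 0:
                return result
            reason = f"exit code {result.returncode}"
            print(result.stderr, file=sys.stderr, end="")  # Keep the failure visible in the run log
        except subprocess.TimeoutExpired:
            reason = f"timed out after {timeout_s}s"  # subprocess.run kills the child first
        print(f"Attempt {attempt}/{attempts} failed: {reason}", file=sys.stderr)
        if attempt < attempts:
            time.sleep(backoff_s * 2 ** (attempt - 1))
    raise RuntimeError(f"{cmd!r} failed after {attempts} attempts")


if __name__ == "__main__":
    # Usage: python retry.py <command> [args...]
    completed = run_with_retries(sys.argv[1:])
    print(completed.stdout, end="")
```

If every attempt fails, the RuntimeError makes the script exit with a non-zero status, so the step (and the job) still fails as it should; retries only absorb transient errors.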

5. Monitor and Analyze Workflow Performance

It is important in MLOps to continuously monitor workflow execution metrics such as run duration, resource utilization, and success/failure rates to identify bottlenecks and areas for optimization.

Fortunately, GitHub Actions provides workflow visualization graphs and run logs out of the box, and you can complement them with external monitoring tools and services to gain deeper insight into workflow performance and behavior.

Here's an example of a visualization graph. GitHub draws it in real time as the workflow runs, showing the progress and status of each job, so we can use it to monitor and debug workflows.

Screenshot of the visualization graph of a workflow run.

Logs are available automatically: open your repository's Actions tab, select the workflow, then select a run to view its log. See the official GitHub Actions documentation to learn more.
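Beyond the built-in views, you can compute simple metrics such as success rate and average run duration yourself. The sketch below assumes GitHub's REST API endpoint GET /repos/{owner}/{repo}/actions/runs, whose workflow_runs entries include status, conclusion, run_started_at, and updated_at, and it treats updated_at minus run_started_at as an approximate duration for completed runs. The helper names and the workflow_stats.py file name are hypothetical, and the token is assumed to have read access to Actions.

```python
import json
import os
import sys
import urllib.request
from datetime import datetime
from statistics import mean


def fetch_runs(owner, repo, token, per_page=50):
    """Fetch recent runs from GET /repos/{owner}/{repo}/actions/runs."""
    url = f"https://api.github.com/repos/{owner}/{repo}/actions/runs?per_page={per_page}"
    request = urllib.request.Request(url, headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {token}",
    })
    with urllib.request.urlopen(request) as response:
        return json.load(response)["workflow_runs"]


def _parse(timestamp):
    # The API returns UTC timestamps such as "2024-07-30T10:00:00Z"
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def summarize(runs):
    """Success rate and approximate mean duration (seconds) of completed runs."""
    completed = [run for run in runs if run.get("status") == "completed"]
    if not completed:
        return {"runs": 0, "success_rate": None, "mean_duration_s": None}
    successes = sum(1 for run in completed if run.get("conclusion") == "success")
    durations = [
        (_parse(run["updated_at"]) - _parse(run["run_started_at"])).total_seconds()
        for run in completed
    ]
    return {
        "runs": len(completed),
        "success_rate": successes / len(completed),
        "mean_duration_s": mean(durations),
    }


if __name__ == "__main__":
    # Usage: GITHUB_TOKEN=<token> python workflow_stats.py <owner> <repo>
    owner, repo = sys.argv[1], sys.argv[2]
    print(summarize(fetch_runs(owner, repo, os.environ["GITHUB_TOKEN"])))
```

Running this on a schedule and tracking the numbers over time makes it easier to spot a workflow that is getting slower or failing more often.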

6. Version Control Everything

Finally, the best and most crucial practice is to version control everything: committing all code, configuration files, and workflow definitions to your GitHub repository.

This ensures reproducibility, traceability, and collaboration across team members. And of course, when it comes to version control, what does GitHub not offer?

It is a rhetorical question. Built on Git, GitHub is one of the most robust and powerful version control platforms there is!
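For traceability it also helps to record which commit produced each trained model. A minimal sketch, assuming your training script can call a hypothetical write_model_metadata() after saving the model: it reads the commit SHA from GITHUB_SHA (which GitHub Actions sets on every run) and falls back to git rev-parse HEAD when run locally. The output file name model_metadata.json and the metadata fields are illustrative choices, not a standard format.

```python
import hashlib
import json
import os
import subprocess
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def current_commit():
    """Return the commit SHA: GITHUB_SHA inside GitHub Actions, else ask git locally."""
    sha = os.environ.get("GITHUB_SHA")
    if sha:
        return sha
    try:
        out = subprocess.run(["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True)
        return out.stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        return "unknown"  # Not a git checkout, or git is not installed


def file_sha256(path):
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def write_model_metadata(model_path, metrics, out_path="model_metadata.json"):
    """Write a small JSON record linking a model file to the commit that produced it."""
    metadata = {
        "model_file": str(model_path),
        "model_sha256": file_sha256(model_path),
        "git_commit": current_commit(),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "metrics": metrics,
    }
    Path(out_path).write_text(json.dumps(metadata, indent=2))
    return metadata


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        model = Path(workdir) / "model.pkl"
        model.write_bytes(b"placeholder model bytes")  # Stand-in for a real saved model
        record = write_model_metadata(model, {"accuracy": 0.91}, Path(workdir) / "model_metadata.json")
        print(json.dumps(record, indent=2))
```

Committing or uploading this small JSON file alongside the model artifact lets you trace any deployed model back to the exact code and workflow definitions that produced it.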

Conclusion

By incorporating these best practices and optimization tips into your MLOps workflows with GitHub Actions, we can enhance productivity, reliability, and scalability while minimizing operational overhead. Continuous improvement and refinement of our workflows are essential to adapt to evolving requirements and maximize the value of MLOps in our organizations.

The next article is the last. We'll dive into learning about the future directions in MLOps. I hope you have found this article helpful. Stay tuned as we conclude this series of MLOps with GitHub Actions together. Cheers!