Best Practices and Optimization Tips for MLOps with GitHub Actions
MLOps and GitHub Actions series #5 | Learn the ins and outs of MLOps
Welcome back to another article in the Introduction to MLOps with GitHub Actions series! This series aims to be very beginner-friendly and low-code, for those who just want to learn MLOps principles and apply them simply with GitHub Actions.
If you have missed the previous parts, please read them here.
In our journey of implementing MLOps with GitHub Actions, we've covered various aspects of MLOps:
MLOps principles: When do we need it and why?
Setting up your GitHub repository for MLOps
Implementing CI/CD pipelines with GitHub Actions to automate tasks such as testing and model training.
Implementing advanced techniques like conditional deployment, periodic retraining, and A/B testing
Now, let's delve into some best practices and optimization tips to enhance the efficiency, scalability, and reliability of your MLOps workflows.
1. Modularize Workflows
Break down your workflows into smaller, modular components that focus on specific tasks or stages of the machine learning lifecycle, such as training the model, deploying the model, etc.
This allows for easier maintenance, reusability, and scalability of your workflows. For example, our .github/workflows folder could be organized into multiple smaller workflows.
.github/
├── workflows/
│ ├── main.yml
│ ├── setup-environment.yml
│ ├── train-model.yml
│ ├── test-model.yml
│ └── deploy-model.yml
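Each file can act as a reusable workflow that encapsulates a common task and promotes consistency across projects. Then, you can have main.yml call these modular workflows. Keep in mind that each job runs on its own fresh runner, so installed packages and files do not carry over from one job to the next; only caches and artifacts do.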
name: Example MLOps Pipeline with modular workflows
on:
  push:
    branches:
      - main
  schedule:
    - cron: '0 0 * * 1' # Runs every Monday at midnight
jobs:
  setup-environment:
    uses: ./.github/workflows/setup-environment.yml
  train-model:
    needs: setup-environment
    uses: ./.github/workflows/train-model.yml
  test-model:
    needs: train-model
    uses: ./.github/workflows/test-model.yml
  deploy-model:
    needs: test-model
    uses: ./.github/workflows/deploy-model.yml
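To make the modular layout more concrete, here is a minimal sketch of what one of the called workflows, train-model.yml, might look like. The script name train.py and the output file accuracy.txt are assumptions for illustration, not something defined earlier in this series; the artifact name accuracy matches the one downloaded later in test-model.yml.

```yaml
# .github/workflows/train-model.yml (illustrative sketch)
name: Train Model
on: workflow_call            # lets main.yml call this file with `uses:`
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Train model
        run: python train.py        # assumed script; expected to write accuracy.txt
      - name: Upload accuracy artifact
        uses: actions/upload-artifact@v4
        with:
          name: accuracy            # later jobs download the artifact by this name
          path: accuracy.txt        # assumed output file
```

Because each job starts on a clean runner, the artifact is what hands the training result to the test job. If this workflow were called once per matrix entry, the artifact name would need to be unique per entry, since upload-artifact@v4 does not allow two uploads under the same name in one run.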
2. Parallelize and Cache
Wherever possible, parallelize tasks within your workflows to reduce overall execution time. GitHub Actions allows you to run multiple jobs concurrently, speeding up the CI/CD process. We covered this in our A/B testing implementation in the previous article.
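As a quick reminder of how that works, jobs that do not list each other under needs start at the same time. The sketch below is only an illustration: the job names and the --variant flag of train.py are hypothetical.

```yaml
name: Parallel jobs sketch
on:
  push:
    branches:
      - main
jobs:
  train-model-a:                       # hypothetical job; no `needs`, so it starts immediately
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - run: pip install -r requirements.txt
      - run: python train.py --variant a   # train.py and --variant are assumptions
  train-model-b:                       # runs concurrently with train-model-a
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - run: pip install -r requirements.txt
      - run: python train.py --variant b
  compare-models:
    needs: [train-model-a, train-model-b]  # waits for both parallel jobs
    runs-on: ubuntu-latest
    steps:
      - run: echo "Both training jobs finished"
```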
Additionally, we can leverage caching mechanisms to store dependencies and intermediate artifacts between workflow runs, further improving performance. For example, our setup-environment.yml can cache dependencies as shown below.
name: Setup Environment
on: workflow_call
jobs:
  setup:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9' # Quoted so YAML does not read it as a number
      - name: Cache pip # Restores cached downloads before installing; saves them when the job ends
        uses: actions/cache@v4
        with:
          path: ~/.cache/pip
          key: ${{ runner.os }}-pip-${{ hashFiles('**/requirements.txt') }}
          restore-keys: |
            ${{ runner.os }}-pip-
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
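If you only need to cache pip downloads, actions/setup-python can also handle the caching itself through its cache input, which saves a separate cache step. The sketch below shows that variant; the workflow name is made up for illustration.

```yaml
name: Setup Environment (built-in pip cache sketch)
on: workflow_call
jobs:
  setup:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set up Python with pip caching
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
          cache: 'pip'          # caches pip downloads, keyed on the requirements file
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
```

The explicit actions/cache step remains the more flexible choice when you also want to cache other things, such as downloaded datasets or intermediate artifacts.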
3. Use Matrix Builds
Matrix builds allow you to define multiple configurations for your workflows using a matrix strategy. For example, we can train and test models across different environments, platforms, or parameter settings in parallel, reducing redundancy and improving workflow flexibility.
To implement this, use the GitHub Actions matrix strategy like so:
name: MLOps Pipeline with matrix strategy
on:
  push:
    branches:
      - main
jobs:
  setup-environment:
    uses: ./.github/workflows/setup-environment.yml
  train-model:
    needs: setup-environment
    strategy:
      matrix:
        python-version: ['3.7', '3.8', '3.9']
        # Parallelize training with different Python versions
    uses: ./.github/workflows/train-model.yml
    with:
      # The called workflow must declare a matching workflow_call input
      python-version: ${{ matrix.python-version }}
  test-model:
    needs: train-model
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
        # Parallelize testing on different OS
    uses: ./.github/workflows/test-model.yml
    with:
      # The called workflow must declare a matching workflow_call input
      os: ${{ matrix.os }}
  deploy-model:
    needs: test-model
    uses: ./.github/workflows/deploy-model.yml
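One detail worth spelling out: a matrix on a job that calls a reusable workflow only changes anything if the matrix value is passed in through with: and the called workflow declares a matching input. Here is a minimal sketch of the receiving side for test-model.yml, assuming an input named os (a name chosen for this example).

```yaml
# Caller side, inside the matrix job:
#   with:
#     os: ${{ matrix.os }}
name: Test Model
on:
  workflow_call:
    inputs:
      os:                         # hypothetical input name
        description: Runner label to test on
        type: string
        required: false
        default: ubuntu-latest
jobs:
  test:
    runs-on: ${{ inputs.os }}     # one run per matrix entry passed by the caller
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Test model
        run: python test.py
```

The same pattern would apply to a python-version input in train-model.yml, with the value fed into the python-version field of actions/setup-python.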
4. Implement Retries and Timeouts
In GitHub Actions, we can configure timeouts for jobs and steps, and add retry logic around flaky commands, to handle long-running tasks and transient failures gracefully.
Let's review our test-model.yml and see how to set a job timeout and disable fail-fast, so that the remaining jobs keep running when one of them fails.
name: Test Model
on: workflow_call
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 30 # Cancel the job if it runs longer than 30 minutes
    strategy:
      fail-fast: false # Keep running the other matrix jobs even if one fails (only matters when a matrix is used)
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install -r requirements.txt
      - name: Download accuracy artifact # Uploaded by an earlier job in the same run
        uses: actions/download-artifact@v4
        with:
          name: accuracy
      - name: Test model
        run: python test.py
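The example above covers timeouts but not retries. As far as I know, GitHub Actions has no built-in retry setting for a step, so a common approach is to write a small retry loop in the step's shell script. The sketch below retries the dependency install up to three times; the attempt count and the 10-second pause are arbitrary values picked for illustration.

```yaml
name: Retry sketch
on: workflow_call
jobs:
  test:
    runs-on: ubuntu-latest
    timeout-minutes: 30            # upper bound for the whole job
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Install dependencies with retries
        timeout-minutes: 10        # upper bound for this step alone
        run: |
          for attempt in 1 2 3; do
            # Stop looping as soon as the install succeeds
            pip install -r requirements.txt && break
            if [ "$attempt" -eq 3 ]; then
              echo "Install failed after 3 attempts"
              exit 1
            fi
            echo "Attempt $attempt failed, retrying in 10 seconds"
            sleep 10
          done
```

Retries are best kept for steps that fail for transient reasons, such as network downloads. Retrying a failing test usually just hides a real problem.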
5. Monitor and Analyze Workflow Performance
It is important in MLOps to continuously monitor workflow execution metrics such as run duration, resource utilization, and success/failure rates to identify bottlenecks and areas for optimization.
Fortunately, GitHub Actions comes with workflow visualizations and run logs, and external monitoring tools and services can be added on top to gain further insights into workflow performance and behavior.
Every workflow run has a visualization graph, generated in real time to show the run's progress. We can use this graph to monitor and debug workflows.
Logs are available automatically: in GitHub, click the workflow you would like to inspect, then click the run itself to view its log. You can visit GitHub's official documentation to learn more.
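Beyond the built-in graph and logs, a workflow can also publish its own numbers. One lightweight option is the job summary: anything a step appends to the file at $GITHUB_STEP_SUMMARY is rendered as Markdown on the run's summary page. The sketch below records how long training took; train.py is an assumed script name.

```yaml
name: Training duration summary sketch
on: workflow_call
jobs:
  train:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout repository
        uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.9'
      - name: Install dependencies
        run: pip install -r requirements.txt
      - name: Train model and record duration
        run: |
          start=$(date +%s)
          python train.py                       # assumed training script
          end=$(date +%s)
          # Lines appended here show up on the run's summary page
          echo "### Training run" >> "$GITHUB_STEP_SUMMARY"
          echo "Duration: $((end - start)) seconds" >> "$GITHUB_STEP_SUMMARY"
```

Comparing this number across runs is a simple way to notice when training starts to slow down.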
6. Version Control Everything
Finally, the best and most crucial practice is to version control everything: committing all code, configuration files, and workflow definitions to your GitHub repository.
This will ensure reproducibility, traceability, and collaboration across team members. And of course, when it comes to version control tools, what does GitHub not offer?
It is a rhetorical question. GitHub is one of the most robust and powerful version control platforms there is!
Conclusion
By incorporating these best practices and optimization tips into our MLOps workflows with GitHub Actions, we can enhance productivity, reliability, and scalability while minimizing operational overhead. Continuous improvement and refinement of our workflows are essential to adapt to evolving requirements and maximize the value of MLOps in our organizations.
The next article is the last. We'll dive into learning about the future directions in MLOps. I hope you have found this article helpful. Stay tuned as we conclude this series of MLOps with GitHub Actions together. Cheers!
References
https://docs.github.com/en/actions/using-workflows/caching-dependencies-to-speed-up-workflows
https://docs.github.com/en/actions/using-jobs/using-a-matrix-for-your-jobs
https://docs.github.com/en/actions/using-workflows/reusing-workflows
https://docs.github.com/en/actions/using-workflows/triggering-a-workflow
https://docs.github.com/en/actions/using-jobs/using-concurrency
https://docs.github.com/en/actions/monitoring-and-troubleshooting-workflows/using-workflow-run-logs