
Begin your journey from absolute zero to hero in MLOps by following the course in order through its sections, lectures, and articles, and refer to the GitHub repository for notes.
Access the public GitHub repository for the course and bookmark it for updates. Explore sections such as Introduction to MLOps and Experiment Tracking, and use the notes, including the MLflow material, to revise anytime.
Learn how machine learning trains a model on data to identify patterns and predict outputs. See how datasets and algorithms turn training into a model that predicts flower types.
Data scientists build a model from a flower dataset. They split the data 80/20 into training and test sets, train it with an algorithm, and package the final model as a pickle (.pkl), joblib, or ONNX file.
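Here is a minimal standard-library sketch of the 80/20 split and packaging step. The toy flower measurements and the nearest-centroid "algorithm" are hypothetical stand-ins for the iris dataset and the real training algorithm used in the course. Pickle stands in for the joblib or ONNX formats mentioned above.

```python
import pickle
import random
from statistics import mean

# Hypothetical toy data: (petal_length, petal_width) -> flower type.
DATA = [
    ((1.4, 0.2), "setosa"), ((1.3, 0.2), "setosa"), ((1.5, 0.3), "setosa"),
    ((1.6, 0.2), "setosa"), ((1.4, 0.3), "setosa"),
    ((4.7, 1.4), "versicolor"), ((4.5, 1.5), "versicolor"), ((4.9, 1.5), "versicolor"),
    ((4.0, 1.3), "versicolor"), ((4.6, 1.4), "versicolor"),
]


def train_test_split(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows with a fixed seed and split them 80/20 by default."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = int(len(rows) * test_ratio)
    return rows[n_test:], rows[:n_test]


def train(rows):
    """Nearest-centroid 'model': the mean feature vector for each class."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {label: tuple(mean(col) for col in zip(*feats)) for label, feats in by_label.items()}


def predict(model, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    return min(model, key=lambda lbl: sum((a - b) ** 2 for a, b in zip(model[lbl], features)))


train_rows, test_rows = train_test_split(DATA)
model = train(train_rows)
accuracy = sum(predict(model, f) == y for f, y in test_rows) / len(test_rows)

# Package the trained model as a .pkl artifact that an API can load later.
with open("model.pkl", "wb") as fh:
    pickle.dump(model, fh)
```

Fixing the shuffle seed keeps the split reproducible, which matters once runs are tracked and compared.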
Explore how MLOps extends DevOps practices such as CI/CD, infrastructure as code, and Kubernetes management to the machine learning lifecycle, automating them to enable rapid model deployment.
Explore the machine learning lifecycle from problem definition to deployment and monitoring in production, including data collection, cleaning, feature engineering, model selection, training, evaluation, and maintenance.
Data scientists define problems, gather and clean data, engineer features, and train models. ML engineers productionize models with APIs; MLOps engineers automate pipelines, deployment, observability, and infrastructure.
Discover how data scientists handle a new requirement: clarifying it with the product owner, gathering data, setting up a Python environment, training a model on the iris data, and saving the artifacts.
MLOps engineers automate data science workflows with version control, RBAC and auditing, and CI/CD pipelines that train across multiple Python versions, set up the environment, train the model, and store the artifacts automatically.
Discover how MLOps engineers, data scientists, and ML engineers collaborate to train, save, run, and ship models, with MLOps engineers supporting training, saving, and deploying.
Discover how DVC provides data version control by storing large datasets and models in remote cloud storage (S3 bucket, Azure Blob Storage, Google Cloud Storage), while git tracks only small metadata files that record each version.
Master data versioning with DVC and AWS S3. Integrate DVC with git to manage the wine prediction data and its versions, using dvc add and dvc push to send the data to remote storage.
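DVC identifies each data version by a content hash and keeps only a small pointer file in git. The data itself goes to remote storage. The sketch below imitates that idea using hashlib and a local folder acting as the "remote". The pointer format, file naming, and folder layout are simplified assumptions and do not match DVC's real formats. With the actual tool, you would run dvc add on the data file, commit the generated .dvc file with git, and then run dvc push.

```python
import hashlib
import json
import shutil
from pathlib import Path


def md5_of(path: Path) -> str:
    """Hash the file in 1 MiB chunks so large datasets never need to fit in memory."""
    h = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def track(data_file: Path) -> Path:
    """Write a small pointer file (the part git would track) instead of the data itself."""
    digest = md5_of(data_file)
    pointer = data_file.with_name(data_file.name + ".pointer.json")  # hypothetical format
    pointer.write_text(json.dumps({"path": data_file.name, "md5": digest}))
    return pointer


def push(data_file: Path, pointer: Path, remote: Path) -> Path:
    """Copy the data into content-addressed 'remote' storage, keyed by its hash."""
    digest = json.loads(pointer.read_text())["md5"]
    target = remote / digest[:2] / digest[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(data_file, target)
    return target
```

Because every version is stored under its own hash, editing the dataset and pushing again leaves older versions intact. Checking out an old pointer file is then enough to fetch the matching data.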
Track every training run to capture the learning rate and other parameters, code and dataset versions, metrics, artifacts, and system information. This makes experiments reproducible and shows measurable progress toward the target metric.
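One way to picture what a tracked run must capture is a single record per run. The RunRecord class below is a hypothetical schema, not MLflow's data model. The commit SHA, data hash, and metric values in the example are placeholders.

```python
import json
import platform
import sys
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """Everything needed to reproduce and compare one training run (hypothetical schema)."""
    params: dict            # e.g. learning rate, number of estimators
    code_version: str       # e.g. the git commit SHA of train.py
    data_version: str       # e.g. the DVC hash of the dataset
    metrics: dict = field(default_factory=dict)
    artifacts: list = field(default_factory=list)
    # System information captured automatically when the record is created.
    system: dict = field(default_factory=lambda: {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
    })

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values for illustration only.
record = RunRecord(
    params={"learning_rate": 0.01},
    code_version="abc123",
    data_version="d41d8cd9",
    metrics={"accuracy": 0.91},
    artifacts=["model.pkl"],
)
```

If any one of these fields is missing, a run cannot be reproduced exactly. An experiment tracker exists to fill them in without manual bookkeeping.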
Discover how MLflow centralizes experiment tracking, model versioning, and deployment, replacing Excel sheets. MLOps engineers deploy a centralized server, and data scientists instrument their Python scripts.
Install MLflow on your local machine in a Python virtual environment, and run mlflow ui with a SQLite backend on port 7006 for demo purposes. The production setup comes later.
Learn how to install and configure MLflow on a local Kubernetes cluster created with kind, using Helm charts or manifests and port forwarding for access, for a demo or PoC.
Deploy MLflow in production on Kubernetes by connecting an MLflow server to an AWS RDS PostgreSQL database. Create a dedicated MLflow database and user, and configure the deployment with Helm.
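The MLflow server takes its backend store as a SQLAlchemy-style database URI, for example via mlflow server --backend-store-uri. Generated RDS passwords often contain special characters that break URI parsing, so this sketch percent-encodes the credentials. The user, password, and hostname are hypothetical.

```python
from urllib.parse import quote


def mlflow_backend_uri(user, password, host, database="mlflow", port=5432):
    """Build a PostgreSQL URI for MLflow's backend store.

    Credentials are percent-encoded (safe='' also encodes '/') so that characters
    such as '@' or '/' in a password cannot be confused with URI delimiters.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )


uri = mlflow_backend_uri("mlflow_user", "p@ss/word", "mlflow-db.example.com")
```

In practice the real password should come from a Kubernetes Secret rather than a literal in code or Helm values.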
Connect to an MLflow instance from the terminal, set tracking URI and experiment, then update train.py to log parameters and metrics for a wine prediction model using DVC and MLflow.
Learn how MLflow enables data scientists and MLOps engineers to compare multiple runs within an experiment using box plots, parameter and metric comparisons, git commits, and artifacts.
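The MLflow UI does this comparison visually, and mlflow.search_runs can return runs as a pandas DataFrame for scripting. The sketch below applies the same logic to plain dictionaries so it runs without a tracking server. The run IDs, parameter names, and RMSE values are made up for illustration.

```python
def best_run(runs, metric, higher_is_better=True):
    """Return the run with the best value of `metric`, ignoring runs that never logged it."""
    candidates = [r for r in runs if metric in r["metrics"]]
    if not candidates:
        raise ValueError(f"no run logged {metric!r}")
    pick = max if higher_is_better else min
    return pick(candidates, key=lambda r: r["metrics"][metric])


def differing_params(runs):
    """Return only the parameters whose values differ between runs."""
    keys = set().union(*(r["params"] for r in runs))
    return {
        k: [r["params"].get(k) for r in runs]
        for k in sorted(keys)
        if len({repr(r["params"].get(k)) for r in runs}) > 1
    }


# Hypothetical runs from one experiment.
runs = [
    {"run_id": "run-1", "params": {"alpha": 0.5, "l1_ratio": 0.5, "seed": 42}, "metrics": {"rmse": 0.79}},
    {"run_id": "run-2", "params": {"alpha": 0.1, "l1_ratio": 0.5, "seed": 42}, "metrics": {"rmse": 0.74}},
    {"run_id": "run-3", "params": {"alpha": 0.1, "l1_ratio": 0.9, "seed": 42}, "metrics": {}},  # failed before logging
]
```

For an error metric like RMSE, lower is better, so best_run(runs, "rmse", higher_is_better=False) selects run-2. differing_params hides seed because it is identical in every run.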
Explore model deployment and serving: packaging, versioning, and deploying ML models behind scalable APIs, with the right runtime resources and autoscaling in production.
Explore four popular strategies for deploying and serving models in production: virtual machines, Kubernetes, managed Amazon SageMaker, and KServe with Knative Serving.
Train the intent classifier model with train.py, save the artifact, and expose predictions via a local Flask API at /predict.
Implement production-grade ML model deployment with a VPC, subnets, an internet gateway, and a load balancer. Use a WSGI server and Nginx to handle concurrency and enable dynamic auto scaling.
Deploy the intent classifier with WSGI on an EC2 instance using Gunicorn, create a virtual environment, install dependencies, run the server, and test the /predict endpoint with curl.
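Gunicorn can serve any WSGI callable. A Flask app object is one, so the same gunicorn command works for the course's Flask API. To show what Gunicorn actually calls, here is a dependency-free WSGI app that exposes POST /predict. The predict_intent function is a hypothetical keyword-rule stand-in for the trained intent classifier.

```python
import json


def predict_intent(text: str) -> str:
    """Hypothetical stand-in for the trained intent classifier (simple keyword rules)."""
    lowered = text.lower()
    if "refund" in lowered:
        return "refund_request"
    if "hello" in lowered or "hi" in lowered.split():
        return "greeting"
    return "other"


def _respond(start_response, status, payload):
    """Serialize `payload` as JSON and send it with the given HTTP status."""
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ, start_response):
    """WSGI callable: POST /predict with {"text": "..."} returns {"intent": "..."}."""
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        return _respond(start_response, "404 Not Found", {"error": "not found"})
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        text = payload["text"]
        if not isinstance(text, str):
            raise TypeError("text must be a string")
    except (ValueError, KeyError, TypeError):
        # Malformed JSON, a missing "text" field, or a non-string value.
        return _respond(start_response, "400 Bad Request",
                        {"error": 'expected a JSON body like {"text": "..."}'})
    return _respond(start_response, "200 OK", {"intent": predict_intent(text)})
```

Assuming the file is saved as app.py, gunicorn --bind 0.0.0.0:8000 app:app would serve it (the port is chosen for illustration). Then curl -X POST -H "Content-Type: application/json" -d '{"text": "hello"}' http://localhost:8000/predict returns {"intent": "greeting"}.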
Write a user data script for the auto scaling group's launch template that installs the model, the API, and the Python dependencies. Configure Gunicorn and Nginx as services that start on boot, so every new instance serves traffic as the group scales.
Delete cloud resources in the correct order to avoid charges, removing the auto scaling group, load balancer, target group, launch template, security group, subnet, and the VPC.
Deploy and serve a model on a Kubernetes cluster for inference. Prepare a Dockerfile, build an image, push it to a container registry, and deploy it with manifests and an ingress.
Prepare a Dockerfile using a Python slim base image, set /app as the working directory, and use separate layers for requirements and source code. Train the model, expose port 6000, and run Gunicorn.
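Gunicorn also reads a Python configuration file, passed with -c gunicorn.conf.py, which keeps the container's CMD short. This sketch binds to port 6000, as in the Dockerfile step. The PORT and WEB_CONCURRENCY overrides and the 60-second timeout are assumptions to adjust for your model.

```python
# gunicorn.conf.py: a sketch of a Gunicorn config for the containerized model API.
import multiprocessing
import os

# Listen on all interfaces at the port the Dockerfile exposes (6000 here).
bind = f"0.0.0.0:{os.environ.get('PORT', '6000')}"

# Gunicorn's docs suggest (2 x CPU cores) + 1 workers as a starting point;
# WEB_CONCURRENCY, if set, overrides that default.
workers = int(os.environ.get("WEB_CONCURRENCY", multiprocessing.cpu_count() * 2 + 1))

# Example value: model inference can take longer than a typical web request.
timeout = 60

# "-" sends access and error logs to stdout/stderr so docker logs and kubectl logs see them.
accesslog = "-"
errorlog = "-"
```

Be careful with the worker formula in containers. multiprocessing.cpu_count() reports the host's CPUs, not the container's CPU limit, so setting WEB_CONCURRENCY explicitly is usually safer on Kubernetes.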
Learn to deploy a model container on a Kubernetes cluster by creating a namespace, a deployment with replicas, and a service for stable access, each defined in its own manifest.
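kubectl accepts JSON as well as YAML, so the three manifests can also be generated from Python. The key link between them is that the Service's selector must match the pod template's labels, which is what gives the replicas one stable address. The names, image URL, replica count, and ports are hypothetical.

```python
import json


def namespace_manifest(name):
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}


def deployment_manifest(name, namespace, image, replicas=2, port=6000):
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},     # must match the pod labels below
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                }]},
            },
        },
    }


def service_manifest(name, namespace, port=80, target_port=6000):
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "selector": {"app": name},                # routes to every matching replica
            "ports": [{"port": port, "targetPort": target_port}],
        },
    }


def as_list(*objects):
    """Bundle objects in a v1 List so a single kubectl apply can take them all."""
    return {"apiVersion": "v1", "kind": "List", "items": list(objects)}


NAME = "intent-classifier"                             # hypothetical
IMAGE = "registry.example.com/intent-classifier:v1"    # hypothetical
manifests = as_list(
    namespace_manifest(NAME),
    deployment_manifest(NAME, NAME, IMAGE),
    service_manifest(NAME, NAME),
)

if __name__ == "__main__":
    print(json.dumps(manifests, indent=2))
```

Assuming the script is saved as manifests.py, python manifests.py | kubectl apply -f - would apply them. Separate YAML files, as in the course, work just as well.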
Install the Traefik ingress controller in a Kubernetes cluster, and learn how it exposes ingress resources through a load balancer.
Demonstrate real-time model deployment with Kubernetes ingress and an ingress controller. Configure an ingress resource that routes example.com/predict to the intent classifier service via a load balancer.
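Here is a sketch of such an ingress resource (networking.k8s.io/v1), generated as a Python dict. It is paired with a helper that mimics how the Prefix path type matches whole path segments rather than raw string prefixes. The resource names, namespace, and service port are hypothetical.

```python
import json


def ingress_manifest(name, namespace, host, path, service_name, service_port, ingress_class=None):
    """Route host+path to a Service; ingress_class optionally pins a specific controller."""
    spec = {
        "rules": [{
            "host": host,
            "http": {"paths": [{
                "path": path,
                "pathType": "Prefix",
                "backend": {"service": {"name": service_name, "port": {"number": service_port}}},
            }]},
        }],
    }
    if ingress_class:
        spec["ingressClassName"] = ingress_class
    return {"apiVersion": "networking.k8s.io/v1", "kind": "Ingress",
            "metadata": {"name": name, "namespace": namespace}, "spec": spec}


def prefix_matches(rule_path, request_path):
    """Prefix semantics: compare '/'-separated elements, ignoring empty ones."""
    rule = [p for p in rule_path.split("/") if p]
    req = [p for p in request_path.split("/") if p]
    return req[:len(rule)] == rule


ing = ingress_manifest("intent-classifier", "intent-classifier", "example.com",
                       "/predict", "intent-classifier", 80)
print(json.dumps(ing, indent=2))
```

With this rule, /predict and /predict/batch reach the service, but /predictions does not. This is why element-wise matching is safer than a plain string prefix.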
Discover KServe, the open-source Kubernetes-based platform that automates model deployment, serving, and inference across frameworks like scikit-learn, TensorFlow, XGBoost, and PyTorch, with automatic scaling via native Kubernetes and KEDA.
Install cert-manager, set up a local kind cluster, install CRDs and KServe, and deploy an inference service for a sample model exposed via port-forward.
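After the inference service is port-forwarded, clients call it over KServe's V1 REST protocol at /v1/models/<name>:predict with an {"instances": [...]} body. This sketch only builds the request with urllib. The model name, local port 8080, and Host header value are assumptions that depend on your setup and on the sample you deploy.

```python
import json
import urllib.request


def build_v1_predict_request(model_name, instances, base_url="http://localhost:8080", host_header=None):
    """Build (but do not send) a KServe V1-protocol predict request."""
    url = f"{base_url}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if host_header:
        # When going through a shared ingress gateway, the Host header selects the service.
        headers["Host"] = host_header
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(request, timeout=10):
    """Send the request and decode the JSON response (requires a running service)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())


# Hypothetical model name and hostname; each instance is one row of iris-style features.
req = build_v1_predict_request(
    "sklearn-iris",
    [[6.8, 2.8, 4.8, 1.4]],
    host_header="sklearn-iris.default.example.com",
)
```

Keeping request construction separate from sending makes the payload easy to inspect or test before a cluster is available.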
Explore KServe for LLMOps and why the same platform and tools used for MLOps, such as MLflow and DVC, also support LLMOps. Practice with multiple models to strengthen your resume.
Learn to get started with Amazon SageMaker AI, create domains for teams, set up user profiles, and access SageMaker Studio tools like Jupyter notebooks, pipelines, and MLflow.
Implement a production SageMaker setup by creating a domain in a VPC with subnets and a SageMaker domain execution role, using IAM authentication and ABAC for user profiles.
Learn to deploy and serve models on SageMaker by packaging pickled models with an inference.py script into a tar.gz archive, uploading it to S3, and creating endpoints. Compare two deployment approaches.
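Here is a sketch of the packaging step: the code below writes a pickled model and an inference.py into model.tar.gz using tarfile. It assumes the code/inference.py layout and the model_fn/predict_fn handler names used by SageMaker's framework containers, so check both against the container image you deploy with. The model object is a placeholder, and the S3 upload and endpoint creation are not shown.

```python
import pickle
import tarfile
import tempfile
from pathlib import Path

# Handler script the serving container would import (names assumed, see lead-in).
INFERENCE_PY = '''\
import os
import pickle

def model_fn(model_dir):
    """Load the model once at startup from the extracted archive."""
    with open(os.path.join(model_dir, "model.pkl"), "rb") as fh:
        return pickle.load(fh)

def predict_fn(input_data, model):
    """Run a prediction on deserialized input (assumes a scikit-learn-style model)."""
    return model.predict(input_data)
'''


def package(model_obj, out_path: Path) -> Path:
    """Write model.pkl and code/inference.py into a gzipped tarball at out_path."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = Path(tmp)
        (tmp / "model.pkl").write_bytes(pickle.dumps(model_obj))
        (tmp / "code").mkdir()
        (tmp / "code" / "inference.py").write_text(INFERENCE_PY)
        with tarfile.open(out_path, "w:gz") as tar:
            # arcname keeps paths relative to the archive root, not the temp dir.
            tar.add(tmp / "model.pkl", arcname="model.pkl")
            tar.add(tmp / "code", arcname="code")
    return out_path
```

Keeping model.pkl at the archive root matters because model_fn looks for it directly inside the extracted model directory.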
MLOps Zero to Hero is a practical, hands-on course designed to help engineers understand how machine learning systems are built, deployed, and operated in real production environments. The course focuses on the real challenges teams face after a model is trained: versioning data, tracking experiments, deploying models, scaling inference, and managing ML workloads reliably.
You will start with the fundamentals of the ML lifecycle and gradually move into core MLOps practices. The course covers data and model versioning using DVC, experiment tracking with MLflow, and containerization using Docker. You will deploy models on Kubernetes, understand production-grade serving patterns, and implement Kubernetes-native inference using KServe.
The course also introduces AWS-based MLOps workflows, including Amazon SageMaker, to help you understand how managed platforms are used in real organizations. You will further explore Kubeflow to learn how ML pipelines and training workloads are orchestrated in Kubernetes environments.
Every concept is explained using simple examples and real-world workflows, with a strong emphasis on clarity and practical understanding rather than theory. By the end of the course, you will have a complete picture of how machine learning moves from experimentation to production, along with the confidence to design, deploy, and operate MLOps systems in real projects.