
Understand how version control brings order to code, data, and models: it tracks changes and enables time travel to review history, which speeds up development and makes team collaboration possible.
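To make "time travel" concrete, here is a minimal Python sketch of a toy version control store. It is a hypothetical illustration, not Git's actual object format. Each commit keeps a full snapshot of the tracked files, is identified by a SHA-1 hash of its contents, and points to its parent commit, so any earlier snapshot can be restored and the history can be walked back from the latest commit.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class MiniRepo:
    """A toy, in-memory version control store (hypothetical, for illustration only)."""
    commits: Dict[str, dict] = field(default_factory=dict)  # commit id -> commit record
    head: Optional[str] = None                               # id of the latest commit

    def commit(self, files: Dict[str, str], message: str) -> str:
        # Each commit stores a full snapshot plus a pointer to its parent
        record = {"parent": self.head, "message": message, "files": dict(files)}
        # Content-address the commit: its id is a hash of its serialized contents
        payload = json.dumps(record, sort_keys=True).encode("utf-8")
        commit_id = hashlib.sha1(payload).hexdigest()
        self.commits[commit_id] = record
        self.head = commit_id
        return commit_id

    def checkout(self, commit_id: str) -> Dict[str, str]:
        # "Time travel": return a copy of an old snapshot so callers cannot rewrite history
        return dict(self.commits[commit_id]["files"])

    def log(self) -> List[str]:
        # Walk parent pointers from the latest commit back to the first one
        history, current = [], self.head
        while current is not None:
            history.append(self.commits[current]["message"])
            current = self.commits[current]["parent"]
        return history


repo = MiniRepo()
v1 = repo.commit({"train.py": "lr = 0.1"}, "baseline")
v2 = repo.commit({"train.py": "lr = 0.01"}, "lower learning rate")
print(repo.log())         # ['lower learning rate', 'baseline']
print(repo.checkout(v1))  # {'train.py': 'lr = 0.1'}
```

Git also records snapshots rather than diffs, but it splits them into separate blob, tree, and commit objects instead of one JSON record.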
Explore using GitHub as a remote code repository, including signing up, creating repositories, cloning and pushing code, and managing issues and pull requests.
Generate an SSH key pair, optionally set a passphrase, and add the public key to your GitHub settings so you can push without entering a password.
Explore branch types, including the main (or master) and develop branches as well as feature, release, and hotfix branches, and learn to create, switch, rename, and delete branches locally and remotely.
Learn practical Git basics for data science workflows, including tracking files, telling ignored files apart from untracked ones, and essential commands such as init, clone, pull, push, and fetch for managing changes.
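The hypothetical `classify` helper below roughly sketches how tracked, ignored, and untracked files differ. It applies a simplified version of Git's rules. A file Git already tracks stays tracked even if it matches an ignore pattern. An untracked file that matches a pattern is ignored, and everything else is untracked. Real `.gitignore` matching is richer (negation with `!`, anchored and directory-only patterns), so in practice use `git status --ignored` or `git check-ignore -v <path>` to see what Git actually decides.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List, Set


def classify(paths: Iterable[str], tracked: Set[str], ignore_patterns: List[str]) -> Dict[str, str]:
    """Label each path as 'tracked', 'ignored', or 'untracked' (simplified, hypothetical rules)."""
    status = {}
    for path in paths:
        basename = path.split("/")[-1]
        if path in tracked:
            # Ignore patterns never hide a file that Git already tracks
            status[path] = "tracked"
        elif any(fnmatchcase(path, p) or fnmatchcase(basename, p) for p in ignore_patterns):
            # Untracked and matching a pattern (on the full path or just the file name)
            status[path] = "ignored"
        else:
            status[path] = "untracked"
    return status


files = ["train.py", "data/raw.csv", ".env", "notes.md"]
print(classify(files, tracked={"train.py"}, ignore_patterns=["*.csv", ".env"]))
# {'train.py': 'tracked', 'data/raw.csv': 'ignored', '.env': 'ignored', 'notes.md': 'untracked'}
```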
Learn how to use DagsHub with Label Studio to create a medical annotation project: upload data, configure labels (drug, dosage, disease, symptoms, procedure, treatment), and commit the annotated results to GitHub.
Learn practical Git branching for machine learning workflows: create and manage branches (main, develop, feature, release, hotfix), switch between them, and rename, view, and delete branches for different model versions.
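Git branches are lightweight, movable pointers to commits, so creating, switching, renaming, and deleting them is cheap. The toy `BranchTable` class below is a hypothetical model of that idea, with ids like `"c1"` standing in for real commit hashes. Its methods roughly correspond to `git branch <name>`, `git switch <name>`, `git branch -m <old> <new>`, and `git branch -d <name>`. Deleting a remote branch is a separate step (`git push origin --delete <name>`). Unlike Git, this sketch does not check whether a branch has been merged before deleting it.

```python
from typing import Dict


class BranchTable:
    """Hypothetical model of Git branches as named pointers to commit ids."""

    def __init__(self, initial_commit: str, default: str = "main"):
        self.branches: Dict[str, str] = {default: initial_commit}
        self.current = default  # plays the role of HEAD

    def create(self, name: str) -> None:
        # A new branch starts at the commit the current branch points to
        if name in self.branches:
            raise ValueError(f"branch {name!r} already exists")
        self.branches[name] = self.branches[self.current]

    def switch(self, name: str) -> None:
        if name not in self.branches:
            raise KeyError(name)
        self.current = name

    def commit(self, commit_id: str) -> None:
        # Committing only advances the branch that is currently checked out
        self.branches[self.current] = commit_id

    def rename(self, old: str, new: str) -> None:
        if new in self.branches:
            raise ValueError(f"branch {new!r} already exists")
        self.branches[new] = self.branches.pop(old)
        if self.current == old:
            self.current = new

    def delete(self, name: str) -> None:
        # Like Git, refuse to delete the branch that is checked out
        if name == self.current:
            raise ValueError("cannot delete the checked-out branch")
        del self.branches[name]


table = BranchTable("c1")
table.create("develop")
table.switch("develop")
table.commit("c2")                   # only develop moves forward
table.create("feature/new-model")    # hypothetical branch name
table.rename("feature/new-model", "feature/xgboost")
table.switch("main")
table.delete("feature/xgboost")
print(table.branches)  # {'main': 'c1', 'develop': 'c2'}
```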
Learn to track machine learning experiments with DagsHub: log metrics via YAML configurations, record model parameters, compare the base and improved models, and push the results through Git workflows.
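Comparing a base model with an improved one amounts to recording each run's parameters and metrics and then diffing them. The sketch below does this in plain Python with a hypothetical `Run` record and `compare` helper. It does not use DagsHub's API, and the numbers in the example are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Run:
    """One logged experiment: its hyperparameters and resulting metrics."""
    name: str
    params: Dict[str, object] = field(default_factory=dict)
    metrics: Dict[str, float] = field(default_factory=dict)


def compare(base: Run, candidate: Run, metric: str, higher_is_better: bool = True) -> dict:
    """Summarise how a candidate run differs from a baseline run on one metric."""
    delta = candidate.metrics[metric] - base.metrics[metric]
    improved = delta > 0 if higher_is_better else delta < 0
    # Hyperparameters set in the candidate that differ from the baseline
    changed = {
        key: (base.params.get(key), value)
        for key, value in candidate.params.items()
        if base.params.get(key) != value
    }
    return {"metric": metric, "delta": delta, "improved": improved, "changed_params": changed}


# Hypothetical runs with made-up numbers
base = Run("base", {"model": "logreg", "C": 1.0}, {"accuracy": 0.75})
tuned = Run("tuned", {"model": "logreg", "C": 0.5}, {"accuracy": 0.875})
print(compare(base, tuned, "accuracy"))
# {'metric': 'accuracy', 'delta': 0.125, 'improved': True, 'changed_params': {'C': (1.0, 0.5)}}
```

The `higher_is_better` flag lets the same helper handle metrics such as loss, where a smaller value is the improvement.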
Push your DVC pipelines from your workspace to DagsHub by configuring a remote for a private repository, then pushing code with Git and data with DVC.
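The "code with Git, data with DVC" split rests on one idea. DVC keeps large files out of Git and commits only a small pointer containing the file's hash. By default, `dvc add` writes an MD5 hash into a `.dvc` file. The Python sketch below imitates that idea with hypothetical `track` and `restore` helpers. It is not DVC's code and does not write a real `.dvc` file. The cache layout (a two-character directory followed by the rest of the hash, similar to how Git stores objects) is an assumption made for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def track(data_file: Path, cache_dir: Path) -> dict:
    """Copy a data file into a content-addressed cache and return a small pointer record."""
    digest = hashlib.md5(data_file.read_bytes()).hexdigest()  # reads the whole file into memory
    # Store the content under its hash, so identical data is stored only once
    blob = cache_dir / digest[:2] / digest[2:]
    blob.parent.mkdir(parents=True, exist_ok=True)
    if not blob.exists():
        shutil.copyfile(data_file, blob)
    # This pointer is what would be committed to Git instead of the data itself
    return {"path": data_file.name, "md5": digest}


def restore(pointer: dict, cache_dir: Path, dest_dir: Path) -> Path:
    """Rebuild a tracked file from the cache using only its pointer record."""
    digest = pointer["md5"]
    dest = dest_dir / pointer["path"]
    shutil.copyfile(cache_dir / digest[:2] / digest[2:], dest)
    return dest


# Usage: version a dataset twice, then go back to the first version
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    cache = root / "cache"
    data = root / "data.csv"
    data.write_text("id,label\n1,cat\n")
    v1 = track(data, cache)
    data.write_text("id,label\n1,cat\n2,dog\n")
    v2 = track(data, cache)
    restore(v1, cache, root)
    print(data.read_text() == "id,label\n1,cat\n")  # True
```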
Our modern world runs on software and data. With Git, a version control tool, we track and manage the different changes and versions of our software. Git is useful in every programmer's work and is a must-have tool in any software-related field, including data science and machine learning.
What about the data and the ML models we build? How do we track and manage them?
How do data scientists, machine learning engineers, and AI developers track and manage the data and models they spend hours or days building?
In this course we will explore Git and DVC, two essential version control tools that every data scientist, ML engineer, and AI developer needs when working on data science projects.
This is a very new field, so there are not many materials on using Git and DVC for data science projects. The goal of this exciting, unscripted course is to introduce you to Git and DVC for data science.
We will also explore data version control: how to track your models and datasets using DVC and Git.
By the end of the course you will have a comprehensive overview of the fundamentals of Git and DVC, and of how to use these tools to manage and track your ML models and datasets across the entire machine learning project life cycle.
This course is unscripted, fun, and exciting, but at the same time we will dive deep into DVC and Git for data science.
Specifically, you will learn:
Git Essentials
How Git works
Git Branching for Data Science Projects
Building our own custom version control tool from scratch
Data Version Control - The What, Why and How
DVC Essentials
How to track and version your ML Models
DVC pipelines
How to use DagsHub and GitHub
Label Studio
Best practices in using Git and DVC
Machine Learning Experiment Tracking
And more