Python for Data Science & Machine Learning Foundations

Master NumPy, Pandas, Matplotlib, Scikit-Learn and PyTorch with real African datasets — before your first ML model.

Created by General Gichohi Kihara

Last updated 6/2026

English

What you'll learn

Write clean Python for data science: comprehensions, OOP, file I/O, and *args/**kwargs
Use NumPy arrays, broadcasting, and vectorisation instead of slow Python loops
Wrangle messy real-world data using Pandas groupby, merge, and feature engineering
Produce publication-quality EDA charts with Matplotlib and Seaborn
Build production-ready Scikit-Learn pipelines that prevent data leakage
Write a PyTorch training loop from scratch: tensors, autograd, nn.Module, DataLoader
Apply hypothesis testing and distributions to make better modelling decisions
Set up a full Colab + Google Drive environment for any data science project
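
As a taste of the NumPy objective above, here is a minimal sketch of replacing nested Python loops with broadcasting. The array values and the function names (centre_with_loops, centre_vectorised) are invented for illustration and are not taken from the course datasets or notebooks.

```python
import numpy as np

# Hypothetical illustrative numbers (not from the course datasets):
# rows are farms, columns are maize yield in tonnes for three seasons.
yields = np.array([
    [2.0, 2.5, 3.0],
    [1.0, 1.5, 2.0],
    [4.0, 3.5, 3.0],
])


def centre_with_loops(data):
    """Subtract each column's mean, one cell at a time."""
    n_rows, n_cols = data.shape
    out = np.empty_like(data)
    for j in range(n_cols):
        col_mean = sum(data[i, j] for i in range(n_rows)) / n_rows
        for i in range(n_rows):
            out[i, j] = data[i, j] - col_mean
    return out


def centre_vectorised(data):
    """Same result in one expression.

    data.mean(axis=0) has shape (n_cols,); broadcasting stretches it
    across every row of the (n_rows, n_cols) array.
    """
    return data - data.mean(axis=0)


looped = centre_with_loops(yields)
vectorised = centre_vectorised(yields)
```

Both functions return the same centred array. The vectorised version pushes the loop into NumPy's compiled code, which is usually much faster on large arrays.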

Course content

7 sections • 32 lectures • 6h 33m total length

Python Patterns You Will Use Every Day • 8:24
File I/O and JSON: Reading Real Data Files • 9:02
OOP Essentials: Why Sklearn Works the Way It Does • 21:48
Virtual Environments and Colab Setup • 5:53
Python Refresher: Test Your Knowledge

Requirements

Basic Python knowledge: you should know what functions, loops, and lists are
No prior data science or ML experience needed
A Google account (all work is done in free Google Colab — no local setup required)
Willingness to run real code on real datasets every lesson

Description

Most students who fail their first ML course fail not because the algorithms are hard, but because they can't read the data, clean it, or understand what the model is operating on.

This course fixes that. You'll build the exact Python foundation that every professional data scientist uses before touching a single algorithm: NumPy arrays, Pandas wrangling, Matplotlib visualisations, Scikit-Learn pipelines, PyTorch training loops, and statistical thinking.

Every dataset in this course comes from a real-world problem in agriculture, finance, or public health, so the skills feel immediately practical rather than textbook-abstract and you are never practising on made-up numbers. You'll learn to handle the kind of messy, incomplete data that actually shows up on the job.

By the end of this course you will be able to: load any real-world dataset, clean and wrangle it with Pandas, visualise it for EDA, build a full Scikit-Learn preprocessing pipeline, write a PyTorch training loop from scratch, and apply the right statistical test to support your modelling decisions.
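
To make the pipeline outcome concrete, here is a minimal sketch of a Scikit-Learn preprocessing pipeline that avoids data leakage. It assumes synthetic, randomly generated features rather than any course dataset. The step names ("scale", "model") and the choice of StandardScaler with LogisticRegression are illustrative assumptions, not the course's own pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
y = (X[:, 0] + X[:, 1] > 100.0).astype(int)

# Split BEFORE any preprocessing so the test rows stay unseen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# The pipeline fits the scaler on the training rows only, then reuses
# those training statistics when it transforms the test rows.
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipeline.fit(X_train, y_train)

accuracy = pipeline.score(X_test, y_test)
scaler = pipeline.named_steps["scale"]
```

The fitted scaler's mean_ matches the training-set column means, not the full-dataset means. This is how you can confirm that no test information leaked into preprocessing.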

This is not a detour from machine learning. This is the ML infrastructure. Students who complete this course are far better prepared to finish an ML course than students who skip it.

Each module comes with a downloadable cheatsheet and a hands-on Colab notebook with exercises and solutions — so you are not just watching videos, you are building a personal reference library you will use for years. Everything runs in free Google Colab. No paid software, no complex local setup, no excuses.

Who this course is for:

Python developers who want to transition into data science or ML
Students who have tried an ML course and felt lost when the data got messy
Self-taught programmers building a formal data science foundation
Anyone working with agricultural, financial, or survey data
Engineers enrolling in the companion ML & Deep Learning course

Course content

Python Refresher for Data Science • 4 lectures • 45min
NumPy: The Engine Under Everything • 5 lectures • 1hr 5min
Pandas: Data Wrangling Mastery • 7 lectures • 1hr 37min
Matplotlib & Seaborn: Visualisation for ML • 4 lectures • 46min
Scikit-Learn Foundations: Before the Algorithms • 4 lectures • 44min
PyTorch Foundations: Before the Neural Networks • 5 lectures • 1hr 7min
SciPy & Statistics: Understanding Your Data • 3 lectures • 30min
