
Explore why data is the new oil, how targeted ads and data science drive business value, and a wide range of artificial intelligence applications transforming industries.
Discover how deep learning fuels breakthroughs in computer vision, NLP, and recommendations by learning complex nonlinear patterns from large data, while highlighting training time and data needs.
Explore the roles of data analysts, engineers, and scientists, from transforming data into insights and reports to building ETL pipelines and deploying machine-learning-driven predictive models in production.
Data science follows a full process—from data collection and cleaning to exploration, modeling, interpretation, deployment, and ongoing monitoring—emphasizing real-world data quality and business communication.
Download and extract the course code from the resources, then upload the files to Google Drive. Open the Python notebooks in Google Colaboratory and run code blocks to view outputs.
Discover why Python powers data science: readable code, an interpreted, high-level design, and extensible libraries for machine learning.
Explore Python basics with a hands-on crash course on variables, types, and simple operations, including printing, type inspection, and string concatenation in a Google Colab notebook.
Explore lists and dictionaries in Python, learn indexing, slicing, appending, and length checks, and build nested dictionaries to store and access complex data.
Explore how to implement Python conditional statements, including if, else, and elif, with booleans and range checks using and/or, while mastering indentation and whitespace for proper code flow.
Explore Python loops, including the range function, for and while loops, and break statements, with hands-on examples on iteration, incrementing variables, and collecting values.
Explore Python functions by building reusable code, from a circle area function using pi and radius to a cylinder volume function with multiple inputs, defaults, and readability notes.
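As a rough illustration of the two functions mentioned above, here is a minimal sketch. The function names and the default height of 1.0 are assumptions made for this example, not the course's own code.

```python
import math


def circle_area(radius):
    """Return the area of a circle: pi * r^2."""
    return math.pi * radius ** 2


def cylinder_volume(radius, height=1.0):
    """Return the volume of a cylinder: base area * height.

    The default height of 1.0 is an assumed value for illustration.
    """
    return circle_area(radius) * height


print(circle_area(2))                 # area of a circle with radius 2
print(cylinder_volume(2, height=5))   # volume with an explicit height
print(cylinder_volume(2))             # volume using the default height
```

Reusing `circle_area` inside `cylinder_volume` keeps the formula for the base in one place, which is the kind of readability point the lesson refers to.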
Explore pandas, a Python library that enables high-performance data frames for easy data manipulation and analysis, turning raw tables into organized structures with indices, columns, and grouped averages.
Learn to load, save, and convert data frames with pandas, inspect with head and describe, and index, slice, and select columns using iloc and loc on the Titanic dataset.
Explore pandas data frame filtering to select passengers by criteria, including boolean indexing and multiple conditions, plus checking for missing cabins and combining filters for age and cabin data.
Learn essential data cleaning in pandas by loading datasets with proper encoding, renaming columns, reordering fields, and handling missing or numeric data using string operations.
Learn to engineer features with pandas by using apply and lambda to create a family size feature from existing columns, illustrated with a Titanic dataset and several methods.
Explore concatenating and joining data in pandas with concat, append, and merge, including axis handling, key alignment, duplicates, and left, right, and full outer joins.
Learn to create hourly time series data with pandas by generating a date range, converting to a dataframe, setting a datetime index, and performing resampling, mean aggregation, and date parsing.
Master advanced pandas techniques by replacing slow row-wise loops with row-wise apply and lambda calls or fully vectorized calculations, and by profiling each approach to optimize distance calculations across large data frames.
Explore map and apply functions in Python, compare pythonic and non-pythonic approaches, and use zip and filter to transform and select numbers (e.g., square and even filtering).
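The squaring and even-number filtering mentioned above might look like the following sketch. The input list is made up for the example.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Loop-based (less idiomatic) version: build the list step by step.
squares_loop = []
for n in numbers:
    squares_loop.append(n ** 2)

# map applies a function to every element.
squares_map = list(map(lambda n: n ** 2, numbers))

# filter keeps only the elements for which the function returns True.
evens = list(filter(lambda n: n % 2 == 0, numbers))

# zip pairs up elements from two sequences position by position.
pairs = list(zip(numbers, squares_map))

# A list comprehension is often the most readable alternative to map.
squares_comp = [n ** 2 for n in numbers]

print(squares_map)
print(evens)
print(pairs)
```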
Create map visualizations from scratch using Plotly, showing US state and county unemployment with FIPS-coded data, and global maps of life expectancy and GDP, with color scales and interactive features.
Explore heat maps, scatter plots, and lines with Plotly, using world maps, population-based marker sizes, continent colors, and country labels, plus line and great-circle flight visualizations.
Explore how statistics powers data analysis and forecasting in business, with a hands-on focus on descriptive and inferential statistics, risk, probability, correlation, and modeling.
Learn how descriptive statistics summarize data and reveal patterns through simple visualizations, and perform exploratory data analysis to ensure data quality and meaningful insights.
Engage in descriptive statistics and exploratory data analysis with visualizations using Seaborn plots, including joint distribution, density, and empirical cumulative distribution function plots, illustrated with Titanic and wine data.
Explore how sampling, averages, and variance can be used to lie with statistics, examine misleading polls, and learn how to determine appropriate sample sizes to represent a larger population.
Explore sampling theory with random and stratified sampling to determine sample sizes, achieve representative samples, and minimize sampling error, illustrated by wine data and large population insights for business analytics.
Explore variance, standard deviation, and Bessel’s correction to measure data dispersion, compare spread with range, and learn to compute these metrics using Python and pandas.
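To make Bessel's correction concrete, here is a small sketch in plain Python. The data values and the `ddof` parameter name are chosen for this example. Dividing by n gives the population variance, and dividing by n - 1 gives the sample variance.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values for illustration


def variance(values, ddof=0):
    """Variance with divisor (n - ddof); ddof=1 applies Bessel's correction."""
    mean = sum(values) / len(values)
    squared_deviations = sum((v - mean) ** 2 for v in values)
    return squared_deviations / (len(values) - ddof)


population_var = variance(data)          # divides by n
sample_var = variance(data, ddof=1)      # divides by n - 1
sample_std = math.sqrt(sample_var)
value_range = max(data) - min(data)      # a cruder measure of spread

print(population_var, sample_var, sample_std, value_range)

# The standard library offers both versions as well.
print(statistics.pvariance(data), statistics.variance(data))
```

In pandas, `Series.var()` and `Series.std()` use the n - 1 divisor by default, while NumPy's `var` and `std` default to dividing by n, so the two libraries give different numbers unless `ddof` is set explicitly.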
Explore covariance and correlation, learn how normalization yields a standardized correlation from -1 to 1, and follow Python examples with pandas and seaborn for heatmaps and pairwise plots.
Explore why a strong correlation between margarine consumption and divorce rates does not imply causation, revealing how spurious correlations fuel misinformation and how to interpret data responsibly.
Explore how normal distributions relate to mean, median, and variance, and see how the central limit theorem makes the sampling distribution of sample means approximately normal for large samples, regardless of the shape of the population.
Learn how z-scores measure how far a value lies from the mean in units of standard deviation, transform distributions to a standard normal, and use percentiles to compare exam performance.
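A short sketch of the z-score idea follows. The exam mean, standard deviation, and score are invented for the example, and the percentile step assumes the scores are normally distributed.

```python
from statistics import NormalDist


def z_score(value, mean, std_dev):
    """Distance of a value from the mean, in units of standard deviation."""
    return (value - mean) / std_dev


# Hypothetical exam: mean 70, standard deviation 10, student scored 85.
z = z_score(85, mean=70, std_dev=10)

# Under a normality assumption, the standard normal CDF converts a
# z-score into the fraction of students scoring at or below this value.
percentile = NormalDist().cdf(z)

print(f"z = {z:.2f}, percentile = {percentile:.1%}")
```

Because z-scores are unit-free, they allow scores from exams with different means and spreads to be compared on the same scale.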
Explore the fundamentals of probability, from 50/50 coin flips to measuring likelihoods, and apply theoretical and empirical methods to assess business risks and campaign outcomes.
Estimate probability empirically by counting outcomes over many trials, showing how more trials converge toward the true value, with coin toss and marble examples.
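The convergence described above can be simulated with a fair coin. This is a minimal sketch; the function name and the fixed seed are assumptions made so the run is repeatable.

```python
import random


def estimate_heads_probability(trials, seed=0):
    """Estimate P(heads) for a fair coin by counting heads over many tosses."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    heads = sum(rng.random() < 0.5 for _ in range(trials))
    return heads / trials


# The estimate tends to move closer to 0.5 as the number of trials grows.
for trials in (10, 100, 10_000, 100_000):
    print(trials, estimate_heads_probability(trials))
```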
Master the addition rule for probabilities, the sample space (omega), and mutually exclusive versus overlapping events; apply them to coins, dice, marbles, and cards, both with and without replacement.
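The addition rule can be checked by enumerating a standard 52-card deck. The sketch below uses exact fractions; the helper names are assumptions for this example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # the sample space: 52 (rank, suit) pairs


def probability(event):
    """P(event) = favourable outcomes / total outcomes in the sample space."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))


def is_king(card):
    return card[0] == "K"


def is_queen(card):
    return card[0] == "Q"


def is_heart(card):
    return card[1] == "hearts"


# Mutually exclusive events: P(A or B) = P(A) + P(B).
p_king_or_queen = probability(is_king) + probability(is_queen)

# Overlapping events: subtract the overlap so it is not counted twice.
p_king_and_heart = probability(lambda c: is_king(c) and is_heart(c))
p_king_or_heart = probability(is_king) + probability(is_heart) - p_king_and_heart

print(p_king_or_queen)
print(p_king_or_heart)
```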
Explore hypothesis testing by defining null and alternative hypotheses and understanding how significance guides decision making in business experiments, such as changing an e-commerce button color to affect sales.
Assess the strength and direction of a linear relationship with the Pearson correlation coefficient (r). Perform hypothesis testing at a significance level of alpha = 0.05 to determine whether the relationship between age and income is statistically significant.
Analyze A/B testing for marketing promotions with exploratory data analysis and visualizations to determine which promotion was most effective across stores, using age, market size, and promotion data.
Design a real-life A/B test using a hot dog example to define hypotheses, choose a testing metric, and assess sample size and statistical significance amid random variation.
Analyze an A/B test using a hot dog example to determine significance, compute confidence intervals and pooled standard error, and conclude when to reject the null hypothesis.
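One common way to carry out these steps is a two-proportion z-test, sketched below. The lesson's actual data and metric are not given here, so the purchase counts, the function name, and the choice of a conversion-rate metric are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def ab_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided two-proportion z-test with a confidence interval.

    conv_* are conversion counts and n_* are sample sizes (hypothetical inputs).
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b

    # Pooled standard error: under the null hypothesis both groups share one rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pool = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se_pool
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval for the difference uses the unpooled standard error.
    se_diff = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    diff = p_b - p_a
    ci = (diff - z_crit * se_diff, diff + z_crit * se_diff)

    return {"z": z, "p_value": p_value, "ci": ci, "reject_null": p_value < alpha}


# Made-up numbers: 100 of 1000 visitors bought with stand A, 150 of 1000 with stand B.
result = ab_test(100, 1000, 150, 1000)
print(result)
```

The null hypothesis is rejected when the p-value falls below alpha, which corresponds to a confidence interval for the difference that excludes zero in typical cases.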
Connect a data source in Google Data Studio with Google Sheets, build your first dashboard, and explore dimensions, metrics, data types, and field names for unique customer counts by country.
Develop a customer analytics dashboard by transforming a table into charts, focusing on profit, average profit, and purchase metrics like invoices, to reveal top customers with clear visuals.
Create new fields in your data table, calculating total profit and total sale price from quantity and unit price, and use pivot tables to identify your most valuable customers.
Explore scorecards and KPI visualizations, create metrics like number of purchases, total and average profit per sale, customize layouts and colors, and prepare for time comparisons in future lessons.
Learn to build a profit scorecard that compares last month to the previous month with up or down indicators, and explore week-to-week and year-to-year comparisons in Google Data Studio.
Learn to create and customize bar charts in Google Data Studio, including horizontal, vertical, and 100 percent stacked charts for quarterly profits by country, with filters and styling.
Create and customize line charts to visualize total sales and profit, explore time series, drill down by country, compare metrics with combo and stacked charts, and apply styling.
Explore creating time series and line charts, handle missing data with linear interpolation, build cumulative plots, and use date-based comparisons to analyze sales and profit.
Visualize proportional data using pie charts and tree maps to compare profits and sales across customers and countries, with filters and styling to highlight top contributors.
Learn to create geographic plots in Google Data Studio, using world map visuals with country, region, and continent options, and apply filters and metrics like total profit.
Use bullet graphs and area plots to compare values, show gaps to target, and display averages with ranges, then blend data from multiple data sources on a common field such as customer ID.
Learn how to share Google Data Studio reports with different permissions, manage link sharing, and control exports while using auto refresh with data sources like Google Analytics and Google Ads.
Discover how machine learning enables computers to learn from data by using training data to build models that predict outcomes like purchases or bankruptcy, and assess performance with accuracy.
Understand how a machine learning model maps input data to an output using weights and a bias, with linear equations and least squares.
Explore linear regression, modeling Y from X with a line (slope M, intercept B), and use gradient descent to minimize the mean squared error.
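The idea of fitting a slope and intercept by gradient descent on the mean squared error can be sketched in plain Python as follows. The toy data, learning rate, and epoch count are assumptions picked so this small example converges.

```python
def fit_line(xs, ys, learning_rate=0.05, epochs=2000):
    """Fit y = m*x + b by gradient descent on the mean squared error."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        predictions = [m * x + b for x in xs]
        # Partial derivatives of MSE = (1/n) * sum((pred - y)^2).
        grad_m = (2 / n) * sum((p - y) * x for p, y, x in zip(predictions, ys, xs))
        grad_b = (2 / n) * sum(p - y for p, y in zip(predictions, ys))
        # Step against the gradient to reduce the error.
        m -= learning_rate * grad_m
        b -= learning_rate * grad_b
    return m, b


# Toy data generated from y = 2x + 1, so the fit should recover m≈2, b≈1.
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
m, b = fit_line(xs, ys)
print(m, b)
```

A learning rate that is too large makes the updates diverge, and one that is too small needs many more epochs, which is why tuning it matters.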
Build linear regression from scratch in Python with a cost function and gradient descent, tune a learning rate, compare to scikit-learn's model, and predict Olympic 100m times.
Model nonlinear relationships with polynomial regression using polynomial features of a chosen order. Demonstrate multivariate linear regression with multiple inputs, like mpg from weight and horsepower, using Python.
Explore logistic regression as a binary classifier using a sigmoid to convert a linear score into a probability, define a decision boundary, and minimize a convex cost via gradient descent.
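The scoring step described above, where a linear score is passed through a sigmoid and then thresholded, might be sketched like this. The weights, bias, and feature values are hypothetical and would normally be learned by gradient descent.

```python
import math


def sigmoid(z):
    """Squash any real-valued score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(features, weights, bias):
    """Linear score w·x + b converted into a probability."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(score)


def predict_class(features, weights, bias, threshold=0.5):
    """Apply the decision boundary: class 1 if probability >= threshold."""
    return int(predict_probability(features, weights, bias) >= threshold)


# Hypothetical parameters, standing in for values found by training.
weights = [0.8, -0.4]
bias = -1.0

print(predict_probability([3.0, 1.0], weights, bias), predict_class([3.0, 1.0], weights, bias))
print(predict_probability([0.5, 2.0], weights, bias), predict_class([0.5, 2.0], weights, bias))
```

With a threshold of 0.5, the decision boundary is the set of points where the linear score equals zero.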
Discover how support vector machines use a hyperplane to maximize the margin between classes, with support vectors shaping the boundary and the kernel trick for nonlinear data.
Explore how decision trees split data with root and branch rules, using the Gini impurity and gain, then see how random forests boost accuracy via bootstrapping and majority vote.
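Gini impurity and the gain from a split can be computed in a few lines. This is a minimal sketch with made-up labels; the function names are assumptions for the example.

```python
from collections import Counter


def gini_impurity(labels):
    """1 - sum(p_k^2) over classes; 0 means the node is perfectly pure."""
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


def gini_gain(parent, left, right):
    """Impurity reduction: parent impurity minus weighted child impurity."""
    n = len(parent)
    weighted_children = (
        len(left) / n * gini_impurity(left)
        + len(right) / n * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


parent = [1, 1, 1, 0, 0, 0]

# A perfect split separates the classes completely.
print(gini_gain(parent, [1, 1, 1], [0, 0, 0]))

# A poor split leaves both children as mixed as the parent.
print(gini_gain(parent, [1, 0], [1, 1, 0, 0]))
```

A tree-building algorithm would evaluate candidate splits this way and keep the one with the largest gain.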
Learn how to assess machine learning performance using accuracy, confusion matrix, precision, recall, and F1 score across binary and multiclass problems, with real-world examples.
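For the binary case, the metrics listed above all follow from the four confusion-matrix counts. The sketch below uses invented labels and predictions; the function name is an assumption.

```python
def binary_metrics(y_true, y_pred):
    """Compute confusion-matrix counts and derived metrics for labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1,
    }


# Made-up ground truth and predictions for eight examples.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision asks how many predicted positives were correct, recall asks how many actual positives were found, and F1 is their harmonic mean.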
Explore how the receiver operating characteristic curve and area under the curve assess a model’s ability to distinguish two classes, using thresholds, true/false positives and negatives, and precision and recall.
Explore what makes a good model by balancing generalization and overfitting, understanding bias-variance tradeoffs, and using regularisation, cross-validation, dropout, and data augmentation to improve unseen data performance and handle outliers.
Explore neural networks as a black box that learns nonlinear mappings from inputs to outputs, enabling accurate image and digit classification through hidden layers and deep learning algorithms.
Explore main deep learning models—feed-forward nets, CNNs, RNNs, and LSTMs—and see how CNNs use convolution and feature maps to classify image data.
Data Science, Analytics & AI for Business & the Real World™ 2025
This is a practical course, the course I wish I had when I first started learning Data Science.
It focuses on understanding all the basic theory and programming skills required as a Data Scientist, but the best part is that it features 35+ Practical Case Studies covering so many common business problems faced by Data Scientists in the real world.
Right now, even in spite of the Covid-19 economic contraction, traditional businesses are hiring Data Scientists in droves!
And they expect new hires to have the ability to apply Data Science solutions to solve their problems. Data Scientists who can do this will prove to be one of the most valuable assets in business over the next few decades!
"Data Scientist has become the top job in the US for the last 4 years running!" according to Harvard Business Review & Glassdoor.
However, Data Science has a difficult learning curve - How does one even get started in this industry awash with mystique, confusion, impossible-looking mathematics, and code? Even if you get your feet wet, applying your newfound Data Science knowledge to a real-world problem is even more confusing.
This course seeks to fill the gaps in knowledge that scare off beginners while showing you how to apply Data Science and Deep Learning to real-world business problems.
This course has a comprehensive syllabus that tackles all the major components of Data Science knowledge.
Our Complete 2020 Data Science Learning path includes:
Using Data Science to Solve Common Business Problems
The Modern Tools of a Data Scientist - Python, Pandas, Scikit-learn, NumPy, Keras, Prophet, statsmodels, SciPy and more!
Statistics for Data Science in Detail - Sampling, Distributions, Normal Distribution, Descriptive Statistics, Correlation and Covariance, Probability, Significance Testing, and Hypothesis Testing.
Visualization Theory for Data Science and Analytics using Seaborn, Matplotlib & Plotly (Manipulate Data and Create Informative, Captivating Visualizations and Plots).
Dashboard Design using Google Data Studio
Machine Learning Theory - Linear Regressions, Logistic Regressions, Decision Trees, Random Forests, KNN, SVMs, Model Assessment, Outlier Detection, ROC & AUC and Regularization
Deep Learning Theory and Tools - TensorFlow 2.0 and Keras (Neural Nets, CNNs, RNNs & LSTMs)
Solving problems using Predictive Modeling, Classification, and Deep Learning
Data Analysis and Statistical Case Studies - Solve and analyze real-world problems and datasets.
Data Science in Marketing - Modeling Engagement Rates and Performing A/B Testing
Data Science in Retail - Customer Segmentation, Lifetime Value, and Customer/Product Analytics
Unsupervised Learning - K-Means Clustering, PCA, t-SNE, Agglomerative Hierarchical, Mean Shift, DBSCAN and E-M GMM Clustering
Recommendation Systems - Collaborative Filtering and Content-based filtering + Learn to use LightFM + Deep Learning Recommendation Systems
Natural Language Processing - Bag of Words, Lemmatizing/Stemming, TF-IDF Vectorizer, and Word2Vec
Big Data with PySpark - Challenges in Big Data, Hadoop, MapReduce, Spark, PySpark, RDD, Transformations, Actions, Lineage Graphs & Jobs, Data Cleaning and Manipulation, Machine Learning in PySpark (MLlib)
Deployment to the Cloud using Heroku to build a Machine Learning API
Our fun and engaging Case Studies include:
Sixteen (16) Statistical and Data Analysis Case Studies:
Predicting the US 2020 Election using multiple Polling Datasets
Predicting Diabetes Cases from Health Data
Market Basket Analysis using the Apriori Algorithm
Predicting the Football/Soccer World Cup
Covid Analysis and Creating Amazing Flourish Visualisations (Bar Chart Race)
Analyzing Olympic Data
Is Home Advantage Real in Soccer or Basketball?
IPL Cricket Data Analysis
Streaming Services (Netflix, Hulu, Disney Plus and Amazon Prime) - Movie Analysis
Pizza Restaurant Analysis - Most Popular Pizzas across the US
Micro Brewery and Pub Analysis
Supply Chain Analysis
Indian Election Analysis
Africa Economic Crisis Analysis
Six (6) Predictive Modeling & Classifiers Case Studies:
Figuring Out Which Employees May Quit (Retention Analysis)
Figuring Out Which Customers May Leave (Churn Analysis)
Who do we target for Donations?
Predicting Insurance Premiums
Predicting Airbnb Prices
Detecting Credit Card Fraud
Four (4) Data Science in Marketing Case Studies:
Analyzing Conversion Rates of Marketing Campaigns
Predicting Engagement - What drives ad performance?
A/B Testing (Optimizing Ads)
Who are Your Best Customers? & Customer Lifetime Values (CLV)
Four (4) Retail Data Science Case Studies:
Product Analytics (Exploratory Data Analysis Techniques)
Clustering Customer Data from Travel Agency
Product Recommendation Systems - Ecommerce Store Items
Movie Recommendation System using LightFM
Two (2) Time-Series Forecasting Case Studies:
Sales Forecasting for a Store
Stock Trading using Reinforcement Learning
Brent Oil Price Forecasting
Three (3) Natural Language Processing (NLP) Case Studies:
Summarizing Reviews
Detecting Sentiment in text
Spam Detection
One (1) PySpark Big Data Case Study:
News Headline Classification
One (1) Deployment Project:
Deploying your Machine Learning Model to the Cloud using Flask & Heroku