
Introduces the course and outlines a three-part structure: general maths and constraints, tutorial numerical tasks, and coding videos for implementing machine learning algorithms.
Install Anaconda distribution on your machine to access over fifteen hundred Python packages and simplify package management and deployment, using the Navigator and Jupyter Notebook for machine learning projects.
Set up a Python virtual environment with Anaconda, activate a project-specific environment, and install key machine learning libraries—NumPy, pandas, Matplotlib, seaborn, and scikit-learn—for reproducible workflows.
Explore the notebook interface, learn to run Python code, switch between code and markdown cells, manage kernels, and render LaTeX for machine learning projects.
Define artificial intelligence as building smart machines that perform tasks requiring human intelligence, with weak versus strong AI, and position machine learning as the AI domain that learns from experiences.
Explore the fundamentals of machine learning, including supervised and unsupervised learning, and learn how classification and regression build predictive models from labeled data while unsupervised methods work with unlabeled data.
Discover numerical data with continuous and discrete values, categorical data with finite categories needing encoding, plus time series and textual data in machine learning.
Explore structured datasets from Kaggle to identify input features and target outcomes, understand rows and columns, and practice splitting data into input and target for regression and classification models.
Learn data preprocessing for machine learning by importing pandas, numpy, and matplotlib, loading and inspecting a dataset, handling missing values, and selecting X and y to reduce dimensionality.
Preprocess data for regression and classification by filling missing values with min and mean strategies, encoding categorical yes/no values with a label encoder, and standardizing salary for consistent feature scales.
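A minimal sketch of these preprocessing steps, assuming a small made-up table. The column names `age`, `salary`, and `purchased` and all values are hypothetical, and a plain dictionary mapping stands in for a label encoder:

```python
import pandas as pd

# Hypothetical toy data; column names and values are illustrative assumptions.
df = pd.DataFrame({
    "age": [25.0, None, 40.0, 35.0],
    "salary": [30000.0, 50000.0, None, 70000.0],
    "purchased": ["yes", "no", "yes", "no"],
})

# Fill missing values with the column mean (a min fill works the same way
# by swapping .mean() for .min()).
df["age"] = df["age"].fillna(df["age"].mean())
df["salary"] = df["salary"].fillna(df["salary"].mean())

# Encode the yes/no column as integers, as a label encoder would.
df["purchased"] = df["purchased"].map({"no": 0, "yes": 1})

# Standardize salary to zero mean and unit variance (population std, ddof=0).
df["salary_scaled"] = (df["salary"] - df["salary"].mean()) / df["salary"].std(ddof=0)
```

After these steps the frame has no missing values, and `salary_scaled` is on a scale comparable to other standardized features.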
Learn how to split data into training and testing sets using an 80/20 split, set a random state for repeatable results, and evaluate model performance on unseen data.
Install and import NumPy, the core numerical computing library behind most Python machine learning tools, then create and manipulate arrays for practical computations. Explore array operations, mathematical functions, and reshaping to multidimensional structures.
Master the pandas module for data manipulation in machine learning, creating series and data frames, loading datasets, preprocessing data, and in-place operations like renaming columns and dropping duplicates.
Encode categorical data by converting text into numeric features so models can learn from inputs. Explore one-hot encoding, dummy variables, and label encoding for finite class categories.
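The three encodings can be compared side by side. This is a sketch using pandas only; the `state` column and its values are made up for illustration:

```python
import pandas as pd

# Hypothetical categorical column.
df = pd.DataFrame({"state": ["New York", "California", "Florida", "California"]})

# One-hot encoding: one 0/1 column per category.
one_hot = pd.get_dummies(df["state"], prefix="state", dtype=int)

# Dummy variables: drop the first category to avoid a redundant column.
dummies = pd.get_dummies(df["state"], prefix="state", drop_first=True, dtype=int)

# Label encoding: one integer code per category (alphabetical order here).
labels = df["state"].astype("category").cat.codes
```

One-hot and dummy encoding avoid implying an order between categories, whereas label encoding is compact but assigns arbitrary integer ranks.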
Split data into training and testing to evaluate a model on unseen data, preventing inflated accuracy. Use a specified test_size and random_state to ensure repeatable training and testing.
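As a rough illustration, scikit-learn's `train_test_split` performs this split. The toy arrays and the `random_state` value of 42 are assumptions chosen for the example:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 10 samples with 2 features each.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Repeating the call with the same random_state gives the same split.
_, X_test_again, _, _ = train_test_split(X, y, test_size=0.2, random_state=42)
```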
Explore dimensionality and its impact on model accuracy, and distinguish underfitting from overfitting with linear and nonlinear data. Learn dimensionality reduction, cross-validation, and feature selection for regression and classification.
Build a classification decision tree using entropy and information gain. Identify root and leaf nodes, and use splits to achieve pure nodes.
Apply information gain and entropy to evaluate splits in a decision tree, identify the most informative split, and anticipate Gini impurity in upcoming topics.
Explore building a decision tree for real-world data, using entropy and information gain to split nodes and achieve pure classifications of outcomes.
Learn to compute entropy for dataset attributes, build weighted entropy, and use information gain to select the best split in a numerical decision tree.
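The calculation can be sketched in a few lines of NumPy. The helper names `entropy` and `information_gain` and the yes/no labels are illustrative, not taken from the course material:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy (in bits) of a collection of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    n = len(parent)
    weighted = sum(len(c) / n * entropy(c) for c in children)
    return entropy(parent) - weighted

# Hypothetical split of 10 samples into two child nodes.
parent = ["yes"] * 5 + ["no"] * 5
left = ["yes"] * 4 + ["no"] * 1
right = ["yes"] * 1 + ["no"] * 4

gain = information_gain(parent, [left, right])
```

The attribute whose split yields the largest gain is the one chosen for the node; a pure child node contributes zero entropy.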
Learn how to build a decision tree using Gini impurity to minimize impurity and select splits from the root node, comparing it to information gain in a heart disease dataset.
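For comparison with entropy, here is a minimal sketch of Gini impurity and its weighted form for a candidate split. The function names and the 0/1 labels are hypothetical:

```python
import numpy as np

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(1.0 - (p ** 2).sum())

def weighted_gini(children):
    """Size-weighted Gini impurity of the child nodes of a split."""
    n = sum(len(c) for c in children)
    return sum(len(c) / n * gini(c) for c in children)

# Hypothetical split: 1 = disease, 0 = no disease.
left = [1, 1, 1, 0]
right = [0, 0, 0, 1]
split_impurity = weighted_gini([left, right])
```

The split with the lowest weighted impurity is preferred, which plays the same role as the highest information gain.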
In this workshop, learn to create a synthetic classification dataset from scratch (500 samples, 4 features, 3 classes), then implement and visualize a decision tree classifier using Python libraries.
Apply a decision tree classifier to classify data using entropy or Gini impurity criteria, train with fit, evaluate with tenfold cross-validation, and interpret accuracy and the confusion matrix.
Implement a three-class decision tree classifier, evaluate with confusion matrices, accuracy, precision, and recall, and visualize results with a seaborn heatmap and classification report.
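The workshop steps above might look roughly like the following with scikit-learn. The `random_state` values, the 80/20 split, and the `n_informative` and `n_redundant` settings are assumptions made for this sketch:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic dataset: 500 samples, 4 features, 3 classes.
X, y = make_classification(
    n_samples=500, n_features=4, n_informative=3, n_redundant=0,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# criterion can be "entropy" or "gini".
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

cm = confusion_matrix(y_test, y_pred)        # 3x3 matrix of counts
acc = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)  # precision and recall per class
cv_scores = cross_val_score(clf, X, y, cv=10)   # tenfold cross-validation
```

The matrix `cm` can then be passed to `seaborn.heatmap(cm, annot=True)` to produce the heatmap mentioned above.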
Explore how linear regression uses training data to fit a line that predicts drug dose effectiveness, illustrating how higher doses relate to greater effectiveness and how the model generates predictions.
Examine why linear regression fails for nonlinear data, and learn how regression trees predict continuous values with leaf nodes containing values.
Learn how regression trees use decision nodes and leaf means to predict numerical outcomes, guided by variance reduction and sum of squared residuals, with dosage data as an example.
Explore how a regression tree selects decision thresholds, uses mean values at nodes, and minimizes sum of squared residuals to predict outcomes.
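A sketch of the threshold search for a single feature is shown below. The function name `best_threshold` and the dose/effect values are made up for illustration:

```python
import numpy as np

def best_threshold(x, y):
    """Try midpoints between sorted x values; return the threshold with the
    lowest sum of squared residuals (SSR) around each side's mean."""
    order = np.argsort(x)
    x, y = x[order], y[order]
    best_t, best_ssr = None, np.inf
    for i in range(len(x) - 1):
        if x[i] == x[i + 1]:
            continue  # no midpoint between identical values
        t = (x[i] + x[i + 1]) / 2
        left, right = y[: i + 1], y[i + 1:]
        ssr = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if ssr < best_ssr:
            best_t, best_ssr = t, ssr
    return best_t, best_ssr

# Hypothetical dosage data with a sharp jump in effectiveness.
dose = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
effect = np.array([0.0, 0.0, 0.0, 100.0, 100.0, 100.0])

threshold, ssr = best_threshold(dose, effect)
```

Each side of the chosen threshold predicts its own mean, which is what a leaf node of the regression tree stores.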
This workshop demonstrates building a decision tree regressor in Python, including pandas data loading, feature selection, train-test split, fitting, predicting, and evaluating with mean squared error.
Explore supervised learning basics, focusing on regression with a linear model that uses input X to predict continuous output y, and minimize prediction error on unseen data.
This lecture introduces ordinary least squares for simple linear regression, showing how to fit the line y = w0 + w1 x by minimizing the sum of squared residuals.
Learn how linear regression minimizes the sum of squared errors to find the global minimum, solving for the intercept w0 and the slope w1 through partial derivatives and the normal equations.
Compare regression outputs using MSE, MAE, and RMSE by measuring the distance between observed and predicted values. See how squaring errors punishes large mistakes and guides model evaluation.
Learn how r-squared serves as an evaluation metric for linear regression, comparing observed values to line-predicted values. See how it reflects the model's fit by explaining variance.
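The four metrics can be computed directly from their definitions. The observed and predicted values below are invented for the example:

```python
import numpy as np

# Hypothetical observed and predicted values.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.0, 5.0, 8.0, 11.0])

errors = y_true - y_pred
mse = (errors ** 2).mean()      # squaring penalizes large errors more
mae = np.abs(errors).mean()
rmse = np.sqrt(mse)             # same units as the target

# R-squared: share of the variance in y_true explained by the predictions.
ss_res = (errors ** 2).sum()
ss_tot = ((y_true - y_true.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
```

Here the single error of 2 contributes 4 of the 6 units of squared error, which is why MSE and RMSE react more strongly to it than MAE does.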
Apply ordinary least squares to fit simple linear regression, deriving the slope and intercept to produce the best-fit line for predicting grades from hours spent.
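The closed-form slope and intercept can be checked on a tiny dataset. The hours and grades below are hypothetical and chosen to lie exactly on a line so the result is easy to verify by hand:

```python
import numpy as np

# Hypothetical data lying exactly on grades = 50 + 2 * hours.
hours = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
grades = np.array([52.0, 54.0, 56.0, 58.0, 60.0])

x_mean, y_mean = hours.mean(), grades.mean()

# Slope: covariance of x and y divided by the variance of x.
w1 = ((hours - x_mean) * (grades - y_mean)).sum() / ((hours - x_mean) ** 2).sum()
# Intercept: the fitted line passes through the point of means.
w0 = y_mean - w1 * x_mean

predicted = w0 + w1 * 6.0  # predicted grade for 6 hours
```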
Explore simple linear regression by modeling salary as a function of years of experience, deriving the slope and intercept. Evaluate predictions with RMSE and residuals on a salary dataset.
Explore how multiple linear regression extends simple regression by using several inputs to predict profit from R&D spend, administration, and marketing spend, with beta coefficients and assumptions.
Explore multiple linear regression with several independent variables, learn about beta coefficients and the intercept, and apply it to predict profit from factors like R&D spend, administration, and marketing spend.
Learn to implement multiple linear regression with one-hot encoding of state categories, split the data for training and testing, and evaluate with mean absolute error and root mean squared error.
Explore polynomial regression and its use for non-linear relationships, building on simple and multiple linear regression to predict salary from level in real data.
Implement polynomial regression to fit nonlinear data by creating polynomial features and using linear regression on higher-order terms; evaluate fit with R2 and RMSE across varying degrees.
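One way to sketch this with scikit-learn is to expand the input with `PolynomialFeatures` and fit an ordinary `LinearRegression` on the result. The level/salary values are synthetic (an exact quadratic), so degree 2 should fit almost perfectly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data: salary grows quadratically with level (assumed for the demo).
level = np.arange(1, 11, dtype=float).reshape(-1, 1)
salary = 1000.0 * level.ravel() ** 2 + 5000.0

results = {}
for degree in (1, 2):
    # Expand level into [1, level, level^2, ...] up to the given degree.
    X_poly = PolynomialFeatures(degree=degree).fit_transform(level)
    model = LinearRegression().fit(X_poly, salary)
    pred = model.predict(X_poly)
    rmse = float(np.sqrt(np.mean((salary - pred) ** 2)))
    results[degree] = (r2_score(salary, pred), rmse)
```

Comparing `results[1]` with `results[2]` shows R2 rising and RMSE falling once the squared term is available to the linear model.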
Explore multivariate linear regression and how gradient descent minimizes the cost function to find optimal parameters, contrasting it with ordinary least squares.
Explore gradient descent for multivariate linear regression, compute partial derivatives of the cost function with respect to theta, and update using a learning rate alpha to approach the global minimum.
Implement multivariate linear regression with two features, load and split data into x and y, normalize features, and apply the model theta0 + theta1 x1 + theta2 x2 to predict the target.
Explore normalization for multivariate linear regression, compute mean squared error cost, and apply gradient descent to minimize it toward the global minimum with a bias term.
Demonstrates gradient descent for a multivariate regression model, initializes theta, uses a learning rate across 400 iterations, and visualizes cost reduction with a histogram and plot.
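A from-scratch sketch of this procedure follows. The synthetic data, the true coefficients, and the learning rate of 0.1 are assumptions for the example; only the 400 iterations come from the description above:

```python
import numpy as np

# Synthetic noise-free data with two features (assumed for the demo).
rng = np.random.default_rng(0)
X_raw = rng.uniform(0, 10, size=(50, 2))
y = 4.0 + 3.0 * X_raw[:, 0] - 2.0 * X_raw[:, 1]

# Normalize features, then prepend a column of ones for the bias term theta0.
X_norm = (X_raw - X_raw.mean(axis=0)) / X_raw.std(axis=0)
X = np.column_stack([np.ones(len(X_norm)), X_norm])

def cost(X, y, theta):
    """Mean squared error cost, halved as is conventional."""
    residual = X @ theta - y
    return (residual @ residual) / (2 * len(y))

theta = np.zeros(3)
alpha, iterations = 0.1, 400
cost_history = []
for _ in range(iterations):
    gradient = X.T @ (X @ theta - y) / len(y)  # partial derivatives w.r.t. theta
    theta -= alpha * gradient
    cost_history.append(cost(X, y, theta))
```

Plotting `cost_history` against the iteration number (for example with Matplotlib) shows the cost falling toward its minimum.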
Explore how logistic regression solves classification problems, handling binary and multiclass tasks by predicting bounded outcomes, and compare it with linear regression, highlighting gradient descent and hypothesis representation.
Explore the hypothesis representation of the logistic function and why linear regression fails for binary classification; logistic regression yields outputs between 0 and 1 and uses a 0.5 threshold.
Learn how logistic regression uses the sigmoid function to map linear combinations of features to a bounded 0–1 probability, enabling binary classification and thresholding at 0.5.
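The mapping from a linear combination to a thresholded prediction can be sketched as follows. The weights, bias, and inputs are arbitrary values picked for illustration:

```python
import numpy as np

def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical weights, bias, and three input rows with two features each.
weights = np.array([1.5, -0.5])
bias = -1.0
X = np.array([[2.0, 1.0], [0.0, 2.0], [1.0, 3.0]])

z = X @ weights + bias                           # linear combinations: 1.5, -2.0, -1.0
probabilities = sigmoid(z)                       # bounded between 0 and 1
predictions = (probabilities >= 0.5).astype(int)  # threshold at 0.5
```

A probability of exactly 0.5 corresponds to `z = 0`, so the 0.5 threshold is equivalent to checking the sign of the linear combination.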
Learn how logistic regression uses a decision boundary to separate two classes, illustrated by tumor examples, and how gradient descent optimizes weights and bias to classify new data.
Learn about the cost function of logistic regression, introducing the sigmoid hypothesis and cross-entropy loss to avoid local minima and guide training with gradient descent.
Explore logistic regression with gradient descent, using cross-entropy cost to converge to the global minimum by updating weights and bias with an adaptive learning rate.
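The training loop can be sketched from scratch as below. Note that this sketch uses a fixed learning rate of 0.1 rather than an adaptive one, and that the two-cluster data, the 500 iterations, and the helper names are all assumptions for the example:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, p, eps=1e-12):
    """Binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

# Synthetic data: two well-separated clusters, labeled 0 and 1.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, size=(50, 2)),
               rng.normal(2.0, 1.0, size=(50, 2))])
y = np.concatenate([np.zeros(50), np.ones(50)])

w, b = np.zeros(2), 0.0
alpha = 0.1  # fixed learning rate (assumed)
losses = []
for _ in range(500):
    p = sigmoid(X @ w + b)
    # Gradients of the cross-entropy cost with respect to weights and bias.
    w -= alpha * (X.T @ (p - y) / len(y))
    b -= alpha * np.mean(p - y)
    losses.append(cross_entropy(y, sigmoid(X @ w + b)))

accuracy = np.mean((sigmoid(X @ w + b) >= 0.5).astype(int) == y)
```

Because the cross-entropy cost is convex in the weights and bias, the recorded `losses` decrease steadily toward a single minimum rather than getting stuck in a local one.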
Implement a logistic regression model to classify whether an employee will leave the company, using a sigmoid output and cross-entropy loss while identifying key features such as monthly hours.
Visualize data to identify key predictors, encode salary categories with dummy variables, select four features, and build a logistic regression model evaluated with a confusion matrix.
Machine learning is a branch of artificial intelligence (AI) focused on building applications that learn from data and improve their accuracy over time without being explicitly programmed to do so.
In data science, an algorithm is a sequence of statistical processing steps. In machine learning, algorithms are 'trained' to find patterns and features in massive amounts of data in order to make decisions and predictions based on new data. The better the algorithm, the more accurate the decisions and predictions will become as it processes more data.
Machine learning has led to some amazing results, like being able to analyze medical images and predict diseases on par with human experts.
Google's AlphaGo program was able to beat a world champion in the strategy game Go using deep reinforcement learning.
Machine learning is even being used to program self-driving cars, which is going to change the automotive industry forever. Imagine a world with drastically reduced car accidents, simply by removing the element of human error.
Topics covered in this course:
1. Lecture on Information Gain and Gini Impurity [decision trees]
2. Numerical problems on Decision Trees, solved in tutorial sessions
3. Implementing a Decision Tree Classifier in a workshop session [coding]
4. Regression Trees
5. Implementing a Decision Tree Regressor
6. Simple Linear Regression
7. Tutorial on the cost function and a numerical walkthrough of the Ordinary Least Squares algorithm
8. Multiple Linear Regression
9. Polynomial Linear Regression
10. Implementing Simple, Multiple, and Polynomial Linear Regression [coding session]
11. Writing Multivariate Linear Regression code from scratch
12. Learning the Gradient Descent algorithm
13. Lecture on Logistic Regression [decision boundary, cost function, gradient descent, ...]
14. Implementing Logistic Regression [coding session]