Data Science: Create Real World Projects

Name: Data Science: Create Real World Projects
Rating: 4.8 (14 reviews)

Learn about Data Science and Machine Learning with Python by Creating Super Fun Projects!

Created by Sachin Kafle

Last updated 4/2022

English

What you'll learn

Learn to create real-world data science and machine learning projects
Learn about different machine learning models and algorithms
Learn about the data science life cycle and apply its methodologies to create projects
Learn about different domains of data science: feature engineering, feature transformation, and model selection
Learn about Natural Language Processing
Learn about Artificial Intelligence and how to use it to solve data science problems

Course content

13 sections • 93 lectures • 19h 55m total length

Introduction • 1:48
Explore what data science is and how it solves real world problems, with foundational skills, feature engineering, model selection, and hands-on labs using open tools like Jupyter Notebook.

Install anaconda on your machine • 10:57
Install Anaconda on your machine to access over 1500 Python packages, including NumPy and pandas, and use conda to manage environments and launch Jupyter Notebook for data science.
Set up environment and Download Machine Learning Libraries • 14:06
Set up a Python virtual environment with Anaconda, install NumPy, pandas, Matplotlib, seaborn, and scikit-learn, and launch a Jupyter notebook to build machine learning projects.
Introduction to Jupyter Notebook • 22:40
Explore the Jupyter notebook interface, learn to write Python code in cells, switch between code and markdown, run and restart kernels, and export notebooks for machine learning projects.

Data Science Methodologies • 9:10
Explore the data science methodology and its ten parts, from business understanding to deployment, highlighting iterative data collection, preparation, modeling, evaluation, and feedback-driven backtracking.
CRISP-DM model • 6:41
Use the CRISP-DM data mining methodology to guide data science projects from business understanding to deployment, addressing big data challenges, data integration, governance, and cost-benefit analysis.
Phases of CRISP-DM • 4:04
Explore the data understanding phase of CRISP-DM: collect initial data from various sources, describe and explore it, verify data quality, and answer key questions before data preparation.
Phases of CRISP-DM part 2 • 3:06
Prepare data after understanding it by selecting relevant data, cleaning and encoding, extending and formatting for modeling, then apply AI and machine learning algorithms.
Phases of CRISP-DM part 3 • 6:50
Explore the CRISP-DM modelling phase by selecting an algorithm, defining modelling goals, configuring hyperparameters, training and evaluating the model, and planning deployment with monitoring.

Why to clean the data? • 3:29
Data cleaning matters in data science, as dirty data causes false conclusions and costly failures; prevention costs beat failure costs in real-world data.
Data Quality • 8:32
Assess data quality by comparing data against predefined constraints and considering business needs; determine validity, accuracy, completeness, consistency, and uniformity to decide appropriate cleaning.
Check if data is valid or not? • 13:04
Explore data quality by evaluating data validity against business-specific constraints, including data type, range, mandatory, uniqueness, and set membership, with cross-field validation examples for real-world data.
Check if data is accurate or not? • 3:11
Explore data accuracy within data quality by comparing measured values to true values, distinguishing accuracy from validity, and using examples like a thermometer and model predictions.
Completeness of the data • 3:13
Explore data completeness by identifying relevant data for a customer purchase model; learn how missing income or education data can reduce prediction accuracy.
Consistency of the data • 3:41
Assess data quality by checking the consistency of data across fields and attributes. Ensure fields agree with each other and recognize that names alone may not predict outcomes.
Uniformity of the data • 7:07
Assess data uniformity by identifying mixed units and currencies, convert to a single scale, and ensure data quality through validity, accuracy, completeness, and consistency to improve real-world projects.
How to ensure data quality • 5:46
Learn to ensure data quality through a continuous, iterative cleanup cycle: inspect data against validity, accuracy, completeness, consistency, and uniformity, then clean, verify, and report improvements.
Inspect the data • 2:39
Inspect data quality by profiling against constraints and visualizing with charts to spot outliers, using the scikit-learn library's preprocessing module to check issues.
Cleaning the data • 13:13
Cleanse data by removing irrelevant and duplicate records, and fix type mismatches. Impute or flag missing values, manage nonstandard values and outliers, and tailor cleanup steps to business needs.
Goal of data munging • 8:15
Transform raw, noisy data into meaningful, quality data through data wrangling to generate valid insights, improve machine learning model performance, and support proper decisions in real-world data science projects.
Understand your data • 7:34
Transform raw, unstructured data into structured formats through data wrangling, extracting invoice details from PDFs with OCR and Tesseract, and preparing data for machine learning projects.
Introduction to Outliers • 8:21
Identify how outliers depart from the main distribution with box plots, revealing positive and negative skewness in salary data, from modest earnings to billionaire incomes.
Finalize Data Munging • 12:08
Learn to tidy and normalize messy data by transforming rows and columns into clear observations, remove duplicates, and merge multiple tables through data wrangling for better analysis and preparation.
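
The validity constraints named in this section (range, mandatory, uniqueness, and set membership) can be illustrated with a minimal pandas sketch. The column names, the accepted age range, and the list of allowed country codes are hypothetical, chosen only for illustration; they are not taken from the course materials.

```python
import pandas as pd

# Hypothetical customer records containing several deliberate quality problems.
df = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "age": [34, -5, 51, 28],
    "country": ["NP", "US", "XX", None],
})

# Assumed business rule: only these country codes are acceptable.
allowed_countries = {"NP", "US", "IN"}

checks = pd.DataFrame({
    # Range constraint: age must fall inside a plausible interval.
    "range_ok": df["age"].between(0, 120),
    # Mandatory constraint: country must not be missing.
    "mandatory_ok": df["country"].notna(),
    # Uniqueness constraint: customer_id must not appear more than once.
    "unique_ok": ~df["customer_id"].duplicated(keep=False),
    # Set-membership constraint: country must be one of the allowed codes.
    "membership_ok": df["country"].isin(allowed_countries),
})

# Keep only the rows that pass every check.
valid_rows = df[checks.all(axis=1)]
print(checks)
print(valid_rows)
```

Keeping each rule in its own boolean column makes it easy to report which constraint a record violates before deciding how to clean it.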

Handle data type mismatch • 28:37
Learn how to detect and clean data type mismatches in real world data using pandas, mapping checks, and filters to ensure homogeneous column types.
Remove Duplicate data • 13:19
Identify and handle duplicate data using pandas to clean datasets before machine learning projects, employing duplicated and drop_duplicates on full and subset columns.
Handling missing data • 12:35
Identify and handle missing data in pandas by using the any function to locate gaps, count them with sum, and fill them with fillna.
Feature Importance • 16:44
Identify and rank feature importance to improve model accuracy by using gradient boosting regression and permutation importance, while preprocessing data, encoding categoricals, and selecting meaningful features.
Plot feature importance plot • 6:29
Identify permutation-based feature importance with gradient boosting regression and select features above a threshold. Train models on the reduced feature set and compare accuracy and efficiency.
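
The duplicate and missing-value steps described earlier in this section can be sketched with pandas as follows. The toy data and the choice of filling with the column mean are assumptions made for illustration; the course may use a different dataset and imputation strategy.

```python
import pandas as pd

# Hypothetical toy data: one fully duplicated row and one missing value.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Bob", "Cid"],
    "age": [31.0, 45.0, 45.0, None],
})

# duplicated() marks rows that repeat an earlier row.
dup_mask = df.duplicated()

# drop_duplicates() keeps the first occurrence of each row;
# pass subset=[...] to compare only selected columns.
deduped = df.drop_duplicates()

# isna().any() tells, per column, whether anything is missing;
# isna().sum() counts the missing values per column.
has_missing = deduped.isna().any()
missing_counts = deduped.isna().sum()

# fillna() replaces the gaps, here with the mean of the observed ages.
filled = deduped.fillna({"age": deduped["age"].mean()})
print(filled)
```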

Introduction to Feature Importance • 11:10
Learn feature transformation through normalization and standardization to scale numerical inputs for machine learning, reducing bias toward larger values and improving algorithms like linear regression and distance-based methods.
Data Normalization • 4:02
Compare data normalization and standardization as feature transformation methods—normalization maps data to a 0–1 range, while standardization offers flexible scaling for better model performance.
Data Standardization • 10:08
Standardize data with a standard scaler to zero mean and unit variance by subtracting the mean and dividing by the standard deviation, then check the distribution against a Gaussian shape with a histogram.
Normalization in practice • 11:49
Learn to apply normalization with a min-max scaler from scikit-learn, fitting and transforming data to a 0–1 range, then convert results back to a pandas dataframe for modeling.
Standardization in practice • 15:10
Explore standardization in practice using a standard scaler to center data by subtracting the mean and dividing by the standard deviation, and compare it with normalization to improve model performance.
Introduction to One Hot Encoding • 12:15
Learn how one hot encoding converts categorical variables into binary vectors by creating a column for each unique category, enabling machine learning models to use gender, color, and other categories.
One Hot Encoding in practice • 13:55
Apply one hot encoding to convert categorical variables into numerical features using a one hot encoder, transforming flight data from 2013 into training and testing datasets.
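
The three transformations covered in this section (min-max normalization, standardization, and one hot encoding) can be sketched with scikit-learn as follows. The salary, age, and color values are made up for illustration and are not the flight data used in the course.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder, StandardScaler

# Hypothetical numeric features on very different scales.
numeric = pd.DataFrame({
    "salary": [30000.0, 50000.0, 70000.0],
    "age": [25.0, 35.0, 45.0],
})

# Normalization: rescale every column to the 0-1 range,
# then wrap the resulting array back into a dataframe.
normalized = pd.DataFrame(
    MinMaxScaler().fit_transform(numeric), columns=numeric.columns
)

# Standardization: subtract the column mean and divide by the
# column standard deviation (zero mean, unit variance).
standardized = pd.DataFrame(
    StandardScaler().fit_transform(numeric), columns=numeric.columns
)

# One hot encoding: one binary column per unique category.
colors = pd.DataFrame({"color": ["red", "blue", "red"]})
encoder = OneHotEncoder()
encoded = encoder.fit_transform(colors).toarray()

print(normalized)
print(standardized)
print(encoder.categories_[0], encoded)
```

In a real project the scalers and the encoder would typically be fitted on the training data only and then applied to the test data, so that no information leaks from the test set.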

Types of data in Machine Learning • 8:41
Explore numerical data, including continuous and discrete types, with examples like height. Recognize time series, categorical, and textual data, and learn that categorical values must be encoded into numbers.
Structured format for datasets • 8:05
Explore structured datasets by identifying rows and columns, using features as inputs to predict a target in a machine learning workflow, and learn to download, explore, and visualize data.
Introduction to pandas library • 27:36
Learn the pandas library, built on numpy, for data preprocessing with series and dataframes. Master loading, inspecting, cleaning, and renaming data to prep real-world machine learning projects.
Train Test split Concept • 16:08
Use train-test split to separate data into training and testing sets, train on the training data, and evaluate on unseen testing data to measure model accuracy.
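
The train-test split idea can be sketched with scikit-learn's train_test_split. The 80/20 ratio, the random seed, and the synthetic arrays are assumptions for illustration only.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical dataset: 10 samples with 2 features each, plus one target per sample.
X = np.arange(20).reshape(10, 2)
y = np.arange(10)

# Hold out 20% of the rows as unseen test data; random_state makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

print(X_train.shape, X_test.shape)
```

A model is then fitted on X_train and y_train only, and its accuracy is measured on X_test and y_test.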

Decision Tree part 1 • 30:02
Learn to build a decision tree for classification and regression by splitting data with entropy and information gain, creating root and leaf nodes to predict outcomes.
Decision Tree part 2 • 24:44
Explore how information theory guides decision tree splits by measuring entropy, computing information gain, and choosing the split that maximizes gain, with pure and impure nodes explained.
Code: Decision Tree classifier • 29:16
Build a decision tree classifier by generating a 500-sample, four-feature dataset for a three-class classification problem. Prepare input features and labels in a pandas dataframe and visualize the tree.
Decision Tree: GINI index • 10:46
Learn to build decision tree classifiers using entropy, information gain, or gini impurity, implement with scikit-learn, and assess performance with cross-validation and confusion matrices in supervised learning.
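
The split criteria discussed in this section (entropy, information gain, and Gini impurity) can be computed by hand in a few lines. This is a minimal sketch for a two-way split that assumes non-empty label lists; the function names are illustrative and are not the course's or scikit-learn's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )

def gini(labels):
    """Gini impurity of a non-empty list of class labels."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def information_gain(parent, left, right):
    """Entropy of the parent minus the size-weighted entropy of the two children."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

parent = ["yes", "yes", "no", "no"]
# A split that separates the classes perfectly yields two pure nodes.
pure_gain = information_gain(parent, ["yes", "yes"], ["no", "no"])
# A split that leaves both children as mixed as the parent gains nothing.
no_gain = information_gain(parent, ["yes", "no"], ["yes", "no"])
print(entropy(parent), gini(parent), pure_gain, no_gain)
```

A decision tree evaluates candidate splits this way and picks the one with the highest gain (or the lowest weighted impurity).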

Introduction to Linear Regression • 15:51
Explore regression within supervised learning, distinguish it from unsupervised learning, and learn how linear regression predicts continuous outputs from input vectors, aiming to minimize error on future data.
Learn about OLS [Ordinary Least Squares] algorithm • 28:37
Explore how ordinary least squares finds the best-fit line in simple and multiple linear regression, predicting salary from experience with the line y = ω₁x + ω₀.
Introduction to working of Linear Regression • 32:09
Learn how linear regression finds the global minimum of the sum of squared errors using ordinary least squares, deriving the normal equations to estimate the intercept and slope.
Lecture: Introduction to MSE, MAE, RMSE • 12:20
In this data science course lecture, learn regression evaluation using mean squared error, root mean squared error, and mean absolute error to measure prediction accuracy.
Introduction to R squared • 10:55
Learn how R-squared evaluates linear regression by measuring how well the regression line fits data, akin to accuracy. The lecture illustrates variance, the fit, and a mouse size–weight example.
Implement Simple Linear Regression • 22:50
Implement simple linear regression to predict salary from years of experience, deriving the slope and intercept from data and minimizing residual error with a train-test split.
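
For simple linear regression, the ordinary least squares solution and the evaluation metrics from this section can be written directly with NumPy. This is a sketch under stated assumptions: the salary figures are invented so that they fall exactly on a line, the function names are illustrative, and the R-squared formula assumes the true values are not all identical.

```python
import numpy as np

def fit_simple_ols(x, y):
    """Closed-form OLS estimates (intercept, slope) for one input feature."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    slope = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    intercept = y_mean - slope * x_mean
    return intercept, slope

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R-squared for a set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    errors = y_true - y_pred
    mse = np.mean(errors ** 2)
    ss_total = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "mse": mse,
        "rmse": np.sqrt(mse),
        "mae": np.mean(np.abs(errors)),
        "r2": 1.0 - np.sum(errors ** 2) / ss_total,
    }

# Hypothetical data: salary grows by 5000 per year of experience from a base of 30000.
years = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = 30000.0 + 5000.0 * years

intercept, slope = fit_simple_ols(years, salary)
metrics = regression_metrics(salary, intercept + slope * years)
print(intercept, slope, metrics)
```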

Learn about Logistic Regression • 6:01
Explore logistic regression as a classification method, contrasting it with linear regression and explaining binary and multiclass problems, where predictions are bounded to finite classes.
Learn about Gradient Descent • 5:47
Explore gradient descent for logistic regression, minimize the cross entropy cost with learning rate tuning, and update weights and bias toward the global minimum. Compare with linear and multivariate regression.
Implement Logistic Regression part 1 • 18:34
Implement a line-by-line logistic regression for binary classification of whether an employee will leave, using sigmoid and cross-entropy loss, while exploring and visualizing data to identify key features.
Implement Logistic Regression part 2 • 30:12
Explore logistic regression with data exploration and visualization, encoding categorical salaries using dummy variables, feature selection, train-test split, and evaluation via confusion matrices for employee retention prediction.
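
The pieces named in this section (the sigmoid function, the cross-entropy cost, and gradient descent updates of the weights and bias) fit together as in the sketch below. The single-feature toy data, the learning rate, and the iteration count are assumptions for illustration; this is not the employee retention dataset or the course's exact code.

```python
import numpy as np

def sigmoid(z):
    """Map any real value into the (0, 1) interval."""
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, p, eps=1e-12):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    p = np.clip(p, eps, 1.0 - eps)
    return -np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p))

def fit_logistic(X, y, learning_rate=0.1, n_iters=1000):
    """Batch gradient descent on the cross-entropy cost."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)
        # Gradients of the mean cross-entropy with respect to w and b.
        grad_w = X.T @ (p - y) / n_samples
        grad_b = np.mean(p - y)
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b

# Hypothetical one-feature data: negative values belong to class 0, positive to class 1.
X = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])

w, b = fit_logistic(X, y)
probabilities = sigmoid(X @ w + b)
predictions = (probabilities >= 0.5).astype(int)
print(w, b, predictions, cross_entropy(y, probabilities))
```

The learning rate controls the step size: too large a value can overshoot the minimum, while too small a value makes convergence slow.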

Requirements

Basic knowledge of Python programming is essential
You should know programming topics such as functions, data structures, and object-oriented programming

Description

FAQ about Data Science:

What is Data Science?

Data science encapsulates the interdisciplinary activities required to create data-centric artifacts and applications that address specific scientific, socio-political, business, or other questions.

Let’s look at the constituent parts of this statement:

1. Data: Measurable units of information gathered or captured from activity of people, places and things.

2. Specific Questions: When seeking to understand a phenomenon (natural, social, or other), can we formulate specific questions for which an answer posed in terms of patterns observed, tested, and/or modeled in data is appropriate?

3. Interdisciplinary Activities: Formulating a question and assessing the appropriateness of the data and findings used to answer it require an understanding of the specific subject area. Deciding on the appropriateness of models, and of the inferences drawn from them based on the data at hand, requires an understanding of statistical and computational methods.

Why Data Science?

The granularity, size, and accessibility of data spanning the physical, social, commercial, and political spheres have exploded over the last decade or more.

According to Hal Varian, Chief Economist at Google:

“I keep saying the sexy job in the next ten years will be statisticians.”

“The ability to take data—to be able to understand it, to process it, to extract value from it, to visualize it, to communicate it—that’s going to be a hugely important skill in the next decades, not only at the professional level but even at the educational level for elementary school kids, for high school kids, for college kids.”

Course Organization

Section 1: Setting up Anaconda and Editor/Libraries

Section 2: Learning about Data Science Lifecycle and Methodologies

Section 3: Learning about Data preprocessing: Cleaning, normalization, transformation of data

Section 4: Some machine learning models: Linear/Logistic Regression

Section 5: Project 1: Hotel Booking Prediction System

Section 6: Project 2: Natural Language Processing

Section 7: Project 3: Artificial Intelligence

Section 8: Farewell

Who this course is for:

This course is for people who have some knowledge of programming and want to learn how to solve data science and machine learning problems
This course is for those who want to build a career in the field of data science and machine learning
This course is for those who want to learn data science thoroughly through feature engineering: cleaning data, transforming it, and feeding it to algorithms
This course is for those who want to learn Machine Learning and Artificial Intelligence by creating fun projects

Course content

Welcome to the Course: Start with Introduction • 1 lecture • 2min

Data Science Environment Setup • 3 lectures • 48min

Data Science Lifecycle/Methodology • 5 lectures • 30min

Introduction to Data Cleanup/Munging • 14 lectures • 1hr 40min

Cleaning data (Coding session): Feature Engineering • 5 lectures • 1hr 18min

Introduction to Feature Transformation • 7 lectures • 1hr 18min

Introduction to Machine Learning • 4 lectures • 1hr 1min

Introduction to Decision Tree • 4 lectures • 1hr 35min

Introduction to Linear Regression • 6 lectures • 2hr 3min

Introduction to Logistic Regression • 4 lectures • 1hr 1min