Udemy
Data Science with Python 3.x
Rating: 4.1 out of 5(15 ratings)
156 students

Gain useful insights from data by performing popular data science techniques using Python libraries
Last updated 6/2019
English

What you'll learn

  • Enhance your programming skills and master data exploration and visualization in Python
  • Learn multidimensional analysis and reduction techniques
  • Master advanced visualization techniques (such as heatmaps) for better analysis and rapidly broaden your understanding
  • Retrieve data from different data sources (CSV, JSON, Excel, PDF) and parse them in Python to give them a meaningful shape
  • Perform statistical analysis using in-built Python libraries
  • Understand the concept of block algorithms and how Dask leverages them to load large data
  • Implement various examples using Dask Arrays, Bags, and DataFrames for efficient parallel computing
  • Combine Dask with existing Python packages such as NumPy and Pandas
  • Implement an end-to-end Machine Learning pipeline in a distributed setting using Dask and scikit-learn
  • Visualize and gain insights into real-world datasets via different chart types using Matplotlib

Course content

4 sections • 143 lectures • 13h 37m total length
  • The Course Overview (4:38)

    This video gives you an overview of the course.

  • Basic Statistical Measures (7:39)

    Before moving on to the coding part of the course, we must lay the foundation of descriptive statistics, which will be used heavily throughout the course.

       •  Explore various statistical measures such as mean, median, and mode

       •  Understand the various properties of these measures

       •  Learn how to calculate these statistical measures
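
As a minimal sketch of how these three measures can be calculated, the snippet below uses Python's built-in statistics module. The sample scores are made up for illustration, and the choice of module is an assumption, not necessarily what the lecture uses.

```python
import statistics

# Hypothetical sample: exam scores for a small class.
scores = [70, 85, 85, 90, 100]

mean_score = statistics.mean(scores)      # arithmetic average
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value

print(mean_score, median_score, mode_score)
```

Here the mean (86) sits slightly above the median and mode (both 85) because the single score of 100 pulls the average upward.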

  • Variance and Standard Deviation (4:10)

    We must understand the importance of variance in data and how it ties in with measures of central tendency.

       •  Explore the concept of variance

       •  Visualize variance in data

       •  Understand how it depends on other statistical measures

  • Visualizing Statistical Measures (9:03)

    Once we have learned how to calculate these statistical measures, we move on to visualizing them in the form of graphs for better understanding.

       •  Explore the various graphs through which we can visualize the statistical measures

       •  Understand how the visualization changes as the values of these measures change

       •  Explore alternate graphs for visualizations
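
Variance and standard deviation, covered in the lectures above, describe spread rather than location. The sketch below compares two made-up samples that share the same mean. It assumes the population formulas (dividing by n); a lecture may instead use the sample formulas (dividing by n - 1).

```python
import statistics

# Two hypothetical samples with the same mean (50) but different spread.
tight = [48, 49, 50, 51, 52]
wide = [10, 30, 50, 70, 90]

results = {}
for name, data in (("tight", tight), ("wide", wide)):
    mean = statistics.mean(data)
    # Population variance: average squared distance from the mean.
    variance = statistics.pvariance(data)
    # Standard deviation is the square root of the variance.
    std_dev = statistics.pstdev(data)
    results[name] = (mean, variance, std_dev)
    print(f"{name}: mean={mean}, variance={variance}, std={std_dev:.2f}")
```

Both samples have the same mean, so only the variance reveals how differently the values are distributed around it.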

  • Calculating Percentiles (5:10)

    Percentiles allow us to interpret data in a more readable format. We will explore how they are calculated and what information they give regarding the dataset.

       •  Understand what iterators and the iterator protocol are

       •  Implement iterators in Python

       •  Implement generators in Python using the yield keyword

  • Quartiles and Box Plots (7:04)

    Once we are done with percentiles and how they can be calculated, we move on to the concept of Quartiles and how to visualize them using box plots.

       •  Understand the concept of Quartiles

       •  Visualize percentiles and Quartiles using box plots

       •  Get a better understanding of box plots
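
Percentiles and quartiles can be sketched with statistics.quantiles from the standard library. The data below is made up, and the "inclusive" interpolation method is an assumption; other tools may use slightly different conventions and give slightly different cut points.

```python
import statistics

# Hypothetical sample of nine observations.
data = [2, 4, 4, 5, 7, 8, 9, 11, 15]

# Quartiles split the sorted data into four equal parts.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# Percentiles split it into 100 parts; index 89 is the 90th percentile.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p90 = percentiles[89]

print(f"Q1={q1}, median={q2}, Q3={q3}, 90th percentile={p90}")
```

The three quartiles are the values a box plot draws as the bottom of the box, the line inside it, and the top of the box.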

  • Finding Missing Values (11:25)

    Most real-world datasets contain missing values for various reasons. In this video, we find out how to check whether our dataset has missing values using the Pandas library in Python.

       •  Explore the various reasons for the missing values in datasets

       •  Understand the various Pandas functions that can be used to find the missing values

       •  Learn about the different types of missing values and how Pandas does type conversion for them

  • Dealing with Missing Values (6:18)

    Once we have learned how to find missing values in the dataset, we move on to discussing the different ways to deal with missing values.

       •  First, we discuss why simply ignoring rows with missing values might not work

       •  Understand how we can impute missing values with measures of central tendency

       •  Demonstrate via an example how we can fill missing values based on other columns

  • Hands-on with Dealing with Missing Values (14:43)

    Now, we move on to using the Pandas library to deal with missing data.

       •  Explore the df.dropna function and its various parameters

       •  Explore the various ways of filling missing values via df.fillna, df.ffill, and df.bfill

       •  Implement an example in which we fill missing values based on values in other columns
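
The steps above can be sketched as follows. The DataFrame, its column names (pclass, age), and the choice of the median as the fill value are all hypothetical and chosen only for illustration.

```python
import pandas as pd

# Hypothetical data: two rows have no recorded age.
df = pd.DataFrame({
    "pclass": [1, 1, 3, 3],
    "age": [40.0, None, 20.0, None],
})

# Count missing values per column.
missing_per_column = df.isna().sum()

# Option 1: drop every row that contains a missing value.
dropped = df.dropna()

# Option 2: fill missing ages with the overall median age.
filled_median = df["age"].fillna(df["age"].median())

# Option 3: forward-fill, copying the last seen value downward.
filled_forward = df["age"].ffill()

# Option 4: fill based on another column, here the median age
# within each passenger class.
filled_by_class = df["age"].fillna(
    df.groupby("pclass")["age"].transform("median")
)
```

Dropping rows halves this small dataset, which is one reason imputing is often preferred when many rows have gaps.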

  • Case Study: Missing Data in Titanic Dataset (12:09)

    We need to apply the concepts that we have learned in this section to the real-world Titanic Dataset.

       •  Load the Titanic Dataset and explore the various columns

       •  Find out the descriptive statistics of the dataset

       •  Impute missing values in the dataset

  • What are Outliers? (5:22)

    Sometimes we might encounter values in our dataset which are abnormally high, low, or simply weird as compared to other values in the dataset. We must understand what outliers are and what causes them to occur.

       •  Understand what outliers are

       •  Understand the causes of outliers

       •  Explore via examples, the different types of outliers

  • Using Z-scores to Find Outliers (6:50)

    Z-scores are one of the most commonly used methods to identify outliers. In this video, we understand the idea behind Z-scores and how they can be used to identify outliers.

       •  Discuss what Z-scores are and what they signify

       •  Visualize Z-scores over a normal distribution for more clarity

       •  Implement Z-scores to find outliers in a dummy dataset
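
A minimal sketch of Z-score outlier detection on dummy data follows. The values and the cutoff of 3 are assumptions; 3 is a common convention, not a fixed rule.

```python
import statistics

# Hypothetical dummy data with one suspicious value (95).
values = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13, 12, 95]

mean = statistics.mean(values)
std_dev = statistics.pstdev(values)  # population standard deviation

# Z-score: how many standard deviations a value lies from the mean.
z_scores = [(v - mean) / std_dev for v in values]

# Flag values whose absolute Z-score exceeds the chosen cutoff.
THRESHOLD = 3
outliers = [v for v, z in zip(values, z_scores) if abs(z) > THRESHOLD]

print(outliers)
```

Note that the outlier itself inflates both the mean and the standard deviation, so in very small samples an extreme value may never reach the cutoff.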

  • Modified Z-scores (7:41)

    Z-scores are sometimes not very effective, since they rely on the mean and standard deviation, which are themselves affected by outliers. In this video, we use a modified version of the Z-score which is based on the median.

       •  Understand why Z-score might fail in some cases

       •  Understand the idea of Median, Standard Deviation, and Modified Z-scores

       •  Implement an example in which we find outliers using Modified Z-scores
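
One common formulation of the modified Z-score replaces the mean with the median and the standard deviation with the median absolute deviation (MAD). The sketch below assumes that formulation, with the conventional 0.6745 scaling constant and a cutoff of 3.5; the lecture may use different constants. The data is made up.

```python
import statistics

# Hypothetical dummy data with one suspicious value (95).
values = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13, 12, 95]

median = statistics.median(values)

# Median absolute deviation: the median distance from the median.
mad = statistics.median([abs(v - median) for v in values])

# 0.6745 rescales the MAD so that scores are comparable to ordinary
# Z-scores for normally distributed data. This sketch assumes mad > 0.
modified_z = [0.6745 * (v - median) / mad for v in values]

THRESHOLD = 3.5  # conventional cutoff, an assumption here
outliers = [v for v, m in zip(values, modified_z) if abs(m) > THRESHOLD]

print(outliers)
```

Because the median and MAD barely move when a single extreme value is added, the outlier stands out far more clearly than it does with ordinary Z-scores.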

  • Using IQR to Detect Outliers (8:46)

    Finally, we also learn how to use Interquartile Range (IQR) to detect outliers in a dataset and visualize them via box plots.

       •  Explore the concept of IQR and how it can be used to identify outliers

       •  Visualize IQR and outliers over a box plot

       •  Implement an example using IQR and box plots to detect outliers
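
The IQR rule can be sketched without any plotting. The data is made up, and the 1.5 multiplier is the usual box-plot convention, assumed here; it is the same rule that decides which points a box plot draws beyond its whiskers.

```python
import statistics

# Hypothetical dummy data with one suspicious value (95).
values = [10, 12, 11, 13, 12, 11, 10, 12, 11, 13, 12, 95]

# First and third quartiles (the edges of the box in a box plot).
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range: the height of the box

# Fences: anything outside them is treated as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]

print(f"IQR={iqr}, fences=({lower_fence}, {upper_fence}), outliers={outliers}")
```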

  • Types of Variables (17:25)

    Before moving on to analyzing the variables in a dataset, we must understand the different types of variables that might occur in it.

       •  Understand the different types of variables

       •  Explore the different types of numeric variables

       •  Explore the different types of categorical variables

  • Introduction to Univariate Analysis (6:27)

    Now that we have understood the different types of variables, let’s take a look at the different ways of analyzing variables using Python.

       •  Create dummy data for our analysis

       •  Implement code for plotting different types of graphs in Python

       •  Explore the different graphs and libraries available in Python

  • Skewness and Kurtosis (4:16)

    After learning about the various graphs that we can use to explore columns in Python, we must understand the concepts of skewness and kurtosis in statistics and how they affect the shape of a distribution.

       •  Understand what Skewness is

       •  Understand the idea behind Kurtosis

       •  Explore how Skewness and Kurtosis affect the shape of the curve
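
To make the two ideas concrete, here is a small sketch that computes both from their moment definitions. It assumes the population (biased) formulas and reports excess kurtosis (a normal distribution scores 0); libraries may offer other variants. The helper names and sample data are hypothetical.

```python
import statistics


def skewness(data):
    """Mean of the cubed standardized deviations (population formula)."""
    mean = statistics.fmean(data)
    std_dev = statistics.pstdev(data)
    return sum(((x - mean) / std_dev) ** 3 for x in data) / len(data)


def excess_kurtosis(data):
    """Mean of the fourth-power standardized deviations, minus 3."""
    mean = statistics.fmean(data)
    std_dev = statistics.pstdev(data)
    return sum(((x - mean) / std_dev) ** 4 for x in data) / len(data) - 3


symmetric = [1, 2, 3, 4, 5]       # balanced around its mean
right_skewed = [1, 1, 1, 2, 10]   # long tail toward large values

print(skewness(symmetric), skewness(right_skewed))
print(excess_kurtosis(symmetric))
```

A positive skewness indicates a longer right tail, a negative one a longer left tail, and a value near zero a roughly symmetric distribution.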

  • Univariate Analysis over Olympics Dataset (11:39)

    Finally, we will apply the different techniques that we have learned for Univariate Analysis over the Olympics Dataset.

       •  Explore the different columns in Olympics Dataset

       •  Draw density plots, histograms, and other graphs over various columns

       •  Find the skewness of the data using the SciPy module in Python

  • Introduction to Bivariate Analysis (2:25)

    Now that we have explored univariate analysis, we move ahead to bivariate analysis where we explore two variables at the same time.

       •  Understand what bivariate analysis is

       •  Understand how bivariate analysis helps us understand our data better

       •  List out various graphs used for bivariate analysis

  • Correlation Coefficient (4:21)

    Before moving on to doing practical bivariate analysis, we must understand the theoretical concept behind correlation coefficients.

       •  Explore the concept of correlation coefficient

       •  Understand the different types of correlation coefficient

       •  Understand what correlation coefficient signifies for our data
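
As one concrete instance, the Pearson correlation coefficient can be computed directly from its definition. The sketch below assumes Pearson's coefficient specifically (the lecture covers several types); the function name and the data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (proportional to the covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations (proportional to the variances).
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


hours = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # increases with hours
falling = [10, 8, 6, 4, 2]   # decreases with hours

print(pearson_r(hours, rising), pearson_r(hours, falling))
```

The result always lies between -1 and 1: values near 1 mean the two variables rise together, values near -1 mean one falls as the other rises, and values near 0 mean no linear relationship.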

  • Scatter Plots and Heatmaps (8:25)

    After understanding the theoretical concepts behind correlation coefficients, we now move on to visualizing correlation between two sets of variables.

       •  Implement code for positive and negative correlation

       •  Use the Seaborn library to visualize scatter plots

       •  Use heatmaps to visualize correlation between multiple pairs of columns at once

  • Bivariate Analysis: Titanic Dataset (8:32)

    In this video, we will apply various techniques of bivariate analysis over the Titanic Dataset.

       •  Load the Titanic Dataset

       •  Implement bivariate graphs using Seaborn

       •  Identify trends if they exist in the data

  • Bivariate Analysis: Video Game Sales (18:25)

    In this video, we will apply various techniques of bivariate analysis over the video game sales dataset.

       •  Load the video game sales dataset and understand the various columns

       •  Implement interactive graphs using the Bokeh library in Python

       •  Identify trends if they exist in the data using bivariate graphs

  • Introduction to Multivariate Analysis (3:01)

    Now that we have explored univariate and bivariate analysis, we move ahead to multivariate analysis where we explore more than two variables at the same time.

       •  Understand what multivariate analysis is

       •  Understand the various advantages of multivariate analysis

       •  Visualize a graph depicting multivariate analysis

  • Multivariate Analysis over Titanic Dataset (10:06)

    In this video, we will apply various techniques of multivariate analysis over the Titanic Dataset.

       •  Load the Titanic Dataset and find descriptive statistics of the various variables

       •  Implement multivariate graphs using Seaborn

       •  Identify trends if they exist in the data

  • Multivariate Analysis over Pokemon Dataset (18:57)

    In this video, we will apply various techniques of multivariate analysis over the Pokemon Dataset.

       •  Load the Pokemon Dataset and find descriptive statistics of the various variables

       •  Implement interactive graphs using Bokeh

       •  Identify trends if they exist in the data using multivariate graphs

  • Simpson’s Paradox (4:33)

    Simpson’s Paradox is a phenomenon that may occur in real-world data, leading to conflicting results. We understand why it happens and what we can do to prevent it.

       •  Understand what Simpson’s Paradox is

       •  Understand what causes it and how we can prevent it from happening

       •  Demonstrate Simpson’s Paradox using an example
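
The paradox is easiest to see with numbers. In the sketch below, all counts are hypothetical and chosen purely to exhibit the effect: treatment A has the higher success rate within each severity group, yet treatment B looks better once the groups are pooled.

```python
# (successes, patients) per treatment and severity group; hypothetical.
data = {
    "A": {"mild": (9, 10), "severe": (30, 90)},
    "B": {"mild": (80, 90), "severe": (3, 10)},
}


def rate(successes, total):
    """Success rate as a fraction between 0 and 1."""
    return successes / total


# Within each group, compare the two treatments.
for group in ("mild", "severe"):
    rate_a = rate(*data["A"][group])
    rate_b = rate(*data["B"][group])
    print(f"{group}: A={rate_a:.1%}, B={rate_b:.1%}")

# Pool the groups and compare again.
overall = {}
for treatment, groups in data.items():
    successes = sum(s for s, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    overall[treatment] = rate(successes, patients)

print(f"overall: A={overall['A']:.1%}, B={overall['B']:.1%}")
```

The reversal happens because A was given mostly to severe cases and B mostly to mild ones, so the pooled rates mainly reflect case severity rather than the treatment itself.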

  • Correlation Is Not Causation (4:46)

    This is one of the most widely misinterpreted phenomena in the real world. We understand why it happens and what we can do to prevent it.

       •  Understand why Correlation does not necessarily imply causation

       •  Understand what causes it and how we can prevent it from happening

       •  Demonstrate that correlation does not imply causation using various examples

  • Wine Data Analysis: Initial Setup (4:49)

    In this video, we will apply all the different techniques that we have learned in the previous sections to a real-world dataset.

       •  Download and load the dataset

       •  Explore the different variables in the dataset

       •  Create a set of questions that we will answer through our analysis

  • Red Wine Analysis (24:35)

    Here we will do Exploratory Data Analysis over Red Wine Data.

       •  Download and load the dataset

       •  Explore the different variables in the dataset

       •  Identify trends if they exist in the data

  • White Wine Analysis (21:49)

    In this video, we will do Exploratory Data Analysis over White Wine Data.

       •  Download and load the dataset

       •  Explore the different variables in the dataset

       •  Identify trends if they exist in the data

  • White Wine versus Red Wine: Analysis (18:20)

    Here, we will do a comparative analysis of how these wines differ from each other.

       •  Download and load the dataset

       •  Explore the different variables in the dataset based on the type of wines

       •  Identify trends if they exist in the data

  • Test your knowledge

Requirements

  • Basic knowledge of probability/statistics and Python coding experience will assist you in understanding the concepts covered in this course.

Description

Python is an open-source, community-supported, general-purpose programming language that, over the years, has also become one of the bastions of data science. Thanks to its flexibility and vast popularity, data analysis, visualization, and machine learning can be easily carried out with Python.

This practical course is designed to teach you how to perform data science tasks such as data analysis, data manipulation, and data visualization. You will begin by performing data analysis on real-world datasets. You will then work on large datasets and perform exploratory data analysis to investigate each dataset and draw findings from it. You will also learn to scale your data analysis and execute distributed data science projects, from data ingestion through data manipulation and visualization, using Dask. Next, you will explore Dask frameworks and see how Dask can be used with other common Python tools such as NumPy, Pandas, Matplotlib, scikit-learn, and more. Finally, you will perform data visualization using Python and Matplotlib 3.

By the end of this course, you will be able to use the power of Python to analyze data, create beautiful visualizations, and use powerful machine learning algorithms.

Meet Your Expert(s):

We have the best work of the following esteemed author(s) to ensure that your learning journey is smooth:

  • Mohammed Kashif works as a Data Scientist at Nineleaps, India, dealing mostly with graph data analysis. Prior to this, he worked as a Python developer at Qualcomm. He completed his Master's degree in Computer Science at IIT Delhi, with a specialization in data engineering. His areas of interest include recommender systems, NLP, and graph analytics. In his spare time, he likes to answer questions on Stack Overflow and help other people debug their way out of misery. He is also an experienced teaching assistant with a demonstrated history of working in the higher-education industry.


  • Jamshaid Sohail is a Data Scientist who is highly passionate about Data Science, Machine learning, Deep Learning, big data, and other related fields. He spends his free time learning more about the field and learning to use its emerging tools and technologies. He is always looking for new ways to share his knowledge with other people and add value to other people's lives. He has also attended Cambridge University for a summer course in Computer Science where he studied under great professors and would like to impart this knowledge to others. He has extensive experience as a Data Scientist in a US-based company. In short, he would be extremely delighted to educate and share knowledge with other people.


  • Harish Garg is a co-founder and software professional with more than 18 years of software industry experience. He currently runs a software consultancy that specializes in the data analytics and data science domain. He has been programming in Python for more than 12 years and has been using Python for data analytics and data science for 6 years. He has developed numerous courses in the data science domain and has also published a book involving data science with Python, including Matplotlib.

Who this course is for:

  • This course is for Python developers, data analysts, and IT professionals who wish to explore the world of data science by performing data analysis, data wrangling, data manipulation, and data visualization on their own datasets.