Data Science with Python 3.x

Name: Data Science with Python 3.x
Rating: 4.1 (15 reviews)

Gain useful insights from data by performing popular data science techniques using Python libraries

Created by Packt Publishing

Last updated 6/2019

English

What you'll learn

Enhance your programming skills and master data exploration and visualization in Python
Learn multidimensional analysis and reduction techniques
Master advanced visualization techniques (such as heatmaps) for better analysis and rapidly broaden your understanding
Retrieve data from different data sources (CSV, JSON, Excel, PDF) and parse them in Python to give them a meaningful shape
Perform statistical analysis using in-built Python libraries
Understand the concept of blocked algorithms and how Dask leverages them to load large data
Implement various examples using Dask Arrays, Bags, and Dask DataFrames for efficient parallel computing
Combine Dask with existing Python packages such as NumPy and Pandas
Implement an end-to-end Machine Learning pipeline in a distributed setting using Dask and scikit-learn
Visualize and gain insights into real-world datasets via different chart types using Matplotlib

Course content

4 sections • 143 lectures • 13h 37m total length

The Course Overview4:38
This video will give you an overview of the course.
Basic Statistical Measures7:39
Before moving on to the coding part of the course, we must lay the foundation of descriptive statistics which will be used heavily throughout the course.
• Explore statistical measures such as mean, median, and mode
• Understand the various properties of these measures
• Learn how to calculate these statistical measures
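As a minimal sketch of the three measures named above, the following uses Python's built-in `statistics` module. The `scores` list is a hypothetical sample, not data from the course.

```python
import statistics

# Hypothetical sample: exam scores for a small class.
scores = [70, 85, 85, 90, 100]

mean = statistics.mean(scores)      # arithmetic average of the values
median = statistics.median(scores)  # middle value of the sorted data
mode = statistics.mode(scores)      # most frequently occurring value

print(mean, median, mode)
```

The mean is pulled toward extreme values while the median is not, which is one reason the two are usually reported together.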
Variance and Standard Deviation4:10
Once we have learned how to calculate these statistical measures, we move on to visualizing them in the form of graphs for better understanding.
• Explore the various graphs through which we can visualize the statistical measures
• Understand how the visualizations change as the values of these measures change
• Explore alternate graphs for visualizations
Visualizing Statistical Measures9:03
We must understand the importance of variance in data and how it ties in with other measures of central tendency.
• Explore the concept of variance
• Visualize variance in data
• Understand how it depends on other statistical measures
Calculating Percentiles5:10
Percentiles allow us to interpret data in a more readable format. We will explore how they are calculated and what information they give regarding the dataset.
• Understand what are iterators and the iterator protocol
• Implement iterators in Python
• Implement generators in Python using the yield keyword
Quartiles and Box Plots7:04
Once we are done with percentiles and how they can be calculated, we move on to the concept of Quartiles and how to visualize them using box plots.
• Understand the concept of Quartiles
• Visualize percentiles and Quartiles using box plots
• Get a better understanding of box plots
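A small sketch of how percentiles and quartiles can be computed, assuming the standard-library `statistics.quantiles` function with the "inclusive" method. Other conventions exist, so a different tool may report slightly different cut points. The `data` list is a hypothetical sample.

```python
import statistics

# Hypothetical sample chosen so the cut points are easy to check by hand.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# n=4 splits the data into four equal parts and returns Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")

# n=100 returns the 99 percentile cut points; index 89 is the 90th percentile.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p90 = percentiles[89]

# The interquartile range is the height of the box in a box plot.
iqr = q3 - q1
print(q1, q2, q3, p90, iqr)
```

A box plot draws the box from Q1 to Q3 with a line at the median, so these three numbers are the values such a plot would display.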
Finding Missing Values11:25
Most real-world datasets contain missing values for various reasons. In this video, we find out how to check whether our dataset has missing values using the Pandas library in Python.
• Explore the various reasons for the missing values in datasets
• Understand the various Pandas functions that can be used to find the missing values
• Learn about the different types of missing values and how Pandas does type conversion for them
Dealing with Missing Values6:18
Once we have learned how to find missing values in the dataset, we move on to discussing the different ways to deal with missing values.
• First, we discuss why simply ignoring rows with missing values might not work
• Understand how we can impute missing values with measures of central tendency
• Demonstrate via an example how we can fill missing values based on other columns
Hands-on with Dealing with Missing Values14:43
Now, we move on to using the Pandas library to deal with missing data.
• Explore the df.dropna function and its various parameters
• Explore the various ways of filling missing values via df.fillna, df.ffill, and df.bfill
• Implement an example in which we fill missing values based on values in other columns
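The options listed above can be sketched as follows, assuming pandas and NumPy are installed. The `city` and `temp` columns are hypothetical and stand in for whatever dataset the video uses.

```python
import numpy as np
import pandas as pd

# Hypothetical data frame with two missing temperature readings.
df = pd.DataFrame({
    "city": ["A", "A", "B", "B"],
    "temp": [20.0, np.nan, 30.0, np.nan],
})

# Option 1: drop every row whose "temp" value is missing.
dropped = df.dropna(subset=["temp"])

# Option 2: fill missing values with the overall column mean.
filled_mean = df["temp"].fillna(df["temp"].mean())

# Option 3: forward-fill, carrying the last seen value downwards.
forward = df["temp"].ffill()

# Option 4: fill based on another column, here the mean temperature per city.
by_city = df["temp"].fillna(df.groupby("city")["temp"].transform("mean"))

print(dropped, filled_mean, forward, by_city, sep="\n\n")
```

Filling per group keeps city B's missing reading near 30 rather than pulling it toward the overall mean of 25, which is the motivation for filling from other columns.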
Case Study: Missing Data in Titanic Dataset12:09
We now apply the concepts that we have learned in this section to the real-world Titanic Dataset.
• Load the Titanic Dataset and explore the various columns
• Find out the descriptive statistics of the dataset
• Impute missing values in the dataset
What are Outliers?5:22
Sometimes we might encounter values in our dataset which are abnormally high, low, or simply weird as compared to other values in the dataset. We must understand what outliers are and what causes them to occur.
• Understand what outliers are
• Understand the causes of outliers
• Explore, via examples, the different types of outliers
Using Z-scores to Find Outliers6:50
Z-scores are one of the most commonly used methods for identifying outliers. In this video, we understand the idea behind Z-scores and how they can be used to identify outliers.
• Discuss what Z-scores are and what they signify
• Visualize Z-scores over a normal distribution for more clarity
• Implement Z-scores to find outliers in a dummy dataset
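A minimal sketch of Z-score outlier detection on a dummy dataset. The `values` list and the cut-off of 3 are assumptions chosen for illustration. A Z-score measures how many standard deviations a value lies from the mean.

```python
import statistics

# Hypothetical dummy dataset with one abnormally high value.
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 45]

mean = statistics.mean(values)
std = statistics.pstdev(values)  # population standard deviation

# Z-score: distance from the mean, measured in standard deviations.
z_scores = [(v - mean) / std for v in values]

# Flag values more than 3 standard deviations away (a common rule of thumb).
THRESHOLD = 3
outliers = [v for v, z in zip(values, z_scores) if abs(z) > THRESHOLD]
print(outliers)
```

Note that the outlier itself inflates both the mean and the standard deviation, so in very small samples a Z-score may never exceed the threshold.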
Modified Z-scores7:41
Z-scores can sometimes be unreliable, since the mean and standard deviation they use to detect outliers are themselves affected by those outliers. In this video, we use a modified version of the Z-score which is based on the median.
• Understand why Z-score might fail in some cases
• Understand the idea of the Median, the Median Absolute Deviation, and Modified Z-scores
• Implement an example in which we find outliers using Modified Z-scores
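A sketch of one common formulation of the modified Z-score, assuming the median absolute deviation (MAD) as the measure of spread, the usual scaling constant 0.6745, and a cut-off of 3.5. These constants and the `values` list are assumptions and may differ from the ones used in the video.

```python
import statistics

def modified_z_scores(data):
    """Return modified Z-scores based on the median and the MAD."""
    median = statistics.median(data)
    # MAD: the median of the absolute deviations from the median.
    mad = statistics.median(abs(x - median) for x in data)
    if mad == 0:
        raise ValueError("MAD is zero; modified Z-scores are undefined")
    return [0.6745 * (x - median) / mad for x in data]

# Hypothetical dummy dataset with one abnormally high value.
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 45]

scores = modified_z_scores(values)
outliers = [v for v, s in zip(values, scores) if abs(s) > 3.5]
print(outliers)
```

Because the median and the MAD barely move when a single extreme value is added, the outlier cannot mask itself the way it can with the mean and standard deviation.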
Using IQR to Detect Outliers8:46
Finally, we also learn how to use Interquartile Range (IQR) to detect outliers in a dataset and visualize them via box plots.
• Explore the concept of IQR and how it can be used to identify outliers
• Visualize IQR and outliers over a box plot
• Implement an example using IQR and box plots to detect outliers
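The IQR rule can be sketched as below, assuming the conventional fences at 1.5 times the IQR beyond the first and third quartiles. The `values` list is hypothetical.

```python
import statistics

# Hypothetical dummy dataset with one abnormally high value.
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 12, 45]

# First and third quartiles; the middle value (the median) is not needed here.
q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
iqr = q3 - q1

# Values outside these fences are the points a box plot draws individually.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in values if v < lower_fence or v > upper_fence]
print(q1, q3, iqr, outliers)
```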
Types of Variables17:25
Before moving on to analyzing the various types of variables in a dataset, we must understand the different variables that might occur in a dataset.
• Understand what the different types of variables are
• Explore the different types of numeric variables
• Explore the different types of categorical variables
Introduction to Univariate Analysis6:27
Now that we have understood the different types of variables, let’s take a look at the different ways of analyzing variables using Python.
• Create dummy data for our analysis
• Implement code for plotting different types of graphs in Python
• Explore the different graphs and libraries available in Python
Skewness and Kurtosis4:16
After learning about the various graphs that we can use to explore columns in Python, we must first understand the concept of Skewness and Kurtosis in Statistics and how they affect the shape of a distribution.
• Understand what Skewness is
• Understand the idea behind Kurtosis
• Explore how Skewness and Kurtosis affect the shape of the curve
Univariate Analysis over Olympics Dataset11:39
Finally, we will apply the different techniques that we have learned for Univariate Analysis over the Olympics Dataset.
• Explore the different columns in Olympics Dataset
• Draw density plots, histograms, and so on over various columns
• Find the skewness of the data using the SciPy module in Python
Introduction to Bivariate Analysis2:25
Now that we have explored univariate analysis, we move ahead to bivariate analysis where we explore two variables at the same time.
• Understand what bivariate analysis is
• Understand how bivariate analysis helps us understand our data better
• List out various graphs used for bivariate analysis
Correlation Coefficient4:21
Before moving on to doing practical bivariate analysis, we must understand the theoretical concept behind correlation coefficients.
• Explore the concept of correlation coefficient
• Understand the different types of correlation coefficient
• Understand what correlation coefficient signifies for our data
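As an illustration of what a correlation coefficient measures, here is a minimal sketch of the Pearson coefficient written out by hand. The sample lists are hypothetical, and other types of coefficient (such as rank-based ones) are computed differently.

```python
import math

def pearson(x, y):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical samples: one rises with x, the other falls with x.
x = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]
falling = [10, 8, 6, 4, 2]

r_pos = pearson(x, rising)
r_neg = pearson(x, falling)
print(r_pos, r_neg)
```

The result always lies between -1 and 1: values near 1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little linear relationship.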
Scatter Plots and Heatmaps8:25
After understanding the theoretical concepts behind correlation coefficients, we now move on to visualizing correlation between two sets of variables.
• Implement code for positive and negative correlation
• Use the seaborn library to visualize scatterplots
• Use heatmaps to visualize correlation between multiple pairs of columns at once
Bivariate Analysis: Titanic Dataset8:32
In this video, we will apply various techniques of bivariate analysis over the Titanic Dataset.
• Load the Titanic Dataset
• Implement bivariate graphs using Seaborn
• Identify trends if they exist in the data
Bivariate Analysis: Video Game Sales18:25
In this video, we will apply various techniques of bivariate analysis over the video game sales dataset.
• Load the video game sales dataset and understand the various columns
• Implement interactive graphs using the Bokeh library in Python
• Identify trends if they exist in the data using bivariate graphs
Introduction to Multivariate Analysis3:01
Now that we have explored univariate and bivariate analysis, we move ahead to multivariate analysis where we explore more than two variables at the same time.
• Understand what multivariate analysis is
• Understand the various advantages of multivariate analysis
• Visualize a graph depicting multivariate analysis
Multivariate Analysis over Titanic Dataset10:06
In this video, we will apply various techniques of multivariate analysis over the Titanic Dataset.
• Load the Titanic Dataset and find descriptive statistics of the various variables
• Implement multivariate graphs using Seaborn
• Identify trends if they exist in the data
Multivariate Analysis over Pokemon Dataset18:57
In this video, we will apply various techniques of multivariate analysis over the Pokemon Dataset.
• Load the Pokemon Dataset and find descriptive statistics of the various variables
• Implement interactive graphs using Bokeh
• Identify trends if they exist in the data using multivariate graphs
Simpson’s Paradox4:33
Simpson’s Paradox is a phenomenon that may occur in real-world data, leading to conflicting results. We understand why it happens and what we can do to prevent it.
• Understand what Simpson’s Paradox is
• Understand what causes it and how we can prevent it from happening
• Demonstrate Simpson’s Paradox using an example
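A small sketch of how the paradox shows up in numbers. The counts are patterned on the widely cited kidney stone treatment comparison and are used here only as an illustration. They are not taken from the course.

```python
# (successes, total) per treatment, split by a confounding variable.
groups = {
    "small stones": {"A": (81, 87), "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

def rate(successes, total):
    return successes / total

# Within every subgroup, treatment A has the higher success rate.
a_wins_each_group = all(
    rate(*arms["A"]) > rate(*arms["B"]) for arms in groups.values()
)

# Pool the subgroups and compare again.
overall = {}
for arm in ("A", "B"):
    successes = sum(arms[arm][0] for arms in groups.values())
    total = sum(arms[arm][1] for arms in groups.values())
    overall[arm] = rate(successes, total)

# After pooling, treatment B looks better: the trend reverses.
b_wins_overall = overall["B"] > overall["A"]
print(a_wins_each_group, b_wins_overall, overall)
```

The reversal happens because the two treatments were applied to very different mixes of easy and hard cases, so the pooled rates mostly reflect case mix rather than treatment quality.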
Correlation Is Not Causation4:46
This is one of the most widely misinterpreted phenomena that occur in the real world. We understand why it happens and what we can do to prevent it.
• Understand why Correlation does not necessarily imply causation
• Understand what causes it and how we can prevent it from happening
• Demonstrate that correlation does not imply causation using various examples
Wine Data Analysis: Initial Setup4:49
In this video, we will apply all the different techniques that we have learned in the previous sections to a real-world dataset.
• Download and load the dataset
• Explore the different variables in the dataset
• Create a set of questions that we will answer through our analysis
Red Wine Analysis24:35
Here we will do Exploratory Data Analysis over Red Wine Data.
• Download and load the dataset
• Explore the different variables in the dataset
• Identify trends if they exist in the data
White Wine Analysis21:49
In this video, we will do Exploratory Data Analysis over White Wine Data.
• Download and load the dataset
• Explore the different variables in the dataset
• Identify trends if they exist in the data
White Wine versus Red Wine: Analysis18:20
Here, we will do a comparative analysis of how these wines differ from each other.
• Download and load the dataset
• Explore the different variables in the dataset based on the type of wines
• Identify trends if they exist in the data
Test your knowledge

The Course Overview4:11
This video explains the course prerequisites and provides an entire overview of the course.
Installing Anaconda Navigator on Windows/Linux5:14
Which Python distribution to use in this course?
Install Anaconda Navigator and verify the installation
Choose an IDE (Spyder)
Importing and Parsing CSV in Python7:46
Much data comes in CSV form. We will look at how we can use Python to import it and get information out of it.
Import and parse CSV file using CSV module
Import and parse using Pandas module
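A minimal sketch of the first approach, using the standard-library `csv` module. The CSV text is an inline hypothetical sample so the example runs without a file on disk.

```python
import csv
import io

# Hypothetical CSV content; in practice this would come from open("file.csv").
raw = "name,age\nAda,36\nLinus,29\n"

# DictReader uses the header row as keys, yielding one dict per data row.
rows = list(csv.DictReader(io.StringIO(raw)))

# Every field is read as a string, so numeric columns need explicit conversion.
ages = [int(row["age"]) for row in rows]
print(rows, ages)
```

With pandas, the same file is typically loaded in one call to `pandas.read_csv`, which also infers column types.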
Importing and Parsing JSON in Python5:55
In industry, data is mostly exposed in web services and JSON is used to represent the data. So we will parse data out of JSON in this video.
Analyze the JSON file by opening it
Use JSON module in Python to parse the data out of JSON
Scraping Data from Public Web – Part 1 4:50
Much of the data on the public web is embedded in HTML markup, so we need a way to extract it. We will look at the basics of web parsing in this video.
Explore the modules used for web scraping
Scrape the HTML markup of a Wikipedia page and get the basic information out of it
Scraping Data from Public Web – Part 2 11:50
In this video, we will look at a practical demo that extracts the HTML markup of a table tag and then stores that information in a structured form.
Identify the table tag which we want to extract into our program
Take a hands-on approach in Python to get the relevant information out of the table tag
Store the information in the form of a table
Importing and Parsing Excel Files – Part 1 5:17
Organizations often find it convenient to store information in the sheets of an Excel file. We will look into the xlrd module to extract information out of an Excel file.
Analyze the dataset by manually opening the file
Import the dataset into Python using the xlrd module and then print the sheet names and number of sheets
Print the rows of a sheet on the console
Importing and Parsing Excel Files – Part 2 5:23
People prefer a small amount of code that gets big things done, so we will use the Pandas module in this video to do the same.
Import the Pandas module
Import the different sheets of Excel files into the Pandas DataFrames and extract some basic information
Manipulating PDF Files in Python – Part 1 4:51
Sometimes we need to extract information out of PDF files and then process it. We are going to look at how we can do that in this video.
Extract information out of a PDF file and then store each page of the PDF file in a separate index of a list
Print each page of the PDF file
Manipulating PDF Files in Python – Part 2 5:52
Sometimes we need to write data to a PDF file, so in this video we will look at how to write to a PDF file.
Construct a sample resume in the code example
Add text and images to a PDF file at proper positions
Difference between Relational and Non-Relational Databases3:46
Database administrators choose a database based on its characteristics. We will just look into the basics.
Understand when to choose a relational database and when to choose a non-relational database
Look at the links to the software required for this section
Storing Data in SQLite Databases8:27
We will store the JSON file in SQLite, a lightweight database, and look at a code example that accomplishes that.
Create the table containing fields from the JSON file in SQLite
Dump the JSON file by parsing it into the SQLite databases
Verify the dump using DB Browser for SQLite
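The steps above can be sketched with the standard-library `json` and `sqlite3` modules. The `books` table, its fields, and the inline JSON are hypothetical, and an in-memory database is used so nothing is written to disk.

```python
import json
import sqlite3

# Hypothetical JSON content; in practice this would be read from a file.
raw = '[{"id": 1, "title": "Dune"}, {"id": 2, "title": "Emma"}]'
records = json.loads(raw)

# ":memory:" creates a temporary database; pass a file path to persist it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT)")

# Named placeholders are filled from the keys of each parsed JSON object.
conn.executemany("INSERT INTO books (id, title) VALUES (:id, :title)", records)
conn.commit()

# Read the rows back to verify the dump.
stored = conn.execute("SELECT id, title FROM books ORDER BY id").fetchall()
conn.close()
print(stored)
```

Using placeholders rather than string formatting lets the database driver handle quoting of the inserted values.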
Storing Data in MongoDB6:25
In industry, people often prefer non-relational databases over relational databases because of the complexity of schemas. We will dump information into MongoDB (a popular document-oriented database).
Get MongoDB up and running
Write the code to dump the CSV file into MongoDB
Verify the dump using Robo3T (Robomongo)
Storing Data in Elasticsearch7:17
In this video, we are going to use Elasticsearch with Kibana (to display information from Elasticsearch) to store the JSON file in Elasticsearch.
Explore Elasticsearch and Kibana
Import the CSV file and then convert it to a format which can be dumped to Elasticsearch
Verify the dump using Kibana
Comparative Study of Databases for Storage2:28
Often people are interested in the pros and cons of the databases, so in this video we are going to look into that in detail.
Understand advantages and disadvantages of relational and non-relational databases
The Most Important Step in Data Analysis2:35
Data cleansing plays an important part in Data Science. We will look into why it is important and some common tips and methods for doing it.
Explore importance of data cleansing in Data Science
Learn about data cleansing tips and techniques
Viewing/Inspecting DataFrames6:44
We are going to jump into data frames: what they are and how they display structured information in a readable format.
Read datasets and display the column names, the number of rows, and the number of columns in a data frame
Change the data type of columns and retrieve certain rows from the data frames
Renaming/Adding/Removing the DataFrame Columns6:04
Sometimes we need to give proper names to the columns in a dataset, add more columns, and remove the irrelevant ones, so this video will show you how to do that.
Edit the column names
Delete the irrelevant columns
Add more columns into the data frame
Dropping Duplicate Rows6:41
Rows that duplicate the values of a column can be redundant when performing Data Science operations, so it is good to drop them.
Drop all the duplicate rows
Drop all but keep the first duplicate row
Drop all but keep the last duplicate row
Indexing DataFrame to Retrieve Specific Columns and Rows7:08
Sometimes we need to extract just the required columns and rows out of the data frames. In this video, we are going to look at the lines of code which can be used to do so.
Read in the dataset and then retrieve the first and last rows and columns
Retrieve the first five columns and first five rows
Retrieve certain rows and columns
Merging/Concatenating/Joining DataFrames8:03
Data distributed across different sheets can be concatenated, merged, or joined depending on the use case. In this video, we are going to solve this problem, which occurs a lot in industry.
Look at the syntax of how we can create a data frame from a dictionary
Concatenate data frame in Python
Perform join/merge operation between two data frames on a column
Dealing with Missing Values8:36
Real-world datasets contain many missing values in their columns. In this video, we will look at how we can solve this problem.
Understand how missing values appear in the dataset and learn how to deal with them
Introduce some missing values in a data frame
Drop the rows/columns which contain missing values, or fill them with other values such as the mean of the column in which they are present
Filtering and Sorting of DataFrame6:38
Analyzing a dataset sometimes requires rows to be sorted. We also need to filter out some rows based on various conditions. In this video, we will look at how we can do that.
Edit certain columns for the sake of computation
Sort the data frame on a column and look into the syntax of how we can do that
Filter rows on certain conditions and look into the syntax of how we can do that
Encoding/Mapping Existing Values – Part 1 4:54
Computers understand numbers. Machine learning algorithms sometimes require columns to be in an equivalent numeric form, so in this video we will look at how we can do that.
Drop the rows which contain missing values
Use LabelEncoder from the pre-processing module to encode the gender column values
Add the encoded gender column back into the data frame
Encoding/Mapping Existing Values – Part 2 4:41
We will look into another technique of mapping more than two unique values in a column.
Drop the rows having the missing values and get the unique values in a column
Construct a dictionary mapping those unique values to different values
Apply the new encoding onto the column of data frame and look at the changes
Rescale/Standardize Column Values7:32
Rescaling maps the numeric values in a column to the range 0 to 1, which helps machine learning algorithms converge faster. Standardization maps column values so that they have a mean of zero and a standard deviation of one, which helps to compare features measured on different scales.
Rescale a column using the MinMaxScaler of the pre-processing module and then look into the results
Standardize a column using the StandardScaler class of the pre-processing module and then look into the results
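To make the two transformations concrete, here is a sketch of the arithmetic written in plain Python rather than with the scikit-learn classes named above. The `values` list is hypothetical. The population standard deviation is used on the assumption that this matches how scikit-learn's StandardScaler standardizes.

```python
import statistics

# Hypothetical numeric column.
values = [10.0, 20.0, 30.0, 40.0, 50.0]

# Rescaling (min-max): the minimum maps to 0 and the maximum maps to 1.
low, high = min(values), max(values)
rescaled = [(v - low) / (high - low) for v in values]

# Standardization: subtract the mean, then divide by the standard deviation.
mean = statistics.fmean(values)
std = statistics.pstdev(values)
standardized = [(v - mean) / std for v in values]

print(rescaled)
print(standardized)
```

After standardization the column has a mean of zero and a standard deviation of one, so columns originally measured in different units become directly comparable.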
Common Cleaning Operations6:58
We will look into common cleaning operations that are good to have in the toolbox when working with data frames.
Drop rows having missing values in them and then reset the index of the data frame
Lower case the column names and then apply a strip function on a column to remove spaces from the values at the beginning and end of the strings
Apply a function to each value of a column
Exporting Datasets for Future Use5:35
Sometimes we need to store the data frames back into CSV/JSON files after processing.
Drop rows having missing values in them and then reset the index of the data frame
Delete a column from the data frame
Dump the data frame into the CSV file and a JSON file
Different Uses of Packages (Pandas, NumPy, SciPy, and Matplotlib)1:58
We will look into the common uses of Python modules in Data Science (Pandas, NumPy, SciPy, and Matplotlib).
Learn about usage of Pandas module
Types of Column Names/Features/Attributes in Structured Data1:53
We will look into the types of DataFrame columns people come across in industry: numeric columns, which contain numbers, and categorical variables, along with the types of each.
Understand numerical data
Split-Apply-Combine (Performing Group By Operation)5:37
Sometimes a need arises to group similar rows to perform operations on them and get stats out of them.
Perform group by to compute the number of elements in all the groups
Perform group by to compute the average water consumption in every group of animal
Descriptive Statistics Using Python – Part 1 5:31
Describing the features of a column is the job of descriptive statistics.
Compute the mean, median and mode of values in a column
Compute sum, standard deviation and range of values in a column
Descriptive Statistics Using Python – Part 2 5:12
We will look into more advanced techniques for computing statistics.
Compute geometric mean, harmonic mean and trimmed mean
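A sketch of the three means using the standard-library `statistics` module. The `trimmed_mean` helper and the `values` list are hypothetical, and the video may instead use a library function for trimming.

```python
import statistics

# Hypothetical sample with one very large value.
values = [1, 2, 4, 8, 100]

# Geometric mean: the n-th root of the product of the values.
geometric = statistics.geometric_mean(values)

# Harmonic mean: n divided by the sum of the reciprocals.
harmonic = statistics.harmonic_mean(values)

def trimmed_mean(data, proportion):
    """Mean after cutting `proportion` of the values from each end."""
    ordered = sorted(data)
    k = int(len(ordered) * proportion)  # number of values cut per end
    if 2 * k >= len(ordered):
        raise ValueError("proportion too large for this sample")
    return statistics.mean(ordered[k:len(ordered) - k])

# Cutting 20% from each end drops 1 and 100, leaving 2, 4, and 8.
trimmed = trimmed_mean(values, 0.2)
print(geometric, harmonic, trimmed)
```

All three are less sensitive to the single large value than the arithmetic mean of the same sample, which is 23.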
Using Visualizations1:48
We are going to look into visualizations: why they are important and what the different types are.
Understand the rules of visualization in detail
Cool Visualization of Real-World Datasets of World Population Evolution2:35
We will see an amazing site showing cool visualizations regarding the happenings in the World over the last 60 years.
Understand visualization with the help of World population dataset
Visualizations in Python – Part 1 9:33
We will look into how we can plot the relationship between variables (scatter plot), and look into line plots and histograms.
Explore the iris dataset and extract the relevant columns out of them
Plot a scatter plot between two columns, plot a line plot of one variable in the data frame
Plot a histogram to understand its concept in a better way
Visualizations in Python – Part 2 5:54
We will look into box plots and pie charts and how they can be used to make visualizations.
Plot a box plot of a column in a data frame, which summarizes many properties of that column
Make a pie chart for visualizing the utilization of hours of a person in a day
Exploring an Online Visualization Tool (RAWGraphs)3:45
Sometimes it is good to use online tools that make visualization easy for a non-technical person. We will look into the RAWGraphs site for making visualizations.
Explore RAWGraphs
Test your knowledge

The Course Overview4:13
This video will give you an overview of the course.
Introduction to Dask3:22
Dask is a Python library for parallel and distributed computing. Before moving ahead, we must understand the idea behind Dask and the various use cases in which it can be used.
Understand what Dask is
Overview of the features of Dask
Use cases for Dask
Features of Dask2:41
Now that we have a basic idea of what Dask is, we move to discussing the various features it has to offer for parallel/distributed processing.
See how Dask helps in parallelizing code
Understand how Dask helps in scaling out your code
Understand the various data-structures, algorithms, schedulers, etc., that Dask can offer
Limitations of Dask1:58
We also need to cover the limitations of Dask, to get a better idea of the assumptions to be made while writing code for Dask.
Explore the limitations of Dask for parallelizing code
Study the limitations of Dask for running code over a cluster of nodes
Look at the limitations of Dask for task scheduling
Setting Up Dask6:06
Now that we have covered the features and limitations of Dask, we now move on to setting up Dask on our system.
Install Dask from Conda/PIP
Install Dask from source (GitHub)
Install Dask over Google Colab notebook
Introduction to Blocked Algorithms5:23
Now that we have a fair idea of what Dask is, we move on to discussing Blocked Algorithms and Dask Arrays.
Understand what Dask arrays are
Overview of Blocked Algorithms
Look at an example of a Blocked Algorithm
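The idea behind a blocked algorithm can be sketched in plain Python: split a large computation into many small ones over blocks, then combine the partial results. This is only an illustration of the concept. It is not how Dask is implemented, and `blocked_sum` is a hypothetical helper.

```python
def blocked_sum(values, block_size):
    """Sum a sequence block by block, then combine the partial sums."""
    partials = [
        sum(values[start:start + block_size])
        for start in range(0, len(values), block_size)
    ]
    # Each partial sum only needed one block in memory at a time, and the
    # blocks are independent, so they could be computed in parallel.
    return sum(partials), partials

data = list(range(1, 21))  # the numbers 1 to 20
total, partials = blocked_sum(data, block_size=5)
print(total, partials)
```

Because no block depends on another, a scheduler is free to process them on different cores or machines, which is what makes this pattern useful for data that does not fit in memory.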
Hands-On with Dask Arrays11:20
Once we have understood how blocked algorithms work over Dask arrays, we move on to implementing some basic operations over Dask arrays.
Explore Dask array API
Create Dask arrays
Visualize Task Graphs for Dask arrays
Digging Deeper into Dask Arrays8:40
In this video, we will look at some more advanced operations that we can perform on Dask Arrays.
Perform scalar operations, reduction, etc. over Dask Arrays
Perform operations like slicing, indexing, and broadcasting over Dask arrays
Implement Stacking and Concatenation over Dask Arrays
Performance Comparison with NumPy Arrays5:02
We do a performance analysis of Dask Arrays versus NumPy arrays.
Implement a computationally expensive operation over large NumPy arrays
Implement the same operation using Dask Arrays
Do a performance analysis over both methods
Creating Universal NumPy Functions with Dask5:49
Universal NumPy Functions are one of the building blocks of NumPy API. Now we will learn to implement them for Dask arrays.
Understand what Universal NumPy Functions are
Explore the different properties of NumPy Universal Functions
Implement Universal NumPy Functions for Dask arrays
Limitations of Dask Arrays2:28
Finally, we will discuss the current limitations of Dask arrays.
Discuss the current limitations of Dask Arrays API, as per latest documentation
Lazy Evaluation4:03
Before we move on to parallelizing our code using Dask, we must first understand the concept of Lazy evaluation and how it works.
Understand what lazy evaluation is
Understand how it is different from eager evaluation
Look at an example of lazy evaluation in Python
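A minimal sketch contrasting eager and lazy evaluation using only built-in Python features. A list comprehension runs immediately, while a generator expression defers work until a value is requested. The `square` function and `calls` log are hypothetical names for illustration.

```python
calls = []  # records every time square() actually runs

def square(x):
    calls.append(x)
    return x * x

# Eager: the list comprehension calls square() for every item right away.
eager = [square(x) for x in range(3)]
calls_after_eager = list(calls)

calls.clear()

# Lazy: the generator expression only describes the work; nothing runs yet.
lazy = (square(x) for x in range(3))
calls_before_use = list(calls)

# Work happens one item at a time, only when a result is requested.
first = next(lazy)
calls_after_first = list(calls)

print(calls_after_eager, calls_before_use, first, calls_after_first)
```

Deferring the work in this way lets a framework see the whole computation before running any of it, which is what allows the computation to be scheduled and parallelized.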
Using dask.delayed0:13
Once we have understood how lazy evaluation works, we move on to exploring dask.delayed function and how it can be used to parallelize existing Python code.
Explore dask.delayed function
Implement examples using dask.delayed
Implement examples using @delayed decorators and visualize task graphs
Understanding Task Graphs9:34
Task graphs are the basic way in which Dask represents computations. In this video, we look into what these graphs mean, how different computations affect task graphs, and so on.
Understand what task graphs are
Visualize task graphs for basic computations
Visualize task graphs over complex computations
Performance Analysis with dask.delayed6:22
We do a performance analysis of dask.delayed versus Sequential Execution.
Implement a sequential program over multiple arrays
Implement the same operation using dask.delayed
Do a performance analysis over both methods
Introduction to Dask Dataframes2:47
Before we move on to analyzing data with Dask Dataframes, we will understand how Dask Dataframes work and what features they provide.
Overview of Dask Dataframes
Look at the features of Dask Dataframes
Discuss its similarity to Pandas API
Exploring Dask Dataframes9:46
In this video, we will implement some basic examples for manipulating Dask Dataframes.
Perform basic Dataframe manipulation operation
Perform aggregation operations on Dask Dataframes
Highlight the differences and similarity with Pandas API
Creating Dask Dataframes4:47
We will discuss some of the different ways of creating and loading Dask Dataframes.
Discuss the different ways of creating Dask Dataframes
Use glob patterns to load multiple files at once
Discuss the multiple formats from which we can load Dask Dataframes
Loading Large Datasets with Dask Dataframes5:19
We will try to load larger-than-memory datasets via Dask Dataframes and perform operations on them.
Load a larger-than-memory dataset into Dask Dataframes
Perform analysis over the data
Analyze the time taken and performance of Dask Dataframes
Analyzing Data with Dask Dataframes9:39
In this video, we will take a real-world dataset and analyze it using Dask.
Load dataset using Dask Dataframe
Perform analysis on the data
Limitations of Dask Dataframes2:01
Finally, we will discuss some of the current limitations of Dask Dataframes.
Discuss some of the limitations of Dask Dataframes
Introduction to Dask Bags3:45
First, we need to understand the basic concept of Dask bags, their features and use cases, before we move on to the implementation part.
Understand the concept behind Dask Bags
Understand the features of Dask Bags
Explore the various use-cases for Dask Bags
Creating and Storing Dask Bags6:55
In this video, we will discuss the various functions available in the Dask API, to create and store Dask Bags.
Explore the various functions to create Dask Bags
Explore the various functions to store Dask Bags
Explore different options for creating/storing Dask Bags
Manipulating Dask Bags10:03
Now that we have a fair idea of what Dask Bags are, let us explore the various ways in which we can manipulate Dask Bags.
Create dummy Dask Bags for manipulation
Use various functions from the Dask Bags API
Visualize task graphs over these functions
Word Count Example Using Dask Bags12:18
In this example, we create a word counter using Dask Bags.
Create a Dask Bag using a URL
Clean the text via Dask Bag Functions
Explore multiple ways of creating a word counter
Manipulating JSON Data Using Dask Bags11:11
In this example, we will explore how we can manipulate JSON data using Dask Bags.
Create a Dask Bag using a glob pattern of JSON files.
Clean the data
Visualize the data after processing
Limitations of Dask Bags2:29
In the final video, we will discuss some of the current limitations of Dask Bags.
Discuss the limitations of Dask Bags as per the latest documentation of Dask Version 1.2.1
Overview of Distributed Computing with Dask3:38
In this video, we will understand the various features offered by dask.distributed and compare it with Apache Spark.
Overview of dask.distributed
Look at the features of dask.distributed
Compare dask.distributed with Apache Spark
Setting Up Your Dask Cluster3:35
In this video, we will focus on setting up your own local Dask cluster.
Understand the various options available for setting up your local Dask cluster
Create a local Dask cluster
Understanding Dask Schedulers7:37
Before we move on to submitting jobs to Dask clusters, we must understand the different types of schedulers available with Dask.
Discuss the different types of schedulers
Implement a program using dask.delayed
Compare the performances of different schedulers based on the same example
Exploring Dask Dashboard UI8:29
In this video, we understand the Dask Dashboard UI available for Dask cluster.
Set up a local Dask cluster
Load data via Dask Dataframes and perform computations on it
Analyze the Dask Dashboard UI
Limitations of dask.distributed1:46
Finally, we will discuss some of the limitations of dask.distributed.
Overview of current limitations of dask.distributed
Persisting Data4:56
In this video, we will discuss how you can save computation and memory by persisting data on your cluster.
Set up a local cluster and load a Dataframe
Perform some operation and analyze the task graphs
Persist the data and see the difference in task graphs
Combining Dask with Futures5:39
Dask provides an interface to Python's concurrent.futures API. In this video, we will discuss how we can leverage that interface for asynchronous computation.
Understand what exactly Python's concurrent.futures interface is
Explore how Dask provides an interface to concurrent.futures
Implement examples using Dask for asynchronous computation
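For readers unfamiliar with the standard-library interface mentioned here, the following is a minimal sketch of the submit-then-collect pattern using only Python's `concurrent.futures` module. The `square` function is a hypothetical stand-in for real work. The Dask client in dask.distributed offers similarly named `submit` and `map` methods, but this sketch does not use Dask itself.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(x):
    """Hypothetical stand-in for an expensive computation."""
    return x * x


with ThreadPoolExecutor(max_workers=4) as executor:
    # submit() returns immediately with a Future for each call.
    futures = [executor.submit(square, n) for n in range(5)]

    # as_completed() yields futures as they finish, in no fixed order,
    # so the results are sorted to get a deterministic list.
    results = sorted(f.result() for f in as_completed(futures))

print(results)
```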
Best Practices for Dask2:35
Finally, we will discuss some of the best practices to be followed while developing applications for Dask.
Discuss some of the best practices for developing applications with Dask
Introduction to Dask-ML2:41
In this video, we will discuss a brief overview of what features Dask has to offer with respect to Machine Learning.
Overview of Dask-ML
Look at the features of Dask-ML
Using Dask-ML for Regression4:16
In this video, we go over an example of Regression using scikit-learn and combine it with Dask.
Implement a basic regression example using scikit-learn
Create a local Dask cluster
Combine Dask with scikit-learn for regression
Using Dask-ML for Classification2:50
In this video, we go over an example of Classification using scikit-learn and combine it with Dask.
Implement a basic Classification example using scikit-learn
Create a local Dask cluster
Combine Dask with scikit-learn for classification
Hyper-Parameter Tuning Using Dask4:40
In this video, we go over an example of Hyper-Parameter Tuning using scikit-learn and combine it with Dask.
Implement a basic Hyper-Parameter tuning example using scikit-learn
Create a local Dask cluster
Combine Dask with scikit-learn for Hyper-Parameter tuning
Test your knowledge

Course Overview4:02
This video gives a glimpse of the entire course.
Getting Data into Matplotlib3:29
Learn to create plots with Matplotlib 3 by getting data from various sources.
Plot data from lists
Plot data from NumPy
Plot data from Pandas Dataframes
Drawing Scatter Plots5:11
Learn to create Scatter plots with Matplotlib 3.
Import dataset
Draw scatter plots
Customize scatter plots
Creating Line Plots4:17
Learn to create line plots with Matplotlib 3.
Import dataset
Draw line plots
Customize line plots
Visualizing Data with Bar Charts3:36
Learn to create bar charts with Matplotlib 3.
Import dataset
Draw bar plots
Customize bar plots
Drawing Subplots4:53
Learn to visualize and compare datasets using subplots.
Import dataset
Create subplots
Various ways to create subplots
Creating Histograms4:09
Learn to create Histograms with Matplotlib 3.
Import dataset
Draw Histograms
Customize Histograms
Building Heatmaps1:48
Learn to create Heatmaps with Matplotlib 3.
Import dataset
Draw Heatmaps
Customize Heatmaps
Plotting Data on Box Plots1:31
Learn to create box plots with Matplotlib 3.
Import dataset
Draw box plots
Customize box plots
Drawing Pie Charts4:05
Learn to create pie charts with Matplotlib 3.
Import dataset
Draw pie charts
Customize pie charts
Customizing Labels, Titles, and Legends5:52
Learn how to customize the Matplotlib plots
Customize Labels
Customize titles
Customize legends
Adding Grids and Customizing Ticks3:10
Learn how to add grids to plots and customize ticks
Learn how to add grids to plots
Customize grids
Customize ticks
Using Matplotlib Styles3:09
Learn what Matplotlib styles are and how to use them
Introduce styles
List of styles
Apply styles
Creating Custom Styles3:20
Learn to customize styles
Modify configuration files
Add your own style
Apply your style
Plot Annotation2:10
Learn plot annotation
Introduce Plot Annotation
Various types of annotations
Apply annotations
Build Plots from the Ground-Up Using Plot Scaffolding2:48
Learn to build plots in matplotlib using plot scaffolding.
Introduce plot scaffolding
Build a plot step by step
Building Custom Plots Using Figures2:05
Learn to build custom plots using figures
Introduce figure method
Build plots using figures method
Customizing Plot Axes1:57
Learn to customize plot axes
Axes and optimizations
Apply the axes customization
Building 3D Graphs Using Wireframe2:08
Learn to build 3D graphs using wireframe
Import modules and datasets
Create 3d wireframe graphs
Creating 3D Scatter Plots1:20
Create 3D scatter plots
Import modules and datasets
Create 3d scatter plots
Drawing 3D Bar Charts1:11
Draw 3D bar charts
Import modules and datasets
Create 3d bar charts
Customizing Wireframes1:38
Learn to Customize wireframes
Import modules and datasets
Customize wireframes
Drawing Animated Graphs2:20
Learn to draw animated graphs with data from your datasets
Import modules
Read-in datasets
Create animated plots
Building an Animated Histogram1:37
Learn to build animated histograms
Import modules
Read-in datasets
Create animated histograms
Creating Animated subplots1:07
Learn to create animated subplots
Import modules
Read-in datasets
Build animated subplots
Adding Interactivity to Plots1:12
Learn how to make your plots interactive.
Import modules
Read-in datasets
Add interactivity to your plots.
Creating Visualizations that Update Interactively with Data1:47
Learn to create plots that update data interactively.
Import modules
Read-in datasets
Build interactive plots.
Change the Plot Sizes0:59
Learn to change the plot sizes
Import modules and dataset
Plot data
Change plot size
Save Plot Image to a File1:47
How to save plot image to a file
Import modules and dataset
Save plots to an image file
Save plots to a pdf file.
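A minimal sketch of saving a figure to both an image file and a PDF with Matplotlib's `savefig`. The plotted data, file names, and temporary output directory are assumptions for illustration; the lecture works with its own dataset.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data; the lecture uses its own dataset.
fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4], [1, 4, 9, 16])
ax.set_xlabel("x")
ax.set_ylabel("x squared")

out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "plot.png")
pdf_path = os.path.join(out_dir, "plot.pdf")

# The output format is inferred from the file extension.
fig.savefig(png_path, dpi=150)
fig.savefig(pdf_path)
plt.close(fig)
```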
Create Legend Outside the Plot1:32
Move legend outside the plot area
Import modules and dataset
Display default legend
Move legend out of plot area
Display Plots Inline in a Notebook1:37
How to display plots inline in a jupyter notebook
Matplotlib jupyter notebook magic command
Behavior without the command
Behavior with the inline command
Clear a Plot0:51
Learn how to use Matplotlib's methods for clearing plots
Introduce various plot clearing methods
Demonstrate plot clearing methods
Change Font Sizes of Plot Elements2:18
How to change font properties of the plot elements
Change font sizes
Change font type
Change font color
Troubleshoot Value Errors2:24
Learn to fix matplotlib value errors
Type of value errors
Introduce value errors
Fix value error
Test your knowledge

Requirements

Basic knowledge of probability/statistics and Python coding experience will assist you in understanding the concepts covered in this course.

Description

Python is an open-source, community-supported, general-purpose programming language that, over the years, has also become one of the bastions of data science. Thanks to its flexibility and vast popularity, data analysis, visualization, and machine learning can be easily carried out with Python.

This practical course is designed to teach you how to perform data science tasks such as data analysis, data manipulation, and data visualization. You will begin by performing data analysis on real-world datasets. You will then work on large datasets and perform exploratory data analysis to investigate each dataset and draw findings from it. You will also learn to scale your data analysis and execute distributed data science projects, from data ingestion to data manipulation and visualization, using Dask. Next, you will explore Dask frameworks and see how Dask can be used with other common Python tools such as NumPy, Pandas, Matplotlib, scikit-learn, and more. Finally, you will perform data visualization using Python and Matplotlib 3.

By the end of this course, you will be able to use the power of Python to analyze data, create beautiful visualizations, and use powerful machine learning algorithms.

Meet Your Expert(s):

We have the best work of the following esteemed author(s) to ensure that your learning journey is smooth:

Mohammed Kashif works as a Data Scientist at Nineleaps, India, dealing mostly with graph data analysis. Prior to this, he worked as a Python developer at Qualcomm. He completed his Master's degree in Computer Science from IIT Delhi, with a specialization in data engineering. His areas of interest include recommender systems, NLP, and graph analytics. In his spare time, he likes to solve questions on StackOverflow and help debug other people out of their misery. He is also an experienced teaching assistant with a demonstrated history of working in the Higher-Education industry.
Jamshaid Sohail is a Data Scientist who is highly passionate about Data Science, Machine learning, Deep Learning, big data, and other related fields. He spends his free time learning more about the field and learning to use its emerging tools and technologies. He is always looking for new ways to share his knowledge with other people and add value to other people's lives. He has also attended Cambridge University for a summer course in Computer Science where he studied under great professors and would like to impart this knowledge to others. He has extensive experience as a Data Scientist in a US-based company. In short, he would be extremely delighted to educate and share knowledge with other people.
Harish Garg is a co-founder and software professional with more than 18 years of software industry experience. He currently runs a software consultancy that specializes in the data analytics and data science domain. He has been programming in Python for more than 12 years and has been using Python for data analytics and data science for 6 years. He has developed numerous courses in the data science domain and has also published a book involving data science with Python, including Matplotlib.

Who this course is for:

This course is for Python developers, data analysts, and IT professionals who wish to explore the world of data science by performing data analysis, data wrangling, data manipulation, and data visualization on their own datasets.

Course content

Exploratory Data Analysis with Pandas and Python 3.x • 32 lectures • 5hr 4min

Data Wrangling with Python 3.x • 38 lectures • 3hr 35min

Scalable Data Analysis in Python with Dask • 39 lectures • 3hr 31min

Data Visualization Recipes with Python and Matplotlib 3 • 34 lectures • 1hr 27min