Learning Path: Python: Effective Data Analysis Using Python
4.1 (7 ratings)
Instead of using a simple lifetime average, Udemy calculates a course's star rating by considering a number of different factors such as the number of ratings, the age of ratings, and the likelihood of fraudulent ratings.
208 students enrolled

Use Python's tools and libraries effectively to extract data from the web and create attractive, informative visualizations
Created by Packt Publishing
Last updated 4/2017
Current price: $10 Original price: $200 Discount: 95% off
30-Day Money-Back Guarantee
  • 10.5 hours on-demand video
  • 1 Supplemental Resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
  • Scrape the Twitter stream to collect real-time data
  • Use predictive methods to forecast future trends based on current data
  • Use the Selenium module and scrape with Selenium
  • Discover how to perform parsing with BeautifulSoup
  • Make 3D visualizations mainly using mplot3d
Requirements
  • A computer
  • Internet connection
  • A good grasp of the basics of Python

Over the years, almost every organization has understood the importance of analyzing data.

In fact, it would not be an overstatement to say that no organization will be able to survive today’s cut-throat competition if it does not analyze data.

Data analysis as we know it is the process of taking the source data, refining it to get useful information, and then making useful predictions from it.

In this Learning Path, we will learn how to analyze data using the powerful toolset provided by Python.

Packt’s Video Learning Paths are a series of individual video products put together in a logical and stepwise manner such that each video builds on the skills learned in the video before it.

Python features numerous numerical and mathematical toolkits such as NumPy, SciPy, scikit-learn, and other SciKits, all used for data analysis and machine learning. With the aid of these, Python has become the language of choice for data scientists working on data analysis, visualization, and machine learning.

We will have a general look at data analysis and then discuss the web scraping tools and techniques in detail. We will show a rich collection of recipes that will come in handy when you are scraping a website using Python, addressing your usual and unusual problems while scraping websites by diving deep into the capabilities of Python’s web scraping tools such as Selenium, BeautifulSoup, and urllib2.

We will then discuss visualization best practices. Effective visualization helps you get better insights from your data and make better, more informed business decisions.

After completing this Learning Path, you will be well-equipped to extract data even from dynamic and complex websites by using Python web scraping tools, and get a better understanding of the data visualization concepts. You will also learn how to apply these concepts and overcome any challenge while implementing them.

To ensure that you get the best of the learning experience, in this Learning Path we combine the works of some of the leading authors in the business.

About the authors

Benjamin Hoff spent 3 years working as a software engineer and team leader doing graphics processing, desktop application development, and scientific facility simulation using a mixture of C++ and Python. This sparked a passion for software development and developmental programming and led him to explore state-of-the-art projects in natural language processing, facial detection/recognition, and machine learning.

Charles Clayton is the sole proprietor of crclayton technologies co and an independent web developer. He is an experienced developer who specializes in Python web scraping solutions and tools such as Selenium, BeautifulSoup, and urllib2. He has also worked as a Reliability Engineer with West frazweer.

Dimitry Foures is a data scientist with a background in applied mathematics and theoretical physics. After completing his undergraduate physics studies at ENS Lyon (France), he studied fluid mechanics at École Polytechnique in Paris, where he obtained a first-class Master’s degree. He holds a PhD in applied mathematics from the University of Cambridge. He currently works as a data scientist for a smart energy startup in Cambridge, in close collaboration with the university.

Giuseppe Vettigli is a data scientist who has worked in the research industry and academia for many years. His work is focused on the development of machine learning models and applications to use information from structured and unstructured data. He also writes about scientific computing and data visualization in Python in his blogs.

Igor Milovanović is an experienced developer with a strong background in Linux systems and a software engineering education. He is skilled in building scalable, data-driven, distributed, software-rich systems.

Who is the target audience?
  • This course is ideal for those who are new to data analysis and for those who are already into data analytics and want to enhance their data extraction and visualization skills.
Curriculum For This Course
86 Lectures
Learning Python Data Analysis
24 Lectures 05:33:38

This video provides an overview of the entire course.

Preview 03:55

The aim of this video is to introduce us to Python.

Getting started with Python

We will learn how to collect and store the data.

Getting Data using the Twitter API

We will explore how to collect and store tweets from Twitter.

Collecting and Storing Tweets

We will talk about database design.

Database Design

We will explore pandas and how it works with databases.

Pandas and Databases

We will explore the concepts of pandas Series, DataFrames, and columnar operations.

Preview 21:21

We will take a look at grouping operations and at how to work with date columns.

Grouping Operations and Working with Date Columns

We will explore merging operations and learn how to export data to JSON/CSV.

Merging Operations and Exporting data to JSON/CSV

We will take a look at what exactly arrays are, their different types, and histogram functions.

Preview 21:02

See exactly what simple aggregations are.

Simple Aggregations

We will explore the concept of linear algebra.

Linear Algebra

We will learn how to present stories via simple visualizations and representations. 

Preview 31:47

We will learn the different types of graphical representations. 

Creating Charts

We will learn how to create Simple XY plots and axis scales.

Simple XY Plots with Axis Scales

We will learn how to handle text data. 

Preview 18:53

We will find out exactly what we mean by a bag of words.

Bag of Words
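
As a rough illustration of the idea (not the course's own code), a bag of words can be sketched with the standard library alone: each document is reduced to word counts, and the counts are laid out against a shared vocabulary. The two sample sentences and the function name bag_of_words are made up for this sketch.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a Counter mapping each lowercase word to its frequency."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Two made-up example documents (not taken from the course).
docs = ["The cat sat on the mat", "The dog chased the cat"]
bags = [bag_of_words(doc) for doc in docs]

# Shared vocabulary, sorted so every document uses the same vector layout.
vocabulary = sorted(set().union(*bags))

# One count vector per document; word order inside the document is discarded.
vectors = [[bag[word] for word in vocabulary] for bag in bags]
```

Each vector can then be fed to a classifier; the model only sees how often each word occurs, not where it occurs.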

We will learn how to classify words.

Classification of Words

We will take a look at stemming of words.


We will perform simple sentiment analysis on scraped tweets.

Simple Sentiment Analysis
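
One very simple way to approach this, assuming a lexicon-based method (the listing does not say which method the video uses), is to count positive and negative words in each tweet. The word lists and sample tweets below are invented for illustration.

```python
# Tiny hand-made lexicons, purely illustrative (an assumption, not course data).
POSITIVE = {"good", "great", "love", "happy"}
NEGATIVE = {"bad", "awful", "hate", "sad"}


def sentiment_score(tweet):
    """Return (number of positive words) minus (number of negative words)."""
    # Split on whitespace, strip common punctuation and hashtag/mention marks.
    words = [word.strip(".,!?#@").lower() for word in tweet.split()]
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


def label(score):
    """Map a numeric score to a coarse sentiment label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up tweets standing in for scraped data.
tweets = [
    "I love this great library!",
    "Awful docs, I hate debugging.",
    "Just installed Python.",
]
labels = [label(sentiment_score(tweet)) for tweet in tweets]
```

A word-count approach like this ignores negation and sarcasm, so it is best treated as a baseline.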

We will learn how to group dimensions and also take a look at the different types of data that are generated.

Preview 25:04

We will take a look at trend analysis and derive new metrics and dimensions to uncover hidden insights.

Trend Analysis and Deriving New Metrics

We will take a look at the concept of correlation analysis.

Correlation Analysis
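
For reference, the Pearson correlation coefficient, one common measure used in correlation analysis, can be computed by hand as below. This is a minimal standard-library sketch; the listing does not state which measure or library the video relies on.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products of deviations from the means.
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / sqrt(spread_x * spread_y)
```

Values close to 1 or -1 indicate a strong linear relationship, and values near 0 indicate little linear relationship.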

We will briefly go over what we covered in the course and also take a glimpse at what the future holds for us.

Course Summary
Getting Started with Python Web Scraping
12 Lectures 01:35:54

This video provides an overview of the entire course.

Preview 02:44

This video aims to explain the course’s expected prerequisite knowledge and system requirements, then introduce the concept of web scraping, situations in which you may want to use it, and why it is a valuable skill to know.

When to Web Scrape

Without understanding the foundations of web development, it is challenging to write efficient and robust web scraping scripts, so we will cover how a website is structured and how to locate data with precision.

What Makes up a Website

In order to query a website to scrape data from it, we need to see how the website is structured in its underlying code. We also need an application that will let us test our queries. To do this, we will learn about the element explorer and console of the Chrome Developer Tools.

How to Interact with a Website

Now we know how to create CSS selectors and use the Chrome developer tools to look at HTML and construct a query, but how do we turn this into a Python script? We use the selenium module and a web driver.

Using the Selenium Module

Now that we know how to web scrape with Python, we need to be aware of the ethical and legal ramifications associated with web scraping. Mainly, the solution is to be considerate and use common sense.

Ethical Web Scraping

BeautifulSoup cannot work alone. Although it’s a great tool for parsing and organizing a website’s HTML, it doesn’t get the HTML for us, so we have to figure out another method to request a website’s HTML.

Preview 09:13

So, now we have some HTML strings loaded in Python, but how can we use BeautifulSoup to intelligently start selecting important data from it?

Using the BeautifulSoup Module

The aim of this video is to show an example of how to parse a webpage, such as a Wikipedia article.

Example: Parsing Wikipedia

Is writing a web-scraping script always the right method, or are there better alternative solutions?

Preview 04:24

If not through web scraping, how can we get the information using an API with Python?

Introduction to APIs

Some APIs require authentication and they require multiple parameters. How do we integrate these into our script?

Working with APIs
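
A minimal sketch of how query parameters and an authentication header might be attached to a request, using only the standard library. The URL, the parameter names, and the bearer-token scheme are all hypothetical; real APIs document their own endpoints and authentication rules.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_request(base_url, params, token):
    """Build (but do not send) a GET request with query parameters and a token."""
    # urlencode escapes the values and joins them as key=value&key=value.
    query = urlencode(params)
    url = f"{base_url}?{query}"
    # Bearer tokens are one common scheme; this is an assumption for the sketch.
    return Request(url, headers={"Authorization": f"Bearer {token}"})


# Hypothetical endpoint and placeholder token, for illustration only.
request = build_request(
    "https://api.example.com/v1/search",
    {"q": "python", "limit": 10},
    "YOUR-TOKEN",
)
# Sending it would be: urllib.request.urlopen(request), omitted here on purpose.
```

Keeping request construction separate from sending makes the parameters easy to inspect before any network traffic happens.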
Python Data Visualization Solutions
50 Lectures 03:26:54

This section gives an overview of the entire course

Preview 03:38

Importing data from CSV into Python can be a bit tricky. It needs careful inspection and appropriate functions. Let's see how we can do that.

Importing Data from CSV
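
As a small illustration of why CSV needs care, the sketch below reads a sample containing a quoted field with an embedded comma and converts the numeric column explicitly. It uses the standard csv module; the inline sample data is made up, and the video may use a different library.

```python
import csv
import io

# Inline sample standing in for a file on disk (made-up data).
# The second record has a quoted field that contains a comma.
raw = 'name,score\nalice,3.5\n"smith, bob",4.0\n'

with io.StringIO(raw) as handle:
    # DictReader uses the header row as keys and handles quoting for us.
    reader = csv.DictReader(handle)
    # Every value arrives as a string, so numeric columns are converted here.
    rows = [{"name": row["name"], "score": float(row["score"])} for row in reader]
```

With a real file, the same reader works on open(path, newline=""), which is the form the csv module documentation recommends.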

When we are automating a data pipe for many files, we are not in a position to convert an Excel file into CSV and then import it. This video shows us how to import data directly from an Excel file.

Importing Data from Microsoft Excel Files

We've learned how to import data from CSV and Excel. But how do we do that with a file that has fixed-width data? Let's explore.

Importing Data from Fixed-Width Files
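
A fixed-width record can be parsed by slicing each line at known column offsets. The sketch below assumes a hypothetical layout (name in columns 0-9, quantity in 10-14, price in 15-22) and builds its own sample lines, so none of it comes from the course material.

```python
# Hypothetical layout: (field name, start column, end column, converter).
FIELDS = [
    ("name", 0, 10, str),
    ("quantity", 10, 15, int),
    ("price", 15, 23, float),
]


def parse_fixed_width(line):
    """Slice one fixed-width line into a dict of typed values."""
    return {
        name: convert(line[start:end].strip())
        for name, start, end, convert in FIELDS
    }


# Build sample lines with format specs so the column widths are exact.
lines = [
    f"{name:<10}{quantity:>5}{price:>8.2f}"
    for name, quantity, price in [("widget", 12, 4.5), ("gadget", 3, 19.99)]
]
records = [parse_fixed_width(line) for line in lines]
```

The key point is that there are no delimiters to split on; the column positions themselves carry the structure.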

Although the tab-delimited format is as simple to read as CSV files, we need to ensure that certain parameters are set to keep the reading process accurate. Let's explore how we can do that.

Importing Data from Tab Delimited Files

Let's explore how we can import data from a JSON resource like GitHub, and how to fetch it and process it later.

Importing Data from a JSON Resource

Modern applications often hold different datasets inside relational databases or other databases like MongoDB, and we have to use these databases to produce beautiful graphs. This video will show us how to use SQL drivers from Python to access data. 

Importing Data from a Database
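
A minimal sketch of reading plot-ready data through a SQL driver, using the standard library's sqlite3 module with an in-memory database. The table name, columns, and rows are invented; the video may use a different database and driver.

```python
import sqlite3

# In-memory SQLite database standing in for a real server (an assumption).
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE sales (region TEXT, amount REAL)")
connection.executemany(
    "INSERT INTO sales VALUES (?, ?)",  # placeholders avoid manual quoting
    [("north", 120.0), ("south", 80.5), ("north", 30.0)],
)

# Aggregate in SQL so that only the rows needed for the chart reach Python.
cursor = connection.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
)
totals = cursor.fetchall()
connection.close()
```

The resulting list of (region, total) tuples can be unpacked into labels and values for a bar chart.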

Data coming from the real world needs cleaning before processing or even visualization. It's not fully automated and we need to understand outliers in order to clean the data. Let's see how we can do that.

Cleaning Up Data from Outliers
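
One common way to flag outliers is the modified z-score based on the median absolute deviation (MAD). The listing does not say which method the video uses, so the sketch below is a stand-in; the constant 0.6745 and the threshold 3.5 follow a widely used convention, and the sample readings are made up.

```python
from statistics import median


def remove_outliers(values, threshold=3.5):
    """Drop points whose modified z-score (MAD-based) exceeds the threshold."""
    center = median(values)
    # Median absolute deviation: a spread estimate that outliers barely affect.
    mad = median(abs(value - center) for value in values)
    if mad == 0:
        # More than half the points are identical; nothing sensible to scale by.
        return list(values)
    return [
        value
        for value in values
        if 0.6745 * abs(value - center) / mad <= threshold
    ]


# Made-up sensor readings with one obvious spike.
readings = [10.1, 9.8, 10.3, 10.0, 55.0, 9.9]
cleaned = remove_outliers(readings)
```

Because the median is used instead of the mean, a single extreme value does not distort the estimate of what counts as normal.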

In scientific computing, images are often represented as NumPy array data structures. We can import images using various techniques. In this video, we will take a look at using image processing in Python, mainly related to scientific processing and less on the artistic side of image manipulation. 

Importing Image Data into NumPy Arrays

In this video, we will see different ways of generating random number sequences and word sequences. Some of the examples use standard Python modules, and others use NumPy/SciPy functions.

Generating Controlled Random Datasets
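
A short sketch of what "controlled" can mean in practice: seeding a private generator so the same dataset is reproduced on every run. It uses only the standard random module; the function name, the word list, and the choice of distributions are assumptions for illustration.

```python
import random


def make_dataset(seed, size=5):
    """Return reproducible uniform numbers, Gaussian numbers, and words."""
    # A private generator keeps the global random state untouched.
    rng = random.Random(seed)
    uniform = [rng.random() for _ in range(size)]        # floats in [0, 1)
    gaussian = [rng.gauss(0.0, 1.0) for _ in range(size)]  # mean 0, std dev 1
    words = [rng.choice(["alpha", "beta", "gamma"]) for _ in range(size)]
    return uniform, gaussian, words


first = make_dataset(seed=42)
second = make_dataset(seed=42)  # same seed, so the same data again
```

Fixing the seed makes plots and tests repeatable, while changing it yields a fresh dataset with the same statistical properties.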

Data that comes from different real-life sensors is not smooth; it contains some noise that we don't want to show on diagrams and plots. In this video, we introduce a few advanced algorithms to help with cleaning of data coming from real-world sources.

Smoothing Noise in Real-World Data
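
The listing does not name the algorithms covered, so as a simple stand-in the sketch below shows a plain moving average, one of the most basic smoothing techniques. The function name and window size are illustrative choices.

```python
def moving_average(values, window=3):
    """Smooth a sequence by averaging each run of `window` consecutive points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    # The output is shorter than the input by window - 1 points.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# A made-up noisy series with one spike at index 2.
noisy = [1.0, 1.2, 5.0, 1.1, 0.9]
smoothed = moving_average(noisy, window=3)
```

A wider window smooths more aggressively but also blurs genuine short-lived features in the signal.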

There are different plots used for representing data differently. In this video, we'll compare them and understand advanced concepts in data visualization. We will also plot sine and cosine curves and customize them.

Preview 07:53

Now that we've learned the concepts of basic plotting and customizing, this video will show us a variety of useful axis properties that we can configure in matplotlib to define axis lengths and limits.

Defining Axis Lengths and Limits

Data is presented to different kinds of audiences. Lines styled distinctly enough for the target audience (for example, vivid colors for a young audience) leave a strong impression on the viewer. This video shows how we can change various line properties such as style, color, or width.

Defining Plot Line Styles, Properties, and Format Strings

As we now know how to change various line properties such as styles, colors, and width, this video will guide us through adding more data to our figures and charts by setting axis and line properties.

Setting Ticks, Labels, and Grids

Legends and annotations explain data plots clearly and in context. By assigning each plot a short description about what data it represents, we enable an easier model for the viewer. This video will show how to annotate specific points on our figures and how to create and position data legends.  

Adding Legends and Annotations

Spines define data area boundaries; they connect the axis tick marks. There are four spines. We can place them wherever we want. As they are placed on the border of the axis, we see a box around our data plot. This video will demonstrate how to move spines to the center.

Moving Spines to Center

Histograms are often used in image manipulation software as a way to visualize image properties such as distribution of light in a particular color channel. This video will help us create histograms in 2D.

Making Histograms

To visualize the uncertainty of measurement in our dataset or to indicate the error, we can use error bars. Error bars can easily give an idea of how error free the dataset is. In this video, we will see how to create bar charts and how to draw error bars.

Making Bar Charts with Error Bars

Pie charts are special in many ways, the most important being that the dataset they display must sum up to 100 percent or they are just not valid. Let's explore how we can create pie charts to represent data in a better way.

Making Pie Charts Count
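
Since the slices must add up to 100 percent, raw counts are usually normalized before plotting. A minimal sketch, with made-up category names and counts:

```python
def to_percentages(counts):
    """Convert raw counts to percentage shares that sum to 100."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return {label: 100.0 * value / total for label, value in counts.items()}


# Invented counts, for illustration only.
shares = to_percentages({"chrome": 6, "firefox": 3, "other": 1})
```

The resulting values can be passed to a pie chart as slice sizes, with the dictionary keys as labels.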

The matplotlib library allows us to fill areas in between and under the curves with color so that we can display the value of that area to the viewer. In this video, we will learn how to fill the area under a curve or in between two different curves. 

Plotting with Filled Areas

If you have two variables and want to spot the correlation between those, a scatter plot may be the solution to spot patterns. This type of plot is also very useful as a start for more advanced visualizations of multidimensional data. Let's see how to create a scatter plot.

Drawing Scatter Plots with Colored Markers

To be able to distinguish one particular plot line in the figure, we need to add a shadow effect. 

Preview 03:55

Adding a data table beside our chart helps to visualize information.

Adding a Data Table to the Figure

In this video, you will learn how to create custom subplot configurations for your plots.

Using Subplots

To spot differences in patterns and compare plots visually in the figure, we need to customize our grids.

Customizing Grids

To display isolines, we create contour plots.

Creating Contour Plots

To distinguish clearly between two different plots, we fill the areas with different patterns.

Filling an Under-Plot Area

When the information is radial in nature, we need a polar plot to display information.

Drawing Polar Plots

You will learn how to visualize a real-world task in this video. 

Visualizing the Filesystem Tree Using a Polar Bar

You may be curious about plotting 3D data after working in 2D. matplotlib ships a toolkit called mplot3d for this. Let's go ahead and explore how it works!

Preview 05:32

Similar to 3D bars, you might want to create 3D histograms since these are useful for easily spotting correlations between three independent variables. Let us now dive into it! 

Creating 3D Histograms

This video will walk you through graphics rendering with OpenGL. So let's go ahead and do it! 

Animating with OpenGL

Images can be used to highlight the strengths of your visualization in addition to pure data values. They map deeper into the viewer's mental model, thereby helping the viewer to remember the visualizations better and for a longer time. Let's see how we could use them in Python!

Preview 06:17

This video will walk you through how you can make simple yet effective usage of the Python matplotlib library to process image channels and display the per-channel histogram of an external image.

Displaying Images with Other Plots in the Figure

The best geospatial visualizations are done by overlaying data on the map. This video will show you how to project data on a map using matplotlib's Basemap toolkit. Let's dive into it!

Plotting Data on a Map Using Basemap

This video will take you through the generation of random images to tell humans and computers apart. Let's do it! 

Generating CAPTCHA

With the logarithmic scale, the ratio of consecutive values is constant. This is important when we are trying to read log plots. Let us see how to use it!

Preview 05:18
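
The property stated above can be checked numerically: tick values on a base-10 log axis grow by a constant ratio, while their positions on the axis are evenly spaced. A small sketch using only the math module:

```python
import math

# Tick values as they would appear on a base-10 logarithmic axis.
ticks = [10 ** k for k in range(5)]  # 1, 10, 100, 1000, 10000

# Ratio between each tick and the one before it.
ratios = [later / earlier for earlier, later in zip(ticks, ticks[1:])]

# Position of each tick on the axis is its logarithm.
positions = [math.log10(tick) for tick in ticks]

# Distance between neighboring ticks on the axis.
gaps = [later - earlier for earlier, later in zip(positions, positions[1:])]
```

Equal distances on a log axis therefore mean equal multiplicative factors, not equal differences.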

In this video we will discuss how to create a stem plot which will display data as lines extending from a baseline along the x-axis.

Creating a Stem Plot

In this video we will visualize wind patterns or liquid flow, and we will use uniform representation of the vector field for this. So, let's go ahead and do it! 

Drawing Streamlines of Vector Flow

Color-coding the data can have great impact on how your visualizations are perceived by the viewer, as they come with assumptions about colors and what colors represent. This video will walk you through the steps showing the use of colormaps! 

Using Colormaps

If we want to take a quick look at the data and see if there is any correlation, we would draw a quick scatter plot. In this video, you will understand scatter plots.

Using Scatter Plots and Histograms

If you have two different datasets from two different observations, you want to know if those two event sets are correlated. You want to cross-correlate them and see if they match in any way. This video will let you achieve this goal!

Plotting the Cross Correlation Between Two Variables
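
To make the idea concrete, the sketch below computes a raw (unnormalized) cross-correlation at several lags and picks the lag where the two series line up best. It is a hand-rolled illustration with made-up signals, not the plotting routine used in the video.

```python
def cross_correlation(xs, ys, lag):
    """Sum of xs[i] * ys[i + lag] over the indices where both values exist."""
    return sum(
        xs[i] * ys[i + lag]
        for i in range(len(xs))
        if 0 <= i + lag < len(ys)
    )


def best_lag(xs, ys, max_lag):
    """Return the lag in [-max_lag, max_lag] with the largest cross-correlation."""
    return max(
        range(-max_lag, max_lag + 1),
        key=lambda lag: cross_correlation(xs, ys, lag),
    )


# A made-up pulse and the same pulse delayed by two samples.
signal = [0, 0, 1, 3, 1, 0, 0, 0]
delayed = [0, 0, 0, 0, 1, 3, 1, 0]
lag = best_lag(signal, delayed, max_lag=3)
```

A clear peak at a nonzero lag suggests that one series follows the other with that delay.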

How could you predict the growth of stock dividends? In this video we will dive into some interesting steps which will let you understand the importance of autocorrelation for this prediction!

The Importance of Autocorrelation

Let's look into how to visualize two-dimensional vector quantities such as speed and direction of wind!

Preview 06:23

How will you visually compare several similar data series? This video will walk you through making a box-and-whisker plot which achieves this goal! 

Making a Box-and-Whisker Plot

One form of very widely used visualization of time-based data is a Gantt chart. Let us see how to work with it!

Making Gantt Charts

Error bars are useful to display the dispersion of data on a plot. So, let's explore their use in Python for data visualization.

Making Error Bars

This video will let you explore more features of text manipulation in matplotlib, giving a powerful toolkit for even advanced typesetting needs. Let's dive into it.

Making Use of Text and Font Properties

This video will explain some of the programming interfaces in matplotlib and make a comparison of pyplot and object-oriented API. Let us now explore it! 

Understanding the Difference between pyplot and OO API
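
A minimal side-by-side sketch of the two interfaces, assuming matplotlib is installed. The data points and titles are made up; the Agg backend is selected only so the sketch runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so no window is needed
import matplotlib.pyplot as plt

xs = [0, 1, 2, 3]
ys = [0, 1, 4, 9]

# pyplot style: each call acts on an implicit "current" figure and axes.
plt.figure()
plt.plot(xs, ys)
plt.title("pyplot interface")
implicit_axes = plt.gca()  # the axes that pyplot was drawing on

# Object-oriented style: hold explicit Figure and Axes objects and call methods.
fig, ax = plt.subplots()
ax.plot(xs, ys)
ax.set_title("object-oriented interface")

plt.close("all")  # release the figures once we are done with them
```

The pyplot style is convenient for quick interactive work, while explicit Figure and Axes objects tend to be easier to manage once a script handles several plots.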
About the Instructor
Packt Publishing
3.9 Average rating
8,249 Reviews
59,041 Students
687 Courses
Tech Knowledge in Motion

Packt has been committed to developer learning since 2004. A lot has changed in software since then, but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and at how to put them to work.

With an extensive library of content (more than 4000 books and video courses), Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.

From skills that will help you to develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource to make you a better, smarter developer.

Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.