Learning Path: Python: Effective Data Analysis Using Python

Rating: 3.9 (20 reviews)

Use Python's tools and libraries effectively to extract data from the web and create attractive, informative visualizations

Created by Packt Publishing

Last updated 4/2017

English

What you'll learn

Scrape the Twitter stream to collect real-time data
Use predictive methods to forecast future trends based on current data
Use the Selenium module and scrape with Selenium
Discover how to perform parsing with BeautifulSoup
Make 3D visualizations, mainly using mplot3d

Course content

3 sections • 86 lectures • 10h 36m total length

The Course Overview • 3:55
This video provides an overview of the entire course.
Getting started with Python • 26:22
The aim of this video is to introduce us to Python.
Getting Data using the Twitter API • 20:47
We will learn how to collect and store the data.
Collecting and Storing Tweets • 9:26
We will explore how to collect and store tweets from Twitter.
Database Design • 10:30
We will talk about database design.
Pandas and Databases • 5:55
We will explore how Pandas works with databases.
Pandas Series, DataFrames, and Columnar Operations • 21:21
We will explore the concepts of Pandas Series, DataFrames, and columnar operations.
Grouping Operations and Working with Date Columns • 17:01
We will take a look at grouping operations and how to work with date columns.
Merging Operations and Exporting data to JSON/CSV • 14:54
We will explore merge operations and learn how to export data to JSON/CSV.
Array Features, Bucketing Arrays and Histogram Functions • 21:02
We will take a look at what exactly arrays are, their different types, and histogram functions.
Simple Aggregations • 21:23
We will see exactly what simple aggregations are.
Linear Algebra • 4:29
We will explore the concept of linear algebra.
Introducing PyQt and Matplotlib • 31:47
We will learn how to present stories via simple visualizations and representations.
Creating Charts • 7:35
We will learn the different types of graphical representations.
Simple XY Plots with Axis Scales • 4:47
We will learn how to create simple XY plots and axis scales.
Introduction to the NLTK Package • 18:53
We will learn how to handle text data.
Bag of Words
We will find out exactly what we mean by a bag of words.
Classification of Words • 9:26
We will learn how to classify words.
Stemming • 11:53
We will take a look at stemming of words.
Simple Sentiment Analysis • 5:42
We will perform simple sentiment analysis on scraped tweets.
Grouping By Dimensions and Classification of Data Types • 25:04
We will learn how to group dimensions and also take a look at the different types of data that are generated.
Trend Analysis and Deriving New Metrics • 20:28
We will take a look at trend analysis and derive new metrics and dimensions to uncover hidden insights.
Correlation Analysis • 17:28
We will take a look at the concept of correlation analysis.
Course Summary • 3:30
We will briefly go over what we covered in the course and also take a glimpse at what the future holds for us.
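
As a rough illustration of the bag-of-words idea named in the lectures above, the following sketch counts word occurrences across a few short texts using only the Python standard library. The sample tweets and the `bag_of_words` helper are hypothetical and invented for this illustration; the course itself works with the NLTK package, which is not used here.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return a Counter mapping each lowercase word to its frequency."""
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


# Hypothetical sample tweets, invented for this illustration.
tweets = [
    "Python makes data analysis fun",
    "Data analysis with Python and pandas",
]

# One bag per tweet: word order is discarded, only counts are kept.
bags = [bag_of_words(t) for t in tweets]

# Adding the per-tweet bags together gives corpus-wide word counts.
corpus_counts = sum(bags, Counter())
print(corpus_counts.most_common(3))
```

Each bag discards word order and keeps only frequencies, which is what makes the representation convenient as input for simple classifiers.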

The Course Overview • 2:44
This video provides an overview of the entire course.
When to Web Scrape • 2:56
This video aims to explain the course’s expected prerequisite knowledge and system requirements, then introduce the concept of web scraping, situations in which you may want to use it, and why it is a valuable skill to know.
What Makes up a Website • 9:49
Without understanding the foundations of web development, it is challenging to write efficient and robust web scraping scripts, so we will cover how a website is structured and how to locate data with precision.
How to Interact with a Website • 8:31
In order to query a website to scrape data from it, we need to see how the website is structured in its underlying code. We also need an application that will let us test our queries. To do this, we will learn about the element explorer and console of the Chrome Developer Tools.
Using the Selenium Module • 12:11
Now we know how to create CSS selectors and use the Chrome developer tools to look at HTML and construct a query, but how do we turn this into a Python script? We use the selenium module and a web driver.
Ethical Web Scraping • 4:38
Now that we know how to web scrape with Python, we need to be aware of the ethical and legal ramifications associated with web scraping. Mainly, the solution is to be considerate and use common sense.
Requesting HTML • 9:13
BeautifulSoup cannot work alone. Although it’s a great tool for parsing and organizing a website’s HTML, it doesn’t get the HTML for us, so we have to figure out another method to request a website’s HTML.
Using the BeautifulSoup Module • 13:17
So, now we have some HTML strings loaded in Python, but how can we use BeautifulSoup to intelligently start selecting important data from it?
Example: Parsing Wikipedia • 11:21
The aim of this video is to show an example of how to parse a webpage, using Wikipedia.
Bypassing the Browser • 4:24
Is writing a web-scraping script always the right method, or are there better alternative solutions?
Introduction to APIs • 4:59
If not through web scraping, how can we get the information using an API with Python?
Working with APIs • 11:51
Some APIs require authentication and they require multiple parameters. How do we integrate these into our script?
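
The lectures above separate requesting a page's HTML from parsing it. As a minimal sketch of the parsing half only, the example below pulls link targets out of an HTML string. It deliberately swaps in the standard library's `html.parser` in place of BeautifulSoup so that it runs without extra installs; the `LinkCollector` class and the HTML snippet are hypothetical and not taken from the course.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href attribute of every <a> tag encountered."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Hypothetical HTML string standing in for a page that was already requested.
html = """
<html><body>
  <h1>Example page</h1>
  <a href="/wiki/Python">Python</a>
  <a href="/wiki/Web_scraping">Web scraping</a>
  <a>No target</a>
</body></html>
"""

collector = LinkCollector()
collector.feed(html)
print(collector.links)
```

The anchor without an `href` is skipped, which mirrors the kind of defensive check a real scraping script needs when pages are inconsistent.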

The Course Overview • 3:38
This section gives an overview of the entire course.
Importing Data from CSV • 4:32
Importing data from CSV into Python can be a bit tricky. It needs careful inspection and appropriate functions. Let's see how we can do that.
Importing Data from Microsoft Excel Files • 4:45
When we are automating a data pipeline for many files, we are not in a position to convert an Excel file into CSV and then import it. This video shows us how to import data directly from an Excel file.
Importing Data from Fixed-Width Files • 3:05
We've learned how to import data from CSV and Excel. But how do we do that with a file that has fixed-width data? Let's explore.
Importing Data from Tab Delimited Files • 2:23
Although the tab-delimited format is as simple to read as CSV files, we need to ensure that certain parameters are set to keep the reading process accurate. Let's explore how we can do that.
Importing Data from a JSON Resource • 5:17
Let's explore how we can import data from a JSON resource such as GitHub, and how to fetch it and process it later.
Importing Data from a Database • 5:08
Modern applications often hold different datasets inside relational databases or other databases like MongoDB, and we have to use these databases to produce beautiful graphs. This video will show us how to use SQL drivers from Python to access data.
Cleaning Up Data from Outliers • 5:54
Data coming from the real world needs cleaning before processing or even visualization. It's not fully automated and we need to understand outliers in order to clean the data. Let's see how we can do that.
Importing Image Data into NumPy Arrays • 6:01
In scientific computing, images are often represented as NumPy array data structures. We can import images using various techniques. In this video, we will take a look at using image processing in Python, mainly related to scientific processing and less on the artistic side of image manipulation.
Generating Controlled Random Datasets • 6:36
In this video, we will see different ways of generating random number sequences and word sequences. Some of the examples use standard Python modules, and others use NumPy/SciPy functions.
Smoothing Noise in Real-World Data • 4:45
Data that comes from different real-life sensors is not smooth; it contains some noise that we don't want to show on diagrams and plots. In this video, we introduce a few advanced algorithms to help with cleaning of data coming from real-world sources.
Defining Plot Types and Drawing Sine and Cosine Plots • 7:53
There are different plots used for representing data differently. In this video, we'll compare them and understand advanced concepts in data visualization. We will also draw sine and cosine plots and customize them.
Defining Axis Lengths and Limits • 5:16
Now that we've learned the concepts of basic plotting and customizing, this video will show us a variety of useful axis properties that we can configure in matplotlib to define axis lengths and limits.
Defining Plot Line Styles, Properties, and Format Strings • 1:58
There are different kinds of audiences to whom the data is presented. Making lines distinct enough for the target audience (for example, vivid colors for a young audience) leaves a strong impression on the viewer. This video shows how we can change various line properties such as styles, colors, or width.
Setting Ticks, Labels, and Grids • 2:42
As we now know how to change various line properties such as styles, colors, and width, this video will guide us through adding more data to our figure and charts by setting axis and line properties.
Adding Legends and Annotations • 2:33
Legends and annotations explain data plots clearly and in context. By assigning each plot a short description about what data it represents, we enable an easier model for the viewer. This video will show how to annotate specific points on our figures and how to create and position data legends.
Moving Spines to Center • 1:21
Spines define data area boundaries; they connect the axis tick marks. There are four spines. We can place them wherever we want. As they are placed on the border of the axis, we see a box around our data plot. This video will demonstrate how to move spines to the center.
Making Histograms • 3:59
Histograms are often used in image manipulation software as a way to visualize image properties such as distribution of light in a particular color channel. This video will help us create histograms in 2D.
Making Bar Charts with Error Bars • 3:23
To visualize the uncertainty of measurement in our dataset or to indicate the error, we can use error bars. Error bars can easily give an idea of how error free the dataset is. In this video, we will see how to create bar charts and how to draw error bars.
Making Pie Charts Count • 1:58
Pie charts are special in many ways, the most important being that the dataset they display must sum up to 100 percent or they are just not valid. Let's explore how we can create pie charts to represent data in a better way.
Plotting with Filled Areas • 1:56
The matplotlib library allows us to fill areas in between and under the curves with color so that we can display the value of that area to the viewer. In this video, we will learn how to fill the area under a curve or in between two different curves.
Drawing Scatter Plots with Colored Markers • 2:12
If you have two variables and want to spot the correlation between those, a scatter plot may be the solution to spot patterns. This type of plot is also very useful as a start for more advanced visualizations of multidimensional data. Let's see how to create a scatter plot.
Adding a Shadow to the Chart Line • 3:55
To be able to distinguish one particular plot line in the figure, we need to add a shadow effect.
Adding a Data Table to the Figure • 2:26
Adding a data table beside our chart helps to visualize information.
Using Subplots • 3:57
In this video, you will learn to create custom subplot configurations for your plots.
Customizing Grids • 3:04
To spot differences in patterns and compare plots visually in the figure, we need to customize our grids.
Creating Contour Plots • 3:23
To display isolines, we create contour plots.
Filling an Under-Plot Area • 2:01
To distinguish clearly between two different plots, we fill the areas with different patterns.
Drawing Polar Plots • 2:56
When the information is radial in nature, we need a polar plot to display information.
Visualizing the Filesystem Tree Using a Polar Bar • 3:02
You will learn how to visualize a real-world task in this video.
Creating 3D Bars • 5:32
You must be curious to plot 3D data after getting your hands on 2D. Python provides a toolkit called mplot3d in matplotlib for this. Let's go ahead and explore how it works!
Creating 3D Histograms • 3:12
Similar to 3D bars, you might want to create 3D histograms since these are useful for easily spotting correlations between three independent variables. Let us now dive into it!
Animating with OpenGL • 6:01
This video will walk you through graphics rendering with OpenGL. So let's go ahead and do it!
Plotting with Images • 6:17
Images can be used to highlight the strengths of your visualization in addition to pure data values. They map deeper into the viewer's mental model, thereby helping the viewer to remember the visualizations better and for a longer time. Let's see how we could use them in Python!
Displaying Images with Other Plots in the Figure • 3:52
This video will walk you through how you can make simple yet effective use of the Python matplotlib library to process image channels and display the per-channel histogram of an external image.
Plotting Data on a Map Using Basemap • 5:22
The best geospatial visualizations are done by overlaying data on the map. This video will show you how to project data on a map using matplotlib's Basemap toolkit. Let's dive into it!
Generating CAPTCHA • 6:36
This video will take you through the generation of random images to tell humans and computers apart. Let's do it!
Understanding Logarithmic Plots • 5:18
With the logarithmic scale, the ratio of consecutive values is constant. This is important when we are trying to read log plots. Let us see how to create them!
Creating a Stem Plot • 4:17
In this video we will discuss how to create a stem plot which will display data as lines extending from a baseline along the x-axis.
Drawing Streamlines of Vector Flow • 3:27
In this video we will visualize wind patterns or liquid flow, and we will use uniform representation of the vector field for this. So, let's go ahead and do it!
Using Colormaps • 5:16
Color-coding the data can have great impact on how your visualizations are perceived by the viewer, as they come with assumptions about colors and what colors represent. This video will walk you through the steps showing the use of colormaps!
Using Scatter Plots and Histograms • 4:28
If we want to take a quick look at the data and see if there is any correlation, we would draw a quick scatter plot. In this video, you will understand scatter plots.
Plotting the Cross Correlation Between Two Variables • 3:27
If you have two different datasets from two different observations, you want to know if those two event sets are correlated. You want to cross-correlate them and see if they match in any way. This video will let you achieve this goal!
The Importance of Autocorrelation • 4:11
How could you predict the growth of stock dividends? In this video we will dive into some interesting steps which will let you understand the importance of autocorrelation for this prediction!
Drawing Barbs • 6:23
Let's look into how to visualize two-dimensional vector quantities such as speed and direction of wind!
Making a Box-and-Whisker Plot • 3:36
How will you visually compare several similar data series? This video will walk you through making a box-and-whisker plot which achieves this goal!
Making Gantt Charts • 3:49
One form of very widely used visualization of time-based data is a Gantt chart. Let us see how to work with it!
Making Error Bars • 4:40
Error bars are useful to display the dispersion of data on a plot. So, let's explore their use in Python for data visualization.
Making Use of Text and Font Properties • 3:59
This video will let you explore more features of text manipulation in matplotlib, giving a powerful toolkit for even advanced typesetting needs. Let's dive into it.
Understanding the Difference between pyplot and OO API • 5:12
This video will explain some of the programming interfaces in matplotlib and make a comparison of pyplot and object-oriented API. Let us now explore it!
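
The logarithmic plots lecture in the list above states that, on a logarithmic scale, the ratio of consecutive values is constant. The short sketch below checks that property numerically for evenly spaced ticks on a base-10 axis and contrasts it with the differences between the same ticks. The tick values are chosen here purely for illustration and do not come from the course; no plotting library is used.

```python
import math

# Tick positions evenly spaced on a base-10 logarithmic axis:
# exponents 0, 1, 2, 3, 4 correspond to values 1, 10, 100, 1000, 10000.
exponents = range(5)
ticks = [10 ** e for e in exponents]

# Ratio between each tick and the one before it (constant on a log axis).
ratios = [b / a for a, b in zip(ticks, ticks[1:])]

# Difference between consecutive ticks, which would be constant
# on a linear axis but grows quickly here.
differences = [b - a for a, b in zip(ticks, ticks[1:])]

print(ratios)
print(differences)
```

Equal distances along a log axis therefore correspond to equal multiplicative steps, not equal additive steps, which is the key to reading such plots.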

Requirements

A computer
Internet connection
A good grasp of the basics of Python

Description

Over the years, almost every organization has understood the importance of analyzing data.

In fact, it would not be an overstatement to say that “No organization will be able to survive today’s cut-throat competition if it does not analyze data.”

Data analysis as we know it is the process of taking the source data, refining it to get useful information, and then making useful predictions from it.

In this Learning Path, we will learn how to analyze data using the powerful toolset provided by Python.

Packt’s Video Learning Paths are a series of individual video products put together in a logical and stepwise manner such that each video builds on the skills learned in the video before it.

Python features numerous numerical and mathematical toolkits such as NumPy, SciPy, scikit-learn, and SciKit, all used for data analysis and machine learning. With the aid of all of these, Python has become the language of choice for data scientists working on data analysis, visualization, and machine learning.

We will have a general look at data analysis and then discuss the web scraping tools and techniques in detail. We will show a rich collection of recipes that will come in handy when you are scraping a website using Python, addressing your usual and unusual problems while scraping websites by diving deep into the capabilities of Python’s web scraping tools such as Selenium, BeautifulSoup, and urllib2.

We will then discuss visualization best practices. Effective visualization helps you get better insights from your data and make better, more informed business decisions.

After completing this Learning Path, you will be well-equipped to extract data even from dynamic and complex websites by using Python web scraping tools, and get a better understanding of the data visualization concepts. You will also learn how to apply these concepts and overcome any challenge while implementing them.

To ensure that you get the best of the learning experience, in this Learning Path we combine the works of some of the leading authors in the business.

About the authors

Benjamin Hoff spent 3 years working as a software engineer and team leader doing graphics processing, desktop application development, and scientific facility simulation using a mixture of C++ and Python. This sparked a passion for software development and developmental programming and led him to explore state-of-the-art projects in natural language processing, facial detection/recognition, and machine learning.

Charles Clayton is the sole proprietor of crclayton technologies co and an independent web developer. He is an experienced developer who specializes in Python web scraping solutions and tools such as Selenium, BeautifulSoup, and urllib2. He has also worked as a Reliability Engineer with West frazweer.

Dimitry Foures is a data scientist with a background in applied mathematics and theoretical physics. After completing his undergraduate physics studies at ENS Lyon (France), he studied fluid mechanics at École Polytechnique in Paris, where he obtained a first-class Master’s degree. He holds a PhD in applied mathematics from the University of Cambridge. He currently works as a data scientist for a smart energy startup in Cambridge, in close collaboration with the university.

Giuseppe Vettigli is a data scientist who has worked in the research industry and academia for many years. His work is focused on the development of machine learning models and applications to use information from structured and unstructured data. He also writes about scientific computing and data visualization in Python in his blogs.

Igor Milovanović is an experienced developer with a strong background in Linux systems and an education in software engineering. He is skilled in building scalable, data-driven, distributed, software-rich systems.

Who this course is for:

This course is ideal for those who are new to data analysis and for those who are already into data analytics and want to enhance their data extraction and visualization skills.

Course content

Learning Python Data Analysis • 24 lectures • 5hr 34min

Getting Started with Python Web Scraping • 12 lectures • 1hr 36min

Python Data Visualization Solutions • 50 lectures • 3hr 27min
