Over the years, almost every organization has understood the importance of analyzing data.
In fact, it would not be an overstatement to say that “No organization will be able to survive today’s cut-throat competition if it does not analyze data.”
Data analysis as we know it is the process of taking the source data, refining it to get useful information, and then making useful predictions from it.
In this Learning Path, we will learn how to analyze data using the powerful toolset provided by Python.
Packt’s Video Learning Paths are a series of individual video products put together in a logical and stepwise manner such that each video builds on the skills learned in the video before it.
Python features numerous numerical and mathematical toolkits such as NumPy, SciPy, scikit-learn, and other SciKits, all used for data analysis and machine learning. With the aid of these, Python has become a language of choice among data scientists for data analysis, visualization, and machine learning.
We will have a general look at data analysis and then discuss the web scraping tools and techniques in detail. We will show a rich collection of recipes that will come in handy when you are scraping a website using Python, addressing your usual and unusual problems while scraping websites by diving deep into the capabilities of Python’s web scraping tools such as Selenium, BeautifulSoup, and urllib2.
We will then discuss visualization best practices. Effective visualization helps you get better insights from your data and helps you make better, more informed business decisions.
After completing this Learning Path, you will be well-equipped to extract data even from dynamic and complex websites by using Python web scraping tools, and get a better understanding of the data visualization concepts. You will also learn how to apply these concepts and overcome any challenge while implementing them.
To ensure that you get the best of the learning experience, in this Learning Path we combine the works of some of the leading authors in the business.
About the authors
Benjamin Hoff spent 3 years working as a software engineer and team leader doing graphics processing, desktop application development, and scientific facility simulation using a mixture of C++ and Python. This sparked a passion for software development and developmental programming and led him to explore state-of-the-art projects in natural language processing, facial detection/recognition, and machine learning.
Charles Clayton is the sole proprietor of crclayton technologies co, and an independent web developer. He is an experienced developer who specializes in Python web scraping solutions and tools such as Selenium, BeautifulSoup, and urllib2. He has also worked as a Reliability Engineer with West frazweer.
Dimitry Foures is a data scientist with a background in applied mathematics and theoretical physics. After completing his undergraduate physics studies at ENS Lyon (France), he studied fluid mechanics at École Polytechnique in Paris, where he obtained a first-class Master’s degree. He holds a PhD in applied mathematics from the University of Cambridge. He currently works as a data scientist for a smart energy startup in Cambridge, in close collaboration with the university.
Giuseppe Vettigli is a data scientist who has worked in the research industry and academia for many years. His work is focused on the development of machine learning models and applications to use information from structured and unstructured data. He also writes about scientific computing and data visualization in Python in his blogs.
Igor Milovanović is an experienced developer with a strong background in Linux systems and an education in software engineering. He is skilled in building scalable, data-driven, distributed, software-rich systems.
The aim of this video is to introduce us to Python.
We will learn how to collect and store the data.
We will explore how to collect and store tweets from Twitter.
We will talk about database design.
We will explore pandas and how it works alongside databases.
We will explore the concepts of pandas Series, DataFrames, and columnar operations.
We will take a look at operations and how exactly to work with columns.
We will explore how to merge various operations and learn how to export data to JSON/CSV.
We will take a look at what exactly arrays are, their different types, and histogram functions.
We will see exactly what simple aggregations are.
We will explore the concept of linear algebra.
We will learn how to present stories via simple visualizations and representations.
We will learn the different types of graphical representations.
We will learn how to create simple XY plots and axis scales.
We will find out exactly what we mean by a bag of words.
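As a minimal sketch of the bag-of-words idea, the snippet below counts how often each word occurs in a text while ignoring word order, case, and punctuation. It uses only the standard library; the helper name bag_of_words and the sample sentences are assumptions made for illustration and are not taken from the course.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Return word counts for text, ignoring case, order, and punctuation."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample documents
docs = ["The cat sat on the mat.", "The dog chased the cat!"]
bags = [bag_of_words(doc) for doc in docs]

# Counter returns 0 for words that never occur in a document
print(bags[0]["the"], bags[1]["cat"], bags[0]["dog"])
```

Each document becomes a mapping from word to count, which is the form that simple text classifiers usually take as input.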
We will learn how to classify words.
We will take a look at stemming of words.
We will perform simple sentiment analysis on scraped tweets.
We will learn how to group dimensions and also take a look at the different types of data that are generated.
We will derive new metrics and dimensions to uncover hidden insights.
We will take a look at the concept of correlation analysis.
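To make the idea of correlation concrete, here is a small sketch that computes the Pearson correlation coefficient by hand. The function name pearson and the sample numbers are invented for illustration; in practice a library routine would normally be used instead.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Made-up sample data: two measurements that rise together
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 70]
r = pearson(hours, scores)
print(round(r, 3))
```

A value near 1 indicates that the two series rise together, a value near -1 that one falls as the other rises, and a value near 0 that there is little linear relationship.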
We will briefly go over what we covered in the course and also take a glimpse at what the future holds for us.
This video aims to explain the course’s expected prerequisite knowledge and system requirements, then introduce the concept of web scraping, situations in which you may want to use it, and why it is a valuable skill to know.
Without understanding the foundations of web development, it is challenging to write efficient and robust web scraping scripts, so we will cover how a website is structured and how to locate data with precision.
In order to query a website to scrape data from it, we need to see how the website is structured in its underlying code. We also need an application that will let us test our queries. To do this, we will learn about the element explorer and console of the Chrome Developer Tools.
Now we know how to create CSS selectors and use the Chrome Developer Tools to look at HTML and construct a query, but how do we turn this into a Python script? We use the selenium module and a web driver.
Now that we know how to web scrape with Python, we need to be aware of the ethical and legal ramifications associated with web scraping. Mainly, the solution is to be considerate and use common sense.
BeautifulSoup cannot work alone. Although it’s a great tool for parsing and organizing a website’s HTML, it doesn’t get the HTML for us, so we have to figure out another method to request a website’s HTML.
So, now we have some HTML strings loaded in Python, but how can we use BeautifulSoup to intelligently start selecting important data from it?
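BeautifulSoup is a third-party package, so the sketch below swaps in the standard library's html.parser module to show the same underlying idea: walking over HTML held in a Python string and keeping only the elements that match a tag and a class. The class name LinkCollector and the HTML snippet are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The class attribute may hold several space-separated class names
        classes = (attrs.get("class") or "").split()
        if tag == "a" and self.css_class in classes and "href" in attrs:
            self.links.append(attrs["href"])


# Hypothetical HTML already loaded into a string
html = """
<ul>
  <li><a class="result" href="/page/1">First</a></li>
  <li><a class="ad" href="/sponsored">Ad</a></li>
  <li><a class="result featured" href="/page/2">Second</a></li>
</ul>
"""

collector = LinkCollector("result")
collector.feed(html)
print(collector.links)
```

A dedicated parsing library expresses this kind of selection far more compactly, which is the main reason to prefer one once the queries become more complex.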
The aim of this video is to show an example of how to parse a web page, using Wikipedia as the example.
Is writing a web-scraping script always the right method, or are there better alternative solutions?
If not through web scraping, how can we get the information using an API with Python?
Some APIs require authentication and they require multiple parameters. How do we integrate these into our script?
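As a rough sketch of how authentication and several parameters fit into one request, the snippet below builds (but does not send) a request with the standard library. The URL, the parameter names, the token, and the use of a Bearer header are all assumptions; each real API documents its own scheme.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.example.com/v1/search"  # hypothetical endpoint
API_TOKEN = "YOUR-TOKEN-HERE"  # placeholder, not a real credential


def build_request(query, page=1, per_page=20):
    """Build an authenticated GET request with query-string parameters."""
    params = urlencode({"q": query, "page": page, "per_page": per_page})
    return Request(
        f"{BASE_URL}?{params}",
        headers={
            # Assumed scheme: many APIs accept a bearer token, others differ
            "Authorization": f"Bearer {API_TOKEN}",
            "Accept": "application/json",
        },
    )


req = build_request("data visualization", page=2)
print(req.full_url)
```

Passing the resulting object to urllib.request.urlopen would send it. Keeping the construction in one function makes it easy to inspect the URL and headers before making any network call.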
Importing data from CSV into Python can be a bit tricky. It needs careful inspection and appropriate functions. Let's see how we can do that.
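One source of trickiness is that every field arrives as a string and that quoted fields may themselves contain commas. The following sketch, which uses the standard csv module on made-up in-memory data, shows one way to handle both; the column names and the helper read_rows are assumptions.

```python
import csv
import io

# Made-up CSV content; the quoted field contains a comma
raw = io.StringIO(
    "name,quantity,price\n"
    "widget,4,2.50\n"
    '"bolt, large",10,0.75\n'
)


def read_rows(handle):
    """Read CSV rows into dicts, converting numeric columns explicitly."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append(
            {
                "name": record["name"],
                "quantity": int(record["quantity"]),
                "price": float(record["price"]),
            }
        )
    return rows


rows = read_rows(raw)
print(rows[1]["name"], rows[1]["quantity"] * rows[1]["price"])
```

With a file on disk, the same function would be called on a handle opened with newline="" as the csv module documentation recommends.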
When we are automating a data pipe for many files, we are not in a position to convert an Excel file into CSV and then import it. This video shows us how to import data directly from an Excel file.
We've learned how to import data from CSV and Excel. But how do we do that with a file that has fixed-width data? Let's explore.
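In a fixed-width file, each field occupies a known range of character positions, so the file can be parsed by slicing. The sketch below assumes a hypothetical three-column layout; the field names and widths are invented for the example.

```python
# (name, start, end) character positions; an assumed layout for this sketch
FIELDS = [("name", 0, 10), ("year", 10, 14), ("value", 14, 22)]


def parse_fixed_width(line):
    """Slice one fixed-width line into a dict of stripped field strings."""
    return {name: line[start:end].strip() for name, start, end in FIELDS}


# Build sample lines with format specs so the column widths are exact
lines = [
    f"{'alpha':<10}{2015:>4}{12.5:>8.2f}",
    f"{'beta':<10}{2016:>4}{7.25:>8.2f}",
]

records = [parse_fixed_width(line) for line in lines]
print(records[0])
```

The values come back as strings, so numeric columns still need an explicit conversion such as int(record["year"]).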
Although the tab-delimited format is as simple to read as CSV, we need to set certain parameters to keep the reading process accurate. Let's explore how we can do that.
Let's explore how we can import data from a JSON resource such as GitHub, and how to fetch it and process it later.
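The processing step can be sketched offline as follows. The inline payload stands in for a response body fetched from a web resource, and its keys are hypothetical rather than the schema of any real service.

```python
import json

# Hypothetical response body; a real one would be fetched over HTTP
payload = """
[
  {"name": "repo-a", "stars": 12, "language": "Python"},
  {"name": "repo-b", "stars": 3, "language": null},
  {"name": "repo-c", "stars": 7, "language": "Python"}
]
"""

repos = json.loads(payload)  # JSON arrays become lists, objects become dicts

# JSON null is loaded as None, so missing values need explicit handling
python_repos = [r["name"] for r in repos if r["language"] == "Python"]
total_stars = sum(r["stars"] for r in repos)
print(python_repos, total_stars)
```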
Modern applications often hold different datasets inside relational databases or other databases like MongoDB, and we have to use these databases to produce beautiful graphs. This video will show us how to use SQL drivers from Python to access data.
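As a minimal illustration of using a SQL driver from Python, the sketch below uses the standard library's sqlite3 module with an in-memory database. The table and its contents are invented; other relational databases are accessed through their own driver packages with a broadly similar connect, execute, and fetch pattern.

```python
import sqlite3

# In-memory database so the example leaves no file behind
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",  # placeholders avoid string formatting
    [("north", 120.0), ("south", 80.5), ("north", 30.0)],
)

# Aggregate in SQL and fetch plain tuples that are ready to plot
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
print(totals)
```

The resulting list of (label, value) tuples can be unpacked into two sequences and handed directly to a plotting function.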
Data coming from the real world needs cleaning before processing or even visualization. It's not fully automated and we need to understand outliers in order to clean the data. Let's see how we can do that.
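One common rule of thumb for flagging outliers, shown here as an assumption rather than as the method used in the video, is the modified z-score based on the median absolute deviation (MAD). The readings below are made up.

```python
import statistics


def remove_outliers(values, threshold=3.5):
    """Drop values whose modified z-score (based on the MAD) exceeds threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        # No spread around the median, so nothing can be judged an outlier
        return list(values)
    return [v for v in values if 0.6745 * abs(v - median) / mad <= threshold]


# Made-up sensor readings with one obvious spike
readings = [10.1, 9.8, 10.3, 10.0, 55.0, 9.9, 10.2]
cleaned = remove_outliers(readings)
print(cleaned)
```

Because the median is barely affected by the spike, this rule is more robust than one based on the mean and standard deviation. The threshold still deserves inspection for each dataset, since cleaning is not fully automatic.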
In scientific computing, images are often represented as NumPy array data structures. We can import images using various techniques. In this video, we will take a look at using image processing in Python, mainly related to scientific processing and less on the artistic side of image manipulation.
In this video, we will see different ways of generating random number sequences and word sequences. Some of the examples use standard Python modules, and others use NumPy/SciPy functions.
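A short sketch using only the standard random module is given below; the seed value and the word list are arbitrary choices for the example.

```python
import random

# A seeded generator makes the "random" sequence reproducible
rng = random.Random(42)

uniform_samples = [rng.random() for _ in range(5)]  # floats in [0, 1)
gaussian_samples = [rng.gauss(0, 1) for _ in range(5)]  # mean 0, std dev 1

# Hypothetical vocabulary for generating a random word sequence
WORDS = ["data", "plot", "axis", "color", "label", "scale"]
sentence = " ".join(rng.choice(WORDS) for _ in range(4))

print(uniform_samples)
print(sentence)
```

Seeding is useful when a plot built from random data has to look the same on every run, for example in documentation or tests.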
Data that comes from different real-life sensors is not smooth; it contains some noise that we don't want to show on diagrams and plots. In this video, we introduce a few advanced algorithms to help with cleaning of data coming from real-world sources.
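The simplest member of this family of techniques is a moving average, sketched below in plain Python as a stand-in; it is not one of the advanced algorithms the video refers to. The function name and the sample values are assumptions.

```python
def moving_average(values, window=3):
    """Average each run of `window` consecutive values (no padding)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# Made-up noisy signal that trends upward
noisy = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]
smooth = moving_average(noisy, window=3)
print(smooth)
```

The output is shorter than the input by window - 1 points, and a larger window gives a smoother curve at the cost of blurring genuine rapid changes.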
There are different plots used for representing data differently. In this video, we'll compare them and understand advanced concepts in data visualization. We would also plot sine and cosine plots and customize them.
Now that we've learned the concepts of basic plotting and customizing, this video will show us a variety of useful axis properties that we can configure in matplotlib to define axis lengths and limits.
Data is presented to different kinds of audiences. Lines that are distinct enough for the target audience (for example, vivid colors for a young audience) leave a strong impression on the viewer. This video shows how we can change various line properties such as style, color, and width.
As we now know how to change various line properties such as styles, colors, and width, this video will guide us with adding more data to our figure and charts by setting axis and line properties.
Legends and annotations explain data plots clearly and in context. By assigning each plot a short description of the data it represents, we make the figure easier for the viewer to interpret. This video will show how to annotate specific points on our figures and how to create and position data legends.
Spines define data area boundaries; they connect the axis tick marks. There are four spines. We can place them wherever we want. As they are placed on the border of the axis, we see a box around our data plot. This video will demonstrate how to move spines to the center.
Histograms are often used in image manipulation software as a way to visualize image properties such as distribution of light in a particular color channel. This video will help us create histograms in 2D.
To visualize the uncertainty of measurement in our dataset or to indicate the error, we can use error bars. Error bars can easily give an idea of how error free the dataset is. In this video, we will see how to create bar charts and how to draw error bars.
Pie charts are special in many ways, the most important being that the dataset they display must sum up to 100 percent or they are just not valid. Let's explore how we can create pie charts to represent data in a better way.
The matplotlib library allows us to fill areas in between and under the curves with color so that we can display the value of that area to the viewer. In this video, we will learn how to fill the area under a curve or in between two different curves.
If you have two variables and want to spot the correlation between those, a scatter plot may be the solution to spot patterns. This type of plot is also very useful as a start for more advanced visualizations of multidimensional data. Let's see how to create a scatter plot.
To be able to distinguish one particular plot line in the figure, we need to add a shadow effect.
Adding a data table beside our chart helps to visualize information.
In this video, you will learn how to create custom subplot configurations for your plots.
To spot differences in patterns and compare plots visually in the figure, we need to customize our grids.
To display isolines, we create contour plots.
To distinguish clearly between two different plots, we fill the areas with different patterns.
When the information is radial in nature, we need a polar plot to display information.
You will learn how to visualize a real-world task in this video.
You must be curious to plot 3D data after getting your hands on 2D. Python provides a toolkit called mplot3d in matplotlib for this. Let's go ahead and explore its working!
Similar to 3D bars, you might want to create 3D histograms since these are useful for easily spotting correlations between three independent variables. Let us now dive into it!
This video will walk you through graphics rendering with OpenGL. So let's go ahead and do it!
Images can be used to highlight the strengths of your visualization in addition to pure data values. They map more deeply onto the viewer's mental model, thereby helping the viewer remember the visualizations better and for longer. Let's see how we could use them in Python!
This video will walk you through how you can make simple yet effective usage of the Python matplotlib library to process image channels and display the per-channel histogram of an external image.
The best geospatial visualizations are done by overlaying data on the map. This video will show you how to project data on a map using matplotlib's Basemap toolkit. Let's dive into it!
This video will take you through the generation of random images to tell humans and computers apart. Let's do it!
With the logarithmic scale, the ratio of consecutive values is constant. This is important when we are trying to read log plots. Let us step ahead and see how to perform it!
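The property can be checked numerically. In the sketch below, which uses made-up values, numbers that differ by a constant ratio end up equally spaced once their base-10 logarithm is taken, which is what happens along a log-scaled axis.

```python
import math

# Made-up values where each is ten times the previous one
values = [1, 10, 100, 1000, 10000]

# Ratio between consecutive values
ratios = [b / a for a, b in zip(values, values[1:])]

# Position of each value on a base-10 logarithmic axis
positions = [math.log10(v) for v in values]

# Distance between consecutive positions on that axis
gaps = [b - a for a, b in zip(positions, positions[1:])]

print(ratios)
print(gaps)
```

Equal distances on a log axis therefore mean equal ratios, not equal differences, which is the key to reading such plots correctly.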
In this video we will discuss how to create a stem plot which will display data as lines extending from a baseline along the x-axis.
In this video we will visualize wind patterns or liquid flow, and we will use uniform representation of the vector field for this. So, let's go ahead and do it!
Color-coding the data can have great impact on how your visualizations are perceived by the viewer, as they come with assumptions about colors and what colors represent. This video will walk you through the steps showing the use of colormaps!
If we want to take a quick look at the data and see if there is any correlation, we would draw a quick scatter plot. In this video, you will understand scatter plots.
If you have two different datasets from two different observations, you want to know if those two event sets are correlated. You want to cross-correlate them and see if they match in any way. This video will let you achieve this goal!
How could you predict the growth of stock dividends? In this video, we will work through some steps that show the importance of autocorrelation for this kind of prediction!
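Autocorrelation measures how strongly a series resembles a shifted copy of itself. The sketch below computes it by hand for a made-up series that repeats every four steps; the function name and the data are assumptions, and real dividend data would be noisier.

```python
def autocorrelation(series, lag):
    """Sample autocorrelation of series at the given non-negative lag."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(series)")
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


# Made-up series with a repeating pattern of period 4
seasonal = [1, 2, 3, 4] * 6

acf = {lag: autocorrelation(seasonal, lag) for lag in (0, 2, 4)}
print(acf)
```

The value at lag 4 is clearly higher than at lag 2, which reveals the four-step cycle. A peak like this is what suggests that past values carry information about future ones.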
Let's look into how to visualize two-dimensional vector quantities such as speed and direction of wind!
How will you visually compare several similar data series? This video will walk you through making a box-and-whisker plot which achieves this goal!
One form of very widely used visualization of time-based data is a Gantt chart. Let us see how to work with it!
Error bars are useful to display the dispersion of data on a plot. So, let's explore their use in Python for data visualization.
This video will let you explore more features of text manipulation in matplotlib, giving a powerful toolkit for even advanced typesetting needs. Let's dive into it.
This video will explain some of the programming interfaces in matplotlib and make a comparison of pyplot and object-oriented API. Let us now explore it!
Packt has been committed to developer learning since 2004. A lot has changed in software since then, but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live, and at how to put them to work.
With an extensive library of content (more than 4000 books and video courses), Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.
From skills that will help you develop and future-proof your career to immediate solutions to everyday tech challenges, Packt is a go-to resource to make you a better, smarter developer.
Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.