Web scraping is the art of extracting data from a website by looking at its HTML code and identifying patterns that point to the data you want. That data can then be gathered and used for your own analysis.
In this course we will go over the basics of web scraping, learning how we can extract data from websites, all guided by a worked example.
At the end of the course you should be able to go off on your own, tackle most common websites, and extract all the relevant data you may need using nothing but Python code.
Here we look at the general ideas behind web scraping and get introduced to what we will do.
Here we will quickly talk about the other main way of getting data from the web, namely APIs.
The main Python libraries that we will be using in this tutorial series.
Here we will cover what the modulo operation does and how it can be useful for us later on.
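As a quick illustration (a sketch, not the course's own code), the modulo operator `%` in Python returns the remainder of an integer division, which is handy for doing something only every n-th step of a loop:

```python
# The modulo operator % returns the remainder of an integer division.
print(10 % 3)   # remainder of 10 divided by 3 is 1
print(12 % 4)   # 12 divides evenly by 4, so the remainder is 0

# A common use while scraping: act only on every n-th iteration,
# e.g. printing progress every 2nd (here) or every 100th request.
for i in range(6):
    if i % 2 == 0:
        print(i, "is even")
```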
Here we will look at how we can deal with errors that appear in our code and how we can work around them, especially when we expect them.
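A minimal sketch of the idea: Python's `try`/`except` lets us catch an error we anticipate, such as a scraped field that can't be converted to a number (the `raw_value` below is made up for illustration):

```python
# A value that can't be converted to a number, similar to what we may
# encounter when a scraped field is missing or malformed.
raw_value = "n/a"

try:
    price = float(raw_value)
except ValueError:
    # We expected this might happen, so we fall back to a default
    # instead of letting the program crash.
    price = None

print(price)  # None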
Here we will learn about the DataFrame data structure that pandas provides, to see the format we want our final data to have.
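As a small sketch of the tabular format we are aiming for, a DataFrame can be built straight from a dictionary of columns (the column names and values here are made up, not the course's actual data):

```python
import pandas as pd

# One row per company, one column per metric (names are illustrative).
data = {
    "ticker": ["AAPL", "MSFT"],
    "price": [150.0, 300.0],
}
df = pd.DataFrame(data)
print(df)
print(df.shape)  # (2, 2): two rows, two columns
```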
Here we will make our first HTTP request and cover the possible outcomes.
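As a sketch of the possible outcomes: every HTTP response comes back with a status code, and the codes fall into a few broad families. The helper below is illustrative, not part of the course code; with the `requests` library the call itself is typically just `response = requests.get(url)` followed by checking `response.status_code`:

```python
def describe_status(code):
    """Translate an HTTP status code into a rough outcome category."""
    if 200 <= code < 300:
        return "success"        # e.g. 200 OK
    if 300 <= code < 400:
        return "redirect"       # e.g. 301 Moved Permanently
    if 400 <= code < 500:
        return "client error"   # e.g. 404 Not Found
    return "server error"       # e.g. 500 Internal Server Error

print(describe_status(200))  # success
print(describe_status(404))  # client error
```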
Here we will test whether you can correctly interpret HTTP error codes. Feel free to search for a list of error codes and use the web as a reference; you don't need to remember them all, just be able to understand them when you encounter them.
In this tutorial we will look at how we can read the text response that we get when we contact a website.
Here we're going to cover how we can start using the text response to parse out the data that we're looking for.
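A minimal sketch of the technique: once we have the page as plain text, simple string searches can slice out a value between known markers. The HTML snippet and the "value" class below are made up for illustration; the real page's markup will differ:

```python
# A small stand-in for a page's HTML text.
page_text = '<td class="label">Market Cap</td><td class="value">2.5T</td>'

# Locate the marker that precedes the value, then slice up to the
# closing tag that follows it.
marker = 'class="value">'
start = page_text.find(marker) + len(marker)
end = page_text.find('</td>', start)
market_cap = page_text[start:end]
print(market_cap)  # 2.5T
```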
Here we will look deeper into the exception cases and see how we should adapt our code to incorporate them.
Having considered the straightforward cases as well as the exceptions, we can now complete the data parsing for one company.
Here we will see where we can get more ticker symbols from, and start by identifying and selecting the range of data that is of interest for us.
Here we will start our process of parsing out the ticker symbols, based on identifying patterns that we see in the website code.
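To sketch the pattern-spotting idea (the markup below is invented, not the actual listing page): once we notice that every ticker appears inside the same repeated link structure, a regular expression can pull them all out at once:

```python
import re

# A made-up snippet of the kind of table HTML a ticker listing page
# might contain; the real site's markup will differ.
html = '''
<tr><td><a href="/quote/AAPL">AAPL</a></td></tr>
<tr><td><a href="/quote/MSFT">MSFT</a></td></tr>
'''

# The pattern keys off the repeated href structure, which is exactly
# the kind of pattern this lesson is about identifying.
tickers = re.findall(r'href="/quote/([A-Z.]+)"', html)
print(tickers)  # ['AAPL', 'MSFT']
```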
Here we will finish up our method of scraping out company ticker symbols, so that we have a complete, and much larger, set of company ticker symbols to scrape data for.
Here we will quickly recap the ticker symbol extraction process, to make sure we understand why we did certain things.
Now that we have a complete set of symbols to scrape with, we can modify our code from before to incorporate these new companies.
Here we will ensure that we have data for all companies and put it into a well-formatted pandas DataFrame.
Here we will go over what we got from our web scraping, and take a look at the final format of our data.
Here we will quickly cover the goal of this section as well as the extra libraries we're going to need to install.
We'll go into a short review of what a recursive function is, using the Fibonacci sequence as our example.
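The standard recursive Fibonacci example looks like this (a textbook sketch of the idea, not necessarily the exact code from the lesson):

```python
def fib(n):
    """Return the n-th Fibonacci number (0, 1, 1, 2, 3, 5, ...)."""
    if n < 2:          # base cases stop the recursion
        return n
    # Each call breaks the problem into two smaller ones.
    return fib(n - 1) + fib(n - 2)

print([fib(n) for n in range(8)])  # [0, 1, 1, 2, 3, 5, 8, 13]
```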
We'll learn how to create a browser instance as well as the basic navigation that happens within it.
Here we'll start to look at, and become familiar with, what the content of the website looks like.
Here we'll learn how we can use elements and XPath to navigate our response data.
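As a sketch of the element/XPath idea: an XPath expression describes a path through the document tree. The tiny document below is made up, and the standard library's `xml.etree.ElementTree` (which supports a limited XPath subset) stands in here for whatever HTML library the lesson uses:

```python
import xml.etree.ElementTree as ET

# A tiny, made-up document standing in for real page content.
doc = ET.fromstring(
    "<html><body>"
    "<div class='row'><span>AAPL</span></div>"
    "<div class='row'><span>MSFT</span></div>"
    "</body></html>"
)

# ".//span" is XPath for "every <span> element anywhere below this
# node" — the same navigation idea applies to full XPath engines.
symbols = [span.text for span in doc.findall(".//span")]
print(symbols)  # ['AAPL', 'MSFT']
```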
Here we'll use what we've learned so far to start parsing out the relevant data.
Now we're going to apply what we've learned so far to identify the direct path to our data.
Now we will use the path that we've identified to navigate through the HTML to our data.
Here we will continue on to get the data out now after we've navigated to it.
We'll combine everything we've produced before to get out our data efficiently and into a nice format.
A recap of our approach and our results.
APIs are the other way of getting data from the web, and they make the job a lot easier, since the data comes nicely formatted and all we really have to do is ask for the right data. APIs are usually easier to get data from than web scraping, as we don't need to identify patterns or deal with exception cases to extract valuable data.
Here we will talk about what to do next with APIs.
I've worked for over two years in physics research and mathematical analysis. I participated in two international physics competitions, where my two teammates and I won silver and gold. My thesis was in the field of Quantum Biology, focusing on analyzing the behavior of excitons at room temperature with electronic interaction.
Due to my affinity for math and statistics from my studies in physics, I tend towards data mining, processing, and analysis, which are also the things that I find most exciting.
I enjoy learning new methods and developing my skills, and am constantly studying new literature and documentation to find exciting material that can be applied in the field of data analysis.
If you want to keep up with what else I'm doing in the fields of programming, data, and data science, you can check me out at codingwithmax.