Web Scraping with Python: BeautifulSoup, Requests & Selenium

Rating: 4.3 (927 reviews)

Web Scraping and Crawling with Python: Beautiful Soup, Requests & Selenium

Created by GoTrained Academy, Waqar Ahmed

Last updated 12/2018

English

English, Italian [Auto]

What you'll learn

Python Refresher: Review of Data Structures, Conditionals, File Handling
How Websites are Hosted on Servers; Basic Calls to Server (GET, POST Methods)
Web Scraping with Python Beautiful Soup and Requests
Using Selenium to handle JavaScript and AJAX
Diverse Web Scraping Exercises
Source codes (*.py files) for all Exercises can be downloaded
Q&A board to send your questions and get them answered quickly

Coding Exercises

This course includes our updated coding exercises so you can practice your skills as you learn.

Course content

19 sections • 72 lectures • 7h 55m total length

Web Scraping Course Overview (4:22)
- [Instructor] Welcome everybody to the course. I'm really glad that all of you are here. Thank you again for checking out this course. I'll start off by explaining what web scraping is, as this course is all about web scraping. In simple and brief terms, web scraping is extracting data from the internet. Why do we need to do that? That's a valid question.
Why do we need to scrape data from the internet?
We live in an age where everything is being uploaded to the internet; terabytes of data are uploaded every day. With that much data, you could perform analysis and actually improve a lot of things. For example, if you were a businessman launching a product, you could learn more about the market: What do people like? How will they respond to your product launch? How can you improve your product?
All of this information is on the internet, so you need some tool, some way to actually get that information out. One way would be to hire hundreds of people to do this, or the other way, the smart way, would be to write a computer program that does it for you and conserves your resources. So, this is what web scraping is about.
The next question is what is this course about?
This course is about how we can access web pages programmatically. We'll be using Python for this course, so it will teach us how to parse a web page and extract the required data from it.
Let's say you want to get some photos from some web page. How are you going to do that? Let's say there are thousands of photos. Don't tell me that you're going to download each of them individually. I mean, that is going to take a lot of time, so we will learn how to scrape a web page and extract the data we need from it.
Python has a module known as BeautifulSoup, which is a parser for web pages. We'll learn this too in this course. We'll also learn how we can interact with web pages in different ways and how we can move around and play around with them, you know?
You could say that it is an introduction to automation, automating different stuff on the internet, in simple terms. We'll be using Selenium for interacting with web pages in this course.
So to sum up the answer to this question, I can say that, briefly, it is about how we extract data from any kind of web page using a programming language; in our case, it is going to be Python.
What will you learn by the end of this course?
At the end of this course, you will have a deep understanding of how websites and servers function: how websites are hosted and how requests are sent to servers. What are servers? What are websites? And how does this communication take place? So you'll have a deeper understanding of how all of this works.
Then, you'll learn different web scraping and data extraction techniques which are being used worldwide, like the best practices and the worst practices, the things you have to worry about and keep in mind while you are scraping data, and how you can do this in an efficient manner. You will learn this by the end of this course.
You'll also learn how to handle a lot of data and how we can actually parse that data and get it into the format we want.
So I really look forward to teaching you this course. I hope that you're excited about this. I think let's get to our first tutorial, so I'll see you soon in the next video. Thank you!

Lists (6:39)
How to use lists in Python. Revision of Python Lists and commonly used list functions.
Dictionaries (9:11)
How to use dictionaries in Python. Revision of Python Dictionaries and commonly used dict functions.
Tuples (7:56)
How to use tuples in Python. Revision of Python Tuples and commonly used tuple functions.
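As a rough illustration of how the three refresher topics above tend to show up together in scraping code, here is a minimal sketch. The `rows` data and all variable names are invented for the example and are not taken from the course materials.

```python
# Hypothetical scraped records: each row is a (title, price) tuple.
# Tuples are immutable, which suits fixed-shape records like this.
rows = [("Widget", 9.99), ("Gadget", 24.50), ("Widget", 11.25)]

# List: ordered and mutable, so append() can grow it one item at a time.
titles = []
for title, price in rows:  # tuple unpacking
    titles.append(title)

# Dictionary: map each title to the list of prices seen for it.
prices_by_title = {}
for title, price in rows:
    prices_by_title.setdefault(title, []).append(price)

print(titles)
print(prices_by_title)
```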
List Comprehensions - Part 1 (7:06)
How to write simple list comprehensions. Python supports a concept called "list comprehensions", which can be used to construct lists in a natural, easy way, much as a mathematician would describe them. List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operation applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.
For example, to create a list of squares, instead of writing a regular for loop you can use a list comprehension like:
squares = [x**2 for x in range(10)]
List Comprehensions - Part 2 (16:37)
Writing complex list comprehensions.
Inline - if else and List Comprehensions (3:35)
Using if else conditions while writing list comprehensions.
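The difference between a filtering `if` and an inline `if ... else` inside a comprehension can be sketched as follows. This is an illustrative example with made-up variable names, not code from the course.

```python
numbers = list(range(10))

# A trailing `if` filters: only matching elements are kept.
evens = [n for n in numbers if n % 2 == 0]

# An inline `if ... else` goes before `for` and transforms every element.
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]

# A more complex comprehension: two loops plus a filter.
pairs = [(x, y) for x in range(3) for y in range(3) if x < y]

print(evens)
print(labels)
print(pairs)
```

The filtering form can shorten the list, while the inline form always produces one output element per input element.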
Installing xlrd and XlsxWriter to Read/Write to Excel Files (0:19)
Writing to Excel Files (10:02)
Introduction to the Python xlsxwriter module, which can be used to create Excel files and write to them. This video discusses creating Excel files and writing data to them using xlsxwriter.
Reading from Excel Files (5:09)
Introduction to Python xlrd module which is used to read data from Excel files. This video discusses how we can read Excel files using xlrd.
Python Editor & Other Software (1:07)
Exercise #1: YOU: Web Scraping Expert (0:18)

Web Scraping with Beautiful Soup - Overview (4:44)
Introduction to Beautiful Soup - a Python module used for parsing HTML.
Web Scraping with Beautiful Soup - Overview P.2 (2:56)
Understanding the HTML parse tree which is created by Beautiful Soup.
Accessing Tags (7:08)
How we can access tags in the HTML parse tree generated by Beautiful Soup.
Navigable Strings (3:30)
Concept of navigable strings in Beautiful Soup and how we can access them.
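The ideas in the Beautiful Soup lectures above (the parse tree, accessing tags, and navigable strings) can be sketched roughly as below. The HTML snippet is invented for illustration, and the sketch assumes the `beautifulsoup4` package is installed; it uses Python's built-in `html.parser` rather than any particular parser the course may prefer.

```python
from bs4 import BeautifulSoup, NavigableString

# A small, made-up HTML document to parse.
html = """
<html><head><title>Sample page</title></head>
<body>
<p class="intro">Hello, <b>world</b>!</p>
<a href="/first">First link</a>
<a href="/second">Second link</a>
</body></html>
"""

# Build the parse tree.
soup = BeautifulSoup(html, "html.parser")

# Accessing a tag by name returns the first matching tag in the tree.
title_tag = soup.title
first_link = soup.a

# The text inside a tag is a NavigableString.
title_text = title_tag.string

# Attributes are read with dictionary-style access.
hrefs = [a["href"] for a in soup.find_all("a")]

# .string is None when a tag has several children; get_text() joins them.
intro_string = soup.p.string
intro_text = soup.p.get_text()

print(title_tag.name, title_text)
print(hrefs)
print(intro_text)
```

Note that `soup.p.string` is `None` here because the paragraph contains both text and a nested `<b>` tag, which is a common surprise when first using `.string`.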

Requirements

Some prior programming experience in Python (e.g. Data Structures and OOP) will help. The course includes a full Python refresher section.
Complete beginners may wish to take a beginner Python course first, and then transition to this course afterwards.
This course adopts a step-by-step approach and requires you to open a Python editor, download available *.py code files, and start applying the provided examples and exercises.
Python 3: The code in this course is tested on Python 3. It is up to you to adapt it if you want to run it in Python 2.

Description

Web Scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting, etc.) is a technique for extracting large amounts of data from websites and saving the extracted data to a local file or to a database.

In this course, you will learn how to perform web scraping using Python 3 and Beautiful Soup, a free open-source library written in Python for parsing HTML.

We will use lxml, which is an extensive library for parsing XML and HTML documents very quickly; it can even handle messed up tags. We will also be using the Requests module instead of the built-in urllib module (urllib2 in Python 2) due to improvements in speed and readability.
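To give a feel for the Requests module and the GET and POST calls mentioned in the course topics, here is a minimal sketch. It assumes the `requests` package is installed; `example.com`, the paths, and the parameter names are placeholders. The requests are only prepared, not sent, so the example runs without network access.

```python
import requests

# Build (but do not send) a GET request: parameters go into the URL.
get_req = requests.Request(
    "GET", "https://example.com/search", params={"q": "python"}
).prepare()

# Build (but do not send) a POST request: form data goes into the body.
post_req = requests.Request(
    "POST", "https://example.com/login", data={"user": "alice"}
).prepare()

print(get_req.method, get_req.url)
print(post_req.method, post_req.url, post_req.body)


def fetch(url):
    """Download a page and return its HTML text (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # raise an error for 4xx/5xx responses
    return response.text
```

In practice a call such as `fetch("https://example.com")` would perform the actual GET, and the returned text could then be handed to a parser.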

Finally, we will use Selenium alongside Beautiful Soup to crawl AJAX & JavaScript driven pages.

The course covers the following topics: accessing web pages programmatically; scraping web pages to extract the required data, using Beautiful Soup to parse them; interacting with web pages programmatically; and using Selenium for web scraping, including when it is needed.

By the end of this course, you will be able to understand how websites and servers function, diverse data extraction techniques, and methods of handling and organizing data.

This Web Scraping course covers the following topics:

Review of data structures (Lists, Dictionaries, Tuples, File Handling)
How websites are hosted on servers
Calls to the server (GET, POST methods)
Review of HTML and CSS
Requests Module and BeautifulSoup Module overview
Parsing HTML using BeautifulSoup
Filtering elements using BeautifulSoup and navigating the Parse Tree
JavaScript and AJAX overview
Selenium and the need for it
Selecting elements using Selenium
CSS selectors
XPath selectors
Navigating pages using Selenium
Practical Projects

Who this course is for:

Those who want to learn how to use Python for web scraping and data extraction.

Course content

Web Scraping Course Overview • 1 lecture • 4min

Python Refresher: Data Structures (Optional) • 11 lectures • 1hr 8min

How Servers Work • 2 lectures • 4min

BeautifulSoup Warm-up Exercise • 1 lecture • 2min

Installing Required Python Packages • 1 lecture • 3min

Introduction to Requests Python Library • 3 lectures • 16min

Introduction to Beautiful Soup Python Library • 4 lectures • 18min

Navigating with Beautiful Soup - Going Down • 3 lectures • 19min

Navigating with Beautiful Soup - Going Up • 2 lectures • 12min

Navigating with Beautiful Soup - Going Sideways • 3 lectures • 13min
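
The three navigation sections listed above (going down, up, and sideways in the parse tree) might look roughly like this in code. The HTML is a made-up example, and the sketch assumes the `beautifulsoup4` package is installed.

```python
from bs4 import BeautifulSoup

# A tiny, invented document with no whitespace between tags.
html = "<ul id='menu'><li>One</li><li>Two</li><li>Three</li></ul>"
soup = BeautifulSoup(html, "html.parser")

menu = soup.find("ul", id="menu")

# Going down: children and descendants of a tag.
items = [li.get_text() for li in menu.find_all("li")]
first = menu.li  # the first <li> inside the list

# Going up: the tag that contains this one.
parent_name = first.parent.name

# Going sideways: neighbours at the same level of the tree.
second = first.next_sibling
third = second.find_next_sibling("li")

print(items)
print(parent_name)
print(second.get_text(), third.get_text())
```

On real pages there is usually whitespace between tags, so `next_sibling` may return a whitespace string rather than a tag; `find_next_sibling("li")` skips over such strings.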