- 8 hours on-demand video
- 10 articles
- 19 downloadable resources
- 1 coding exercise
- Full lifetime access
- Access on mobile and TV
- Certificate of Completion
- Python Refresher: Review of Data Structures, Conditionals, File Handling
- How Websites are Hosted on Servers; Basic Calls to Server (GET, POST Methods)
- Web Scraping with Python Beautiful Soup and Requests
- Diverse Web Scraping Exercises
- Source code (*.py files) for all exercises can be downloaded
- Q&A board to send your questions and get them answered quickly
[Instructor] Welcome everybody to the course. I'm really glad that all of you are here. Thank you again for checking out this course. I'll start off by explaining what web scraping is, as this course is all about web scraping. In simple and brief terms, web scraping is extracting data from the internet. Why do we need to do that? That's a valid question.
Why do we need to scrape data from the internet?
We live in an age where everything is being uploaded to the internet; terabytes of data are uploaded every day. With that much data, you can perform analysis and actually improve a lot of things. For example, if you're a business owner launching a product, you could learn more about the market: What do people like? How will they respond to your product launch? How can you improve your product?
All of this information is on the internet, so you need some tool, some way to actually get that information out of the internet. One way would be to hire hundreds of people to do this, and the other way, the smart way, would be to write a computer program which does this for you and conserves your resources. So, this is what web scraping is about.
The next question is what is this course about?
This course is about how we can access web pages programmatically. We'll be using Python for this course, so it will teach us how we can parse a web page and extract the required data from it.
Let's say you want to get some photos from some webpage, how are you going to do that? Let's say there are thousands of photos. Don't tell me that you're going to download each of them individually? I mean that is going to take a lot of time, so we will learn how to scrape a web page and extract our required data from our web pages.
Python has a module known as BeautifulSoup, which is a library for parsing web pages. We'll learn this too in this course. We'll also learn how we can interact with web pages in different ways and how we can move around and play around with them.
You could say that this part is an introduction to automation, automating different tasks on the internet. We'll be using Selenium for interacting with web pages in this course.
So, to sum up the answer to this question: briefly, this course is about how we extract data from any kind of web page using a programming language, which in our case is going to be Python.
What will you learn by the end of this course?
By the end of this course, you will have a deep understanding of how websites and servers function: how websites are hosted, how requests are sent to servers, what servers and websites are, and how this communication takes place. So you'll have a deeper understanding of how all of this works.
Then, you'll learn different web scraping and data extraction techniques which are being used worldwide: the best practices, the worst practices, the things you have to worry about and keep in mind while you are scraping data, and how you can do this in an efficient manner. You will learn this by the end of this course.
You'll also learn how to handle a lot of data and how we can actually parse that data and get it into the format we want.
So I really look forward to teaching you this course. I hope that you're excited about this. I think let's get to our first tutorial, so I'll see you soon in the next video. Thank you!
How to write simple list comprehensions. Python supports a concept called "list comprehensions", which can be used to construct lists in a natural, easy way, much as a mathematician would write them. List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operation applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.
For example, to create a list of squares, instead of writing a regular for loop you can use a list comprehension like:
squares = [x**2 for x in range(10)]
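The two uses described above (transforming each element, and keeping only the elements that satisfy a condition) can be sketched side by side. This is only an illustration; the names `squares_loop` and `even_squares` are made up for this sketch and do not come from the course files.

```python
# Loop form: build the list of squares one element at a time.
squares_loop = []
for x in range(10):
    squares_loop.append(x**2)

# Comprehension form: the same list in a single expression.
squares = [x**2 for x in range(10)]

# Adding an `if` clause keeps only the elements that satisfy a condition,
# here the squares of the even numbers.
even_squares = [x**2 for x in range(10) if x % 2 == 0]

print(squares == squares_loop)  # both forms produce the same list
print(even_squares)
```

The comprehension reads in the same order as the loop it replaces: the expression first, then the `for` clause, then the optional `if` filter.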
Introduction to the Python xlsxwriter module, which can be used to create Excel files and write data into them.
Introduction to Beautiful Soup - a Python module used for parsing HTML.
- Some prior programming experience in Python (e.g. Data Structures and OOP) will help. The course includes a full Python refresher section.
- Complete beginners may wish to take a beginner Python course first, and then transition to this course afterwards.
- This course adopts a step-by-step approach and requires you to open a Python editor, download available *.py code files, and start applying the provided examples and exercises.
- Python 3: The code in this course is tested on Python 3. If you want to run it in Python 2, it is up to you to adapt it.
Web Scraping (also termed Screen Scraping, Web Data Extraction, Web Harvesting, etc.) is a technique for extracting large amounts of data from websites and saving the extracted data to a local file or to a database.
In this course, you will learn how to perform web scraping using Python 3 and Beautiful Soup, a free open-source library written in Python for parsing HTML.
We will use lxml, an extensive library for parsing XML and HTML documents very quickly; it can even handle malformed tags. We will also use the Requests module instead of Python's built-in urllib (urllib2 in Python 2) because of its improvements in speed and readability.
The course covers the following topics: accessing web pages programmatically; scraping web pages to extract the required data, using Beautiful Soup to parse them; interacting with web pages programmatically; and using Selenium for web scraping, including when it is needed.
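As a rough idea of what parsing with Beautiful Soup looks like, here is a minimal sketch. It assumes the `beautifulsoup4` package is installed. The HTML snippet, the `book` class name, and the variable names are invented for illustration and are not taken from the course material. The sketch uses Python's built-in `html.parser` backend so that it runs without lxml; passing `"lxml"` as the second argument selects lxml instead when that package is installed.

```python
from bs4 import BeautifulSoup

# A small, made-up HTML document standing in for a downloaded page.
html = """
<html><body>
  <h1>Book list</h1>
  <ul>
    <li class="book"><a href="/books/1">First title</a></li>
    <li class="book"><a href="/books/2">Second title</a></li>
  </ul>
</body></html>
"""

# Build the parse tree using the standard-library parser backend.
soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; get_text() returns its text content.
heading = soup.find("h1").get_text(strip=True)

# find_all() returns every matching tag; class_ filters on the CSS class.
links = []
for item in soup.find_all("li", class_="book"):
    anchor = item.find("a")
    links.append((anchor.get_text(strip=True), anchor["href"]))

print(heading)
print(links)
```

In a real scraper the `html` string would come from an HTTP response rather than from a literal in the script.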
By the end of this course, you will be able to understand how websites and servers function, diverse data extraction techniques, and methods of handling and organizing data.
This Web Scraping course covers the following topics:
- Review of data structures (lists, dictionaries, tuples) and file handling
- How websites are hosted on servers
- Calls to the server (GET, POST methods)
- Review of HTML and CSS
- Requests Module and BeautifulSoup Module overview
- Parsing HTML using BeautifulSoup
- Filtering elements using BeautifulSoup and navigating the Parse Tree
- Selenium and the need for it
- Selecting elements using Selenium
- CSS selectors
- XPath selectors
- Navigating pages using Selenium
- Practical Projects
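To make the difference between the GET and POST calls listed above more concrete, the sketch below builds one request of each kind without sending anything over the network. It uses the standard library's `urllib` purely to stay dependency-free, whereas the course itself works with the Requests module. The URL and the parameter names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://example.com/search"  # hypothetical endpoint
params = {"q": "web scraping", "page": 1}  # hypothetical parameters

# GET: the parameters travel in the URL's query string.
get_request = Request(BASE_URL + "?" + urlencode(params))

# POST: the same parameters travel in the request body, encoded as bytes.
post_request = Request(BASE_URL, data=urlencode(params).encode("utf-8"))

# Nothing is sent here; we only inspect how each request is put together.
print(get_request.get_method(), get_request.full_url)
print(post_request.get_method(), post_request.data)
```

A request built this way uses GET when it carries no body and POST when `data` is supplied, which mirrors how a browser treats a link versus a submitted form.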
- Those who want to learn how to use Python for web scraping and data extraction.