Web Scraping and API Fundamentals in Python

Rating: 4.4 (1526 reviews)

Learn Web Scraping with Beautiful Soup and requests-html; harness APIs whenever available; automate data collection!

Created by 365 Careers

Last updated 1/2022

English

English [Auto]

What you'll learn

Learn the fundamentals of Web Scraping
Implement APIs into your applications
Master working with Beautiful Soup
Start using requests-html
Create functioning scrapers
Scrape JavaScript
Familiarize yourself with HTML
Get the hang of CSS Selectors
Make HTTP requests
Understand website cookies
Explore scraping content locked behind a log-in system
Limit the rate of requests

Course content

10 sections • 62 lectures • 3h 55m total length

What does the course cover? • 4:18
Explore API-driven data collection and web scraping fundamentals in Python, starting with JSON and HTTP requests, then building real projects with Beautiful Soup and requests-html.
What is Web Scraping? • 3:12
Define web scraping and its two parts: web crawling and data extraction. Learn how automation saves time, and preview the API data access covered later in the course.
Ethics of Scraping • 2:55
Explore the ethics of web scraping, including intellectual property, copyright, terms of service, robots.txt, and legal gray areas, and learn to scrape responsibly by respecting site owners.
Download All Resources • 0:15

Setting up the environment - Do not skip, please! • 0:48
Install Anaconda, Python 3, Jupyter Notebook, and the relevant packages to set up the environment for Python-based web scraping.
Why Python and why Jupyter? • 4:48
Explore why Python and Jupyter underpin data science, highlighting open source, cross-platform availability, rich packages, IPython notebooks with kernels, and how Jupyter enables collaboration.
Installing Anaconda • 3:07
Install Anaconda to get Python, Jupyter Notebook, and data science packages, choosing Windows (64-bit) with Python 3, and complete the standard installer to launch the Jupyter dashboard.
Jupyter Dashboard - Part 1 • 2:27
Navigate the Jupyter dashboard to manage files and folders with checkboxes, rename or delete items, upload notebooks, and create .ipynb notebooks with an interactive shell.
Jupyter Dashboard - Part 2 • 5:13
Navigate the Jupyter notebook interface, use code cells and Markdown cells, execute them with Ctrl+Enter or Shift+Enter, and master keyboard shortcuts such as cut, copy, and paste to work with cells efficiently.
Installing the packages • 1:24
Learn which libraries for web scraping and APIs come with Anaconda, including NumPy, SciPy, pandas, requests, and Beautiful Soup, and how to install requests-html with pip.

API overview • 3:23
Discover how web-based APIs define a contract between client and server, using HTTP requests and JSON, with public examples, documentation, and concepts like pagination and registration.
HTTP requests: GET and POST requests • 2:35
Explore how HTTP requests fetch web content, compare the GET and POST methods, and understand status codes like 200 and 404 and the role of JSON data in web APIs.
JSON: preferred data exchange format for APIs • 2:24
Learn how JSON serves as a language-independent, human- and machine-readable data exchange format for APIs, built on dictionaries and lists, with examples of country data and nested structures.
Exchange rates API: GETting a JSON reply • 4:57
Explore a currency exchange rates API by making a GET request with requests in Python, parsing JSON with response.json(), and inspecting keys like base, date, and rates.
Incorporating parameters in a GET request • 3:18
Learn to specify GET request parameters, including base and symbols, with a question mark, ampersand, and equals sign to fetch targeted exchange rates from the API's JSON response.
Additional API functionalities • 4:39
Explore additional API functionality by using the historical and latest endpoints, specifying start and end dates, and filtering for currencies like GBP, then handle JSON dumps and errors.
Creating a simple currency converter • 4:52
Build a Python currency converter using the exchange rates API: collect the date, base and target currencies, and quantity, then fetch rates, compute the result, and display it with error handling.
iTunes API • 4:41
Explore the iTunes Search API in Python by building requests with term and country using the requests library, handling spaces, and inspecting JSON response keys like resultCount and results.
iTunes API: Exercise • 0:12
iTunes API: Structuring and exporting the data • 2:10
Convert iTunes API results into a pandas DataFrame and export it to CSV or XLS, using the DataFrame constructor and a simple workflow. Pagination is covered later.
APIs: Exercise • 0:14
GitHub API: Pagination • 4:21
Explore API pagination using the GitHub Jobs API, showing how to fetch 50 results per page, loop through pages with the page parameter, and aggregate results with the list extend method.
EDAMAM API: Initial setup and registration • 3:14
Learn to authenticate with the Edamam Nutrition Analysis API by obtaining an ID and key and using them in the endpoint, noting free access with 200 requests per month.
EDAMAM API: Sending a POST request • 4:14
Submit a POST request to the Edamam API with a JSON body containing title and ingr and a Content-Type header; inspect the response and export nutrients to CSV.
Downloading files with requests • 0:48
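
The GET parameter and pagination lectures in this section can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the course's own code: the endpoint address and the names build_url, fetch_page, and collect_all are hypothetical, and fetch_page is a stub that returns canned records instead of calling a real API.

```python
from urllib.parse import urlencode

# Hypothetical endpoint used only for illustration; not a real API.
BASE_URL = "https://api.example.com/latest"


def build_url(base_url, params):
    """Append query parameters: '?' starts the query, '&' separates pairs, '=' binds key to value."""
    return f"{base_url}?{urlencode(params)}"


def fetch_page(page, per_page=50):
    """Stub standing in for an HTTP GET; returns fake records for the requested page."""
    total = 120  # pretend the server holds 120 records
    start = (page - 1) * per_page
    end = min(start + per_page, total)
    return [{"id": i} for i in range(start, end)]


def collect_all(per_page=50):
    """Request page 1, 2, 3, ... and aggregate results until an empty page comes back."""
    results = []
    page = 1
    while True:
        batch = fetch_page(page, per_page)
        if not batch:
            break
        results.extend(batch)  # extend adds the items themselves, not a nested list
        page += 1
    return results


url = build_url(BASE_URL, {"base": "USD", "symbols": "GBP,EUR"})
print(url)

records = collect_all()
print(len(records))
```

With the requests library, the same parameters would normally be passed as a dictionary through the params argument of requests.get, which builds the query string in the same way.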

What is HTML? • 3:05
Learn the basics of HTML, its structure and syntax, encoding, and how browsers render HTML and provide dev tools for inspecting and scraping web pages.
Structure of HTML • 2:36
Syntax of HTML. Tags • 6:20
Explore the basics of HTML syntax, including the HTML element, opening and closing tags, the DOCTYPE declaration, and the head and body structure, with examples of paragraphs (p) and headings (h1–h6).
Tag attributes • 6:00
Learn how tag attributes use name-value pairs to control alignment, links, and element behavior, and examine class and id attributes for precise scraping.
Popular tags • 6:27
Learn to identify common HTML head and body tags, metadata usage, and data-structuring elements like div, span, iframe, images, lists, and tables, plus basic CSS and JavaScript concepts.
CSS and JavaScript • 6:23
Learn how CSS describes the visual presentation of HTML, including inline, internal, and external styles with selectors, and how JavaScript enables interactivity that impacts web scraping.
Character encoding • 6:12
Master character encoding from ASCII to Unicode and UTF-8, and learn HTML symbol encoding via named entities and decimal or hexadecimal references, including the euro sign and the non-breaking space.
XHTML and code style • 1:48
Explore XHTML and code style, noting that HTML rules are often treated as guidelines and that sloppy markup, especially on smaller sites, can affect scraping, which is why strict syntax matters.
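
The character encoding lecture mentions named entities and decimal or hexadecimal references. A short sketch using Python's built-in html module shows how the three notations decode to the same character; the sample strings are chosen here for illustration and are not taken from the course.

```python
import html

# Three ways to write the euro sign in HTML:
# a named entity, a decimal reference, and a hexadecimal reference.
samples = ["&euro;", "&#8364;", "&#x20AC;"]
decoded = [html.unescape(s) for s in samples]
print(decoded)

# A non-breaking space decodes to U+00A0, which is not an ordinary space.
nbsp = html.unescape("&nbsp;")
print(nbsp == " ", nbsp == "\u00a0")

# UTF-8 stores the euro sign in three bytes, while an ASCII letter needs one.
euro_bytes = len("\u20ac".encode("utf-8"))
ascii_bytes = len("A".encode("utf-8"))
print(euro_bytes, ascii_bytes)

# Going the other way: escape characters that have special meaning in markup.
escaped = html.escape("5 > 3 & 2 < 4")
print(escaped)
```

When scraped text fails an equality check for no visible reason, a stray non-breaking space is a common cause.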

Introduction to the Beautiful Soup package • 2:04
Learn to use the Beautiful Soup Python package to extract data from HTML, explore real use cases, and scrape information from websites beyond what official APIs offer.
Workflow of Web Scraping • 6:27
Discover the five-step workflow of web scraping: inspect the page, fetch HTML with a GET request, choose a parser, parse, and build a BeautifulSoup object to navigate the parse tree.
Setting up your first scraper • 2:54
Practice hands-on web scraping in Python by fetching a Wikipedia music page with requests, building a BeautifulSoup object with a parser, and exporting the HTML to a file.
Searching and navigating the HTML tree • 6:54
Searching the HTML tree by attributes • 3:30
Master searching the HTML tree by attributes with Beautiful Soup, using find and find_all to filter by id, class, and href, or by an attributes dictionary for robust element selection.
Extracting data from the HTML tree • 3:04
Extract data from the HTML tree with Beautiful Soup by reading a tag's name via the .name attribute, accessing attributes (href, class) via dictionary-like indexing, and using .get() for defaults.
Extracting text from an HTML tag • 4:40
Explore extracting text from HTML tags with Beautiful Soup using the .string and .text attributes, handle nested tags, navigate to parent elements, and iterate over strings with stripped_strings for clean text.
Practical example: dealing with links • 5:36
Explore hands-on web scraping with Beautiful Soup by extracting and normalizing links, converting relative URLs to absolute ones using urljoin, and filtering internal links with list comprehensions.
Practical example: Exercise • 0:26
Extracting data from nested HTML tags • 4:35
Identify div tags with a role attribute to extract links labeled "Main article" or "See also", then loop through them to build a list of URLs using urljoin.
Scraping multiple pages automatically • 7:33
Learn to scrape multiple pages automatically by extracting the main text from paragraph tags. Loop through each page in Python, handle errors, and build a URL-to-text dictionary for clean, usable data.
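
The link-handling steps described in this section (collect href values, resolve them with urljoin, keep internal links with a list comprehension) can be sketched as follows. To stay dependency-free, this sketch swaps in the standard library's html.parser for Beautiful Soup; the page address, the sample markup, and the LinkCollector class are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit

# Hypothetical page address and markup, used only for illustration.
PAGE_URL = "https://example.com/wiki/Music"
SAMPLE_HTML = """
<div role="note"><a href="/wiki/Jazz">Jazz</a></div>
<p>See <a href="https://example.org/about">an external site</a>
and <a href="#History">a fragment</a>.</p>
<a name="anchor-without-href">no link here</a>
"""


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")  # None when the attribute is missing
            if href is not None:
                self.hrefs.append(href)


collector = LinkCollector()
collector.feed(SAMPLE_HTML)

# Resolve relative links against the page address.
absolute = [urljoin(PAGE_URL, href) for href in collector.hrefs]

# Keep only links that stay on the same host as the page.
host = urlsplit(PAGE_URL).netloc
internal = [url for url in absolute if urlsplit(url).netloc == host]

print(absolute)
print(internal)
```

With Beautiful Soup, the collection step would typically be a find_all call on the a tag followed by .get("href") on each result; the urljoin and filtering steps stay the same.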

Setting up your scraper • 4:16
Use requests and Beautiful Soup to extract movie data from a Rotten Tomatoes list, including title, year, score, critics consensus, cast, and director. Switch to the lxml parser for speed.
Extracting the title and year of each movie • 6:37
Learn to extract movie titles, years, and scores by inspecting the page structure with dev tools and parsing data via Beautiful Soup using list comprehensions.
Extracting the score of each movie: Exercise • 0:10
Extracting the rest of the information • 5:58
Learn to scrape critics consensus, cast, and director data from movie pages by parsing HTML with class-based divs, removing the leading phrase, and handling missing director links.
Dealing with the cast of the movies • 5:17
Extract cast information from movie pages by locating the div with the cast class, retrieving names through links, and joining them into a single string, then store the result in a simple structure.
Extracting the rest of the information: Exercise • 0:07
Storing and exporting the data in a structured form • 2:41
Store scraped movie data in a structured form with a pandas DataFrame, display it fully with set_option, and export it to CSV or Excel for easy sharing and analysis.
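
The cleaning and export steps in this project (removing a leading phrase, joining cast names into one string, handling a missing director, exporting to CSV) can be sketched as below. The records are made up, the exact leading phrase is an assumption, and the standard csv module is swapped in for the pandas DataFrame the course uses so that the sketch runs without extra packages.

```python
import csv
import io

# Made-up records shaped like the fields named in this section; not real scraped data.
raw_movies = [
    {"title": "Example Movie", "year": "(1999)", "score": "95%",
     "consensus": "Critics Consensus: A sample sentence.",
     "cast": ["Actor One", "Actor Two"], "director": "Director One"},
    {"title": "Another Movie", "year": "(2005)", "score": "88%",
     "consensus": "Critics Consensus: Another sample sentence.",
     "cast": ["Actor Three"], "director": None},  # missing director link
]

PREFIX = "Critics Consensus: "  # assumed leading phrase


def clean(movie):
    """Normalize one record into flat, export-ready values."""
    consensus = movie["consensus"]
    if consensus.startswith(PREFIX):
        consensus = consensus[len(PREFIX):]
    return {
        "title": movie["title"],
        "year": int(movie["year"].strip("()")),
        "score": int(movie["score"].rstrip("%")),
        "consensus": consensus,
        "cast": ", ".join(movie["cast"]),      # many names, one string
        "director": movie["director"] or "",   # empty string when missing
    }


rows = [clean(m) for m in raw_movies]

# Write the rows as CSV into an in-memory buffer instead of a file.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

With pandas installed, the same list of dictionaries can be passed to the DataFrame constructor and exported with its to_csv method.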

Introduction to the requests-html package • 1:35
Explore the requests-html package as a single, simplified alternative to the Beautiful Soup and requests combo, with full JavaScript support to render dynamic pages.
Exploring the capabilities of requests-html for Web Scraping • 5:27
Explore the requests-html package by using an HTMLSession to fetch and parse a page, extract links with r.html.find, convert them to absolute URLs, and apply CSS selectors for data scraping.
Searching for text • 2:36
Learn to search for text between phrases with the search and search_all methods in requests-html, handling patterns, placeholders, and raw HTML.
CSS selectors • 9:20
Learn how CSS selectors filter and select HTML elements for web scraping in Python, using tag, id, class, and attribute selectors, plus combining and nesting techniques.
Scraping JavaScript • 6:13
Learn to render JavaScript with the render method in an asynchronous HTML session to fetch dynamic content, then compare pre- and post-render HTML for divs and links.
Scraping JavaScript: Exercise • 0:34
Completing 100% • 0:26

Requirements

Python 3 and the Anaconda distribution
Basic Python knowledge
Curiosity and enthusiasm to learn and practice

Description

Are you tired of manually copying and pasting values in a spreadsheet?

Do you want to learn how to obtain interesting, real-time and even rare information from the internet with a simple script?

Are you eager to acquire a valuable skill to stay ahead of the competition in this data-driven world?

If the answer is yes, then you have come to the right place at the right time!

Welcome to Web Scraping and API Fundamentals in Python!

The definitive course on data collection!

Web Scraping is a technique for obtaining information from web pages or other sources of data, such as APIs, through the use of intelligent automated programs. Web Scraping allows us to gather data from potentially hundreds or thousands of pages with a few lines of code.

From reporting to data science, automating data extraction from the web avoids repetitive work. For example, if you have worked in a serious organization, you certainly know that reporting is a recurring topic. There are daily, weekly, monthly, quarterly, and yearly reports. Whether they aim to organize website data, transactional data, customer data, or even more lighthearted information like the weather forecast – reports are indispensable in the current world. And while sometimes it is the intern’s job to take care of that, very few tasks are more cost-saving than the automation of reports.

When it comes to data science – more and more data comes from external sources, like webpages, downloadable files, and APIs. Knowing how to extract and structure that data quickly is an essential skill that will set you apart in the job market.

Yes, it is time to up your game and learn how you can automate the use of APIs and the extraction of useful info from websites.

In the first part of the course, we start with APIs. APIs are specifically designed to provide data to developers, so they are the first place to check when searching for data. We will learn about GET requests, POST requests and the JSON format.

These concepts are all explored through interesting examples and in a straight-to-the-point manner.

Sometimes, however, the information may not be available through the use of an API, but it is contained on a webpage. What can we do in this scenario? Visit the page and write down the data manually?

Please don’t ever do that!

We will learn how to leverage powerful libraries such as ‘Beautiful Soup’ and ‘requests-html’ to scrape websites regardless of which combination of languages is used – HTML, JavaScript, and CSS.

Certainly, in order to scrape, you’ll need to know a thing or two about web development. That’s why we have also included an optional section that covers the basics of HTML. Consider that a bonus to all the knowledge you will acquire!

We will also explore several scraping projects. We will obtain and structure data about movies from a “Rotten Tomatoes” rank list, examining each step of the process in detail. This will help you develop a feel for what scraping is like in the real world.

We’ll also tackle how to scrape data from many webpages at once, an all-too-common need when it comes to data extraction.

And then it will be your turn to practice what you’ve learned with several projects we'll set out for you.

But there’s even more!

Web Scraping may not always go as planned (after all, that’s why you will be taking this course). Different websites are built in different ways and often our bots may be obstructed. Because of this, we will make an extra effort to explore common roadblocks that you may encounter while scraping and present you with ways to work around or deal with those problems. These include request headers and cookies, log-in systems, and JavaScript-generated content.

Don’t worry if you are familiar with few or none of these terms… We will start from the basics and build our way to proficiency. Moreover, we are firm believers that practice makes perfect, so this course is not so much on the theory side of things, as it adopts more of a hands-on approach. What’s more, it contains plenty of homework exercises, downloadable files and notebooks, as well as quiz questions and course notes.

We, the 365 Data Science Team, are committed to providing only the highest quality content to you – our students. And while we love creating our content in-house, this time we’ve decided to team up with a true industry expert – Andrew Treadway. Andrew is a Senior Data Scientist for the New York Life Insurance Company. He holds a Master’s degree in Computer Science with Machine Learning from the Georgia Institute of Technology and is an outstanding professional with more than 7 years of experience in data-related Python programming. He’s also the author of the ‘yahoo_fin’ package, widely used for scraping historical stock price data from Yahoo.

As with all of our courses, you have a 30-day money-back guarantee if at some point you decide that the training isn’t the best fit for you. So… you’ve got nothing to lose – and everything to gain.

So, what are you waiting for?

Click the ‘Buy now’ button and let’s start collecting data together!

Who this course is for:

You should take this course if you want to learn how to use APIs
This course is for you if you want to learn how to scrape websites
Anyone who wants to learn how to automate the boring and mundane everyday tasks
Individuals who are curious and passionate about data
The course is ideal for beginners to programming who want to learn Beautiful Soup and requests-html


Course content

Introduction to the course • 4 lectures • 11min
Setting up the environment • 6 lectures • 18min
Working with APIs • 15 lectures • 46min
HTML overview • 8 lectures • 39min
Web Scraping with Beautiful Soup • 11 lectures • 48min
Practical project: Scraping Rotten Tomatoes • 7 lectures • 25min
Scraping HTML tables • 1 lecture • 5min
Practical projects • 2 lectures • 1min
Common roadblocks when scraping • 1 lecture • 13min
The requests-html package • 7 lectures • 26min
