The Ultimate Web Scraping With Python Bootcamp 2024

Rating: 4.1 (395 reviews)

Learn to extract data from the web with Python in just one course, covering selectolax, Playwright, scrapy, and more

Created by Andy Bek

Last updated 1/2026

English

What you'll learn

Understand the fundamentals of web scraping in Python from absolute scratch
Scrape information from static and dynamic websites and extract it to a variety of formats
Intercept and emulate hidden APIs as a highly productive alternative way of getting your data
Master the requests library for working with HTTP
Parse and extract content from HTML using beautifulsoup, selectolax, and Microsoft Playwright
Master complex CSS selectors including descendant, child, and sibling combinators
Understand how the web works, including HTTP, HTML, CSS, and JavaScript
Create scrapy crawlers and practice items, itemloaders and custom pipelines
Integrate scrapy with playwright for highly performant, fine-tuned dynamic website crawling
Practice processing and extracting data to a variety of formats including CSV, JSON, XML, and SQL

Course content

16 sections • 161 lectures • 17h 30m total length

Prerequisites (1:19)
A Useful Mental Model (3:39)
All Code Resources (0:20)

What Is HTTP? (2:46)
The Request-Response Cycle (3:28)
Extra: But, This Website Remembers Me (5:20)
User-Agents (3:16)
HTTP Verbs (2:38)
Status Codes (6:13)
Learn how HTTP status codes classify server responses for web scraping, from 1xx informational to 5xx server errors, with examples like 200 OK, 301 Moved Permanently, and 404 Not Found.
Headers (3:35)
Extra: Headers Do Lie (5:10)
Explore how request headers identify clients, influence access controls, and how simulating a Google crawler demonstrates header-driven scraping concepts and the ethics of access.
Proxies (5:45)
Explore how proxies act as intermediaries between client and server, including caching proxies and load balancers, and how residential proxies rotate IP addresses to obfuscate scraping requests.
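
As a small illustration of the status code classes described in the Status Codes lecture, the sketch below groups a code by its leading digit using only Python's standard library. The function name `classify_status` and the class labels are illustrative choices, not taken from the course.

```python
from http import HTTPStatus


def classify_status(code: int) -> str:
    """Return the broad class of an HTTP status code (1xx to 5xx)."""
    classes = {
        1: "informational",
        2: "success",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    # The leading digit of the code determines its class.
    return classes.get(code // 100, "unknown")


for code in (200, 301, 404):
    # HTTPStatus maps a numeric code to its standard reason phrase.
    print(code, HTTPStatus(code).phrase, "->", classify_status(code))
```

A scraper might use a check like this to decide whether to parse a response, follow a redirect, or retry later.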

The Ingredients (5:40)
Markup (8:32)
Master the basics of HTML markup, understand semantic versus structural tags, and learn how proper tagging improves accessibility and search engine understanding for effective web scraping.
Attributes (6:00)
Presentation (4:42)
Some More Rules (4:43)
Behaviour (8:03)
Explore how JavaScript adds behavior to static web pages by linking a button to hide a link, using DOM selection, event listeners, and simple show-hide logic.
More JavaScript (4:28)
JavaScript In Web Scraping (7:21)
JavaScript drives dynamic HTML and CSS, enabling single page apps; learn to scrape such pages by using headless browsers to render and extract content.
Comments (4:39)
Embedded (5:15)
Explore how to embed CSS and JavaScript directly in HTML using style and script tags, including inline styles and external scripting, to build a single, ready-to-scrape web page.

Urllib (5:36)
Requests (5:36)
Setting Headers (7:41)
Query Parameters (11:13)
Authentication And Authorization (7:05)
Learn how authentication verifies identity and authorization grants access, and implement common methods like API keys in headers, bearer tokens, and basic auth for API requests.
Aside From GET (4:21)
Go beyond GET: send DELETE, POST, PUT, PATCH, HEAD, and OPTIONS requests using the requests library and httpbin, which echoes back your requests for testing.
POSTing Data (6:39)
Learn to send data in the request body with the Python requests library, comparing form-encoded data and JSON payloads, and understanding their encoding and content type.
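
To make the difference between form-encoded and JSON payloads concrete, here is a minimal sketch that builds both bodies with the standard library instead of sending them over the network. The payload and the function names `as_form` and `as_json` are hypothetical. With the requests library, the `data=` and `json=` arguments of `requests.post` produce these two encodings respectively.

```python
import json
from urllib.parse import urlencode

payload = {"title": "Dune", "in_stock": "yes"}  # hypothetical example data


def as_form(data: dict) -> tuple[str, str]:
    """Encode data as an HTML form body; return (content_type, body)."""
    return "application/x-www-form-urlencoded", urlencode(data)


def as_json(data: dict) -> tuple[str, str]:
    """Encode data as a JSON document; return (content_type, body)."""
    return "application/json", json.dumps(data)


for content_type, body in (as_form(payload), as_json(payload)):
    print(f"Content-Type: {content_type}")
    print(body)
```

The same dictionary ends up as key=value pairs joined by ampersands in one case and as a JSON object in the other, which is why the server needs the Content-Type header to know how to decode the body.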

BeautifulSoup (7:54)
Tags (5:50)
Navigate the parsed HTML tree with tags and attributes in Beautiful Soup, explore multi-valued attributes like class, and learn to select nested elements for data extraction.
Parents, Children, And Descendants (8:13)
Learn how HTML tags form a hierarchy from root to children and descendants, including ul, li, and a, using Beautiful Soup to navigate, filter navigable strings, and access relationships.
Siblings (2:25)
Learn how to navigate siblings with Beautiful Soup: horizontal navigation between tags at the same tree depth using next_sibling and previous_sibling.
Extracting Text (6:35)
All Strings (3:19)
Extract all text content from HTML using Beautiful Soup's stripped_strings and strings attributes, compare their results, and understand when to preserve or discard whitespace.
Search (11:15)
Challenge (1:30)
Solution (9:32)
Solution Refinement (12:04)
Refine the book data extraction by cleaning price strings to a float, using regex substitutions, and mapping ratings to integers, returning dictionaries with title, price, and rating.
An Extra: pandas (11:12)
Master pandas by turning a list of dictionaries into a dataframe for efficient data handling. Filter with boolean masks, calculate averages, and export books data to CSV, JSON, or Excel.
Functional Search Patterns (8:23)
Target page elements by id or attribute using Beautiful Soup's find and find_all, then build complex filters with anonymous functions (lambda expressions) for flexible scraping.
Text Search (8:58)
Searching By CSS (7:21)
Just One Tag (3:09)
Explore find and find_all, plus select and select_one, to see how single-tag results differ from list results and how to verify element identity.
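
The refinement step described under Solution Refinement can be sketched as follows. This is an illustration under stated assumptions rather than the course's actual solution: the raw price format (a currency symbol followed by digits), the word-based rating labels, and the names `RATINGS`, `clean_price`, and `clean_book` are all hypothetical.

```python
import re

# Hypothetical mapping from rating words to integers.
RATINGS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def clean_price(raw: str) -> float:
    """Drop everything except digits and the decimal point, then convert."""
    return float(re.sub(r"[^\d.]", "", raw))


def clean_book(title: str, raw_price: str, raw_rating: str) -> dict:
    """Return a dictionary with a tidy title, float price, and integer rating."""
    return {
        "title": title.strip(),
        "price": clean_price(raw_price),
        "rating": RATINGS.get(raw_rating),  # None when the label is unknown
    }


print(clean_book("  Example Book ", "£51.77", "Three"))
```

A list of such dictionaries is also the shape that the pandas lecture describes turning into a dataframe.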

Scope Statement (3:04)
An Extra: Some Finance Concepts (4:31)
Learn how stocks, tickers, and exchanges map to a portfolio, fetch prices from Google Finance, manage multi-exchange pricing, and convert currencies for a unified view.
Parsing Price (12:41)
Non-USD Prices (8:44)
Adding Structure With Dataclasses (9:02)
Use Python dataclasses to define stock, position, and portfolio types with ticker, exchange, currency, and price fields; employ __post_init__ to fetch price data and prepare portfolio valuation.
Position And Portfolio (9:00)
Define position and portfolio as dataclasses to pair stocks with quantities and compute the portfolio's total value in USD from stock prices sourced via Google Finance.
Tabular Display (12:15)
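
A minimal sketch of the dataclass structure described in this project might look like the following. It is not the course's code: instead of scraping Google Finance, `__post_init__` reads from hard-coded stub tables, and the names `STUB_PRICES`, `STUB_FX_TO_USD`, and `get_total_value`, along with all prices and exchange rates, are made up for illustration.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for scraped data: (ticker, exchange) -> (price, currency).
STUB_PRICES = {("AAPL", "NASDAQ"): (190.0, "USD"), ("SHOP", "TSE"): (100.0, "CAD")}
# Hypothetical conversion rates to USD.
STUB_FX_TO_USD = {"USD": 1.0, "CAD": 0.75}


@dataclass
class Stock:
    ticker: str
    exchange: str
    price: float = field(init=False)
    currency: str = field(init=False)
    usd_price: float = field(init=False)

    def __post_init__(self):
        # In a real scraper this lookup would be an HTTP request plus parsing.
        self.price, self.currency = STUB_PRICES[(self.ticker, self.exchange)]
        self.usd_price = self.price * STUB_FX_TO_USD[self.currency]


@dataclass
class Position:
    stock: Stock
    quantity: int


@dataclass
class Portfolio:
    positions: list[Position]

    def get_total_value(self) -> float:
        """Sum the USD value of every position."""
        return sum(p.stock.usd_price * p.quantity for p in self.positions)


portfolio = Portfolio([
    Position(Stock("AAPL", "NASDAQ"), 10),
    Position(Stock("SHOP", "TSE"), 4),
])
print(portfolio.get_total_value())
```

Marking the fetched fields with `field(init=False)` keeps the constructor down to ticker and exchange, while `__post_init__` fills in the rest once the object exists.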

Befriend The Network Tab (5:38)
Master the network tab to monitor HTTP requests, headers, and responses, then replicate server requests in Python to scrape data without parsing HTML or CSS.
Case Study: Coffee Shop Locations (8:33)
Explore how to discover coffee shop locations by inspecting network requests, identifying GraphQL POST queries that return restaurant data in JSON, and reproducing them with a Python workflow.
The Advantages Of APIs (7:02)
Full Header Emulation (6:01)
An Extra: Postman (3:53)
Code Generation (6:38)
Challenge (3:13)
Replicate a web scraping request to source open positions from a career page using Python, intercept and replicate requests, and return position details with location, type, distance, and requirements.
Solution: Interacting With The API (6:48)
Solution: Processing The Data (6:43)
Solution: Adding Geocode (9:56)
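
Replicating an intercepted GraphQL POST request, as in the coffee shop case study, could be sketched like this. The endpoint URL, the query text, the variable names, and the function `build_locations_request` are all hypothetical placeholders; a real request would copy these values from the browser's network tab. The sketch only builds the request object and does not send it.

```python
import json
from urllib.request import Request

API_URL = "https://example.com/graphql"  # hypothetical endpoint

# Hypothetical GraphQL query; a real one is copied from the intercepted request.
QUERY = "query Locations($lat: Float!, $lng: Float!) { locations(lat: $lat, lng: $lng) { name } }"


def build_locations_request(latitude: float, longitude: float) -> Request:
    """Build (but do not send) a JSON POST request mirroring a browser call."""
    body = {"query": QUERY, "variables": {"lat": latitude, "lng": longitude}}
    return Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "User-Agent": "Mozilla/5.0"},
        method="POST",
    )


request = build_locations_request(40.7, -74.0)
print(request.get_method(), request.full_url)
print(json.loads(request.data)["variables"])
```

Passing the request to `urllib.request.urlopen` would send it; the response body would then be decoded with `json.loads` rather than parsed as HTML.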

Introduction (1:36)
What Is selectolax? (9:10)
CSS Combinators (8:46)
Sibling Combinators (7:37)
Explore the adjacent sibling combinator, signified by the plus sign, which selects the element that immediately follows a given element. Compare it with the general sibling combinator to choose wisely.
Selector Types (8:03)
Master simple, compound, and complex selectors and selector lists, and use descendant, child, and sibling combinators to target elements for web scraping with precision.

Scope Statement (3:34)
Write a Python scraper that downloads the highest resolution Unsplash images by keyword, excluding premium watermarked ones, using both HTML and API approaches with pagination, saving images locally.
Prospecting (7:47)
Prospect the site by analyzing image search results, filtering ads and premium images, and inspecting HTML structure to identify reliable selectors; compare HTML parsing with the API for high-resolution images.
NOTE: Quick Correction To CSS Selector (0:24)
Scraping HTML (7:34)
Filtering Relevant URLs (9:17)
Extracting High-Res Image URLs (11:20)
Saving The Images (6:54)
Stepping It Up With Logging (8:40)
Back To The API (5:54)
Explore calling the Unsplash API with Python, fetch images via GET requests, parse photo results to extract full-resolution image URLs, and handle pagination for future downloads.
Filtered Canonical URLs (7:33)
Pagination Prospecting (4:29)
Learn how to paginate API requests by constructing URLs with page and per-page parameters, filtering by photos, and looping through pages to collect a target number of images.
Wrapping Up (12:41)
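
The pagination loop described under Pagination Prospecting can be sketched as below. This is an assumption-based illustration: the endpoint, the parameter names `query`, `page`, and `per_page`, and the helper names are hypothetical, and `fake_fetch` stands in for a real HTTP call so the example runs offline.

```python
from typing import Callable
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://example.com/api/search/photos"  # hypothetical endpoint


def build_page_url(query: str, page: int, per_page: int = 20) -> str:
    """Construct the URL for one page of search results."""
    params = {"query": query, "page": page, "per_page": per_page}
    return f"{BASE_URL}?{urlencode(params)}"


def collect_image_urls(
    query: str,
    target: int,
    fetch_page: Callable[[str], list[str]],
    per_page: int = 20,
) -> list[str]:
    """Request successive pages until `target` URLs are collected or pages run out."""
    collected: list[str] = []
    page = 1
    while len(collected) < target:
        results = fetch_page(build_page_url(query, page, per_page))
        if not results:  # an empty page means there is nothing left
            break
        collected.extend(results)
        page += 1
    return collected[:target]


def fake_fetch(url: str) -> list[str]:
    """Stand-in for an HTTP call: three fake image URLs per page, three pages."""
    page = int(parse_qs(urlsplit(url).query)["page"][0])
    if page > 3:
        return []
    return [f"https://example.com/img/{page}-{i}.jpg" for i in range(3)]


print(collect_image_urls("cats", 5, fake_fetch, per_page=3))
```

Passing the fetch function in as an argument keeps the loop testable without network access; a real scraper would supply a function that performs the GET request and extracts the image URLs from the JSON response.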

Requirements

No programming experience needed - I'll teach you everything you need to know
No paid software required - we'll be using open-source Python libraries
A computer with access to the internet
Prepare to learn real skills you could put to practice right away

Description

Welcome to the Ultimate Web Scraping With Python Bootcamp, the only course you need to go from a complete beginner in Python to a very competent web scraper.

Web scraping is the process of programmatically extracting data from the web. Scraping agents visit a web resource, extract content from it, and then process the resulting data in order to parse some specific information of interest.

Scraping is the kind of programming skill that offers immediate feedback, and can be used to automate a wide variety of data collection and processing tasks.

Over the next 17+ hours, we will methodically cover everything you need to know to write web scraping agents in Python.

This bootcamp is organized in three parts of increasing difficulty designed to help you progressively build your skill.

Part I - Begin

We'll start by understanding how the web works by taking a closer look at HTTP, the key application layer communication protocol of the modern web. Next, we'll explore HTML, CSS, and JavaScript from first principles to get a deeper understanding of how websites are built. Finally, we'll learn how to use Python to send HTTP requests and parse the resulting HTML, CSS, and JavaScript to extract the data we need. Our goal in the first part of the course is to build a solid foundation in both web scraping and Python, and put those skills to practice by building functional web scrapers from scratch. Selected topics include:

a detailed overview of the request-response cycle
understanding user-agents, HTTP verbs, headers and statuses
understanding why custom headers can often be used to bypass paywalls
mastering the requests library to work with HTTP in Python
what stateless means and how cookies work
exploring the role of proxies in modern web architectures
mastering beautifulsoup for parsing and data extraction

Part II - Refine

In the second part of the course, we'll build on the foundation we've already laid to explore more advanced topics in web scraping. We'll learn how to scrape dynamic websites that use JavaScript to render their content, by setting up Microsoft Playwright as a headless browser to automate this process. We'll also learn how to identify and emulate API calls to scrape data from websites that don't have formally public APIs. Our projects in this section will include an image scraper that can download a set number of high-resolution images given some keyword, as well as another scraping agent that extracts price and content of discounted video games from a dynamically rendered website. Topics include:

identifying and using hidden APIs and understanding the benefits they offer
emulating headers, cookies, and body content with ease
automatically generating Python code from intercepted API requests using Postman and HTTPie
working with the highly performant selectolax parsing library
mastering CSS selectors
introducing Microsoft Playwright for headless browsing and dynamic rendering

Part III - Master

In the final part of the course, we'll introduce scrapy. This will give us an excellent, time-tested framework for building more complex and robust web scrapers. We'll learn how to set up scrapy within a virtual environment and how to create spiders and pipelines to extract data from websites in a variety of formats. Having learned how to use scrapy, we'll then explore how to integrate it with Playwright so that we can tackle the challenge of scraping dynamic websites from right within scrapy. We'll conclude this section by building a scraping agent that executes custom JavaScript code before returning the resulting HTML to scrapy. Some topics from this section:

setting up scrapy and exploring its command-line interface ("the scrapy tool")
dynamically exploring response objects using scrapy shell
understanding and defining item schemas, and loading data using itemloaders and input/output processors
integrating Playwright into scrapy to tackle dynamically rendered JavaScript sites
writing PageMethods to give precise instructions to the headless browser from right within scrapy
defining custom pipelines for saving into SQL databases and highly customized output formats

In this bootcamp, I will take you step-by-step through engaging video lectures and teach you everything you need to know to get started with web scraping in Python.

By the end of this course, you will have a complete toolset to conceptualize and implement scraping agents for any website you can imagine.

See you inside!

Who this course is for:

Anyone who wants to learn how to collect data from the web programmatically
Students with or without web scraping experience looking to level up
Complete beginners with no experience

Course content

Introduction: 3 lectures • 5min

The HTTP Protocol: 9 lectures • 38min

HTML, CSS, And JavaScript: 10 lectures • 59min

Web Requests In Python: 7 lectures • 48min

Parsing And Extraction: 15 lectures • 1hr 48min

Project 1 - Portfolio Valuation With Google Finance: 7 lectures • 59min

APIs: The Hidden Gems: 10 lectures • 1hr 4min

Selectolax And Advanced CSS Selectors: 5 lectures • 35min

Project 2 - Image Scraper: 12 lectures • 1hr 26min

Tackling JavaScript With Microsoft PlayWright: 4 lectures • 31min
