Advanced Web Scraping with Python using Scrapy & Splash
4.8 (92 ratings)
1,146 students enrolled

The most advanced web scraping & crawling course using Scrapy & Splash! Take your web scraping skills to the next level.
Highest Rated
Created by Ahmed Rafik
Last updated 4/2020
English [Auto-generated]
This course includes
  • 5.5 hours on-demand video
  • 7 articles
  • 1 downloadable resource
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What you'll learn
  • Advanced web scraping techniques
  • Best techniques to analyse a website before scraping it
  • Write clean spiders
  • Optimize Splash scripts
  • Bypass 504 HTTP errors
  • Build Splash Cluster
  • Bypass Google ReCaptcha (not solving it)
  • Build Desktop apps for Scrapy Spiders (Tkinter)
  • ScrapyRT
  • Showcase scraped data using ScrapyRT & Flask
  • Heavy data processing
  • Input & Output processors
Course content
47 lectures, 05:33:06 total length
+ Introduction
6 lectures 19:48
Installing Splash (Windows Pro/Enterprise Edition & Mac OS)
Installing Splash (Windows Home Edition)
Installing Splash (Linux)
Udemy 101
Asking questions
+ Centris Canada
10 lectures 01:26:56
Understanding the API
Consuming the API PART 1
Consuming the API PART 2
XHR Pagination
Summary Page
Bypass 504 HTTP Error (Method 1)
Bypass 504 HTTP Error (Method 2)
Bypass 504 HTTP Error (Method 3)
Project source code
+ Steam Store
11 lectures 01:26:19
Extracting data PART 1
Extracting data PART 2
Extracting data PART 3
Extracting data PART 4
Data processing PART 1
Data processing PART 2
Data processing PART 3
Project source code
+ Build Web App (ScrapyRT + Flask)
6 lectures 38:34
Using Flask with ScrapyRT
Flask templates PART 1
Flask templates PART 2
Flask templates PART 3
Project source code
+ Zillow
9 lectures 01:01:58
ReCaptcha Response
Testing the API
Spoofing Cookie header + Custom Cookie parser
Parsing JSON Objects
Advanced pagination
Media Pipelines PART 1
Media Pipelines PART 2
Project source code
+ Scrapy & Tkinter for Desktop Apps
5 lectures 39:28
Desktop APP PART 2
Desktop APP PART 3
Desktop APP PART 4 (Threading)
Project source code
Requirements
  • PC or Mac with internet access.
  • Experience from a couple of projects built with Scrapy & Splash is strictly REQUIRED.
  • A basic knowledge of element selection with XPath is also strictly REQUIRED.

Description

Hi there & welcome to the most advanced online resource on web scraping with Python using Scrapy & Splash. This course is fully project-based, which means that in almost every section we're going to scrape a different website and tackle a different web scraping dilemma. Rather than focusing on the basics of Scrapy & Splash, we'll dive straight into real-world projects. This also means that the course is not suitable for beginners with no background in web scraping, Scrapy, Splash or XPath expressions.

---This course covers a variety of topics, such as:---

  1. Request chaining: some requests must be sent in a specific order, otherwise they won't be fulfilled at all.

  2. How to analyze a website before scraping it. This is an important step, since it helps you choose the right tools to scrape a website and has a huge impact on the performance of your final product.

  3. How to optimize Splash scripts by reducing or aborting all the unnecessary requests that have nothing to do with the data points you're going to scrape. This matters if you care about Splash's performance, and it is the key to bypassing 504 Gateway Timeout HTTP errors in Splash.

  4. We'll also cover how to build a cluster of Splash instances behind a load balancer (HAProxy) rather than relying on a single overloaded Splash instance; this also helps to bypass 504 Gateway Timeout errors.

  5. Heavy data processing: you'll understand how input & output processors work, so you'll be able to use them to clean the scraped data points and ensure the quality of your feeds.

  6. We'll use ScrapyRT (Scrapy RealTime) to build spiders that can fetch data in real time.

  7. Showcase the scraped data points in a minimalist web app using ScrapyRT & Flask; this is extremely helpful for web scraping freelancers.

  8. Bypass Google ReCaptcha. Please don't get me wrong on this point: I don't mean that we will solve it using Scrapy. Instead, I'm going to show you a technique that I use frequently to make websites believe that the request was sent from a browser by a human being!

  9. Build clean & well-structured spiders.

  10. Finally, we're going to build a desktop app using Tkinter. The app will fetch & execute all the available spiders in your Scrapy project, and you can also choose the feed type, feed location & name. This is also extremely helpful if you're a web scraping freelancer: it is always a good idea to deliver a desktop app to your client rather than installing Scrapy on their machine and setting everything up there.
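For item 3 (optimizing Splash scripts), one common approach is a Lua script that registers a `splash:on_request` hook and calls `request:abort()` for every URL that cannot contain the data points, and that disables image loading via `splash.images_enabled`. The Python sketch below generates such a script and POSTs it as JSON to Splash's `/execute` endpoint, assuming Splash runs on its default port 8050 on localhost. The blocklist fragments are hypothetical examples; choose them after inspecting the target site's network traffic.

```python
import json
import urllib.request

# Hypothetical URL fragments that carry no data we need (analytics, ads, fonts).
DEFAULT_BLOCKLIST = ["google-analytics.com", "googletagmanager.com", "doubleclick.net", ".woff"]

LUA_TEMPLATE = """
function main(splash, args)
  local blocked = {%s}
  -- Images are never part of the data points, so don't download them.
  splash.images_enabled = false
  -- Abort every request whose URL contains a blocked fragment (plain-text match).
  splash:on_request(function(request)
    for _, fragment in ipairs(blocked) do
      if string.find(request.url, fragment, 1, true) then
        request:abort()
        return
      end
    end
  end)
  assert(splash:go(args.url))
  assert(splash:wait(0.5))
  return splash:html()
end
"""


def build_lua_script(blocklist):
    """Embed the blocklist as a Lua table of double-quoted strings."""
    lua_table = ", ".join(json.dumps(fragment) for fragment in blocklist)
    return LUA_TEMPLATE % lua_table


def build_execute_payload(url, blocklist=DEFAULT_BLOCKLIST):
    """JSON arguments for Splash's /execute endpoint; `url` becomes args.url in Lua."""
    return {"lua_source": build_lua_script(blocklist), "url": url}


def render(url, splash_url="http://localhost:8050/execute", blocklist=DEFAULT_BLOCKLIST):
    """Render `url` through Splash and return the resulting HTML."""
    body = json.dumps(build_execute_payload(url, blocklist)).encode("utf-8")
    req = urllib.request.Request(
        splash_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=90) as resp:
        return resp.read().decode("utf-8")
```

Fewer sub-requests per page means each render finishes sooner and holds a Splash slot for less time, which is why this kind of trimming reduces the chance of 504 Gateway Timeout responses.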
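For item 4 (a Splash cluster behind HAProxy), the load balancer only needs a frontend listening where your spiders expect Splash and a backend listing the Splash instances. The sketch below renders such a minimal `haproxy.cfg` from a list of nodes. The node addresses, the `leastconn` balancing choice, the per-node `maxconn` and the timeout values are assumptions to tune for your own setup, not values from the course.

```python
def build_haproxy_config(splash_nodes, listen_port=8050, maxconn_per_node=5, timeout="120s"):
    """Render a minimal HAProxy config that spreads Splash requests over several nodes.

    splash_nodes: list of "host:port" strings, e.g. ["10.0.0.1:8050"] (hypothetical).
    """
    lines = [
        "defaults",
        "    mode http",
        "    timeout connect 5s",
        # Keep these above the longest render you expect Splash to perform (assumption).
        f"    timeout client {timeout}",
        f"    timeout server {timeout}",
        "",
        "frontend splash_front",
        # Spiders keep pointing at port 8050; HAProxy now answers there.
        f"    bind *:{listen_port}",
        "    default_backend splash_cluster",
        "",
        "backend splash_cluster",
        # leastconn sends each render to the instance with the fewest open connections.
        "    balance leastconn",
    ]
    for index, node in enumerate(splash_nodes):
        # `check` enables health checks; `maxconn` caps concurrent renders per node.
        lines.append(f"    server splash{index} {node} check maxconn {maxconn_per_node}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_haproxy_config(["10.0.0.1:8050", "10.0.0.2:8050", "10.0.0.3:8050"]))
```

Capping concurrent connections per node queues excess requests in HAProxy instead of piling them onto one Splash instance, which is the overload scenario that produces 504 errors.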
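For item 6 (ScrapyRT), the service wraps a Scrapy project in an HTTP API: a GET request to `/crawl.json` with `spider_name` and `url` query parameters runs the spider for that URL and returns its items as JSON. The stdlib client below assumes ScrapyRT is running on its default port 9080 on localhost; the spider name used in the usage line is hypothetical.

```python
import json
import urllib.parse
import urllib.request

# ScrapyRT listens on port 9080 by default; change this if you started it elsewhere.
SCRAPYRT_URL = "http://localhost:9080/crawl.json"


def build_crawl_url(spider_name, start_url, base=SCRAPYRT_URL):
    """Build the GET URL asking ScrapyRT to run `spider_name` starting from `start_url`."""
    query = urllib.parse.urlencode({"spider_name": spider_name, "url": start_url})
    return f"{base}?{query}"


def extract_items(payload):
    """Return the scraped items from a decoded ScrapyRT response, raising on failure."""
    if payload.get("status") != "ok":
        raise RuntimeError(f"ScrapyRT returned status {payload.get('status')!r}")
    return payload.get("items", [])


def crawl(spider_name, start_url, timeout=120):
    """Run the spider synchronously through ScrapyRT and return its items."""
    with urllib.request.urlopen(build_crawl_url(spider_name, start_url), timeout=timeout) as resp:
        return extract_items(json.load(resp))


# Usage (requires a running ScrapyRT instance and a spider named "steam"):
# items = crawl("steam", "https://store.steampowered.com/")
```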
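For item 10 (a Tkinter desktop app), the GUI mostly needs two pieces of logic: discover the spiders (the `scrapy list` command prints one name per line inside a project) and run `scrapy crawl <spider> -o <file>` without freezing the window. The sketch below keeps that logic separate from the widgets. It assumes Scrapy's `-o` option picks the feed format from the file extension, and the supported-format set and function names are hypothetical choices.

```python
import subprocess
import threading
from pathlib import Path

# Feed types offered in the GUI; Scrapy infers the exporter from the file extension.
SUPPORTED_FEEDS = {"json", "jl", "csv", "xml"}


def list_spiders(project_dir):
    """Return the spider names that `scrapy list` reports for the project."""
    result = subprocess.run(
        ["scrapy", "list"], cwd=project_dir, capture_output=True, text=True, check=True
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def build_crawl_command(spider, feed_dir, feed_name, feed_type):
    """Build the `scrapy crawl` command for the chosen feed location, name and type."""
    if feed_type not in SUPPORTED_FEEDS:
        raise ValueError(f"Unsupported feed type: {feed_type!r}")
    feed_path = Path(feed_dir) / f"{feed_name}.{feed_type}"
    return ["scrapy", "crawl", spider, "-o", str(feed_path)]


def run_in_background(command, project_dir, on_done):
    """Run the crawl in a worker thread so the Tkinter mainloop stays responsive.

    Tkinter widgets are not thread-safe, so `on_done` should only hand the
    return code back to the GUI thread, e.g. via root.after(0, ...).
    """
    def worker():
        result = subprocess.run(command, cwd=project_dir, capture_output=True, text=True)
        on_done(result.returncode)

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

A "Run" button callback would then read the selected spider, folder, name and feed type from the widgets, call `build_crawl_command`, and pass the result to `run_in_background`.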

This course is straight to the point: there's no "foobar" or "quotes.toscrape.com" like in other courses, so make sure you bring a good level of focus and plenty of determination & motivation.

By the end of this course, you'll have sharpened your web scraping skills with Scrapy & Splash, and you'll be able to write clean, high-performing spiders that set you apart from others. If you're a web scraping freelancer, this also means you'll get more offers, since you can deliver "user-friendly" spiders with a graphical user interface (GUI) or web apps that fetch data in real time.

So join me on this course & let's harvest the web together!

Who this course is for:
  • Anyone who wants to learn advanced web scraping techniques
  • Anyone who wants to learn how to turn Scrapy projects into desktop/web apps
  • Web scraping freelancers