
Define a problem and scope to guide scraper design and manage the project's complexity. Plan data gathering via APIs or web scraping, and choose storage with MongoDB or PostgreSQL.
Learn to scrape IMDb to build actor filmographies using requests and Beautiful Soup. Handle search results, navigate to actor pages, extract filmography data, and save results as JSON files.
Explore the concepts of web spiders and how they differ from scrapers, focusing on data discovery, site-wide links, and depth-limited crawling using examples like IMDb and Reddit.
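As a rough illustration of depth-limited crawling, the sketch below walks a small made-up link graph breadth-first and stops following links once it reaches a set depth. The `LINKS` table and the `crawl` function are hypothetical stand-ins for real pages and are not taken from the course material; a real spider would fetch each page and extract its links instead.

```python
from collections import deque

# Hypothetical link graph standing in for real pages: page -> links found on it.
LINKS = {
    "/home": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/c", "/home"],
    "/c": ["/d"],
    "/d": [],
}


def crawl(start, max_depth, get_links=LINKS.get):
    """Visit pages breadth-first, never following links past max_depth."""
    seen = {start}            # pages already queued, so loops are not revisited
    order = []                # pages in the order they were visited
    queue = deque([(start, 0)])
    while queue:
        page, depth = queue.popleft()
        order.append(page)
        if depth == max_depth:
            continue          # visit this page, but do not go any deeper
        for link in get_links(page) or []:
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return order
```

With this sample graph, `crawl("/home", 1)` visits only the start page and the pages it links to directly. The `seen` set is what keeps the link from `/b` back to `/home` from causing an endless loop.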
Implement local caching in the IMDb spider to store actor data and movie casts, speeding up the Kevin Bacon spider by reducing external requests and leveraging a local database.
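One way to picture local caching is a small wrapper that checks a local database before making an external request. The sketch below uses SQLite from the Python standard library purely as a stand-in for whatever database the course uses; `PageCache` and the `fetch` callable are hypothetical names chosen for illustration.

```python
import sqlite3


class PageCache:
    """Return stored page bodies when available; call fetch only on a miss."""

    def __init__(self, fetch, path=":memory:"):
        self.fetch = fetch  # assumed: any callable that takes a URL and returns text
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pages (url TEXT PRIMARY KEY, body TEXT)"
        )

    def get(self, url):
        row = self.db.execute(
            "SELECT body FROM pages WHERE url = ?", (url,)
        ).fetchone()
        if row is not None:
            return row[0]           # cache hit: no external request
        body = self.fetch(url)      # cache miss: make the request once
        self.db.execute(
            "INSERT INTO pages (url, body) VALUES (?, ?)", (url, body)
        )
        self.db.commit()
        return body
```

Passing a file path instead of `":memory:"` would keep the cache between runs, which is where most of the speedup for a repeated crawl would come from.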
Design a stock price watcher in Python that scrapes Yahoo Finance, stores prices with timestamps in a database, and alerts when the price crosses set high or low thresholds.
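The storage and alerting half of such a watcher might look roughly like the sketch below. It leaves out the scraping step entirely and assumes the price has already been obtained; the function names, the `prices` table layout, and the choice to treat a price equal to a threshold as a crossing are all assumptions made for illustration, with SQLite standing in for the database.

```python
import sqlite3
from datetime import datetime, timezone


def check_price(price, low, high):
    """Return "high" or "low" when a threshold is reached, otherwise None."""
    if price >= high:
        return "high"
    if price <= low:
        return "low"
    return None


def record_price(db, symbol, price, low, high):
    """Store the price with a UTC timestamp and report any threshold alert."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS prices (symbol TEXT, price REAL, seen_at TEXT)"
    )
    seen_at = datetime.now(timezone.utc).isoformat()
    db.execute("INSERT INTO prices VALUES (?, ?, ?)", (symbol, price, seen_at))
    db.commit()
    return check_price(price, low, high)
```

A watcher loop would call `record_price` each time it scrapes a new price and send a notification whenever the result is not `None`.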
Master the theory of enterprise-scale spiders and scrapers using a database-backed job queue, where spiders discover content and scrapers process posts, users, and comments.
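To make the job queue idea concrete, here is a minimal single-worker sketch in which a spider would add discovered URLs as jobs and a scraper would claim and complete them. The `jobs` table, its columns, and the function names are hypothetical, and SQLite is used only as a convenient stand-in; a real multi-worker setup would need proper locking around the claim step.

```python
import sqlite3


def open_queue(path=":memory:"):
    """Open the database and create the jobs table if it does not exist."""
    db = sqlite3.connect(path)
    db.execute(
        """CREATE TABLE IF NOT EXISTS jobs (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               kind TEXT NOT NULL,              -- e.g. 'post', 'user', 'comment'
               url TEXT NOT NULL UNIQUE,
               status TEXT NOT NULL DEFAULT 'pending'
           )"""
    )
    return db


def enqueue(db, kind, url):
    """Add a job; a URL that is already queued is silently ignored."""
    db.execute("INSERT OR IGNORE INTO jobs (kind, url) VALUES (?, ?)", (kind, url))
    db.commit()


def claim_next(db):
    """Mark the oldest pending job as running and return (id, kind, url)."""
    row = db.execute(
        "SELECT id, kind, url FROM jobs WHERE status = 'pending' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None
    db.execute("UPDATE jobs SET status = 'running' WHERE id = ?", (row[0],))
    db.commit()
    return row


def finish(db, job_id):
    """Mark a claimed job as done."""
    db.execute("UPDATE jobs SET status = 'done' WHERE id = ?", (job_id,))
    db.commit()
```

Keeping the queue in a database rather than in memory means a crashed worker loses at most the job it was running, and the `UNIQUE` constraint on `url` keeps spiders from queuing the same page twice.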
Finish this Python web scraping course and apply your spiders and scrapers to real-world projects, from IMDb crawls to stock watchers. Use your skills for good to advance research.
The web is full of incredibly powerful data stored away in billions of different websites, databases and APIs. Financial data like stock prices and cryptocurrency trends, weather data in thousands of different cities in dozens of countries offered down to the hour, and fun biographical information about your favorite actor or actress: all of this information is at your fingertips, but it's impossible to truly harness it all without a bit of help and automation!
Scrapers and spiders are programs that let developers, data analysts, and researchers collect this data and put it to use in a wide range of applications, from building data feeds to gathering training data for machine learning and artificial intelligence algorithms. This course takes a hands-on approach to building real, usable spiders in realistic situations, including financial analysis, link graph construction, and social media research. By the end of this course, you will be able to develop spiders and scrapers from scratch using Python and will be limited only by your own imagination. Put the vast power of the internet within your grasp by learning how to develop automated scrapers today!
This class is built with beginners in mind, and while previous experience in Python programming helps, you can start this course without ever having written a line of code.