
Get an overview of the course and see the requirements you need before you start.
In this lecture we discuss what a static web page is.
In this lecture you learn the concept behind scraping static web pages. We look at the concrete steps needed to scrape practically any static page out there. A live example is provided at the end of this section...
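The three steps of static scraping are: download the HTML, parse it, and extract the elements you care about. They can be sketched with the JDK alone. This is a minimal sketch that assumes Java 11+ for `java.net.http.HttpClient`. The regex-based `<title>` extraction is only a crude stand-in for a real HTML parser such as Jsoup, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StaticScraperSketch {

    // Step 1: download the raw HTML of a static page with a plain GET request.
    static String download(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    // Steps 2 and 3: "parse" the HTML and extract one element.
    // A regex only copes with trivial cases like <title>; use a real parser (e.g. Jsoup) in practice.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static Optional<String> extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical default URL; pass your own as the first argument.
        String html = download(args.length > 0 ? args[0] : "https://example.com");
        System.out.println(extractTitle(html).orElse("(no title found)"));
    }
}
```

Because the page is static, everything you see in the browser is already in the downloaded HTML, so no JavaScript has to run.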
We introduce the Jsoup library. It handles downloading, parsing and extracting elements from a page using CSS selectors. It has a lot in common with jQuery, so prior knowledge of jQuery is helpful but not necessary.
We also develop a simple example program using this library...
In this example we build a web scraper that gets the top 10 Google search results for any search query and prints the title and URL of each search result to the console. Later we store the results in a simple text file.
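The last part of this example, printing each result and saving it to a text file, might look roughly like the sketch below. It assumes the results have already been scraped into a list. The `SearchResult` holder class and the one-result-per-line, tab-separated file layout are illustrative choices, not necessarily the format used in the lecture.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class SearchResultFile {

    // Hypothetical holder for one scraped search result.
    static final class SearchResult {
        final String title;
        final String url;

        SearchResult(String title, String url) {
            this.title = title;
            this.url = url;
        }
    }

    // Prints each result to the console and writes one "title<TAB>url" line per result to the file.
    static void printAndSave(List<SearchResult> results, Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        for (SearchResult r : results) {
            System.out.println(r.title + " -> " + r.url);
            lines.add(r.title + "\t" + r.url);
        }
        Files.write(file, lines, StandardCharsets.UTF_8);
    }
}
```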
In this lecture we discuss what a dynamic / AJAX web page is and how it differs from a static one.
In this lecture you learn the concept behind scraping dynamic / AJAX web pages. Later we show you how to actually apply this concept to a concrete example.
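A common way to apply this concept is to find the background (XHR) request in the browser's developer tools and then send that request yourself. The server then answers with the data directly, often as JSON. The sketch below only builds such a request with the JDK's `HttpRequest`. The endpoint `https://api.example.com/search` and the parameter names are hypothetical placeholders for whatever the Network tab shows on the real site. The `X-Requested-With` header is set by many AJAX libraries but is not required by every server.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

class AjaxRequestSketch {

    // Builds "endpoint?key=value&..." with URL-encoded parameters, in the map's iteration order.
    static URI buildUri(String endpoint, Map<String, String> params) {
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return URI.create(params.isEmpty() ? endpoint : endpoint + "?" + query);
    }

    // Recreates the XHR that the page sends in the background, asking for JSON instead of HTML.
    static HttpRequest ajaxRequest(String endpoint, Map<String, String> params) {
        return HttpRequest.newBuilder(buildUri(endpoint, params))
                .header("Accept", "application/json")
                .header("X-Requested-With", "XMLHttpRequest") // common on AJAX calls, not universal
                .GET()
                .build();
    }
}
```

The request can then be sent with any HTTP client, and the JSON response parsed instead of the rendered page.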
In this lecture you learn how to make HTTP requests with the Unirest Java library. We develop a simple live example where you can see the features most important for web scraping in action.
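For comparison, here is how the same kinds of features (a custom User-Agent, timeouts, following redirects and checking the status code) look with the JDK's built-in `java.net.http.HttpClient` (Java 11+). This swaps in a different client and is not Unirest's API. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class HttpBasicsSketch {

    // One shared client: follows redirects and gives up connecting after 10 seconds.
    static final HttpClient CLIENT = HttpClient.newBuilder()
            .followRedirects(HttpClient.Redirect.NORMAL)
            .connectTimeout(Duration.ofSeconds(10))
            .build();

    // GET request with a browser-like User-Agent and a 20-second per-request timeout.
    static HttpRequest get(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .timeout(Duration.ofSeconds(20))
                .GET()
                .build();
    }

    // Sends the request and fails loudly on any non-2xx status code.
    static String fetch(String url, String userAgent) throws IOException, InterruptedException {
        HttpResponse<String> response = CLIENT.send(get(url, userAgent), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IOException("Unexpected status " + response.statusCode() + " for " + url);
        }
        return response.body();
    }
}
```

Reusing one client instance lets it keep connections open between requests, which helps when a scraper fetches many pages from the same host.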
In this example we scrape the results from peoplefinders.com which are loaded dynamically via AJAX requests.
In this lecture you will learn everything you need to deal with WebSockets effectively. We discuss what WebSockets are, how they work from a 10,000-foot view and, most importantly, how to scrape data from them, based on a real-life example from a news site.
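As a rough illustration of the receiving side, the JDK's `java.net.http.WebSocket` (Java 11+) lets you register a listener that the server pushes messages to. This is a sketch using the JDK client, which may differ from the library used in the lecture. The `wss://` URL you would pass to `connect` is an assumption. On a real news site you would look up the socket URL, and any subscription message the server expects, in the browser's developer tools. Because a text message can arrive in several parts, the sketch reassembles parts until `last` is true.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.function.Consumer;

class WebSocketSketch {

    // Text messages can arrive split into several parts; "last" marks the final part.
    static final class MessageAssembler {
        private final StringBuilder buffer = new StringBuilder();
        private final Consumer<String> onMessage;

        MessageAssembler(Consumer<String> onMessage) {
            this.onMessage = onMessage;
        }

        void accept(CharSequence part, boolean last) {
            buffer.append(part);
            if (last) {
                onMessage.accept(buffer.toString());
                buffer.setLength(0);
            }
        }
    }

    // Listener that hands every complete text message to the given consumer.
    static WebSocket.Listener listener(Consumer<String> onMessage) {
        MessageAssembler assembler = new MessageAssembler(onMessage);
        return new WebSocket.Listener() {
            @Override
            public CompletionStage<?> onText(WebSocket ws, CharSequence data, boolean last) {
                assembler.accept(data, last);
                ws.request(1); // ask for the next message
                return null;   // data was copied, so it may be reclaimed immediately
            }
        };
    }

    // Opens the connection; from then on the server pushes messages to the listener.
    static CompletableFuture<WebSocket> connect(String wsUrl, Consumer<String> onMessage) {
        return HttpClient.newHttpClient()
                .newWebSocketBuilder()
                .buildAsync(URI.create(wsUrl), listener(onMessage));
    }
}
```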
In this lecture we export the data from the Google top 10 search results example as CSV for further processing. You can open the file in Numbers, Excel or OpenOffice, where you can sort and filter the data in all kinds of ways, which is really useful.
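The one tricky part of writing CSV by hand is escaping. A title that contains a comma or a quote must not break the columns. The following is a minimal, dependency-free sketch that follows the quoting rules of RFC 4180. A CSV library would work just as well, and the class and method names here are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class CsvExportSketch {

    // Quotes a field if it contains a comma, quote or line break; inner quotes are doubled (RFC 4180).
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Builds CSV text: one header row, then one row per record, each ending in CRLF.
    static String toCsv(List<String> header, List<List<String>> rows) {
        List<List<String>> all = new ArrayList<>();
        all.add(header);
        all.addAll(rows);
        StringBuilder sb = new StringBuilder();
        for (List<String> row : all) {
            List<String> escaped = new ArrayList<>();
            for (String f : row) {
                escaped.add(escape(f));
            }
            sb.append(String.join(",", escaped)).append("\r\n");
        }
        return sb.toString();
    }

    static void write(Path file, List<String> header, List<List<String>> rows) throws IOException {
        Files.write(file, toCsv(header, rows).getBytes(StandardCharsets.UTF_8));
    }
}
```

For the search results, the header could be `title,url`, with one row per result.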
In this lecture we export the data from the Google top 10 search results example as JSON for further processing.
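In a real project you would most likely use a JSON library such as Gson or Jackson. The sketch below stays dependency-free to show what the export boils down to: an array of objects whose string values are escaped according to RFC 8259. It only handles flat string-to-string records, and its names are hypothetical.

```java
import java.util.List;
import java.util.Map;

class JsonExportSketch {

    // Escapes a string and wraps it in quotes so it is a valid JSON string literal (RFC 8259).
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c)); // other control characters
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Serializes a list of flat records, e.g. {title, url}, as a JSON array of objects.
    static String toJson(List<Map<String, String>> records) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < records.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('{');
            boolean first = true;
            for (Map.Entry<String, String> e : records.get(i).entrySet()) {
                if (!first) sb.append(',');
                sb.append(quote(e.getKey())).append(':').append(quote(e.getValue()));
                first = false;
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }
}
```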
You will learn how to become invisible and hide the traces of being a web scraper. This will help you avoid getting blocked or banned.
Bonus: in the resource section you will find an undercover web scraper that builds upon the Google scraper from the previous lecture. You can use it as a foundation for creating your own scrapers...
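Two widely used techniques for looking less like a bot are rotating the User-Agent header between requests and waiting a random amount of time between them instead of firing at a fixed rate. The following is a minimal sketch of both. It is not necessarily what the course's undercover scraper does, and the User-Agent strings you pass in are placeholders.

```java
import java.util.List;
import java.util.Random;

class PoliteScraperSketch {

    private final List<String> userAgents;
    private final Random random;
    private int next = 0;

    PoliteScraperSketch(List<String> userAgents, Random random) {
        if (userAgents.isEmpty()) {
            throw new IllegalArgumentException("need at least one User-Agent");
        }
        this.userAgents = userAgents;
        this.random = random;
    }

    // Cycles through the configured User-Agent strings, one per request.
    String nextUserAgent() {
        String ua = userAgents.get(next);
        next = (next + 1) % userAgents.size();
        return ua;
    }

    // Random delay in [minMillis, maxMillis] so requests do not arrive at a machine-like fixed rate.
    long randomDelayMillis(long minMillis, long maxMillis) {
        return minMillis + (long) (random.nextDouble() * (maxMillis - minMillis + 1));
    }

    // Sleeps for a random delay; call this between two requests.
    void pause(long minMillis, long maxMillis) throws InterruptedException {
        Thread.sleep(randomDelayMillis(minMillis, maxMillis));
    }
}
```

Passing the `Random` in, rather than creating it inside the class, makes the delays reproducible in tests when you use a fixed seed.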
In this lecture you will get an overview of Proxycrawl and learn what to use it for and how it works. We will also look at its backend and explain everything.
In this lecture we use the same example as in the previous lecture and scrape data from peoplefinders.com, except that this time we do it with Proxycrawl. That way you can see and understand the difference between the two approaches without being distracted by a different website.
Also included is the full source code, with a ready-to-use class you can integrate into your own scrapers if you want to use Proxycrawl...
Thank you for taking this online course. You can download the full source code of all lectures here. I will give you an overview of what's next...
In this lecture you will find a mind map with the contents of the course, so you have a one-page overview of all the information.
In this short and concise course you will learn everything you need to get started with web scraping using Java.
You will learn the concepts behind web scraping that you can apply to practically any web page (static AND dynamic / AJAX).
Course structure
We start with an overview of what web scraping is and what you can do with it.
Then we explain the difference between scraping static pages and scraping dynamic / AJAX pages. You learn how to classify a website into one of the two categories and then apply the right concept to scrape the data you want.
Next you will learn how to export the scraped data as either CSV or JSON. Both are popular formats that can be used for further processing.
Unfortunately, many websites try to block scrapers, and sometimes you simply do not want to be detected. In the section "Going undercover" you will learn how to stay undetected and avoid getting blocked.
At the end of the course you can download the full source code of all the lectures, and we give an outlook on some advanced topics (private proxies, cloud deployment, multithreading, ...). Those advanced topics are covered in a follow-up course I am going to teach.
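One simple way to classify a page is to fetch its raw HTML without running any JavaScript, for example with Jsoup or a plain HTTP GET. Then check whether text you can see in the browser is already in that HTML. If it is, the page is most likely static. If it is not, the data is probably loaded later via JavaScript / AJAX. The sketch below encodes only this heuristic and is not a complete classifier. HTML entities such as `&amp;` can cause false negatives, and the names are hypothetical.

```java
import java.util.Locale;

class PageClassifierSketch {

    enum PageType { STATIC, DYNAMIC }

    // Heuristic: text visible in the browser that already appears in the raw HTML
    // (fetched without running JavaScript) suggests a static page; otherwise the
    // content was probably injected later by JavaScript / AJAX.
    static PageType classify(String rawHtml, String visibleText) {
        String html = rawHtml.toLowerCase(Locale.ROOT);
        String needle = visibleText.toLowerCase(Locale.ROOT).trim();
        return html.contains(needle) ? PageType.STATIC : PageType.DYNAMIC;
    }
}
```

You can do the same check by hand with the browser's "view source" feature, which shows the HTML as delivered by the server rather than the page after scripts have run.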
Why you should take this course
Stop imagining that you could scrape data from websites and use that skill in your next web project. You can do it now.
Enroll now!