In this short course you will learn everything you need to get started with web scraping in Java.
You will learn the concepts behind web scraping that you can apply to practically any web page (static AND dynamic / AJAX).
We start with an overview of what web scraping is and what you can do with it.
Then we explain the difference between scraping static pages and dynamic / AJAX pages. You learn how to classify a website into one of the two categories and then apply the right approach to scrape the data you want.
Next you will learn how to export the scraped data as either CSV or JSON, two popular formats for further processing.
Unfortunately, many websites try to block scrapers, and sometimes you simply do not want to be detected. In the section "Going undercover" you will learn how to stay undetected and avoid getting blocked.
At the end of the course you can download the full source code of all the lectures, and we look ahead to some advanced topics (private proxies, cloud deployment, multithreading, ...). Those advanced topics are covered in a follow-up course I am going to teach.
Why you should take this course
Stop imagining that you could scrape data from websites and use those skills in your next web project. You can do it now.
In this lecture you learn the concept behind scraping static web pages. We look at the concrete steps needed to scrape practically any static page out there. A live example is provided at the end of this section...
We introduce the Jsoup library. It helps with downloading and parsing a page and with extracting elements from it using CSS selectors. It has a lot in common with jQuery, so prior knowledge of jQuery is helpful but not necessary.
We also develop a simple example program using this library...
In this example we build a web scraper that gets the top 10 Google search results for any search query and prints the title and URL of each search result to the console. Later we store the results in a simple text file.
In this lecture we discuss what a dynamic / AJAX web page is and how it differs from a static one.
In this lecture you learn the concept behind scraping dynamic / AJAX web pages. Later we show you how to actually apply this concept to a concrete example.
In this lecture you learn how to make HTTP requests with the Unirest Java library. We develop a simple live example where you can see the most important features for web scraping in action.
In this example we scrape the results from peoplefinders.com, which are loaded dynamically via AJAX requests.
In this lecture we export the data from the Google top 10 search results example as CSV for further processing. You can open the file in Numbers, Excel or OpenOffice, where you can sort and filter the results.
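As a rough illustration of the CSV export idea, here is a minimal sketch that uses only the Java standard library. The class name `CsvExportSketch`, the column names `title` and `url`, the sample rows and the output file `results.csv` are all assumptions made for this example and are not taken from the course's source code. The quoting follows the common CSV convention of wrapping a field in double quotes when it contains a comma, a quote or a line break.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the course's actual code.
class CsvExportSketch {

    // Wrap a field in double quotes if it contains a comma, quote or line break,
    // and double any quotes inside it.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Each row is a {title, url} pair; the header line is an assumed layout.
    static String toCsv(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("title,url\n");
        for (String[] row : rows) {
            sb.append(escape(row[0])).append(',').append(escape(row[1])).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data standing in for scraped search results.
        List<String[]> results = new ArrayList<>();
        results.add(new String[] {"Example Domain", "https://example.com/"});
        results.add(new String[] {"Tips, tricks and \"hacks\"", "https://example.org/tips"});

        Files.write(Paths.get("results.csv"),
                toCsv(results).getBytes(StandardCharsets.UTF_8));
    }
}
```

Quoting matters here because page titles often contain commas; without it, a spreadsheet program would split such a title across several columns.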
In this lecture we export the data from the Google top 10 search results example as JSON for further processing.
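To give a feel for what a JSON export involves, here is a minimal sketch that builds the JSON text by hand with the Java standard library. The class name `JsonExportSketch`, the field names `title` and `url`, the sample data and the file name `results.json` are assumptions for illustration. In a real project a JSON library would usually do this work; the hand-written escaping below only covers the characters that JSON requires to be escaped.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the course's actual code.
class JsonExportSketch {

    // Turn a Java string into a JSON string literal, escaping quotes,
    // backslashes and control characters.
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append('"').toString();
    }

    // Each row is a {title, url} pair and becomes one JSON object in an array.
    static String toJson(List<String[]> rows) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < rows.size(); i++) {
            if (i > 0) {
                sb.append(',');
            }
            sb.append("{\"title\":").append(quote(rows.get(i)[0]))
              .append(",\"url\":").append(quote(rows.get(i)[1]))
              .append('}');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data standing in for scraped search results.
        List<String[]> results = new ArrayList<>();
        results.add(new String[] {"Example Domain", "https://example.com/"});
        results.add(new String[] {"A \"quoted\" title", "https://example.org/"});

        Files.write(Paths.get("results.json"),
                toJson(results).getBytes(StandardCharsets.UTF_8));
    }
}
```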
You will learn how to become invisible and hide traces of being a web scraper. This will help you avoid getting blocked or banned.
Bonus: in the resource section you will find an undercover web scraper that builds upon the Google scraper from the previous lecture. You can use this as a foundation for creating your own scrapers...
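The description above does not say which techniques the lecture uses, so the following is only a sketch of two commonly used ideas: sending a browser-like `User-Agent` header and pausing for a random interval between requests. It uses the `java.net.http` API from the standard library (Java 11 or later) rather than the libraries named in the course. The class name `PoliteRequestSketch`, the user agent strings, the URLs and the delay range are all assumptions chosen for illustration, and the example only builds the requests without sending them.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical helper for illustration only; not the course's actual code.
class PoliteRequestSketch {

    // Example user agent strings (assumed values); rotate through them
    // so that successive requests do not all look identical.
    static final List<String> USER_AGENTS = List.of(
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                    + "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
            "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
                    + "(KHTML, like Gecko) Version/17.0 Safari/605.1.15");

    // Build a GET request that carries the given User-Agent header.
    static HttpRequest browserLikeRequest(String url, String userAgent) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("User-Agent", userAgent)
                .header("Accept-Language", "en-US,en;q=0.9")
                .GET()
                .build();
    }

    // Pick a random delay between min and max milliseconds (both inclusive).
    static long randomDelayMillis(long min, long max) {
        return ThreadLocalRandom.current().nextLong(min, max + 1);
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> urls = List.of("https://example.com/", "https://example.org/");
        for (int i = 0; i < urls.size(); i++) {
            String agent = USER_AGENTS.get(i % USER_AGENTS.size());
            HttpRequest request = browserLikeRequest(urls.get(i), agent);
            // The request is only printed here; a real scraper would send it.
            System.out.println(request.method() + " " + request.uri());

            // Wait a random time so requests do not arrive at a fixed rhythm.
            Thread.sleep(randomDelayMillis(500, 1500));
        }
    }
}
```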
Thank you for taking this online course. You can download the full source code of all lectures here. I will give you an overview of what's next...
In this lecture you will find a mind map of the course contents, so you have a one-page overview of all the information.
I am an entrepreneur and software developer who really enjoys building and learning new things. I have over 7 years of experience working in different companies (big and small) and have even founded my own startup.
I love to share what I have learned to help YOU be more effective and successful.