In this course you will learn how to scrape data from web pages using CasperJS.
This course consists of 5 example projects that will help you fully understand the power of a headless browser driven through the CasperJS API.
What You Will Learn
You will gain a thorough understanding of advanced web scraping concepts, as well as an insight into how to use CasperJS for testing, DOM manipulation, and UI interaction.
What to Expect
The Projects Will Cover
What is PhantomJS?
PhantomJS is a full web stack built around a headless browser. Phantom gives us the power to perform many interesting actions on a web page, such as manipulating the page, simulating user interaction, and dynamically capturing and saving website data.
What is CasperJS?
CasperJS is a stand-alone framework built on top of PhantomJS and is compatible with most operating systems. The focus of this course is the CasperJS API, which we'll use to write all of our web scraping scripts.
What You Should Know
PhantomJS is described on its website as a full web stack with no browser required. It can also be described as a 'headless browser' that lets you do almost anything you could do in a standard browser like Chrome or Firefox.
CasperJS is defined on its website as "a navigation scripting & testing utility for the PhantomJS WebKit headless browser. It provides useful high-level functions, methods & syntactic sugar for doing common tasks."
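To make that concrete, here is a minimal sketch of what a CasperJS script looks like. It is run with `casperjs script.js`, not with Node.js, because `require('casper')` only exists inside the CasperJS runtime; the guard below simply keeps the sketch from crashing if loaded elsewhere. The URL is a placeholder.

```javascript
// Minimal CasperJS script: create an instance, open a page, print its title.
// Run with: casperjs script.js
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create();

    casper.start('https://example.com/', function () {
        // this.getTitle() returns the <title> of the loaded page
        this.echo('Page title: ' + this.getTitle());
    });

    casper.run(); // nothing actually happens until run() is called
}
```

Note the pattern: steps are queued with start() and then(), and run() kicks off the whole queue.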
Python needs to be installed on your machine first; the `casperjs` launcher script depends on it.
In this lecture we're going to look at how to install PhantomJS.
In this lecture we're going to look at how to install CasperJS.
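One common way to install both tools is sketched below. It assumes Node.js and npm are already on your machine; other install methods (Homebrew, or direct downloads from the project websites) work just as well.

```shell
# Install PhantomJS and CasperJS globally via npm
npm install -g phantomjs-prebuilt
npm install -g casperjs

# Verify both installs
phantomjs --version
casperjs --version
```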
A first look at the methods we'll be using throughout the course to set up and run our scripts.
In this video we're going to look at what kind of options we can pass into our Casper instance via the create() method, and what our typical workflow is going to look like.
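As a sketch of what that looks like: the option names below are real CasperJS options, but the values are illustrative choices, not required defaults.

```javascript
// Options we can hand to create(); values here are just examples.
var options = {
    verbose: true,              // echo log messages to the console
    logLevel: 'debug',          // 'debug', 'info', 'warning' or 'error'
    viewportSize: { width: 1280, height: 800 },
    pageSettings: {
        loadImages: false,      // skip images to speed up scraping
        userAgent: 'Mozilla/5.0 (X11; Linux x86_64)' // pose as a desktop browser
    }
};

// Typical workflow: create -> start -> then -> run.
// This part only runs inside the CasperJS runtime (casperjs script.js).
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create(options);

    casper.start('https://example.com/');

    casper.then(function () {
        this.echo('Loaded: ' + this.getCurrentUrl());
    });

    casper.run();
}
```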
We're going to start with an example of how to get search results from Bing.
First we'll search Bing for the query 'casperjs' and push the results to an array of links. Then we'll search Bing for the query 'phantomjs' and add those results to our array.
After adding the results of both search queries, we'll dump them to the console in a readable format.
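The two-query flow described above can be sketched as follows, in the style of the googlelinks example from the CasperJS samples. The result selector (`li.b_algo h2 a`) is an assumption about Bing's markup and may need adjusting if the page changes.

```javascript
var links = [];

// Runs inside the page context via evaluate(), so it can use the DOM directly.
function getLinks() {
    var anchors = document.querySelectorAll('li.b_algo h2 a'); // selector is an assumption
    return Array.prototype.map.call(anchors, function (a) {
        return a.getAttribute('href');
    });
}

if (typeof phantom !== 'undefined') {
    var casper = require('casper').create();

    casper.start('https://www.bing.com/search?q=casperjs', function () {
        links = this.evaluate(getLinks);
    });

    casper.thenOpen('https://www.bing.com/search?q=phantomjs', function () {
        links = links.concat(this.evaluate(getLinks));
    });

    casper.run(function () {
        this.echo(links.length + ' links found:');
        this.echo(' - ' + links.join('\n - ')).exit();
    });
}
```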
Create the project scaffold, including our start(), then(), and run() methods and the initial values that will hold our arrays.
Define the 3 functions we'll be using to evaluate our scraped elements.
Create a table and export our data to an .html file.
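A sketch of that export step: `buildTable()` is a hypothetical helper (not part of the CasperJS API), and the file write uses the PhantomJS `fs` module, which is only available inside the casperjs runtime.

```javascript
// Turn scraped rows into a small HTML table string.
function buildTable(rows) {
    var cells = rows.map(function (row) {
        return '  <tr><td>' + row.title + '</td><td>' + row.url + '</td></tr>';
    });
    return '<table>\n' + cells.join('\n') + '\n</table>';
}

if (typeof phantom !== 'undefined') {
    var fs = require('fs'); // PhantomJS fs module, not Node's
    var html = buildTable([{ title: 'CasperJS', url: 'http://casperjs.org/' }]);
    fs.write('results.html', html, 'w'); // 'w' = overwrite mode
}
```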
Formatting data into a JSON object.
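A minimal sketch of that formatting step; the key names (one per query) are an assumption about how the project organizes its data.

```javascript
// Collect the two result arrays into a single JSON string.
function toJson(casperLinks, phantomLinks) {
    return JSON.stringify({
        casperjs: casperLinks,
        phantomjs: phantomLinks
    }, null, 2); // indent by 2 spaces for readability
}
```

Passing `null, 2` to JSON.stringify pretty-prints the output, which makes the console dump much easier to read.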
In this project we're going to scrape hotel names and prices. We'll work with some new methods to wait for a page to load, and use different click events. In the last part, we'll see how to sort hotels by highest rating using the clickLabel() method.
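A sketch of that flow is below. The URL and selectors are placeholders, and `pairNamesAndPrices()` is a hypothetical helper; only `waitForSelector()`, `evaluate()`, and `clickLabel()` are the actual CasperJS methods in play.

```javascript
// Zip two parallel arrays of scraped text into one list of records.
function pairNamesAndPrices(names, prices) {
    return names.map(function (name, i) {
        return { name: name, price: prices[i] };
    });
}

if (typeof phantom !== 'undefined') {
    var casper = require('casper').create();

    casper.start('https://www.example-travel-site.com/hotels'); // placeholder URL

    // Wait until the results have rendered before scraping them.
    casper.waitForSelector('div.hotel', function () {
        var data = this.evaluate(function () {
            function textOf(selector) {
                return Array.prototype.map.call(
                    document.querySelectorAll(selector),
                    function (el) { return el.textContent.trim(); }
                );
            }
            // selectors are assumptions about the page's markup
            return { names: textOf('.hotel-name'), prices: textOf('.hotel-price') };
        });
        this.echo(JSON.stringify(pairNamesAndPrices(data.names, data.prices)));
    });

    // clickLabel() finds an element by its visible text, here an <a> tag.
    casper.then(function () {
        this.clickLabel('Highest rating', 'a');
    });

    casper.run();
}
```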
Setting up the project based on the pycoders example.
Get hotel names and prices and print to the console.
The website has changed. Selectors need to be adjusted. Please see resources for updated code.
I will be updating these videos soon.
Sort hotels by highest rating.
The website has changed: it no longer allows sorting by price, and the selectors have changed. Please see the resources for updated source code.
I will be updating these videos soon.
In this project we're going to see how to scrape and capture multiple pages. We'll take a look at a BestBuy product page and see how we can click the reviews tab, take a screenshot of the page, then click the Next Page link to take screenshots of the next 3 pages of reviews. We'll use a counter variable to keep track of what page we're on, introducing new methods such as waitFor(), capture(), and thenClick().
Setting up the project based on the vegas.js example.
Complete the setup. Edit our functions to use Array.prototype.map(), add our selectors to grab the ratings and dates, and print these to the console.
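The reason for reaching for Array.prototype.map() here: querySelectorAll returns a NodeList, which has no .map() of its own, so we borrow the array method via .call(). `extractText()` is a hypothetical helper showing the pattern.

```javascript
// Pull trimmed text out of an array-like collection of nodes.
function extractText(nodes) {
    return Array.prototype.map.call(nodes, function (node) {
        return node.textContent.trim();
    });
}

// Example with plain objects standing in for DOM nodes:
// extractText([{ textContent: ' 4.5 stars ' }]) -> ['4.5 stars']
```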
Project setup for navigating over multiple pages of BestBuy. The script is based on the google pagination example from the CasperJS sample docs.
In this video we're going to see how to click links to load new pages and take screenshots of each one. You can apply the same methods for scraping multiple pages and writing the results you want to a file. So, we'll start on page 1 and take a screenshot, click the next button and take another, and repeat for a total of 4 pages.
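The capture loop described above can be sketched like this. The `a.next-page` selector, the product URL, and the `screenshotName()` helper are all assumptions for illustration; capture(), thenClick(), and then() are the real CasperJS methods.

```javascript
// Build file names like reviews-1.png, reviews-2.png, ...
function screenshotName(page) {
    return 'reviews-' + page + '.png';
}

if (typeof phantom !== 'undefined') {
    var casper = require('casper').create();
    var currentPage = 1;
    var lastPage = 4;

    casper.start('https://www.bestbuy.com/'); // placeholder product URL

    casper.then(function step() {
        this.capture(screenshotName(currentPage)); // full-page screenshot
        if (currentPage < lastPage) {
            currentPage += 1;
            this.thenClick('a.next-page'); // queue the click as the next step
            this.then(step);               // then loop back around for the next capture
        }
    });

    casper.run();
}
```

To screenshot only one element instead of the whole page (as in the next lecture), swap capture() for captureSelector('ratings.png', 'div.ratings').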
Capture only the ratings div to create our screenshots.
In this final example project we're going to see how to log in to Twitter as an authenticated user, submit a search query, and capture the results. We'll also take a look at how to send and receive events using the on() and emit() methods.
Creating the scaffolding for our script.
Create the on() and emit() methods. Run our script and take a screenshot of our specified div.
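A sketch of the event wiring: Casper instances are event emitters, so we can fire custom events with emit() and react to them with on(). The event name and the `div#results` selector are assumptions for illustration.

```javascript
if (typeof phantom !== 'undefined') {
    var casper = require('casper').create();

    // Register the listener before the event can fire.
    casper.on('search.complete', function (query) {
        this.echo('Search finished for: ' + query);
        // captureSelector() screenshots just the matched element
        this.captureSelector('results.png', 'div#results');
    });

    casper.start('https://twitter.com/search?q=casperjs', function () {
        // Fire the custom event once the page has loaded.
        this.emit('search.complete', 'casperjs');
    });

    casper.run();
}
```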
In this final video I want to go over some extra topics that may come in handy as you create your own scripts. The example files are included in the resources folder, and I'm going to go over each of them with you.