Scrape Websites using PhantomJS and CasperJS

Become a better JavaScript Developer and learn Front-End Testing. We’ll use Node.js, jQuery and functional programming.
4.3 (62 ratings)
499 students enrolled
  • Lectures 32
  • Length 2 hours
  • Skill Level Intermediate Level
  • Languages English
  • Includes Lifetime access
    30 day money back guarantee!
    Available on iOS and Android
    Certificate of Completion

About This Course

Published 2/2016 English

Course Description

In this course you will learn how to scrape data from web pages using CasperJS.

This course consists of 5 example projects to help you fully understand the powers of the headless browser using the CasperJS API.

What You Will Learn

You will gain a thorough understanding of advanced web scraping concepts and also gain insight into how to use CasperJS for testing DOM manipulation and UI interaction.


What to Expect

  • We'll begin with an overview of how both PhantomJS and CasperJS work, along with how to install these frameworks.
  • Next, we'll discuss what our workflow will look like and the options we can pass into a Casper object.
  • Then we'll dive into the meat of this course by working through 5 projects.


The Projects Will Cover

  • How to scrape websites that are rendered with JavaScript instead of standard HTML
  • How to wait for AJAX loaded data to appear before scraping elements
  • How to submit forms both for Authorization and when making searches
  • How to define navigation steps, like logging into a site, clicking a button and following links
  • How to write and save specified data in tables, then output it as an .html file or as JSON
  • And how to take screenshots both of full web pages and specific containers


What is PhantomJS?

PhantomJS is a Full Web Stack that employs a headless browser. Phantom gives us the power to perform many interesting actions on a web page, such as manipulating the page, simulating user interaction, and dynamically capturing and saving website data.


What is CasperJS?

CasperJS is a stand-alone framework built on top of PhantomJS and is compatible with most operating systems. The focus of this course will be on the Casper API, and we'll be using this API to write all our web scraping scripts.


What You Should Know

You should already know JavaScript basics, including what a callback function is. It will help if you know some jQuery. We use lodash in our examples, but only as a replacement for the built-in map method that's part of the native JavaScript API.
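
As a rough illustration of the map method mentioned above, the sketch below uses the native Array.prototype.map to pull one field out of a list of scraped items. The links array and its text/href fields are made up for this example and are not taken from the course; lodash's _.map(links, fn) would be called in much the same way.

```javascript
// Hypothetical scraped data; the field names are assumptions for illustration.
const links = [
  { text: 'CasperJS', href: 'http://casperjs.org/' },
  { text: 'PhantomJS', href: 'http://phantomjs.org/' }
];

// Native map: build a new array holding only each link's URL.
// The callback passed to map is an ordinary callback function.
const hrefs = links.map(function (link) {
  return link.href;
});

console.log(hrefs);
```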

What are the requirements?

  • You should already be familiar with JavaScript basics
  • Helpful to know beginner jQuery syntax

What am I going to get from this course?

  • Know how to use JavaScript for Data Mining
  • Be able to Capture, Download and Save Website Data
  • Understand how to use CasperJS and PhantomJS
  • Apply What You've Learned to Front-end Testing
  • Create Your Own Scripts for Scraping Data
  • Have a Better Understanding of Functional Programming
  • Fully Understand JavaScript and jQuery Selectors

What is the target audience?

  • You should take this course if you're interested in becoming a better JavaScript developer.
  • This course is meant for those who are already familiar with the basics of JavaScript. No prior knowledge of PhantomJS or CasperJS is required.

Curriculum

Section 1: Introduction
02:04

Welcome to the course! In this lecture, we will take a brief look at each one of the projects we'll be working on.

00:27

These projects are for demonstration purposes only.

Section 2: Overview and Install
02:53

PhantomJS is described on its website as a Full Web Stack with no browser required.

It can also be described as a 'headless browser' that allows the user to do anything they could do in a standard browser like Chrome or Firefox.

04:14

CasperJS is defined on its website as a navigation scripting & testing utility for the PhantomJS WebKit headless browser. It provides useful high-level functions, methods & syntactic sugar for doing common tasks.

00:56

Python needs to be installed on your machine first.

05:05

In this lecture we're going to look at how to install PhantomJS.

04:48

In this lecture we're going to look at how to install CasperJS.

Section 3: Getting Started with CasperJS
06:26

A first look at the methods we'll be using throughout the course to set up and run our scripts.

04:59

In this video we're going to look at what kind of options we can pass into our Casper instance inside the create method, as well as what our typical workflow is going to look like.
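
To give a feel for the kind of options involved, here is a minimal sketch of an options object. The specific values are illustrative assumptions, not the settings used in the course. The object is written as plain JavaScript so it can be inspected on its own; the comments show where it would plug into a CasperJS script.

```javascript
// Illustrative option values only; the course may use different settings.
const options = {
  verbose: true,        // print log messages to the console
  logLevel: 'debug',    // one of 'debug', 'info', 'warning', 'error'
  waitTimeout: 10000,   // default timeout in milliseconds for wait* methods
  viewportSize: { width: 1280, height: 800 },
  pageSettings: {
    loadImages: false,  // skip images to speed up page loads
    loadPlugins: false
  }
};

// Inside a CasperJS script (run with the casperjs command, not plain Node.js)
// the object would be passed to create(), followed by the usual workflow:
//   var casper = require('casper').create(options);
//   casper.start('http://example.com/');    // open the first page
//   casper.then(function () { /* ... */ }); // queue a navigation step
//   casper.run();                           // execute the queued steps
console.log(JSON.stringify(options, null, 2));
```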

Section 4: Scraping Search Results
00:35

We're going to start with an example of how to get search results from Bing.

03:05

First we'll search Bing with the query of 'casperjs' and push the results to an array of links. Then we'll search Bing for the query 'phantomjs' and add these results to our array.

06:50

After adding the results of our search queries, we'll dump the results to our console in a readable manner.
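
The collect-then-dump idea can be sketched in plain JavaScript as follows. The fakeResults function is a hypothetical stand-in for the links scraped from a results page, and the URLs are made up; in the real script the links would come from the page itself.

```javascript
// Hypothetical stand-in for the links scraped from one results page.
function fakeResults(query) {
  return [
    'https://example.com/' + query + '/1',
    'https://example.com/' + query + '/2'
  ];
}

let links = [];

// Append the results of each query to the same array.
links = links.concat(fakeResults('casperjs'));
links = links.concat(fakeResults('phantomjs'));

// Dump the results in a readable form: a count, then one link per line.
const report = links.length + ' links found:\n - ' + links.join('\n - ');
console.log(report);
```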

Section 5: Scraping JavaScript Rendered Web Pages
00:42

In this project we're going to learn how to scrape a website whose content is generated from a JavaScript file. This is something you might find on many modern single-page application sites. You will learn how to scrape the title, URL and date of a webpage's content and then download and save that data into an .html file formatted with tables and rows.

05:49

Create the project scaffold, including our run(), then() and start() methods and the initial values that will hold our arrays.

06:38

Define the 3 functions we'll be using to evaluate our scraped elements.

06:30

Create a table and export our data to an .html file.
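
One possible way to build such a table is sketched below. It assumes each scraped item has title, url and date fields (names chosen for this example) and writes the file with Node's fs module to a temporary directory. A CasperJS script would use the PhantomJS fs module instead, so treat the file-writing line as an assumption about the environment.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Escape characters that have special meaning in HTML.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build an HTML table with a header row and one row per scraped item.
function buildTable(rows) {
  const body = rows.map(function (row) {
    return '  <tr><td>' + escapeHtml(row.title) + '</td><td>' +
      escapeHtml(row.url) + '</td><td>' + escapeHtml(row.date) + '</td></tr>';
  }).join('\n');
  return '<table>\n  <tr><th>Title</th><th>URL</th><th>Date</th></tr>\n' +
    body + '\n</table>\n';
}

// Made-up sample data standing in for scraped results.
const rows = [
  { title: 'First <post>', url: 'https://example.com/1', date: '2016-02-01' },
  { title: 'Second post', url: 'https://example.com/2', date: '2016-02-02' }
];

const html = buildTable(rows);
const outFile = path.join(os.tmpdir(), 'results.html');
fs.writeFileSync(outFile, html);
console.log('Wrote ' + outFile);
```

Escaping the scraped text before inserting it into the markup keeps stray angle brackets in a title from breaking the table.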

03:10

Formatting data into a JSON object.
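
A minimal sketch of this step, assuming the scraped titles, URLs and dates arrive as three parallel arrays (an assumption for this example, with made-up values), is to combine them by index and serialize the result:

```javascript
// Parallel arrays as they might come back from three separate scrapes.
const titles = ['Post one', 'Post two'];
const urls = ['https://example.com/1', 'https://example.com/2'];
const dates = ['2016-02-01', '2016-02-02'];

// Combine the arrays into one object per item, matched by index.
const items = titles.map(function (title, i) {
  return { title: title, url: urls[i], date: dates[i] };
});

// Serialize with two-space indentation so the output is easy to read.
const json = JSON.stringify(items, null, 2);
console.log(json);
```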

Section 6: Scraping Hotel Data
00:48

In this project we're going to be scraping Hotel names and prices. We'll be working with some new methods to wait for a page to load and using different click events. In the last part, we'll see how to sort hotels by highest rating using the clickLabel method.

04:56

Setting up the project based off the pycoders example.

05:48

Get hotel names and prices and print to the console.

The website has changed. Selectors need to be adjusted. Please see resources for updated code.

I will be updating these videos soon.

05:58

Sort hotels by highest rating.

The website has changed. It no longer allows you to sort by different prices, and the selectors have also changed. Please see resources for updated source code.

I will be updating these videos soon.

Section 7: Scrape and Capture Multiple Pages
00:36

In this project we're going to see how to scrape and capture multiple pages. We'll be taking a look at a BestBuy product page and see how we can click the reviews tab, take a screenshot of the page and then click the Next Page link to take screenshots of the next 3 pages of reviews. We'll be using a counter variable to keep track of what page we're on and introducing new methods such as waitFor, capture, and thenClick.

05:25

Setting up the project based on the vegas.js example.

06:23

Complete the setup. Edit our functions to use Array.prototype.map(). Add in our selectors to grab the ratings and dates and print these to our console.

08:24

Project setup for navigating over multiple pages of BestBuy. The script is based on the Google pagination example from the CasperJS sample docs.

09:30

In this video we're going to see how you can click links to load new pages and start taking screenshots of each page. You can use the same methods applied in this video for scraping multiple pages and writing the results that you want to a file. So we'll start on page 1, take a screenshot, click the next button, take another screenshot, and repeat for a total of 4 pages.
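
The counter-driven loop described here can be sketched in plain JavaScript. The fake browser object below is a hypothetical stand-in that only records what would happen; a real CasperJS script would be asynchronous and would wait for each page to load before capturing it.

```javascript
// Hypothetical stand-in for a browser session; nothing is actually loaded.
function createFakeBrowser() {
  let page = 1;
  return {
    currentPage: function () { return page; },
    capture: function (file) { return file; }, // pretend to save a screenshot
    clickNext: function () { page += 1; }      // pretend to click "Next Page"
  };
}

// Capture one screenshot per page, clicking "next" between captures.
function captureReviewPages(browser, maxPages) {
  const shots = [];
  let counter = 1; // tracks which page we are on
  while (counter <= maxPages) {
    shots.push(browser.capture('reviews-page-' + counter + '.png'));
    if (counter < maxPages) {
      browser.clickNext(); // no click needed after the last page
    }
    counter += 1;
  }
  return shots;
}

const browser = createFakeBrowser();
const shots = captureReviewPages(browser, 4);
console.log(shots);
```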

03:58

Capture only the ratings div to create our screenshots.

Section 8: Log In and Search
00:25

In this final example project we're going to see how you can log in to Twitter as an authenticated user, then submit a search query and capture the results. We'll also take a look at how to send and receive events using the on and emit methods.

06:05

Creating the scaffolding for our script.

05:50

Create the on() and emit() methods. Run our script and take a screenshot of our specified div.
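
The on/emit pattern itself can be tried outside CasperJS. The sketch below uses Node's built-in EventEmitter as a stand-in for the casper object; the event name and its arguments are made up for this example.

```javascript
const EventEmitter = require('events');

// Stand-in for the casper object; only the event API is modelled here.
const emitter = new EventEmitter();
const received = [];

// Register a listener for a custom event (the event name is hypothetical).
emitter.on('search.done', function (query, count) {
  received.push(query + ': ' + count + ' results');
});

// Fire the event, passing data to every registered listener.
emitter.emit('search.done', 'casperjs', 12);
console.log(received);
```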

Section 9: Conclusion
04:21

In this final video I want to go over some topics that may come in handy as you create your own scripts. These files will be included in the resources folder and I'm just going to go over each of them with you.

Thank You
00:40

Instructor Biography

Patrick Schroeder, Software Developer

Patrick Schroeder is a self-taught full stack JavaScript developer. He enjoys working with Angular, Node.js, MongoDB, React.js, Firebase, and anything else JavaScript-related. Patrick is passionate about teaching JavaScript. He loves to help others understand difficult concepts by creating clear presentations that gradually build to full comprehension of a given topic. He is very interested in furthering his knowledge of IoT and wearable products with the intention of teaching cutting-edge technologies and collaborating to bring new products to life.
