Scrape Websites using PhantomJS and CasperJS
4.5 (87 ratings)
695 students enrolled
Become a better JavaScript developer and learn front-end testing. We’ll use Node.js, jQuery and functional programming.
Last updated 3/2017
English
Current price: $10 Original price: $200 Discount: 95% off
30-Day Money-Back Guarantee
Includes:
  • 2 hours on-demand video
  • 15 Supplemental Resources
  • Full lifetime access
  • Access on mobile and TV
  • Certificate of Completion
What Will I Learn?
Know how to use JavaScript for Data Mining
Be able to Capture, Download and Save Website Data
Understand how to use CasperJS and PhantomJS
Apply What You've Learned to Front-end Testing
Create Your Own Scripts for Scraping Data
Have a Better Understanding of Functional Programming
Fully Understand JavaScript and jQuery Selectors
Requirements
  • You should already be familiar with JavaScript basics
  • Helpful to know beginner jQuery syntax
Description

In this course you will learn how to scrape data from web pages using CasperJS.

This course consists of 5 example projects to help you fully understand the capabilities of a headless browser through the CasperJS API.

What You Will Learn

You will gain a thorough understanding of advanced web scraping concepts and insight into how to use CasperJS to test DOM manipulation and UI interaction.


What to Expect

  • We'll begin with an overview of how PhantomJS and CasperJS work and how to install them.
  • Next, we'll discuss what our workflow will look like and the options we can pass into a Casper object.
  • Then we'll dive into the meat of this course by working through 5 projects.


The Projects Will Cover

  • How to scrape websites that are rendered with JavaScript instead of standard HTML
  • How to wait for AJAX-loaded data to appear before scraping elements
  • How to submit forms, both to log in and to run searches
  • How to define navigation steps, like logging into a site, clicking a button and following links
  • How to save selected data in tables and output it as an .html file or as JSON
  • And how to take screenshots both of full web pages and specific containers


What is PhantomJS?

PhantomJS is a full web stack built around a headless browser. It gives us the power to perform many useful actions on a web page, such as manipulating the page, simulating user interaction, and capturing and saving website data.


What is CasperJS?

CasperJS is a framework built on top of PhantomJS that runs on most operating systems. The focus of this course is the Casper API, which we'll use to write all of our web scraping scripts.


What You Should Know

You should already know JavaScript basics, including what a callback function is. It will help if you know some jQuery. We use lodash in one of our examples, but only as a replacement for the built-in map method of JavaScript arrays.
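For reference, the native equivalent of lodash's _.map over a DOM NodeList is Array.prototype.map.call, because a NodeList is array-like but has no map method of its own. A minimal sketch, using a hypothetical array-like object in place of a real NodeList so it runs outside a browser:

```javascript
// Collect one attribute from every element of an array-like collection.
// In a page, `nodes` would be document.querySelectorAll(selector);
// lodash's _.map(nodes, fn) does the same job.
function pluck(nodes, attr) {
  return Array.prototype.map.call(nodes, function (el) {
    return el.getAttribute(attr);
  });
}

// Hypothetical stand-in for a NodeList: indexed entries plus a length.
function fakeElement(href) {
  return {
    getAttribute: function (name) {
      return name === 'href' ? href : null;
    }
  };
}
var fakeNodeList = { 0: fakeElement('/a'), 1: fakeElement('/b'), length: 2 };

console.log(pluck(fakeNodeList, 'href'));
```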

Who is the target audience?
  • You should take this course if you're interested in becoming a better JavaScript developer.
  • This course is meant for those who are already familiar with the basics of JavaScript. No prior knowledge of PhantomJS or CasperJS is required.
Curriculum For This Course
32 Lectures 02:14:18
Introduction
2 Lectures 02:31

Welcome to the course! In this lecture, we will take a brief look at each one of the projects we'll be working on.

Preview 02:04

These projects are for demonstration purposes only.

Disclaimer
00:27
Overview and Install
5 Lectures 17:56

PhantomJS is described on its website as a full web stack with no browser required.

It can also be described as a 'headless browser': it can do what a standard browser like Chrome or Firefox does, but without a visible browser window.

Preview 02:53

CasperJS is defined on its website as a navigation scripting and testing utility for the PhantomJS WebKit headless browser. It provides useful high-level functions, methods and syntactic sugar for doing common tasks.

Preview 04:14

Python needs to be installed on your machine first.

Install Python
00:56

In this lecture we're going to look at how to install PhantomJS.

Installing PhantomJS
05:05

In this lecture we're going to look at how to install CasperJS.

Installing CasperJS
04:48
Getting Started with CasperJS
2 Lectures 11:25

A first look at the methods we'll be using throughout the course to set up and run our scripts.

Setting Up a Project
06:26

In this video we're going to look at which options we can pass to the create() method when creating our Casper instance, as well as what our typical workflow will look like.

Options and Workflow
04:59
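The options and the start/then/run workflow from the two lectures above can be sketched as a minimal CasperJS script. The option values and the example.com URL are illustrative assumptions, not the course's own settings. The Casper steps are wrapped in a check for PhantomJS's phantom global so the options object can also be loaded in plain Node.

```javascript
// Options passed to require('casper').create(). Values are illustrative.
var casperOptions = {
  verbose: true,        // print log messages to the console
  logLevel: 'debug',    // one of 'debug', 'info', 'warning', 'error'
  waitTimeout: 10000,   // ms before waitFor* steps give up
  viewportSize: { width: 1280, height: 800 },
  pageSettings: {
    loadImages: false,  // skip images to speed up scraping
    loadPlugins: false,
    userAgent: 'Mozilla/5.0 (X11; Linux x86_64) CasperJS-scraper' // hypothetical UA string
  }
};

// The Casper steps only run under CasperJS, where the `phantom` global exists.
if (typeof phantom !== 'undefined') {
  var casper = require('casper').create(casperOptions);

  // 1. start() opens the first page and queues the first step.
  casper.start('https://example.com/', function () {
    this.echo('Loaded: ' + this.getTitle());
  });

  // 2. then() queues another step; it runs after the previous step finishes.
  casper.then(function () {
    this.echo('Current URL: ' + this.getCurrentUrl());
  });

  // 3. run() executes the queued steps; its callback runs when they are done.
  casper.run(function () {
    this.exit();
  });
}
```

Nothing happens until run() is called; start() and then() only build the queue of steps.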
Scraping Search Results
3 Lectures 10:30

We're going to start with an example of how to get search results from Bing.

Preview 00:35

First we'll search Bing with the query of 'casperjs' and push the results to an array of links. Then we'll search Bing for the query 'phantomjs' and add these results to our array.

Get Results from Bing.com
03:05

After collecting the results of both search queries, we'll print them to the console in a readable format.

Get Results from Bing.com - Part 2
06:50
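The two-query search described above can be sketched by adapting the Google search sample from the CasperJS documentation. The Bing form selector (form#sb_form) and result selector (li.b_algo h2 a) are assumptions and may not match Bing's current markup. The Casper steps only run when PhantomJS's phantom global exists, so the formatting helper can be tested in Node.

```javascript
var links = [];

// Runs in the page context via evaluate(): collects result hrefs.
// 'li.b_algo h2 a' is an assumed Bing selector.
function getLinks() {
  var anchors = document.querySelectorAll('li.b_algo h2 a');
  return Array.prototype.map.call(anchors, function (a) {
    return a.getAttribute('href');
  });
}

// Format the collected links one per line for the console.
function formatLinks(list) {
  return list.length + ' links found:\n - ' + list.join('\n - ');
}

if (typeof phantom !== 'undefined') {
  var casper = require('casper').create();
  var form = 'form#sb_form'; // assumed id of Bing's search form

  casper.start('https://www.bing.com/', function () {
    this.waitForSelector(form);
  });

  casper.then(function () {
    this.fill(form, { q: 'casperjs' }, true); // submit the first query
  });

  casper.then(function () {
    links = this.evaluate(getLinks);
    this.fill(form, { q: 'phantomjs' }, true); // submit the second query
  });

  casper.then(function () {
    links = links.concat(this.evaluate(getLinks));
  });

  casper.run(function () {
    this.echo(formatLinks(links)).exit();
  });
}
```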
Scraping JavaScript Rendered Web Pages
5 Lectures 22:49

In this project we're going to learn how to scrape a website whose content is generated by JavaScript, which is common on modern single-page application sites. You will learn how to scrape the title, URL and date of a web page's content, then save that data to an .html file formatted as a table with rows.

Preview 00:42

Create the project scaffold, including the start(), then() and run() calls and the initial arrays that will hold our data.

Scraping JS-Rendered - Part 1
05:49

Define the 3 functions we'll be using to evaluate our scraped elements.

Scraping JS-Rendered - Part 2
06:38

Create a table and export our data to an .html file.

Scraping JS-Rendered - Part 3
06:30
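Exporting the scraped data as an HTML table, as described above, might look like the following sketch. It assumes the titles, URLs and dates were collected into three parallel arrays. The helper names are hypothetical, and the file system object is passed in so PhantomJS's fs module (whose write(path, content, mode) differs from Node's) can be swapped for a stub.

```javascript
// Escape characters that would otherwise break the generated HTML.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build an HTML table with one row per scraped item from three parallel arrays.
function buildTable(titles, urls, dates) {
  var rows = titles.map(function (title, i) {
    return '<tr><td>' + escapeHtml(title) + '</td><td>' + escapeHtml(urls[i]) +
      '</td><td>' + escapeHtml(dates[i]) + '</td></tr>';
  });
  return '<table>\n<tr><th>Title</th><th>URL</th><th>Date</th></tr>\n' +
    rows.join('\n') + '\n</table>';
}

// Wrap the table in a page and write it with the given fs object.
function saveTable(fs, path, titles, urls, dates) {
  var html = '<html><body>\n' + buildTable(titles, urls, dates) + '\n</body></html>\n';
  fs.write(path, html, 'w');
  return html;
}

// Inside a Casper script this would typically run once scraping is done:
//   casper.run(function () {
//     saveTable(require('fs'), 'results.html', titles, urls, dates);
//     this.exit();
//   });
```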

Formatting data into a JSON object.

Scraping JS-Rendered - Part 4
03:10
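Formatting the same data as JSON can be sketched by zipping the parallel arrays into records and serialising them. The sample title, URL and date are hypothetical placeholders, not scraped data.

```javascript
// Combine three parallel arrays into an array of { title, url, date } records.
function toRecords(titles, urls, dates) {
  return titles.map(function (title, i) {
    return { title: title, url: urls[i], date: dates[i] };
  });
}

// Hypothetical sample data standing in for scraped values.
var records = toRecords(['Post one'], ['https://example.com/1'], ['2017-03-01']);
var json = JSON.stringify(records, null, 2); // two-space indent for readability
console.log(json);
// In CasperJS, the string could be saved with require('fs').write('results.json', json, 'w').
```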
Scraping Hotel Data
4 Lectures 17:30

In this project we're going to scrape hotel names and prices. We'll use some new methods to wait for a page to load and to trigger different click events. In the last part, we'll see how to sort hotels by highest rating using the clickLabel() method.

Preview 00:48

Setting up the project based on the pycoders example.

Project Setup
04:56

Get hotel names and prices and print to the console.

The website has changed, so the selectors need to be adjusted. Please see the resources for updated code.

I will be updating these videos soon.

Get Names and Prices - Part 1
05:48

Sort hotels by highest rating.

The website has changed: it no longer lets you sort by different prices, and the selectors have changed. Please see the resources for updated source code.

I will be updating these videos soon.

Get Names and Prices - Part 2
05:58
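Because the hotel site has changed, here is a generic sketch of the techniques the project describes: waiting for AJAX-loaded content with waitForSelector(), sorting with clickLabel(), and reading names and prices. The URL, selectors, link text and the one-second delay are all hypothetical. The Casper steps only run when PhantomJS's phantom global exists, so the price helpers can be tested in Node.

```javascript
// Turn a scraped price string such as "$1,249" into a number (NaN if no digits).
function parsePrice(text) {
  var digits = String(text).replace(/[^0-9.]/g, '');
  return digits ? parseFloat(digits) : NaN;
}

// Pair each hotel name with its parsed price.
function pairHotels(names, prices) {
  return names.map(function (name, i) {
    return { name: name, price: parsePrice(prices[i]) };
  });
}

if (typeof phantom !== 'undefined') {
  var casper = require('casper').create();
  var listSelector = '.hotel-list'; // hypothetical selector

  casper.start('https://example.com/hotels?city=las-vegas'); // hypothetical URL

  // Wait for the AJAX-loaded list; give up after 10 seconds.
  casper.waitForSelector(listSelector, function () {
    this.echo('Hotel list loaded');
  }, function () {
    this.echo('Timed out waiting for ' + listSelector).exit(1);
  }, 10000);

  // clickLabel() clicks the <a> element whose text is exactly this label.
  casper.then(function () {
    this.clickLabel('Highest rating', 'a');
  });

  // Give the re-sorted list a moment to render (a fixed delay is a simplification).
  casper.wait(1000, function () {
    var names = this.getElementsInfo('.hotel-name').map(function (e) { return e.text; });
    var prices = this.getElementsInfo('.hotel-price').map(function (e) { return e.text; });
    require('utils').dump(pairHotels(names, prices));
  });

  casper.run(function () {
    this.exit();
  });
}
```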
Scrape and Capture Multiple Pages
6 Lectures 34:16

In this project we're going to see how to scrape and capture multiple pages. We'll look at a BestBuy product page and see how we can click the Reviews tab, take a screenshot of the page, and then click the Next Page link to take screenshots of the next 3 pages of reviews. We'll use a counter variable to keep track of which page we're on and introduce new methods such as waitFor(), capture() and thenClick().

Preview 00:36

Setting up the project based on the vegas.js example.

Project Setup
05:25

Complete the setup. Edit our functions to use Array.prototype.map(). Add in our selectors to grab the ratings and dates and print these to our console.

Scrape Product Reviews - Part 1
06:23

Set up the project for navigating across multiple pages of BestBuy reviews. The script is based on the Google pagination example from the CasperJS sample docs.

Scrape Product Reviews - Part 2
08:24

In this video we're going to see how you can click links to load new pages and take a screenshot of each page. The same methods work for scraping multiple pages and writing the results you want to a file. We'll start on page 1, take a screenshot, click the Next button, take another screenshot, and repeat for a total of 4 pages.

Scrape Product Reviews - Part 3
09:30
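The page-by-page capture loop described above can be sketched in the style of the CasperJS pagination sample: capture the page, click Next with thenClick(), then use waitFor() until the review text changes before capturing again. The URL and selectors are hypothetical, and comparing review text is one assumed way to detect that the next page has loaded. The Casper steps only run when PhantomJS's phantom global exists.

```javascript
var MAX_PAGES = 4;

// File name for the screenshot of a given page number.
function screenshotName(page) {
  return 'reviews-page-' + page + '.png';
}

// Keep going only while under the page limit and a Next link exists.
function shouldContinue(page, maxPages, hasNext) {
  return page < maxPages && hasNext;
}

if (typeof phantom !== 'undefined') {
  var casper = require('casper').create({ viewportSize: { width: 1280, height: 1024 } });
  var nextLink = 'a.next-page';  // hypothetical selector
  var reviews = '.review-item';  // hypothetical selector
  var currentPage = 1;           // counter for the page being captured

  var processPage = function () {
    this.capture(screenshotName(currentPage));
    if (!shouldContinue(currentPage, MAX_PAGES, this.exists(nextLink))) {
      return; // no new steps are queued, so run() will finish
    }
    // Remember the current reviews' text to detect when the next page has loaded.
    var before = this.fetchText(reviews);
    this.thenClick(nextLink).then(function () {
      this.waitFor(function () {
        return this.fetchText(reviews) !== before;
      }, function () {
        currentPage++;
        processPage.call(this);
      }, function () {
        this.echo('Next page did not load; stopping.');
      });
    });
  };

  casper.start('https://example.com/product/reviews', function () {
    this.waitForSelector(reviews, processPage);
  });

  casper.run(function () {
    this.echo('Last page captured: ' + currentPage).exit();
  });
}
```

With MAX_PAGES set to 4, the loop captures pages 1 through 4 and stops, matching the four screenshots described in the lecture.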

Capture only the ratings div to create our screenshots.

Scrape Product Reviews - Part 4
03:58
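Capturing only the ratings container, as described above, can be done with captureSelector(), which clips the screenshot to one element instead of saving the whole page. The URL and the .ratings-summary selector are hypothetical, and the zero-padded file names are just one convenient choice. The Casper steps only run when PhantomJS's phantom global exists.

```javascript
// Hypothetical selector for the ratings container.
var RATINGS_SELECTOR = '.ratings-summary';

// Zero-padded file name so the screenshots sort in page order.
function ratingsShotName(page) {
  return 'ratings-' + (page < 10 ? '0' : '') + page + '.png';
}

if (typeof phantom !== 'undefined') {
  var casper = require('casper').create();

  casper.start('https://example.com/product/reviews', function () {
    this.waitForSelector(RATINGS_SELECTOR, function () {
      // captureSelector() clips the image to the matched element's area,
      // unlike capture(), which saves the whole page by default.
      this.captureSelector(ratingsShotName(1), RATINGS_SELECTOR);
    });
  });

  casper.run(function () {
    this.exit();
  });
}
```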
Log In and Search
3 Lectures 12:20

In this final example project we're going to see how you can log in to Twitter as an authenticated user, then submit a search query and capture the results. We'll also look at how to send and receive events using the on() and emit() methods.

Preview 00:25

Creating the scaffolding for our script.

Twitter Log In & Search
06:05

Set up the on() and emit() calls. Run our script and take a screenshot of our specified div.

Twitter Log In & Search - Part 2
05:50
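The log-in, search and event flow described above might be sketched as follows. The login form selector, field names, results selector and search URL pattern are all hypothetical, and the query is opened as a search URL rather than typed into the search box, which is a simplification. The Casper steps only run when PhantomJS's phantom global exists, so the URL helper can be tested in Node.

```javascript
// Build a search URL for a query. The twitter.com/search?q= pattern is an assumption.
function buildSearchUrl(query) {
  return 'https://twitter.com/search?q=' + encodeURIComponent(query);
}

if (typeof phantom !== 'undefined') {
  var casper = require('casper').create();
  // Credentials from the command line: casperjs script.js --user=NAME --pass=SECRET
  var user = casper.cli.get('user');
  var pass = casper.cli.get('pass');
  var results = '.search-results'; // hypothetical results container

  // on() registers a listener for a custom event; emit() below fires it.
  casper.on('search.captured', function (query, file) {
    this.echo('Saved results for "' + query + '" to ' + file);
  });

  casper.start('https://twitter.com/login', function () {
    // Hypothetical form selector and field names; `true` submits the form.
    this.fill('form.signin', {
      'session[username_or_email]': user,
      'session[password]': pass
    }, true);
  });

  casper.thenOpen(buildSearchUrl('casperjs'), function () {
    this.waitForSelector(results, function () {
      this.captureSelector('search-results.png', results);
      this.emit('search.captured', 'casperjs', 'search-results.png');
    });
  });

  casper.run(function () {
    this.exit();
  });
}
```

Reading credentials from casper.cli keeps them out of the script file itself.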
Conclusion
2 Lectures 05:01

In this final video I want to go over some topics that may come in handy as you create your own scripts. The example files are included in the resources folder, and I'll go over each of them with you.

Extras and Tips
04:21

Thank You
00:40
About the Instructor
Patrick Schroeder
4.3 Average rating
4,331 Reviews
71,394 Students
8 Courses
Software Developer

Patrick Schroeder is a self-taught full stack JavaScript developer. He enjoys working with Angular, Node.js, MongoDB, React.js, Firebase, and anything else JavaScript related. Patrick is passionate about teaching JavaScript. He loves to help others understand difficult concepts by creating clear presentations that gradually build to full comprehension of a given topic. He is very interested in furthering his knowledge of IoT and wearable products with the intention of teaching cutting edge technologies and collaborating to bring new products to life.