Web Scraping in Nodejs & JavaScript

Name: Web Scraping in Nodejs & JavaScript
Rating: 4.5 (856 reviews)

Learn web scraping in Nodejs & JavaScript through example projects with real websites! Craigslist, IMDb, Airbnb and more!

Created by Stefan Hyltoft

Last updated 2/2024

English

English [Auto]

What you'll learn

Be able to scrape jobs from a page on Craigslist
Learn how to use Request
Learn how to use NightmareJS
Learn how to use Puppeteer
Learn how to scrape elements without any identifiable classes or IDs
Learn how to save scraping data to CSV
Learn how to save scraping data to MongoDB
Learn how to scrape Facebook using only Request!
Learn how you can reverse engineer sites and find hidden APIs!
Learn different technologies used for scraping, and when it's best to use them
Learn how to scrape sites using authentication
Learn how to scrape HTML tables using Request/Cheerio

Course content

27 sections • 159 lectures • 11h 42m total length

Software to web scrape in JavaScript (1:56)
Set up your web scraping workflow by installing Visual Studio Code and NodeJS, then use npm or yarn to install packages and run your scripts.
(optional) Note about deprecation of Request/Request-Promise (5:21)
Feel free to watch this later in the course if you wonder what the deprecation of the request/request-promise packages means for you.

Intro to section (0:37)
Using Chrome Developer Tools (3:12)
Use Chrome Developer Tools to inspect HTML elements, identify tags and attributes, and extract text and href values for web scraping.
Selecting our element (3:48)
Building our first scraper! (8:33)
Set up a server-side web scraper with Node.js and npm: install request, request-promise, and cheerio, fetch a page, save the HTML, and extract text with Cheerio.
Selecting multiple elements (4:52)
Select multiple h2 elements with jQuery in the browser and with Cheerio in NodeJS, then loop with .each to print text. Save results to MongoDB or CSV.
Selecting using CSS ID (2:47)
Selecting using CSS classes (3:09)
Selecting using HTML attributes (2:24)
Learn to select elements using HTML attributes in jQuery, including targeting a specific attribute value like data customer and selecting all elements with the attribute. Apply this to Craigslist.
You're on your way to becoming a scraping ninja! (0:35)

Intro to section (1:01)
Structure of an HTML table (2:06)
Learn the basic structure of HTML tables, so you can better understand how to scrape data from them.
Data Structure in JavaScript (0:58)
See the end goal: the data structure of the data scraped from the HTML table using Request/Cheerio.
Creating selector in Chrome Tools (3:39)
Learn how to easily copy a selector in Chrome tools, so you can select the data you need from the HTML table with jQuery.
Scraping all table cells in Chrome Tools (4:16)
Scraping data in Nodejs with Cheerio/Request (6:24)
Learn to scrape table data in Node.js using Cheerio and request-promise, set up with npm or yarn, write an async main function, and expose scraped results via an API.
Scraping Company Names in Nodejs (7:00)
Learn to extract company names from a table in Node.js by iterating rows and mapping the first data cell to the company, the second to the contact, and the third to the country.
Scraping all table columns (3:52)
Scrape all table columns in NodeJS by extracting company, contact, and country into a scraped rows array, then deploy a periodic scraper saving data to MongoDB or CSV.
BONUS - dynamic table headers when scraping tables (7:06)
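
The row-to-object mapping described in this section (company, contact, country, then dynamic headers) can be sketched without any scraping library. This is a minimal sketch, assuming the cell text has already been pulled out of the table into arrays, for example with Cheerio; `rowsToObjects` and the sample data are hypothetical and not taken from the course.

```javascript
// Hypothetical helper: combine a header row and data rows into objects.
// Assumes every row is an array of cell strings already extracted from the table.
function rowsToObjects(headers, rows) {
  // Use the lower-cased header text as the object key, e.g. "Company" -> "company".
  const keys = headers.map((header) => header.trim().toLowerCase());
  return rows.map((cells) =>
    // Missing cells become empty strings instead of undefined.
    Object.fromEntries(keys.map((key, i) => [key, (cells[i] ?? "").trim()]))
  );
}

// Invented sample data standing in for a scraped table.
const headers = ["Company", "Contact", "Country"];
const rows = [
  ["Example GmbH", "Jane Doe", "Germany"],
  ["Sample Ltd", " John Roe ", "UK"],
];

const scrapedRows = rowsToObjects(headers, rows);
console.log(scrapedRows);
```

Because the keys come from the header row, the same function keeps working if the table gains or reorders columns.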

Project Intro (1:32)
Project Initializing & Package Import (2:00)
Initiate the Praxis Scraper project to collect all job titles, URLs, and descriptions from every page, using cheerio for selectors and axios for HTTP requests.
Requesting HTML using Axios library (2:29)
CSS Selectors + jQuery Injection (3:29)
Learn how to extract data from a site using CSS selectors and jQuery injection: identify job title elements, inspect the HTML, inject jQuery, and print titles one by one in Node.js.
Scraping Job Titles in Nodejs (2:13)
Learn to print job titles one by one using a Cheerio-based each loop in Node.js, extract text, and prepare a separate job object with the title and the URL.
CSS Selector for Job URLs (3:49)
Extracting into Data Object in Nodejs (4:19)
Learn to extract job titles and URLs from a webpage using Node.js and Cheerio, converting results into objects with map and get, and extend to scrape all pages.
Scraping All Pages (2:59)
Scraping Job Descriptions (4:24)
Scrape job descriptions by fetching HTML from job pages using Axios, parse with Cheerio, and assemble a unified data object with titles, URLs, and descriptions.
Putting Job Descriptions Into Data Objects (3:34)
Save each job description to job.description and map to return all jobs with descriptions using Promise.all; avoid getting blocked by scraping less aggressively, as shown in the next section.
Avoid Getting Banned with Sequential Requests (4:05)
Learn how to avoid getting banned while web scraping by using a safe mock proxy, comparing sequential requests to parallel ones, and implementing a for loop that waits for each promise.
Another Trick to Avoid Getting Banned (3:45)
Learn to scrape static websites with pagination in Node.js and JavaScript by inserting sleep between requests to mimic human browsing and avoid bans, with Puppeteer for dynamic pages.
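
The difference between parallel and sequential requests, and the sleep between requests, can be sketched as follows. This is a minimal sketch rather than the course's code: `scrapeSequentially`, `scrapeInParallel`, and the injected `fetchPage` function are hypothetical names, and `fetchPage` is passed in so the example runs without network access (in a real scraper it might wrap an Axios call).

```javascript
// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Fetch pages one at a time, pausing after each request.
// `fetchPage` is any async function that takes a URL and returns its HTML.
async function scrapeSequentially(urls, fetchPage, delayMs = 1000) {
  const results = [];
  for (const url of urls) {
    // Await inside the loop so only one request is in flight at a time.
    results.push(await fetchPage(url));
    await sleep(delayMs);
  }
  return results;
}

// Parallel alternative: every request starts immediately.
// Faster, but far more likely to trigger rate limits or bans.
function scrapeInParallel(urls, fetchPage) {
  return Promise.all(urls.map((url) => fetchPage(url)));
}
```

Both functions return results in the same order as `urls`; only the request timing differs.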

Intro to project (1:16)
Master Puppeteer to scrape Craigslist San Francisco Bay Area jobs, extracting descriptions and compensation, then save to MongoDB via mLab, with scraping limits.
Why are we using Puppeteer instead of Nodejs Request? (1:29)
Switch from Node.js Request to Puppeteer to bypass blocking when scraping Craigslist, using an automated Chromium browser, while limiting requests to prevent IP bans.
Initialising project (1:10)
Create a Craigslist web scraper project folder, navigate into it, and initialize with npm init; install puppeteer and cheerio, noting that puppeteer downloads a full Chromium browser.
Opening a URL with Puppeteer (3:32)
In this lecture we'll learn how to open any given URL with Puppeteer and the Chromium browser.
What data are we scraping? (1:52)
Identify the specific data to scrape from Craigslist job listings, including posted date, title, neighborhood, job description content, compensation, and the job URL; handle missing fields gracefully.
Data Structure (3:25)
Job Title CSS Selector (4:48)
Test the job title CSS selector in Chrome developer tools to identify the element with class 'result title', then iterate and extract text for scraping in NodeJS.
Scraping job title using Cheerio (3:38)
Write Node.js scraping code using Puppeteer and Cheerio to fetch page HTML and extract job titles, with plans to capture posted date and job description URL.
Scraping description URL (6:42)
Learn to extract job description URLs from listings by selecting link elements and using the href attribute, then structure data as objects with title and URL in NodeJS for MongoDB.
Creating array of scraping objects (5:47)
Create an array of scraping objects using a single map loop to extract title and URL, test in the browser console, then run and validate results in Node.js.
Scraping job post date (5:45)
Scraping Neighborhood data (4:01)
Learn to extract neighborhood data from listings and clean it by trimming whitespace and replacing noise with JavaScript functions, preparing clean data for subsequent scraping steps.
Scraping List of Pages with Puppeteer (6:52)
Build a Puppeteer workflow to loop through listing URLs, visit pages serially, and scrape job descriptions, with a main function and rate limiting to prevent blocking.
Limiting Scraping Requests per Second (3:37)
Scraping job descriptions from different pages (3:36)
Scraping compensation from job listings (4:31)
mLab is now MongoDB Atlas (0:12)
Setting up MongoDB database with mLab (3:48)
Setting up a MongoDB database is fast, easy and free with mLab!
Connecting to MongoDB database with Mongoose (3:53)
Creating Listing mongoose schema (2:58)
Saving listing data to MongoDB (4:22)
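
The neighborhood clean-up step mentioned in this section (trimming whitespace and replacing noise) might look roughly like this. This is a sketch under assumptions: `cleanNeighborhood` is a hypothetical helper, and the raw value is assumed to arrive wrapped in parentheses and padded with whitespace; the actual markup may differ.

```javascript
// Hypothetical clean-up for a neighborhood string as it might come out of .text().
// Assumes the raw value looks like "  (SOMA / south beach)\n".
function cleanNeighborhood(raw) {
  return raw
    .trim() // drop surrounding whitespace and line breaks
    .replace(/^\(/, "") // drop a leading "("
    .replace(/\)$/, "") // drop a trailing ")"
    .trim(); // drop any whitespace that was inside the parentheses
}

const hood = cleanNeighborhood("  (SOMA / south beach)\n");
console.log(hood);
```

Keeping the clean-up in its own small function makes it easy to test separately from the code that fetches the page.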

Help! I'm blocked! (0:32)
What can you do if you're blocked? (2:00)
Learn how to handle blocks in web scraping with Cloud9 or Glitch, proxies in Node.js requests, and preventive scraper techniques to avoid bans.
Scraping APIs (0:21)
Using a proxy in Request (2:11)
Configure a proxy in Node.js Request by setting defaults with proxy, explore free and paid proxies, diagnose common errors like host unreachable, and avoid blocking with throttled scraping.

Initializing project and adding packages (1:36)
Creating tests folder and setting up test script (1:03)
Create a __tests__ folder using the Jest convention and add a test script in package.json to run with watch mode, enabling live test updates while editing.
Writing our first simple test (3:37)
Master test-driven development in Node.js by writing tests first for a parser used in web scraping, then implementing code until the tests pass, using yarn test.
Making our first simple test pass! (0:44)
Getting HTML from the website for our tests (4:34)
Reading HTML file for our tests (2:28)
Learn to read an HTML file for tests and separate parsing logic from data extraction using test-driven development with an HTML getter and a parser.
Writing out our tests (8:24)
Getting title test to pass (3:38)
Build a parser that uses Cheerio and jQuery to extract listing titles from the result info element and return a listings array to pass tests.
Making URL test pass! (1:10)
Extract a page URL by selecting the title element, reading its text, and retrieving the href attribute in a Node.js and JavaScript web scraping workflow.
Making hood test pass! (2:24)
Locate the neighborhood element inside the result info container, extract its text with find and text, trim whitespace, and fix the tests so all tests pass.
Making the final test for datePosted pass! (3:16)
End notes + refactoring (4:00)

Requirements

Basic HTML
Basic jQuery
Basic Nodejs

Description

In this course you will learn how to scrape websites, with practical examples on real websites using JavaScript, Nodejs Request, Cheerio, NightmareJs and Puppeteer. You will be using modern JavaScript async/await syntax (standardized in ES2017).

You will learn how to scrape a Craigslist website for software engineering jobs, using Nodejs Request and Cheerio.

You will then learn how to scrape more advanced websites that require JavaScript, such as IMDb and Airbnb, using NightmareJs and Puppeteer.

I'm also going to show you, with a practical real-life website, how you can avoid wasting time on creating a web scraper in the first place by reverse engineering websites and finding their hidden APIs!
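
Once a JSON endpoint has been spotted in the browser's Network tab, calling it directly can be as small as the sketch below. Everything specific here is hypothetical: the `https://example.com/api/listings` URL, the `{ results: [...] }` response shape, and the `fetchListingsPage` name. The `fetchImpl` parameter defaults to the global `fetch` available in Node 18 and later, and can be swapped for a stub.

```javascript
// Fetch one page of results from a (hypothetical) JSON endpoint.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function fetchListingsPage(page, fetchImpl = fetch) {
  const url = `https://example.com/api/listings?page=${page}`;
  const response = await fetchImpl(url, {
    headers: { Accept: "application/json" },
  });
  if (!response.ok) {
    // Surface blocks and server errors instead of parsing an error page.
    throw new Error(`Request failed with status ${response.status}`);
  }
  const body = await response.json();
  return body.results; // assumed response shape: { results: [...] }
}
```

When such an endpoint exists, the data arrives already structured, so no HTML parsing or CSS selectors are needed.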

Learn how to avoid being blocked from websites while developing your scraper, by building it in a test-driven way with mocked HTML rather than hitting the website every time you debug. You'll also learn what you can do if you're blocked, and your alternatives for getting your scraper up and running regardless!

You will also learn how to scrape on a server with a bad connection, or when your own connection is unreliable.
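
One common way to cope with a flaky connection is to retry failed requests with a growing delay. The sketch below shows that general pattern, not necessarily the approach taught in the course; `withRetries` and its options are hypothetical names.

```javascript
// Resolve after `ms` milliseconds.
const wait = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run an async `task`, retrying on failure with exponential backoff.
// Gives up and rethrows the last error after `retries` extra attempts.
async function withRetries(task, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await task();
    } catch (error) {
      lastError = error;
      if (attempt < retries) {
        // With the defaults: 500 ms, then 1 s, then 2 s.
        await wait(baseDelayMs * 2 ** attempt);
      }
    }
  }
  throw lastError;
}
```

A request would be wrapped as `withRetries(() => fetchPage(url))`, where `fetchPage` is whatever function performs the request.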

You'll even learn how to save your results to a CSV file and MongoDB!
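
Writing results to CSV needs little more than joining values and quoting the awkward ones. This is a minimal sketch using only Node's standard library; `toCsv`, `escapeCsvValue`, the sample listings, and the output file name are all hypothetical, and the course may well use a dedicated CSV package instead.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// Quote a value if it contains a comma, a double quote, or a line break,
// doubling any embedded quotes (the usual CSV convention).
function escapeCsvValue(value) {
  const text = String(value ?? "");
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// Build CSV text from an array of objects and an ordered list of columns.
function toCsv(rows, columns) {
  const header = columns.join(",");
  const lines = rows.map((row) =>
    columns.map((column) => escapeCsvValue(row[column])).join(",")
  );
  return [header, ...lines].join("\n") + "\n";
}

// Invented sample data standing in for scraped listings.
const listings = [
  { title: "Node.js developer", url: "https://example.org/jobs/1.html" },
  { title: 'Engineer, "senior"', url: "https://example.org/jobs/2.html" },
];

const csv = toCsv(listings, ["title", "url"]);
// Written to the temp directory here so the sketch does not clutter the project.
fs.writeFileSync(path.join(os.tmpdir(), "listings.csv"), csv);
console.log(csv);
```

Passing the column list explicitly keeps the column order stable even if some scraped objects are missing fields.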

How do you build a scraper that runs every hour (or at another interval), and deploy it to a cloud host like Heroku or Google Cloud? Let me show you, quick and easy!
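
Running a scraper on a fixed interval can be sketched with `setInterval`. This is one simple option under stated assumptions, not the course's deployment setup: `startSchedule` is a hypothetical helper, and `scrape` stands for whatever async function performs one full scraping run.

```javascript
// Run `scrape` once immediately, then again every `intervalMs` milliseconds.
// Returns a function that stops the schedule. The default interval is one hour.
function startSchedule(scrape, intervalMs = 60 * 60 * 1000) {
  const run = () => {
    Promise.resolve()
      .then(scrape)
      // Log failures so one bad run does not stop future runs.
      .catch((error) => console.error("Scrape failed:", error.message));
  };
  run();
  const timer = setInterval(run, intervalMs);
  return () => clearInterval(timer);
}

// Usage (hypothetical `scrapeAllListings` function):
// const stop = startSchedule(scrapeAllListings);
// ...later: stop();
```

Note that this sketch does not guard against overlapping runs; if one scrape can take longer than the interval, a flag or a queue would be needed.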

How do you scrape a site requiring passwords? I'm going to show you that too with a real website (Craigslist)!

How do you serve your scraping results in a REST API with Nodejs Express? And how can we build a React frontend that's showing the results? You'll learn that too, in the quickest and simplest way possible!

Plus, a section covering how to make a basic GraphQL API is included in the course.

As the cherry on top, I have a section containing a secret backdoor showing you how to scrape Facebook using only Request!

If you have issues regarding a site you're trying to scrape yourself, it's totally okay to reach out to me for some help. I'd be happy to point you in the right direction! Whatever issues my students are facing, I use that to expand on my course!

Who this course is for:

Anyone who wants to learn how to scrape web sites using Nodejs!

Course content

Required Software (2 lectures • 7min)

What you should ALWAYS check before even writing a web scraper! (1 lecture • 6min)

Intro to CSS selectors and tools we use for scraping (9 lectures • 30min)

Scraping HTML tables with Request/Cheerio (9 lectures • 36min)

Scraping a Craigslist-like site with Pagination using axios & nodejs (12 lectures • 39min)

Scraping software jobs on Craigslist using Puppeteer (21 lectures • 1hr 17min)

What to do if you're blocked? (4 lectures • 5min)

Building a web scraper the TDD way (12 lectures • 37min)

Exporting web scraping results to CSV (1 lecture • 8min)

Handling Network Problems (1 lecture • 5min)
