
Feel free to watch this later in the course if you wonder what the deprecation of the request/request-promise packages means for you.
Learn the basic structure of HTML tables, so you can better understand how to scrape data from them.
See our end goal: the data structure of the data we scrape from the HTML table using Request/Cheerio.
Learn how to easily copy a selector in Chrome DevTools, so you can select the data you need from the HTML table with jQuery.
Learn to scrape table data in Node.js using Cheerio and request-promise, set up with npm or yarn, write an async main function, and expose scraped results via an API.
Scrape all table columns in Node.js by extracting company, contact, and country into a scraped rows array, then deploy a periodic scraper saving data to MongoDB or CSV.
Initiate the Praxis Scraper project to collect all job titles, URLs, and descriptions from every page, using cheerio for selectors and axios for HTTP requests.
Learn how to extract data from a site using CSS selectors and jQuery injection: identify job title elements, inspect the HTML, inject jQuery, and print titles one by one in Node.js.
Learn to print job titles one by one using a Cheerio-based each loop in Node.js, extract text, and prepare a separate job object with the title and the URL.
Save each job description to job.description and map to return all jobs with descriptions using Promise.all; avoid being blocked by scraping less aggressively, as shown in the next section.
Switch from Node.js Request to Puppeteer to bypass blocking when scraping Craigslist, using an automated Chromium browser, while limiting requests to prevent IP bans.
In this lecture we'll learn how to open any given URL with Puppeteer and the Chromium browser.
Write Node.js scraping code using Puppeteer and Cheerio to fetch page HTML and extract job titles, with plans to capture posted date and job description URL.
Create an array of scraping objects using a single map loop to extract title and url, test in the browser console, then run and validate results in Node.js.
Setting up a MongoDB database is fast, easy and free with mLab!
Configure a proxy in Node.js request by setting defaults with proxy, explore free and paid proxies, diagnose common errors like host unreachable, and avoid blocking with throttled scraping.
Create a __tests__ folder using the jest convention and add a test script in package.json to run with watch mode, enabling live test updates while editing.
Locate the neighborhood element inside the result info container, extract its text with find and text, trim whitespace, and fix the tests so all tests pass.
Export scraper results by converting an array of objects to a CSV file with the objects-to-csv package and writing it to disk, then validate readability in Google Sheets.
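To make the CSV export step concrete, here is a minimal sketch of what converting an array of objects to CSV involves. The lecture uses a package for this; the hand-rolled `toCsv` and `escapeCsvValue` helpers below are stand-ins written with Node.js built-ins only, and the sample rows and output file name are made up for illustration.

```javascript
// Minimal sketch: turn an array of flat objects into CSV text with Node.js built-ins.
// This is a stand-in for a CSV package, not the package's own implementation.
const fs = require('fs');
const os = require('os');
const path = require('path');

// Quote a value if it contains a comma, double quote, or line break.
function escapeCsvValue(value) {
  const text = value === null || value === undefined ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// The header row comes from the keys of the first object; every row uses that key order.
function toCsv(rows) {
  if (rows.length === 0) return '';
  const headers = Object.keys(rows[0]);
  const lines = [headers.map(escapeCsvValue).join(',')];
  for (const row of rows) {
    lines.push(headers.map((key) => escapeCsvValue(row[key])).join(','));
  }
  return lines.join('\n') + '\n';
}

// Hypothetical sample data shaped like scraped table rows.
const scrapedRows = [
  { company: 'Acme, Inc.', contact: 'Jane Doe', country: 'Germany' },
  { company: 'Globex', contact: 'John "JJ" Smith', country: 'Mexico' },
];

const csv = toCsv(scrapedRows);

// Write to the system temp directory (hypothetical file name) so the sketch is safe to run.
const outputPath = path.join(os.tmpdir(), 'scraped-results.csv');
fs.writeFileSync(outputPath, csv);
console.log(`Wrote ${scrapedRows.length} rows to ${outputPath}`);
```

Quoting values that contain commas or quotes is what keeps the file readable when it is opened in a spreadsheet.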
In this course you will learn how to scrape websites, with practical examples on real websites using JavaScript with Node.js, Request, Cheerio, NightmareJs and Puppeteer. You will be using modern JavaScript async/await syntax (standardized in ES2017).
You will learn how to scrape Craigslist for software engineering jobs using Node.js, Request and Cheerio.
You will then learn how to scrape more advanced websites that require JavaScript, such as IMDb and Airbnb, using NightmareJs and Puppeteer.
I'm also going to show you, with a practical real-life website, how you can avoid wasting time on creating a web scraper in the first place by reverse engineering websites and finding their hidden APIs!
Learn how to avoid being blocked by websites while developing your scraper, by building it in a test-driven way with mocked HTML rather than hitting the website every time you debug it. You'll also learn what you can do if you are blocked, and your alternatives for getting your scraper up and running regardless!
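As a rough illustration of the mocked-HTML idea, the sketch below keeps parsing in a pure function and checks it against a saved HTML string, so no request is sent while debugging. Everything here is an assumption made for illustration: the markup, the `result-title` class, and the `parseJobs` helper are hypothetical, and a regular expression stands in for Cheerio only to keep the sketch free of dependencies.

```javascript
// Sketch: develop the parser against saved HTML instead of the live site.
// A regular expression replaces Cheerio here purely to avoid a dependency.
const { deepStrictEqual } = require('assert');

// Saved copy of a listing page (hypothetical markup), used instead of a live request.
const mockedHtml = `
  <ul>
    <li><a class="result-title" href="https://example.org/jobs/1">Backend Engineer</a></li>
    <li><a class="result-title" href="https://example.org/jobs/2">Frontend Developer</a></li>
  </ul>`;

// Pure function: HTML string in, array of { title, url } out. No network access.
function parseJobs(html) {
  const linkPattern = /<a class="result-title" href="([^"]+)">([^<]+)<\/a>/g;
  return Array.from(html.matchAll(linkPattern), (match) => ({
    title: match[2].trim(),
    url: match[1],
  }));
}

// The check runs against the fixture, so the real site is never contacted.
const jobs = parseJobs(mockedHtml);
deepStrictEqual(jobs, [
  { title: 'Backend Engineer', url: 'https://example.org/jobs/1' },
  { title: 'Frontend Developer', url: 'https://example.org/jobs/2' },
]);
console.log(`Parsed ${jobs.length} jobs from the mocked HTML`);
```

Because `parseJobs` only takes a string, the same function can later be fed the HTML returned by a real request without any changes.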
You will also learn how to scrape on a server with a bad connection, or when your own connection is bad.
You'll even learn how to save your results to a CSV file and MongoDB!
How do you build a scraper that scrapes every hour (or at another interval), and deploy it to a cloud host like Heroku or Google Cloud? Let me show you, quick and easy!
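One simple way to picture the interval part is a timer that calls the scraper repeatedly. The sketch below uses Node's built-in `setInterval`; the `scrapeOnce` and `startPeriodicScraper` names are hypothetical placeholders, and this is only one possible approach, not necessarily the one used in the course or required by any particular cloud host.

```javascript
// Sketch: run a scrape function on a fixed interval with setInterval.
// `scrapeOnce` is a hypothetical placeholder for the real scraping logic.
async function scrapeOnce() {
  // Real code would request the page, parse it, and save the results here.
  return { scrapedAt: new Date().toISOString() };
}

// Starts the schedule and returns a function that stops it.
function startPeriodicScraper(scrapeFn, intervalMs) {
  let running = false;

  const run = async () => {
    if (running) return; // skip this tick if the previous scrape is still in progress
    running = true;
    try {
      await scrapeFn();
    } catch (error) {
      // One failed scrape should not stop the schedule.
      console.error('Scrape failed:', error.message);
    } finally {
      running = false;
    }
  };

  run(); // scrape once immediately, then again on every interval
  const timer = setInterval(run, intervalMs);
  return () => clearInterval(timer);
}

const ONE_HOUR_MS = 60 * 60 * 1000;

// Example usage (commented out so the sketch does not keep the process alive):
// const stop = startPeriodicScraper(scrapeOnce, ONE_HOUR_MS);
// stop(); // call later to cancel the schedule
```

The `running` flag is a small design choice: if one scrape takes longer than the interval, the next tick is skipped rather than starting a second, overlapping scrape.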
How do you scrape a site requiring passwords? I'm going to show you that too with a real website (Craigslist)!
How do you serve your scraping results in a REST API with Node.js and Express? And how can we build a React frontend that shows the results? You'll learn that too, in the quickest and simplest way possible!
Plus, a section covering how to make a basic GraphQL API is included in the course.
As the cherry on top, I have a section containing a secret backdoor showing you how to scrape Facebook using only Request!
If you have issues regarding a site you're trying to scrape yourself, it's totally okay to reach out to me for some help. I'd be happy to point you in the right direction! Whatever issues my students are facing, I use that to expand on my course!