
This course includes our updated coding exercises so you can practice your skills as you learn.
Welcome. This course will teach you practical ways of capturing data from the internet.
My name is Mykhailo Kushnir, and I'm currently working as an ML engineer in Lviv, Ukraine.
I need data for both my work and my pet projects. I’m sure most of you are here for similar reasons.
Like me, you probably don't want to spend all the time in the world on it. Because of that, I will try to keep the tutorials as short and condensed as possible.
Throughout this course, you'll learn many ways to scrape data, how to store and version-control data efficiently, how to use Selenium for data capturing, and much more.
Finally, a small disclaimer: Udemy typically asks you to rate a course before you have gone through enough of it to form an opinion. If that happens, feel free to postpone the rating until you know whether this course gave you enough value for the money. Also, if you run into any obstacles while learning, please let me know, and I'll see how I can help. Otherwise, enjoy the course!
Hi, everyone. In this video, I'll explain how to get the most out of this course.
First of all, I assume I know your problem. You either want to get data for your own pet project, or you're looking for a side-gig skill, which scraping is.
And you want it now.
I've created the course that I would want to watch myself, and I don't really like long-running material. I'll supply you with reading materials, links, and scripts that will help you immediately, but you will still have to google some things. On my end, I promise to pack the content with information and useful tools.
Most of the code is hosted in this GitHub repository. You'll find the link to it after the video. By the way, that will be a common pattern: whenever you see an external resource on the screen, you can find a link to it in the reading materials after the video.
For the best results, follow the course in 3 steps:
Watch the videos
Reproduce the code from them
Extend this code to solve some real use cases. I'll give you some ideas.
If something goes wrong, reach out to our Slack community for help.
Now you’re fully ready for your first tutorial. It won’t be a simple one, but you’ll make it. Good luck!
Hey everyone, in this section I'll introduce you to the course and give some tips on how to learn with the highest efficiency.
After the initial overview, we will learn how to set up a programming environment for web scraping. When you complete the video part, you'll find reading materials with links. Make sure you go through them, as they contain material worth studying.
For this initial setup, you will need Python, Docker, and your favourite IDE. I'd suggest VS Code.
First, you have to install Python. The best way to do it is to go to Python's official website and follow the instructions in the Downloads section.
Next, we have to install a virtual environment package and use it to create a new environment. You'll use that environment to install the requirements.txt of the various projects in this course.
The virtualenv package helps you avoid package version conflicts between projects, so it's definitely a useful tool.
If everything was done correctly, you will be able to create a virtual environment and install the packages listed in the requirements.txt file. Make sure you've pulled the source code for this course from GitHub.
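The environment setup described above can be sketched as a small Python helper. This is only a sketch under assumptions: it assumes the virtualenv package is already installed (for example with pip install virtualenv), and the .venv directory name and the helper names venv_pip, setup_commands and run_setup are illustrative, not taken from the course repository.

```python
import os
import subprocess
import sys
from pathlib import Path


def venv_pip(env_dir: Path) -> Path:
    """Return the path of the pip executable inside a virtual environment."""
    # Windows keeps executables in Scripts\, other systems use bin/.
    if os.name == "nt":
        return env_dir / "Scripts" / "pip.exe"
    return env_dir / "bin" / "pip"


def setup_commands(env_dir: Path, requirements: Path) -> list[list[str]]:
    """Build the two commands: create the environment, then install into it."""
    return [
        [sys.executable, "-m", "virtualenv", str(env_dir)],
        [str(venv_pip(env_dir)), "install", "-r", str(requirements)],
    ]


def run_setup(env_dir: Path = Path(".venv"),
              requirements: Path = Path("requirements.txt")) -> None:
    """Run the commands in order and stop at the first failure."""
    for command in setup_commands(env_dir, requirements):
        subprocess.run(command, check=True)
```

Calling run_setup() from the folder that contains requirements.txt creates the environment and installs the requirements into it. Using the pip that lives inside the environment avoids having to activate the environment first.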
Go to the Docker install page to see how to set it up on your specific operating system.
Once Docker is installed, it is enough for a start to pull the selenium standalone-chrome image for this course.
Then start it with the run command.
Here is a useful link for VS Code installation as well.
Once again, if you face issues with this initial setup, make sure you've looked at the reading materials after the video section. You can also go to our Slack community to get help from other students.
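The pull and run steps can also be scripted from Python. The following is a minimal sketch, assuming the selenium/standalone-chrome image from Docker Hub, the default Selenium port 4444, and a larger shared-memory size for Chrome; the function names are illustrative and the exact flags used in the course videos may differ.

```python
import shutil
import subprocess

IMAGE = "selenium/standalone-chrome"  # assumed image name on Docker Hub


def pull_command(image: str = IMAGE) -> list[str]:
    """Command that downloads the image."""
    return ["docker", "pull", image]


def run_command(image: str = IMAGE, host_port: int = 4444) -> list[str]:
    """Command that starts the container in the background."""
    return [
        "docker", "run",
        "-d",                       # detached: keep running in the background
        "-p", f"{host_port}:4444",  # expose the Selenium server on the host
        "--shm-size=2g",            # extra shared memory for the browser
        image,
    ]


def start_selenium_container() -> None:
    """Pull the image and start the container, failing loudly on errors."""
    if shutil.which("docker") is None:
        raise RuntimeError("Docker is not installed or not on PATH")
    subprocess.run(pull_command(), check=True)
    subprocess.run(run_command(), check=True)
```

After start_selenium_container() succeeds, the Selenium server should be reachable on localhost at the chosen host port.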
Tracking and reproducing HTTP requests is the primary and most reliable method of getting data from the Internet. You should always try it first, either by replicating your browser's requests or by requesting API access from site owners. In this tutorial, I'll show you how to find the necessary requests and replicate them in the blink of an eye with Postman.
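Once a request has been found in the browser or in Postman, replaying it from code looks roughly like this. It is a sketch under assumptions: the endpoint, query parameters and headers are hypothetical placeholders for whatever you copy from the browser's Network tab, and the standard library's urllib is used here only to keep the example dependency-free (the course itself works with the requests library).

```python
import json
import urllib.parse
import urllib.request

# Hypothetical values, as you might copy them from the browser or Postman.
BASE_URL = "https://example.com/api/items"
PARAMS = {"page": 2, "per_page": 50}
HEADERS = {
    "Accept": "application/json",
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64)",
}


def build_request(base_url: str, params: dict, headers: dict) -> urllib.request.Request:
    """Recreate the captured GET request: same URL, query string and headers."""
    query = urllib.parse.urlencode(params)
    return urllib.request.Request(f"{base_url}?{query}", headers=headers, method="GET")


def fetch_json(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode a JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; fetch_json(request) does.
request = build_request(BASE_URL, PARAMS, HEADERS)
```

Keeping the request construction separate from the network call makes it easy to compare the rebuilt URL and headers with the original request before sending anything.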
Here is the list of topics we will touch on. First, I'll explain the difference between the select method and the find/find_all methods and why you should prefer the first one more often. Next, we'll look at use cases where you'd access parent and child elements through specific properties, and we'll review how to get to the text content of tags.
First, let me explain the difference between the select and the find_all methods. Both of them aim to capture all elements that match some predefined criteria. For example, here's how you can modify the code from the previous tutorial to use the find_all method and still reach the same result. As you see, the find_all method requires you to define selectors in a pythonic way, while the select method accepts CSS selectors, which is a more natural, JavaScript-ish way. That's why I prefer the select method.
The find method does the same thing as find_all, but returns only the first matching element, if one exists. It's not hard to conclude that the same thing can be achieved through the select method and some Python magic.
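To make the comparison above concrete, here is a minimal sketch. The HTML snippet, its ul#books and li.book structure, and all variable names are made up for illustration; this is not the code from the previous tutorial.

```python
from bs4 import BeautifulSoup

HTML = """
<ul id="books">
  <li class="book"><a href="/b/1">First</a></li>
  <li class="book"><a href="/b/2">Second</a></li>
  <li class="note">Not a book</li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# select: one CSS selector describes the whole path to the links.
links_select = soup.select("ul#books li.book a")

# find / find_all: the same path expressed with Python arguments.
book_items = soup.find("ul", id="books").find_all("li", class_="book")
links_find_all = [item.find("a") for item in book_items]

titles_select = [a.get_text(strip=True) for a in links_select]
titles_find_all = [a.get_text(strip=True) for a in links_find_all]

# find returns the first match or None; select needs a little extra Python.
first_via_find = soup.find("li", class_="book")
matches = soup.select("li.book")
first_via_select = matches[0] if matches else None
```

Both approaches collect the same two titles. Beautiful Soup also offers select_one, which returns the first match of a CSS selector directly.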
There are two major use cases I can think of. The first is when your data is packaged into a local HTML or XML file. In that case, you can load it into BS4 and use its tools to read the markup's content.
The second is when you're trying to parse static websites. By static I mean sites that don't use JavaScript for rendering. In other words, if your data is present in the page right away, Beautiful Soup can be a useful tool. Let us look at an example.
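A minimal sketch of the first use case, reading a local HTML file, might look like this. The table markup, the file name page.html and the parse_prices helper are invented for the example; it also shows the parent property and get_text in action.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

PAGE = """<html><body>
<table id="prices">
  <tr><td class="name">Apple</td><td class="price">1.20</td></tr>
  <tr><td class="name">Pear</td><td class="price">0.95</td></tr>
</table>
</body></html>"""


def parse_prices(html_path: Path) -> dict[str, float]:
    """Read a local HTML file and map each product name to its price."""
    soup = BeautifulSoup(html_path.read_text(encoding="utf-8"), "html.parser")
    prices = {}
    for name_cell in soup.select("table#prices td.name"):
        row = name_cell.parent                   # the enclosing <tr>
        price_cell = row.select_one("td.price")  # sibling cell in the same row
        prices[name_cell.get_text(strip=True)] = float(price_cell.get_text(strip=True))
    return prices


# Write the sample page to a temporary file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "page.html"
    path.write_text(PAGE, encoding="utf-8")
    prices = parse_prices(path)
```

The same parse_prices logic works on the HTML of a static website once the page body has been downloaded, since Beautiful Soup only needs the markup as a string.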
Hi everyone, in this tutorial we will make scraping more dynamic and introduce the Selenium library. It is well known among programmers as a helpful tool for many automation tasks. It can help you simulate user behaviour and therefore get past certain traps on the way to capturing data.
This library helps you navigate a site with code that simulates a user's behaviour. You can do a lot with it, for example:
Interact with site
Click on elements
Drag-N-Drop simulations
Form filling
JavaScript execution on page
DOM search, and much more
In this tutorial I'll show you how to use the Selenium framework in the most visual way possible. The goal is to make you familiar with the tool and with the features we will use in the next lessons. For that purpose, I encourage you to use Jupyter Notebooks, as they let you run code step by step while all variables stay in memory. You should be prepared for this if you completed this course's installation tutorial.
This method is helpful for speeding up your regular scraping tasks. For example, let's say you want to scrape a site on a daily basis. Instead of logging in each time, you can use a custom scraping profile in your browser to keep a session up and running until the target site expires it.
Be cautious with this method, as some sites block automation scripts. I typically use such methods when I need to perform a single request to the site every once in a while. In later tutorials, I'll also show you how to make your script behave more like a human to decrease the chance of a ban.
Services like 2captcha.com can help you. They provide human help for solving captchas. For this particular tutorial you'll need to create an account there and preferably deposit at least $1 to reproduce the code I'm going to show. Otherwise, this exercise will remain purely theoretical for you.
In this tutorial, we'll learn how to parse data from JavaScript-rendered graphics.
In the first part, you'll see how to create a Python script that captures the reserves of 44 Binance crypto wallets.
Heroku Deployment. Part #2
With the vast amount of data available on the internet, it's no wonder that web scraping has become such a popular tool for extracting information. Whether you're looking to gather data for research purposes or collect information from a competitor's website, web scraping can be a valuable skill in your toolkit. And with this practical web scraping course, you'll learn everything you need to know to start extracting data from any website. So if you're ready to start learning web scraping, this is the course for you.
Right now, the "Practical Web Scraping Course" is an ongoing project, so it will contain the most recent ways to parse data and will be updated often. You'll also get answers to your questions within a short time. Here's the list of all the topics you will eventually learn in this course:
Tracking HTTP requests in practice
Basic scraping with BS4 and requests libraries
BS4 tools in detail
Efficient scraping with Selenium
Visual Intro to Selenium tools
Dealing with authentication and user sessions
Bypassing Captcha
Scraping dynamic websites
Selenium and pagination
Scraping HighCharts.JS
Use Heroku to host your spiders
Scrapy Introduction
Scrapy integration with DB
[Items below will be added in the next part of the course]
Hosting Scrapy spiders locally
Use schedulers to run Scrapy spiders locally
Ethical scraping tools
Avoid getting banned
Scraping images and PDFs
Real-time scraping
With this course you will be able to:
- Save time by learning modern methods of data scraping
- Get information about the most up-to-date scraping tools and techniques
- Avoid being scammed by others selling outdated courses
- Get your money's worth with a complete and comprehensive course