Yes, computers can see too. Want to know how? This course will not just show you how but equip you with skills to implement your own ideas. Let’s get started!
This course is a blend of text, videos, code examples, and assessments, which together make your learning journey all the more exciting and truly rewarding. Its sections form a sequential flow of concepts along a focused learning path, presented in a modular manner. This helps you learn a range of topics at your own speed and move toward your goal of building cool computer vision applications with OpenCV.
OpenCV is a cross-platform, free-to-use library that is primarily used for real-time computer vision and image processing. It is considered to be one of the best open source libraries that helps developers focus on constructing complete projects on image processing, motion detection, and image segmentation.
This course has been prepared through extensive research and careful curation. Each section builds on the skills already learned and takes you closer to mastery in developing computer vision applications using OpenCV. Every section is modular and can also be used as a standalone resource. The course has been designed to teach you OpenCV through projects and learning recipes, and to equip you with the skills to develop your own cool applications.
Starting with the installation of OpenCV on your system and the basics of image processing, we will swiftly move on to projects such as optical flow video analysis and text recognition in complex scenes, taking you through the commonly used computer vision techniques needed to build your own OpenCV projects from scratch. We will develop awesome projects that focus on different concepts of computer vision, such as image processing, motion detection, and image segmentation. By the end of this course, you will be familiar with the basics of OpenCV, such as matrix operations, filters, and histograms, as well as more advanced concepts, such as segmentation, machine learning, complex video analysis, and text recognition.
This course has been authored by some of the best in their fields:
Prateek Joshi is a Computer Vision researcher and published author. He has over eight years of experience in this field with a primary focus on content-based analysis and deep learning. His work in this field has resulted in multiple patents, tech demos, and research papers at major IEEE conferences. You can visit his blog.
David Millán Escrivá has more than 13 years of experience in IT, with more than nine years of experience in Computer Vision, computer graphics, and pattern recognition, working on different projects and start-ups, applying his knowledge of Computer Vision, optical character recognition, and augmented reality. He is the author of the DamilesBlog, where he publishes research articles and tutorials on OpenCV, Computer Vision in general, and optical character recognition algorithms.
Vinícius Godoy is a computer graphics university professor at PUCPR. He started programming with C++ 18 years ago and ventured into the field of computer gaming and computer graphics 10 years ago. He is currently working with medical imaging systems for his PhD thesis.
Robert Laganiere is a professor at the School of Electrical Engineering and Computer Science of the University of Ottawa, Canada. He is also a faculty member of the VIVA research lab and is the co-author of several scientific publications and patents in content-based video analysis, visual surveillance, object recognition, and 3D reconstruction. Since 2011, Robert has also been Chief Scientist at Cognivue Corp, a leader in embedded vision solutions.
Let's see how to get OpenCV up and running on various operating systems.
We are going to use CMake to configure and check all the required dependencies of our project. So, let's learn about the basic CMake configuration files and how to create a library. In this video, we will take a look at the CMakeLists.txt file and understand how to use it:
CMake has the ability to search for our dependencies and external libraries, giving us the ability to build complex projects that depend on external components by adding a few requirements. One of the most important dependencies is, of course, OpenCV. Let's learn how to add it to our projects. In this video, we will learn how to add the OpenCV dependency to all our projects:
Now that we know how to manage dependencies, let's take a look at a slightly more complex script. This video will show us a script that includes subfolders, libraries, and executables, all in only two files and a few lines. In this video, we will define two CMakeLists.txt files, write a new library in the CMakeLists.txt file in the UTILS folder, and use the new library from the main project:
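A minimal sketch of how such a pair of scripts might fit together. The target and file names here are illustrative assumptions, not the course's exact layout:

```cmake
# Top-level CMakeLists.txt
cmake_minimum_required(VERSION 3.0)
project(sample_project)

# Locate an installed OpenCV and expose its headers and libraries
find_package(OpenCV REQUIRED)
include_directories(${OpenCV_INCLUDE_DIRS})

# Build the helper library defined in the subfolder's own CMakeLists.txt
add_subdirectory(utils)

# Link the executable against both the helper library and OpenCV
add_executable(app main.cpp)
target_link_libraries(app utils ${OpenCV_LIBS})

# utils/CMakeLists.txt would contain just:
#   add_library(utils utils.cpp)
```

The top-level file finds OpenCV once; every target that links `${OpenCV_LIBS}` then picks up the correct libraries for the installed version.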
The most important structure in computer vision is, without a doubt, the image. An image in computer vision is a representation of the physical world captured with a digital device. In this video, we will learn about pixels and take a look at the matrix format and matrix storage:
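The idea of an image as a matrix can be sketched without OpenCV at all: a single-channel grayscale image is just a row-major array of intensity values, which is also how a single-channel cv::Mat lays out its data. A hypothetical minimal version:

```cpp
#include <vector>
#include <cstdint>

// A grayscale image stored row-major: pixel (x, y) lives at index y * width + x,
// mirroring how a single-channel cv::Mat stores its data.
struct GrayImage {
    int width, height;
    std::vector<uint8_t> data;  // height * width intensity values, 0..255

    GrayImage(int w, int h) : width(w), height(h), data(w * h, 0) {}

    uint8_t& at(int y, int x)       { return data[y * width + x]; }
    uint8_t  at(int y, int x) const { return data[y * width + x]; }
};
```

Multi-channel images simply interleave the channel values per pixel, which is why OpenCV uses types such as Vec3b for BGR pixel access.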
After the introduction to matrices, we are ready to start with the basics of OpenCV code. This video will teach us how to read and write images. In this video, we will take a look at the imread and imwrite functions, use the .cols and .rows attributes of a matrix, and use the imshow function to show the image:
We now know how to read and write images, but reading videos can be a bit tricky. This video introduces reading from a video and a camera with a simple example. In this video, we will take a look at the CommandLineParser class, use the .get function, and show the frames with the imshow function:
We have learned about the Mat and Vec3b classes, but we need to learn about other classes as well. In this video, we will learn about the most basic object types required in most projects:
In many applications, such as calibration or machine learning, when we are done with the calculations, we need to save the results in order to retrieve them in the next execution. Before we finish this section, we will explore the OpenCV functions for storing and reading our data. In this video, we will learn how to use the Mat constructor, how to use the eye function, and how to write to a file storage:
First, we need to prepare a CMake script file that enables us to compile our project, structure, and executable. In this video, we will generate a CMakeLists.txt file, find the OpenCV library and show a message about the OpenCV library version, and add the source files to link them to the OpenCV library:
The main graphical user interface can be used in our application to create simple buttons. In this video, we will add the required OpenCV headers, set up the instructions, and print the help message. Finally, we will show the input image and wait for a key press to finish our application:
A histogram is a statistical, graphical representation of a variable's distribution that allows us to understand the density estimation and probability distribution of the data. In this video, we will create three matrices to process each input image channel, calculate the histogram using the OpenCV calcHist function, and finally, show the histogram image with the imshow function:
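The core of what calcHist computes for one 8-bit channel can be sketched in plain C++ as counting values into equal-width bins. This is a conceptual sketch, not OpenCV's actual implementation:

```cpp
#include <vector>
#include <cstdint>

// Count 8-bit pixel values into `bins` equal-width bins -- the essence of
// what cv::calcHist does for a single channel with a uniform range of 0..255.
std::vector<int> histogram(const std::vector<uint8_t>& pixels, int bins = 256) {
    std::vector<int> hist(bins, 0);
    for (uint8_t v : pixels)
        ++hist[v * bins / 256];  // map value 0..255 onto bin 0..bins-1
    return hist;
}
```

With bins = 256 each intensity gets its own bin; fewer bins trade resolution for a smoother distribution estimate.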
Histogram equalization produces an image whose histogram has a uniform distribution of values. In this video, we will convert the input BGR image into YCrCb using the cvtColor function, split the YCrCb image into separate channel matrices, and then merge the resulting channels and convert the result back to the BGR format:
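The equalization step itself, applied to a single 8-bit channel such as Y, can be sketched as a remapping through the cumulative distribution function. This is the textbook formula rather than OpenCV's equalizeHist internals:

```cpp
#include <vector>
#include <cstdint>
#include <cstddef>
#include <algorithm>

// Histogram equalization of one 8-bit channel: remap every value through
// the cumulative distribution so the output histogram is roughly uniform.
std::vector<uint8_t> equalize(const std::vector<uint8_t>& channel) {
    int hist[256] = {0};
    for (uint8_t v : channel) ++hist[v];

    // Cumulative distribution function of the intensities
    int cdf[256], total = 0;
    for (int i = 0; i < 256; ++i) { total += hist[i]; cdf[i] = total; }

    // Smallest non-zero CDF value, used to stretch the output to 0..255
    int cdfMin = 0;
    for (int i = 0; i < 256; ++i)
        if (cdf[i] > 0) { cdfMin = cdf[i]; break; }

    int n = static_cast<int>(channel.size());
    std::vector<uint8_t> out(channel.size());
    for (std::size_t i = 0; i < channel.size(); ++i)
        out[i] = static_cast<uint8_t>(
            (cdf[channel[i]] - cdfMin) * 255 / std::max(1, n - cdfMin));
    return out;
}
```

Equalizing only the luminance channel of a YCrCb image, then merging, improves contrast without shifting the colors.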
Lomography is a photographic effect used in different mobile applications, such as Google Camera or Instagram. In this video, we will manipulate the red color with a curve transform, split our input image into channels using the split function, and finally, convert the flat image matrix result to an 8-bit image:
The cartoonize effect creates an image that looks like a cartoon. In this video, we will detect the most important edges of the image, multiply the edge-detection result by the color image, and merge the color and edge results:
We will now introduce you to the first step of any automated optical inspection (AOI) algorithm, that is, isolating different parts or objects in a scene. In this video, we will preprocess and segment the image:
We will now create our new application, which requires a few input parameters when the user executes it. In this video, we will enable user selection using the CommandLineParser class. We will also check whether the input image is loaded correctly:
Preprocessing is the first change we make to a new image before we start our work and extract the information we require from it. Normally, in the preprocessing step, we try to minimize image noise, lighting variations, or image deformations due to the camera lens. These steps minimize the errors when we try to detect objects or segment the image. In this video, we will learn how to remove noise, remove the background using the light pattern, and binarize the image:
Now, we will introduce you to two techniques used to segment our thresholded image: connected component labeling and contour detection.
With these two techniques, we will be able to extract each region of interest of our image where our target objects appear; in our case, a nut, a screw, and a ring. In this video, we will apply each algorithm to the binarized image, return an integer with the number of detected labels, and draw each detected contour:
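The labeling step can be sketched in plain C++ with a flood fill over a binary image. This illustrates the idea behind cv::connectedComponents rather than reproducing its (two-pass) implementation:

```cpp
#include <vector>
#include <stack>
#include <utility>

// Label the 4-connected components of a binary image (0 = background,
// 1 = foreground) with a flood fill. Returns the number of components;
// `labels` receives 0 for background and 1..n for each detected object.
int labelComponents(const std::vector<std::vector<int>>& img,
                    std::vector<std::vector<int>>& labels) {
    int h = img.size(), w = img[0].size(), next = 0;
    labels.assign(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            if (img[y][x] == 0 || labels[y][x] != 0) continue;
            ++next;  // unvisited foreground pixel: a new component starts here
            std::stack<std::pair<int, int>> s;
            s.push(std::make_pair(y, x));
            labels[y][x] = next;
            while (!s.empty()) {
                int cy = s.top().first, cx = s.top().second;
                s.pop();
                const int dy[] = {-1, 1, 0, 0}, dx[] = {0, 0, -1, 1};
                for (int k = 0; k < 4; ++k) {
                    int ny = cy + dy[k], nx = cx + dx[k];
                    if (ny >= 0 && ny < h && nx >= 0 && nx < w &&
                        img[ny][nx] == 1 && labels[ny][nx] == 0) {
                        labels[ny][nx] = next;
                        s.push(std::make_pair(ny, nx));
                    }
                }
            }
        }
    return next;
}
```

Each label then corresponds to one region of interest, such as a nut, a screw, or a ring, that can be cropped and classified separately.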
We will learn how to implement our own application that uses machine learning to classify objects on a slide tape. In this video, we will see the basic structure of computer vision applications that use machine learning. We will perform the preprocessing step, which involves removing lighting artifacts and noise, thresholding, blurring, and so on, and finally, extract the regions of the image and isolate each one as a unique object:
Continuing with the example from the previous section, we will be able to recognize different objects to send notifications to a robot or put each one in different boxes. In this video, we will create an SVM model and train our SVM model with the training feature vector. For each object detected, we will extract the features and predict with an SVM model:
Now, let's extract the features of each object. To understand the concept of a feature vector, we will extract very simple features, but they are enough to get good results. In other solutions, we could use more complex features, such as texture descriptors, contour descriptors, and so on. In this video, we will create a function that takes an image as the input and returns, as parameters, two vectors of left and top positions for each object detected in the image. We will create the output vector variable and the contours variable to be used in our findContours algorithm segmentation. Finally, we will show the detected objects in a window for user feedback.
In order to extract Haar features, we need to calculate the sum of the pixel values enclosed in many rectangular regions of the image. To make it scale invariant, we need to compute these areas at multiple scales (that is, for various rectangle sizes). If implemented naively, this would be a very computationally intensive process: we would have to iterate over all the pixels of each rectangle, reading the same pixels multiple times whenever they are contained in overlapping rectangles. If you want to build a system that can run in real time, you cannot spend so much time in computation. We need to find a way to avoid this huge redundancy in the area computation. In this video, we will see what integral images are and how to use them to avoid this redundancy:
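The trick can be sketched in a few lines of plain C++ (a conceptual version of cv::integral): precompute a running 2-D sum once, after which any rectangle sum costs four lookups regardless of its size.

```cpp
#include <vector>

// Build an integral image: s[y][x] holds the sum of all pixels above and to
// the left of (x-1, y-1) inclusive. The extra row/column of zeros keeps the
// rectangle lookups branch-free at the borders.
std::vector<std::vector<long long>>
integralImage(const std::vector<std::vector<int>>& img) {
    int h = img.size(), w = img[0].size();
    std::vector<std::vector<long long>> s(h + 1,
                                          std::vector<long long>(w + 1, 0));
    for (int y = 1; y <= h; ++y)
        for (int x = 1; x <= w; ++x)
            s[y][x] = img[y-1][x-1] + s[y-1][x] + s[y][x-1] - s[y-1][x-1];
    return s;
}

// Sum of the rectangle with top-left (x0, y0) and bottom-right (x1, y1),
// inclusive -- four lookups instead of iterating over every pixel.
long long rectSum(const std::vector<std::vector<long long>>& s,
                  int x0, int y0, int x1, int y1) {
    return s[y1+1][x1+1] - s[y0][x1+1] - s[y1+1][x0] + s[y0][x0];
}
```

This is exactly why Haar cascades can evaluate thousands of rectangle features per window in real time.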
OpenCV provides a nice face detection framework. We just need to load the cascade file and use it to detect the faces in an image. When we capture a video stream from the webcam, we can overlay funny masks on top of our faces. In this video, we will read input frames from the webcam, convert the image to grayscale, and extract the region of interest to overlay the mask:
Now that we understand how to detect faces, we can generalize this concept to detect different parts of the face. We will use an eye detector to overlay sunglasses in a live video. It's important to understand that the Viola-Jones framework can be applied to any object. The accuracy and robustness will depend on the uniqueness of the object. In this video, we will run the eye detector, adjust the size of the sunglasses, and overlay the sunglasses mask:
Background subtraction is very useful in video surveillance. Basically, the background subtraction technique performs really well in cases where we need to detect moving objects in a static scene. Now, how is this useful for video surveillance? The process of video surveillance involves dealing with a constant data flow. The data stream keeps coming in at all times, and we need to analyze it to identify any suspicious activities. Let's consider the example of a hotel lobby. All the walls and furniture have a fixed location. Now, if we build a background model, we can use it to identify suspicious activities in the lobby. We can take advantage of the fact that the background scene remains static (which happens to be true in this case). This helps us avoid any unnecessary computation overheads. In this video, we will build a model of the background that is used to detect background pixels. Finally, we will compute the difference between the background model and the current image:
We know that we cannot keep a single static background image to detect objects against. One way to deal with this is frame differencing, one of the simplest techniques we can use to see which parts of the video are moving. When we consider a live video stream, the difference between successive frames gives a lot of information. The concept is fairly straightforward: we just take the difference between successive frames and display it. In this video, we will calculate the difference between successive frames and display it. We will then take the frame differences and apply a bitwise AND operator, resize the frames, and convert them to grayscale:
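The per-pixel core of frame differencing can be sketched without OpenCV on flat grayscale arrays; it is the combination that cv::absdiff plus a binary threshold perform in this technique:

```cpp
#include <vector>
#include <cstdint>
#include <cstddef>
#include <cstdlib>

// Frame differencing on two grayscale frames stored as flat arrays:
// per-pixel absolute difference followed by a binary threshold.
// Output pixels are 255 where motion exceeds the threshold, 0 elsewhere.
std::vector<uint8_t> frameDiff(const std::vector<uint8_t>& prev,
                               const std::vector<uint8_t>& cur,
                               uint8_t thresh) {
    std::vector<uint8_t> motion(cur.size());
    for (std::size_t i = 0; i < cur.size(); ++i) {
        int d = std::abs(static_cast<int>(cur[i]) - static_cast<int>(prev[i]));
        motion[i] = d > thresh ? 255 : 0;  // white = moving pixel
    }
    return motion;
}
```

ANDing the masks of consecutive frame pairs, as described above, suppresses the "ghost" left at an object's old position.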
In this video, we will formulate and implement a Mixture of Gaussians. We will first identify whether the data contains multiple groups or subpopulations, and then we will represent each subpopulation using a Gaussian function:
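The underlying model can be sketched in one dimension: each subpopulation contributes a weighted Gaussian, and a sample is attributed to the component that most likely generated it. This is the statistical idea only, not OpenCV's BackgroundSubtractorMOG implementation:

```cpp
#include <cmath>
#include <vector>
#include <cstddef>

// 1-D Gaussian density N(x | mean, var).
double gaussian(double x, double mean, double var) {
    const double PI = 3.141592653589793;
    return std::exp(-(x - mean) * (x - mean) / (2.0 * var))
         / std::sqrt(2.0 * PI * var);
}

// Mixture of Gaussians: each subpopulation k contributes
// weight[k] * N(x | mean[k], var[k]) to the overall density.
double mixturePdf(double x, const std::vector<double>& weight,
                  const std::vector<double>& mean,
                  const std::vector<double>& var) {
    double p = 0.0;
    for (std::size_t k = 0; k < weight.size(); ++k)
        p += weight[k] * gaussian(x, mean[k], var[k]);
    return p;
}

// Which component most likely generated x? A MOG background subtractor
// uses a test like this to decide whether a pixel matches a background mode.
int bestComponent(double x, const std::vector<double>& weight,
                  const std::vector<double>& mean,
                  const std::vector<double>& var) {
    int best = 0;
    double bestP = -1.0;
    for (std::size_t k = 0; k < weight.size(); ++k) {
        double p = weight[k] * gaussian(x, mean[k], var[k]);
        if (p > bestP) { bestP = p; best = static_cast<int>(k); }
    }
    return best;
}
```

In a full background subtractor, the weights, means, and variances are additionally updated online as new frames arrive.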
As discussed earlier, background subtraction methods are affected by many factors. Their accuracy depends on how we capture the data and how it's processed. One of the biggest factors that tends to affect these algorithms is the noise level. By noise, we mean things such as graininess in an image, isolated black/white pixels, and so on. These issues tend to affect the quality of our algorithms. This is where morphological image processing comes into the picture. Morphological image processing is used extensively in a lot of real-time systems to ensure the quality of the output; it processes the shapes of features in the image. In this video, we will use a structuring element to modify an image, and then check out the erosion and dilation operations:
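Erosion on a binary image can be sketched in plain C++ with a 3x3 square structuring element; this shows the mechanics behind cv::erode rather than its optimized implementation:

```cpp
#include <vector>

// Binary erosion with a 3x3 square structuring element: a pixel survives
// only if every neighbour under the element is foreground. (Dilation is the
// dual: a pixel becomes foreground if ANY neighbour is.) Isolated noise
// pixels have no foreground neighbours, so one erosion removes them.
std::vector<std::vector<int>> erode(const std::vector<std::vector<int>>& img) {
    int h = img.size(), w = img[0].size();
    std::vector<std::vector<int>> out(h, std::vector<int>(w, 0));
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            int keep = 1;
            for (int dy = -1; dy <= 1 && keep; ++dy)
                for (int dx = -1; dx <= 1 && keep; ++dx) {
                    int ny = y + dy, nx = x + dx;
                    // Pixels outside the image count as background here.
                    if (ny < 0 || ny >= h || nx < 0 || nx >= w ||
                        img[ny][nx] == 0)
                        keep = 0;
                }
            out[y][x] = keep;
        }
    return out;
}
```

An erosion followed by a dilation (the "opening" operator) removes speckle noise while roughly preserving the size of the remaining objects.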
In this video, we will see some other interesting morphological operators:
In order to build a good object tracker, we need to understand what characteristics can be used to make our tracking robust and accurate. So, let's take a baby step in this direction and see how we can use colorspaces to come up with a good visual tracker. One thing to keep in mind is that color information is sensitive to lighting conditions; in real-world applications, you need to do some preprocessing to take care of this. In this video, we will use colorspaces to come up with a good visual tracker. We will convert the pixels of an image to the HSV space and use colorspace distances and thresholding in this space to track a given object:
A colorspace-based tracker gives us the freedom to track a colored object, but we are also constrained to a predefined color. What if we just want to randomly pick an object? How do we build an object tracker that can learn the characteristics of the selected object and track it automatically? This is where the CAMShift algorithm, which stands for Continuously Adaptive Meanshift, comes into the picture. It's basically an improved version of the Meanshift algorithm. In this video, we will select a bunch of points based on the color histogram, we will move the bounding box of the object to a new location so that the new centroid becomes the center and use the CAMShift algorithm:
Corner detection is a technique used to detect interest points in an image. These interest points are also called feature points or simply features in computer vision terminology. A corner is basically an intersection of two edges. An interest point is basically something that can be uniquely detected in an image. A corner is a particular case of an interest point. These interest points help us characterize an image. These points are used extensively in applications such as object tracking, image classification, visual search, and so on. Since we know that the corners are interesting, let's see how we can detect them. In computer vision, there is a popular corner detection technique called the Harris corner detector. We construct a 2 x 2 matrix based on partial derivatives of the grayscale image, and then analyze the eigenvalues. Now what does this mean? Well, let's dissect it so that we can understand it better. Let's consider a small patch in the image. Our goal is to check whether this patch has a corner in it. So, we consider all the neighboring patches and compute the intensity difference between our patch and all those neighboring patches. If the difference is high in all directions, then we know that our patch has a corner in it. This is actually an oversimplification of the actual algorithm, but it covers the gist. If you want to understand the underlying mathematical details, you can take a look at the original paper by Harris and Stephens. A corner point is a point where both the eigenvalues would have large values. In this video, we will construct a 2 x 2 matrix based on partial derivatives, check whether this patch has a corner in it, and compute the intensity difference between our patch and all the neighboring patches:
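The scoring step described above can be sketched in plain C++: build the 2 x 2 structure matrix from a patch's image gradients and evaluate the Harris response det(M) - k * trace(M)^2, which is large and positive only when both eigenvalues are large. This illustrates the score only, not OpenCV's cornerHarris pipeline:

```cpp
#include <vector>
#include <cstddef>

// Harris response for one patch, given its per-pixel gradients Ix and Iy.
// M = [[sum Ix^2, sum IxIy], [sum IxIy, sum Iy^2]];
// score = det(M) - k * trace(M)^2.
// Corner: both eigenvalues large  -> large positive score.
// Edge:   one eigenvalue large    -> negative score.
// Flat:   both eigenvalues small  -> score near zero.
double harrisResponse(const std::vector<double>& Ix,
                      const std::vector<double>& Iy,
                      double k = 0.04) {
    double sxx = 0, syy = 0, sxy = 0;
    for (std::size_t i = 0; i < Ix.size(); ++i) {
        sxx += Ix[i] * Ix[i];
        syy += Iy[i] * Iy[i];
        sxy += Ix[i] * Iy[i];
    }
    double det = sxx * syy - sxy * sxy;
    double trace = sxx + syy;
    return det - k * trace * trace;
}
```

The constant k is the usual empirical sensitivity parameter, typically between 0.04 and 0.06.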
The Harris corner detector performs well in many cases, but it can still be improved. Around six years after the original paper by Harris and Stephens, Shi and Tomasi came up with something better, which they called Good Features to Track. They used a different scoring function to improve the overall quality. Using this method, we can find the N strongest corners in a given image, which is very useful when we don't want to use every single corner to extract information. As discussed earlier, a good interest point detector is very useful in applications such as object tracking, object recognition, image search, and so on. In this video, we will find the N strongest corners in the given image, use the algorithm to detect the corners, and draw circles on these points to display the output image:
Feature-based tracking refers to tracking individual feature points across successive frames in the video. The advantage here is that we don't have to detect feature points in every single frame. We can just detect them once and keep tracking them after that. This is more efficient as compared to running the detector on every frame. We use a technique called optical flow to track these features. Optical flow is one of the most popular techniques in computer vision. We choose a bunch of feature points, and track them through the video stream. When we detect the feature points, we compute the displacement vectors and show the motion of those keypoints between consecutive frames. These vectors are called motion vectors. A motion vector for a particular point is just a directional line that indicates where that point has moved as compared to the previous frame. Different methods are used to detect these motion vectors. The two most popular algorithms are the Lucas-Kanade method and Farneback algorithm. In this video, we will choose a bunch of feature points and track them through the video stream. We will then compute the displacement vectors and show the motion of those key points:
Software that identifies letters does so by comparing text with previously recorded data. Classification results can be improved greatly if the input text is clear, if the letters are in a vertical position, and if there are no other elements, such as images, sent to the classification software. In this video, we'll learn how to adjust text; this stage is called preprocessing. We will threshold the image and segment the text. Finally, we'll perform text extraction and skew adjustment:
Tesseract is an open source OCR engine originally developed by Hewlett-Packard Laboratories, Bristol, and Hewlett-Packard Co. All of its code is licensed under the Apache License and hosted on GitHub. It is considered one of the most accurate OCR engines available: it can read a wide variety of image formats and convert text written in more than 60 languages. In this video, we will download the installer, choose a suitable location for the installation, and then look at setting up Tesseract's dependencies:
Although Tesseract OCR is already integrated with OpenCV 3.0, it is still worth studying its API, since it allows finer control over Tesseract's parameters. In this video, we will create an OCR function, call the SetImage method, and send the output to a file:
Packt has been committed to developer learning since 2004. A lot has changed in software since then - but Packt has remained responsive to these changes, continuing to look forward at the trends and tools defining the way we work and live. And how to put them to work.
With an extensive library of content - more than 4000 books and video courses - Packt's mission is to help developers stay relevant in a rapidly changing world. From new web frameworks and programming languages to cutting-edge data analytics and DevOps, Packt takes software professionals in every field to what's important to them now.
From skills that will help you to develop and future proof your career to immediate solutions to every day tech challenges, Packt is a go-to resource to make you a better, smarter developer.
Packt Udemy courses continue this tradition, bringing you comprehensive yet concise video courses straight from the experts.