Support Vector Machines (SVM) and Support Vector Classifiers (SVC)

Sundog Education by Frank Kane
Founder, Sundog Education. Machine Learning Pro

Lecture description

Support Vector Machines use the "Kernel Trick" to classify data. Hyperparameter tuning becomes important to find the right kernel to use, and the right parameters for that kernel.

Learn more from the full course

Autonomous Cars: Deep Learning and Computer Vision in Python

Learn OpenCV, Keras, object and lane detection, and traffic sign classification for self-driving cars

12:44:34 of on-demand video • Updated May 2020

  • Automatically detect lane markings in images
  • Detect cars and pedestrians using a trained classifier and SVM
  • Classify traffic signs using Convolutional Neural Networks
  • Identify other vehicles in images using template matching
  • Build deep neural networks with Tensorflow and Keras
  • Analyze and visualize data with Numpy, Pandas, Matplotlib, and Seaborn
  • Process image data using OpenCV
  • Calibrate cameras in Python, correcting for distortion
  • Sharpen and blur images with convolution
  • Detect edges in images with Sobel, Laplace, and Canny
  • Transform images through translation, rotation, resizing, and perspective transform
  • Extract image features with HOG
  • Detect object corners with Harris
  • Classify data with machine learning techniques including regression, decision trees, Naive Bayes, and SVM
  • Classify data with artificial neural networks and deep learning
Next we'll talk about support vector machines. They're pretty hard to wrap your head around mathematically, but they're a very powerful machine learning model, and they actually do have applications within the world of autonomous vehicles, so let's dive into support vector machines, or SVMs, in more depth. They're very useful for classifying higher dimensional data, where we have lots of different features that we're trying to evaluate together at once, like all the different pixels in an image coming from your self-driving car's camera, for example. The idea is that SVM finds higher dimensional support vectors across which to divide the data. So from a mathematical standpoint, the support vectors define hyperplanes that separate your data into different classifications. And I'm not going to get into the mathematical details, because when you start getting into these higher dimensions there's really no way to wrap your head around it at an intuitive level. But the thing you need to know about SVM is that it depends on something called the kernel trick. And this is just a way of looking at smaller subsets of the data and aggregating those analyses of those smaller subsets over the entire integrated set of your data as a whole. This is a way that we can represent the data in higher dimensional spaces to find hyperplanes that might not be apparent in lower dimensions. So the kernel trick is just a way of making this process of finding the hyperplanes efficient, and also of uncovering these higher dimensional separations that you might not be able to visualize in the lower dimensions that our brains are more accustomed to working in. And yeah, we got all kind of meta there with higher dimensions and hyperplanes and stuff.
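To make the hyperplane idea concrete, here's a minimal sketch using scikit-learn's SVC on a made-up 2-D dataset (this toy data is an illustration, not the data used in the lecture). With a linear kernel, the learned hyperplane is just a line w·x + b = 0, and the support vectors are the training points that define it.

```python
# Minimal sketch: a linear SVC finds a separating hyperplane between two classes.
# The toy points below are invented for illustration.
import numpy as np
from sklearn.svm import SVC

# Two clearly separated clusters of 2-D points
X = np.array([[1, 1], [2, 1], [1, 2],     # class 0, lower-left
              [6, 6], [7, 6], [6, 7]])    # class 1, upper-right
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear")
clf.fit(X, y)

# The hyperplane is w . x + b = 0; support vectors are the boundary points
print("weights w:", clf.coef_[0])
print("bias b:", clf.intercept_[0])
print("support vectors:", clf.support_vectors_)
print("prediction for (2, 2):", clf.predict([[2, 2]])[0])
```

In two dimensions the "hyperplane" is just a line; the same machinery scales up to as many feature dimensions as you have.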
The important point, though, is that SVM employs some pretty advanced mathematical trickery to classify data, and it can be very effective at handling data sets with lots of features, which you can think of as lots of different dimensions of the data. But it's also a very expensive thing to do, and the kernel trick that we described briefly is the only thing that really makes this practical at all in the real world. Now in practice we're going to use something called SVC, support vector classification, to classify data using the SVM technique, the support vector machine technique. And there are different kernels we can use for our kernel trick with SVC; some work better than others for a given dataset. So for example there's a linear kernel, which obviously finds linear patterns and separations in your data. There's the polynomial kernel, which can get more curvy because it has polynomials and not just lines, and the RBF kernel, which gets even more curvy, to put things in colloquial terms. Here's an example of those three different kernel types on some data that I was playing with here. What I did was randomly distribute some points in this little graph, just around some different center points randomly scattered around the graph, with some standard deviation around each one. So the actual patterns that we're trying to find here for classification are really circular areas given by the radius of the different clusters I threw in there, and this is the attempt by the different kernels at finding those clusters. So you can see the linear kernel is limited to these linear separations between the clusters, and it can't really do a very good job of finding those curvy boundaries that actually exist in the fundamental nature of the data itself. But it does the best it can. We have a polynomial kernel here, which because of its polynomial nature can produce these curves that try to get there a little bit better.
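A sketch of that experiment in scikit-learn (this isn't the lecture's exact code; `make_blobs` stands in for the randomly scattered clusters described above, and the specific sample counts and seed are arbitrary):

```python
# Sketch: fit the three SVC kernel types on randomly generated clusters
# (points scattered around a few centers with some standard deviation).
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=4, cluster_std=1.5, random_state=42)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel)
    clf.fit(X, y)
    print(f"{kernel:>6} kernel training accuracy: {clf.score(X, y):.3f}")
```

Plotting each fitted model's decision regions (for example with matplotlib's `contourf` over a grid of predictions) reproduces the kind of comparison shown in the lecture: straight boundaries for `linear`, gentle curves for `poly`, and tighter curves for `rbf`.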
And this is getting a little bit closer to reality here; we're doing a little bit better. We can choose different degrees of the polynomial as well, to describe how complicated we want the model to be and just how curvy it can get. Finally we have the RBF kernel. RBF stands for radial basis function, and that's really just a fancy way of saying that we're basing that kernel on the squared distance between the points it's looking at. And this allows us to get more complex curves, as you can see here, and it's starting to get more at those underlying actual clusters themselves, so we're seeing things that more closely approximate those circular areas that the clusters were actually generated within. So that's actually looking pretty promising there. However, SVM and SVC can be prone to overfitting. They're very sensitive to the parameters you give them. So for example, with RBF we can give it what's called a gamma value that defines just how curvy it can get, basically. And your choice of gamma value can really influence how overfitted your results are to your training data. The gamma value specifies the area of influence of the support vectors, and as you make that area smaller and smaller with higher gamma values, you end up overfitting the data more and more. So it's very important to find the right gamma value for your data. So here are some examples on some real data that I ran. Using RBF with a gamma of 10, you can see that we're kind of getting toward what the polynomial model looked like; it's a little bit more linear in nature. A gamma of 100 gets a sort of better result that's finding overall patterns, but overfitting a little bit here. You know, those curves don't really exist in the function that generated this test data, so there is some overfitting happening already. At a gamma of 1000 we're just overfitting like crazy.
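You can see the same effect numerically by holding out a test set: as gamma grows, training accuracy climbs while held-out accuracy falls. This is a sketch with synthetic clusters, not the lecture's data, so the exact gamma values where overfitting kicks in will differ with the scale of your features:

```python
# Sketch: RBF gamma controls the support vectors' area of influence.
# Very large gamma memorizes the training points (overfitting).
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=4, cluster_std=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for gamma in (10, 100, 1000):
    clf = SVC(kernel="rbf", gamma=gamma)
    clf.fit(X_train, y_train)
    print(f"gamma={gamma:>4}: "
          f"train={clf.score(X_train, y_train):.2f}, "
          f"test={clf.score(X_test, y_test):.2f}")
```

At the highest gamma, each support vector influences only a tiny circle around itself, which is exactly the "drawing circles around each data point" behavior described next.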
I mean, we're basically just drawing circles around each individual data point here, and obviously that's not going to work in a general purpose system at all. So again, this is basically an overfitting problem, and the way that we deal with overfitting is through k-fold cross-validation. So what we do to avoid this situation is use k-fold cross-validation on a variety of different gamma values to try to find which ones come out best. K-fold cross-validation, again, is a way of trying to prevent overfitting by randomizing our training data set into multiple training sets that are evaluated independently and averaged together, so we can't overfit on any one choice of training data. Now the way that we actually evaluate all of these different things at once is made easy through a Python class called GridSearchCV (grid search cross-validation is what that stands for, of course), and that allows us to very easily try out different hyperparameters. That's what gamma is called: it's one of many hyperparameters for a model. And we try to converge on the value that actually minimizes the k-fold cross-validation error. We call this hyperparameter tuning, and it's a very common problem in machine learning in general. A lot of these techniques that we use have a lot of different knobs and dials you can tweak, such as the gamma setting, and through hyperparameter tuning we have to arrive at the value that does the best job of predicting values in our model, but without overfitting to the data that we're giving it. Let's dive into an example and make it real. We'll actually do a little bit of an exercise here using SVC, and later on we'll do a project that's a little bit more directly related to self-driving cars as well.
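Here's a minimal sketch of that tuning loop with scikit-learn's `GridSearchCV` (again on synthetic clusters; the candidate gamma and C values below are arbitrary choices for illustration). For every combination in the grid, it runs k-fold cross-validation and keeps the combination with the best average validation score:

```python
# Sketch: hyperparameter tuning with GridSearchCV.
# Each (gamma, C) pair is scored with 5-fold cross-validation.
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=4, cluster_std=2.0, random_state=0)

param_grid = {
    "gamma": [0.001, 0.01, 0.1, 1, 10, 100],
    "C": [0.1, 1, 10],
}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)

print("best params:", search.best_params_)
print("best cross-validated accuracy:", round(search.best_score_, 3))
```

Because every fold is held out from the model it evaluates, a gamma that merely memorizes its training fold scores poorly, which is what steers the search away from the overfitted extremes we saw above.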