Activation functions

Francisco Juretig
A free video tutorial from Francisco Juretig
Mr
3.9 instructor rating • 9 courses • 18,127 students

Lecture description

Every layer of a neural network works by multiplying weights by the inputs (or by the previous layer's neurons); after each weighted sum is computed, an activation function is applied. In general, these activations are nonlinear, and we can choose among several in Keras: sigmoid, elu, relu, tanh, etc.
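A minimal sketch of that computation for a single neuron, assuming made-up inputs, weights, and bias (the numbers are purely illustrative, not from the course):

```python
import numpy as np

x = np.array([10.0, 2.0])   # the two input features (illustrative values)
w = np.array([0.5, -1.2])   # weights, normally learned during training (illustrative values)
b = 0.3                     # bias weight (multiplied by a constant input of 1)

z = np.dot(w, x) + b        # linear combination: w1*x1 + w2*x2 + b
a = np.tanh(z)              # nonlinear activation applied to the sum, here tanh
print(z, a)
```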

Learn more from the full course

Keras: Deep Learning in Python

Build complex deep learning algorithms easily in Python

10:04:33 of on-demand video • Updated July 2017

  • Use Keras for classification and regression in typical data science problems
  • Use Keras for image classification
  • Define Convolutional neural networks
  • Train LSTM models for sequences
  • Process the data in order to achieve the specific shape that Keras expects for each problem
  • Code neural networks directly in Theano using tensor multiplications
  • Understand the different layers that we have in Keras
  • Design neural networks that mitigate the effect of overfitting using specific layers
  • Understand how backpropagation and stochastic gradient descent work
English [Auto] Let's briefly review the activation functions that are available in Keras. Before exploring this, remember what a typical neural network configuration looks like. Imagine that we have two features in the input layer, and that I am mapping these into just one neuron in the next layer. Let's assume that my inputs are 10 and 2 for the first observation; these are two features that I captured. So I will have a 10 here and a 2 here, and I will have a first weight and a second weight: weight one times 10 plus weight two times 2. But remember that I also have a bias here, so I add a constant 1 and a third weight: weight three times 1. All three of these terms are used to compute this neuron, hidden one, in the hidden layer, from the input layer. These values return a number; let's call this number x. Remember that these w terms are the weights: they need to be estimated, or trained. They are initialized with random values, and in every iteration they are updated by the optimizer that Keras uses, running on Theano or TensorFlow depending on your setup.

The key element here is that this is basically a linear operation: we get a linear combination of the inputs. But sometimes the data is very complex, and the whole concept of using neural networks is to exploit the nonlinearities in the data. It is one of the very few techniques in machine learning and statistics that really exploits the nonlinearity in the data, so it would be a pity to define a neural network and not fully exploit that nonlinearity. That is where activation functions appear: they transform the sum that we generate by applying a nonlinear function. So this x that we computed will be transformed by an activation function, and the resulting value is what gets passed on. That's the whole idea.

There are two ways of defining activations in Keras. The traditional way is to specify a Dense layer, say with 64 neurons, and then define the activation, for example the hyperbolic tangent, in a separate Activation layer. I prefer a different approach, which is to put everything on the same line: I think it's easier to read, easier to understand, and it requires fewer lines of code. In that case we again specify the hyperbolic tangent with 64 neurons in the layer, but using only one line.

So what are the available activation functions? You can see we have quite a lot of them. We also have advanced activations; I don't think they are used frequently, but if for some reason you need them, there is a special module in Keras that contains them. I don't recommend digging into those: you should be able to solve any problem with the standard ones. So let's look at the available functions, starting with softmax, using a Wikipedia article that describes the activation functions. Softmax is, I think, probably the most important activation function in Keras.
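A minimal sketch of the two styles mentioned above, using the Sequential API with 64 tanh neurons; the input_dim of 2 is just an illustrative assumption matching the two-feature example:

```python
from keras.models import Sequential
from keras.layers import Dense, Activation

# Style 1: the traditional way, with a separate Activation layer
model_a = Sequential()
model_a.add(Dense(64, input_dim=2))   # 64 neurons, 2 input features
model_a.add(Activation('tanh'))       # hyperbolic tangent activation

# Style 2: everything on one line, passing the activation as an argument
model_b = Sequential()
model_b.add(Dense(64, input_dim=2, activation='tanh'))
```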
Softmax is very useful for producing multiple outputs, and this typically appears when you need to classify images. When you are classifying images you can have, for example, cat, dog, kangaroo, and let's add one more, elephant. This activation function produces numbers that are bounded between 0 and 1 and that all sum to 1: each output lies between 0 and 1, and that is why softmax appears here, running from 1 to K, for K classes. This is the expression. It can also be used in intermediate layers, of course, but I think that's quite rare. In general you would use softmax as the final output of the model: you will typically have an input layer, several hidden layers, and finally everything is produced by the final layer, and you will almost always use softmax for classification problems at the end of the configuration, in the last layer. As I was saying, you can also put softmax in the middle, but it's not very frequent; you can do it, but it's hard to imagine why you would specifically need to.

Apart from softmax we have lots of other activation functions. The simplest one is the linear one, which is basically nothing: it just returns the same value that we passed in, so x returns x. Quite simple. Then we have two activation functions that are used a lot, especially in multi-layer perceptron configurations. If you know logistic regression you will certainly know the sigmoid; if you don't, you will know about it now. They are basically very similar: here you have the hyperbolic tangent and here you have the sigmoid. You can see that the sigmoid is bounded between 0 and 1, which is super important, and both functions are clearly nonlinear, there is no doubt about that. You can also see that for large values the function barely changes, and the same thing happens for very low values: a small change in the input does basically nothing. But when the values are in the middle of the possible range, they do produce a lot of change in the activation value that we get. So any change at the extremes does almost nothing, until you reach this middle area where there is a huge change. These are two very well known activation functions because they are truly nonlinear, and they have the very nice feature of being symmetric, producing a similar response for large values and for small values. Anyway, so we have seen these two.

Now let's see ReLU. ReLU is another very well known activation function, the rectified linear unit. It returns either zero or the value that we pass in. So if we get minus 20 in our computation, by multiplying the weights times the inputs or the neurons, we get zero, because it is less than zero; minus 20, minus 30, minus 40 all give zero. But if we get 20, 30 or 40, we get 20, 30 or 40: we get the same value back. Apart from the rectified linear unit, we have other variations of it, so let's see them.
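A rough numeric illustration of the softmax and ReLU behaviour described above; the raw scores are made-up numbers for the four hypothetical classes (cat, dog, kangaroo, elephant):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))     # subtracting the max keeps the exponentials numerically stable
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.1, -1.2])   # made-up raw outputs for cat, dog, kangaroo, elephant
probs = softmax(scores)
print(probs)         # each value lies between 0 and 1
print(probs.sum())   # and they sum to 1

def relu(x):
    return np.maximum(0.0, x)     # negative sums become 0, positive sums pass through unchanged

print(relu(np.array([-20.0, 20.0])))   # [ 0. 20.]
```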
So we have ELU, the exponential linear unit. It is basically exactly the same as before when the input is greater than zero: that part is the same. What changes is the part below zero, where instead of a flat zero, as in ReLU, you get a nonlinear, exponential piece. Then we have softplus. Softplus is this one over here, which is a variation of the rectified unit; jumping into the rectifier article on Wikipedia we can see the two of them: the blue one is the normal ReLU and the green one is softplus. Softplus basically smooths that corner, and you can see very clearly that the change is not very dramatic, it is very small. With that we have practically seen all of them; we are only missing softsign. Softsign is actually not used very much, but you can use it, and this is its expression. It is basically similar to the sigmoid and to the hyperbolic tangent. You can see that it is bounded, although not strictly between 0 and 1 but between minus 1 and 1, the same as the hyperbolic tangent. The reason is that if the input is negative, the numerator is negative, and the denominator is one plus the absolute value, so the result stays between minus 1 and 1. With these activation functions you should be ready to code any neural network configuration that you want, and in case you don't find them sufficient, you can exploit the advanced activations available in the keras.layers advanced activations module.
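A small sketch of the ReLU variants just described, written as plain NumPy functions so their behaviour below and above zero can be compared directly (the sample values are arbitrary):

```python
import numpy as np

def elu(x, alpha=1.0):
    return np.where(x > 0, x, alpha * (np.exp(x) - 1.0))   # exponential piece below zero instead of a flat 0

def softplus(x):
    return np.log1p(np.exp(x))                             # smooth variation of ReLU

def softsign(x):
    return x / (1.0 + np.abs(x))                           # bounded between -1 and 1, like tanh

for v in (-3.0, -0.5, 0.0, 0.5, 3.0):
    print(v, elu(v), softplus(v), softsign(v))
```

And a sketch of how an advanced activation is used; the import path and LeakyReLU choice are assumptions about the Keras version from the course era, where advanced activations are added as layers rather than as activation strings:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.layers.advanced_activations import LeakyReLU   # module path assumed for this Keras version

model = Sequential()
model.add(Dense(64, input_dim=2))
model.add(LeakyReLU(alpha=0.1))   # added as its own layer, not via activation='...'
```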