Automated ARIMA Model Selection with auto.arima
Learn more from the full course
Introduction to Time Series Analysis and Forecasting in R
Work with time series and all sorts of time related data in R - Forecasting, Time Series Analysis, Predictive Analytics
08:24:28 of on-demand video • Updated March 2019
use R to perform calculations with time and date based data
create models for time series data
use models for forecasting
identify which models are suitable for a given dataset
visualize time series data
transform standard data into time series format
clean and pre-process time series
create ARIMA and exponential smoothing models
know how to interpret given models
identify the best time series libraries for a given problem
compare the accuracy of different models
R is great at time series analysis. So why is that? Well, it is versatile. Many people contribute their knowledge and provide packages. That means various solutions are available for a given problem. ARIMA modelling is a classic example of that. There are several time series packages available which offer ARIMA modelling capabilities, and R base has ARIMA tools available as well.
A very popular tool for ARIMA modelling is the auto.arima function of the forecast package. It is popular and well known, but many debates exist about its quality and performance. There are professionals advising against its use, since they think the function poses a danger of producing uninformed, low-quality models. They believe that the function makes it too easy to produce an ARIMA model, with analysts not using enough brain in the process.
I personally believe that the function is a game changer and a must-have in your toolbox. In my years as a consultant, I've seen many people using this function successfully as their starting point into time series analysis. It actually gets you into the subject. You get some first results, some positive feedback, and from that starting point you can then dive deeper into the subject and deepen your skills. I also believe that the function output, if used correctly, provides an excellent benchmark which you can use to compare other models against.
That said, let's take a look at an example with the auto.arima function and the lynx dataset. We already talked about the statistical principles behind a time series. This info will now be applied. So if you're not yet clear about stationarity and autoregression, I advise you to revisit the section on Statistical Traits of Time Series.
Taking a look at the plot of the lynx dataset, we see these waves. It is a cyclic pulse. There is no fixed seasonal interval, but it's a pattern nonetheless.
If you see something like this and think about the nature of the dataset itself, your first idea should be autoregression. The variable influences itself. Logically, if a lynx gets caught, it is extracted from the population. The next year there will be fewer offspring, fewer lynx to catch. This cycle holds at least once the number of lynx extracted from the gene pool reaches a certain threshold. And that cycle is exactly what we see with these waves. And this is what we will see in the ARIMA model as well. AR stands for autoregressive, and it corresponds to the first parameter, p, so the model will be largely an autoregressive one.
So how do we confirm autoregression exactly? With the ACF and the PACF plots. We will simply get the combination plot with tsdisplay from forecast. Here we can see that several of these lags, represented by these bars, are outside the threshold line. This shows autocorrelation in these lags, and this needs to be captured in the model. The PACF plot is especially interesting with its first two bars. These two bars indicate that this model will be at least an AR(2) model. The AR order might be even higher and there could be an MA part as well. But more on these two parameters p and q later on.
Based on the lynx time series plot, do we need differencing with parameter d? Well, we need differencing when the time series is non-stationary, which means the statistical properties change over time. But on the plot there is no indication that at any point any relevant statistic changes. The pulse of the dataset stays throughout the time series and the height of the spikes is also somewhat constant. The three extra-high spikes are distributed over the dataset. Were they all at the end of the dataset, well, that could change the matter completely, but generally the whole plot looks harmonic throughout its entirety.
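As a minimal sketch of the ACF/PACF check described above, the snippet below draws the combined display and then lists the lags whose partial autocorrelation falls outside the usual 95% band. It assumes the forecast package is installed. The variable names and the band formula qnorm(0.975) / sqrt(n), the common white-noise approximation behind the plotted threshold lines, are my own additions and not part of the lecture.

```r
library(forecast)

# Combined plot: the series on top, ACF and PACF underneath
tsdisplay(lynx)

# Approximate 95% significance band (white-noise assumption)
n <- length(lynx)
threshold <- qnorm(0.975) / sqrt(n)

# Partial autocorrelations without plotting; element i belongs to lag i
pacf_values <- as.numeric(pacf(lynx, plot = FALSE)$acf)

# Lags whose partial autocorrelation lies outside the band
significant_lags <- which(abs(pacf_values) > threshold)
print(significant_lags)
```

If the first few lags stand out, an AR component of at least that order is a reasonable starting guess.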
Therefore, it is totally plausible if the model does not contain a d parameter. But in case you are in doubt, use adf.test to check for stationarity. We discussed this test in the lecture on stationarity.
All right, so let's finally create our first ARIMA model with auto.arima, which is part of the forecast package. We are simply going to run the function auto.arima on lynx. So this is the bare function without any extra arguments and extra tweaking. Now, this is likely not giving us the best solution possible, but I will use this as the starter to explain the process and the results. As you can see, we get an ARIMA(2,0,2) with a non-zero mean as a result. That is quite in line with the initial plot. The whole thing looks fairly autoregressive. Maybe there are better alternatives, but more on that later on.
In the output, we not only get the bare ARIMA model parameter order, we also get the values for the information criteria, as can be seen at the bottom here. And most importantly, we get the values for the model parameter coefficients and the mean, since we do not have a zero mean. But please be careful here with the mean. If you use arima of R base, this part is called intercept, although it is the mean as well. I will actually show you how these numbers are used in a later video. Here we focus mainly on the function.
So let's actually take a look at the function help. It has loads of arguments available, which might overwhelm you at the beginning, but on the plus side it makes the function very versatile. It also works on surprisingly little info, as you just experienced first hand when I was using it simply with the data argument. It worked just fine with only that information. I would say that this is a sign of a quality function. On one hand, you can tailor it. On the other hand, you can just go with the flow. So let's actually check out these arguments one by one. The whole thing starts with the three ARIMA model parameters.
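The steps just described can be sketched as follows. This assumes the forecast and tseries packages are installed (adf.test comes from tseries), and the object names are illustrative. The last lines fit the order reported in the lecture with arima from base R, only to show the differently named constant term.

```r
library(forecast)
library(tseries)

# Augmented Dickey-Fuller test; a small p-value speaks against a unit root
adf_result <- adf.test(lynx)
print(adf_result$p.value)

# auto.arima with nothing but the data
fit_default <- auto.arima(lynx)
print(fit_default)              # order, coefficients, information criteria
print(arimaorder(fit_default))  # the selected (p, d, q)

# Order reported in the lecture, fitted with base R:
# here the mean shows up under the name "intercept"
fit_base <- arima(lynx, order = c(2, 0, 2))
print(names(coef(fit_base)))
```

The printed auto.arima object lists the coefficients first and the information criteria at the bottom, which is the layout the lecture refers to.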
I talked about these in the intro video and we will hear a lot more about these parameters in the upcoming videos. Basically, we can use lowercase d and uppercase D to set the order of differencing. That is the middle parameter in the ARIMA model. Note that in R notation, uppercase in an ARIMA model stands for the seasonal parameter. Seasonal ARIMA models are possible. They also have parameters P, D and Q, which are used for the seasonal component. Again, lowercase p is for the standard part of the ARIMA model, and these parameters are always present, although they can be zero of course. Uppercase P is for the seasonal component of the model. If the model is non-seasonal, then these parameters are simply not present.
We can also set the maximum numbers for our parameters. If you do not want your model to get overly complicated with too many orders, then this is to be specified with the max arguments. This includes max.order, which specifies the overall number of orders the model has. This is the sum of p and q of the model. As an example, if we set max.p to 1, then we cannot get a model like ARIMA(2,0,1), since we restricted that first parameter p. With the start arguments, you can specify where the model selection procedure starts, but this only applies when you go for the stepwise procedure.
If you set stationary to TRUE, then R would not difference the dataset. It would not use the d parameter. This is at least good to have, although in reality this should be used only with strong reasoning. With seasonal, we can specify whether R should at least test for a seasonal model. This does not mean you will get a seasonal model. It only means that it is tested for, which is a wise thing to do. We can tell the computer which information criterion to use in order to make the model selection. So if you choose ic equals aic, it would base its decision solely on the AIC.
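To make the effect of these arguments concrete, here is a small sketch on lynx, assuming the forecast package is installed. The particular values (max.p = 1, ic = "aic") only illustrate the arguments discussed above and are not recommended settings.

```r
library(forecast)

# Cap the non-seasonal AR order at 1, so ARIMA(2,0,1) is out of reach
fit_capped <- auto.arima(lynx, max.p = 1)
print(arimaorder(fit_capped))

# Forbid differencing (d = 0), skip seasonal models, select by AIC
fit_aic <- auto.arima(lynx, stationary = TRUE, seasonal = FALSE, ic = "aic")
print(arimaorder(fit_aic))
```

arimaorder returns the selected orders as (p, d, q) for a non-seasonal model, so the first element of the capped fit cannot exceed 1 and the second element of the stationary fit is 0.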
And then there is a pair of arguments which you should keep in mind: stepwise and approximation. The thing is, they are active by default. If you have very large datasets or seasonal datasets, it is wise for them to be activated, but they produce results which are not as good as they could be. Especially the stepwise argument tends to mislead the parameter selection process. If your computer and the dataset size allow it, set these two arguments to FALSE. In many cases you will actually experience a change in model selection and a lower information criterion.
And trace is probably my favorite argument. I will show you why. If you set this one to TRUE, the computer gives you a list of alternative models, including the information criterion. AICc is the default info criterion R uses for model selection. So if we run auto.arima on lynx with trace set to TRUE, we get all the tested models together with the AICc criterion. This is just a really helpful list, and it can now be easily used to compare different models. The easiest way to do that is via the info criterion. You select the one with the lowest value, but alternatively you may want orders to be low, or you want a specific parameter to be lower than the other one, and so on. All of this can easily be selected with this list.
With the truncate argument, we can actually restrict the number of observations to be used for model selection. Again, depending on the dataset size, this might be a way to get a faster result. Or you may think that only a fraction of the data is representative and you want to selectively use this fraction. With this truncate argument, you can do this.
For xreg, I will actually present an extra video. It is an interesting argument since it allows you to step outside the boundaries of a univariate time series analysis. xreg incorporates an explanatory variable, a predictor.
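A minimal sketch of the trace argument, assuming the forecast package is installed. capture.output is used here only so the printed list can also be kept in a variable (trace_lines, a name chosen for this example). In an interactive session you would simply read the list in the console.

```r
library(forecast)

# trace = TRUE prints every candidate model with its information criterion
trace_lines <- capture.output(
  fit_traced <- auto.arima(lynx, trace = TRUE)
)

# Show the candidate list, then the selected order
cat(trace_lines, sep = "\n")
print(arimaorder(fit_traced))
```

Each line of the list names one candidate model next to its criterion value, so picking an alternative (for example a lower-order model with a nearly identical value) is a matter of reading down the list.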
There are also the arguments test and seasonal.test, aimed at advanced users, which are essentially the tests the model selection is based upon. allowmean and allowdrift should usually stay at TRUE, since with them you enable drift and a non-zero mean. The mean is related to the constant of the model. If you have a non-zero mean, R will provide the constant to the model if it makes sense. lambda and biasadj are used in case you are implementing a Box-Cox transformation into your model.
parallel and num.cores could be interesting if you have a large dataset and you want exact calculation with stepwise set to FALSE. With parallel, you could allow parallel computation, and with num.cores you can specify how many cores of your processor are assigned to the task. If you have a large seasonal dataset, you want exact model selection, and you have a quad core like many data scientists are using nowadays, this could be an option to save some time, although in most cases the standard settings should be sufficient.
Lastly, I want to illustrate how auto.arima is tweaked in order to get a better result. You set stepwise to FALSE and approximation to FALSE. If you disable these two arguments, the results may or may not be a more appropriate fit. In our case, we now get an AR-only model of the fourth order. With this model, we have an even lower AICc than with the previous model. Based on this data, we can now say that ARIMA(2,0,2) is okay, but even better would be AR(4). So how do we know this? Well, the info criterion is lower with the AR-only model. So always look for the lowest info criterion when selecting a model. In fact, these high-order AR models down here, even up to AR(5), do fairly well on the dataset.
So is this a coincidence? At the beginning of this video, while taking a look at the lynx plot, we saw these waves. I said that extracting large numbers of lynx from the gene pool will have consequences for the future. And that is essentially what autoregression is about.
Therefore, we can base our models solely on this autoregression and do really well with it, so we can take the autoregressive model of the fourth order as our model pick.
All right. So this is actually an example of an improvement achieved thanks to clever usage of the auto.arima function and its arguments. Rob Hyndman, the package author, actually recommends this setting in case your machine and the dataset allow it. When you are doing ARIMA modelling, keep this function in mind. It provides ready-made solutions, it is surprisingly easy to use, and the results can be a nice benchmark to compare other models against.
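The comparison walked through above can be sketched like this, assuming the forecast package is installed. As far as I know, the default for approximation depends on the length and frequency of the series, so for a short annual series like lynx the difference is likely to come from stepwise alone. Treat that as an assumption to check against your installed version.

```r
library(forecast)

# Default search: stepwise, with possible approximations
fit_default <- auto.arima(lynx)

# Full search over the allowed orders, without approximations
fit_full <- auto.arima(lynx, stepwise = FALSE, approximation = FALSE)

# Compare the selected orders and their AICc values (lower is better)
print(arimaorder(fit_default))
print(arimaorder(fit_full))
print(c(default = fit_default$aicc, full = fit_full$aicc))

# On larger data the full search can be spread over cores, for example:
# auto.arima(x, stepwise = FALSE, approximation = FALSE,
#            parallel = TRUE, num.cores = 2)
```

The lecture reports ARIMA(2,0,2) for the default call and an AR(4) model for the full search. Output may differ slightly across package versions, so it is worth comparing the printed AICc values rather than relying on the orders alone.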