A common way to select order parameters (e.g. the number of AR terms to include in the model) in time series modelling is to rely on an information criterion (AIC, BIC, Hannan-Quinn, ...) to measure the relative quality of the models: let's call it Rule A.
Then, in a second step, misspecification tests are performed (Ljung-Box test, Engle's ARCH test, ...).
However, the methodology is not clear to me when I need to choose a model for a series that has autocorrelation in both the mean and the variance process:
I noticed that the model selected (using Rule A) is not always the same depending on whether:
- I use a "two-step method": first, I select the order parameters of the mean process using Rule A; secondly, keeping the parameters obtained in the first step, I use Rule A again to select the parameters of the variance process.
Example: I fit all ARMA(p,q) models to the series with (p,q) = 0:2 and select the most parsimonious one. Let's say the best model is p=1 and q=2. Second step: I fit all ARMA(1,2)-GARCH(s,t) models to the series with (s,t) = 0:2 and select the "best" (s,t) parameters using Rule A again. If we let p,q range over 0:4 and s,t over 0:2, there are $5^2 + 3^2$ models to estimate.
- OR "direct" modelling: I fit the full ARMA(p,q)-GARCH(s,t) directly to the time series and select the best model (p,q,s,t) using Rule A. However, in this case the number of combinations (the number of models to fit) can be very high: if we let p,q range over 0:4 and s,t over 0:2, there are $5^2 \times 3^2$ candidate models (it takes time and CPU...).
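The cost gap between the two approaches is easy to quantify. A minimal sketch of the counting (function names are mine, not from any library):

```python
# Count candidate models for the two selection strategies described above.
# Assumptions: p, q range over 0..4 and s, t over 0..2, as in the example.

def n_two_step(pq_max=4, st_max=2):
    """Two-step: first all ARMA(p,q), then all GARCH(s,t) on top of the winner."""
    return (pq_max + 1) ** 2 + (st_max + 1) ** 2

def n_direct(pq_max=4, st_max=2):
    """Direct: every ARMA(p,q)-GARCH(s,t) combination is estimated."""
    return (pq_max + 1) ** 2 * (st_max + 1) ** 2

print(n_two_step())  # 5^2 + 3^2 = 34
print(n_direct())    # 5^2 * 3^2 = 225
```

The direct search is additive-versus-multiplicative: here 34 fits against 225, and the gap widens quickly as the order ranges grow.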
Obviously the direct method's candidate set includes the model selected by the two-step method, so it may give the most significant results. I say "may" because it is possible that the model selected by the direct method does not pass the misspecification tests...
My question is: how can I deal with this cost/efficiency trade-off? How should I proceed?
Answer
I will try to give a simple technique for identifying the $ARIMA(p,d,q)$ orders of a time series. It is an empirical technique, but the results are very close to those of techniques based on the $AIC$ or $BIC$ criteria.
- Identifying the integration order $d$:
It is the first parameter to determine; indeed, ARMA models are based on the assumption that your time series $\{x_t\}$ is stationary. So you should start by testing the stationarity of $x_t$, using the Dickey–Fuller test for example (there are many other tests). If it is stationary then $d=0$; otherwise take a first difference $y_t = \Delta x_t = x_t - x_{t-1}$ and test it for stationarity (generally a first difference is sufficient), in which case $d=1$; otherwise difference $y_t$ once more, giving $d=2$, and so on.
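The "difference until stationary" loop can be sketched as follows. This is a toy illustration: `is_stationary` is a placeholder for a real unit-root test (e.g. an augmented Dickey–Fuller test from statsmodels or R), and `crude_test` below is a deliberately naive stand-in I made up for the example.

```python
# Sketch of choosing d by differencing until a stationarity test passes.

def difference(x):
    """First difference: y_t = x_t - x_{t-1}."""
    return [b - a for a, b in zip(x, x[1:])]

def choose_d(x, is_stationary, max_d=2):
    """Return (d, differenced series), differencing at most max_d times."""
    for d in range(max_d + 1):
        if is_stationary(x):
            return d, x
        x = difference(x)
    return max_d, x  # gave up at max_d; in practice, re-examine the series

# Naive stand-in for a unit-root test: call the series "stationary" when
# the means of its two halves are close. Do NOT use this in practice.
def crude_test(x):
    h = len(x) // 2
    m1 = sum(x[:h]) / h
    m2 = sum(x[h:]) / (len(x) - h)
    return abs(m1 - m2) < 0.5

trend = [0.5 * t for t in range(100)]  # linear trend: non-stationary mean
d, y = choose_d(trend, crude_test)
print(d)  # 1: one difference removes the linear trend
```

With a proper test you would also watch the test's p-values at each step rather than differencing blindly, since over-differencing introduces an artificial MA component.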
Let's assume $d=0$ (so that your $x_t$ is already stationary).
- Identifying the AutoRegressive (AR) order $p$:
To determine this order, plot the Partial AutoCorrelation Function (PACF) of $x_t$, then $p$ will be the maximum lag at which the PACF is significant.
- Identifying the Moving Average (MA) order $q$:
Plot the AutoCorrelation Function (ACF) of $x_t$, and set $q$ to be the maximum lag at which the ACF is significant.
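The two reading rules above can be sketched in plain code. This is a minimal illustration with function names of my own choosing; in practice you would plot the ACF/PACF with statsmodels or R rather than compute them by hand. The PACF here uses the Durbin-Levinson recursion, and "significant" means outside the usual approximate 95% band $\pm 1.96/\sqrt{n}$.

```python
# Sample ACF/PACF and the "maximum significant lag" reading rule.
import math

def acf(x, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n
    return [sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / n / c0
            for k in range(1, max_lag + 1)]

def pacf(x, max_lag):
    """Partial autocorrelations at lags 1..max_lag (Durbin-Levinson)."""
    r = acf(x, max_lag)
    phi = [[0.0] * (max_lag + 1) for _ in range(max_lag + 1)]
    pac = []
    for k in range(1, max_lag + 1):
        if k == 1:
            phi[1][1] = r[0]
        else:
            num = r[k - 1] - sum(phi[k - 1][j] * r[k - 1 - j] for j in range(1, k))
            den = 1.0 - sum(phi[k - 1][j] * r[j - 1] for j in range(1, k))
            phi[k][k] = num / den
            for j in range(1, k):
                phi[k][j] = phi[k - 1][j] - phi[k][k] * phi[k - 1][k - j]
        pac.append(phi[k][k])
    return pac

def max_significant_lag(values, n):
    """Largest lag whose coefficient lies outside +/- 1.96/sqrt(n)."""
    bound = 1.96 / math.sqrt(n)
    sig = [k for k, v in enumerate(values, start=1) if abs(v) > bound]
    return max(sig) if sig else 0
```

Reading the orders is then `p = max_significant_lag(pacf(x, L), len(x))` and `q = max_significant_lag(acf(x, L), len(x))` for some maximum lag `L` you consider.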
Thus you get your empirical model $ARIMA(p,d,q)$.
If you are using R, you can try to fit a model to your series with the auto.arima function from the forecast package, and you will notice that the $AIC$ and $BIC$ values of your empirical model are very close to those of the automatically fitted one.
The technique I explained above is inspired by the book Analysis of Financial Time Series ($3^{rd}$ edition, by Ruey S. Tsay).
In my opinion, it is more interesting to do it this way, because you understand the relations between your parameters ($\theta_i, \phi_j$ of the ARMA) and the ACF/PACF values, but also the economic justification (how many lagged days...).