Sunday, October 4, 2015

Time Series Regression with Overlapping Data


I am looking at a regression model that regresses year-on-year (YoY) stock index returns on the 12-month-lagged YoY return of the same stock index, the credit spread (the difference between the monthly mean yields of risk-free bonds and corporate bonds), the YoY inflation rate and the YoY index of industrial production.


It looks like this (in the actual case, the corresponding Indian data would be substituted):


SP500YOY(T) = a + b1*SP500YOY(T-12) + b2*CREDITSPREAD(T)
            + b3*INDUSTRIALPRODUCTION(T+2) + b4*INFLATION(T+2)
            + b5*INFLATIONASYMM(T+2)


SP500YOY is the year-on-year return on the S&P500 index. To compute it, the monthly average of the S&P500 values is taken and then converted into a year-on-year return for each month (i.e. Jan'10-Jan'11, Feb'10-Feb'11, Mar'10-Mar'11, ...). On the explanatory-variable side, the 12-month-lagged value of SP500YOY is used along with CREDITSPREAD at time T, and INFLATION and INDUSTRIALPRODUCTION two periods ahead (T+2). INFLATIONASYMM is a dummy indicating whether inflation is above a threshold of 5.0%. The index in parentheses gives the time index of each variable.
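
The variable construction described above can be sketched in plain Python. This is a minimal sketch under assumptions that are not in the original: daily index values arrive as hypothetical (year, month, value) tuples, returns are simple returns on the monthly averages, the other series are dicts keyed by (year, month), and inflation is stored as a decimal so the 5.0% threshold becomes 0.05. The helper names are illustrative only.

```python
from statistics import mean


def monthly_averages(daily):
    """Average the daily index values within each (year, month)."""
    buckets = {}
    for year, month, value in daily:
        buckets.setdefault((year, month), []).append(value)
    return {ym: mean(vals) for ym, vals in sorted(buckets.items())}


def yoy_returns(monthly):
    """YoY simple return of the monthly average, e.g. Jan'10 -> Jan'11."""
    out = {}
    for (year, month), level in monthly.items():
        prev = monthly.get((year - 1, month))
        if prev is not None:
            out[(year, month)] = level / prev - 1.0
    return out


def shift_month(ym, k):
    """Move a (year, month) key k months forward (k < 0 moves backward)."""
    year, month = ym
    idx = year * 12 + (month - 1) + k
    return (idx // 12, idx % 12 + 1)


def design_rows(y, credit, ip, infl, threshold=0.05):
    """Align SP500YOY(T) with SP500YOY(T-12), CREDITSPREAD(T) and the T+2
    values of industrial production, inflation and the inflation dummy.
    Months where any input is missing are dropped (hypothetical helper)."""
    rows = []
    for t in sorted(y):
        lag, lead = shift_month(t, -12), shift_month(t, 2)
        if lag in y and t in credit and lead in ip and lead in infl:
            dummy = 1.0 if infl[lead] > threshold else 0.0
            x = [1.0, y[lag], credit[t], ip[lead], infl[lead], dummy]
            rows.append((t, y[t], x))
    return rows
```

Each row pairs the target SP500YOY(T) with a regressor list ordered as intercept, lagged SP500YOY, credit spread, industrial production, inflation and the inflation dummy. Note that consecutive rows reuse 11 of the same 12 monthly averages in their targets.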


This is estimated by standard OLS linear regression. To use the model to forecast the 1-, 2- and 3-month-ahead YoY returns of the S&P500, one has to generate 3-, 4- and 5-month-ahead forecasts of inflation and the index of industrial production, because these two variables enter the model two periods ahead. These forecasts come from an ARIMA model fitted separately to each of the two series. The CREDITSPREAD values for 1, 2 and 3 months ahead are simply judgmental estimates.
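
The forecasting procedure described above can be sketched as follows. This is a minimal sketch under stated assumptions: a least-squares AR(1) stands in for the ARIMA models (a deliberately simpler, swapped-in technique), each series is a plain list whose last element is month T, `coefs` holds hypothetical fitted OLS coefficients, and the credit-spread values are passed in directly to mirror the judgmental estimates.

```python
from statistics import mean


def fit_ar1(series):
    """Least-squares fit of x_t = c + phi * x_{t-1} + e_t (stand-in for ARIMA)."""
    prev, curr = series[:-1], series[1:]
    mp, mc = mean(prev), mean(curr)
    sxy = sum((a - mp) * (b - mc) for a, b in zip(prev, curr))
    sxx = sum((a - mp) ** 2 for a in prev)
    phi = sxy / sxx
    return mc - phi * mp, phi


def iterate_ar1(c, phi, last, steps):
    """Point forecasts 1, 2, ..., steps months ahead by iterating the AR(1)."""
    path, x = [], last
    for _ in range(steps):
        x = c + phi * x
        path.append(x)
    return path


def forecast_yoy(coefs, y_hist, credit_guesses, ip_hist, infl_hist, threshold=0.05):
    """1-, 2- and 3-month-ahead SP500YOY forecasts, with T = last element.

    coefs order (hypothetical): intercept, lagged SP500YOY, credit spread,
    industrial production, inflation, inflation dummy.
    credit_guesses: judgmental credit-spread values for T+1, T+2, T+3.
    y_hist must hold at least 12 months.
    """
    a, b_lag, b_credit, b_ip, b_infl, b_dummy = coefs
    ip_path = iterate_ar1(*fit_ar1(ip_hist), ip_hist[-1], 5)
    infl_path = iterate_ar1(*fit_ar1(infl_hist), infl_hist[-1], 5)
    forecasts = []
    for h in (1, 2, 3):
        y_lag = y_hist[h - 13]        # SP500YOY(T+h-12), already observed
        ip_lead = ip_path[h + 1]      # (h+2)-month-ahead forecast
        infl_lead = infl_path[h + 1]  # (h+2)-month-ahead forecast
        dummy = 1.0 if infl_lead > threshold else 0.0
        forecasts.append(a + b_lag * y_lag + b_credit * credit_guesses[h - 1]
                         + b_ip * ip_lead + b_infl * infl_lead + b_dummy * dummy)
    return forecasts
```

The sketch plugs point forecasts of the regressors into the fitted equation, so the uncertainty of the inflation and industrial-production forecasts does not appear anywhere in the resulting SP500YOY forecast.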


I'd like to know whether this OLS linear regression is correct or incorrect, efficient or inefficient, and generally valid statistical practice.


The first problem I see is the use of overlapping data: the daily values of the stock index are averaged within each month, and these averages are then used to compute yearly returns that are rolled forward one month at a time. Consecutive 12-month returns therefore share 11 months of data, which should make the error term autocorrelated. I would think that one has to apply some 'correction' along the lines of one of the following:
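
A small simulation illustrates why the overlap alone induces autocorrelation. It assumes independent monthly log returns (so a 12-month return is the sum of 12 monthly returns); the overlapping 12-month returns then behave like an MA(11) process whose theoretical autocorrelation at lag k is (12 - k)/12 for k < 12 and zero beyond. The within-month averaging in the question is not modelled here, and the volatility and seed are arbitrary choices.

```python
import random


def overlapping_sums(returns, window=12):
    """Rolling window-period sums, sampled every period (i.e. overlapping)."""
    csum = [0.0]
    for r in returns:
        csum.append(csum[-1] + r)
    return [csum[i + window] - csum[i] for i in range(len(returns) - window + 1)]


def autocorr(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    m = sum(x) / n
    dev = [v - m for v in x]
    num = sum(dev[i] * dev[i + lag] for i in range(n - lag))
    den = sum(d * d for d in dev)
    return num / den


rng = random.Random(42)
monthly = [rng.gauss(0.0, 0.04) for _ in range(50_000)]  # iid monthly log returns
yoy = overlapping_sums(monthly, 12)

# Compare sample autocorrelations with the MA(11) values (12 - k) / 12.
for lag in (1, 6, 11, 12):
    print(lag, round(autocorr(yoy, lag), 3), round(max(12 - lag, 0) / 12, 3))
```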



  • White's heteroscedasticity consistent covariance estimator

  • Newey & West heteroscedasticity and autocorrelation consistent (HAC) estimator

  • heteroscedasticity-consistent version of Hansen & Hodrick
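
The three corrections listed above differ mainly in how they weight the autocovariances of the score terms x_t*e_t. The numpy sketch below computes a sandwich covariance using lag 0 only (White/HC0), Bartlett weights up to lag 11 (Newey-West), or equal weights up to lag 11 (a uniform-kernel estimator in the spirit of Hansen & Hodrick, which is not guaranteed to be positive semi-definite). The data are simulated, mutually independent overlapping 12-month sums, so any fitted relationship is spurious; the lag choice of 11 (overlap minus one) and all function names are assumptions for illustration.

```python
import numpy as np


def ols(X, y):
    """OLS coefficients and residuals."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta


def hac_cov(X, resid, lags, kernel="bartlett"):
    """Sandwich covariance of the OLS coefficients.

    lags=0 gives White's HC0; kernel="bartlett" gives Newey-West;
    any other kernel value weights every lag equally (Hansen-Hodrick style).
    """
    n = X.shape[0]
    u = X * resid[:, None]                # score terms x_t * e_t, one row per t
    S = u.T @ u / n                       # lag-0 (heteroscedasticity) term
    for j in range(1, lags + 1):
        w = 1.0 - j / (lags + 1) if kernel == "bartlett" else 1.0
        G = u[j:].T @ u[:-j] / n          # lag-j autocovariance of the scores
        S += w * (G + G.T)
    bread = np.linalg.inv(X.T @ X / n)
    return bread @ S @ bread / n


def overlap(r, window=12):
    """Overlapping rolling sums of `window` consecutive monthly values."""
    c = np.concatenate(([0.0], np.cumsum(r)))
    return c[window:] - c[:-window]


# Simulated, mutually independent monthly series -> overlapping 12-month sums.
rng = np.random.default_rng(0)
x = overlap(rng.normal(size=612))
y = overlap(rng.normal(size=612))
X = np.column_stack([np.ones_like(x), x])
beta, e = ols(X, y)

naive = np.sqrt(np.diag(e @ e / (len(y) - X.shape[1]) * np.linalg.inv(X.T @ X)))
white = np.sqrt(np.diag(hac_cov(X, e, 0)))
nw = np.sqrt(np.diag(hac_cov(X, e, 11)))
hh = np.sqrt(np.diag(hac_cov(X, e, 11, kernel="uniform")))
print("slope", beta[1], "SE naive/White/NW/HH:", naive[1], white[1], nw[1], hh[1])
```

White's estimator only addresses heteroscedasticity, so on overlapping data it stays close to the naive standard error; the two kernels that include lags up to the overlap length are the ones that widen it. In statsmodels, a comparable Newey-West fit is available via `sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 11})`, although its small-sample defaults may differ slightly from this sketch.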


Does it really make sense to apply standard OLS linear regression (without any correction) to such overlapping data, and, beyond that, to feed 3- to 5-period-ahead ARIMA forecasts of the explanatory variables into the original OLS regression to forecast SP500YOY? I have not seen this form before and so cannot really judge it, apart from the need to correct for the overlapping observations.




Answer



This question was ultimately answered on Cross Validated.



Here are a couple of articles that deal with this subject:


Britten-Jones & Neuberger, Improved Inference and Estimation in Regression with Overlapping Observations


Harri & Brorsen, The Overlapping Data Problem


