Friday, July 10, 2015

equities - How can you determine the correct significance of the Shiller P/E regression?


The "Shiller P/E regression" refers to the regression of real stock market returns over the next 20 years on the Shiller P/E. When I ran this OLS regression myself (based on the data from Prof. Shiller), I got an R-squared of about 67% and a p-value for the regression of essentially zero (corresponding F-statistic of over 1600).


However, are these OLS statistics (R-squared, p-value) the right ones for judging the goodness of fit and the significance of the regression? I am worried about overlapping time periods in the sample: the first observation contains annualized real returns from Jan 1928 to Jan 1948, the second contains annualized real returns from Feb 1928 to Feb 1948, so the two observations overlap for a full 19 years and 11 months. Doesn't this lower the "information content" of my sample compared to a sample containing only non-overlapping time periods?


Note that this is an issue mentioned by Prof. Shiller himself in his book Irrational Exuberance (I have the 2nd edition from 2005 in front of me, p. 187):



The relation between price-earnings ratios and subsequent returns appears to be moderately strong, though there are questions about its statistical significance, since there are fewer than twelve nonoverlapping ten-year intervals in the 155 years' worth of data.




In the corresponding footnote, he further writes (p. 261 in my book):



However, the actual academic literature has still not resolved the question of statistical significance. There are unresolved statistical complexities, notably those due to the problem of (near) unit roots in the ratios and the dependency of both independent and dependent variables on price. There are other statistical issues too: a tendency toward rare big outlier observations, issues of the relevance of asymptotic distribution theory in small samples, questions about regime change, and measurement issues for the underlying data, as well as difficulty interpreting complex statistical evidence that has been selectively presented by a researcher who may have a preconceived bias.



He then presents a series of academic papers that address some of these issues and that arrive at differing conclusions. However, most of the papers cited in my edition (2005) are by now a bit dated and hence I am wondering whether a consensus has by now evolved in the academic literature.


(I am aware of this question that deals with overlapping time periods but I was wondering whether any progress has been made specifically on the topic of the "Shiller P/E regression".)



Answer




Let $r_{t \rightarrow t+k}$ be the log return from time $t$ to $t+k$. Imagine you're running a regression forecasting $k$ year returns using yearly data:



$$ r_{t \rightarrow t + k} = a + b x_t + \epsilon_{t+k}$$

where $x_t$ is the predictor observed at time $t$ (here, the Shiller P/E). Your intuition is correct that there's a problem with conventional standard errors.




  • For reference, with non-overlapping one-period returns $r_{t \rightarrow t+1}$, the error terms can generally be assumed to have zero autocorrelation, so conventional standard errors are not an issue.




  • But with overlapping $k$ period returns, the error terms $\epsilon_{t}, \ldots, \epsilon_{t+k-1}$ will be autocorrelated. (Two adjacent $k$ period returns will have $k-1$ overlapping periods.)
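To make the two bullets concrete, here is a minimal simulation sketch in stdlib-only Python (not part of the original answer). It assumes one-period log returns are i.i.d. normal, which is an illustrative assumption rather than a claim about real data; the seed, the volatility of 0.04, the sample length and the helper names `overlapping_sums` and `autocorr` are all made up for the example. Under that assumption, two $k$-period returns that start $j$ periods apart share $k-j$ one-period returns, so their correlation is $1 - j/k$ for $j < k$ and zero for $j \geq k$.

```python
import random


def overlapping_sums(one_period, k):
    """All overlapping k-period log returns (sums of k consecutive one-period returns)."""
    return [sum(one_period[t:t + k]) for t in range(len(one_period) - k + 1)]


def autocorr(series, lag):
    """Sample autocorrelation of `series` at the given lag."""
    mean = sum(series) / len(series)
    dev = [v - mean for v in series]
    denom = sum(d * d for d in dev)
    numer = sum(dev[t] * dev[t + lag] for t in range(len(dev) - lag))
    return numer / denom


rng = random.Random(0)  # arbitrary fixed seed so the run is reproducible
returns = [rng.gauss(0.0, 0.04) for _ in range(20000)]  # i.i.d. by construction

k = 5
long_horizon = overlapping_sums(returns, k)

rho_one_period = autocorr(returns, 1)
rho_overlap = [autocorr(long_horizon, j) for j in range(1, k + 2)]

print(f"one-period returns, lag 1: {rho_one_period:.3f}")
for j, rho in enumerate(rho_overlap, start=1):
    theory = max(0.0, 1 - j / k)
    print(f"{k}-period returns, lag {j}: simulated {rho:.3f}, theory {theory:.3f}")
```

The one-period autocorrelation should come out close to zero, while the overlapping series should show autocorrelations that decline roughly linearly and vanish from lag $k$ onward. Applied to the question's setup (monthly observations of 20-year returns, so $k = 240$), the same formula gives a correlation of $239/240 \approx 0.996$ between adjacent observations under the i.i.d. assumption.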




Hansen Hodrick (1980) standard errors



A sensible starting point is to compute Hansen Hodrick (1980) standard errors using $k-1$ lags, one for each period of overlap between adjacent $k$ period returns. Some MATLAB code below:


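% Inputs (assumed to be in the workspace already):
%   y : n x 1 vector of k-period returns
%   X : n x cols2 regressor matrix (include a column of ones for the intercept)
%   k : return horizon, in periods
[n, cols2] = size(X);
b = X \ y;                                    % OLS: b = (X'X)^-1 (X'y)
resid = y - X * b;
Sxx = (X' * X) / n;
residrep = resid * ones(1, cols2);            % residuals replicated across columns
V = (X .* residrep)' * (X .* residrep) / n;   % lag-0 term
for i = 1:k-1
    % lag-i cross term; every lag gets equal weight
    V_lag = (X(1:end-i,:) .* residrep(1:end-i,:))' * (X(i+1:end,:) .* residrep(i+1:end,:)) / n;
    V = V + V_lag + V_lag';
end

Sxx_inv = inv(Sxx);
bcov = Sxx_inv * V * Sxx_inv / n;             % covariance matrix of b
se = sqrt(diag(bcov));                        % standard errors of the coefficients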

John Cochrane's book Asset Pricing also discusses how to do this on p. 209.
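
For reference, the quantity the MATLAB code above computes can be written out as follows. This is my reading of the code rather than a quotation from Hansen and Hodrick or from Cochrane: $x_t$ denotes the $t$-th row of `X` written as a column vector, $\hat\epsilon_t$ the residual of the $t$-th observation (indexed by row, not by the date at which the return is realized), and $n$ the number of observations.

$$ \hat S_{xx} = \frac{1}{n}\sum_{t=1}^{n} x_t x_t', \qquad \hat\Gamma_j = \frac{1}{n}\sum_{t=1}^{n-j} x_t \hat\epsilon_t \hat\epsilon_{t+j} x_{t+j}' $$

$$ \hat V = \hat\Gamma_0 + \sum_{j=1}^{k-1} \left( \hat\Gamma_j + \hat\Gamma_j' \right), \qquad \widehat{\mathrm{Var}}(\hat b) = \frac{1}{n} \hat S_{xx}^{-1} \hat V \hat S_{xx}^{-1} $$

Dropping every term with $j \geq 1$ leaves the usual heteroskedasticity-robust covariance matrix, so the lagged terms are exactly where the overlap enters. Because all $k-1$ lags receive equal weight, $\hat V$ is not guaranteed to be positive semidefinite in a finite sample, so it is worth checking that the diagonal of the resulting covariance matrix is positive before taking square roots.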


Other approaches?


Glancing through the literature, there's some debate about the small sample properties of the Hansen Hodrick (1980) correction, and several alternative corrections have been proposed. The answer you linked to on Cross Validated provides some references on the subject.


References


Ang, Andrew and Geert Bekaert, 2007, "Stock Return Predictability: Is It There?" Review of Financial Studies


Cochrane, John, 2005, Asset Pricing (revised edition), p. 209


Cochrane, John, 2011, "Discount Rates," Journal of Finance



Hansen, Lars P. and Robert J. Hodrick, 1980, "Forward Exchange Rates as Optimal Predictors of Future Spot Rates: An Econometric Analysis," Journal of Political Economy

