risk management - Empirical distribution function of overlapping time series data

Monday, February 6, 2017

If we model asset return volatility over horizons longer than one period (say, longer than one day), there is the square-root-of-time rule, which holds under some assumptions. The situation is trickier if we look at the empirical distribution function.

To tackle this problem, practitioners sometimes use rolling, overlapping data. Treating these as if they were non-overlapping seems wrong to me (it is wrong), but how wrong is it, and how can the approach be fixed?

By "wrong" I mean that the distribution of $\sum_{i=1}^{180} r_i$, with the $r_i$ sampled uncorrelated, will differ from the empirical distribution of the sample of overlapping sums.

I heard about the following modelling approach: take a sample of $1000$ daily observations (daily returns/percentage changes), build rolling $180$-day returns from them, and finally look at the empirical distribution function (edf) and the empirical quantiles of these rolling/overlapping returns.

Mathematically they have $(r_i)_{i=1}^{1000}$ and then they look at $y_1 = \sum_{i=1}^{180} r_i, \quad y_2 = \sum_{i=2}^{181} r_i, \quad y_3 = \sum_{i=3}^{182} r_i, \cdots$
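The construction above can be sketched in a few lines of Python. This is only an illustration: the helper name `rolling_sums` and the simulated placeholder returns are assumptions, not part of the approach as it was described to me.

```python
import numpy as np

def rolling_sums(r, k):
    """Return all overlapping k-period sums y_i = r_i + ... + r_{i+k-1}."""
    r = np.asarray(r, dtype=float)
    if not 1 <= k <= r.size:
        raise ValueError("k must be between 1 and len(r)")
    # Prefix sums with a leading zero: c[j] = r[0] + ... + r[j-1]
    c = np.concatenate(([0.0], np.cumsum(r)))
    return c[k:] - c[:-k]

# Placeholder data: 1000 simulated daily returns (assumed, for illustration only)
rng = np.random.default_rng(0)
r = rng.normal(0.0, 0.01, size=1000)

y = rolling_sums(r, 180)
print(y.shape)  # (821,)
```

Note that $1000$ daily observations yield $1000 - 180 + 1 = 821$ overlapping windows.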

The sample $(y_i)_{i=1}^{821}$ consists of strongly dependent random variables. What are the properties of its edf? How does it relate to the edf of the sample $(r_i)_{i=1}^{1000}$? How can we relate the volatility estimates?

Since we are talking about asset returns, we can assume the $(r_i)_{i=1}^{1000}$ to be serially uncorrelated but not independent. This makes a rigorous treatment difficult.

This is a cross post as the question did not receive enough attention (after several days and a bounty).

What can be said about e.g. the variance of yearly data based on overlapping data (e.g. Jan 1st 2015 to Jan 1st 2016, Jan 2nd 2015 to Jan 2nd 2016, Jan 3rd 2015 to Jan 3rd 2016, and so forth)? The empirical distribution function may be too hard, but what about the variance?

Summing up the aim is:

Estimate a quantile of the distribution of the return/loss of the following year (using the overlapping yearly data of the past).

Estimate the variance of the return/loss of the following year (using the overlapping yearly data of the past).

The result could be a set of properties of the estimators of the above quantities when we only have the overlapping data. There could be bias which has to be corrected.

EDIT: As some users point out, one way to view this is as a bootstrap statistic. I started reading Bootstrap Methods for Finance: Review and Analysis and the references therein. The bootstrap point of view looks most promising. I have not yet arrived at a full answer.

Answer

The quantile and variance relationship depends on the data you are analyzing. Consider first the case where your data is (statistically) nice. That is, your one-period returns $r_i$ are all Gaussian and i.i.d.

We form sums $y_i = \sum_{j=i}^{i+k-1} r_j$ of $k$ consecutive returns, which are of course themselves Gaussian but, as you note, are now highly autocorrelated for $k>1$.

Following Harri and Brorsen, we see that the error terms of any OLS fit to the $y_i$ will have variance $k$ times the variance of the error terms of an OLS fit to one-period returns. Since the mean estimator can be viewed as a trivial OLS regression (on a constant), we have that the mean is $\bar{y} = k\bar{r}$ (up to end effects) and

$\mathrm{Var}\left[y_i-\bar{y}\right] = k \, \mathrm{Var}\left[r_i-\bar{r}\right]$
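A quick simulation can serve as a sanity check of this scaling. The sketch below assumes i.i.d. Gaussian one-period returns; the sample size, window length and volatility are illustrative choices, not values from the question. A long series is used on purpose so that the sample variance of the overlapping sums is not dominated by estimation noise.

```python
import numpy as np

rng = np.random.default_rng(42)
n, k, sigma = 200_000, 20, 0.01      # illustrative values (assumed)
r = rng.normal(0.0, sigma, size=n)   # i.i.d. Gaussian one-period returns

# Overlapping k-period sums via prefix sums
c = np.concatenate(([0.0], np.cumsum(r)))
y = c[k:] - c[:-k]

var_r = r.var(ddof=1)
var_y = y.var(ddof=1)
ratio = var_y / var_r
print(f"Var[y] / Var[r] = {ratio:.2f} (theory: {k})")
```

With a long sample the ratio should land close to $k$; with short samples such as $n = 1000$ and $k = 180$ the estimate is much noisier.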

Because everything is Gaussian, the mean and variance determine the distribution function, and hence every quantile, completely.
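In this idealized Gaussian i.i.d. case, the $\alpha$-quantile of a $k$-period return is $k\mu + \sqrt{k}\,\sigma\,z_\alpha$, where $z_\alpha$ is the standard normal quantile. A minimal sketch using only the Python standard library follows; the function name and the parameter values are hypothetical and chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def gaussian_k_period_quantile(mu, sigma, k, alpha):
    """alpha-quantile of a sum of k i.i.d. N(mu, sigma^2) one-period returns."""
    # The sum is N(k * mu, k * sigma^2), so its standard deviation is sqrt(k) * sigma
    return NormalDist(mu=k * mu, sigma=sqrt(k) * sigma).inv_cdf(alpha)

# Example: 1% quantile of a 180-day return with zero mean and 1% daily volatility
q = gaussian_k_period_quantile(mu=0.0, sigma=0.01, k=180, alpha=0.01)
print(q)
```

This only restates the square-root rule in quantile form; it says nothing about non-Gaussian or dependent returns.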

If this were the whole story, we would conclude that there's never any real reason to oversample like this, and leave it at that.

Things get more interesting when we assume there is some underlying trend in the $r_i$ or a pattern in their measurement errors. For example, if some flaws in accounting result in quarterly sales-to-inventory ratios exhibiting seasonal patterns, then those patterns will be "washed out" in the annual sum. Essentially, we will have exploited the error structure to obtain

$\mathrm{Var}\left[y_i-\bar{y}\right] < k \,\mathrm{Var}\left[r_i-\bar{r}\right].$

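Edit: You asked about the covariance structure of $y$. For any two observations $y_\ell$ and $y_j$, each a sum of $k$ consecutive returns, we have (extending Harri and Brorsen)

$\mathrm{Cov}(y_\ell, y_j) = \mathrm{Var}(\{r_i\}) \,(k-|\ell - j|)^+$

where $(\cdot)^+$ indicates the function that returns $\max(\cdot, 0)$. The covariance is the one-period variance times the number of returns the two windows share, so setting $\ell = j$ recovers the variance $k\,\mathrm{Var}(\{r_i\})$.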
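The covariance structure $\sigma^2\,(k-|\ell-j|)^+$ for windows of length $k$ can be checked exactly, without simulation, by writing the overlapping sums as $y = A r$ with a 0/1 window matrix $A$, so that $\mathrm{Cov}(y) = \sigma^2 A A^\top$ for uncorrelated returns with constant variance $\sigma^2$. The sketch below also uses this matrix to compute the expected value of the usual sample variance of the $y_\ell$ relative to $k\sigma^2$, which is one way to quantify the small-sample bias the question asks about. The function name `overlap_cov` and the choice $n = 1000$, $k = 180$ (taken from the question) are for illustration only.

```python
import numpy as np

def overlap_cov(m, k, sigma2=1.0):
    """Covariance matrix of m consecutive overlapping k-sums: sigma2 * (k - |l - j|)^+."""
    idx = np.arange(m)
    lag = np.abs(idx[:, None] - idx[None, :])
    return sigma2 * np.maximum(k - lag, 0)

n, k = 1000, 180
m = n - k + 1                      # number of overlapping windows

# Window matrix A: row t has ones in columns t .. t+k-1, so that y = A @ r
A = np.zeros((m, n))
for t in range(m):
    A[t, t:t + k] = 1.0

Sigma = overlap_cov(m, k)          # theoretical covariance with sigma^2 = 1
assert np.allclose(A @ A.T, Sigma)

# E[s^2] for s^2 = sum((y - ybar)^2) / (m - 1), assuming uncorrelated,
# constant-variance returns: (trace(Sigma) - 1' Sigma 1 / m) / (m - 1)
expected_s2 = (np.trace(Sigma) - Sigma.sum() / m) / (m - 1)
bias_ratio = expected_s2 / k       # equals 1 for an unbiased estimator of k * sigma^2
print(f"E[s^2] / (k sigma^2) = {bias_ratio:.3f}")
```

Under these assumptions the ratio comes out below one, i.e. the naive sample variance of overlapping sums underestimates $k\sigma^2$ when the window length is not small relative to the sample.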
