statistics - How to interpolate gaps in a time series using closely related time series?

Sunday, December 4, 2016


I am trying to construct a daily time series of prices and returns for some large universe of securities. However, all I have available are a monthly time series of the prices/returns (as well as other characteristics) of the individual securities, a daily time series of a market-value-weighted index of all securities, and weekly time series of various sub-indices.

The constructed time series will ultimately be used to estimate parameters of a more general model, such as the probability of a security's issuer taking some action (e.g. refinancing their debt) as a function of the security's price. Therefore I feel it is not important to maintain causality. The issuer presumably knows the true price when taking the action, even though I do not, and I need to construct a best guess as to what the price was given everything I know today.

Note: it is not possible to obtain higher frequency data at the individual security level, either because the securities themselves do not trade that often, or because (AFAIK) nobody collects the data. The goal is to interpolate a reasonable-looking set of daily prices and returns based on all available information. Any advice on how to carry out this estimation would be appreciated.

I have some of my own ideas, which I may share after a while, but right now I'm still in the exploratory phase and I'm looking for some additional inspiration.

Just to make it clear what I mean by way of example, suppose I wanted to find the daily prices of all 1500 stocks in the S&P 1500, but all I had were monthly prices for the stocks, weekly prices for the 10 GICS sector indices and for the large cap 500, mid cap 400, and small cap 600, and daily prices for the S&P 1500 as a whole.

The purpose, in that example, would be to fit a model of announcements of share buybacks and secondary offerings based on interpolated valuation metrics.

UPDATE: One answer suggested applying the Expectation-Maximization algorithm. As far as I can tell, EM is not applicable to this problem. Applying EM to prices, one gets a sawtooth pattern where the filled values are on a different plane from the known values. I can't figure out a way to apply EM to returns, since I'm not missing any monthly returns, and I'm missing all daily/weekly returns for the individual securities.

Answer

You must apply the E-M algorithm to an invariant (a time-homogeneous i.i.d. variable) such as log-returns -- not prices.
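To make the distinction concrete, here is a minimal sketch of moving between prices and log-returns. The function names and the sample prices are made up for illustration; they are not part of the question's data.

```python
import numpy as np

def log_returns(prices):
    """Log-returns between consecutive prices: r_t = ln(P_t / P_{t-1})."""
    prices = np.asarray(prices, dtype=float)
    return np.diff(np.log(prices))

def to_arithmetic(log_rets):
    """Convert log-returns back to arithmetic returns: exp(r) - 1."""
    return np.expm1(np.asarray(log_rets, dtype=float))

# Hypothetical prices, for illustration only.
prices = [100.0, 110.0, 99.0]
r = log_returns(prices)
print(r)                 # log-returns add up across periods
print(to_arithmetic(r))  # arithmetic returns do not
```

Log-returns sum over time, so the log-returns within a month add up to the monthly log-return. That additivity is what makes them a convenient quantity to impute.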

The key to E-M is the simplifying assumption that both the invariant (here, the returns) and the pattern of missing values are i.i.d. Prices do not obey this property. The trick of assuming an i.i.d. invariant and then proceeding to impute originates with Little and Rubin (1987).

In your case, however, the missing values are clearly not randomly distributed. The literature refers to this case as "Not Missing at Random". You can run some tests or rely on theory to determine whether it is valid to assume the data are "Missing at Random" (MAR) or "Missing Completely at Random" (MCAR).

The bibliography of the paper Multiple Imputation for Missing Data (2003) cites the key papers in this area.

EDIT:

I read your update and noticed that you have only monthly returns for the individual securities, not daily or weekly ones.

Here's one approach where you can still make the E-M method work.

At the monthly level, you have the security returns alongside the returns for the various indices and sub-indices. Estimate the mean vector and the covariance matrix of the monthly log-returns across all of these assets. Now project both to a daily level: under the i.i.d. assumption, simply divide the mean vector and the covariance matrix by the number of trading days in a month. You have daily returns for the S&P 1500. Now fill in the missing entries by replacing each missing daily log-return with its expected value conditional on the observed daily log-returns of the S&P 1500 (using the E-M algorithm). The final step is to convert the log-returns back to arithmetic returns.

Note that you are assuming that the correlation structure is stable over the estimation period. Your output would consist of normally distributed, well-behaved returns. These are the usual defects of the E-M approach.
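A rough sketch of that fill-in step, assuming joint normality of the log-returns. It shows only the conditional-expectation step with the parameters held at their scaled monthly estimates, not a full E-M iteration, and it conditions on the daily index alone (the weekly sub-indices are ignored). The function name, the convention that column 0 is the index, and the default of 21 trading days per month are my own assumptions.

```python
import numpy as np

def expected_daily_log_returns(mu_monthly, cov_monthly, index_daily, n_days=21):
    """Expected daily log-returns of each security given the daily index.

    mu_monthly  : (k,) mean monthly log-returns; entry 0 is the index (assumed layout).
    cov_monthly : (k, k) covariance of monthly log-returns, same ordering.
    index_daily : (T,) observed daily log-returns of the index.
    n_days      : assumed number of trading days per month.
    Returns a (T, k-1) array, one column per security.
    """
    # Scale the monthly moments to daily ones (i.i.d. assumption).
    mu_d = np.asarray(mu_monthly, dtype=float) / n_days
    cov_d = np.asarray(cov_monthly, dtype=float) / n_days

    # Gaussian conditional mean: mu_s + cov(s, idx) / var(idx) * (x_idx - mu_idx).
    beta = cov_d[1:, 0] / cov_d[0, 0]
    deviation = np.asarray(index_daily, dtype=float) - mu_d[0]
    return mu_d[1:] + np.outer(deviation, beta)

# Hypothetical inputs: one index and one security.
mu_m = np.array([0.021, 0.042])
cov_m = np.array([[0.0021, 0.0042],
                  [0.0042, 0.0105]])
filled = expected_daily_log_returns(mu_m, cov_m, [0.001, 0.011])
arithmetic = np.expm1(filled)  # convert log-returns back to arithmetic returns
```

With a single conditioning variable this reduces to a regression of each security on the index, so every filled-in series is a scaled and shifted copy of the index. The filled-in daily returns are also not forced to add up to the known monthly return of each security.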
