Wednesday, September 23, 2015

How do I adjust a correlation matrix whose elements are generated from different market regimes?


Say I want to calculate a correlation matrix for 50 stocks using 3 years of daily historical data, and some of those stocks were listed only a year ago.


This is not technically challenging, because R's cor() function can optionally skip missing data and compute each pairwise correlation from the days the two series have in common (use = "pairwise.complete.obs"). But after some thought, it honestly worries me: all the correlations for a recently listed stock will be biased toward the regime it has lived in.



For example, a stock that has existed only for the last few months will inevitably show higher correlations than the stocks with the full 3-year history. I think this will cause bias and annoying problems in any downstream application that uses this correlation matrix.
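To make the concern concrete, here is a small NumPy sketch on synthetic data; the two-regime setup, factor loadings, and sample sizes are my own assumptions, not real market data. pairwise_corr is a hypothetical helper that mimics R's use = "pairwise.complete.obs": each pair uses only the days on which both series are observed. Every stock has the same factor exposure, yet the young stock's correlations come out higher, purely because they are measured in the turbulent regime alone.

```python
import numpy as np

def pairwise_corr(returns):
    """Pairwise-complete Pearson correlation (hypothetical helper): each pair
    uses only the rows where both series are observed (non-NaN)."""
    x = np.asarray(returns, dtype=float)
    n = x.shape[1]
    out = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            both = ~np.isnan(x[:, i]) & ~np.isnan(x[:, j])
            out[i, j] = out[j, i] = np.corrcoef(x[both, i], x[both, j])[0, 1]
    return out

# Synthetic two-regime market (illustrative assumption): 500 calm days with weak
# exposure to a common factor, then 250 days of strong co-movement.
rng = np.random.default_rng(0)
loading = np.r_[np.full(500, 0.2), np.full(250, 2.0)]
factor = rng.standard_normal(750)
returns = loading[:, None] * factor[:, None] + rng.standard_normal((750, 4))
returns[:500, 3] = np.nan  # stock 3 was listed only for the last 250 days

corr = pairwise_corr(returns)
print(corr.round(2))
# Stocks 0-2 are estimated over both regimes; stock 3 only over the
# high-correlation one, so its row comes out noticeably higher.
```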


So, my question is: how do I adjust a correlation matrix whose elements are generated from different market regimes? I feel caught in a dilemma. To make sure the whole matrix lives in the same regime, it seems I either have to use the shortest common sample or throw that stock away. Either way looks wasteful to me.


I know the correlations for that stock ought to be lower, but how much lower? Is there an approach or formula derived from solid arguments?


I did some research and was surprised to find that all the public research on correlation matrices assumes the data are adequate. How can that be possible in practice?! It's a mystery to me that this seemingly common problem has not been addressed publicly.



Answer



Quant Guy's list is really impressive! However, I am not sure those references will readily solve your specific problem. I think one piece is missing.


Please note that imputing missing data is a very broad topic. There are many recipes for imputing missing values, but each rests on its own assumptions and serves its own purpose. They are not necessarily designed to address your specific problem: the regime change.


To address your specific problem properly, you have to define the market regime quantitatively and make it part of your adjustment formula. Otherwise, it makes no logical sense to expect your model to be aware of the regime and react to it properly.


In Stambaugh's 1997 research (which I think is the most relevant reference Quant Guy listed), the adjustment actually relies on B = V21 * V11^(-1), i.e. Beta: the slopes from regressing the short-history assets' returns on the long-history assets' returns over the period they overlap. I have to say that not long afterwards, history taught us more than once how vulnerable beta is, especially in a rapidly changing market environment (though perhaps beta was still novel and less fragile in the '90s?).
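For reference, here is a minimal NumPy sketch of a combined-sample covariance estimator in the spirit of Stambaugh (1997), as I understand it; treat it as an illustration rather than a faithful reproduction of the paper. It assumes the young assets' history lines up with the last S rows of the long history and uses divide-by-n (maximum-likelihood) moments; the function name stambaugh_cov is my own. The point to notice is that everything the long history contributes reaches the young assets only through B.

```python
import numpy as np

def stambaugh_cov(long_hist, short_hist):
    """Combined-sample covariance estimate (sketch).

    long_hist:  T x K returns of assets with the full history.
    short_hist: S x M returns of young assets, aligned with the LAST S rows
                of long_hist (S <= T).
    """
    long_hist = np.asarray(long_hist, dtype=float)
    short_hist = np.asarray(short_hist, dtype=float)
    T, K = long_hist.shape
    S = short_hist.shape[0]
    overlap = long_hist[T - S:]

    # Moments over the overlapping window only (divide by n).
    V = np.cov(np.hstack([overlap, short_hist]), rowvar=False, bias=True)
    V11, V21, V22 = V[:K, :K], V[K:, :K], V[K:, K:]

    # B = V21 * V11^(-1): regression slopes of young on long-history assets.
    # V11 is symmetric, so solve(V11, V21.T).T equals V21 @ inv(V11).
    B = np.linalg.solve(V11, V21.T).T

    # Full-history covariance of the long-history assets.
    S11 = np.atleast_2d(np.cov(long_hist, rowvar=False, bias=True))

    # Carry the full-history information over to the young assets through B.
    S21 = B @ S11
    S22 = V22 + B @ (S11 - V11) @ B.T
    return np.block([[S11, S21.T], [S21, S22]])

# Usage on synthetic data (illustrative only).
rng = np.random.default_rng(1)
x = rng.standard_normal((300, 3))                                   # 3 long-history assets
y = x @ np.array([[0.5], [0.2], [-0.1]]) + rng.standard_normal((300, 1))
sigma = stambaugh_cov(x, y[-100:])                                  # young asset: last 100 days
print(sigma.round(3))
```

When the young history covers the whole sample (S = T), the estimate reduces to the ordinary sample covariance, which is a convenient sanity check.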


Now let's define the market regime quantitatively. Intuitively, average correlation is a pretty neat regime indicator: simple, intuitive, and easy to use in a proprietary model (and I feel that's why Ledoit-Wolf's constant-correlation model is so popular :)). But, as Branson pointed out in Ian's answer, it can produce very undesirable results.
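As a hypothetical illustration (my assumption, not necessarily the exact issue Branson raised), here is what can happen if a ratio of average correlations is applied directly to correlations: when the young asset's window was calmer than the full history, the multiplier exceeds 1 and a strong correlation is pushed past 1. The matrices are made-up numbers.

```python
import numpy as np

def average_corr(corr):
    """Regime indicator: mean of the off-diagonal correlations."""
    corr = np.asarray(corr, dtype=float)
    return float(corr[np.triu_indices_from(corr, k=1)].mean())

# Made-up numbers: the full history was turbulent, the young asset's window calm.
full_window = np.array([[1.0, 0.6, 0.5],
                        [0.6, 1.0, 0.7],
                        [0.5, 0.7, 1.0]])
short_window = np.array([[1.0, 0.3, 0.2],
                         [0.3, 1.0, 0.4],
                         [0.2, 0.4, 1.0]])
ratio = average_corr(full_window) / average_corr(short_window)  # about 2

# Naively rescaling a young asset's correlation in correlation space:
young_corr = 0.7
naive = young_corr * ratio
print(ratio, naive)  # naive lands above 1, which is not a valid correlation
```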



One potential solution is to map the intuitive indicator into a more suitable space, operate there, and then transform the result back. This is a very useful technique that is commonly employed in machine learning. Correlation lives in the tightly constrained interval [-1, 1], which greatly restricts what we can do with it. (Please don't think covariance is less constrained. Once you put covariances together in a matrix, trust me, the matrix must still be positive semidefinite, which makes it just as constrained as a correlation matrix. Correlation is actually easier to work with, because possible problems are easier to spot.)
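A small NumPy check shows what "constrained" means beyond the [-1, 1] bound: every entry below is a legal correlation, yet the matrix as a whole is not a valid correlation matrix, and rescaling it into a covariance matrix does not help. The numbers and volatilities are made up for illustration.

```python
import numpy as np

# Every entry lies in [-1, 1], but the matrix is inconsistent: A and B move
# together, B and C move together, yet A and C supposedly move in opposite directions.
corr = np.array([[ 1.0, 0.9, -0.9],
                 [ 0.9, 1.0,  0.9],
                 [-0.9, 0.9,  1.0]])
print(np.linalg.eigvalsh(corr))  # one eigenvalue is negative: not positive semidefinite

# Scaling by arbitrary volatilities, D @ corr @ D, keeps the same number of
# negative eigenvalues (Sylvester's law of inertia), so the covariance is invalid too.
vols = np.diag([0.10, 0.25, 0.40])
cov = vols @ corr @ vols
print(np.linalg.eigvalsh(cov))
```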


Now, how about mapping correlation to an equally intuitive (at least to me) but less constrained space,



Signal-to-Noise Ratio (SNR) = Correlation^2 / (1 - Correlation^2)


(inverse: |Correlation| = sqrt(SNR / (1 + SNR)), with the sign taken from the original correlation)



and redefine my regime indicator as the median pairwise SNR. (I rarely use averages in financial applications.)
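The two maps and the median-SNR indicator can be sketched in NumPy as follows. The function names are my own, and passing the original correlation back in to restore the sign is my assumption about how the inverse is applied, since SNR itself discards the sign. The example matrix is made up; it shows that one extreme pair (0.95) inflates the mean SNR but barely moves the median.

```python
import numpy as np

def corr_to_snr(rho):
    """SNR = rho^2 / (1 - rho^2): maps (-1, 1) onto [0, inf); rho = +/-1 gives inf."""
    rho = np.asarray(rho, dtype=float)
    return rho ** 2 / (1.0 - rho ** 2)

def snr_to_corr(snr, sign):
    """Inverse map; SNR drops the sign, so it is restored from `sign`."""
    snr = np.asarray(snr, dtype=float)
    return np.sign(sign) * np.sqrt(snr / (1.0 + snr))

def median_snr(corr):
    """Regime indicator: median SNR over the off-diagonal (pairwise) entries."""
    corr = np.asarray(corr, dtype=float)
    return float(np.median(corr_to_snr(corr[np.triu_indices_from(corr, k=1)])))

# Round trip on a few correlations.
rho = np.array([-0.8, -0.3, 0.0, 0.5, 0.9])
print(snr_to_corr(corr_to_snr(rho), rho))  # recovers rho (up to rounding)

# Made-up 4x4 matrix with one extreme pair (0.95).
corr = np.array([[1.0, 0.5, 0.2, 0.3],
                 [0.5, 1.0, 0.4, 0.1],
                 [0.2, 0.4, 1.0, 0.95],
                 [0.3, 0.1, 0.95, 1.0]])
mean_snr = float(np.mean(corr_to_snr(corr[np.triu_indices_from(corr, k=1)])))
print(mean_snr, median_snr(corr))  # the 0.95 pair dominates the mean only
```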


I don't know how other people feel about SNR, but with a background in EE I feel very comfortable with it. In communication systems, SNR is exactly the regime (environment) indicator that characterizes a channel. I see a strong analogy here.
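One way to see why the analogy holds is a standard regression identity (my addition, not part of the original argument): regress one return on the other, y = βx + ε with ε uncorrelated with x. Then ρ² is the fraction of variance explained, and SNR is literally explained ("signal") variance over residual ("noise") variance.

```latex
\sigma_y^2 = \beta^2 \sigma_x^2 + \sigma_\varepsilon^2, \qquad
\rho^2 = \frac{\beta^2 \sigma_x^2}{\sigma_y^2}, \qquad
1 - \rho^2 = \frac{\sigma_\varepsilon^2}{\sigma_y^2}
\quad\Longrightarrow\quad
\mathrm{SNR} = \frac{\rho^2}{1 - \rho^2} = \frac{\beta^2 \sigma_x^2}{\sigma_\varepsilon^2}
```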


The remaining work is straightforward. I use the ratio of my two regime indicators (one measured over the full history, one over the young asset's shorter window) as a multiplier on the young asset's pairwise SNRs against the other assets, then map the adjusted SNRs back to correlations.
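Here is a minimal end-to-end sketch under several assumptions of my own: a single young asset, all other columns complete, the multiplier taken as (full-history median SNR) / (median SNR over the young asset's window), and both medians computed on the same mature-asset pairs so they are comparable. adjust_young_asset and the synthetic data are hypothetical, not the exact code behind this answer.

```python
import numpy as np

def corr_to_snr(rho):
    return rho ** 2 / (1.0 - rho ** 2)

def snr_to_corr(snr, sign):
    return np.sign(sign) * np.sqrt(snr / (1.0 + snr))

def median_snr(corr):
    return float(np.median(corr_to_snr(corr[np.triu_indices_from(corr, k=1)])))

def adjust_young_asset(returns, young):
    """Regime-adjust the correlations of one recently listed asset (sketch).

    returns: T x N daily returns; the young column is NaN before listing,
             all other columns are assumed complete.
    young:   column index of the young asset.
    """
    returns = np.asarray(returns, dtype=float)
    n = returns.shape[1]
    live = ~np.isnan(returns[:, young])                  # the young asset's window
    mature = np.array([j for j in range(n) if j != young])

    # Regime indicators measured on the SAME mature pairs in both windows.
    full_corr = np.corrcoef(returns[:, mature], rowvar=False)
    short_corr = np.corrcoef(returns[live][:, mature], rowvar=False)
    ratio = median_snr(full_corr) / median_snr(short_corr)

    # The young asset's raw correlations can only come from its own window.
    raw = np.corrcoef(returns[live], rowvar=False)[young, mature]
    adjusted = snr_to_corr(ratio * corr_to_snr(raw), raw)

    out = np.eye(n)
    out[np.ix_(mature, mature)] = full_corr
    out[young, mature] = adjusted
    out[mature, young] = adjusted
    return out, raw, ratio

# Synthetic two-regime data (illustrative): 500 calm days, then 250 turbulent days.
rng = np.random.default_rng(0)
loading = np.r_[np.full(500, 0.2), np.full(250, 2.0)]
returns = loading[:, None] * rng.standard_normal((750, 1)) + rng.standard_normal((750, 5))
returns[:500, 4] = np.nan  # stock 4 was listed only a year ago
corr, raw, ratio = adjust_young_asset(returns, young=4)
print(round(ratio, 2), raw.round(2), corr[4, :4].round(2))
```

Because every stock here has identical factor exposure by construction, the adjusted correlations of the young stock should land near the mature stocks' full-history correlations. The sketch does not guarantee a positive semidefinite result, so checking np.linalg.eigvalsh before downstream use seems prudent.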


You will at least gain the following benefits using this approach:




  • Correlations won't blow up as in your first attempt

  • Original ranking of pairwise correlations (with short-lived assets) is preserved

  • Much easier to implement. No need to impute missing data.

  • Intuitive (to me), and easy to understand what's going on in your code.

  • This approach is compatible with many other techniques in Quant Guy's references such as Ledoit-Wolf Shrinkage, RMT, and weighted representative covariance matrices.


Last but not least, this idea came out of a collaboration with one of my most brilliant colleagues and a close friend, Manish Agarwal.

