Saturday, March 5, 2016

Which kind of normalization to prefer before PCA (generic solution for any factor analysis)


I have financial assets with very different volatilities, so I must standardize them before PCA; otherwise the high-variance assets may dominate the principal components, which would be wrong.


At the moment I am trying to decide among the following methods:




  • Convert everything to USD, divide each series by its volatility or deviation coefficient, then subtract the mean

  • Take the logarithm of each price, then subtract the mean

  • Rescale each time series to the range [-1, 1]
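
The three candidate transformations can be sketched as follows. This is only a minimal illustration in Python with NumPy, not the code behind the plots: the function names and the synthetic price matrix are assumptions, prices are assumed to be already converted to USD, and "volatility" is taken here to mean the sample standard deviation of each price series.

```python
import numpy as np

def scale_by_volatility(prices):
    """Method 1: divide each column by its standard deviation, then subtract the mean."""
    scaled = prices / prices.std(axis=0, ddof=1)
    return scaled - scaled.mean(axis=0)

def log_demean(prices):
    """Method 2: take the logarithm of each price, then subtract the mean."""
    logs = np.log(prices)
    return logs - logs.mean(axis=0)

def rescale_to_unit_range(prices):
    """Method 3: map each column linearly onto [-1, 1]."""
    lo = prices.min(axis=0)
    hi = prices.max(axis=0)
    return 2.0 * (prices - lo) / (hi - lo) - 1.0

# Hypothetical data: 250 observations of 3 assets at very different price levels.
rng = np.random.default_rng(0)
steps = rng.normal(0.0, 0.01, size=(250, 3))
prices = np.array([10.0, 100.0, 1000.0]) * np.exp(np.cumsum(steps, axis=0))

by_vol = scale_by_volatility(prices)      # zero mean, unit variance per column
by_log = log_demean(prices)               # zero mean, variance of log prices kept
by_range = rescale_to_unit_range(prices)  # each column spans exactly [-1, 1]
```

Under these assumptions, the first method leaves every column with unit variance, so a covariance-based PCA of its output coincides with a correlation-based PCA. The log transform keeps each asset's relative variability, and the range rescaling depends only on each series' minimum and maximum.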


The image below shows what the time series look like after each transformation; the coefficients on the right are the first eigenvector.


Question: as you can see in the image, the coefficients differ considerably between methods. Which standardization method is most appropriate in this case and does not bias the calculations?


Purpose: I do not need unit-length vectors, so I compute PCA from the covariance matrix and want vectors that actually represent the projection of each asset onto the selected principal component.
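
A covariance-based PCA of this kind could look roughly like the sketch below. The function name `first_component` and the random input are illustrative assumptions; the input can be any matrix of transformed series with one column per asset.

```python
import numpy as np

def first_component(data):
    """Return the first eigenvector of the covariance matrix and its eigenvalue."""
    cov = np.cov(data, rowvar=False)                 # columns are assets
    eigenvalues, eigenvectors = np.linalg.eigh(cov)  # eigenvalues in ascending order
    vec = eigenvectors[:, -1]                        # eigenvector of the largest eigenvalue
    if vec.sum() < 0:                                # the sign is arbitrary; fix it for comparability
        vec = -vec
    return vec, eigenvalues[-1]

# Hypothetical input: 3 series driven by one common factor, with very different scales.
rng = np.random.default_rng(1)
common = rng.normal(size=(500, 1))
data = common * np.array([1.0, 5.0, 20.0]) + rng.normal(size=(500, 3))

vec, variance = first_component(data)
# Eigenvector rescaled by the component's standard deviation (no longer unit length).
loadings = vec * np.sqrt(variance)
```

`np.linalg.eigh` returns unit-length eigenvectors, so the rescaling in the last line is one possible way to obtain non-unit vectors. On unstandardized input like this, the largest coefficient belongs to the highest-variance series, which is the effect the standardization is meant to remove.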



I think that if the second window shows the lowest variance, then using logarithms is the best option. Am I right?



[Images: time series after each transformation, with the first-eigenvector coefficients on the right]



