Friday, January 20, 2017

statistics - How to make the final Interpretation of PCA?


I have a question regarding the final loading of the data back onto the original variables.


So for example:




I have 10 variables a, b, c, ..., j. Using returns for the last 300 days, I get a return matrix of 300 × 10. I have normalized the returns and calculated the 10 × 10 covariance matrix. I then computed its eigenvalues and eigenvectors, so I have a 10 × 1 vector of eigenvalues and a 10 × 10 matrix of corresponding eigenvectors. The scree plot says that 5 components explain 80% of the variation, so I keep those 5 eigenvectors and their corresponding eigenvalues.
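For concreteness, here is a minimal NumPy sketch of that pipeline; the matrix R below is a random placeholder standing in for the 300 × 10 returns, and all variable names are illustrative:

import numpy as np

R = np.random.randn(300, 10)                  # placeholder for the 300 x 10 return matrix
X = (R - R.mean(axis=0)) / R.std(axis=0)      # normalize each column
C = np.cov(X, rowvar=False)                   # 10 x 10 covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)          # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]             # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = np.cumsum(eigvals) / eigvals.sum()
k = np.searchsorted(explained, 0.80) + 1      # number of components for ~80% of variance
U = eigvecs[:, :k]                            # 10 x k matrix of retained eigenvectors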



Now, how do I load them back onto the original variables, and how can I conclude which of the variables a, b, c, ..., j explains the maximum variation at time "t"?



Answer



To make things really clear, you have an original matrix X of size 300×10 with all your returns.


Now you choose the first k = 5 eigenvectors (i.e. enough to explain 80% of the variation given your data) and stack them into a matrix U of size 10×5. Each column of U represents a portfolio of the original assets, and the columns are mutually orthogonal.


PCA is a dimensionality-reduction method: you could use it to store your data in a matrix Z of size 300×5 by doing:


Z=XU


You can then recover an approximation of X, which we can call X̂, as follows:


X̂ = ZUᵀ



Note that as your 5 eigenvectors only represent 80% of the variation of X, you will not have X = X̂.
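Continuing the NumPy sketch from the question (X is the normalized 300 × 10 return matrix, U the 10 × k eigenvector matrix), the projection and reconstruction would look like this; the relative error shrinks as k grows:

Z = X @ U                                             # 300 x k scores (the "principal portfolios")
X_hat = Z @ U.T                                       # 300 x 10 approximation of X
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)   # relative reconstruction error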


In practice, for finance applications, I don't see why you would want to perform these reduction operations.


In terms of factor analysis, you could sum the absolute values of each row of U; the variable with the highest score would be a good candidate, I think.
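A rough sketch of that scoring, reusing U from above (purely illustrative):

loading_score = np.abs(U).sum(axis=1)          # one score per original variable (row of U)
names = list("abcdefghij")
print(names[int(np.argmax(loading_score))])    # variable with the largest total absolute loading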

