Friday, January 20, 2017

statistics - How to make the final Interpretation of PCA?


I have a question about mapping the final PCA loadings back to the original variables.


So for example:




I have 10 variables, a, b, c, ..., j. Using returns for the last 300 days, I get a return matrix of size 300 × 10. I then normalized the returns and calculated the 10 × 10 covariance matrix. Next I calculated the eigenvalues and eigenvectors, so I have a 10 × 1 vector of eigenvalues and a 10 × 10 matrix of the corresponding eigenvectors. The scree plot says that 5 components explain 80% of the variation, so I now keep 5 eigenvectors and their corresponding eigenvalues.



Now, how do I load them back onto the original variables, and how can I conclude which of the variables a, b, c, ..., j explains the maximum variation at time "t"?



Answer



To make things really clear, you have an original matrix $X$ of size $300 \times 10$ with all your returns.


Now you choose the first $k=5$ eigenvectors (i.e. enough to capture 80% of the variation in your data) and form a matrix $U$ of size $10 \times 5$. Each column of $U$ represents a portfolio of the original variables, and the columns are mutually orthogonal.
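As a rough sketch of this step, here is how $U$ could be built with NumPy. The data are random placeholders standing in for the real returns, and the 80% threshold is the one from the question; on random data the number of components needed will generally not be 5.

```python
import numpy as np

# Placeholder data: 300 days x 10 variables (stand-in for the real returns).
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))

# Normalize each column (zero mean, unit variance), as in the question.
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# 10 x 10 covariance matrix of the normalized returns.
C = np.cov(X, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in
# ascending order, so re-sort them in descending order.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Smallest k whose cumulative explained variance reaches 80%.
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.80) + 1)

# U holds the first k eigenvectors as columns (size 10 x k).
U = eigvecs[:, :k]
```

The columns of `U` are orthonormal, so `U.T @ U` is (numerically) the $k \times k$ identity.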


PCA is a dimensionality-reduction method: you could use it to store your data in a matrix $Z$ of size $300 \times 5$ by doing:


$$Z = X U$$


You can then recover an approximation of $X$ which we can call $\hat{X}$ as follows:


$$ \hat{X} = Z U^\intercal $$



Note that as your 5 eigenvectors only represent 80% of the variation of X, you will not have $X=\hat{X}$.
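To illustrate the projection and the reconstruction, here is a minimal sketch in NumPy. The function name `pca_project_reconstruct` and the random input are assumptions made for the example, not part of the question. For centered data, the relative squared reconstruction error equals the share of variance carried by the discarded eigenvectors.

```python
import numpy as np

def pca_project_reconstruct(X, k):
    """Project centered data X onto its first k principal directions
    and map the result back to the original space (hypothetical helper)."""
    C = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(C)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]      # indices for descending order
    U = eigvecs[:, order[:k]]              # 10 x k
    Z = X @ U                              # reduced data, 300 x k
    X_hat = Z @ U.T                        # approximation of X, 300 x 10
    return Z, X_hat, U, eigvals[order]

# Placeholder data standing in for the normalized returns.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))
X = X - X.mean(axis=0)                     # PCA assumes centered columns

Z, X_hat, U, eigvals = pca_project_reconstruct(X, k=5)

# Share of the total squared variation lost by keeping only k components.
rel_err = np.linalg.norm(X - X_hat) ** 2 / np.linalg.norm(X) ** 2
unexplained = eigvals[5:].sum() / eigvals.sum()
```

With `k=10` nothing is discarded and `X_hat` matches `X` up to floating-point error.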


In practice, for finance applications, I don't see why you would want to perform this kind of reduction.


In terms of factor analysis, you could sum the absolute values in each row of $U$ (each row corresponds to one of the original variables); the variable with the highest score would be a good candidate, I think.
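A small sketch of that scoring idea, assuming the variables are named a to j and using random placeholder data in place of the real returns. It is only a heuristic, so treat the ranking as a starting point rather than a definitive answer.

```python
import numpy as np

names = list("abcdefghij")                 # the 10 original variables

# Placeholder data standing in for the normalized returns.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))
X = X - X.mean(axis=0)

# First k = 5 eigenvectors of the covariance matrix, largest eigenvalues first.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
U = eigvecs[:, order[:5]]                  # 10 x 5, one row per variable

# Score each variable by the sum of its absolute loadings across components.
scores = np.abs(U).sum(axis=1)
best = names[int(np.argmax(scores))]

# Variables ranked from highest to lowest score.
ranking = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
```

Here `best` is the candidate variable and `ranking` lists all ten variables with their scores.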

