I have a question regarding the final loading of data back onto the original variables.
So for example:
I have 10 variables, a, b, c, ..., j. Using returns for the last 300 days, I got a return matrix of size 300×10. I then normalized the returns and calculated the 10×10 covariance matrix. Next I calculated the eigenvalues and eigenvectors, so I have a 10×1 vector of eigenvalues and a 10×10 matrix of corresponding eigenvectors. The scree plot says that 5 components explain 80% of the variation, so there are now 5 eigenvectors and their corresponding eigenvalues.
How do I now load them back onto the original variables, and how can I conclude which of the variables a, b, c, ..., j explains the maximum variation at time "t"?
Answer
To make things really clear, you have an original matrix X of size 300×10 with all your returns.
Now you choose the first k=5 eigenvectors, ordered by decreasing eigenvalue (i.e. enough to explain 80% of the variation in your data), and form a matrix U of size 10×5. Each column of U represents a portfolio of the original variables, and the columns are mutually orthogonal.
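As a rough illustration of this selection step, here is a minimal NumPy sketch. The return matrix is synthetic random data standing in for your 300×10 returns, and the 80% threshold is taken from your question, so the k it finds will not necessarily be 5. All variable names are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the 300 x 10 return matrix (assumption, not real data).
X = rng.standard_normal((300, 10)) @ rng.standard_normal((10, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)    # normalize each column

C = np.cov(X, rowvar=False)                 # 10 x 10 covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)        # eigh returns ascending eigenvalues
order = np.argsort(eigvals)[::-1]           # reorder to descending
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Cumulative share of variance explained by the first 1, 2, ... components.
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.80) + 1)   # smallest k reaching 80%
U = eigvecs[:, :k]                          # 10 x k matrix of eigenvectors

print(k, U.shape)
```

The orthogonality mentioned above can be checked with `np.allclose(U.T @ U, np.eye(k))`.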
PCA is a dimensionality-reduction method: you could use it to store your data in a matrix Z of size 300×5 by doing:
$Z = XU$
You can then recover an approximation of X, which we can call $\hat{X}$, as follows:
$\hat{X} = ZU^\top$
Note that, as your 5 eigenvectors only represent 80% of the variation of X, you will not have $X = \hat{X}$.
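The projection and the approximate reconstruction can be sketched as follows. Again the data is synthetic (an assumption), and k=5 is simply fixed to match your example. The sketch also checks that the relative squared reconstruction error equals the share of variance in the discarded components, which holds here because the columns of X are centered.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the normalized 300 x 10 return matrix (assumption).
X = rng.standard_normal((300, 10)) @ rng.standard_normal((10, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]           # descending eigenvalues
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 5                                       # number of components kept
U = eigvecs[:, :k]                          # 10 x 5

Z = X @ U                                   # 300 x 5 reduced representation
X_hat = Z @ U.T                             # 300 x 10 approximation of X

retained = eigvals[:k].sum() / eigvals.sum()
rel_err = np.linalg.norm(X - X_hat) ** 2 / np.linalg.norm(X) ** 2
print(retained, rel_err)                    # rel_err is 1 - retained
```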
In practice, for finance applications, I don't see why you would want to perform these reduction operations.
In terms of factor analysis, each row of U corresponds to one of your original variables, so you could sum the absolute values in each row of U; the variable with the highest score would be a good candidate, I think.
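A minimal sketch of that row-sum heuristic, under the same assumptions as before (synthetic data, k=5, illustrative names a to j). It is only one possible scoring rule, not a standard test.

```python
import numpy as np

names = list("abcdefghij")                  # the 10 original variables

rng = np.random.default_rng(0)
# Synthetic stand-in for the normalized 300 x 10 return matrix (assumption).
X = rng.standard_normal((300, 10)) @ rng.standard_normal((10, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0)

eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]           # descending eigenvalues
U = eigvecs[:, order][:, :5]                # 10 x 5, one row per variable

# Score each variable by the sum of its absolute loadings across components.
scores = np.abs(U).sum(axis=1)
best = names[int(np.argmax(scores))]
ranking = sorted(zip(names, scores), key=lambda p: p[1], reverse=True)

print(best)
for name, score in ranking:
    print(name, round(float(score), 3))
```

Note that this treats all 5 components equally, regardless of how much variance each one explains.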