Friday, January 20, 2017

statistics - How to make the final Interpretation of PCA?


I have a question regarding the final loading of the data back onto the original variables.


So for example:




I have 10 variables a, b, c, ..., j. Using returns for the last 300 days, I get a return matrix of 300 × 10. I have normalized the returns and calculated the 10 × 10 covariance matrix. I then computed its eigenvalues and eigenvectors, so I have a 10 × 1 vector of eigenvalues and a 10 × 10 matrix of corresponding eigenvectors. The scree plot says that 5 components explain 80% of the variation, so I keep those 5 eigenvectors and their corresponding eigenvalues.
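For concreteness, here is a minimal NumPy sketch of that pipeline; the matrix R below is a random placeholder standing in for the 300 × 10 returns, and all variable names are illustrative:

import numpy as np

R = np.random.randn(300, 10)                  # placeholder for the 300 x 10 return matrix
X = (R - R.mean(axis=0)) / R.std(axis=0)      # normalize each column
C = np.cov(X, rowvar=False)                   # 10 x 10 covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)          # eigh: for symmetric matrices
order = np.argsort(eigvals)[::-1]             # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

explained = np.cumsum(eigvals) / eigvals.sum()
k = np.searchsorted(explained, 0.80) + 1      # number of components for ~80% of variance
U = eigvecs[:, :k]                            # 10 x k matrix of retained eigenvectors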



Now, how do I load them back onto the original variables, and how can I conclude which of the variables a, b, c, ..., j explains the maximum variation at time "t"?



Answer



To make things really clear, you have an original matrix X of size 300×10 with all your returns.


Now you choose the first k = 5 eigenvectors (i.e. enough to explain 80% of the variation given your data) and stack them into a matrix U of size 10×5. Each column of U represents a portfolio of the original assets, and the columns are mutually orthogonal.


PCA is a dimensionality-reduction method: you could use it to store your data in a matrix Z of size 300×5 by doing:


Z=XU


You can then recover an approximation of X, which we can call X̂, as follows:


X̂ = ZUᵀ



Note that as your 5 eigenvectors only represent 80% of the variation of X, you will not have X = X̂.
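Continuing the NumPy sketch from the question (X is the normalized 300 × 10 return matrix, U the 10 × k eigenvector matrix), the projection and reconstruction would look like this; the relative error shrinks as k grows:

Z = X @ U                                             # 300 x k scores (the "principal portfolios")
X_hat = Z @ U.T                                       # 300 x 10 approximation of X
err = np.linalg.norm(X - X_hat) / np.linalg.norm(X)   # relative reconstruction error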


In practice, for finance applications, I don't see why you would want to perform these reduction operations.


In terms of factor analysis, you could sum the absolute values of each row of U; the variable with the highest score would be a good candidate, I think.
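A rough sketch of that scoring, reusing U from above (purely illustrative):

loading_score = np.abs(U).sum(axis=1)          # one score per original variable (row of U)
names = list("abcdefghij")
print(names[int(np.argmax(loading_score))])    # variable with the largest total absolute loading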

