I have a question regarding the final loading of data back onto the original variables.
For example:
I have 10 variables, a, b, c, ..., j. Using returns for the last 300 days, I got a return matrix of size 300 x 10. I then normalized the returns and calculated the 10 x 10 covariance matrix. Next I calculated the eigenvalues and eigenvectors, so I have a 10 x 1 vector of eigenvalues and a 10 x 10 matrix of corresponding eigenvectors. The scree plot says that 5 components explain 80% of the variation, so now there are 5 eigenvectors and corresponding eigenvalues.
How do I now load them back onto the original variables, and how can I conclude which of the variables a, b, c, ..., j explains the maximum variation at time "t"?
Answer
To make things really clear, you have an original matrix $X$ of size $300 \times 10$ with all your returns.
Next, you choose the first $k=5$ eigenvectors (i.e. enough to capture 80% of the variation in your data) and form a matrix $U$ of size $10 \times 5$. Each column of $U$ represents a portfolio of the original variables, and the columns are mutually orthogonal.
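As a rough sketch of this step in NumPy: the return matrix below is synthetic (an assumption, standing in for your real 300 x 10 returns), and names such as `returns`, `explained` and `k` are only illustrative. With your data the 80% threshold gives $k=5$; with random data it will generally be a different number.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the 300 x 10 return matrix (assumption: replace with real returns).
returns = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))

# Normalize each column to zero mean and unit variance.
X = (returns - returns.mean(axis=0)) / returns.std(axis=0, ddof=1)

# 10 x 10 covariance matrix of the normalized returns.
C = np.cov(X, rowvar=False)

# eigh is meant for symmetric matrices and returns eigenvalues in ascending order,
# so reorder everything from largest to smallest eigenvalue.
eigvals, eigvecs = np.linalg.eigh(C)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Smallest k whose cumulative share of variance reaches 80%.
explained = np.cumsum(eigvals) / eigvals.sum()
k = int(np.searchsorted(explained, 0.80) + 1)

# U holds the first k eigenvectors as columns (10 x k).
U = eigvecs[:, :k]
print(k, U.shape)
```

The columns of `U` come out orthonormal, which is what makes the projection and reconstruction formulas so simple.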
PCA is a dimensionality-reduction method: you could use it to store your data in a matrix $Z$ of size $300 \times 5$ by doing:
$$Z = X U$$
You can then recover an approximation of $X$ which we can call $\hat{X}$ as follows:
$$ \hat{X} = Z U^\intercal $$
Note that as your 5 eigenvectors only represent 80% of the variation of X, you will not have $X=\hat{X}$.
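To make the projection and reconstruction concrete, here is a minimal sketch, again on synthetic data (an assumption) and with $k=5$ fixed by hand rather than read off a scree plot. It assumes $X$ has been centred and that the covariance is computed from that same $X$; under those assumptions the relative squared reconstruction error equals the share of variance carried by the discarded eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic stand-in for the 300 x 10 return matrix (assumption).
X = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))
X = X - X.mean(axis=0)  # centre each column

# Eigen-decomposition of the covariance matrix, sorted from largest eigenvalue down.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

k = 5                  # number of components kept (fixed here for illustration)
U = eigvecs[:, :k]     # 10 x 5

Z = X @ U              # reduced data, 300 x 5
X_hat = Z @ U.T        # approximation of X, 300 x 10

# Squared Frobenius error relative to the total, versus the discarded variance share.
rel_error = np.linalg.norm(X - X_hat) ** 2 / np.linalg.norm(X) ** 2
discarded = eigvals[k:].sum() / eigvals.sum()
print(rel_error, discarded)
```

The two printed numbers agree up to floating-point error, which is a handy sanity check that the eigenvectors are in the right order.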
In practice, for finance applications, I don't see why you would want to perform these reduction operations.
In terms of factor analysis, you could sum the absolute values in each row of $U$ (each row corresponds to one of the original variables); the variable with the highest score would be a good candidate, I think.
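A small sketch of that scoring idea, with the same caveats as before: the data is synthetic (an assumption), `names`, `scores` and `ranking` are illustrative, and this is a heuristic rather than a formal test. Note that it ranks the variables over the whole sample, not at a particular time "t".

```python
import numpy as np

rng = np.random.default_rng(0)
names = list("abcdefghij")  # the 10 original variables
# Synthetic stand-in for the 300 x 10 return matrix (assumption).
X = rng.normal(size=(300, 10)) @ rng.normal(size=(10, 10))
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# First 5 eigenvectors of the covariance matrix, largest eigenvalues first.
eigvals, eigvecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(eigvals)[::-1]
U = eigvecs[:, order[:5]]  # 10 x 5, one row per original variable

# Score each variable by the sum of its absolute loadings across the 5 components.
scores = np.abs(U).sum(axis=1)

# Rank the variables from highest to lowest score.
ranking = sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)
best = ranking[0][0]
print(best, ranking)
```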