In this paper, on p. 315:
http://www.ssc.upenn.edu/~fdiebold/papers/paper55/DRAfinal.pdf
They explain that they use Levenberg-Marquardt (LM) (along with BHHH) to maximize the likelihood. However, as I understand it, LM can only be used to solve least squares (LS) problems. Are the LS and MLE solutions the same for this type of problem?
I know that when the errors are normal, as in OLS, the solutions are the same. Here the processes being estimated are AR(1), so the errors are normal even though the overall process is not. Can I still treat the MLE and LS solutions interchangeably in this situation?
In which case, can I just apply LM to solve the LS problem, safe in the knowledge that the optimal LS parameters are also the ones that solve the MLE problem?
Or does the LM algorithm have to be changed in some way so that it can be applied directly to the MLE estimation? If so how?
Answer
An AR(1), once the time series and its lags are aligned and everything is set up, is in fact a standard regression problem. Let's look, for simplicity's sake, at a "standard" regression problem; I will try to draw some conclusions from there.
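As a minimal sketch (not from the paper; the simulated series and parameter values below are made up for illustration), here is an AR(1) aligned with its own lag and estimated as an ordinary regression:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate an AR(1): y_t = c + phi * y_{t-1} + eps_t, with eps_t ~ N(0, sigma^2)
c, phi, sigma, n = 0.5, 0.8, 1.0, 500
y = np.zeros(n)
for t in range(1, n):
    y[t] = c + phi * y[t - 1] + sigma * rng.standard_normal()

# Align the series with its lag: regressand y_t, regressors [1, y_{t-1}]
Y = y[1:]
X = np.column_stack([np.ones(n - 1), y[:-1]])

# Conditional least squares / OLS on the lagged regression
theta_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
print("estimated (c, phi):", theta_hat)
```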
Let's say we want to run a linear regression where we approximate $y$ with $$h_\theta(x) = \sum_{i=0}^n \theta_i x_i = \theta^T x$$
OLS is a special case of a broader family of algorithms where the chosen cost function is:
$$ J(\theta) = \frac{1}{2} \sum_{i=1}^{m} (h_\theta(x_i) - y_i)^2, $$ where $\theta$ are the weights, $y$ is the regressand (target variable), $x$ the regressors (features), and $i$ denotes the $i$th sample element.
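For concreteness, here is that cost function written out in code, reusing the hypothetical `X`, `Y`, and `theta_hat` arrays from the sketch above:

```python
# J(theta) = 1/2 * sum_i (h_theta(x_i) - y_i)^2, with h_theta(x) = theta^T x
def J(theta, X, Y):
    residuals = X @ theta - Y
    return 0.5 * residuals @ residuals

# The OLS solution minimizes J: any perturbed theta gives a cost at least as large
print(J(theta_hat, X, Y) <= J(theta_hat + 0.1, X, Y))  # True
```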
We could give a probabilistic interpretation to the above "mechanical" machine learning model. We have to write:
$$ y_i = \theta^T x_i + \epsilon_i$$
If we assume that $\epsilon_i \sim N(0, \sigma^2)$, we know that $$ p(\epsilon_i) = \frac{1}{\sqrt{2 \pi} \sigma} \exp \bigg(-\frac{\epsilon_i^2}{2 \sigma^2} \bigg)$$
We can then write the conditional probability
$$ p(y_i | x_i; \theta) = \frac{1}{\sqrt{2 \pi} \sigma} \exp \bigg(-\frac{(y_i - \theta^T x_i)^2}{2 \sigma^2} \bigg)$$
The likelihood function is then:
$$ L(\theta) = \prod_{i=1}^m p(y_i | x_i; \theta)$$
$$ \log L(\theta) = \log \prod_{i=1}^m \frac{1}{\sqrt{2 \pi} \sigma} \exp \bigg(-\frac{(y_i - \theta^T x_i)^2}{2 \sigma^2} \bigg) $$
which after some reshuffling becomes:
$$ \log L(\theta) = m \log{\frac{1}{\sqrt{2 \pi} \sigma}} - \frac{1}{\sigma^2}\frac{1}{2}\sum_{i=1}^{m}(y_i - \theta^T x_i)^2$$
which is in fact the same optimization problem: the first term does not depend on $\theta$ and $\sigma^2$ is a positive constant, so maximizing $\log L(\theta)$ over $\theta$ amounts to minimizing exactly the $J(\theta)$ function from the OLS problem.
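A quick numerical check of that equivalence, again reusing the hypothetical `X`, `Y`, and `theta_hat` from the sketches above (and assuming SciPy is available): maximizing the Gaussian log-likelihood over $(\theta, \sigma)$ recovers the same $\theta$ as least squares.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood(params, X, Y):
    # params = (theta_0, ..., theta_n, log_sigma); sigma is log-parametrized to stay positive
    theta, log_sigma = params[:-1], params[-1]
    sigma2 = np.exp(2.0 * log_sigma)
    resid = Y - X @ theta
    m = len(Y)
    return 0.5 * m * np.log(2.0 * np.pi * sigma2) + 0.5 * resid @ resid / sigma2

x0 = np.concatenate([np.zeros(X.shape[1]), [0.0]])
mle = minimize(neg_log_likelihood, x0, args=(X, Y), method="BFGS")
print("MLE theta:", mle.x[:-1])  # matches theta_hat up to numerical tolerance
```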
Q: I know that when the errors are normal, as in OLS, the solutions are the same. Here the processes being estimated are AR(1), so the errors are normal even though the overall process is not. Can I still treat the MLE and LS solutions interchangeably in this situation?
In the derivation above we see that the MLE does not treat $\theta$ as a random variable, as is otherwise the case in other methodologies (e.g. Bayesian MAP). Therefore, I believe you can treat the MLE and LS solutions interchangeably, knowing that the errors are normal. For proper inference, the usual Gauss-Markov conditions still apply (OLS is BLUE).
Q: does the LM algorithm have to be changed in some way so that it can be applied directly to the MLE estimation? If so how?
I believe no change is needed, as the optimization problem is the same: LM can be applied to the least-squares form of the problem directly.
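As a final sketch (assuming SciPy and the hypothetical `X`, `Y`, `theta_hat` from above), Levenberg-Marquardt can be run on the raw residual vector with no modification and lands on the same parameters as OLS and the MLE:

```python
import numpy as np
from scipy.optimize import least_squares

def residuals(theta, X, Y):
    return X @ theta - Y

# method="lm" uses MINPACK's Levenberg-Marquardt implementation
lm_fit = least_squares(residuals, x0=np.zeros(X.shape[1]), args=(X, Y), method="lm")
print("LM theta:", lm_fit.x)  # agrees with theta_hat and with the MLE theta
```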