I'm trying to study the Merton model for portfolio optimization, and the document doesn't explain quite an important step: if $$V(t,x)=\sup\{E[U(X_T(\phi))~|~X_t=x]~~ |~~\phi~~\text{an admissible trading strategy}\}$$ is the value function then, "under some regularity assumptions", it will satisfy the Hamilton-Jacobi-Bellman equation.
What are those regularity assumptions? How can we prove them?
Answer
This is an optimal control problem.
Consider a self-financing strategy $\pi := (\pi_s)_{s\in[t,T]}$ over the horizon $[t,T]$ which consists, over each infinitesimal period of time $[t,t+dt[$, of investing a fraction $\pi_t$ of the current wealth in a risky asset $S_t$ and placing the remaining part in the risk-free asset $B_t$. Given the following dynamics $$ dS_t = S_t(\mu_t dt + \sigma_t dW_t)$$ $$ dB_t = B_t(r_t dt) $$ and starting from an initial wealth $x$, the wealth at time $t$ of an investor following the strategy $\pi$ will be $$ X_t^{\pi,x} = \frac{ \pi_t X_t^{\pi,x} }{ S_t } S_t + \frac{ (1-\pi_t) X_t^{\pi,x} }{B_t} B_t $$ and its evolution will be governed by the following SDE $$ dX_t^{\pi,x} = X_t^{\pi,x} \left[ (r_t + \pi_t(\mu_t-r_t))dt + \pi_t \sigma_t dW_t \right]$$
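As a quick numerical illustration (not part of the derivation, and with purely illustrative parameter values), here is a minimal Euler–Maruyama sketch of this wealth SDE for constant coefficients and a constant fraction $\pi$:

```python
# Minimal sketch: Euler-Maruyama simulation of
#   dX_t = X_t [ (r + pi*(mu - r)) dt + pi*sigma dW_t ]
# for constant coefficients and a constant-proportion strategy pi.
# All parameter values below are illustrative assumptions.
import numpy as np

def simulate_wealth(x0=1.0, pi=0.5, mu=0.08, r=0.02, sigma=0.2,
                    T=1.0, n_steps=252, n_paths=10_000, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    X = np.full(n_paths, x0)
    for _ in range(n_steps):
        dW = rng.standard_normal(n_paths) * np.sqrt(dt)
        # self-financing wealth update: fraction pi in the risky asset,
        # the remainder earning the risk-free rate r
        X = X * (1.0 + (r + pi * (mu - r)) * dt + pi * sigma * dW)
    return X

X_T = simulate_wealth()
print(X_T.mean(), X_T.std())
```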
Consider the objective functional (the expected utility achieved by a given strategy) $$ V(t,x;(\pi_s)_{s\in[t,T]}) = \Bbb{E}_t \left[ U(X_T^{\pi,x}) \right] $$
The optimal control $\pi_t^*$ is the stochastic process such that $$ (\pi^*_s)_{s \in [t,T]} = \text{argsup}_{(\pi_s)_{s \in [t,T]}} V(t,x;(\pi_s)_{s \in [t,T]}) $$
while the optimal cost is $$ V(t,x) = \Bbb{E}_t \left[ U(X_T^{\pi^*,x}) \right] $$
The optimal cost function solves the Hamilton-Jacobi-Bellman (HJB) equation.
The proof can be obtained by viewing the control problem as a Dynamic Programming Problem and relying on Bellman's principle of optimality (see $(1)$ below).
As @noob2 mentions, at some point the Itô differential of the optimal cost $V(t,x)$ appears. The regularity conditions are therefore the usual ones needed to apply Itô's formula: $V(t,x)$ must be smooth enough (of class $C^{1,2}$, i.e. once continuously differentiable in $t$ and twice in $x$) and the wealth process $X_t$ must be a well-defined Itô process, which requires suitable integrability of the admissible strategies.
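For reference (this is a standard formulation from stochastic control texts, not from the document being studied), the hypotheses one typically finds in a verification theorem are roughly: (i) $V \in C^{1,2}([0,T)\times\Bbb{R}_+) \cap C^0([0,T]\times\Bbb{R}_+)$, so that Itô's formula can be applied to $V(t,X_t)$; (ii) a growth condition on $V$ (e.g. polynomial growth in $x$), so that the stochastic integral $\int \frac{\partial V}{\partial x}(s,X_s)\, \pi_s \sigma_s X_s\, dW_s$ is a true martingale and the expectations appearing in the derivation below are finite; (iii) admissibility of the controls, e.g. $\Bbb{E}\left[\int_t^T |\pi_s \sigma_s X_s|^2 ds\right] < \infty$. Under conditions of this type, a $C^{1,2}$ solution of the HJB equation with terminal condition $V(T,x)=U(x)$ coincides with the value function; this is the "verification" step.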
Some intuition: \begin{align} V(t,x) &= \Bbb{E}_t \left[ U(X_T^{\pi^*,x}) \right] \\ &= \Bbb{E}_t \left[ \Bbb{E}_{t+dt} \left[ U\left(X_T^{\pi^*,x+dX_t(\pi^*_t)}\right) \right] \right] \\ &= \Bbb{E}_t \left[ V(t+dt, x+dX_t(\pi^*_t)) \right] \tag{1}\\ &= \sup_{\pi_t} \Bbb{E}_t \left[ V(t+dt, x+dX_t(\pi_t)) \right]\\ &= \sup_{\pi_t} \Bbb{E}_t \left[ V(t,x) + \frac{\partial V}{\partial t}(t,x)\, dt + \frac{\partial V}{\partial x}(t,x)\, dX_t + \frac{1}{2}\frac{\partial^2 V}{\partial x^2}(t,x)\, d\langle X \rangle_t \right] \\ &= V(t,x) + \frac{\partial V}{\partial t}(t,x)\, dt + \sup_{\pi_t} \left( \frac{\partial V}{\partial x}(t,x)\, x (r_t + \pi_t(\mu_t-r_t)) + \frac{1}{2} \frac{\partial^2 V}{\partial x^2}(t,x)\, x^2 \pi_t^2 \sigma_t^2 \right) dt \end{align} where the fourth equality is Bellman's optimality principle, the fifth is Itô's formula applied to $V$, and the last one takes the conditional expectation (the $dW_t$ term has zero mean and $d\langle X \rangle_t = x^2 \pi_t^2 \sigma_t^2\, dt$). Hence finally $$ \frac{\partial V}{\partial t}(t,x) + \sup_{\pi_t} \left( \frac{\partial V}{\partial x}(t,x)\, x (r_t + \pi_t(\mu_t-r_t)) + \frac{1}{2} \frac{\partial^2 V}{\partial x^2}(t,x)\, x^2 \pi_t^2 \sigma_t^2 \right) = 0 $$
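As a sanity check on the resulting HJB equation (a numerical sketch under assumed constant parameters and CRRA utility $U(x)=x^{1-\gamma}/(1-\gamma)$, not something taken from the original answer): with the ansatz $V(t,x)=f(t)\,U(x)$, the first-order condition of the supremum gives the classical Merton fraction $\pi^* = \frac{\mu - r}{\gamma \sigma^2}$, and one can verify numerically that this fraction maximizes expected terminal utility among constant-proportion strategies.

```python
# Minimal sketch: check that the Merton fraction pi* = (mu - r)/(gamma*sigma^2)
# maximizes expected CRRA utility over constant-proportion strategies.
# For a constant pi, X_T is lognormal, so E[U(X_T)] has a closed form.
# All parameter values are illustrative assumptions.
import numpy as np

mu, r, sigma, gamma, T, x0 = 0.08, 0.02, 0.2, 3.0, 1.0, 1.0

def expected_crra_utility(pi):
    # E[X_T^(1-gamma)]/(1-gamma) for
    # X_T = x0*exp((r + pi*(mu-r) - pi^2*sigma^2/2)*T + pi*sigma*W_T)
    drift = (1 - gamma) * (r + pi * (mu - r)) * T
    convexity = -0.5 * gamma * (1 - gamma) * pi**2 * sigma**2 * T
    return x0**(1 - gamma) * np.exp(drift + convexity) / (1 - gamma)

pi_grid = np.linspace(-1.0, 2.0, 3001)
pi_numeric = pi_grid[np.argmax([expected_crra_utility(p) for p in pi_grid])]
pi_merton = (mu - r) / (gamma * sigma**2)
print(pi_numeric, pi_merton)  # both approximately 0.5
```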
[edit]
The DPP point of view consists in viewing the optimal control $(\pi^*_s)_{s \in [t,T]}$ as the "union" of what you choose to do over $[t,t+dt[$ and what you do over $[t+dt,T[$. Informally: $$ (\pi^*_s)_{s \in [t,T]} = \pi^*_t \cup (\pi^*_s)_{s \in [t+dt,T]} $$
At this point, Bellman's optimality principle tells you that the restriction $(\pi^*_s)_{s \in [t+dt,T]}$ of the optimal control to the remaining horizon is itself the optimal policy over $[t+dt,T[$. This is why in $(1)$ you can write that $$ \Bbb{E}_{t+dt} \left[ U\left(X_T^{\pi^*,x+dX_t(\pi^*_t)}\right) \right] = V(t+dt,x+dX_t(\pi^*_t)) $$ with $V$ the optimal cost (and not simply the objective functional evaluated at an arbitrary strategy).
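For further intuition on the principle of optimality, here is a small discrete-time backward-induction sketch (an illustrative discretization with assumed parameters, not part of the original answer): the per-period risky return is approximated by a two-point shock, and thanks to the homogeneity of CRRA utility each Bellman step reduces to a one-dimensional optimization whose solution is close to the Merton fraction.

```python
# Minimal sketch: backward induction (dynamic programming) for a discrete-time
# version of the problem. The risky return over each period is a two-point
# shock; with CRRA utility V(t,x) separates as f(t)*U(x), so the Bellman
# recursion reduces to a scalar recursion on f(t). Parameters are assumptions.
import numpy as np

mu, r, sigma, gamma = 0.08, 0.02, 0.2, 3.0
n_steps, dt = 52, 1.0 / 52
pi_grid = np.linspace(0.0, 1.5, 301)

# two-point approximation of the risky gross return over one period
up = 1 + mu * dt + sigma * np.sqrt(dt)
down = 1 + mu * dt - sigma * np.sqrt(dt)
rf = 1 + r * dt

f = 1.0                      # terminal condition V(T,x) = U(x), i.e. f(T) = 1
for _ in range(n_steps):     # step backwards from T to 0
    # gross wealth return for each candidate fraction pi, in each scenario
    g_up = rf + pi_grid * (up - rf)
    g_down = rf + pi_grid * (down - rf)
    # Bellman step: V(t,x) = sup_pi E[V(t+dt, x*g)]
    #             = U(x) * opt_pi { f(t+dt) * 0.5*(g_up**(1-gamma) + g_down**(1-gamma)) }
    # where opt = argmin when U < 0 (gamma > 1) and argmax when U > 0 (gamma < 1)
    candidates = f * 0.5 * (g_up**(1 - gamma) + g_down**(1 - gamma))
    best = np.argmin(candidates) if gamma > 1 else np.argmax(candidates)
    f = candidates[best]

print("optimal fraction per period:", pi_grid[best])                             # ~0.5
print("Merton fraction (mu-r)/(gamma*sigma^2):", (mu - r) / (gamma * sigma**2))  # 0.5
```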