Before the dynamic programming lectures

To get the mindset: 

Imagine that from a stock of (positive) size x, you get a proportional yield bx which can be reinvested for higher stock next period, or consumed now. b>0 is a constant.

You reinvest a fraction u (in the unit interval [0,1]) of bx, and consume (1-u)bx, from which you derive log utility. There is capital depreciation at a constant rate µ, so the stock tomorrow is g(x,u) = (1-µ)x + bux = (1-µ+bu)x. There is no discounting.

To get a solution, we shall assume b>1-µ. (And, µ in [0,1].)
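For reference, the primitives can be written down in a few lines of Python. The sketch below picks illustrative values b = 1.5 and µ = 0.1 (my choice, not part of the model) that satisfy b > 0, µ in [0,1] and b > 1-µ:

```python
import math

# Illustrative parameter values (not from the notes), chosen to satisfy
# b > 0, mu in [0,1], and b > 1 - mu.
b, mu = 1.5, 0.1

def g(x, u):
    """Next period's stock: (1 - mu + b*u) * x."""
    return (1 - mu + b * u) * x

def utility(x, u):
    """Log utility of today's consumption (1 - u) * b * x."""
    return math.log((1 - u) * b * x)
```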

 

Starting with the case where there is no future looks over-simplistic, but still:

0. No more future

At the end of the horizon, there is no use for future stock. You reinvest nothing, because that maximizes utility ln((1-u)bx) = ln(bx) + ln(1-u). Maximized by u = 0.

Often, one would not even formulate this "optimization" step: the model would say that at this stage, there is no reinvestment. 

Anyway: the value is ln b + ln x, where x is whatever stock you might have at this stage.

 

1. Time ends tomorrow. 

Choosing u today yields ln((1-u)bx) today. But tomorrow, you get a payoff from tomorrow's stock g = (1-µ+bu)x: Namely, ln b + ln g.

Inserting, you get to maximize ln((1-u)bx) + ln b + ln((1-µ+bu)x).
In this particular case, because logs behave so nicely, we get 2 ln x + 2 ln b + ln((1-u)(1-µ+bu)) to maximize. Already here, we see that the maximizer will not depend on x, so we will end up with 2 ln x + some constant. (The maximizer u* equals (b+µ-1)/2b - an interior max, by the assumptions made. Insert this, and you get the constant determined.)
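If you would rather not trust the calculus, the interior maximizer u* = (b+µ-1)/2b can be sanity-checked by brute force. With the illustrative values b = 1.5 and µ = 0.1 (my choice), u* equals 0.2:

```python
import math

b, mu = 1.5, 0.1  # illustrative values satisfying b > 1 - mu
u_star = (b + mu - 1) / (2 * b)  # claimed maximizer; equals 0.2 here

def objective(u):
    # The u-dependent part of the two-period value: ln(1-u) + ln(1-mu+b*u)
    return math.log(1 - u) + math.log(1 - mu + b * u)

# Brute-force grid search over [0, 1)
u_best = max((i / 10000 for i in range(10000)), key=objective)
print(u_star, u_best)  # both 0.2
```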

 

What we just did.

We maximized to get what? Today's value = today's direct utility + tomorrow's value.
(If there were discounting: "present value of tomorrow's value".) 

Note, "tomorrow's value" is a function of tomorrow's state, which depends on today's state and today's choice: g(x,u).
(We could have had time-dependence too.)

So if we let f be today's running utility and V be tomorrow's value, then today's value is \(v(x) = \displaystyle\max_{u\in[0,1]} \Big\{f(x,u)+V(g(x,u))\Big\}.\)

 

General principle

Value depends on the horizon. Call the time of the "0" case T. So the value ln b + ln x should be indexed with time. It is not uncommon to use the letter "J" for value (why not? It isn't that much used for other things) - so we index it by time and write \(J_T(x) = \ln b + \ln x\).

Recursively, we then have \({J_{t-1}(x)= \displaystyle\max_{u\in[0,1]} \Big\{f(t,x,u)+J_t(g(t,x,u))\Big\}}\)
(Here we have allowed both running utility f and the dynamics g to depend on time.)


In words: To get the optimal value today, optimize the sum of 

  • today's direct utility, and
  • value from tomorrow.

Note: "value" from tomorrow assumes that we behave optimally from tomorrow on. A half-assed attempt at implementation here in Norwegian: https://www.youtube.com/watch?v=BERRCrSBNdk - with an English translation at http://www.jakobsande.no/?info=12&dikt=822 

 

Exercise: 

With time t left, call the value \(J_{T-t}(x)\). For the above problem: prove by induction that this value is of the form \(C_{T-t} + A_{T-t}\ln x\) (understood: where the A's and C's do not depend on x).

To note on the language: phrases like "T period model" could lead to a bit of confusion. Is a static model one of zero periods, or of one? I intended "with time t left" to mean that the "0" case above is t = 0, i.e. \(J_{T-t} = J_{T-0} = J_T\).
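This is no substitute for the induction proof, but the claimed form can be sanity-checked numerically: if \(J_{T-t}(x)\) is affine in ln x, then doubling x raises the value by the same amount no matter where you start. A quick check along those lines, with the same illustrative b, µ and u-grid as before:

```python
import math

b, mu = 1.5, 0.1  # illustrative values satisfying b > 1 - mu
U = [i / 200 for i in range(200)]  # grid of reinvestment fractions in [0, 1)

def J(x, n):
    """Value with n decision periods left (n = 0: the terminal case)."""
    if n == 0:
        return math.log(b * x)
    return max(math.log((1 - u) * b * x) + J((1 - mu + b * u) * x, n - 1)
               for u in U)

# If J is affine in ln x, then J(2x, n) - J(x, n) is the same for every x.
for n in range(3):
    d1 = J(2.0, n) - J(1.0, n)
    d2 = J(4.0, n) - J(2.0, n)
    print(n, abs(d1 - d2))  # the differences agree, up to rounding
```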

 

Published Mar. 18, 2019 11:51 AM - Last modified Mar. 18, 2019 11:51 AM