Value iteration

https://d2l.ai/chapter_reinforcement-learning/value-iter.html


Typo: the left-hand side of equation (17.2.9) should be $\pi^*(s)$.

Great articles that clarify some of the basic ideas behind RL. I think this is a really good starting point for learning RL.
There are a few small errors, though:

  1. The expectation over $r(s_0, a_0)$ is also needed for Eq. 17.2.2;
  2. In Eq. 17.2.9, it should be $\max$ rather than $\arg\max$.
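
For anyone unsure about the max versus arg max distinction raised in this thread, here is a minimal sketch of value iteration on a made-up 2-state, 2-action MDP. The arrays `P` and `R`, the discount `gamma`, and the function name `value_iteration` are all assumptions for illustration and are not taken from the chapter. The point is only that taking the max over actions yields a value, while taking the arg max yields an action, i.e. a policy.

```python
import numpy as np

# Hypothetical MDP; all numbers are invented for illustration.
# P[a, s, t] = probability of moving from state s to state t under action a.
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # action 0
    [[0.1, 0.9], [0.7, 0.3]],   # action 1
])
# R[s, a] = immediate reward for taking action a in state s.
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9  # discount factor (assumed)


def q_values(P, R, gamma, V):
    # Q[s, a] = R[s, a] + gamma * sum_t P[a, s, t] * V[t]
    return R + gamma * np.einsum("ast,t->sa", P, V)


def value_iteration(P, R, gamma, tol=1e-10, max_iters=10_000):
    V = np.zeros(R.shape[0])
    for _ in range(max_iters):
        # max over actions gives the updated value of each state
        V_new = q_values(P, R, gamma, V).max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    Q = q_values(P, R, gamma, V)
    # arg max over actions gives the greedy action in each state
    return V, Q.argmax(axis=1), Q


V_star, pi_star, Q_star = value_iteration(P, R, gamma)
print("V*: ", V_star)   # one real number per state
print("pi*:", pi_star)  # one action index per state
```

In this sketch `V_star` holds real-valued returns and `pi_star` holds action indices, so an equation whose left-hand side is a value function needs a max on the right, and one whose left-hand side is a policy needs an arg max.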