https://d2l.ai/chapter_reinforcement-learning/value-iter.html
Typo: the left-hand side of Eq. (17.2.9) should be $\pi^*(s)$.
Great article for clarifying the basic ideas behind RL. I think it's a really good starting point for learning RL.
There are a few small errors, though:
- The expectation over $r(s_0, a_0)$ is also needed for Eq. 17.2.2;
- In Eq. 17.2.9, it should be $\max$ rather than $\arg\max$.
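To make the max vs. argmax distinction concrete, here is a minimal sketch of tabular value iteration. It assumes a small MDP with known transition probabilities `P[s, a, s']` and expected rewards `r[s, a]`. The function name `value_iteration` and the two-state toy MDP are hypothetical illustrations, not the book's code. The optimal value $V^*(s)$ takes the max over actions of the one-step lookahead, while the optimal policy $\pi^*(s)$ takes the argmax of that same quantity.

```python
import numpy as np

def value_iteration(P, r, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Tabular value iteration (illustrative sketch, hypothetical helper).

    P[s, a, s2]: probability of moving to s2 after taking action a in state s.
    r[s, a]:     expected reward for taking action a in state s.
    Returns (V_star, pi_star).
    """
    V = np.zeros(r.shape[0])
    for _ in range(max_iters):
        # Q[s, a] = r(s, a) + gamma * sum_{s'} P(s' | s, a) * V(s')
        Q = r + gamma * (P @ V)
        # Bellman optimality backup: the VALUE is the max over actions
        V_new = Q.max(axis=1)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    # With the converged V, the POLICY is the argmax over actions
    Q = r + gamma * (P @ V)
    pi = Q.argmax(axis=1)
    return V, pi

# Hypothetical 2-state, 2-action MDP with deterministic transitions
P = np.zeros((2, 2, 2))
P[0, 0, 0] = 1.0  # state 0, action 0: stay in state 0
P[0, 1, 1] = 1.0  # state 0, action 1: move to state 1
P[1, 0, 1] = 1.0  # state 1, action 0: stay in state 1
P[1, 1, 0] = 1.0  # state 1, action 1: move to state 0
r = np.array([[0.0, 1.0],
              [2.0, 0.0]])

V_star, pi_star = value_iteration(P, r, gamma=0.9)
print(V_star)   # close to [19. 20.]
print(pi_star)  # [1 0]
```

In this toy MDP, staying in state 1 earns $2/(1-0.9) = 20$, and from state 0 the best move is to jump to state 1 for $1 + 0.9 \cdot 20 = 19$. Taking the max of the lookahead gives those numbers ($V^*$), while taking the argmax gives the actions that achieve them ($\pi^*$).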