https://d2l.ai/chapter_reinforcement-learning/value-iter.html
Typo: the left-hand side of equation (17.2.9) should be $\pi^*(s)$.
Great articles clarifying some basic ideas behind RL. I think it's a really good starting point for learning RL.
But there are some small errors:
- The expectation should also cover the first term $r(s_0, a_0)$ in Eq. 17.2.2;
- In Eq. 17.2.9, it should be $\max$ rather than $\mathrm{argmax}$ (see the sketch below).
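For reference, the two self-consistent ways to write 17.2.9 (keeping its existing right-hand side, call it $(\cdots)$) would be either $\pi^*(s) = \operatorname{argmax}_a (\cdots)$, fixing the left-hand side as the other comment suggests, or $V^*(s) = \max_a (\cdots)$, fixing the operator.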
Seems like a typo in Equation 17.2.2. Shouldn't the expectation over $a_0$ also include the first term?
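That is, the expectation would cover the $t = 0$ reward too, giving something like $V^{\pi}(s_0) = \mathbb{E}_{a_t \sim \pi(s_t)}\big[\sum_{t \ge 0} \gamma^t\, r(s_t, a_t)\big]$ (assuming the chapter's notation).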
Do these code blocks still work, or do you need to follow the order of the book for the code to run? I haven't been able to get the first chunk running for this section, and I'd really like to try out these reinforcement learning exercises.
There are two problems with this article.
The first is in Equation 17.2.9, where "argmax" should be corrected to "max".
The second is in the code implementation of the value_iteration function: its own comment says "Calculate \sum_{s'} p(s'\mid s,a) [r + \gamma v_k(s')]", which, to match the preceding Equation 17.2.13, should read "Calculate [r + \sum_{s'} p(s'\mid s,a) \gamma v_k(s')]".
Q[k,s,a] += pr * (reward + gamma * V[k - 1, nextstate])
should be fixed to:
Q[k,s,a] += (reward + pr * gamma * V[k - 1, nextstate])
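For anyone who wants to check this numerically, here is a minimal NumPy sketch of the update written in the 17.2.13 form, with hypothetical arrays `P` (transition probabilities, shape `[S, A, S]`) and `R` (rewards, shape `[S, A]`) standing in for the book's environment object; the names and signature are mine, not the book's:

```python
import numpy as np

def value_iteration_sketch(P, R, gamma, num_iters):
    """Value iteration in the form of Eq. 17.2.13 (hypothetical P/R layout).

    P[s, a, s2] = p(s2 | s, a), shape (S, A, S)
    R[s, a]     = r(s, a),      shape (S, A)
    """
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(num_iters):
        # Q[s, a] = r(s, a) + gamma * sum_{s'} p(s' | s, a) * V(s'):
        # the reward term sits outside the sum over next states.
        Q = R + gamma * np.einsum('sat,t->sa', P, V)
        V = Q.max(axis=1)          # max over actions (not argmax)
    # Greedy policy extracted once the values have been computed.
    Q = R + gamma * np.einsum('sat,t->sa', P, V)
    return V, Q.argmax(axis=1)
```

One caveat: since $\sum_{s'} p(s'\mid s,a) = 1$, we have $\sum_{s'} p(s'\mid s,a)\,[r + \gamma v_k(s')] = r + \gamma \sum_{s'} p(s'\mid s,a)\, v_k(s')$ whenever $r$ depends only on $(s, a)$, so the book's accumulate-inside-the-sum code computes the same value; in that case the fix is really to the comment/notation, and accumulating `reward` once per next state as in the proposed one-liner would overcount it.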