Value iteration

https://d2l.ai/chapter_reinforcement-learning/value-iter.html


Typo: the left-hand side of equation (17.2.9) should be pi*(s).

Great article for clarifying some of the basic ideas behind RL; it's really a good starting point for learning RL.
But there are a couple of small errors:

  1. In Eq. 17.2.2, the expectation should also cover the first term, $r(s_0, a_0)$;
  2. In Eq. 17.2.9, it should be max rather than argmax.
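For reference, the max/argmax distinction raised in several comments above can be written side by side. This is standard Bellman optimality notation, not quoted from the book:

```latex
% The value update takes a max over actions, while policy extraction
% takes an argmax; conflating the two is the typo being reported.
\[
  V^{*}(s) = \max_{a \in \mathcal{A}} Q^{*}(s, a),
  \qquad
  \pi^{*}(s) = \operatorname*{arg\,max}_{a \in \mathcal{A}} Q^{*}(s, a)
\]
```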

Seems like a typo in Equation 17.2.2.

Shouldn't the expectation over a_0 also include the first term?

Do these code blocks still work, or do you need to follow the order of the book for the code to run? I haven't been able to get the first chunk running for this chapter, and I'd really like to try out these reinforcement learning exercises.

There are two problems with this article.
The first is in Equation 17.2.9, where 'argmax' should be corrected to 'max'.
The second is in the code implementation of the value_iteration function: its own comment says "Calculate \sum_{s'} p(s'\mid s,a) [r + \gamma v_k(s')]", which, based on the preceding Equation 17.2.13, should be fixed to "Calculate [r + \sum_{s'} p(s'\mid s,a) \gamma v_k(s')]".

Q[k,s,a] += pr * (reward + gamma * V[k - 1, nextstate])

should be fixed to:

Q[k,s,a] += (reward + pr * gamma * V[k - 1, nextstate])
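To make the update being debated concrete, here is a minimal self-contained sketch of value iteration on a made-up 2-state, 2-action MDP. The transition model `P`, reward tensor `r`, and all numbers are invented for illustration and are not from the book's code; only the `Q += pr * (reward + gamma * V[s'])` update mirrors the snippet discussed above.

```python
import numpy as np

# Hypothetical toy MDP: P[s, a, s'] are transition probabilities,
# r[s, a, s'] are rewards. Both are made up for illustration.
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.0, 1.0], [0.5, 0.5]]])
r = np.array([[[1.0, 0.0], [0.0, 2.0]],
              [[0.5, 0.5], [0.0, 1.0]]])
gamma, num_iters = 0.9, 100
num_states, num_actions = 2, 2

V = np.zeros(num_states)
for _ in range(num_iters):
    Q = np.zeros((num_states, num_actions))
    for s in range(num_states):
        for a in range(num_actions):
            for s2 in range(num_states):
                pr = P[s, a, s2]
                # Probability-weighted one-step reward plus discounted
                # value of the successor state (the disputed update):
                Q[s, a] += pr * (r[s, a, s2] + gamma * V[s2])
    V = Q.max(axis=1)   # the value update uses max over actions ...
pi = Q.argmax(axis=1)   # ... while argmax recovers the greedy policy
print(np.round(V, 3), pi)
```

Note that when the reward depends on the successor state s', it has to sit inside the probability-weighted sum, which is why the two candidate comment strings above differ only when r is pulled outside the sum.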