# Q-learning

1 Like

https://d2l.ai/chapter_reinforcement-learning/qlearning.html

When I click on this URL I get redirected to the homepage, does it mean it is not published yet? Is there a way I can see it anyway?

1 Like

Looking forward to the DQN chapter.

Coming from the MITxâs McroMasters series on Stats and Probability a year ago, I can still understand (mostly) whatâs going on.

Great work. Thank you.

1 Like

For convenience, these 2 functions should make it easy to switch into 8x8. Param âsizeâ expects either â4x4â or â8x8â. These 2 functions exist within torch: i just copied them out and adapted for my own use.

Everything else kept the same: A 4 times increase in lake area resulted in an increase of iterations (to reach something usable), from 256 to about 17000, a 66 times increase. Even then, the bottom left area is clearly not optimal.

``````def frozen_lake_sizeable(seed, size):

"""Defined in :numref:`sec_utils`"""

import gym
env = gym.make('FrozenLake-v1', is_slippery=False, map_name=size)
env.seed(seed)
env.action_space.np_random.seed(seed)
env.action_space.seed(seed)
env_info = {}
env_info['desc'] = env.desc  # 2D array specifying what each grid item means
env_info['num_states'] = env.nS  # Number of observations/states or obs/state dim
env_info['num_actions'] = env.nA  # Number of actions or action dim

# Define indices for (transition probability, nextstate, reward, done) tuple
env_info['trans_prob_idx'] = 0  # Index of transition probability entry
env_info['nextstate_idx'] = 1  # Index of next state entry
env_info['reward_idx'] = 2  # Index of reward entry
env_info['done_idx'] = 3  # Index of done entry
env_info['mdp'] = {}
env_info['env'] = env

for (s, others) in env.P.items():
# others(s) = {a0: [ (p(s'|s,a0), s', reward, done),...], a1:[...], ...}
for (a, pxrds) in others.items():
# pxrds is [(p1,next1,r1,d1),(p2,next2,r2,d2),..].
# e.g. [(0.3, 0, 0, False), (0.3, 0, 0, False), (0.3, 4, 1, False)]
env_info['mdp'][(s,a)] = pxrds
return env_info

def show_Q_function_progress(env_desc, V_all, pi_all, size):

"""Defined in :numref:`sec_utils`"""
# This function visualizes how value and policy changes over time.
# V: [num_iters, num_states]
# pi: [num_iters, num_states]

from matplotlib import pyplot as plt
if size == "4x4":
side = 4
if size == "8x8":
side = 8
# We want to only shows few values
num_iters_all = V_all.shape[0]
num_iters = num_iters_all // 10
vis_indx = np.arange(0, num_iters_all, num_iters).tolist()
vis_indx.append(num_iters_all - 1)
V = np.zeros((len(vis_indx), V_all.shape[1]))
pi = np.zeros((len(vis_indx), V_all.shape[1]))
for c, i in enumerate(vis_indx):
V[c]  = V_all[i]
pi[c] = pi_all[i]
num_iters = V.shape[0]
fig, ax = plt.subplots(figsize=(15, 15))
for k in range(V.shape[0]):
plt.subplot(4, 4, k + 1)
plt.imshow(V[k].reshape(side, side), cmap="bone")
ax = plt.gca()
ax.set_xticks(np.arange(0, side+1)-.5, minor=True)
ax.set_yticks(np.arange(0, side+1)-.5, minor=True)
ax.grid(which="minor", color="w", linestyle='-', linewidth=3)
ax.tick_params(which="minor", bottom=False, left=False)
ax.set_xticks([])
ax.set_yticks([])

# LEFT action: 0, DOWN action: 1
# RIGHT action: 2, UP action: 3
action2dxdy = {0:(-.25, 0),1:(0, .25),
2:(0.25, 0),3:(-.25, 0)}

for y in range(side):
for x in range(side):
action = pi[k].reshape(side, side)[y, x]
dx, dy = action2dxdy[action]
if env_desc[y,x].decode() == 'H':
ax.text(x, y, str(env_desc[y,x].decode()),
ha="center", va="center", color="y",
size=20, fontweight='bold')
elif env_desc[y,x].decode() == 'G':
ax.text(x, y, str(env_desc[y,x].decode()),
ha="center", va="center", color="w",
size=20, fontweight='bold')
else:
ax.text(x, y, str(env_desc[y,x].decode()),
ha="center", va="center", color="g",
size=15, fontweight='bold')

# No arrow for cells with G and H labels
if env_desc[y,x].decode() != 'G' and env_desc[y,x].decode() != 'H':
ax.set_title("Step = "  + str(vis_indx[k] + 1), fontsize=20)
fig.tight_layout()
plt.show()
``````
1 Like

Thank you for the feedback and glad that you found our RL chapters useful.
Sure, DQN will be coming in a few weeks.

Yes, the number of training steps increases as state space gets larger. epsilon and alpha might need to be changed for a larger version of this problem. In general, model-free methods have very high sample complexity.

Weâll definitely make our codes more flexible so it can better visualize larger size grids. Also, thank you for your code snippet.

Rasool

1 Like

Glad to know that DQN is on the way! Thanks for your contribution.
May I know what you will or plan to present in the whole RL chapter for the final version? That will be helpful since I would like to schedule or learn in advance.

there is a bug when running the following code in colab.

# Now set up the environment

env_info = d2l.make_env(âFrozenLake-v1â, seed=seed)

bug is as follow.
AttributeError: âFrozenLakeEnvâ object has no attribute ânSâ

1 Like

the reinforcement learning part is a bit too compactâŚ still wish to see DQN

Excellent content! I just have slight ambiguity in the sign in eq. 17.3.3; shouldnât alpha be preceded with + in second term of the equation? Similarly in 17.3.4.

Yes, the RL part wonât run.

First of all, I have to manually install âgymâ, itâs not part of the environment set up for the book overall.

Secondly, even after doing that, get this error:
AttributeError Traceback (most recent call last)
Cell In[3], line 13
10 np.random.seed(seed)
12 # Now set up the environment
â> 13 env_info = d2l.make_env(âFrozenLake-v1â, seed=seed)

File /usr/local/lib/python3.10/site-packages/d2l/torch.py:2852, in make_env(name, seed)
2848 # Input parameters:
2849 # name: specifies a gym environment.
2850 # For Value iteration, only FrozenLake-v1 is supported.
2851 if name == âFrozenLake-v1â:
â 2852 return frozen_lake(seed)
2854 else:
2855 raise ValueError(â%s env is not supported in this Notebookâ)

File /usr/local/lib/python3.10/site-packages/d2l/torch.py:2821, in frozen_lake(seed)
2818 import gym
2820 env = gym.make(âFrozenLake-v1â, is_slippery=False)
â 2821 env.seed(seed)
2822 env.action_space.np_random.seed(seed)
2823 env.action_space.seed(seed)

File /usr/local/lib/python3.10/site-packages/gym/core.py:241, in Wrapper.getattr(self, name)
239 if name.startswith(â_â):
240 raise AttributeError(f"accessing private attribute â{name}â is prohibited")
â 241 return getattr(self.env, name)

File /usr/local/lib/python3.10/site-packages/gym/core.py:241, in Wrapper.getattr(self, name)
239 if name.startswith(â_â):
240 raise AttributeError(f"accessing private attribute â{name}â is prohibited")
â 241 return getattr(self.env, name)

File /usr/local/lib/python3.10/site-packages/gym/core.py:241, in Wrapper.getattr(self, name)
239 if name.startswith(â_â):
240 raise AttributeError(f"accessing private attribute â{name}â is prohibited")
â 241 return getattr(self.env, name)

AttributeError: âFrozenLakeEnvâ object has no attribute âseedâ