It is mentioned that the input dimension is n x d. But if the output of the first hidden layer is treated as the input to the next layer, how can the dimensions match?
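The input dimensions (and hence the dimensions of the corresponding weights) are not the same for the 1st layer and the subsequent layers (2nd, 3rd, etc.). The 1st layer takes the n x d input, while each later layer takes the hidden state of the layer below it, whose width is the number of hidden units rather than d. So we cannot reuse the code of the 1-layer RNN/GRU/LSTM (as in Chapter 8, 9.1, 9.2) by simply stacking copies of it to form multi-layer RNNs.
We need to write separate code for the 1st-layer RNN and the subsequent-layer RNNs, exactly because of the difference you mentioned.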
Hence Exercise 1.
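
To make the mismatch concrete, here is a minimal sketch (not the book's code) that only tracks the weight shapes of a stacked vanilla RNN. It assumes the input-to-hidden weight has shape (inputs, hiddens) and the hidden-to-hidden weight has shape (hiddens, hiddens). The function name and the sizes d = 28 and h = 32 are made up for illustration.

```python
def stacked_rnn_weight_shapes(num_inputs, num_hiddens, num_layers):
    """Return a list of (W_xh shape, W_hh shape) pairs, one per layer."""
    shapes = []
    for layer in range(num_layers):
        # Layer 0 reads the raw input (n x d). Every deeper layer reads the
        # hidden state of the layer below it (n x h).
        in_features = num_inputs if layer == 0 else num_hiddens
        w_xh = (in_features, num_hiddens)   # input-to-hidden weight
        w_hh = (num_hiddens, num_hiddens)   # hidden-to-hidden weight
        shapes.append((w_xh, w_hh))
    return shapes


d, h = 28, 32  # hypothetical input width and number of hidden units
shapes = stacked_rnn_weight_shapes(d, h, num_layers=3)
for i, (w_xh, w_hh) in enumerate(shapes):
    print(f"layer {i}: W_xh {w_xh}, W_hh {w_hh}")
```

Only layer 0 gets a W_xh with d rows. From layer 1 onward W_xh has h rows, so the same parameter-initialization code only carries over if the input size is passed in per layer.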