Bidirectional Recurrent Neural Networks

https://d2l.ai/chapter_recurrent-modern/bi-rnn.html

Can you explain why we sum over all possible combinations of choices for $h_1, \ldots, h_T$ when there is no latent variable in $P(x_j \mid x_{-j})$?