In exercise 2 of 4.6.7, I increased the number of epochs to 100, with dropout1 and dropout2 set to 0.2 and 0.5, and ran the training several times. Every run showed a stretch during training where the train/test accuracy dropped. As far as I know, training for more epochs with otherwise appropriate parameters should make the result better, not worse. See the results below:



Is there any theory that explains this result? (I saw that another student had the same problem using PyTorch.)
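For anyone comparing runs, a small helper along these lines could make the "dropping part" easier to report. It is only a sketch: `find_accuracy_dips`, its `tolerance` parameter, and the sample list are hypothetical names and placeholder values for illustration, not part of the book's code and not measured results.

```python
def find_accuracy_dips(acc_history, tolerance=0.02):
    """Return (epoch, best_so_far, acc) for every epoch whose accuracy
    falls more than `tolerance` below the best accuracy seen earlier.

    `acc_history` is a list of per-epoch accuracies (epoch 1 first).
    `tolerance` is an assumed threshold for ignoring ordinary noise.
    """
    dips = []
    best = float("-inf")
    for epoch, acc in enumerate(acc_history, start=1):
        if acc < best - tolerance:
            dips.append((epoch, best, acc))
        best = max(best, acc)
    return dips


# Placeholder values for illustration only, not results from a real run.
example_test_acc = [0.70, 0.78, 0.82, 0.84, 0.61, 0.80, 0.85]
for epoch, best, acc in find_accuracy_dips(example_test_acc):
    print(f"epoch {epoch}: accuracy {acc:.2f} is below earlier best {best:.2f}")
```

Running it on the per-epoch train and test accuracies from each of the 100-epoch runs would show whether the dips land at the same epochs every time or move around between runs.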