http://d2l.ai/chapter_multilayer-perceptrons/numerical-stability-and-init.html
Hi, I think the sentence "Assume that we have a simple MLP with one hidden layer and two units. In this case, we could permute the weights W_(1) of the first layer and likewise permute the weights of the output layer to obtain the same function." is misleading. It should be framed as a permutation of the hidden units: swapping the two hidden units means permuting the rows of W_(1) together with the corresponding columns of the output-layer weights, which leaves the function unchanged. As written, "permute the weights" doesn't make clear which axes are permuted or that the two permutations have to match.
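A quick NumPy sketch of the point (the shapes, names, and activation are my own choices for illustration, not from the book): swapping the two hidden units, i.e. permuting the rows of W1/b1 together with the columns of W2, leaves the network function unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP: 3 inputs -> 2 hidden units -> 1 output (hypothetical shapes)
W1 = rng.normal(size=(2, 3))   # first-layer weights, one row per hidden unit
b1 = rng.normal(size=(2, 1))   # first-layer biases
W2 = rng.normal(size=(1, 2))   # output-layer weights, one column per hidden unit

def mlp(x, W1, b1, W2):
    h = np.maximum(W1 @ x + b1, 0)   # ReLU hidden layer
    return W2 @ h

# Permute the HIDDEN UNITS: the same permutation is applied to the
# rows of W1/b1 and to the columns of W2.
P = np.array([[0, 1], [1, 0]])       # permutation matrix swapping the two units
W1p, b1p, W2p = P @ W1, P @ b1, W2 @ P.T

x = rng.normal(size=(3, 1))
print(np.allclose(mlp(x, W1, b1, W2), mlp(x, W1p, b1p, W2p)))  # True
```

This works because a permutation commutes with the elementwise nonlinearity: relu(P(W1 x + b1)) = P relu(W1 x + b1), and then W2 P.T cancels the P. Permuting the rows of W1 alone, without the matching column permutation of W2, would change the function, which is why the book's phrasing reads as ambiguous.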