With this loss where reduction is ‘sum’, I think the model does not consider the data size(batch size) in gradient descent.
Isn’t it better to use loss = nn.BCEWithLogitsLoss(reduction='mean') with metric.add(update_D(X, Z, net_D, net_G, loss, trainer_D)* batch_size ,update_G(Z, net_D, net_G, loss, trainer_G)* batch_size, batch_size)
If the generator does a perfect job, then 𝐷(𝐱′)≈1D(x′)≈1 so the above loss near 0, which results the gradients are too small to make a good progress for the discriminator. So commonly we minimize the following loss: … which is just feed 𝐱′=𝐺(𝐳)x′=G(z) into the discriminator but giving label 𝑦=1y=1.
The sentences above look quite confusing to me… There might be some grammatical errors in there…
Could you please rewrite these so they look coherent and natural?