StevenJokess
Still stuck on converting the GAN to TensorFlow…
Can anyone help?
loss = nn.BCEWithLogitsLoss(reduction='sum')
With this loss, where reduction is 'sum', I think the model does not take the data size (batch size) into account during gradient descent.
Isn’t it better to use
loss = nn.BCEWithLogitsLoss(reduction='mean')
with
metric.add(update_D(X, Z, net_D, net_G, loss, trainer_D) * batch_size, update_G(Z, net_D, net_G, loss, trainer_G) * batch_size, batch_size)
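Here is a quick sanity check of the equivalence I have in mind (a sketch with made-up logits and labels, not the chapter's training loop; the shapes and batch_size value are just for illustration):

import torch
from torch import nn

batch_size = 256
# Made-up discriminator logits and labels for one minibatch, purely for illustration.
logits = torch.randn(batch_size, 1)
labels = torch.ones(batch_size, 1)

loss_sum = nn.BCEWithLogitsLoss(reduction='sum')
loss_mean = nn.BCEWithLogitsLoss(reduction='mean')

# The summed loss equals the mean loss scaled by the batch size, so accumulating
# mean_loss * batch_size in the metric reproduces the totals recorded with reduction='sum'.
assert torch.allclose(loss_sum(logits, labels), loss_mean(logits, labels) * batch_size)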
If this is right, could I commit this change?
Thanks
@goldpiggy @astonzhang Thanks in advance
If the generator does a perfect job, then D(x′) ≈ 1 so the above loss near 0, which results the gradients are too small to make a good progress for the discriminator. So commonly we minimize the following loss:
…
which is just feed x′ = G(z) into the discriminator but giving label y = 1.
The sentences above look quite confusing to me… There might be some grammatical errors in there…
Could you please rewrite these so they look coherent and natural?
This sentence also confuses me a lot. Authors, please correct this, or at least specify the source of this sentence.
Same for me; I don’t understand why the above loss is near 0 or why the gradients become small.
The equation states that we need the parameters of the generator that maximize the loss, and the parameters of the discriminator that minimize the loss. However, in the loss plot we see that the discriminator loss increases while the generator loss decreases. Can someone please clarify?
As the model gets trained, the discriminator loss increases because the discriminator is increasingly being fooled by the generator. The generator loss decreases because the generator outputs are increasingly being predicted as 1 (Loss_G = loss(D(G(x)), ones)), so we get a smaller and smaller loss_G.
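For reference, a rough sketch of the generator update being described (the names net_D, net_G, loss, trainer_G follow the chapter's code; treat this as an illustration of the "label fakes as real" trick, not necessarily the exact library implementation):

import torch

def update_G(Z, net_D, net_G, loss, trainer_G):
    """Minimize -log(D(G(z))) by labeling the fake batch as real (y = 1)."""
    batch_size = Z.shape[0]
    ones = torch.ones((batch_size,), device=Z.device)
    trainer_G.zero_grad()
    fake_X = net_G(Z)        # x' = G(z)
    fake_Y = net_D(fake_X)   # discriminator logits for the fake batch
    # With BCEWithLogitsLoss and target 1, this loss shrinks only as D(G(z)) -> 1,
    # i.e. as the generator gets better at fooling the discriminator.
    loss_G = loss(fake_Y, ones.reshape(fake_Y.shape))
    loss_G.backward()
    trainer_G.step()
    return loss_G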
There is an error in the text. For the loss function 20.1.1 to work, D should be the probability that the data is real, and this is what the original article by Goodfellow et al. says. So if the discriminator does a good job, D(G(z)) should be close to 0, not 1. Then the log will be close to 0 and the gradients will be small.
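A small numerical check of that vanishing-gradient point (my own sketch, not from the chapter): take a single logit for which D(G(z)) = sigmoid(logit) is close to 0, and compare the gradients of the two generator objectives.

import torch

# One logit for which D(G(z)) = sigmoid(logit) is close to 0 (a confident "fake" call).
logit = torch.tensor(-5.0, requires_grad=True)
d = torch.sigmoid(logit)   # ~ 0.0067

# Gradient of log(1 - D(G(z))) w.r.t. the logit is about -D(G(z)): nearly zero here.
g_saturating = torch.autograd.grad(torch.log(1 - d), logit, retain_graph=True)[0]
# Gradient of -log(D(G(z))) is about -(1 - D(G(z))): still close to -1 here.
g_nonsaturating = torch.autograd.grad(-torch.log(d), logit)[0]
print(g_saturating.item(), g_nonsaturating.item())  # ~ -0.0067 vs ~ -0.9933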