Generative Adversarial Networks

https://d2l.ai/chapter_generative-adversarial-networks/gan.html

==
The PyTorch adaptation of this section was initially contributed by @StevenJokess and reviewed and revised by @anirudh. The PR may not show up on GitHub because the former's account was suspended.

Still stuck porting this GAN section to TensorFlow…
Can anyone help?

https://www.heywhale.com/mw/project/6062924694a58b00178fc4d6

TF finished!
http://preview.d2l.ai/d2l-en/PR-1716/chapter_generative-adversarial-networks/gan.html

loss = nn.BCEWithLogitsLoss(reduction='sum')

With this loss where reduction is 'sum', I think the model does not take the data size (batch size) into account during gradient descent.

Isn't it better to use
loss = nn.BCEWithLogitsLoss(reduction='mean')
together with
metric.add(update_D(X, Z, net_D, net_G, loss, trainer_D) * batch_size, update_G(Z, net_D, net_G, loss, trainer_G) * batch_size, batch_size)
so that the reported averages stay the same?
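
To sanity-check the scaling claim, here is a minimal sketch of my own (not the book's code) comparing the gradients of the two reductions:

```python
import torch
from torch import nn

# With reduction='sum' the gradient on each logit is (sigmoid(x) - y),
# so the total update grows with the batch; with reduction='mean' every
# gradient is divided by the number of elements in the batch.
batch_size = 4
labels = torch.ones(batch_size, 1)

logits_sum = torch.zeros(batch_size, 1, requires_grad=True)
loss_sum = nn.BCEWithLogitsLoss(reduction='sum')(logits_sum, labels)
grad_sum, = torch.autograd.grad(loss_sum, logits_sum)

logits_mean = torch.zeros(batch_size, 1, requires_grad=True)
loss_mean = nn.BCEWithLogitsLoss(reduction='mean')(logits_mean, labels)
grad_mean, = torch.autograd.grad(loss_mean, logits_mean)

print(grad_sum / grad_mean)                            # every entry: 4.0 (= batch_size)
print(loss_sum.item(), loss_mean.item() * batch_size)  # identical totals
```

So 'mean' plus the * batch_size bookkeeping in metric.add reports the same per-example averages that the 'sum' version gets by dividing at the end; the only real difference is the effective learning rate during the optimizer step.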

If this is right, could I commit this change?

Thanks

@goldpiggy @astonzhang Thanks in advance 🙂

If the generator does a perfect job, then D(x′)≈1 so the above loss near 0, which results the gradients are too small to make a good progress for the discriminator. So commonly we minimize the following loss:

$$\min_G \{ -y \log(D(\mathbf{x}')) \} = \min_G \{ -\log(D(\mathbf{x}')) \},$$

which is just feed x′=G(z) into the discriminator but giving label y=1.

The sentences above look quite confusing to me… There might be some grammatical errors in there…
Could you please rewrite them so they read coherently and naturally?


This sentence confuses me a lot as well. Authors, could you please correct it, or at least specify its source?

Same for me; I don't understand why the above loss is near 0 or why the gradients become small.
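
As far as I understand, the standard argument (from Goodfellow et al.'s original GAN paper) actually runs the other way around from the quoted sentence: early in training the discriminator rejects fakes confidently, so D(x′)≈0. Then the generator's original objective −log(1−D(x′)) is itself near 0 and nearly flat, while the surrogate −log(D(x′)) still yields a strong gradient. A tiny numeric sketch of my own:

```python
import torch

# D(x') = the discriminator's probability that a fake is real.
# Early in training the discriminator spots fakes easily, so D(x') ≈ 0.
d = torch.tensor(0.01, requires_grad=True)

# Original generator objective (to be maximized): -log(1 - D(x')).
orig = -torch.log(1 - d)
grad_orig, = torch.autograd.grad(orig, d)

# Surrogate objective (to be minimized): -log(D(x')).
d2 = torch.tensor(0.01, requires_grad=True)
surr = -torch.log(d2)
grad_surr, = torch.autograd.grad(surr, d2)

print(orig.item(), grad_orig.item())  # ≈ 0.01, ≈ 1.01: loss near 0, almost flat
print(surr.item(), grad_surr.item())  # ≈ 4.61, ≈ -100: strong learning signal
```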

The equation states that we need the parameters of the generator that maximize the loss and the parameters of the discriminator that minimize the loss. However, in the loss plot we see that the discriminator loss increases while the generator loss decreases. Can someone please clarify?

As the model gets trained, the discriminator loss increases because it is increasingly being fooled by the generator. The generator loss decreases because the generator's outputs are increasingly being predicted as 1 (loss_G = loss(D(G(z)), ones)), so loss_G gets smaller and smaller.
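
To make that concrete, here is a rough sketch of the two objectives (following the chapter's update_D/update_G pattern, though the helper names here are mine, not the exact book code):

```python
import torch
from torch import nn

loss = nn.BCEWithLogitsLoss()

def d_loss(real_logits, fake_logits):
    # Discriminator: real data labeled 1, generated data labeled 0.
    ones = torch.ones_like(real_logits)
    zeros = torch.zeros_like(fake_logits)
    return loss(real_logits, ones) + loss(fake_logits, zeros)

def g_loss(fake_logits):
    # Generator: generated data labeled 1, so fooling D drives this loss down.
    ones = torch.ones_like(fake_logits)
    return loss(fake_logits, ones)

# As the generator improves, D's score on fakes rises, so loss_G falls
# while the fake term of loss_D rises:
for fake_score in [-2.0, 0.0, 2.0]:   # D's logit on generated data
    fake = torch.tensor([fake_score])
    real = torch.tensor([2.0])        # D stays confident on real data
    print(f'D(fake) logit {fake_score:+.0f}: '
          f'loss_D={d_loss(real, fake).item():.3f}, '
          f'loss_G={g_loss(fake).item():.3f}')
```

Running this, loss_D goes from about 0.25 up to 2.25 while loss_G goes from about 2.13 down to 0.13, which matches the trend in the plot.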