Generative Adversarial Networks

uniq · October 2, 2020, 5:18pm

Thank you for your reply!!
fake_Y is computed by passing fake_X into the discriminator. So when we call backward on loss_G, since fake_X is passed through the discriminator, will autograd also calculate the gradients of the parameters in the discriminator network?

uniq · October 2, 2020, 5:20pm

Yes, exactly… During backpropagation, since we passed fake_X through net_D, the gradients for net_D will also be calculated, right? But we update only net_G. Why can’t we set grad_req to null for the parameters of net_D?

StevenJokess · October 2, 2020, 5:23pm

loss_G.backward() is the only call that backpropagates.
@uniq
I’m not sure (I’m a newbie too), but I think the gradients for net_D will also be calculated…
I’m trying to write up code notes on the process.

goldpiggy · October 2, 2020, 5:57pm

Hi @StevenJokess and @uniq, great discussion! Here the discriminator 𝐷 is given, so we need to calculate its gradients and use them in the chain rule, for the purpose of updating only the weights of the generator 𝐺. You can think of the discriminator 𝐷 as a regular function rather than a network. Let me know if that makes sense.
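
To make the chain rule in this exchange concrete, here is a sketch of the gradient that reaches the generator. The symbols θ_G and θ_D (the weights of net_G and net_D), z (the generator input), and x′ = G(z) (the fake sample) are notation assumed here for illustration; they are not taken from the posts.

```latex
\frac{\partial\, \mathrm{loss}_G}{\partial \theta_G}
  = \frac{\partial\, \mathrm{loss}_G}{\partial D(x')}
    \cdot \left. \frac{\partial D(x')}{\partial x'} \right|_{x' = G(z)}
    \cdot \frac{\partial G(z)}{\partial \theta_G}
```

The middle factor is the derivative of 𝐷 with respect to its input. Evaluating it uses the current values of θ_D, but the derivative of the loss with respect to θ_D itself does not appear in this product.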

StevenJokess · October 2, 2020, 6:00pm

@goldpiggy
I agree that we update only the weights of the generator 𝐺.
But I think the gradients for net_D will also be calculated during the update.
These are not contradictory.

OK, I see you also said that we need to calculate the gradients of net_D… Sorry I missed that.

The discriminator D is just something that gives you an output, like 0.8 (near 1, so we think the input is more like real data).

uniq · October 3, 2020, 4:42am

To calculate d(loss_G)/d(G) (G: the generator network), why do we need to compute the gradients of the discriminator network? The chain rule doesn’t involve any expression containing the gradients of the discriminator. Correct me if I am wrong…

uniq · October 3, 2020, 4:48am

uniq · October 3, 2020, 4:49am

There is no expression involving the gradients of the discriminator network. Then why do we need to calculate them?
Thanks in advance

StevenJokess · October 3, 2020, 5:04am

…
I can’t understand what you are drawing…
Is it net_D?

uniq · October 3, 2020, 5:23am

Sorry about my handwriting. If you write out the expression for d(loss_G)/d(G), will you find an expression involving the gradients of D anywhere in the chain rule? That is my question… I hope I am clear… If there is no expression involving the gradients of D, then why do we need to compute them? We can set grad_req to null, right?

Thanks in advance

StevenJokess · October 3, 2020, 8:59am

uniq · October 3, 2020, 5:10pm

I still don’t think we need to compute the gradients of the discriminator network. Can you show me the chain rule expression from loss_G to net_G that involves the gradients of net_D?

uniq · October 3, 2020, 5:32pm

This is what I am trying to convey. I still get the same losses if I set the grad_req of the discriminator network’s parameters to null in the update_G function. Correct me if I am wrong. If the discriminator were a large neural network, computing these gradients would be a costly operation.
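
The observation above can be checked on a toy case. The following is a minimal sketch, not the book's code: it assumes a scalar generator G(z) = w_g * z, a scalar discriminator D(x) = sigmoid(w_d * x), and the generator loss -log(D(G(z))). The names `loss_g`, `backward`, `w_g`, `w_d`, and `need_d_grad` are hypothetical. It does the backward pass by hand and optionally skips the gradient for the discriminator weight.

```python
import math


def sigmoid(s):
    return 1.0 / (1.0 + math.exp(-s))


def loss_g(w_g, w_d, z):
    """Toy generator loss -log(D(G(z))) with G(z) = w_g * z, D(x) = sigmoid(w_d * x)."""
    return -math.log(sigmoid(w_d * (w_g * z)))


def backward(w_g, w_d, z, need_d_grad):
    """Manual backprop; returns (grad for w_g, grad for w_d or None)."""
    x_fake = w_g * z              # generator forward pass
    s = w_d * x_fake              # discriminator pre-activation
    d_out = sigmoid(s)            # discriminator output
    dloss_ds = d_out - 1.0        # derivative of -log(sigmoid(s)) w.r.t. s
    dloss_dx = dloss_ds * w_d     # gradient w.r.t. D's input; uses the value of w_d
    grad_w_g = dloss_dx * z       # gradient w.r.t. the generator weight
    # The gradient w.r.t. the discriminator weight is a separate branch.
    grad_w_d = dloss_ds * x_fake if need_d_grad else None
    return grad_w_g, grad_w_d


if __name__ == "__main__":
    print(backward(0.7, -1.3, 0.5, need_d_grad=True))
    print(backward(0.7, -1.3, 0.5, need_d_grad=False))
```

In this sketch the generator gradient is identical in both calls: the backward pass needs the gradient with respect to the discriminator's input, which depends on the value of `w_d`, but never the gradient with respect to `w_d` itself. Whether a given framework skips that extra work when parameter gradients are disabled depends on the framework.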

StevenJokess · October 4, 2020, 8:23am

I’m not sure about my opinion.
But my thought is that in the update_G function we don’t update the parameters of net_D; we only compute its gradients.

uniq · October 4, 2020, 8:52am

See the description for ‘null’. It says the gradient arrays will not be allocated. So how will the gradients be calculated, and where will they be stored?

StevenJokess · November 7, 2020, 11:22am

I guess that we just use them without storing them… @uniq

Media · November 17, 2020, 3:49am

"If the generator does a perfect job, then D(x′)≈1 so the above loss near 0, which results the gradients are too small to make a good progress for the discriminator. So commonly we minimize the following loss:"
Don’t you think this will lead to a large error? You can simply plot it.
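
One way to follow the suggestion to plot it is to tabulate both losses first. This sketch assumes the "above loss" is -log(1 - D(x′)) and the "following loss" is -log(D(x′)); the quote does not spell out either formula, so both are assumptions, and the function names `loss_y0` and `loss_y1` are hypothetical.

```python
import math


def loss_y0(d):
    """Assumed 'above loss': -log(1 - D(x')), cross-entropy with label y = 0."""
    return -math.log(1.0 - d)


def loss_y1(d):
    """Assumed 'following loss': -log(D(x')), cross-entropy with label y = 1."""
    return -math.log(d)


if __name__ == "__main__":
    print("D(x')   -log(1-D)   -log(D)")
    for d in (0.01, 0.1, 0.5, 0.9, 0.99):
        print(f"{d:<7} {loss_y0(d):<11.4f} {loss_y1(d):.4f}")
```

Under these assumptions, -log(1 - D(x′)) grows without bound as D(x′) approaches 1, while -log(D(x′)) approaches 0.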

peng · November 19, 2020, 10:33am

I have doubts about that, too. @Media @goldpiggy

peng · November 19, 2020, 3:27pm

I guess this final expression is what loss_G.backward() calculates; it should involve both net_G and net_D in the code, because the gradient is calculated using the weights of net_D and net_G.

peng · November 19, 2020, 3:36pm

Hi, very impressive discussion about the “black box”. I wonder whether there has been any progress on GANs learning the exact matrix A and b using just the “real data”. Or maybe the NN only cares about the result. That is interesting because various A and b could make the data look the “same” even though they are actually different. @goldpiggy @Donald_Smith
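
A small sketch of why different matrices can produce data that look the same, assuming the "real data" are generated as x = zA + b with z drawn from a standard Gaussian (an assumption about the setup, as are the values of `A` and the angle below). In that case the distribution of x depends on A only through the covariance AᵀA, so multiplying A on the left by any rotation Q leaves the distribution unchanged, while b shifts the mean and is not affected.

```python
import math


def matmul(P, Q):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(M):
    return [list(row) for row in zip(*M)]


def covariance_of(A):
    """Covariance of x = z A + b for z ~ N(0, I), which equals A^T A."""
    return matmul(transpose(A), A)


# Illustrative values (assumptions, not taken from the thread).
A = [[1.0, 2.0], [-0.1, 0.5]]
theta = 0.8
Q = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]   # rotation, so Q^T Q = I
A_rotated = matmul(Q, A)

if __name__ == "__main__":
    print(A_rotated)
    print(covariance_of(A))
    print(covariance_of(A_rotated))
```

Here `A_rotated` differs from `A` entry by entry, yet both give the same covariance, so samples from the two models follow the same Gaussian distribution. Under this assumption, data alone could at best identify A up to such a rotation.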