http://d2l.ai/chapter_convolutional-neural-networks/conv-layer.html
For exercise (2), “When you try to automatically find the gradient for the Conv2D class we created, what kind of error message do you see?”
I get an error message to the effect of “Inplace operations are not supported using autograd” (exact message below). How do I perform a convolution without pre-allocating the tensor and setting one element at a time?
MXNetError: [05:22:28] src/imperative/imperative.cc:261:
Check failed: AGInfo::IsNone(*output): Inplace operations (+=, -=, x[:]=, etc) are not supported when recording with autograd.
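For reference, here is a minimal sketch of the pattern that triggers it, following the chapter’s corr2d, which pre-allocates the output and fills it one element at a time while autograd is recording (the input shapes below are arbitrary placeholders):

```python
from mxnet import autograd, np, npx
npx.set_np()

def corr2d(X, K):
    """Cross-correlation as in the chapter: pre-allocate Y, fill it elementwise."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()  # in-place write
    return Y

X = np.random.uniform(size=(6, 8))
K = np.random.uniform(size=(2, 2))
K.attach_grad()
with autograd.record():  # recording + Y[i, j] = ... raises the MXNetError above
    Y = corr2d(X, K)
```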
Hi @ganeshk, great question! As you may have found, in-place operations raise an error when the storage of a modified input is referenced by any other variable. Check here to see how to implement it from scratch!
Thanks @goldpiggy. Did you mean to post the link to the LeNet page in D2L as an example of how to implement in-place operations from scratch? I don’t see such an implementation on that page. Maybe I’m missing it?
Hey @ganeshk. No, LeNet is an example of a CNN implementation. If you want to see in-place operations implemented from scratch, please check the official MXNet/Torch/TF operators. (We cannot teach how to build a laptop from scratch.)
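For what it’s worth, one way to avoid the in-place writes entirely (just a sketch, not the book’s implementation) is to compute each output value as its own ndarray and stack the results at the end, so nothing is ever assigned into a pre-allocated tensor:

```python
from mxnet import np, npx
npx.set_np()

def corr2d_stacked(X, K):
    """Cross-correlation without elementwise in-place assignment."""
    h, w = K.shape
    rows = []
    for i in range(X.shape[0] - h + 1):
        # Each window's weighted sum is a separate ndarray; no Y[i, j] = ... writes.
        row = [(X[i:i + h, j:j + w] * K).sum()
               for j in range(X.shape[1] - w + 1)]
        rows.append(np.stack(row))
    return np.stack(rows)
```

Since the per-window sum and np.stack are ordinary operators, autograd should be able to record through this version and compute gradients with respect to K.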
Should print(f'batch {i + 1}, loss {float(l.sum()):.3f}') be replaced by print(f'epoch {i + 1}, loss {float(l.sum()):.3f}')?
I have trouble understanding the intuition behind these questions.
1. What is the form of a kernel for the second derivative?
2. What is the kernel for an integral?
3. What is the minimum size of a kernel to obtain a derivative of degree d?
In my understanding, the kernel is the matrix learned for convolution (or cross-correlation).
How are kernels used to compute derivatives and integrals?
What are the input and output? Is the input the derivative of a matrix? The output dimensions of a convolution are always smaller than the input dimensions, so how do we get the derivative?
My train of thought:
Convolution takes an element and outputs a linear combination of it with its surrounding elements, according to the kernel shape. To compute the derivative, suppose we need to represent a function of degree 2, say
f(x) = ax^2 + bx + c.
Therefore, f’(x) = 2ax + b.
Therefore, given an element, we need to learn 2ax + b. Does that mean we require one element for the kernel?
NOTE: Since the kernel function is implemented as x + (a linear combination of the surrounding elements), we need the kernel to learn (2a-1)*x + b (b can be the bias).
If f(x) = ax^3 + bx^2 + cx + d,
f'(x) = 3ax^2 + 2bx + c. <-- We need to represent this.
How do we learn a quadratic function using the conv2d operation, since it only performs linear combinations?
We can probably give it a lot of inputs and outputs and make it learn weights that aren't necessarily related to the coefficients of the function. In this case, the kernel window can be as large as allowed. But how does that generalise to degree d?
I think I will get an idea for the integral once this is clarified.
Hi, here the derivative is meant as the image derivative rather than the derivative of a continuous function. You can check this link to get some ideas. Hope it helps.
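To make the image-derivative reading concrete, here is a small sketch using the chapter’s corr2d (repeated for completeness) on a one-row “image” sampled from x^2. The finite-difference kernel [-1, 1] acts like a first derivative, [1, -2, 1] like a second derivative (so a derivative of degree d needs at least d + 1 kernel elements), and an all-ones kernel acts like a windowed integral (running sum):

```python
from mxnet import np, npx
npx.set_np()

def corr2d(X, K):
    """2-D cross-correlation as defined in the chapter."""
    h, w = K.shape
    Y = np.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))
    for i in range(Y.shape[0]):
        for j in range(Y.shape[1]):
            Y[i, j] = (X[i:i + h, j:j + w] * K).sum()
    return Y

X = np.array([[0., 1., 4., 9., 16.]])   # samples of x^2 along one row

K_d1 = np.array([[-1., 1.]])            # first finite difference
K_d2 = np.array([[1., -2., 1.]])        # second finite difference
K_int = np.array([[1., 1., 1.]])        # running sum ~ local integral

print(corr2d(X, K_d1))   # [[1. 3. 5. 7.]]  ~ derivative of x^2 grows linearly
print(corr2d(X, K_d2))   # [[2. 2. 2.]]     ~ constant second derivative
print(corr2d(X, K_int))  # [[5. 14. 29.]]   windowed sums
```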
Yes, I think so. Good point.