Auto Differentiation

StevenJokes · June 13, 2020, 3:23am

Q1.
Does it mean that if I ‘clone’ a tensor or a Variable that has the requires_grad attribute, I don’t need to call .requires_grad_() on the new one?

Does detach() remove a tensor? If so, why can I still evaluate x.grad == u.grad? Doesn’t “remove” mean that x no longer exists? I think that x and u are just two different names for the same storage.

Can my code about Variable be added to Section 2.5.2?

Q2.
I understand that d is a function that scales a. But how is that different from f(b)?
I think that f(b) = 2 * b.

First, b (inside the function) = 2 * a, where a is the argument b.
Then the loop is not entered (b > 1000).
Then the if condition is True, so it returns b (inside the function), which is 2 * b (argument).

Q3.
Thanks. I will try it next time. I did run pip install d2l, but I got:
The specified module could not be found
Did the problem happen because the module isn’t in my sys.path?

anirudh · June 16, 2020, 10:29pm

Ans1.
Yes, we don’t use Variable in PyTorch anymore. In the latest version, a Tensor can do everything a Variable did earlier.

Ans2.
Yes, f(b) will be 2*b and if you change it to the following you will get True.

b = torch.randn(size=(1,), requires_grad=True)
d=f(b)
d.backward()

b.grad == d/b
>>>True

Ans3. Just uninstall the pip version and run python setup.py develop in your root d2l-en directory for installing the library.

anirudh · June 17, 2020, 1:30am

Ans3.
Just run python setup.py develop in your repository.

StevenJokes · June 18, 2020, 5:25am

Q2.

b = b + 1000

Why does this make the comparison False?
I found that print(b.grad) gives None.
Why did that happen?
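
One likely explanation, sketched under assumptions: after b = b + 1000 the name b refers to the result of an operation, not to the original leaf tensor, and autograd only fills .grad for leaf tensors by default. The names b0 and the stand-in d = 2 * b (what f(b) reduces to here, per the answer above) are illustrative, not from the book.

```python
import torch

torch.manual_seed(0)
b0 = torch.randn(size=(1,), requires_grad=True)  # leaf tensor (hypothetical name)
b = b0 + 1000       # result of an operation, so no longer a leaf
b.retain_grad()     # explicitly ask autograd to keep .grad on this non-leaf tensor

d = 2 * b           # stand-in for f(b), which reduces to 2 * b for values this large
d.sum().backward()

print(b0.is_leaf, b.is_leaf)  # leaf status of the two tensors
print(b0.grad)                # gradient stored on the leaf
print(b.grad)                 # stored only because retain_grad() was called
```

Without the retain_grad() call, b.grad would stay None while b0.grad would still be filled in. Another option is to create the shifted tensor as a fresh leaf, for example with (b0 + 1000).detach().requires_grad_(True).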

ccpvirus · August 2, 2020, 10:57pm

My solution to question 5

import numpy as np
import torch
from d2l import torch as d2l

x = np.linspace(-np.pi, np.pi, 100)
x = torch.tensor(x, requires_grad=True)
y = torch.sin(x)
# Backpropagate each output element separately; gradients accumulate in x.grad
for i in range(100):
    y[i].backward(retain_graph=True)

d2l.plot(x.detach(), (y.detach(), x.grad), legend=('sin(x)', 'grad w.r.t. x'))

This looks clumsy since I compute the gradient 100 times. Is there a better way?

SONG_PAN · August 27, 2020, 10:27pm

I don’t quite understand 2.5.2.

My understanding is that by default the framework will convert the actual output matrix into a vector based on a “gradient vector” that we passed in. The description of this “gradient vector” is

which specifies the gradient of the differentiated function w.r.t self.

What does it mean? Does “gradient of differentiated function” mean “second order gradient”?

SONG_PAN · August 27, 2020, 10:46pm

My answers to the questions. Please point out if I have misunderstood anything.

Why is the second derivative much more expensive to compute than the first derivative?

Because instead of following the original computation graph, we need to construct a new one that corresponds to the calculation of first-order gradient?
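
A minimal sketch of that idea, assuming a scalar example y = x ** 3 that is not from the book: passing create_graph=True makes autograd record the backward computation itself as a new graph, which can then be differentiated again to obtain the second derivative.

```python
import torch

x = torch.tensor(2.0, requires_grad=True)
y = x ** 3

# First derivative; create_graph=True records the backward pass as a new graph
(dy_dx,) = torch.autograd.grad(y, x, create_graph=True)

# Second derivative: differentiate the graph of the first derivative
(d2y_dx2,) = torch.autograd.grad(dy_dx, x)

print(dy_dx.item(), d2y_dx2.item())  # 3 * x**2 and 6 * x evaluated at x = 2
```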

After running the function for backpropagation, immediately run it again and see what happens.

RuntimeError: Trying to backward through the graph a second time, but the saved intermediate results have already been freed. Specify retain_graph=True when calling backward the first time.

In the control flow example where we calculate the derivative of d with respect to a , what would happen if we changed the variable a to a random vector or matrix. At this point, the result of the calculation f(a) is no longer a scalar. What happens to the result? How do we analyze this?

Directly changing it would give the “blah blah … only scalar output” error, which is what 2.5.2 is talking about. I changed the code to

a = torch.randn(size=(3,), requires_grad=True)
d = f(a)
d.sum().backward()

and a.grad == d / a still gives True. This is because the value returned by the function involves no cross-element operation (norm() and sum() only steer the control flow), so each element of the vector is scaled independently and does not affect the derivatives of the other elements.
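
A runnable version of this check, as a sketch: f is the control-flow function from the section, the seed is arbitrary, and torch.allclose replaces == because d / a can differ from the exact scale factor by floating-point rounding.

```python
import torch

def f(a):
    # Control-flow function from the section
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c

torch.manual_seed(0)
a = torch.randn(size=(3,), requires_grad=True)
d = f(a)
d.sum().backward()  # reduce to a scalar before backpropagating

# Every element shares the same scale factor k, so a.grad is k everywhere
print(a.grad)
print(torch.allclose(a.grad, d / a))
```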

Redesign an example of finding the gradient of the control flow. Run and analyze the result.

SKIP

Let $f(x) = \sin(x)$. Plot $f(x)$ and $\frac{df(x)}{dx}$, where the latter is computed without exploiting that $f’(x) = \cos(x)$.

import numpy as np
import torch
from d2l import torch as d2l

x = np.linspace(-np.pi, np.pi, 100)
x = torch.tensor(x, requires_grad=True)
y = torch.sin(x)
y.sum().backward()

d2l.plot(x.detach(), (y.detach(), x.grad), legend=('sin(x)', 'grad w.r.t. x'))

goldpiggy · August 29, 2020, 12:38am

Hi @SONG_PAN, this is a first order gradient. The gradient is only available after we differentiate the function.
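
To make the role of the gradient argument concrete, here is a small sketch. The weights v are hypothetical, and y = x * x is chosen only because its Jacobian, diag(2 * x), is easy to read. Calling y.backward(gradient=v) stores the vector-Jacobian product of v with that Jacobian in x.grad, which is still a first-order quantity.

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = x * x                                # non-scalar output; Jacobian is diag(2 * x)
v = torch.tensor([1.0, 0.5, 0.0, 2.0])   # hypothetical weights, one per element of y

y.backward(gradient=v)                   # stores the vector-Jacobian product in x.grad
print(x.grad)                            # equals v * 2 * x
```

Choosing v = torch.ones_like(y) gives the same result as y.sum().backward().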

Aaron_L · November 21, 2020, 4:20pm

Hi, may I confirm my understanding of automatic differentiation through Python control flow?
I assume that as long as the values flowing through the Python control flow are computed with PyTorch functions and tensors, automatic differentiation is doable. Am I right on this? No other functions should be involved, such as sin() from the math package (I have to use sin() from PyTorch instead if I want automatic differentiation).
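
A small sketch that illustrates the distinction, using made-up values: math.sin only sees a plain Python number, so its result carries no graph, while torch.sin returns a tensor that autograd can trace, even when ordinary Python if/else decides which branch runs.

```python
import math
import torch

x = torch.tensor(1.0, requires_grad=True)

y_math = math.sin(x.item())  # plain Python float: autograd cannot trace it
y_torch = torch.sin(x)       # tensor that carries a grad_fn

# Ordinary Python control flow is fine as long as the values are tensors
if y_torch > 0:
    z = 2 * y_torch
else:
    z = -y_torch
z.backward()

print(type(y_math).__name__)               # float
print(x.grad.item(), 2 * math.cos(1.0))    # gradient of the branch that actually ran
```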

StevenJokess · November 27, 2020, 6:35am

github.com

StevenJokess/d2l-en-read/blob/29574f604221e53149aa71288d56e5de5d7a34cd/discuss/Ch02/Auto Differentiation.ipynb

I tried. Did you mean this? @Aaron_L

Lefan · February 8, 2021, 1:36pm

I guess you can replace
for i in range(100):
    y[i].backward(retain_graph=True)

with the following statement:
y.sum().backward(retain_graph=True)

gary · March 16, 2021, 4:03am

Does anyone know how backpropagation works for CNNs and RNNs? Are there any step-by-step tutorials that show the derivation? Thanks!

StevenJokess · March 31, 2021, 4:17am

@gary

“Searching” is the best way to solve common questions like this.
Do you really want to learn, or do you only want to ask in order to look like a learner?
CNN: TODO (left for you)
RNN: http://preview.d2l.ai/d2l-en/master/chapter_recurrent-neural-networks/bptt.html

JackFu123 · May 9, 2021, 10:28pm

How about:
import numpy as np
import torch
from d2l import torch as d2l

x = np.linspace(-np.pi, np.pi, 100)
x = torch.tensor(x, requires_grad=True)
y = torch.sin(x)
z = y.sum()
z.backward()

d2l.plot(x.detach(), (y.detach(), x.grad), legend=('sin(x)', 'grad w.r.t. x'))

Luis · May 11, 2021, 12:40am

My solution to the last exercise

import torch
import matplotlib.pyplot as plt
import numpy as np
import math
%matplotlib inline

x = torch.arange(0, 2*math.pi, 0.1)
x.requires_grad_(True)
torch.sin(x).sum().backward()  # differentiate y = sin(x) with respect to x (backward() returns None)

dy_dx = x.grad

x = x.detach().numpy()

y = np.sin(x)
dy_dx = dy_dx.numpy()

fig = plt.figure()
plt.plot(x, y, '-', label='y = sin(x)')
plt.plot(x, dy_dx, '--', label='dy_dx = cos(x)')
plt.axis('equal')
leg = plt.legend();

CE_I · July 19, 2021, 8:53am

My solution to the last exercise

Mingming_Li · August 9, 2021, 2:55pm

x = torch.arange(-np.pi, np.pi, 0.1, requires_grad=True)
y = torch.sin(x)
y.sum().backward()
x.grad

pbouzon · November 9, 2021, 1:05pm

Exercise 1. The second derivative is much more expensive because, for a function f with n variables, the Hessian Matrix has n² elements while the gradient vector has n elements. In other words, there are n² possible second order derivatives ( ∂²f/∂x1², ∂²f/∂x1∂x2, …) while there are n possible first order derivatives (∂f/∂x1, ∂f/∂x2, …, ∂f/∂xn).
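
The size difference can be seen directly in a small sketch. The function g and the choice n = 4 are illustrative assumptions; torch.autograd.functional.hessian is used here only as a convenient way to materialize the full matrix of second derivatives.

```python
import torch
from torch.autograd.functional import hessian

def g(x):
    # Hypothetical scalar function of n variables
    return (x ** 3).sum()

x = torch.arange(1.0, 5.0)            # n = 4

x_req = x.clone().requires_grad_(True)
g(x_req).backward()                   # gradient: n entries

H = hessian(g, x)                     # Hessian: n * n entries
print(x_req.grad.shape, H.shape)
```

For this g the Hessian happens to be diagonal (6 * x on the diagonal), but all n² entries are still computed and stored.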

Donald_Slowik · April 4, 2022, 8:58pm

Backward for Non-Scalar Variables:

It seems that, in the text, calculating y.backward(torch.ones(len(x))) contracts the Jacobian over the 4 components of y(x). The Jacobian is a 4x4 matrix. I can extract it one row at a time (taking J[i][j] = ∂y_i/∂x_j) via y.backward(torch.tensor([1., 0., 0., 0.])), shifting the 1 to each of the 4 slots.
Is this the best way to calculate the Jacobian?
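
One possible comparison, as a sketch rather than a recommendation: the loop below rebuilds the Jacobian with one-hot vectors, using torch.autograd.grad instead of backward() so that nothing accumulates in x.grad, and checks it against torch.autograd.functional.jacobian. The function g mirrors the y = x * x example; with the convention J[i][j] = ∂y_i/∂x_j, each one-hot vector yields one row.

```python
import torch
from torch.autograd.functional import jacobian

def g(x):
    return x * x

x = torch.arange(4.0, requires_grad=True)
y = g(x)

rows = []
for i in range(len(y)):
    e_i = torch.zeros(len(y))
    e_i[i] = 1.0                       # one-hot vector selecting y[i]
    (row,) = torch.autograd.grad(y, x, grad_outputs=e_i, retain_graph=True)
    rows.append(row)
J_manual = torch.stack(rows)

J_auto = jacobian(g, x.detach())       # built-in helper
print(torch.allclose(J_manual, J_auto))
```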

GpuTooExpensive · August 11, 2022, 10:09am

My answers for ex. 2, 3, and 4. Please tell me if you find any errors.

'''
ex.2
After running the function for backpropagation, immediately run it again and see what
happens. Why?

reference:
[https://blog.csdn.net/rothschild666/article/details/124170794](https://blog.csdn.net/rothschild666/article/details/124170794)
'''
import torch
x = torch.arange(4.0, requires_grad=True)
x.requires_grad_(True)
y = 2 * torch.dot(x, x)
y.backward(retain_graph=True)  # retain_graph=True keeps the graph so backward() can run again
y.backward()  # gradients accumulate, so x.grad is now 2 * (4 * x)
x.grad

out for ex2:
tensor([ 0., 8., 16., 24.])
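
The doubled values come from gradient accumulation. A short sketch of the same example with the gradient cleared in between (the variable names first, accumulated, and fresh are illustrative):

```python
import torch

x = torch.arange(4.0, requires_grad=True)
y = 2 * torch.dot(x, x)

y.backward(retain_graph=True)
first = x.grad.clone()         # 4 * x

y.backward(retain_graph=True)
accumulated = x.grad.clone()   # the second call adds to the first: 8 * x

x.grad.zero_()                 # clear the accumulated gradient
y.backward()
fresh = x.grad.clone()         # back to 4 * x

print(first, accumulated, fresh)
```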

'''
ex.3
In the control flow example where we calculate the derivative of d with respect to a, what
would happen if we changed the variable a to a random vector or a matrix? At this point, the
result of the calculation f(a) is no longer a scalar. What happens to the result? How do we
analyze this?


reference:
[https://www.jb51.net/article/211983.htm](https://www.jb51.net/article/211983.htm)
'''
import torch
def f(a):
    b = a * 2
    while b.norm() < 1000:
        b = b * 2
    if b.sum() > 0:
        c = b
    else:
        c = 100 * b
    return c
a = torch.randn(size=(2,3), requires_grad=True)
#a = torch.tensor([1.0,2.0], requires_grad=True)
print(a)
d = f(a)
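# d is not a scalar, so backward() needs a gradient argument of the same shape as d.
# Passing d.detach() weights each element's derivative by d itself;
# torch.ones_like(d) (the same as d.sum().backward()) would give the plain derivative d / a.
d.backward(d.detach())
# d.backward()  # fails: grad can be implicitly created only for scalar outputs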
print(a.grad)

out for ex3:
tensor([[-0.6848, 0.2447, 1.5633],
[-0.1291, 0.2607, 0.9181]], requires_grad=True)
tensor([[-179512.9062, 64147.3203, 409814.3750],
[ -33844.5430, 68346.7969, 240686.7344]])

'''
ex.4
Let f (x) = sin(x). Plot the graph of f and of its derivative f ′ . Do not exploit the fact that
f ′ (x) = cos(x) but rather use automatic differentiation to get the result.
'''
# Assumes the plotting functions from Section 2.4.2 (Visualization Utilities) are available in d2l
from d2l import torch as d2l
x = d2l.arange(0, 10, 0.1, requires_grad=True)
y = d2l.sin(x)
y.backward(gradient=d2l.ones(len(y)))  # y.sum().backward() does the same thing
# .detach().numpy() is needed because x was created with requires_grad=True
d2l.plot(x.detach().numpy(), [y.detach().numpy(), x.grad], 'x', 'f(x)', legend=['sin(x)', 'sin\'(x)'])

out for ex4: