{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2.3 自动求梯度\n",
"## 2.3.1 概念\n",
"上一节介绍的`Tensor`是这个包的核心类,如果将其属性`.requires_grad`设置为`True`,它将开始追踪(track)在其上的所有操作。完成计算后,可以调用`.backward()`来完成所有梯度计算。此`Tensor`的梯度将累积到`.grad`属性中。\n",
"> 注意在调用`.backward()`时,如果`Tensor`是标量,则不需要为`backward()`指定任何参数;否则,需要指定一个求导变量。\n",
"\n",
"如果不想要被继续追踪,可以调用`.detach()`将其从追踪记录中分离出来,这样就可以防止将来的计算被追踪。此外,还可以用`with torch.no_grad()`将不想被追踪的操作代码块包裹起来,这种方法在评估模型的时候很常用,因为在评估模型时,我们并不需要计算可训练参数(`requires_grad=True`)的梯度。\n",
"\n",
"`Function`是另外一个很重要的类。`Tensor`和`Function`互相结合就可以构建一个记录有整个计算过程的非循环图。每个`Tensor`都有一个`.grad_fn`属性,该属性即创建该`Tensor`的`Function`(除非用户创建的`Tensor`s时设置了`grad_fn=None`)。\n",
"\n",
"下面通过一些例子来理解这些概念。\n",
"\n",
"## 2.3.2 `Tensor`"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1., 1.],\n",
" [1., 1.]], requires_grad=True)\n",
"None\n"
]
}
],
"source": [
"x = torch.ones(2, 2, requires_grad=True)\n",
"print(x)\n",
"print(x.grad_fn)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[3., 3.],\n",
" [3., 3.]], grad_fn=<AddBackward>)\n",
"<AddBackward object at 0x10ed634a8>\n"
]
}
],
"source": [
"y = x + 2\n",
"print(y)\n",
"print(y.grad_fn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"注意x是直接创建的所以它没有`grad_fn`, 而y是通过一个加法操作创建的所以它有一个为`<AddBackward>`的`grad_fn`。"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True False\n"
]
}
],
"source": [
"print(x.is_leaf, y.is_leaf)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[27., 27.],\n",
" [27., 27.]], grad_fn=<MulBackward>) tensor(27., grad_fn=<MeanBackward1>)\n"
]
}
],
"source": [
"z = y * y * 3\n",
"out = z.mean()\n",
"print(z, out)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"通过`.requires_grad_()`来用in-place的方式改变`requires_grad`属性:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"False\n",
"True\n",
"<SumBackward0 object at 0x10ed63c50>\n"
]
}
],
"source": [
"a = torch.randn(2, 2) # 缺失情况下默认 requires_grad = False\n",
"a = ((a * 3) / (a - 1))\n",
"print(a.requires_grad) # False\n",
"a.requires_grad_(True)\n",
"print(a.requires_grad) # True\n",
"b = (a * a).sum()\n",
"print(b.grad_fn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.3.3 梯度 \n",
"\n",
"因为`out`是一个标量,所以调用`backward()`时不需要指定求导变量:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[4.5000, 4.5000],\n",
" [4.5000, 4.5000]])\n"
]
}
],
"source": [
"out.backward() # 等价于 out.backward(torch.tensor(1.))\n",
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"我们令`out`为 $o$ , 因为\n",
"$$\n",
"o=\\frac14\\sum_{i=1}^4z_i=\\frac14\\sum_{i=1}^43(x_i+2)^2\n",
"$$\n",
"所以\n",
"$$\n",
"\\frac{\\partial{o}}{\\partial{x_i}}\\bigr\\rvert_{x_i=1}=\\frac{9}{2}=4.5\n",
"$$\n",
"所以上面的输出是正确的。"
]
},
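{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick numeric check of the formula above (a sketch added for illustration, not part of the original text), evaluating $\\frac32(x_i+2)$ at the current `x` should reproduce `x.grad`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Evaluate the analytic gradient 3/2 * (x + 2) and compare with x.grad\n",
"with torch.no_grad():\n",
"    print(3 / 2 * (x + 2))  # tensor([[4.5000, 4.5000], [4.5000, 4.5000]])\n",
"print(x.grad)               # the same values, computed by autograd"
]
},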
{
"cell_type": "markdown",
"metadata": {},
"source": [
"数学上,如果有一个函数值和自变量都为向量的函数 $\\vec{y}=f(\\vec{x})$, 那么 $\\vec{y}$ 关于 $\\vec{x}$ 的梯度就是一个雅可比矩阵Jacobian matrix:\n",
"\n",
"$$\n",
"J=\\left(\\begin{array}{ccc}\n",
" \\frac{\\partial y_{1}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{1}}{\\partial x_{n}}\\\\\n",
" \\vdots & \\ddots & \\vdots\\\\\n",
" \\frac{\\partial y_{m}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{m}}{\\partial x_{n}}\n",
" \\end{array}\\right)\n",
"$$\n",
"\n",
"而``torch.autograd``这个包就是用来计算一些雅克比矩阵的乘积的。例如,如果 $v$ 是一个标量函数的 $l=g\\left(\\vec{y}\\right)$ 的梯度:\n",
"\n",
"$$\n",
"v=\\left(\\begin{array}{ccc}\\frac{\\partial l}{\\partial y_{1}} & \\cdots & \\frac{\\partial l}{\\partial y_{m}}\\end{array}\\right)\n",
"$$\n",
"\n",
"那么根据链式法则我们有 $l$ 关于 $\\vec{x}$ 的雅克比矩阵就为:\n",
"\n",
"$$\n",
"v \\cdot J=\\left(\\begin{array}{ccc}\\frac{\\partial l}{\\partial y_{1}} & \\cdots & \\frac{\\partial l}{\\partial y_{m}}\\end{array}\\right) \\left(\\begin{array}{ccc}\n",
" \\frac{\\partial y_{1}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{1}}{\\partial x_{n}}\\\\\n",
" \\vdots & \\ddots & \\vdots\\\\\n",
" \\frac{\\partial y_{m}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{m}}{\\partial x_{n}}\n",
" \\end{array}\\right)=\\left(\\begin{array}{ccc}\\frac{\\partial l}{\\partial x_{1}} & \\cdots & \\frac{\\partial l}{\\partial x_{n}}\\end{array}\\right)\n",
"$$"
]
},
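{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch to verify this numerically (added for illustration, not part of the original text; the variable names `xj`, `yj`, `vj` are made up): for the elementwise function $\\vec{y}=\\vec{x}^2$ the Jacobian is diagonal with entries $2x_i$, so `backward(v)` should give a gradient equal to `v * 2 * x`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Verify the vector-Jacobian product on a simple elementwise function\n",
"xj = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)\n",
"yj = xj ** 2                         # Jacobian is diagonal with entries 2 * x_i\n",
"vj = torch.tensor([1.0, 0.1, 0.01])\n",
"yj.backward(vj)                      # computes v . J\n",
"print(xj.grad)                       # tensor([2.0000, 0.4000, 0.0600])\n",
"print(vj * 2 * xj.detach())          # the same values, computed by hand"
]
},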
{
"cell_type": "markdown",
"metadata": {},
"source": [
"注意grad在反向传播过程中是累加的(accumulated),这意味着每一次运行反向传播,梯度都会累加之前的梯度,所以一般在反向传播之前需把梯度清零。"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[5.5000, 5.5000],\n",
" [5.5000, 5.5000]])\n",
"tensor([[1., 1.],\n",
" [1., 1.]])\n"
]
}
],
"source": [
"# 再来反向传播一次注意grad是累加的\n",
"out2 = x.sum()\n",
"out2.backward()\n",
"print(x.grad)\n",
"\n",
"out3 = x.sum()\n",
"x.grad.data.zero_()\n",
"out3.backward()\n",
"print(x.grad)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[2., 4.],\n",
" [6., 8.]], grad_fn=<ViewBackward>)\n"
]
}
],
"source": [
"x = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)\n",
"y = 2 * x\n",
"z = y.view(2, 2)\n",
"print(z)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"现在 `y` 不是一个标量,所以在调用`backward`时需要传入一个和`y`同形的权重向量进行加权求和得到一个标量。"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([2.0000, 0.2000, 0.0200, 0.0020])\n"
]
}
],
"source": [
"v = torch.tensor([[1.0, 0.1], [0.01, 0.001]], dtype=torch.float)\n",
"z.backward(v)\n",
"\n",
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"再来看看中断梯度追踪的例子:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor(1., requires_grad=True) True\n",
"tensor(1., grad_fn=<PowBackward0>) True\n",
"tensor(1.) False\n",
"tensor(2., grad_fn=<ThAddBackward>) True\n"
]
}
],
"source": [
"x = torch.tensor(1.0, requires_grad=True)\n",
"y1 = x ** 2 \n",
"with torch.no_grad():\n",
" y2 = x ** 3\n",
"y3 = y1 + y2\n",
" \n",
"print(x, x.requires_grad)\n",
"print(y1, y1.requires_grad)\n",
"print(y2, y2.requires_grad)\n",
"print(y3, y3.requires_grad)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor(2.)\n"
]
}
],
"source": [
"y3.backward()\n",
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"为什么是2呢$ y_3 = y_1 + y_2 = x^2 + x^3$,当 $x=1$ 时 $\\frac {dy_3} {dx}$ 不应该是5吗事实上由于 $y_2$ 的定义是被`torch.no_grad():`包裹的,所以与 $y_2$ 有关的梯度是不会回传的,只有与 $y_1$ 有关的梯度才会回传,即 $x^2$ 对 $x$ 的梯度。\n",
"\n",
"上面提到,`y2.requires_grad=False`,所以不能调用 `y2.backward()`。"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# y2.backward() # 会报错 RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn"
]
},
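{
"cell_type": "markdown",
"metadata": {},
"source": [
"Section 2.3.1 also mentions `.detach()` as a way to stop tracking. Here is a minimal sketch (added for illustration, not part of the original text; the variable names are made up): a detached tensor shares data with its source but is cut out of the graph, so no gradient flows through it."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"xd = torch.tensor(2.0, requires_grad=True)\n",
"yd = xd ** 2                 # tracked: has a grad_fn\n",
"yd_det = yd.detach()         # shares data with yd but is detached from the graph\n",
"print(yd_det.requires_grad)  # False\n",
"\n",
"zd = yd + yd_det             # only the yd branch contributes gradients\n",
"zd.backward()\n",
"print(xd.grad)               # tensor(4.), i.e. d(x^2)/dx at x=2; the detached branch adds nothing"
]
},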
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果我们想要修改`tensor`的数值,但是又不希望被`autograd`记录(即不会影响反向传播),那么我么可以对`tensor.data`进行操作."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([1.])\n",
"False\n",
"tensor([100.], requires_grad=True)\n",
"tensor([2.])\n"
]
}
],
"source": [
"x = torch.ones(1,requires_grad=True)\n",
"\n",
"print(x.data) # 还是一个tensor\n",
"print(x.data.requires_grad) # 但是已经是独立于计算图之外\n",
"\n",
"y = 2 * x\n",
"x.data *= 100 # 只改变了值,不会记录在计算图,所以不会影响梯度传播\n",
"\n",
"y.backward()\n",
"print(x) # 更改data的值也会影响tensor的值\n",
"print(x.grad)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}