master · wangchongwu · 4 years ago · parent 94bce3fbe9 · commit e1da9dd954

@ -0,0 +1,17 @@
---
name: Bug report
about: bug相关issue请按照此模板填写,否则会被直接关闭
title: ''
labels: ''
assignees: ''
---
**bug描述**
描述一下你遇到的bug, 例如报错位置、报错信息(重要, 可以直接截个图)等
**版本信息**
pytorch:
torchvision:
torchtext:
...

@ -0,0 +1,5 @@
FROM node:alpine
RUN npm i docsify-cli -g
COPY . /data
WORKDIR /data
CMD [ "docsify", "serve", "docs" ]

@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf of
the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if provided along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the NOTICE text from the Work, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright [yyyy] [name of copyright owner]
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.

@ -0,0 +1,735 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2.2 数据操作"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"\n",
"torch.manual_seed(0)\n",
"torch.cuda.manual_seed(0)\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.1 创建`Tensor`\n",
"\n",
"创建一个5x3的未初始化的`Tensor`"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.0000e+00, 1.0842e-19, 1.6162e+22],\n",
" [2.8643e-42, 5.6052e-45, 0.0000e+00],\n",
" [0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [0.0000e+00, 1.0842e-19, 1.3314e+22]])\n"
]
}
],
"source": [
"x = torch.empty(5, 3)\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"创建一个5x3的随机初始化的`Tensor`:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.4963, 0.7682, 0.0885],\n",
" [0.1320, 0.3074, 0.6341],\n",
" [0.4901, 0.8964, 0.4556],\n",
" [0.6323, 0.3489, 0.4017],\n",
" [0.0223, 0.1689, 0.2939]])\n"
]
}
],
"source": [
"x = torch.rand(5, 3)\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"创建一个5x3的long型全0的`Tensor`:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0, 0, 0],\n",
" [0, 0, 0],\n",
" [0, 0, 0],\n",
" [0, 0, 0],\n",
" [0, 0, 0]])\n"
]
}
],
"source": [
"x = torch.zeros(5, 3, dtype=torch.long)\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"直接根据数据创建:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([5.5000, 3.0000])\n"
]
}
],
"source": [
"x = torch.tensor([5.5, 3])\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"还可以通过现有的`Tensor`来创建,此方法会默认重用输入`Tensor`的一些属性,例如数据类型,除非自定义数据类型。"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1., 1., 1.],\n",
" [1., 1., 1.],\n",
" [1., 1., 1.],\n",
" [1., 1., 1.],\n",
" [1., 1., 1.]], dtype=torch.float64)\n",
"tensor([[ 0.6035, 0.8110, -0.0451],\n",
" [ 0.8797, 1.0482, -0.0445],\n",
" [-0.7229, 2.8663, -0.5655],\n",
" [ 0.1604, -0.0254, 1.0739],\n",
" [ 2.2628, -0.9175, -0.2251]])\n"
]
}
],
"source": [
"x = x.new_ones(5, 3, dtype=torch.float64) # 返回的tensor默认具有相同的torch.dtype和torch.device\n",
"print(x)\n",
"\n",
"x = torch.randn_like(x, dtype=torch.float) # 指定新的数据类型\n",
"print(x) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"我们可以通过`shape`或者`size()`来获取`Tensor`的形状:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([5, 3])\n",
"torch.Size([5, 3])\n"
]
}
],
"source": [
"print(x.size())\n",
"print(x.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 注意返回的torch.Size其实就是一个tuple, 支持所有tuple的操作。"
]
},
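{
"cell_type": "markdown",
"metadata": {},
"source": [
"下面用一个简单的小例子(沿用上面的`x`)演示`torch.Size`支持的几种常见tuple操作,比如解包、索引和拼接:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"num_rows, num_cols = x.size() # 像tuple一样解包\n",
"print(num_rows, num_cols)\n",
"print(x.size()[0]) # 像tuple一样索引\n",
"print(x.size() + (1,)) # 像tuple一样拼接"
]
},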
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.2 操作\n",
"### 算术操作\n",
"* **加法形式一**"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 1.3967, 1.0892, 0.4369],\n",
" [ 1.6995, 2.0453, 0.6539],\n",
" [-0.1553, 3.7016, -0.3599],\n",
" [ 0.7536, 0.0870, 1.2274],\n",
" [ 2.5046, -0.1913, 0.4760]])\n"
]
}
],
"source": [
"y = torch.rand(5, 3)\n",
"print(x + y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* **加法形式二**"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 1.3967, 1.0892, 0.4369],\n",
" [ 1.6995, 2.0453, 0.6539],\n",
" [-0.1553, 3.7016, -0.3599],\n",
" [ 0.7536, 0.0870, 1.2274],\n",
" [ 2.5046, -0.1913, 0.4760]])\n"
]
}
],
"source": [
"print(torch.add(x, y))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 1.3967, 1.0892, 0.4369],\n",
" [ 1.6995, 2.0453, 0.6539],\n",
" [-0.1553, 3.7016, -0.3599],\n",
" [ 0.7536, 0.0870, 1.2274],\n",
" [ 2.5046, -0.1913, 0.4760]])\n"
]
}
],
"source": [
"result = torch.empty(5, 3)\n",
"torch.add(x, y, out=result)\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* **加法形式三、inplace**"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 1.3967, 1.0892, 0.4369],\n",
" [ 1.6995, 2.0453, 0.6539],\n",
" [-0.1553, 3.7016, -0.3599],\n",
" [ 0.7536, 0.0870, 1.2274],\n",
" [ 2.5046, -0.1913, 0.4760]])\n"
]
}
],
"source": [
"# adds x to y\n",
"y.add_(x)\n",
"print(y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
 **注PyTorch">
"> **注:PyTorch操作inplace版本都有后缀\"_\", 例如`x.copy_(y), x.t_()`**\n",
"\n",
"### 索引\n",
"我们还可以使用类似NumPy的索引操作来访问`Tensor`的一部分,需要注意的是:**索引出来的结果与原数据共享内存,也即修改一个,另一个会跟着修改。** "
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([1.6035, 1.8110, 0.9549])\n",
"tensor([1.6035, 1.8110, 0.9549])\n"
]
}
],
"source": [
"y = x[0, :]\n",
"y += 1\n",
"print(y)\n",
"print(x[0, :]) # 源tensor也被改了"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 改变形状\n",
"用`view()`来改变`Tensor`的形状:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([5, 3]) torch.Size([15]) torch.Size([3, 5])\n"
]
}
],
"source": [
"y = x.view(15)\n",
"z = x.view(-1, 5) # -1所指的维度可以根据其他维度的值推出来\n",
"print(x.size(), y.size(), z.size())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**注意`view()`返回的新tensor与源tensor共享内存也即更改其中的一个另外一个也会跟着改变。**"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[2.6035, 2.8110, 1.9549],\n",
" [1.8797, 2.0482, 0.9555],\n",
" [0.2771, 3.8663, 0.4345],\n",
" [1.1604, 0.9746, 2.0739],\n",
" [3.2628, 0.0825, 0.7749]])\n",
"tensor([2.6035, 2.8110, 1.9549, 1.8797, 2.0482, 0.9555, 0.2771, 3.8663, 0.4345,\n",
" 1.1604, 0.9746, 2.0739, 3.2628, 0.0825, 0.7749])\n"
]
}
],
"source": [
"x += 1\n",
"print(x)\n",
"print(y) # 也加了1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果不想共享内存,推荐先用`clone`创造一个副本然后再使用`view`。"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 1.6035, 1.8110, 0.9549],\n",
" [ 0.8797, 1.0482, -0.0445],\n",
" [-0.7229, 2.8663, -0.5655],\n",
" [ 0.1604, -0.0254, 1.0739],\n",
" [ 2.2628, -0.9175, -0.2251]])\n",
"tensor([2.6035, 2.8110, 1.9549, 1.8797, 2.0482, 0.9555, 0.2771, 3.8663, 0.4345,\n",
" 1.1604, 0.9746, 2.0739, 3.2628, 0.0825, 0.7749])\n"
]
}
],
"source": [
"x_cp = x.clone().view(15)\n",
"x -= 1\n",
"print(x)\n",
"print(x_cp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"另外一个常用的函数就是`item()`, 它可以将一个标量`Tensor`转换成一个Python number"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([2.3466])\n",
"2.3466382026672363\n"
]
}
],
"source": [
"x = torch.randn(1)\n",
"print(x)\n",
"print(x.item())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.3 广播机制"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1, 2]])\n",
"tensor([[1],\n",
" [2],\n",
" [3]])\n",
"tensor([[2, 3],\n",
" [3, 4],\n",
" [4, 5]])\n"
]
}
],
"source": [
"x = torch.arange(1, 3).view(1, 2)\n",
"print(x)\n",
"y = torch.arange(1, 4).view(3, 1)\n",
"print(y)\n",
"print(x + y)"
]
},
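{
"cell_type": "markdown",
"metadata": {},
"source": [
"由于`x`和`y`形状不同,相加时触发了广播机制:`x`中第一行的2个元素被复制到第2、3行,`y`中第一列的3个元素被复制到第2列,两者都扩展成3行2列的矩阵后再按元素相加。"
]
},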
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.4 运算的内存开销"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"False\n"
]
}
],
"source": [
"x = torch.tensor([1, 2])\n",
"y = torch.tensor([3, 4])\n",
"id_before = id(y)\n",
"y = y + x\n",
"print(id(y) == id_before)"
]
},
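{
"cell_type": "markdown",
"metadata": {},
"source": [
"结果为`False`,说明`y = y + x`会为结果开辟新内存,再让`y`指向它。如果想把结果写进`y`原来的内存,可以用索引`y[:] = y + x`,也可以用运算的`out`参数或inplace运算(如`y += x`、`y.add_(x)`),下面两个例子的输出都是`True`。"
]
},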
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n"
]
}
],
"source": [
"x = torch.tensor([1, 2])\n",
"y = torch.tensor([3, 4])\n",
"id_before = id(y)\n",
"y[:] = y + x\n",
"print(id(y) == id_before)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n"
]
}
],
"source": [
"x = torch.tensor([1, 2])\n",
"y = torch.tensor([3, 4])\n",
"id_before = id(y)\n",
"torch.add(x, y, out=y) # y += x, y.add_(x)\n",
"print(id(y) == id_before)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.5 `Tensor`和NumPy相互转换\n",
"**`numpy()`和`from_numpy()`这两个函数产生的`Tensor`和NumPy array实际是使用的相同的内存改变其中一个时另一个也会改变**\n",
"### `Tensor`转NumPy"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([1., 1., 1., 1., 1.]) [1. 1. 1. 1. 1.]\n",
"tensor([2., 2., 2., 2., 2.]) [2. 2. 2. 2. 2.]\n",
"tensor([3., 3., 3., 3., 3.]) [3. 3. 3. 3. 3.]\n"
]
}
],
"source": [
"a = torch.ones(5)\n",
"b = a.numpy()\n",
"print(a, b)\n",
"\n",
"a += 1\n",
"print(a, b)\n",
"b += 1\n",
"print(a, b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### NumPy数组转`Tensor`"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1. 1. 1. 1. 1.] tensor([1., 1., 1., 1., 1.], dtype=torch.float64)\n",
"[2. 2. 2. 2. 2.] tensor([2., 2., 2., 2., 2.], dtype=torch.float64)\n",
"[3. 3. 3. 3. 3.] tensor([3., 3., 3., 3., 3.], dtype=torch.float64)\n"
]
}
],
"source": [
"import numpy as np\n",
"a = np.ones(5)\n",
"b = torch.from_numpy(a)\n",
"print(a, b)\n",
"\n",
"a += 1\n",
"print(a, b)\n",
"b += 1\n",
"print(a, b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"直接用`torch.tensor()`将NumPy数组转换成`Tensor`,该方法总是会进行数据拷贝,返回的`Tensor`和原来的数据不再共享内存。"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[4. 4. 4. 4. 4.] tensor([3., 3., 3., 3., 3.], dtype=torch.float64)\n"
]
}
],
"source": [
"# 用torch.tensor()转换时不会共享内存\n",
"c = torch.tensor(a)\n",
"a += 1\n",
"print(a, c)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.6 `Tensor` on GPU"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 以下代码只有在PyTorch GPU版本上才会执行\n",
"if torch.cuda.is_available():\n",
" device = torch.device(\"cuda\") # GPU\n",
" y = torch.ones_like(x, device=device) # 直接创建一个在GPU上的Tensor\n",
" x = x.to(device) # 等价于 .to(\"cuda\")\n",
" z = x + y\n",
" print(z)\n",
" print(z.to(\"cpu\", torch.double)) # to()还可以同时更改数据类型"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

@ -0,0 +1,448 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2.3 自动求梯度\n",
"## 2.3.1 概念\n",
"上一节介绍的`Tensor`是这个包的核心类,如果将其属性`.requires_grad`设置为`True`,它将开始追踪(track)在其上的所有操作。完成计算后,可以调用`.backward()`来完成所有梯度计算。此`Tensor`的梯度将累积到`.grad`属性中。\n",
"> 注意在调用`.backward()`时,如果`Tensor`是标量,则不需要为`backward()`指定任何参数;否则,需要指定一个求导变量。\n",
"\n",
"如果不想要被继续追踪,可以调用`.detach()`将其从追踪记录中分离出来,这样就可以防止将来的计算被追踪。此外,还可以用`with torch.no_grad()`将不想被追踪的操作代码块包裹起来,这种方法在评估模型的时候很常用,因为在评估模型时,我们并不需要计算可训练参数(`requires_grad=True`)的梯度。\n",
"\n",
"`Function`是另外一个很重要的类。`Tensor`和`Function`互相结合就可以构建一个记录有整个计算过程的非循环图。每个`Tensor`都有一个`.grad_fn`属性,该属性即创建该`Tensor`的`Function`(除非用户创建的`Tensor`s时设置了`grad_fn=None`)。\n",
"\n",
"下面通过一些例子来理解这些概念。\n",
"\n",
"## 2.3.2 `Tensor`"
]
},
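{
"cell_type": "markdown",
"metadata": {},
"source": [
"在看具体例子之前,先用一个很小的例子演示上一小节提到的`.detach()`:它返回一个和原来`Tensor`共享数据、但不再被追踪梯度的新`Tensor`(其`requires_grad`为`False`,`grad_fn`为`None`):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"t = torch.ones(2, 2, requires_grad=True)\n",
"t_detached = (t * 2).detach() # 从计算图中分离出来, 之后的计算不再被追踪\n",
"print(t_detached.requires_grad)\n",
"print(t_detached.grad_fn)"
]
},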
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1., 1.],\n",
" [1., 1.]], requires_grad=True)\n",
"None\n"
]
}
],
"source": [
"x = torch.ones(2, 2, requires_grad=True)\n",
"print(x)\n",
"print(x.grad_fn)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[3., 3.],\n",
" [3., 3.]], grad_fn=<AddBackward>)\n",
"<AddBackward object at 0x10ed634a8>\n"
]
}
],
"source": [
"y = x + 2\n",
"print(y)\n",
"print(y.grad_fn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"注意x是直接创建的所以它没有`grad_fn`, 而y是通过一个加法操作创建的所以它有一个为`<AddBackward>`的`grad_fn`。"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True False\n"
]
}
],
"source": [
"print(x.is_leaf, y.is_leaf)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[27., 27.],\n",
" [27., 27.]], grad_fn=<MulBackward>) tensor(27., grad_fn=<MeanBackward1>)\n"
]
}
],
"source": [
"z = y * y * 3\n",
"out = z.mean()\n",
"print(z, out)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"通过`.requires_grad_()`来用in-place的方式改变`requires_grad`属性:"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"False\n",
"True\n",
"<SumBackward0 object at 0x10ed63c50>\n"
]
}
],
"source": [
"a = torch.randn(2, 2) # 缺失情况下默认 requires_grad = False\n",
"a = ((a * 3) / (a - 1))\n",
"print(a.requires_grad) # False\n",
"a.requires_grad_(True)\n",
"print(a.requires_grad) # True\n",
"b = (a * a).sum()\n",
"print(b.grad_fn)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.3.3 梯度 \n",
"\n",
"因为`out`是一个标量,所以调用`backward()`时不需要指定求导变量:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[4.5000, 4.5000],\n",
" [4.5000, 4.5000]])\n"
]
}
],
"source": [
"out.backward() # 等价于 out.backward(torch.tensor(1.))\n",
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"我们令`out`为 $o$ , 因为\n",
"$$\n",
"o=\\frac14\\sum_{i=1}^4z_i=\\frac14\\sum_{i=1}^43(x_i+2)^2\n",
"$$\n",
"所以\n",
"$$\n",
"\\frac{\\partial{o}}{\\partial{x_i}}\\bigr\\rvert_{x_i=1}=\\frac{9}{2}=4.5\n",
"$$\n",
"所以上面的输出是正确的。"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"数学上,如果有一个函数值和自变量都为向量的函数 $\\vec{y}=f(\\vec{x})$, 那么 $\\vec{y}$ 关于 $\\vec{x}$ 的梯度就是一个雅可比矩阵Jacobian matrix:\n",
"\n",
"$$\n",
"J=\\left(\\begin{array}{ccc}\n",
" \\frac{\\partial y_{1}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{1}}{\\partial x_{n}}\\\\\n",
" \\vdots & \\ddots & \\vdots\\\\\n",
" \\frac{\\partial y_{m}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{m}}{\\partial x_{n}}\n",
" \\end{array}\\right)\n",
"$$\n",
"\n",
"而``torch.autograd``这个包就是用来计算一些雅克比矩阵的乘积的。例如,如果 $v$ 是一个标量函数的 $l=g\\left(\\vec{y}\\right)$ 的梯度:\n",
"\n",
"$$\n",
"v=\\left(\\begin{array}{ccc}\\frac{\\partial l}{\\partial y_{1}} & \\cdots & \\frac{\\partial l}{\\partial y_{m}}\\end{array}\\right)\n",
"$$\n",
"\n",
"那么根据链式法则我们有 $l$ 关于 $\\vec{x}$ 的雅克比矩阵就为:\n",
"\n",
"$$\n",
"v \\cdot J=\\left(\\begin{array}{ccc}\\frac{\\partial l}{\\partial y_{1}} & \\cdots & \\frac{\\partial l}{\\partial y_{m}}\\end{array}\\right) \\left(\\begin{array}{ccc}\n",
" \\frac{\\partial y_{1}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{1}}{\\partial x_{n}}\\\\\n",
" \\vdots & \\ddots & \\vdots\\\\\n",
" \\frac{\\partial y_{m}}{\\partial x_{1}} & \\cdots & \\frac{\\partial y_{m}}{\\partial x_{n}}\n",
" \\end{array}\\right)=\\left(\\begin{array}{ccc}\\frac{\\partial l}{\\partial x_{1}} & \\cdots & \\frac{\\partial l}{\\partial x_{n}}\\end{array}\\right)\n",
"$$"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"注意grad在反向传播过程中是累加的(accumulated),这意味着每一次运行反向传播,梯度都会累加之前的梯度,所以一般在反向传播之前需把梯度清零。"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[5.5000, 5.5000],\n",
" [5.5000, 5.5000]])\n",
"tensor([[1., 1.],\n",
" [1., 1.]])\n"
]
}
],
"source": [
"# 再来反向传播一次注意grad是累加的\n",
"out2 = x.sum()\n",
"out2.backward()\n",
"print(x.grad)\n",
"\n",
"out3 = x.sum()\n",
"x.grad.data.zero_()\n",
"out3.backward()\n",
"print(x.grad)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[2., 4.],\n",
" [6., 8.]], grad_fn=<ViewBackward>)\n"
]
}
],
"source": [
"x = torch.tensor([1.0, 2.0, 3.0, 4.0], requires_grad=True)\n",
"y = 2 * x\n",
"z = y.view(2, 2)\n",
"print(z)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"现在 `y` 不是一个标量,所以在调用`backward`时需要传入一个和`y`同形的权重向量进行加权求和得到一个标量。"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([2.0000, 0.2000, 0.0200, 0.0020])\n"
]
}
],
"source": [
"v = torch.tensor([[1.0, 0.1], [0.01, 0.001]], dtype=torch.float)\n",
"z.backward(v)\n",
"\n",
"print(x.grad)"
]
},
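{
"cell_type": "markdown",
"metadata": {},
"source": [
"可以简单验证一下这个结果:这里 $z = 2x$(只是`view`成了 $2\\times 2$),即 $\\frac{\\partial z_j}{\\partial x_i}$ 只在对应位置等于2、其余为0,按前面的 $v \\cdot J$ 公式,`x.grad` 的第 $i$ 个分量就是 $2v_i$,也就是 `[2.0, 0.2, 0.02, 0.002]`。"
]
},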
{
"cell_type": "markdown",
"metadata": {},
"source": [
"再来看看中断梯度追踪的例子:"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor(1., requires_grad=True) True\n",
"tensor(1., grad_fn=<PowBackward0>) True\n",
"tensor(1.) False\n",
"tensor(2., grad_fn=<ThAddBackward>) True\n"
]
}
],
"source": [
"x = torch.tensor(1.0, requires_grad=True)\n",
"y1 = x ** 2 \n",
"with torch.no_grad():\n",
" y2 = x ** 3\n",
"y3 = y1 + y2\n",
" \n",
"print(x, x.requires_grad)\n",
"print(y1, y1.requires_grad)\n",
"print(y2, y2.requires_grad)\n",
"print(y3, y3.requires_grad)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor(2.)\n"
]
}
],
"source": [
"y3.backward()\n",
"print(x.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"为什么是2呢$ y_3 = y_1 + y_2 = x^2 + x^3$,当 $x=1$ 时 $\\frac {dy_3} {dx}$ 不应该是5吗事实上由于 $y_2$ 的定义是被`torch.no_grad():`包裹的,所以与 $y_2$ 有关的梯度是不会回传的,只有与 $y_1$ 有关的梯度才会回传,即 $x^2$ 对 $x$ 的梯度。\n",
"\n",
"上面提到,`y2.requires_grad=False`,所以不能调用 `y2.backward()`。"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# y2.backward() # 会报错 RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果我们想要修改`tensor`的数值,但是又不希望被`autograd`记录(即不会影响反向传播),那么我么可以对`tensor.data`进行操作."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([1.])\n",
"False\n",
"tensor([100.], requires_grad=True)\n",
"tensor([2.])\n"
]
}
],
"source": [
"x = torch.ones(1,requires_grad=True)\n",
"\n",
"print(x.data) # 还是一个tensor\n",
"print(x.data.requires_grad) # 但是已经是独立于计算图之外\n",
"\n",
"y = 2 * x\n",
"x.data *= 100 # 只改变了值,不会记录在计算图,所以不会影响梯度传播\n",
"\n",
"y.backward()\n",
"print(x) # 更改data的值也会影响tensor的值\n",
"print(x.grad)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}

@ -0,0 +1,129 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3.10 多层感知机的简洁实现"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"from torch.nn import init\n",
"import numpy as np\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.10.1 定义模型"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"num_inputs, num_outputs, num_hiddens = 784, 10, 256\n",
" \n",
"net = nn.Sequential(\n",
" d2l.FlattenLayer(),\n",
" nn.Linear(num_inputs, num_hiddens),\n",
" nn.ReLU(),\n",
" nn.Linear(num_hiddens, num_outputs), \n",
" )\n",
" \n",
"for params in net.parameters():\n",
" init.normal_(params, mean=0, std=0.01)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.10.2 读取数据并训练模型"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 1, loss 0.0031, train acc 0.703, test acc 0.757\n",
"epoch 2, loss 0.0019, train acc 0.824, test acc 0.822\n",
"epoch 3, loss 0.0016, train acc 0.845, test acc 0.825\n",
"epoch 4, loss 0.0015, train acc 0.855, test acc 0.811\n",
"epoch 5, loss 0.0014, train acc 0.865, test acc 0.846\n"
]
}
],
"source": [
"batch_size = 256\n",
"train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)\n",
"loss = torch.nn.CrossEntropyLoss()\n",
"\n",
"optimizer = torch.optim.SGD(net.parameters(), lr=0.5)\n",
"\n",
"num_epochs = 5\n",
"d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -0,0 +1,278 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.2.0\n"
]
}
],
"source": [
"%matplotlib inline\n",
"import torch\n",
"import torch.nn as nn\n",
"import numpy as np\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def dropout(X, drop_prob):\n",
" X = X.float()\n",
" assert 0 <= drop_prob <= 1\n",
" keep_prob = 1 - drop_prob\n",
" # 这种情况下把全部元素都丢弃\n",
" if keep_prob == 0:\n",
" return torch.zeros_like(X)\n",
" mask = (torch.rand(X.shape) < keep_prob).float()\n",
" \n",
" return mask * X / keep_prob"
]
},
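{
"cell_type": "markdown",
"metadata": {},
"source": [
"上面实现的是所谓的inverted dropout:每个元素以`drop_prob`的概率被置零,保留下来的元素再除以`keep_prob`做拉伸。这样对任意输入 $x$ 都有\n",
"$$\n",
"E[\\text{dropout}(x)] = \\frac{\\text{keep\\_prob} \\times x}{\\text{keep\\_prob}} = x,\n",
"$$\n",
"即丢弃法不改变输入的期望。"
]
},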
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 0., 1., 2., 3., 4., 5., 6., 7.],\n",
" [ 8., 9., 10., 11., 12., 13., 14., 15.]])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.arange(16).view(2, 8)\n",
"dropout(X, 0)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 0., 0., 4., 6., 0., 0., 12., 14.],\n",
" [ 0., 18., 20., 22., 0., 0., 28., 0.]])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dropout(X, 0.5)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[0., 0., 0., 0., 0., 0., 0., 0.],\n",
" [0., 0., 0., 0., 0., 0., 0., 0.]])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"dropout(X, 1.0)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"num_inputs, num_outputs, num_hiddens1, num_hiddens2 = 784, 10, 256, 256\n",
"\n",
"W1 = torch.tensor(np.random.normal(0, 0.01, size=(num_inputs, num_hiddens1)), dtype=torch.float, requires_grad=True)\n",
"b1 = torch.zeros(num_hiddens1, requires_grad=True)\n",
"W2 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens1, num_hiddens2)), dtype=torch.float, requires_grad=True)\n",
"b2 = torch.zeros(num_hiddens2, requires_grad=True)\n",
"W3 = torch.tensor(np.random.normal(0, 0.01, size=(num_hiddens2, num_outputs)), dtype=torch.float, requires_grad=True)\n",
"b3 = torch.zeros(num_outputs, requires_grad=True)\n",
"\n",
"params = [W1, b1, W2, b2, W3, b3]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"drop_prob1, drop_prob2 = 0.2, 0.5\n",
"\n",
"def net(X, is_training=True):\n",
" X = X.view(-1, num_inputs)\n",
" H1 = (torch.matmul(X, W1) + b1).relu()\n",
" if is_training: # 只在训练模型时使用丢弃法\n",
" H1 = dropout(H1, drop_prob1) # 在第一层全连接后添加丢弃层\n",
" H2 = (torch.matmul(H1, W2) + b2).relu()\n",
" if is_training:\n",
" H2 = dropout(H2, drop_prob2) # 在第二层全连接后添加丢弃层\n",
" return torch.matmul(H2, W3) + b3"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# def evaluate_accuracy(data_iter, net):\n",
"# acc_sum, n = 0.0, 0\n",
"# for X, y in data_iter:\n",
"# if isinstance(net, torch.nn.Module):\n",
"# net.eval() # 评估模式, 这会关闭dropout\n",
"# acc_sum += (net(X).argmax(dim=1) == y).float().sum().item()\n",
"# net.train() # 改回训练模式\n",
"# else: # 自定义的模型\n",
"# if('is_training' in net.__code__.co_varnames): # 如果有is_training这个参数\n",
"# # 将is_training设置成False\n",
"# acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item() \n",
"# else:\n",
"# acc_sum += (net(X).argmax(dim=1) == y).float().sum().item() \n",
"# n += y.shape[0]\n",
"# return acc_sum / n"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 1, loss 0.0045, train acc 0.561, test acc 0.662\n",
"epoch 2, loss 0.0023, train acc 0.783, test acc 0.786\n",
"epoch 3, loss 0.0019, train acc 0.823, test acc 0.773\n",
"epoch 4, loss 0.0017, train acc 0.838, test acc 0.847\n",
"epoch 5, loss 0.0016, train acc 0.848, test acc 0.809\n"
]
}
],
"source": [
"num_epochs, lr, batch_size = 5, 100.0, 256 # 这里的学习率设置的很大原因同3.9.6节。\n",
"loss = torch.nn.CrossEntropyLoss()\n",
"train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)\n",
"d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"net = nn.Sequential(\n",
" d2l.FlattenLayer(),\n",
" nn.Linear(num_inputs, num_hiddens1),\n",
" nn.ReLU(),\n",
" nn.Dropout(drop_prob1),\n",
" nn.Linear(num_hiddens1, num_hiddens2), \n",
" nn.ReLU(),\n",
" nn.Dropout(drop_prob2),\n",
" nn.Linear(num_hiddens2, 10)\n",
" )\n",
"\n",
"for param in net.parameters():\n",
" nn.init.normal_(param, mean=0, std=0.01)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 1, loss 0.0048, train acc 0.526, test acc 0.743\n",
"epoch 2, loss 0.0023, train acc 0.779, test acc 0.764\n",
"epoch 3, loss 0.0020, train acc 0.815, test acc 0.819\n",
"epoch 4, loss 0.0018, train acc 0.836, test acc 0.814\n",
"epoch 5, loss 0.0016, train acc 0.848, test acc 0.842\n"
]
}
],
"source": [
"optimizer = torch.optim.SGD(net.parameters(), lr=0.5)\n",
"d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because it is too large

@ -0,0 +1,160 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3.1 线性回归"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"from time import time\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"a = torch.ones(1000)\n",
"b = torch.ones(1000)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"将这两个向量按元素逐一做标量加法:"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.020173072814941406\n"
]
}
],
"source": [
"start = time()\n",
"c = torch.zeros(1000)\n",
"for i in range(1000):\n",
" c[i] = a[i] + b[i]\n",
"print(time() - start)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"将这两个向量直接做矢量加法:"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"8.20159912109375e-05\n"
]
}
],
"source": [
"start = time()\n",
"d = a + b\n",
"print(time() - start)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**结果很明显,后者比前者更省时。因此,我们应该尽可能采用矢量计算,以提升计算效率。**"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"广播机制例子🌰:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([11., 11., 11.])\n"
]
}
],
"source": [
"a = torch.ones(3)\n",
"b = 10\n",
"print(a + b)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because it is too large

@ -0,0 +1,430 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3.3 线性回归的简洁实现"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"import numpy as np\n",
"torch.manual_seed(1)\n",
"\n",
"print(torch.__version__)\n",
"torch.set_default_tensor_type('torch.FloatTensor')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.3.1 生成数据集"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"num_inputs = 2\n",
"num_examples = 1000\n",
"true_w = [2, -3.4]\n",
"true_b = 4.2\n",
"features = torch.tensor(np.random.normal(0, 1, (num_examples, num_inputs)), dtype=torch.float)\n",
"labels = true_w[0] * features[:, 0] + true_w[1] * features[:, 1] + true_b\n",
"labels += torch.tensor(np.random.normal(0, 0.01, size=labels.size()), dtype=torch.float)"
]
},
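{
"cell_type": "markdown",
"metadata": {},
"source": [
"也就是按 $y = 2x_1 - 3.4x_2 + 4.2 + \\epsilon$ 生成标签,其中特征服从标准正态分布,噪声项 $\\epsilon$ 服从均值为0、标准差为0.01的正态分布。"
]
},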
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.3.2 读取数据"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"import torch.utils.data as Data\n",
"\n",
"batch_size = 10\n",
"\n",
"# 将训练数据的特征和标签组合\n",
"dataset = Data.TensorDataset(features, labels)\n",
"\n",
"# 把 dataset 放入 DataLoader\n",
"data_iter = Data.DataLoader(\n",
" dataset=dataset, # torch TensorDataset format\n",
" batch_size=batch_size, # mini batch size\n",
" shuffle=True, # 要不要打乱数据 (打乱比较好)\n",
" num_workers=2, # 多线程来读数据\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[-0.0163, -1.0072],\n",
" [-0.3554, -0.1807],\n",
" [-1.2406, -2.3683],\n",
" [ 1.3847, 1.9209],\n",
" [-0.7570, -0.3135],\n",
" [ 0.3181, -0.8122],\n",
" [-0.3864, 0.0382],\n",
" [ 1.0939, -0.1225],\n",
" [ 0.7272, 0.4801],\n",
" [ 0.6706, -0.7972]]) \n",
" tensor([7.6005, 4.1017, 9.7864, 0.4568, 3.7355, 7.5675, 3.2881, 6.7967, 4.0404,\n",
" 8.2513])\n"
]
}
],
"source": [
"for X, y in data_iter:\n",
" print(X, '\\n', y)\n",
" break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.3.3 定义模型"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"LinearNet(\n",
" (linear): Linear(in_features=2, out_features=1, bias=True)\n",
")\n"
]
}
],
"source": [
"class LinearNet(nn.Module):\n",
" def __init__(self, n_feature):\n",
" super(LinearNet, self).__init__()\n",
" self.linear = nn.Linear(n_feature, 1)\n",
"\n",
" def forward(self, x):\n",
" y = self.linear(x)\n",
" return y\n",
" \n",
"net = LinearNet(num_inputs)\n",
"print(net) # 使用print可以打印出网络的结构"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sequential(\n",
" (linear): Linear(in_features=2, out_features=1, bias=True)\n",
")\n",
"Linear(in_features=2, out_features=1, bias=True)\n"
]
}
],
"source": [
"# 写法一\n",
"net = nn.Sequential(\n",
" nn.Linear(num_inputs, 1)\n",
" # 此处还可以传入其他层\n",
" )\n",
"\n",
"# 写法二\n",
"net = nn.Sequential()\n",
"net.add_module('linear', nn.Linear(num_inputs, 1))\n",
"# net.add_module ......\n",
"\n",
"# 写法三\n",
"from collections import OrderedDict\n",
"net = nn.Sequential(OrderedDict([\n",
" ('linear', nn.Linear(num_inputs, 1))\n",
" # ......\n",
" ]))\n",
"\n",
"print(net)\n",
"print(net[0])"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Parameter containing:\n",
"tensor([[0.5347, 0.7057]], requires_grad=True)\n",
"Parameter containing:\n",
"tensor([0.6873], requires_grad=True)\n"
]
}
],
"source": [
"for param in net.parameters():\n",
" print(param)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.3.4 初始化模型参数"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Parameter containing:\n",
"tensor([0.], requires_grad=True)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"from torch.nn import init\n",
"\n",
"init.normal_(net[0].weight, mean=0.0, std=0.01)\n",
"init.constant_(net[0].bias, val=0.0) # 也可以直接修改bias的data: net[0].bias.data.fill_(0)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Parameter containing:\n",
"tensor([[-0.0142, -0.0161]], requires_grad=True)\n",
"Parameter containing:\n",
"tensor([0.], requires_grad=True)\n"
]
}
],
"source": [
"for param in net.parameters():\n",
" print(param)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.3.5 定义损失函数"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"loss = nn.MSELoss()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.3.6 定义优化算法"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"SGD (\n",
"Parameter Group 0\n",
" dampening: 0\n",
" lr: 0.03\n",
" momentum: 0\n",
" nesterov: False\n",
" weight_decay: 0\n",
")\n"
]
}
],
"source": [
"import torch.optim as optim\n",
"\n",
"optimizer = optim.SGD(net.parameters(), lr=0.03)\n",
"print(optimizer)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"# 为不同子网络设置不同的学习率\n",
"# optimizer =optim.SGD([\n",
"# # 如果对某个参数不指定学习率,就使用最外层的默认学习率\n",
"# {'params': net.subnet1.parameters()}, # lr=0.03\n",
"# {'params': net.subnet2.parameters(), 'lr': 0.01}\n",
"# ], lr=0.03)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# # 调整学习率\n",
"# for param_group in optimizer.param_groups:\n",
"# param_group['lr'] *= 0.1 # 学习率为之前的0.1倍"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.3.7 训练模型"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 1, loss: 0.000457\n",
"epoch 2, loss: 0.000081\n",
"epoch 3, loss: 0.000198\n"
]
}
],
"source": [
"num_epochs = 3\n",
"for epoch in range(1, num_epochs + 1):\n",
" for X, y in data_iter:\n",
" output = net(X)\n",
" l = loss(output, y.view(-1, 1))\n",
" optimizer.zero_grad() # 梯度清零等价于net.zero_grad()\n",
" l.backward()\n",
" optimizer.step()\n",
" print('epoch %d, loss: %f' % (epoch, l.item()))"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[2, -3.4] tensor([[ 1.9999, -3.4005]])\n",
"4.2 tensor([4.2011])\n"
]
}
],
"source": [
"dense = net[0]\n",
"print(true_w, dense.weight.data)\n",
"print(true_b, dense.bias.data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because it is too large

@ -0,0 +1,205 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3.7 softmax回归的简洁实现"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"from torch.nn import init\n",
"import numpy as np\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.7.1 获取和读取数据"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"batch_size = 256\n",
"train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.7.2 定义和初始化模型"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"num_inputs = 784\n",
"num_outputs = 10\n",
"\n",
"# class LinearNet(nn.Module):\n",
"# def __init__(self, num_inputs, num_outputs):\n",
"# super(LinearNet, self).__init__()\n",
"# self.linear = nn.Linear(num_inputs, num_outputs)\n",
"# def forward(self, x): # x shape: (batch, 1, 28, 28)\n",
"# y = self.linear(x.view(x.shape[0], -1))\n",
"# return y\n",
" \n",
"# net = LinearNet(num_inputs, num_outputs)\n",
"\n",
"class FlattenLayer(nn.Module):\n",
" def __init__(self):\n",
" super(FlattenLayer, self).__init__()\n",
" def forward(self, x): # x shape: (batch, *, *, ...)\n",
" return x.view(x.shape[0], -1)\n",
"\n",
"from collections import OrderedDict\n",
"net = nn.Sequential(\n",
" # FlattenLayer(),\n",
" # nn.Linear(num_inputs, num_outputs)\n",
" OrderedDict([\n",
" ('flatten', FlattenLayer()),\n",
" ('linear', nn.Linear(num_inputs, num_outputs))])\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Parameter containing:\n",
"tensor([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], requires_grad=True)"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"init.normal_(net.linear.weight, mean=0, std=0.01)\n",
"init.constant_(net.linear.bias, val=0) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.7.3 softmax和交叉熵损失函数"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"loss = nn.CrossEntropyLoss()"
]
},
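{
"cell_type": "markdown",
"metadata": {},
"source": [
"PyTorch的`nn.CrossEntropyLoss()`把softmax运算和交叉熵损失合在一起计算,数值稳定性更好,所以上面的网络里不需要再单独加softmax层。下面用一个很小的例子验证它和先`LogSoftmax`再`NLLLoss`的结果一致:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"logits = torch.tensor([[0.1, 0.2, 0.7]]) # 1个样本, 3个类别的未归一化输出\n",
"target = torch.tensor([2])\n",
"print(nn.CrossEntropyLoss()(logits, target))\n",
"print(nn.NLLLoss()(nn.LogSoftmax(dim=1)(logits), target))"
]
},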
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.7.4 定义优化算法"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"optimizer = torch.optim.SGD(net.parameters(), lr=0.1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.7.5 训练模型"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 1, loss 0.0031, train acc 0.748, test acc 0.785\n",
"epoch 2, loss 0.0022, train acc 0.813, test acc 0.802\n",
"epoch 3, loss 0.0021, train acc 0.824, test acc 0.808\n",
"epoch 4, loss 0.0020, train acc 0.833, test acc 0.824\n",
"epoch 5, loss 0.0019, train acc 0.837, test acc 0.806\n"
]
}
],
"source": [
"num_epochs = 5\n",
"d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, None, None, optimizer)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because it is too large

@ -0,0 +1,197 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 3.9 多层感知机的从零开始实现"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"import numpy as np\n",
"import sys\n",
"sys.path.append(\"..\") # 为了导入上层目录的d2lzh_pytorch\n",
"import d2lzh_pytorch as d2l\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.9.1 获取和读取数据"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"batch_size = 256\n",
"train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.9.2 定义模型参数"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"num_inputs, num_outputs, num_hiddens = 784, 10, 256\n",
"\n",
"W1 = torch.tensor(np.random.normal(0, 0.01, (num_inputs, num_hiddens)), dtype=torch.float)\n",
"b1 = torch.zeros(num_hiddens, dtype=torch.float)\n",
"W2 = torch.tensor(np.random.normal(0, 0.01, (num_hiddens, num_outputs)), dtype=torch.float)\n",
"b2 = torch.zeros(num_outputs, dtype=torch.float)\n",
"\n",
"params = [W1, b1, W2, b2]\n",
"for param in params:\n",
" param.requires_grad_(requires_grad=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.9.3 定义激活函数"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def relu(X):\n",
" return torch.max(input=X, other=torch.tensor(0.0))"
]
},
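{
"cell_type": "markdown",
"metadata": {},
"source": [
"可以用一个小例子检查一下这里定义的`relu`:负数都应该被置为0,非负数保持不变。"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(relu(torch.tensor([[-1.0, 2.0], [0.5, -3.0]])))"
]
},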
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.9.4 定义模型"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def net(X):\n",
" X = X.view((-1, num_inputs))\n",
" H = relu(torch.matmul(X, W1) + b1)\n",
" return torch.matmul(H, W2) + b2"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.9.5 定义损失函数"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"loss = torch.nn.CrossEntropyLoss()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3.9.6 训练模型"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 1, loss 0.0030, train acc 0.714, test acc 0.753\n",
"epoch 2, loss 0.0019, train acc 0.821, test acc 0.777\n",
"epoch 3, loss 0.0017, train acc 0.842, test acc 0.834\n",
"epoch 4, loss 0.0015, train acc 0.857, test acc 0.839\n",
"epoch 5, loss 0.0014, train acc 0.865, test acc 0.845\n"
]
}
],
"source": [
"num_epochs, lr = 5, 100.0\n",
"d2l.train_ch3(net, train_iter, test_iter, loss, num_epochs, batch_size, params, lr)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

File diff suppressed because it is too large

@ -0,0 +1,468 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4.1 模型构造"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.2.0\n"
]
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.1.1 继承`Module`类来构造模型"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"class MLP(nn.Module):\n",
" # 声明带有模型参数的层,这里声明了两个全连接层\n",
" def __init__(self, **kwargs):\n",
" # 调用MLP父类Block的构造函数来进行必要的初始化。这样在构造实例时还可以指定其他函数\n",
" # 参数如“模型参数的访问、初始化和共享”一节将介绍的模型参数params\n",
" super(MLP, self).__init__(**kwargs)\n",
" self.hidden = nn.Linear(784, 256) # 隐藏层\n",
" self.act = nn.ReLU()\n",
" self.output = nn.Linear(256, 10) # 输出层\n",
" \n",
"\n",
" # 定义模型的前向计算即如何根据输入x计算返回所需要的模型输出\n",
" def forward(self, x):\n",
" a = self.act(self.hidden(x))\n",
" return self.output(a)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MLP(\n",
" (hidden): Linear(in_features=784, out_features=256, bias=True)\n",
" (act): ReLU()\n",
" (output): Linear(in_features=256, out_features=10, bias=True)\n",
")\n"
]
},
{
"data": {
"text/plain": [
"tensor([[ 0.0234, -0.2646, -0.1168, -0.2127, 0.0884, -0.0456, 0.0811, 0.0297,\n",
" 0.2032, 0.1364],\n",
" [ 0.1479, -0.1545, -0.0265, -0.2119, -0.0543, -0.0086, 0.0902, -0.1017,\n",
" 0.1504, 0.1144]], grad_fn=<AddmmBackward>)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.rand(2, 784)\n",
"net = MLP()\n",
"print(net)\n",
"net(X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.1.2 `Module`的子类\n",
"### 4.1.2.1 `Sequential`类"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"class MySequential(nn.Module):\n",
" from collections import OrderedDict\n",
" def __init__(self, *args):\n",
" super(MySequential, self).__init__()\n",
" if len(args) == 1 and isinstance(args[0], OrderedDict): # 如果传入的是一个OrderedDict\n",
" for key, module in args[0].items():\n",
" self.add_module(key, module) # add_module方法会将module添加进self._modules(一个OrderedDict)\n",
" else: # 传入的是一些Module\n",
" for idx, module in enumerate(args):\n",
" self.add_module(str(idx), module)\n",
" def forward(self, input):\n",
" # self._modules返回一个 OrderedDict保证会按照成员添加时的顺序遍历成\n",
" for module in self._modules.values():\n",
" input = module(input)\n",
" return input"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MySequential(\n",
" (0): Linear(in_features=784, out_features=256, bias=True)\n",
" (1): ReLU()\n",
" (2): Linear(in_features=256, out_features=10, bias=True)\n",
")\n"
]
},
{
"data": {
"text/plain": [
"tensor([[ 0.1273, 0.1642, -0.1060, 0.1401, 0.0609, -0.0199, -0.0140, -0.0588,\n",
" 0.1765, -0.1296],\n",
" [ 0.0267, 0.1670, -0.0626, 0.0744, 0.0574, 0.0413, 0.1313, -0.1479,\n",
" 0.0932, -0.0615]], grad_fn=<AddmmBackward>)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"net = MySequential(\n",
" nn.Linear(784, 256),\n",
" nn.ReLU(),\n",
" nn.Linear(256, 10), \n",
" )\n",
"print(net)\n",
"net(X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.1.2.2 `ModuleList`类"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear(in_features=256, out_features=10, bias=True)\n",
"ModuleList(\n",
" (0): Linear(in_features=784, out_features=256, bias=True)\n",
" (1): ReLU()\n",
" (2): Linear(in_features=256, out_features=10, bias=True)\n",
")\n"
]
}
],
"source": [
"net = nn.ModuleList([nn.Linear(784, 256), nn.ReLU()])\n",
"net.append(nn.Linear(256, 10)) # # 类似List的append操作\n",
"print(net[-1]) # 类似List的索引访问\n",
"print(net)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# net(torch.zeros(1, 784)) # 会报NotImplementedError"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"class MyModule(nn.Module):\n",
" def __init__(self):\n",
" super(MyModule, self).__init__()\n",
" self.linears = nn.ModuleList([nn.Linear(10, 10) for i in range(10)])\n",
"\n",
" def forward(self, x):\n",
" # ModuleList can act as an iterable, or be indexed using ints\n",
" for i, l in enumerate(self.linears):\n",
" x = self.linears[i // 2](x) + l(x)\n",
" return x"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"net1:\n",
"torch.Size([10, 10])\n",
"torch.Size([10])\n",
"net2:\n"
]
}
],
"source": [
"class Module_ModuleList(nn.Module):\n",
" def __init__(self):\n",
" super(Module_ModuleList, self).__init__()\n",
" self.linears = nn.ModuleList([nn.Linear(10, 10)])\n",
" \n",
"class Module_List(nn.Module):\n",
" def __init__(self):\n",
" super(Module_List, self).__init__()\n",
" self.linears = [nn.Linear(10, 10)]\n",
"\n",
"net1 = Module_ModuleList()\n",
"net2 = Module_List()\n",
"\n",
"print(\"net1:\")\n",
"for p in net1.parameters():\n",
" print(p.size())\n",
"\n",
"print(\"net2:\")\n",
"for p in net2.parameters():\n",
" print(p)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.1.2.3 `ModuleDict`类"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Linear(in_features=784, out_features=256, bias=True)\n",
"Linear(in_features=256, out_features=10, bias=True)\n",
"ModuleDict(\n",
" (act): ReLU()\n",
" (linear): Linear(in_features=784, out_features=256, bias=True)\n",
" (output): Linear(in_features=256, out_features=10, bias=True)\n",
")\n"
]
}
],
"source": [
"net = nn.ModuleDict({\n",
" 'linear': nn.Linear(784, 256),\n",
" 'act': nn.ReLU(),\n",
"})\n",
"net['output'] = nn.Linear(256, 10) # 添加\n",
"print(net['linear']) # 访问\n",
"print(net.output)\n",
"print(net)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# net(torch.zeros(1, 784)) # 会报NotImplementedError"
]
},
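  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "和 `ModuleList` 一样,`ModuleDict` 也没有定义 `forward`,通常把它放进一个自定义的 `Module` 里使用(下面的 `MLPDict` 只是一个示意性的小例子):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 示意:在自定义 Module 的 forward 中使用 ModuleDict\n",
    "class MLPDict(nn.Module):\n",
    "    def __init__(self):\n",
    "        super(MLPDict, self).__init__()\n",
    "        self.layers = nn.ModuleDict({\n",
    "            'linear': nn.Linear(784, 256),\n",
    "            'act': nn.ReLU(),\n",
    "            'output': nn.Linear(256, 10),\n",
    "        })\n",
    "\n",
    "    def forward(self, x):\n",
    "        h = self.layers['act'](self.layers['linear'](x))\n",
    "        return self.layers['output'](h)\n",
    "\n",
    "mlp_dict = MLPDict()\n",
    "mlp_dict(torch.zeros(1, 784)).shape"
   ]
  },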
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.1.3 构造复杂的模型"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"class FancyMLP(nn.Module):\n",
" def __init__(self, **kwargs):\n",
" super(FancyMLP, self).__init__(**kwargs)\n",
" \n",
" self.rand_weight = torch.rand((20, 20), requires_grad=False) # 不可训练参数(常数参数)\n",
" self.linear = nn.Linear(20, 20)\n",
"\n",
" def forward(self, x):\n",
" x = self.linear(x)\n",
" # 使用创建的常数参数以及nn.functional中的relu函数和mm函数\n",
" x = nn.functional.relu(torch.mm(x, self.rand_weight.data) + 1)\n",
" \n",
" # 复用全连接层。等价于两个全连接层共享参数\n",
" x = self.linear(x)\n",
" # 控制流这里我们需要调用item函数来返回标量进行比较\n",
" while x.norm().item() > 1:\n",
" x /= 2\n",
" if x.norm().item() < 0.8:\n",
" x *= 10\n",
" return x.sum()"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"FancyMLP(\n",
" (linear): Linear(in_features=20, out_features=20, bias=True)\n",
")\n"
]
},
{
"data": {
"text/plain": [
"tensor(0.8907, grad_fn=<SumBackward0>)"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.rand(2, 20)\n",
"net = FancyMLP()\n",
"print(net)\n",
"net(X)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sequential(\n",
" (0): NestMLP(\n",
" (net): Sequential(\n",
" (0): Linear(in_features=40, out_features=30, bias=True)\n",
" (1): ReLU()\n",
" )\n",
" )\n",
" (1): Linear(in_features=30, out_features=20, bias=True)\n",
" (2): FancyMLP(\n",
" (linear): Linear(in_features=20, out_features=20, bias=True)\n",
" )\n",
")\n"
]
},
{
"data": {
"text/plain": [
"tensor(-0.4605, grad_fn=<SumBackward0>)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class NestMLP(nn.Module):\n",
" def __init__(self, **kwargs):\n",
" super(NestMLP, self).__init__(**kwargs)\n",
" self.net = nn.Sequential(nn.Linear(40, 30), nn.ReLU()) \n",
"\n",
" def forward(self, x):\n",
" return self.net(x)\n",
"\n",
"net = nn.Sequential(NestMLP(), nn.Linear(30, 20), FancyMLP())\n",
"\n",
"X = torch.rand(2, 40)\n",
"print(net)\n",
"net(X)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,378 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4.2 模型参数的访问、初始化和共享"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"from torch.nn import init\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sequential(\n",
" (0): Linear(in_features=4, out_features=3, bias=True)\n",
" (1): ReLU()\n",
" (2): Linear(in_features=3, out_features=1, bias=True)\n",
")\n"
]
}
],
"source": [
"net = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 1)) # pytorch已进行默认初始化\n",
"\n",
"print(net)\n",
"X = torch.rand(2, 4)\n",
"Y = net(X).sum()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.2.1 访问模型参数"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"<class 'generator'>\n",
"0.weight torch.Size([3, 4])\n",
"0.bias torch.Size([3])\n",
"2.weight torch.Size([1, 3])\n",
"2.bias torch.Size([1])\n"
]
}
],
"source": [
"print(type(net.named_parameters()))\n",
"for name, param in net.named_parameters():\n",
" print(name, param.size())"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"weight torch.Size([3, 4]) <class 'torch.nn.parameter.Parameter'>\n",
"bias torch.Size([3]) <class 'torch.nn.parameter.Parameter'>\n"
]
}
],
"source": [
"for name, param in net[0].named_parameters():\n",
" print(name, param.size(), type(param))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"weight1\n"
]
}
],
"source": [
"class MyModel(nn.Module):\n",
" def __init__(self, **kwargs):\n",
" super(MyModel, self).__init__(**kwargs)\n",
" self.weight1 = nn.Parameter(torch.rand(20, 20))\n",
" self.weight2 = torch.rand(20, 20)\n",
" def forward(self, x):\n",
" pass\n",
" \n",
"n = MyModel()\n",
"for name, param in n.named_parameters():\n",
" print(name)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 0.2719, -0.0898, -0.2462, 0.0655],\n",
" [-0.4669, -0.2703, 0.3230, 0.2067],\n",
" [-0.2708, 0.1171, -0.0995, 0.3913]])\n",
"None\n",
"tensor([[-0.2281, -0.0653, -0.1646, -0.2569],\n",
" [-0.1916, -0.0549, -0.1382, -0.2158],\n",
" [ 0.0000, 0.0000, 0.0000, 0.0000]])\n"
]
}
],
"source": [
"weight_0 = list(net[0].parameters())[0]\n",
"print(weight_0.data)\n",
"print(weight_0.grad)\n",
"Y.backward()\n",
"print(weight_0.grad)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.2.2 初始化模型参数"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.weight tensor([[ 0.0030, 0.0094, 0.0070, -0.0010],\n",
" [ 0.0001, 0.0039, 0.0105, -0.0126],\n",
" [ 0.0105, -0.0135, -0.0047, -0.0006]])\n",
"2.weight tensor([[-0.0074, 0.0051, 0.0066]])\n"
]
}
],
"source": [
"for name, param in net.named_parameters():\n",
" if 'weight' in name:\n",
" init.normal_(param, mean=0, std=0.01)\n",
" print(name, param.data)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.bias tensor([0., 0., 0.])\n",
"2.bias tensor([0.])\n"
]
}
],
"source": [
"for name, param in net.named_parameters():\n",
" if 'bias' in name:\n",
" init.constant_(param, val=0)\n",
" print(name, param.data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.2.3 自定义初始化方法"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"def init_weight_(tensor):\n",
" with torch.no_grad():\n",
" tensor.uniform_(-10, 10)\n",
" tensor *= (tensor.abs() >= 5).float()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.weight tensor([[ 7.0403, 0.0000, -9.4569, 7.0111],\n",
" [-0.0000, -0.0000, 0.0000, 0.0000],\n",
" [ 9.8063, -0.0000, 0.0000, -9.7993]])\n",
"2.weight tensor([[-5.8198, 7.7558, -5.0293]])\n"
]
}
],
"source": [
"for name, param in net.named_parameters():\n",
" if 'weight' in name:\n",
" init_weight_(param)\n",
" print(name, param.data)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.bias tensor([1., 1., 1.])\n",
"2.bias tensor([1.])\n"
]
}
],
"source": [
"for name, param in net.named_parameters():\n",
" if 'bias' in name:\n",
" param.data += 1\n",
" print(name, param.data)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.2.4 共享模型参数"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sequential(\n",
" (0): Linear(in_features=1, out_features=1, bias=False)\n",
" (1): Linear(in_features=1, out_features=1, bias=False)\n",
")\n",
"0.weight tensor([[3.]])\n"
]
}
],
"source": [
"linear = nn.Linear(1, 1, bias=False)\n",
"net = nn.Sequential(linear, linear) \n",
"print(net)\n",
"for name, param in net.named_parameters():\n",
" init.constant_(param, val=3)\n",
" print(name, param.data)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n",
"True\n"
]
}
],
"source": [
"print(id(net[0]) == id(net[1]))\n",
"print(id(net[0].weight) == id(net[1].weight))"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor(9., grad_fn=<SumBackward0>)\n",
"tensor([[6.]])\n"
]
}
],
"source": [
"x = torch.ones(1, 1)\n",
"y = net(x).sum()\n",
"print(y)\n",
"y.backward()\n",
"print(net[0].weight.grad)"
]
},
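  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "这里梯度为 6 可以这样验证:两层共享同一个权重 $w$,因此 $y = w \\cdot (w \\cdot x) = w^2 x$,对 $w$ 求导得 $\\frac{\\partial y}{\\partial w} = 2wx = 2 \\times 3 \\times 1 = 6$,即共享参数的梯度是它每次被使用时梯度的累加。"
   ]
  },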
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,269 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4.4 自定义层\n",
"## 4.4.1 不含模型参数的自定义层"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"class CenteredLayer(nn.Module):\n",
" def __init__(self, **kwargs):\n",
" super(CenteredLayer, self).__init__(**kwargs)\n",
" def forward(self, x):\n",
" return x - x.mean()"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([-2., -1., 0., 1., 2.])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"layer = CenteredLayer()\n",
"layer(torch.tensor([1, 2, 3, 4, 5], dtype=torch.float))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"net = nn.Sequential(nn.Linear(8, 128), CenteredLayer())"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0.0"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y = net(torch.rand(4, 8))\n",
"y.mean().item()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.4.2 含模型参数的自定义层"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MyListDense(\n",
" (params): ParameterList(\n",
" (0): Parameter containing: [torch.FloatTensor of size 4x4]\n",
" (1): Parameter containing: [torch.FloatTensor of size 4x4]\n",
" (2): Parameter containing: [torch.FloatTensor of size 4x4]\n",
" (3): Parameter containing: [torch.FloatTensor of size 4x1]\n",
" )\n",
")\n"
]
}
],
"source": [
"class MyListDense(nn.Module):\n",
" def __init__(self):\n",
" super(MyListDense, self).__init__()\n",
" self.params = nn.ParameterList([nn.Parameter(torch.randn(4, 4)) for i in range(3)])\n",
" self.params.append(nn.Parameter(torch.randn(4, 1)))\n",
"\n",
" def forward(self, x):\n",
" for i in range(len(self.params)):\n",
" x = torch.mm(x, self.params[i])\n",
" return x\n",
"net = MyListDense()\n",
"print(net)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"MyDictDense(\n",
" (params): ParameterDict(\n",
" (linear1): Parameter containing: [torch.FloatTensor of size 4x4]\n",
" (linear2): Parameter containing: [torch.FloatTensor of size 4x1]\n",
" (linear3): Parameter containing: [torch.FloatTensor of size 4x2]\n",
" )\n",
")\n"
]
}
],
"source": [
"class MyDictDense(nn.Module):\n",
" def __init__(self):\n",
" super(MyDictDense, self).__init__()\n",
" self.params = nn.ParameterDict({\n",
" 'linear1': nn.Parameter(torch.randn(4, 4)),\n",
" 'linear2': nn.Parameter(torch.randn(4, 1))\n",
" })\n",
" self.params.update({'linear3': nn.Parameter(torch.randn(4, 2))}) # 新增\n",
"\n",
" def forward(self, x, choice='linear1'):\n",
" return torch.mm(x, self.params[choice])\n",
"\n",
"net = MyDictDense()\n",
"print(net)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1.5082, 1.5574, 2.1651, 1.2409]], grad_fn=<MmBackward>)\n",
"tensor([[-0.8783]], grad_fn=<MmBackward>)\n",
"tensor([[ 2.2193, -1.6539]], grad_fn=<MmBackward>)\n"
]
}
],
"source": [
"x = torch.ones(1, 4)\n",
"print(net(x, 'linear1'))\n",
"print(net(x, 'linear2'))\n",
"print(net(x, 'linear3'))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sequential(\n",
" (0): MyDictDense(\n",
" (params): ParameterDict(\n",
" (linear1): Parameter containing: [torch.FloatTensor of size 4x4]\n",
" (linear2): Parameter containing: [torch.FloatTensor of size 4x1]\n",
" (linear3): Parameter containing: [torch.FloatTensor of size 4x2]\n",
" )\n",
" )\n",
" (1): MyListDense(\n",
" (params): ParameterList(\n",
" (0): Parameter containing: [torch.FloatTensor of size 4x4]\n",
" (1): Parameter containing: [torch.FloatTensor of size 4x4]\n",
" (2): Parameter containing: [torch.FloatTensor of size 4x4]\n",
" (3): Parameter containing: [torch.FloatTensor of size 4x1]\n",
" )\n",
" )\n",
")\n",
"tensor([[-101.2394]], grad_fn=<MmBackward>)\n"
]
}
],
"source": [
"net = nn.Sequential(\n",
" MyDictDense(),\n",
" MyListDense(),\n",
")\n",
"print(net)\n",
"print(net(x))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,254 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4.5 读取和存储"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.5.1 读写`Tensor`"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"x = torch.ones(3)\n",
"torch.save(x, 'x.pt')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([1., 1., 1.])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x2 = torch.load('x.pt')\n",
"x2"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"[tensor([1., 1., 1.]), tensor([0., 0., 0., 0.])]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y = torch.zeros(4)\n",
"torch.save([x, y], 'xy.pt')\n",
"xy_list = torch.load('xy.pt')\n",
"xy_list"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'x': tensor([1., 1., 1.]), 'y': tensor([0., 0., 0., 0.])}"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"torch.save({'x': x, 'y': y}, 'xy_dict.pt')\n",
"xy = torch.load('xy_dict.pt')\n",
"xy"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.5.2 读写模型\n",
"### 4.5.2.1 `state_dict`"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"OrderedDict([('hidden.weight', tensor([[ 0.1836, -0.1812, -0.1681],\n",
" [ 0.0406, 0.3061, 0.4599]])),\n",
" ('hidden.bias', tensor([-0.3384, 0.1910])),\n",
" ('output.weight', tensor([[0.0380, 0.4919]])),\n",
" ('output.bias', tensor([0.1451]))])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"class MLP(nn.Module):\n",
" def __init__(self):\n",
" super(MLP, self).__init__()\n",
" self.hidden = nn.Linear(3, 2)\n",
" self.act = nn.ReLU()\n",
" self.output = nn.Linear(2, 1)\n",
"\n",
" def forward(self, x):\n",
" a = self.act(self.hidden(x))\n",
" return self.output(a)\n",
"\n",
"net = MLP()\n",
"net.state_dict()"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{'param_groups': [{'dampening': 0,\n",
" 'lr': 0.001,\n",
" 'momentum': 0.9,\n",
" 'nesterov': False,\n",
" 'params': [4624483024, 4624484608, 4624484680, 4624484752],\n",
" 'weight_decay': 0}],\n",
" 'state': {}}"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"optimizer = torch.optim.SGD(net.parameters(), lr=0.001, momentum=0.9)\n",
"optimizer.state_dict()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 4.5.2.2 保存和加载模型"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[1],\n",
" [1]], dtype=torch.uint8)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.randn(2, 3)\n",
"Y = net(X)\n",
"\n",
"PATH = \"./net.pt\"\n",
"torch.save(net.state_dict(), PATH)\n",
"\n",
"net2 = MLP()\n",
"net2.load_state_dict(torch.load(PATH))\n",
"Y2 = net2(X)\n",
"Y2 == Y"
]
},
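  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "除了保存 `state_dict`,也可以直接保存和加载整个模型(下面是一个简单示意,文件名 `PATH2` 只是举例):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 示意:直接保存和加载整个模型(文件名 PATH2 只是举例)\n",
    "PATH2 = \"./net_whole.pt\"\n",
    "torch.save(net, PATH2)   # 保存整个模型(结构 + 参数)\n",
    "net3 = torch.load(PATH2) # 加载整个模型\n",
    "net3(X) == Y             # 输出应与原模型一致"
   ]
  },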
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,482 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 4.6 GPU计算"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:15.123349Z",
"start_time": "2019-03-17T08:12:14.979997Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sun Mar 17 16:12:15 2019 \r\n",
"+-----------------------------------------------------------------------------+\r\n",
"| NVIDIA-SMI 390.48 Driver Version: 390.48 |\r\n",
"|-------------------------------+----------------------+----------------------+\r\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\r\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\r\n",
"|===============================+======================+======================|\r\n",
"| 0 GeForce GTX 1050 Off | 00000000:01:00.0 Off | N/A |\r\n",
"| 20% 40C P5 N/A / 75W | 1213MiB / 2000MiB | 23% Default |\r\n",
"+-------------------------------+----------------------+----------------------+\r\n",
" \r\n",
"+-----------------------------------------------------------------------------+\r\n",
"| Processes: GPU Memory |\r\n",
"| GPU PID Type Process name Usage |\r\n",
"|=============================================================================|\r\n",
"| 0 1235 G /usr/lib/xorg/Xorg 434MiB |\r\n",
"| 0 2095 G compiz 171MiB |\r\n",
"| 0 2660 G /opt/teamviewer/tv_bin/TeamViewer 5MiB |\r\n",
"| 0 4166 G /proc/self/exe 397MiB |\r\n",
"| 0 13274 C /home/tss/anaconda3/bin/python 191MiB |\r\n",
"+-----------------------------------------------------------------------------+\r\n"
]
}
],
"source": [
"!nvidia-smi # 对Linux/macOS用户有效"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:15.512222Z",
"start_time": "2019-03-17T08:12:15.124792Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.6.1 计算设备"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:15.539276Z",
"start_time": "2019-03-17T08:12:15.513205Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"torch.cuda.is_available() # cuda是否可用"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:15.543795Z",
"start_time": "2019-03-17T08:12:15.540338Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"1"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"torch.cuda.device_count() # gpu数量"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:15.551451Z",
"start_time": "2019-03-17T08:12:15.544964Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"0"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"torch.cuda.current_device() # 当前设备索引, 从0开始"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:15.555020Z",
"start_time": "2019-03-17T08:12:15.552387Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'GeForce GTX 1050'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"torch.cuda.get_device_name(0) # 返回gpu名字"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.6.2 `Tensor`的GPU计算"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:15.562186Z",
"start_time": "2019-03-17T08:12:15.556621Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([1, 2, 3])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = torch.tensor([1, 2, 3])\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:17.441336Z",
"start_time": "2019-03-17T08:12:15.563813Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([1, 2, 3], device='cuda:0')"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = x.cuda(0)\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:17.449383Z",
"start_time": "2019-03-17T08:12:17.445193Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"device(type='cuda', index=0)"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x.device"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:17.454548Z",
"start_time": "2019-03-17T08:12:17.450268Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([1, 2, 3], device='cuda:0')"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"x = torch.tensor([1, 2, 3], device=device)\n",
"# or\n",
"x = torch.tensor([1, 2, 3]).to(device)\n",
"x"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:17.467441Z",
"start_time": "2019-03-17T08:12:17.455495Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([1, 4, 9], device='cuda:0')"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"y = x**2\n",
"y"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:17.470297Z",
"start_time": "2019-03-17T08:12:17.468866Z"
}
},
"outputs": [],
"source": [
"# z = y + x.cpu()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 4.6.3 模型的GPU计算"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:17.474763Z",
"start_time": "2019-03-17T08:12:17.471348Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"device(type='cpu')"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"net = nn.Linear(3, 1)\n",
"list(net.parameters())[0].device"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:17.478553Z",
"start_time": "2019-03-17T08:12:17.475677Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"device(type='cuda', index=0)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"net.cuda()\n",
"list(net.parameters())[0].device"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-17T08:12:17.957448Z",
"start_time": "2019-03-17T08:12:17.479843Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[-0.5574],\n",
" [-0.3792]], device='cuda:0', grad_fn=<ThAddmmBackward>)"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = torch.rand(2,3).cuda()\n",
"net(x)"
]
},
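  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "更常见的写法是先定义 `device`,再统一用 `.to(device)` 把模型和数据搬到同一设备上,这样同一份代码在有无 GPU 时都能运行(以下仅为示意):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 示意:设备无关的写法,模型和数据都通过 .to(device) 移动到同一设备\n",
    "device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
    "net2 = nn.Linear(3, 1).to(device)\n",
    "x2 = torch.rand(2, 3).to(device)\n",
    "net2(x2)"
   ]
  },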
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,272 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5.10 批量归一化"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.0\n",
"cuda\n"
]
}
],
"source": [
"import time\n",
"import torch\n",
"from torch import nn, optim\n",
"import torch.nn.functional as F\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"print(torch.__version__)\n",
"print(device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.10.2 从零开始实现"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def batch_norm(is_training, X, gamma, beta, moving_mean, moving_var, eps, momentum):\n",
" # 判断当前模式是训练模式还是预测模式\n",
" if not is_training:\n",
" # 如果是在预测模式下,直接使用传入的移动平均所得的均值和方差\n",
" X_hat = (X - moving_mean) / torch.sqrt(moving_var + eps)\n",
" else:\n",
" assert len(X.shape) in (2, 4)\n",
" if len(X.shape) == 2:\n",
" # 使用全连接层的情况,计算特征维上的均值和方差\n",
" mean = X.mean(dim=0)\n",
" var = ((X - mean) ** 2).mean(dim=0)\n",
" else:\n",
" # 使用二维卷积层的情况计算通道维上axis=1的均值和方差。这里我们需要保持\n",
" # X的形状以便后面可以做广播运算\n",
" mean = X.mean(dim=0, keepdim=True).mean(dim=2, keepdim=True).mean(dim=3, keepdim=True)\n",
" var = ((X - mean) ** 2).mean(dim=0, keepdim=True).mean(dim=2, keepdim=True).mean(dim=3, keepdim=True)\n",
" # 训练模式下用当前的均值和方差做标准化\n",
" X_hat = (X - mean) / torch.sqrt(var + eps)\n",
" # 更新移动平均的均值和方差\n",
" moving_mean = momentum * moving_mean + (1.0 - momentum) * mean\n",
" moving_var = momentum * moving_var + (1.0 - momentum) * var\n",
" Y = gamma * X_hat + beta # 拉伸和偏移\n",
" return Y, moving_mean, moving_var"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"class BatchNorm(nn.Module):\n",
" def __init__(self, num_features, num_dims):\n",
" super(BatchNorm, self).__init__()\n",
" if num_dims == 2:\n",
" shape = (1, num_features)\n",
" else:\n",
" shape = (1, num_features, 1, 1)\n",
" # 参与求梯度和迭代的拉伸和偏移参数分别初始化成0和1\n",
" self.gamma = nn.Parameter(torch.ones(shape))\n",
" self.beta = nn.Parameter(torch.zeros(shape))\n",
" # 不参与求梯度和迭代的变量全在内存上初始化成0\n",
" self.moving_mean = torch.zeros(shape)\n",
" self.moving_var = torch.zeros(shape)\n",
"\n",
" def forward(self, X):\n",
" # 如果X不在内存上将moving_mean和moving_var复制到X所在显存上\n",
" if self.moving_mean.device != X.device:\n",
" self.moving_mean = self.moving_mean.to(X.device)\n",
" self.moving_var = self.moving_var.to(X.device)\n",
" # 保存更新过的moving_mean和moving_var, Module实例的traning属性默认为true, 调用.eval()后设成false\n",
" Y, self.moving_mean, self.moving_var = batch_norm(self.training, \n",
" X, self.gamma, self.beta, self.moving_mean,\n",
" self.moving_var, eps=1e-5, momentum=0.9)\n",
" return Y"
]
},
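  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "可以用一小批随机数据简单检查上面实现的 `BatchNorm`:训练模式下,输出在每个特征维上的均值应接近 0、方差应接近 1(以下仅为示意):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 示意:检查自定义 BatchNorm 在训练模式下的标准化效果\n",
    "bn = BatchNorm(3, num_dims=2)\n",
    "X_check = torch.randn(8, 3) * 5 + 2\n",
    "Y_check = bn(X_check)\n",
    "print(Y_check.mean(dim=0))                 # 每个特征的均值应接近0\n",
    "print(Y_check.var(dim=0, unbiased=False))  # 每个特征的方差应接近1"
   ]
  },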
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 5.10.2.1 使用批量归一化层的LeNet"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"net = nn.Sequential(\n",
" nn.Conv2d(1, 6, 5), # in_channels, out_channels, kernel_size\n",
" BatchNorm(6, num_dims=4),\n",
" nn.Sigmoid(),\n",
" nn.MaxPool2d(2, 2), # kernel_size, stride\n",
" nn.Conv2d(6, 16, 5),\n",
" BatchNorm(16, num_dims=4),\n",
" nn.Sigmoid(),\n",
" nn.MaxPool2d(2, 2),\n",
" d2l.FlattenLayer(),\n",
" nn.Linear(16*4*4, 120),\n",
" BatchNorm(120, num_dims=2),\n",
" nn.Sigmoid(),\n",
" nn.Linear(120, 84),\n",
" BatchNorm(84, num_dims=2),\n",
" nn.Sigmoid(),\n",
" nn.Linear(84, 10)\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"training on cuda\n",
"epoch 1, loss 0.0039, train acc 0.790, test acc 0.835, time 2.9 sec\n",
"epoch 2, loss 0.0018, train acc 0.866, test acc 0.821, time 3.2 sec\n",
"epoch 3, loss 0.0014, train acc 0.879, test acc 0.857, time 2.6 sec\n",
"epoch 4, loss 0.0013, train acc 0.886, test acc 0.820, time 2.7 sec\n",
"epoch 5, loss 0.0012, train acc 0.891, test acc 0.859, time 2.8 sec\n"
]
}
],
"source": [
"batch_size = 256\n",
"train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)\n",
"\n",
"lr, num_epochs = 0.001, 5\n",
"optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
"d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(tensor([ 1.2537, 1.2284, 1.0100, 1.0171, 0.9809, 1.1870], device='cuda:0'),\n",
" tensor([ 0.0962, 0.3299, -0.5506, 0.1522, -0.1556, 0.2240], device='cuda:0'))"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"net[1].gamma.view((-1,)), net[1].beta.view((-1,))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.10.3 简洁实现"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"net = nn.Sequential(\n",
" nn.Conv2d(1, 6, 5), # in_channels, out_channels, kernel_size\n",
" nn.BatchNorm2d(6),\n",
" nn.Sigmoid(),\n",
" nn.MaxPool2d(2, 2), # kernel_size, stride\n",
" nn.Conv2d(6, 16, 5),\n",
" nn.BatchNorm2d(16),\n",
" nn.Sigmoid(),\n",
" nn.MaxPool2d(2, 2),\n",
" d2l.FlattenLayer(),\n",
" nn.Linear(16*4*4, 120),\n",
" nn.BatchNorm1d(120),\n",
" nn.Sigmoid(),\n",
" nn.Linear(120, 84),\n",
" nn.BatchNorm1d(84),\n",
" nn.Sigmoid(),\n",
" nn.Linear(84, 10)\n",
" )"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"training on cuda\n",
"epoch 1, loss 0.0054, train acc 0.767, test acc 0.795, time 2.0 sec\n",
"epoch 2, loss 0.0024, train acc 0.851, test acc 0.748, time 2.0 sec\n",
"epoch 3, loss 0.0017, train acc 0.872, test acc 0.814, time 2.2 sec\n",
"epoch 4, loss 0.0014, train acc 0.883, test acc 0.818, time 2.1 sec\n",
"epoch 5, loss 0.0013, train acc 0.889, test acc 0.734, time 1.8 sec\n"
]
}
],
"source": [
"batch_size = 256\n",
"train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)\n",
"\n",
"lr, num_epochs = 0.001, 5\n",
"optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
"d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,261 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5.11 残差网络ResNet"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.0\n",
"cuda\n"
]
}
],
"source": [
"import time\n",
"import torch\n",
"from torch import nn, optim\n",
"import torch.nn.functional as F\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"print(torch.__version__)\n",
"print(device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.11.2 残差块"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"class Residual(nn.Module): # 本类已保存在d2lzh_pytorch包中方便以后使用\n",
" def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):\n",
" super(Residual, self).__init__()\n",
" self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride)\n",
" self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)\n",
" if use_1x1conv:\n",
" self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)\n",
" else:\n",
" self.conv3 = None\n",
" self.bn1 = nn.BatchNorm2d(out_channels)\n",
" self.bn2 = nn.BatchNorm2d(out_channels)\n",
"\n",
" def forward(self, X):\n",
" Y = F.relu(self.bn1(self.conv1(X)))\n",
" Y = self.bn2(self.conv2(Y))\n",
" if self.conv3:\n",
" X = self.conv3(X)\n",
" return F.relu(Y + X)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([4, 3, 6, 6])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"blk = Residual(3, 3)\n",
"X = torch.rand((4, 3, 6, 6))\n",
"blk(X).shape"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([4, 6, 3, 3])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"blk = Residual(3, 6, use_1x1conv=True, stride=2)\n",
"blk(X).shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.11.2 ResNet模型"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"net = nn.Sequential(\n",
" nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),\n",
" nn.BatchNorm2d(64), \n",
" nn.ReLU(),\n",
" nn.MaxPool2d(kernel_size=3, stride=2, padding=1))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def resnet_block(in_channels, out_channels, num_residuals, first_block=False):\n",
" if first_block:\n",
" assert in_channels == out_channels # 第一个模块的通道数同输入通道数一致\n",
" blk = []\n",
" for i in range(num_residuals):\n",
" if i == 0 and not first_block:\n",
" blk.append(Residual(in_channels, out_channels, use_1x1conv=True, stride=2))\n",
" else:\n",
" blk.append(Residual(out_channels, out_channels))\n",
" return nn.Sequential(*blk)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"net.add_module(\"resnet_block1\", resnet_block(64, 64, 2, first_block=True))\n",
"net.add_module(\"resnet_block2\", resnet_block(64, 128, 2))\n",
"net.add_module(\"resnet_block3\", resnet_block(128, 256, 2))\n",
"net.add_module(\"resnet_block4\", resnet_block(256, 512, 2))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"net.add_module(\"global_avg_pool\", d2l.GlobalAvgPool2d()) # GlobalAvgPool2d的输出: (Batch, 512, 1, 1)\n",
"net.add_module(\"fc\", nn.Sequential(d2l.FlattenLayer(), nn.Linear(512, 10))) "
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 output shape:\t torch.Size([1, 64, 112, 112])\n",
"1 output shape:\t torch.Size([1, 64, 112, 112])\n",
"2 output shape:\t torch.Size([1, 64, 112, 112])\n",
"3 output shape:\t torch.Size([1, 64, 56, 56])\n",
"resnet_block1 output shape:\t torch.Size([1, 64, 56, 56])\n",
"resnet_block2 output shape:\t torch.Size([1, 128, 28, 28])\n",
"resnet_block3 output shape:\t torch.Size([1, 256, 14, 14])\n",
"resnet_block4 output shape:\t torch.Size([1, 512, 7, 7])\n",
"global_avg_pool output shape:\t torch.Size([1, 512, 1, 1])\n",
"fc output shape:\t torch.Size([1, 10])\n"
]
}
],
"source": [
"X = torch.rand((1, 1, 224, 224))\n",
"for name, layer in net.named_children():\n",
" X = layer(X)\n",
" print(name, ' output shape:\\t', X.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.11.3 获取数据和训练模型"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"training on cuda\n",
"epoch 1, loss 0.0015, train acc 0.853, test acc 0.885, time 31.0 sec\n",
"epoch 2, loss 0.0010, train acc 0.910, test acc 0.899, time 31.8 sec\n",
"epoch 3, loss 0.0008, train acc 0.926, test acc 0.911, time 31.6 sec\n",
"epoch 4, loss 0.0007, train acc 0.936, test acc 0.916, time 31.8 sec\n",
"epoch 5, loss 0.0006, train acc 0.944, test acc 0.926, time 31.5 sec\n"
]
}
],
"source": [
"batch_size = 256\n",
"# 如出现“out of memory”的报错信息可减小batch_size或resize\n",
"train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)\n",
"\n",
"lr, num_epochs = 0.001, 5\n",
"optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
"d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,291 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5.12 稠密连接网络DenseNet"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.0\n",
"cuda\n"
]
}
],
"source": [
"import time\n",
"import torch\n",
"from torch import nn, optim\n",
"import torch.nn.functional as F\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"print(torch.__version__)\n",
"print(device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.12.1 稠密块"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def conv_block(in_channels, out_channels):\n",
" blk = nn.Sequential(nn.BatchNorm2d(in_channels), \n",
" nn.ReLU(),\n",
" nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))\n",
" return blk"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"class DenseBlock(nn.Module):\n",
" def __init__(self, num_convs, in_channels, out_channels):\n",
" super(DenseBlock, self).__init__()\n",
" net = []\n",
" for i in range(num_convs):\n",
" in_c = in_channels + i * out_channels\n",
" net.append(conv_block(in_c, out_channels))\n",
" self.net = nn.ModuleList(net)\n",
" self.out_channels = in_channels + num_convs * out_channels # 计算输出通道数\n",
"\n",
" def forward(self, X):\n",
" for blk in self.net:\n",
" Y = blk(X)\n",
" X = torch.cat((X, Y), dim=1) # 在通道维上将输入和输出连结\n",
" return X"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([4, 23, 8, 8])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"blk = DenseBlock(2, 3, 10)\n",
"X = torch.rand(4, 3, 8, 8)\n",
"Y = blk(X)\n",
"Y.shape"
]
},
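  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "输出的通道数可以这样算:输入 3 个通道,2 个卷积块各增加 10 个通道,即 $3 + 2 \\times 10 = 23$,与 `out_channels` 的计算方式一致;每个卷积块带来的通道数增量也被称为“增长率”(growth rate)。"
   ]
  },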
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.12.2 过渡层"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"def transition_block(in_channels, out_channels):\n",
" blk = nn.Sequential(\n",
" nn.BatchNorm2d(in_channels), \n",
" nn.ReLU(),\n",
" nn.Conv2d(in_channels, out_channels, kernel_size=1),\n",
" nn.AvgPool2d(kernel_size=2, stride=2))\n",
" return blk"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([4, 10, 4, 4])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"blk = transition_block(23, 10)\n",
"blk(Y).shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.12.3 DenseNet模型"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"net = nn.Sequential(\n",
" nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),\n",
" nn.BatchNorm2d(64), \n",
" nn.ReLU(),\n",
" nn.MaxPool2d(kernel_size=3, stride=2, padding=1))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"num_channels, growth_rate = 64, 32 # num_channels为当前的通道数\n",
"num_convs_in_dense_blocks = [4, 4, 4, 4]\n",
"\n",
"for i, num_convs in enumerate(num_convs_in_dense_blocks):\n",
" DB = DenseBlock(num_convs, num_channels, growth_rate)\n",
" net.add_module(\"DenseBlosk_%d\" % i, DB)\n",
" # 上一个稠密块的输出通道数\n",
" num_channels = DB.out_channels\n",
" # 在稠密块之间加入通道数减半的过渡层\n",
" if i != len(num_convs_in_dense_blocks) - 1:\n",
" net.add_module(\"transition_block_%d\" % i, transition_block(num_channels, num_channels // 2))\n",
" num_channels = num_channels // 2"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [],
"source": [
"net.add_module(\"BN\", nn.BatchNorm2d(num_channels))\n",
"net.add_module(\"relu\", nn.ReLU())\n",
"net.add_module(\"global_avg_pool\", d2l.GlobalAvgPool2d()) # GlobalAvgPool2d的输出: (Batch, num_channels, 1, 1)\n",
"net.add_module(\"fc\", nn.Sequential(d2l.FlattenLayer(), nn.Linear(num_channels, 10))) "
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 output shape:\t torch.Size([1, 64, 48, 48])\n",
"1 output shape:\t torch.Size([1, 64, 48, 48])\n",
"2 output shape:\t torch.Size([1, 64, 48, 48])\n",
"3 output shape:\t torch.Size([1, 64, 24, 24])\n",
"DenseBlosk_0 output shape:\t torch.Size([1, 192, 24, 24])\n",
"transition_block_0 output shape:\t torch.Size([1, 96, 12, 12])\n",
"DenseBlosk_1 output shape:\t torch.Size([1, 224, 12, 12])\n",
"transition_block_1 output shape:\t torch.Size([1, 112, 6, 6])\n",
"DenseBlosk_2 output shape:\t torch.Size([1, 240, 6, 6])\n",
"transition_block_2 output shape:\t torch.Size([1, 120, 3, 3])\n",
"DenseBlosk_3 output shape:\t torch.Size([1, 248, 3, 3])\n",
"BN output shape:\t torch.Size([1, 248, 3, 3])\n",
"relu output shape:\t torch.Size([1, 248, 3, 3])\n",
"global_avg_pool output shape:\t torch.Size([1, 248, 1, 1])\n",
"fc output shape:\t torch.Size([1, 10])\n"
]
}
],
"source": [
"X = torch.rand((1, 1, 96, 96))\n",
"for name, layer in net.named_children():\n",
" X = layer(X)\n",
" print(name, ' output shape:\\t', X.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.12.4 获取数据并训练模型"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"training on cuda\n",
"epoch 1, loss 0.0020, train acc 0.834, test acc 0.749, time 27.7 sec\n",
"epoch 2, loss 0.0011, train acc 0.900, test acc 0.824, time 25.5 sec\n",
"epoch 3, loss 0.0009, train acc 0.913, test acc 0.839, time 23.8 sec\n",
"epoch 4, loss 0.0008, train acc 0.921, test acc 0.889, time 24.9 sec\n",
"epoch 5, loss 0.0008, train acc 0.929, test acc 0.884, time 24.3 sec\n"
]
}
],
"source": [
"batch_size = 256\n",
"# 如出现“out of memory”的报错信息可减小batch_size或resize\n",
"train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)\n",
"\n",
"lr, num_epochs = 0.001, 5\n",
"optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
"d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,266 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5.1 二维卷积层\n",
"## 5.1.1 二维互相关运算"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"ename": "",
"evalue": "",
"output_type": "error",
"traceback": [
"\u001b[1;31mPython 3.6.13 ('deepsort') 需要安装 ipykernel。\n",
"Run the following command to install 'ipykernel' into the Python environment. \n",
"Command: 'conda install -n deepsort ipykernel --update-deps --force-reinstall'"
]
}
],
"source": [
"import torch \n",
"from torch import nn\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def corr2d(X, K): # 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
" h, w = K.shape\n",
" X, K = X.float(), K.float()\n",
" Y = torch.zeros((X.shape[0] - h + 1, X.shape[1] - w + 1))\n",
" for i in range(Y.shape[0]):\n",
" for j in range(Y.shape[1]):\n",
" Y[i, j] = (X[i: i + h, j: j + w] * K).sum()\n",
" return Y"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[19., 25.],\n",
" [37., 43.]])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])\n",
"K = torch.tensor([[0, 1], [2, 3]])\n",
"corr2d(X, K)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.1.2 二维卷积层"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"class Conv2D(nn.Module):\n",
" def __init__(self, kernel_size):\n",
" super(Conv2D, self).__init__()\n",
" self.weight = nn.Parameter(torch.randn(kernel_size))\n",
" self.bias = nn.Parameter(torch.randn(1))\n",
"\n",
" def forward(self, x):\n",
" return corr2d(x, self.weight) + self.bias"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.1.3 图像中物体边缘检测"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[1., 1., 0., 0., 0., 0., 1., 1.],\n",
" [1., 1., 0., 0., 0., 0., 1., 1.],\n",
" [1., 1., 0., 0., 0., 0., 1., 1.],\n",
" [1., 1., 0., 0., 0., 0., 1., 1.],\n",
" [1., 1., 0., 0., 0., 0., 1., 1.],\n",
" [1., 1., 0., 0., 0., 0., 1., 1.]])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.ones(6, 8)\n",
"X[:, 2:6] = 0\n",
"X"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"K = torch.tensor([[1, -1]])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 0., 1., 0., 0., 0., -1., 0.],\n",
" [ 0., 1., 0., 0., 0., -1., 0.],\n",
" [ 0., 1., 0., 0., 0., -1., 0.],\n",
" [ 0., 1., 0., 0., 0., -1., 0.],\n",
" [ 0., 1., 0., 0., 0., -1., 0.],\n",
" [ 0., 1., 0., 0., 0., -1., 0.]])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"Y = corr2d(X, K)\n",
"Y"
]
},
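  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "卷积核 `[1, -1]` 相当于对相邻两列做差分:$Y[i, j] = X[i, j] - X[i, j+1]$。因此从 1 变到 0 的边缘处输出 1,从 0 变到 1 的边缘处输出 -1,其余位置输出 0,即只有横向上像素发生变化的位置才有非零响应。"
   ]
  },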
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.1.4 通过数据学习核数组"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Step 5, loss 1.844\n",
"Step 10, loss 0.206\n",
"Step 15, loss 0.023\n",
"Step 20, loss 0.003\n"
]
}
],
"source": [
"# 构造一个核数组形状是(1, 2)的二维卷积层\n",
"conv2d = Conv2D(kernel_size=(1, 2))\n",
"\n",
"step = 20\n",
"lr = 0.01\n",
"for i in range(step):\n",
" Y_hat = conv2d(X)\n",
" l = ((Y_hat - Y) ** 2).sum()\n",
" l.backward()\n",
" \n",
" # 梯度下降\n",
" conv2d.weight.data -= lr * conv2d.weight.grad\n",
" conv2d.bias.data -= lr * conv2d.bias.grad\n",
" \n",
" # 梯度清0\n",
" conv2d.weight.grad.fill_(0)\n",
" conv2d.bias.grad.fill_(0)\n",
" if (i + 1) % 5 == 0:\n",
" print('Step %d, loss %.3f' % (i + 1, l.item()))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"weight: tensor([[ 0.9948, -1.0092]])\n",
"bias: tensor([0.0080])\n"
]
}
],
"source": [
"print(\"weight: \", conv2d.weight.data)\n",
"print(\"bias: \", conv2d.bias.data)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.13"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,170 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5.2 填充和步幅"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.2.1 填充"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([8, 8])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 定义一个函数来计算卷积层。它对输入和输出做相应的升维和降维\n",
"def comp_conv2d(conv2d, X):\n",
" # (1, 1)代表批量大小和通道数“多输入通道和多输出通道”一节将介绍均为1\n",
" X = X.view((1, 1) + X.shape)\n",
" Y = conv2d(X)\n",
" return Y.view(Y.shape[2:]) # 排除不关心的前两维:批量和通道\n",
"\n",
"# 注意这里是两侧分别填充1行或列所以在两侧一共填充2行或列\n",
"conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, padding=1)\n",
"\n",
"X = torch.rand(8, 8)\n",
"comp_conv2d(conv2d, X).shape"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([8, 8])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# 使用高为5、宽为3的卷积核。在高和宽两侧的填充数分别为2和1\n",
"conv2d = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=(5, 3), padding=(2, 1))\n",
"comp_conv2d(conv2d, X).shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.2.2 步幅"
]
},
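{
"cell_type": "markdown",
"metadata": {},
"source": [
"With strides $s_h$ and $s_w$ the output shape becomes $\\lfloor (n_h - k_h + p_h + s_h)/s_h \\rfloor \\times \\lfloor (n_w - k_w + p_w + s_w)/s_w \\rfloor$. This matches the two cells below: the first gives $\\lfloor (8 - 3 + 2 + 2)/2 \\rfloor = 4$ in both dimensions, and the second gives $\\lfloor (8 - 3 + 0 + 3)/3 \\rfloor = 2$ and $\\lfloor (8 - 5 + 2 + 4)/4 \\rfloor = 2$."
]
},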
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([4, 4])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conv2d = nn.Conv2d(1, 1, kernel_size=3, padding=1, stride=2)\n",
"comp_conv2d(conv2d, X).shape"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([2, 2])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"conv2d = nn.Conv2d(1, 1, kernel_size=(3, 5), padding=(0, 1), stride=(3, 4))\n",
"comp_conv2d(conv2d, X).shape"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,224 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5.3 多输入通道和多输出通道\n",
"## 5.3.1 多输入通道"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def corr2d_multi_in(X, K):\n",
" # 沿着X和K的第0维通道维分别计算再相加\n",
" res = d2l.corr2d(X[0, :, :], K[0, :, :])\n",
" for i in range(1, X.shape[0]):\n",
" res += d2l.corr2d(X[i, :, :], K[i, :, :])\n",
" return res"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 56., 72.],\n",
" [104., 120.]])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.tensor([[[0, 1, 2], [3, 4, 5], [6, 7, 8]],\n",
" [[1, 2, 3], [4, 5, 6], [7, 8, 9]]])\n",
"K = torch.tensor([[[0, 1], [2, 3]], [[1, 2], [3, 4]]])\n",
"\n",
"corr2d_multi_in(X, K)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.3.2 多输出通道"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def corr2d_multi_in_out(X, K):\n",
" # 对K的第0维遍历每次同输入X做互相关计算。所有结果使用stack函数合并在一起\n",
" return torch.stack([corr2d_multi_in(X, k) for k in K])"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([3, 2, 2, 2])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"K = torch.stack([K, K + 1, K + 2])\n",
"K.shape"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[[ 56., 72.],\n",
" [104., 120.]],\n",
"\n",
" [[ 76., 100.],\n",
" [148., 172.]],\n",
"\n",
" [[ 96., 128.],\n",
" [192., 224.]]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"corr2d_multi_in_out(X, K)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.3.3 $1\\times 1$卷积层"
]
},
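{
"cell_type": "markdown",
"metadata": {},
"source": [
"The idea behind the check below: a $1\\times 1$ convolution mixes channels without looking at neighbouring pixels, so if the $c_i \\times h \\times w$ input is flattened into a $c_i \\times hw$ matrix, the layer is just the matrix product of the $c_o \\times c_i$ kernel with that matrix, i.e. a fully connected layer applied independently at every pixel position."
]
},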
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def corr2d_multi_in_out_1x1(X, K):\n",
" c_i, h, w = X.shape\n",
" c_o = K.shape[0]\n",
" X = X.view(c_i, h * w)\n",
" K = K.view(c_o, c_i)\n",
" Y = torch.mm(K, X) # 全连接层的矩阵乘法\n",
" return Y.view(c_o, h, w)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"True"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.rand(3, 3, 3)\n",
"K = torch.rand(2, 3, 1, 1)\n",
"\n",
"Y1 = corr2d_multi_in_out_1x1(X, K)\n",
"Y2 = corr2d_multi_in_out(X, K)\n",
"\n",
"(Y1 - Y2).norm().item() < 1e-6"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,290 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5.4 池化层"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"from torch import nn\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.4.1 二维最大池化层和平均池化层"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def pool2d(X, pool_size, mode='max'):\n",
" X = X.float()\n",
" p_h, p_w = pool_size\n",
" Y = torch.zeros(X.shape[0] - p_h + 1, X.shape[1] - p_w + 1)\n",
" for i in range(Y.shape[0]):\n",
" for j in range(Y.shape[1]):\n",
" if mode == 'max':\n",
" Y[i, j] = X[i: i + p_h, j: j + p_w].max()\n",
" elif mode == 'avg':\n",
" Y[i, j] = X[i: i + p_h, j: j + p_w].mean() \n",
" return Y"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[4., 5.],\n",
" [7., 8.]])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.tensor([[0, 1, 2], [3, 4, 5], [6, 7, 8]])\n",
"pool2d(X, (2, 2))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[2., 3.],\n",
" [5., 6.]])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pool2d(X, (2, 2), 'avg')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.4.2 填充和步幅"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[[[ 0., 1., 2., 3.],\n",
" [ 4., 5., 6., 7.],\n",
" [ 8., 9., 10., 11.],\n",
" [12., 13., 14., 15.]]]])"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.arange(16, dtype=torch.float).view((1, 1, 4, 4))\n",
"X"
]
},
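{
"cell_type": "markdown",
"metadata": {},
"source": [
"Note that `nn.MaxPool2d` uses a stride equal to the window size by default, so the pooling windows do not overlap. That is why `nn.MaxPool2d(3)` applied to this $4 \\times 4$ input produces a single value in the next cell."
]
},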
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[[[10.]]]])"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pool2d = nn.MaxPool2d(3)\n",
"pool2d(X) "
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[[[ 5., 7.],\n",
" [13., 15.]]]])"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pool2d = nn.MaxPool2d(3, padding=1, stride=2)\n",
"pool2d(X)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[[[ 1., 3.],\n",
" [ 9., 11.],\n",
" [13., 15.]]]])"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pool2d = nn.MaxPool2d((2, 4), padding=(1, 2), stride=(2, 3))\n",
"pool2d(X)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.4.3 多通道"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[[[ 0., 1., 2., 3.],\n",
" [ 4., 5., 6., 7.],\n",
" [ 8., 9., 10., 11.],\n",
" [12., 13., 14., 15.]],\n",
"\n",
" [[ 1., 2., 3., 4.],\n",
" [ 5., 6., 7., 8.],\n",
" [ 9., 10., 11., 12.],\n",
" [13., 14., 15., 16.]]]])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.cat((X, X + 1), dim=1)\n",
"X"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[[[ 5., 7.],\n",
" [13., 15.]],\n",
"\n",
" [[ 6., 8.],\n",
" [14., 16.]]]])"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pool2d = nn.MaxPool2d(3, padding=1, stride=2)\n",
"pool2d(X)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,306 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5.5 卷积神经网络(LeNet)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-29T13:57:37.383972Z",
"start_time": "2019-05-29T13:57:34.520559Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.0.0\n",
"cuda\n"
]
}
],
"source": [
"import os\n",
"import time\n",
"import torch\n",
"from torch import nn, optim\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"print(torch.__version__)\n",
"print(device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.5.1 LeNet模型 "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-29T13:57:37.394997Z",
"start_time": "2019-05-29T13:57:37.386720Z"
}
},
"outputs": [],
"source": [
"class LeNet(nn.Module):\n",
" def __init__(self):\n",
" super(LeNet, self).__init__()\n",
" self.conv = nn.Sequential(\n",
" nn.Conv2d(1, 6, 5), # in_channels, out_channels, kernel_size\n",
" nn.Sigmoid(),\n",
" nn.MaxPool2d(2, 2), # kernel_size, stride\n",
" nn.Conv2d(6, 16, 5),\n",
" nn.Sigmoid(),\n",
" nn.MaxPool2d(2, 2)\n",
" )\n",
" self.fc = nn.Sequential(\n",
" nn.Linear(16*4*4, 120),\n",
" nn.Sigmoid(),\n",
" nn.Linear(120, 84),\n",
" nn.Sigmoid(),\n",
" nn.Linear(84, 10)\n",
" )\n",
"\n",
" def forward(self, img):\n",
" feature = self.conv(img)\n",
" output = self.fc(feature.view(img.shape[0], -1))\n",
" return output"
]
},
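{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick shape trace explains the `16*4*4` input size of the first fully connected layer: a $1 \\times 28 \\times 28$ Fashion-MNIST image becomes $6 \\times 24 \\times 24$ after the first $5 \\times 5$ convolution, $6 \\times 12 \\times 12$ after pooling, $16 \\times 8 \\times 8$ after the second convolution, and $16 \\times 4 \\times 4$ after the second pooling, i.e. 256 features per example."
]
},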
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-29T13:57:37.450484Z",
"start_time": "2019-05-29T13:57:37.397357Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"LeNet(\n",
" (conv): Sequential(\n",
" (0): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))\n",
" (1): Sigmoid()\n",
" (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (3): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))\n",
" (4): Sigmoid()\n",
" (5): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" )\n",
" (fc): Sequential(\n",
" (0): Linear(in_features=256, out_features=120, bias=True)\n",
" (1): Sigmoid()\n",
" (2): Linear(in_features=120, out_features=84, bias=True)\n",
" (3): Sigmoid()\n",
" (4): Linear(in_features=84, out_features=10, bias=True)\n",
" )\n",
")\n"
]
}
],
"source": [
"net = LeNet()\n",
"print(net)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.5.2 获取数据和训练模型"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-29T13:57:38.432567Z",
"start_time": "2019-05-29T13:57:37.452521Z"
}
},
"outputs": [],
"source": [
"batch_size = 256\n",
"train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size=batch_size)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-29T13:57:38.442887Z",
"start_time": "2019-05-29T13:57:38.435111Z"
}
},
"outputs": [],
"source": [
"# 本函数已保存在d2lzh_pytorch包中方便以后使用。该函数将被逐步改进它的完整实现将在“图像增广”一节中描述\n",
"def evaluate_accuracy(data_iter, net, device=None):\n",
" if device is None and isinstance(net, torch.nn.Module):\n",
" # 如果没指定device就使用net的device\n",
" device = list(net.parameters())[0].device\n",
" acc_sum, n = 0.0, 0\n",
" with torch.no_grad():\n",
" for X, y in data_iter:\n",
" if isinstance(net, torch.nn.Module):\n",
" net.eval() # 评估模式, 这会关闭dropout\n",
" acc_sum += (net(X.to(device)).argmax(dim=1) == y.to(device)).float().sum().cpu().item()\n",
" net.train() # 改回训练模式\n",
" else: # 自定义的模型, 3.13节之后不会用到, 不考虑GPU\n",
" if('is_training' in net.__code__.co_varnames): # 如果有is_training这个参数\n",
" # 将is_training设置成False\n",
" acc_sum += (net(X, is_training=False).argmax(dim=1) == y).float().sum().item() \n",
" else:\n",
" acc_sum += (net(X).argmax(dim=1) == y).float().sum().item() \n",
" n += y.shape[0]\n",
" return acc_sum / n"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-29T13:57:38.453480Z",
"start_time": "2019-05-29T13:57:38.445655Z"
}
},
"outputs": [],
"source": [
"# 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
"def train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs):\n",
" net = net.to(device)\n",
" print(\"training on \", device)\n",
" loss = torch.nn.CrossEntropyLoss()\n",
" batch_count = 0\n",
" for epoch in range(num_epochs):\n",
" train_l_sum, train_acc_sum, n, start = 0.0, 0.0, 0, time.time()\n",
" for X, y in train_iter:\n",
" X = X.to(device)\n",
" y = y.to(device)\n",
" y_hat = net(X)\n",
" l = loss(y_hat, y)\n",
" optimizer.zero_grad()\n",
" l.backward()\n",
" optimizer.step()\n",
" train_l_sum += l.cpu().item()\n",
" train_acc_sum += (y_hat.argmax(dim=1) == y).sum().cpu().item()\n",
" n += y.shape[0]\n",
" batch_count += 1\n",
" test_acc = evaluate_accuracy(test_iter, net)\n",
" print('epoch %d, loss %.4f, train acc %.3f, test acc %.3f, time %.1f sec'\n",
" % (epoch + 1, train_l_sum / batch_count, train_acc_sum / n, test_acc, time.time() - start))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-29T13:58:00.333237Z",
"start_time": "2019-05-29T13:57:38.456012Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"training on cuda\n",
"epoch 1, loss 1.7885, train acc 0.337, test acc 0.584, time 2.4 sec\n",
"epoch 2, loss 0.4793, train acc 0.614, test acc 0.666, time 2.3 sec\n",
"epoch 3, loss 0.2637, train acc 0.704, test acc 0.720, time 2.3 sec\n",
"epoch 4, loss 0.1747, train acc 0.734, test acc 0.740, time 2.2 sec\n",
"epoch 5, loss 0.1282, train acc 0.751, test acc 0.749, time 2.2 sec\n"
]
}
],
"source": [
"lr, num_epochs = 0.001, 5\n",
"optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
"train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.3"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,290 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5.6 深度卷积神经网络AlexNet"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-19T07:36:45.657048Z",
"start_time": "2019-03-19T07:36:45.285668Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.0\n",
"0.2.1\n",
"cuda\n"
]
}
],
"source": [
"import time\n",
"import torch\n",
"from torch import nn, optim\n",
"import torchvision\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"print(torch.__version__)\n",
"print(torchvision.__version__)\n",
"print(device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.6.2 AlexNet"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-19T07:36:45.703036Z",
"start_time": "2019-03-19T07:36:45.658231Z"
}
},
"outputs": [],
"source": [
"class AlexNet(nn.Module):\n",
" def __init__(self):\n",
" super(AlexNet, self).__init__()\n",
" self.conv = nn.Sequential(\n",
" nn.Conv2d(1, 96, 11, 4), # in_channels, out_channels, kernel_size, stride, padding\n",
" nn.ReLU(),\n",
" nn.MaxPool2d(3, 2), # kernel_size, stride\n",
" # 减小卷积窗口使用填充为2来使得输入与输出的高和宽一致且增大输出通道数\n",
" nn.Conv2d(96, 256, 5, 1, 2),\n",
" nn.ReLU(),\n",
" nn.MaxPool2d(3, 2),\n",
" # 连续3个卷积层且使用更小的卷积窗口。除了最后的卷积层外进一步增大了输出通道数。\n",
" # 前两个卷积层后不使用池化层来减小输入的高和宽\n",
" nn.Conv2d(256, 384, 3, 1, 1),\n",
" nn.ReLU(),\n",
" nn.Conv2d(384, 384, 3, 1, 1),\n",
" nn.ReLU(),\n",
" nn.Conv2d(384, 256, 3, 1, 1),\n",
" nn.ReLU(),\n",
" nn.MaxPool2d(3, 2)\n",
" )\n",
" # 这里全连接层的输出个数比LeNet中的大数倍。使用丢弃层来缓解过拟合\n",
" self.fc = nn.Sequential(\n",
" nn.Linear(256*5*5, 4096),\n",
" nn.ReLU(),\n",
" nn.Dropout(0.5),\n",
" nn.Linear(4096, 4096),\n",
" nn.ReLU(),\n",
" nn.Dropout(0.5),\n",
" # 输出层。由于这里使用Fashion-MNIST所以用类别数为10而非论文中的1000\n",
" nn.Linear(4096, 10),\n",
" )\n",
"\n",
" def forward(self, img):\n",
" feature = self.conv(img)\n",
" output = self.fc(feature.view(img.shape[0], -1))\n",
" return output"
]
},
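{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `256*5*5` input size of the first fully connected layer comes from tracing the $224 \\times 224$ input through the convolutional part: $224 \\to 54$ after the $11 \\times 11$, stride-4 convolution, $\\to 26$ after pooling, unchanged by the padded $5 \\times 5$ convolution, $\\to 12$ after pooling, unchanged by the three padded $3 \\times 3$ convolutions, and $\\to 5$ after the final pooling, giving a $256 \\times 5 \\times 5$ feature map."
]
},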
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-19T07:36:46.053598Z",
"start_time": "2019-03-19T07:36:45.704356Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"AlexNet(\n",
" (conv): Sequential(\n",
" (0): Conv2d(1, 96, kernel_size=(11, 11), stride=(4, 4))\n",
" (1): ReLU()\n",
" (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (3): Conv2d(96, 256, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))\n",
" (4): ReLU()\n",
" (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" (6): Conv2d(256, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (7): ReLU()\n",
" (8): Conv2d(384, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (9): ReLU()\n",
" (10): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (11): ReLU()\n",
" (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" )\n",
" (fc): Sequential(\n",
" (0): Linear(in_features=6400, out_features=4096, bias=True)\n",
" (1): ReLU()\n",
" (2): Dropout(p=0.5)\n",
" (3): Linear(in_features=4096, out_features=4096, bias=True)\n",
" (4): ReLU()\n",
" (5): Dropout(p=0.5)\n",
" (6): Linear(in_features=4096, out_features=10, bias=True)\n",
" )\n",
")\n"
]
}
],
"source": [
"net = AlexNet()\n",
"print(net)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.6.3 读取数据"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-19T07:36:46.066761Z",
"start_time": "2019-03-19T07:36:46.054928Z"
}
},
"outputs": [],
"source": [
"# 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
"def load_data_fashion_mnist(batch_size, resize=None, root='~/Datasets/FashionMNIST'):\n",
" \"\"\"Download the fashion mnist dataset and then load into memory.\"\"\"\n",
" trans = []\n",
" if resize:\n",
" trans.append(torchvision.transforms.Resize(size=resize))\n",
" trans.append(torchvision.transforms.ToTensor())\n",
" \n",
" transform = torchvision.transforms.Compose(trans)\n",
" mnist_train = torchvision.datasets.FashionMNIST(root=root, train=True, download=True, transform=transform)\n",
" mnist_test = torchvision.datasets.FashionMNIST(root=root, train=False, download=True, transform=transform)\n",
"\n",
" train_iter = torch.utils.data.DataLoader(mnist_train, batch_size=batch_size, shuffle=True, num_workers=4)\n",
" test_iter = torch.utils.data.DataLoader(mnist_test, batch_size=batch_size, shuffle=False, num_workers=4)\n",
"\n",
" return train_iter, test_iter"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-19T07:36:46.091524Z",
"start_time": "2019-03-19T07:36:46.067835Z"
}
},
"outputs": [],
"source": [
"batch_size = 128\n",
"# 如出现“out of memory”的报错信息可减小batch_size或resize\n",
"train_iter, test_iter = load_data_fashion_mnist(batch_size, resize=224)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.6.4 训练"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2019-03-19T07:36:47.850402Z",
"start_time": "2019-03-19T07:36:46.092485Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"training on cuda\n",
"epoch 1, loss 0.0047, train acc 0.770, test acc 0.865, time 128.3 sec\n",
"epoch 2, loss 0.0025, train acc 0.879, test acc 0.889, time 128.8 sec\n",
"epoch 3, loss 0.0022, train acc 0.898, test acc 0.901, time 130.4 sec\n",
"epoch 4, loss 0.0019, train acc 0.908, test acc 0.900, time 131.4 sec\n",
"epoch 5, loss 0.0018, train acc 0.913, test acc 0.902, time 129.9 sec\n"
]
}
],
"source": [
"lr, num_epochs = 0.001, 5\n",
"optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
"d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,260 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5.7 使用重复元素的网络VGG"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.0\n",
"cuda\n"
]
}
],
"source": [
"import time\n",
"import torch\n",
"from torch import nn, optim\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"print(torch.__version__)\n",
"print(device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.7.1 VGG块"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def vgg_block(num_convs, in_channels, out_channels):\n",
" blk = []\n",
" for i in range(num_convs):\n",
" if i == 0:\n",
" blk.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))\n",
" else:\n",
" blk.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))\n",
" blk.append(nn.ReLU())\n",
" blk.append(nn.MaxPool2d(kernel_size=2, stride=2))\n",
" return nn.Sequential(*blk)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.7.2 VGG网络"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"conv_arch = ((1, 1, 64), (1, 64, 128), (2, 128, 256), (2, 256, 512), (2, 512, 512))\n",
"fc_features = 512 * 7 * 7 # 根据卷积层的输出算出来的\n",
"fc_hidden_units = 4096 # 任意"
]
},
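{
"cell_type": "markdown",
"metadata": {},
"source": [
"Each of the five VGG blocks ends with a $2 \\times 2$ max pooling layer with stride 2, so the height and width are halved five times: $224 / 2^5 = 7$. With 512 output channels in the last block, the convolutional part produces $512 \\times 7 \\times 7$ features, which is where `fc_features` comes from."
]
},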
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"def vgg(conv_arch, fc_features, fc_hidden_units=4096):\n",
" net = nn.Sequential()\n",
" # 卷积层部分\n",
" for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):\n",
" net.add_module(\"vgg_block_\" + str(i+1), vgg_block(num_convs, in_channels, out_channels))\n",
" # 全连接层部分\n",
" net.add_module(\"fc\", nn.Sequential(d2l.FlattenLayer(),\n",
" nn.Linear(fc_features, fc_hidden_units),\n",
" nn.ReLU(),\n",
" nn.Dropout(0.5),\n",
" nn.Linear(fc_hidden_units, fc_hidden_units),\n",
" nn.ReLU(),\n",
" nn.Dropout(0.5),\n",
" nn.Linear(fc_hidden_units, 10)\n",
" ))\n",
" return net"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"vgg_block_1 output shape: torch.Size([1, 64, 112, 112])\n",
"vgg_block_2 output shape: torch.Size([1, 128, 56, 56])\n",
"vgg_block_3 output shape: torch.Size([1, 256, 28, 28])\n",
"vgg_block_4 output shape: torch.Size([1, 512, 14, 14])\n",
"vgg_block_5 output shape: torch.Size([1, 512, 7, 7])\n",
"fc output shape: torch.Size([1, 10])\n"
]
}
],
"source": [
"net = vgg(conv_arch, fc_features, fc_hidden_units)\n",
"X = torch.rand(1, 1, 224, 224)\n",
"\n",
"# named_children获取一级子模块及其名字(named_modules会返回所有子模块,包括子模块的子模块)\n",
"for name, blk in net.named_children(): \n",
" X = blk(X)\n",
" print(name, 'output shape: ', X.shape)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sequential(\n",
" (vgg_block_1): Sequential(\n",
" (0): Conv2d(1, 8, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (1): ReLU()\n",
" (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" )\n",
" (vgg_block_2): Sequential(\n",
" (0): Conv2d(8, 16, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (1): ReLU()\n",
" (2): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" )\n",
" (vgg_block_3): Sequential(\n",
" (0): Conv2d(16, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (1): ReLU()\n",
" (2): Conv2d(32, 32, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (3): ReLU()\n",
" (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" )\n",
" (vgg_block_4): Sequential(\n",
" (0): Conv2d(32, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (1): ReLU()\n",
" (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (3): ReLU()\n",
" (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" )\n",
" (vgg_block_5): Sequential(\n",
" (0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (1): ReLU()\n",
" (2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))\n",
" (3): ReLU()\n",
" (4): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)\n",
" )\n",
" (fc): Sequential(\n",
" (0): FlattenLayer()\n",
" (1): Linear(in_features=3136, out_features=512, bias=True)\n",
" (2): ReLU()\n",
" (3): Dropout(p=0.5)\n",
" (4): Linear(in_features=512, out_features=512, bias=True)\n",
" (5): ReLU()\n",
" (6): Dropout(p=0.5)\n",
" (7): Linear(in_features=512, out_features=10, bias=True)\n",
" )\n",
")\n"
]
}
],
"source": [
"ratio = 8\n",
"small_conv_arch = [(1, 1, 64//ratio), (1, 64//ratio, 128//ratio), (2, 128//ratio, 256//ratio), \n",
" (2, 256//ratio, 512//ratio), (2, 512//ratio, 512//ratio)]\n",
"net = vgg(small_conv_arch, fc_features // ratio, fc_hidden_units // ratio)\n",
"print(net)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.7.3 获取数据和训练模型"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"training on cuda\n",
"epoch 1, loss 0.0101, train acc 0.755, test acc 0.859, time 255.9 sec\n",
"epoch 2, loss 0.0051, train acc 0.882, test acc 0.902, time 238.1 sec\n",
"epoch 3, loss 0.0043, train acc 0.900, test acc 0.908, time 225.5 sec\n",
"epoch 4, loss 0.0038, train acc 0.913, test acc 0.914, time 230.3 sec\n",
"epoch 5, loss 0.0035, train acc 0.919, test acc 0.918, time 153.9 sec\n"
]
}
],
"source": [
"batch_size = 64\n",
"# 如出现“out of memory”的报错信息可减小batch_size或resize\n",
"train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)\n",
"\n",
"lr, num_epochs = 0.001, 5\n",
"optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
"d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,177 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5.8 网络中的网络NiN"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.0\n",
"cuda\n"
]
}
],
"source": [
"import time\n",
"import torch\n",
"from torch import nn, optim\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"print(torch.__version__)\n",
"print(device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.8.1 NiN块"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def nin_block(in_channels, out_channels, kernel_size, stride, padding):\n",
" blk = nn.Sequential(nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding),\n",
" nn.ReLU(),\n",
" nn.Conv2d(out_channels, out_channels, kernel_size=1),\n",
" nn.ReLU(),\n",
" nn.Conv2d(out_channels, out_channels, kernel_size=1),\n",
" nn.ReLU())\n",
" return blk"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.8.2 NiN模型"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"net = nn.Sequential(\n",
" nin_block(1, 96, kernel_size=11, stride=4, padding=0),\n",
" nn.MaxPool2d(kernel_size=3, stride=2),\n",
" nin_block(96, 256, kernel_size=5, stride=1, padding=2),\n",
" nn.MaxPool2d(kernel_size=3, stride=2),\n",
" nin_block(256, 384, kernel_size=3, stride=1, padding=1),\n",
" nn.MaxPool2d(kernel_size=3, stride=2), \n",
" nn.Dropout(0.5),\n",
" # 标签类别数是10\n",
" nin_block(384, 10, kernel_size=3, stride=1, padding=1),\n",
" # 全局平均池化层可通过将窗口形状设置成输入的高和宽实现\n",
" nn.AvgPool2d(kernel_size=5),\n",
" # 将四维的输出转成二维的输出,其形状为(批量大小, 10)\n",
" d2l.FlattenLayer())"
]
},
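{
"cell_type": "markdown",
"metadata": {},
"source": [
"With a $224 \\times 224$ input, the feature map entering the final pooling layer is $10 \\times 5 \\times 5$ (see the shape printout below), so `nn.AvgPool2d(kernel_size=5)` covers the whole map and acts as a global average pooling layer, turning each of the 10 channels into a single class score."
]
},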
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0 output shape: torch.Size([1, 96, 54, 54])\n",
"1 output shape: torch.Size([1, 96, 26, 26])\n",
"2 output shape: torch.Size([1, 256, 26, 26])\n",
"3 output shape: torch.Size([1, 256, 12, 12])\n",
"4 output shape: torch.Size([1, 384, 12, 12])\n",
"5 output shape: torch.Size([1, 384, 5, 5])\n",
"6 output shape: torch.Size([1, 384, 5, 5])\n",
"7 output shape: torch.Size([1, 10, 5, 5])\n",
"8 output shape: torch.Size([1, 10, 1, 1])\n",
"9 output shape: torch.Size([1, 10])\n"
]
}
],
"source": [
"X = torch.rand(1, 1, 224, 224)\n",
"\n",
"for name, blk in net.named_children(): \n",
" X = blk(X)\n",
" print(name, 'output shape: ', X.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.8.3 获取数据和训练模型"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"training on cuda\n",
"epoch 1, loss 0.0101, train acc 0.513, test acc 0.734, time 260.9 sec\n",
"epoch 2, loss 0.0050, train acc 0.763, test acc 0.754, time 175.1 sec\n",
"epoch 3, loss 0.0041, train acc 0.808, test acc 0.826, time 151.0 sec\n",
"epoch 4, loss 0.0037, train acc 0.828, test acc 0.827, time 151.0 sec\n",
"epoch 5, loss 0.0034, train acc 0.839, test acc 0.831, time 151.0 sec\n"
]
}
],
"source": [
"batch_size = 128\n",
"# 如出现“out of memory”的报错信息可减小batch_size或resize\n",
"train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=224)\n",
"\n",
"lr, num_epochs = 0.002, 5\n",
"optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
"d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,232 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 5.9 含并行连结的网络GoogLeNet"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.0\n",
"cuda\n"
]
}
],
"source": [
"import time\n",
"import torch\n",
"from torch import nn, optim\n",
"import torch.nn.functional as F\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"print(torch.__version__)\n",
"print(device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.9.1 Inception 块"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"class Inception(nn.Module):\n",
" # c1 - c4为每条线路里的层的输出通道数\n",
" def __init__(self, in_c, c1, c2, c3, c4):\n",
" super(Inception, self).__init__()\n",
" # 线路1单1 x 1卷积层\n",
" self.p1_1 = nn.Conv2d(in_c, c1, kernel_size=1)\n",
" # 线路21 x 1卷积层后接3 x 3卷积层\n",
" self.p2_1 = nn.Conv2d(in_c, c2[0], kernel_size=1)\n",
" self.p2_2 = nn.Conv2d(c2[0], c2[1], kernel_size=3, padding=1)\n",
" # 线路31 x 1卷积层后接5 x 5卷积层\n",
" self.p3_1 = nn.Conv2d(in_c, c3[0], kernel_size=1)\n",
" self.p3_2 = nn.Conv2d(c3[0], c3[1], kernel_size=5, padding=2)\n",
" # 线路43 x 3最大池化层后接1 x 1卷积层\n",
" self.p4_1 = nn.MaxPool2d(kernel_size=3, stride=1, padding=1)\n",
" self.p4_2 = nn.Conv2d(in_c, c4, kernel_size=1)\n",
"\n",
" def forward(self, x):\n",
" p1 = F.relu(self.p1_1(x))\n",
" p2 = F.relu(self.p2_2(F.relu(self.p2_1(x))))\n",
" p3 = F.relu(self.p3_2(F.relu(self.p3_1(x))))\n",
" p4 = F.relu(self.p4_2(self.p4_1(x)))\n",
" return torch.cat((p1, p2, p3, p4), dim=1) # 在通道维上连结输出"
]
},
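{
"cell_type": "markdown",
"metadata": {},
"source": [
"Because the four paths keep the input height and width (via padding) and are concatenated along the channel dimension, the block's output channels are simply $c_1 + c_2[1] + c_3[1] + c_4$. For example, the first Inception block in `b3` below outputs $64 + 128 + 32 + 32 = 256$ channels, which is exactly the `in_c` of the block that follows it."
]
},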
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.9.2 GoogLeNet模型"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"b1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3),\n",
" nn.ReLU(),\n",
" nn.MaxPool2d(kernel_size=3, stride=2, padding=1))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [],
"source": [
"b2 = nn.Sequential(nn.Conv2d(64, 64, kernel_size=1),\n",
" nn.Conv2d(64, 192, kernel_size=3, padding=1),\n",
" nn.MaxPool2d(kernel_size=3, stride=2, padding=1))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"b3 = nn.Sequential(Inception(192, 64, (96, 128), (16, 32), 32),\n",
" Inception(256, 128, (128, 192), (32, 96), 64),\n",
" nn.MaxPool2d(kernel_size=3, stride=2, padding=1))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"b4 = nn.Sequential(Inception(480, 192, (96, 208), (16, 48), 64),\n",
" Inception(512, 160, (112, 224), (24, 64), 64),\n",
" Inception(512, 128, (128, 256), (24, 64), 64),\n",
" Inception(512, 112, (144, 288), (32, 64), 64),\n",
" Inception(528, 256, (160, 320), (32, 128), 128),\n",
" nn.MaxPool2d(kernel_size=3, stride=2, padding=1))"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"b5 = nn.Sequential(Inception(832, 256, (160, 320), (32, 128), 128),\n",
" Inception(832, 384, (192, 384), (48, 128), 128),\n",
" d2l.GlobalAvgPool2d())"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"output shape: torch.Size([1, 64, 24, 24])\n",
"output shape: torch.Size([1, 192, 12, 12])\n",
"output shape: torch.Size([1, 480, 6, 6])\n",
"output shape: torch.Size([1, 832, 3, 3])\n",
"output shape: torch.Size([1, 1024, 1, 1])\n",
"output shape: torch.Size([1, 1024])\n",
"output shape: torch.Size([1, 10])\n"
]
}
],
"source": [
"net = nn.Sequential(b1, b2, b3, b4, b5, d2l.FlattenLayer(), nn.Linear(1024, 10))\n",
"X = torch.rand(1, 1, 96, 96)\n",
"for blk in net.children(): \n",
" X = blk(X)\n",
" print('output shape: ', X.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 5.9.3 获取数据和训练模型"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"training on cuda\n",
"epoch 1, loss 0.0087, train acc 0.570, test acc 0.831, time 45.5 sec\n",
"epoch 2, loss 0.0032, train acc 0.851, test acc 0.853, time 48.5 sec\n",
"epoch 3, loss 0.0026, train acc 0.880, test acc 0.883, time 45.4 sec\n",
"epoch 4, loss 0.0022, train acc 0.895, test acc 0.887, time 46.6 sec\n",
"epoch 5, loss 0.0020, train acc 0.906, test acc 0.896, time 43.5 sec\n"
]
}
],
"source": [
"batch_size = 128\n",
"# 如出现“out of memory”的报错信息可减小batch_size或resize\n",
"train_iter, test_iter = d2l.load_data_fashion_mnist(batch_size, resize=96)\n",
"\n",
"lr, num_epochs = 0.001, 5\n",
"optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
"d2l.train_ch5(net, train_iter, test_iter, batch_size, optimizer, device, num_epochs)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,106 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 6.2 循环神经网络"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"\n",
"print(torch.__version__)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 5.2633, -3.2288, 0.6037, -1.3321],\n",
" [ 9.4012, -6.7830, 1.0630, -0.1809],\n",
" [ 7.0355, -2.2361, 0.7469, -3.4667]])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X, W_xh = torch.randn(3, 1), torch.randn(1, 4)\n",
"H, W_hh = torch.randn(3, 4), torch.randn(4, 4)\n",
"torch.matmul(X, W_xh) + torch.matmul(H, W_hh)"
]
},
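{
"cell_type": "markdown",
"metadata": {},
"source": [
"The next cell verifies the identity behind this computation: concatenating the matrices column-wise and row-wise gives $X W_{xh} + H W_{hh} = [X, H] \\begin{bmatrix} W_{xh} \\\\ W_{hh} \\end{bmatrix}$, so the hidden-state update can be written as a single matrix multiplication on the concatenated input."
]
},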
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 5.2633, -3.2288, 0.6037, -1.3321],\n",
" [ 9.4012, -6.7830, 1.0630, -0.1809],\n",
" [ 7.0355, -2.2361, 0.7469, -3.4667]])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"torch.matmul(torch.cat((X, H), dim=1), torch.cat((W_xh, W_hh), dim=0))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,283 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 6.3 语言模型数据集(周杰伦专辑歌词)"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n",
"cpu\n"
]
}
],
"source": [
"import torch\n",
"import random\n",
"import zipfile\n",
"\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"print(torch.__version__)\n",
"print(device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.3.1 读取数据集"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'想要有直升机\\n想要和你飞到宇宙去\\n想要和你融化在一起\\n融化在宇宙里\\n我每天每天每'"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with zipfile.ZipFile('../../data/jaychou_lyrics.txt.zip') as zin:\n",
" with zin.open('jaychou_lyrics.txt') as f:\n",
" corpus_chars = f.read().decode('utf-8')\n",
"corpus_chars[:40]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"corpus_chars = corpus_chars.replace('\\n', ' ').replace('\\r', ' ')\n",
"corpus_chars = corpus_chars[0:10000]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.3.2 建立字符索引"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"1027"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"idx_to_char = list(set(corpus_chars))\n",
"char_to_idx = dict([(char, i) for i, char in enumerate(idx_to_char)])\n",
"vocab_size = len(char_to_idx)\n",
"vocab_size"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"chars: 想要有直升机 想要和你飞到宇宙去 想要和\n",
"indices: [981, 858, 519, 53, 577, 1005, 299, 981, 858, 856, 550, 956, 672, 948, 1003, 334, 299, 981, 858, 856]\n"
]
}
],
"source": [
"corpus_indices = [char_to_idx[char] for char in corpus_chars]\n",
"sample = corpus_indices[:20]\n",
"print('chars:', ''.join([idx_to_char[idx] for idx in sample]))\n",
"print('indices:', sample)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.3.3 时序数据的采样\n",
"### 6.3.3.1 随机采样"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
"def data_iter_random(corpus_indices, batch_size, num_steps, device=None):\n",
" # 减1是因为输出的索引x是相应输入的索引y加1\n",
" num_examples = (len(corpus_indices) - 1) // num_steps\n",
" epoch_size = num_examples // batch_size\n",
" example_indices = list(range(num_examples))\n",
" random.shuffle(example_indices)\n",
"\n",
" # 返回从pos开始的长为num_steps的序列\n",
" def _data(pos):\n",
" return corpus_indices[pos: pos + num_steps]\n",
" if device is None:\n",
" device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
" \n",
" for i in range(epoch_size):\n",
" # 每次读取batch_size个随机样本\n",
" i = i * batch_size\n",
" batch_indices = example_indices[i: i + batch_size]\n",
" X = [_data(j * num_steps) for j in batch_indices]\n",
" Y = [_data(j * num_steps + 1) for j in batch_indices]\n",
" yield torch.tensor(X, dtype=torch.float32, device=device), torch.tensor(Y, dtype=torch.float32, device=device)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X: tensor([[18., 19., 20., 21., 22., 23.],\n",
" [12., 13., 14., 15., 16., 17.]]) \n",
"Y: tensor([[19., 20., 21., 22., 23., 24.],\n",
" [13., 14., 15., 16., 17., 18.]]) \n",
"\n",
"X: tensor([[ 0., 1., 2., 3., 4., 5.],\n",
" [ 6., 7., 8., 9., 10., 11.]]) \n",
"Y: tensor([[ 1., 2., 3., 4., 5., 6.],\n",
" [ 7., 8., 9., 10., 11., 12.]]) \n",
"\n"
]
}
],
"source": [
"my_seq = list(range(30))\n",
"for X, Y in data_iter_random(my_seq, batch_size=2, num_steps=6):\n",
" print('X: ', X, '\\nY:', Y, '\\n')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6.3.3.2 相邻采样"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
"def data_iter_consecutive(corpus_indices, batch_size, num_steps, device=None):\n",
" if device is None:\n",
" device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
" corpus_indices = torch.tensor(corpus_indices, dtype=torch.float32, device=device)\n",
" data_len = len(corpus_indices)\n",
" batch_len = data_len // batch_size\n",
" indices = corpus_indices[0: batch_size*batch_len].view(batch_size, batch_len)\n",
" epoch_size = (batch_len - 1) // num_steps\n",
" for i in range(epoch_size):\n",
" i = i * num_steps\n",
" X = indices[:, i: i + num_steps]\n",
" Y = indices[:, i + 1: i + num_steps + 1]\n",
" yield X, Y"
]
},
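{
"cell_type": "markdown",
"metadata": {},
"source": [
"With consecutive sampling, the $i$-th example of one minibatch continues exactly where the $i$-th example of the previous minibatch ended. This is what later allows the training loop in Section 6.4 to carry the hidden state across minibatches (detaching it from the computation graph) instead of re-initializing it for every minibatch."
]
},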
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X: tensor([[ 0., 1., 2., 3., 4., 5.],\n",
" [15., 16., 17., 18., 19., 20.]]) \n",
"Y: tensor([[ 1., 2., 3., 4., 5., 6.],\n",
" [16., 17., 18., 19., 20., 21.]]) \n",
"\n",
"X: tensor([[ 6., 7., 8., 9., 10., 11.],\n",
" [21., 22., 23., 24., 25., 26.]]) \n",
"Y: tensor([[ 7., 8., 9., 10., 11., 12.],\n",
" [22., 23., 24., 25., 26., 27.]]) \n",
"\n"
]
}
],
"source": [
"for X, Y in data_iter_consecutive(my_seq, batch_size=2, num_steps=6):\n",
" print('X: ', X, '\\nY:', Y, '\\n')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,481 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 6.4 循环神经网络的从零开始实现"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.0\n",
"cuda\n"
]
}
],
"source": [
"import time\n",
"import math\n",
"import numpy as np\n",
"import torch\n",
"from torch import nn, optim\n",
"import torch.nn.functional as F\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"print(torch.__version__)\n",
"print(device)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"(corpus_indices, char_to_idx, idx_to_char, vocab_size) = d2l.load_data_jay_lyrics()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.4.1 one-hot向量"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[ 1., 0., 0., ..., 0., 0., 0.],\n",
" [ 0., 0., 1., ..., 0., 0., 0.]])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def one_hot(x, n_class, dtype=torch.float32): \n",
" # X shape: (batch), output shape: (batch, n_class)\n",
" x = x.long()\n",
" res = torch.zeros(x.shape[0], n_class, dtype=dtype, device=x.device)\n",
" res.scatter_(1, x.view(-1, 1), 1)\n",
" return res\n",
" \n",
"x = torch.tensor([0, 2])\n",
"one_hot(x, vocab_size)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5 torch.Size([2, 1027])\n"
]
}
],
"source": [
"# 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
"def to_onehot(X, n_class): \n",
" # X shape: (batch, seq_len), output: seq_len elements of (batch, n_class)\n",
" return [one_hot(X[:, i], n_class) for i in range(X.shape[1])]\n",
"\n",
"X = torch.arange(10).view(2, 5)\n",
"inputs = to_onehot(X, vocab_size)\n",
"print(len(inputs), inputs[0].shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.4.2 初始化模型参数"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"will use cuda\n"
]
}
],
"source": [
"num_inputs, num_hiddens, num_outputs = vocab_size, 256, vocab_size\n",
"print('will use', device)\n",
"\n",
"def get_params():\n",
" def _one(shape):\n",
" ts = torch.tensor(np.random.normal(0, 0.01, size=shape), device=device, dtype=torch.float32)\n",
" return torch.nn.Parameter(ts, requires_grad=True)\n",
"\n",
" # 隐藏层参数\n",
" W_xh = _one((num_inputs, num_hiddens))\n",
" W_hh = _one((num_hiddens, num_hiddens))\n",
" b_h = torch.nn.Parameter(torch.zeros(num_hiddens, device=device, requires_grad=True))\n",
" # 输出层参数\n",
" W_hq = _one((num_hiddens, num_outputs))\n",
" b_q = torch.nn.Parameter(torch.zeros(num_outputs, device=device, requires_grad=True))\n",
" return nn.ParameterList([W_xh, W_hh, b_h, W_hq, b_q])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.4.3 定义模型"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def init_rnn_state(batch_size, num_hiddens, device):\n",
" return (torch.zeros((batch_size, num_hiddens), device=device), )"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def rnn(inputs, state, params):\n",
" # inputs和outputs皆为num_steps个形状为(batch_size, vocab_size)的矩阵\n",
" W_xh, W_hh, b_h, W_hq, b_q = params\n",
" H, = state\n",
" outputs = []\n",
" for X in inputs:\n",
" H = torch.tanh(torch.matmul(X, W_xh) + torch.matmul(H, W_hh) + b_h)\n",
" Y = torch.matmul(H, W_hq) + b_q\n",
" outputs.append(Y)\n",
" return outputs, (H,)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"5 torch.Size([2, 1027]) torch.Size([2, 256])\n"
]
}
],
"source": [
"state = init_rnn_state(X.shape[0], num_hiddens, device)\n",
"inputs = to_onehot(X.to(device), vocab_size)\n",
"params = get_params()\n",
"outputs, state_new = rnn(inputs, state, params)\n",
"print(len(outputs), outputs[0].shape, state_new[0].shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.4.4 定义预测函数"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
"def predict_rnn(prefix, num_chars, rnn, params, init_rnn_state,\n",
" num_hiddens, vocab_size, device, idx_to_char, char_to_idx):\n",
" state = init_rnn_state(1, num_hiddens, device)\n",
" output = [char_to_idx[prefix[0]]]\n",
" for t in range(num_chars + len(prefix) - 1):\n",
" # 将上一时间步的输出作为当前时间步的输入\n",
" X = to_onehot(torch.tensor([[output[-1]]], device=device), vocab_size)\n",
" # 计算输出和更新隐藏状态\n",
" (Y, state) = rnn(X, state, params)\n",
" # 下一个时间步的输入是prefix里的字符或者当前的最佳预测字符\n",
" if t < len(prefix) - 1:\n",
" output.append(char_to_idx[prefix[t + 1]])\n",
" else:\n",
" output.append(int(Y[0].argmax(dim=1).item()))\n",
" return ''.join([idx_to_char[i] for i in output])"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'分开西圈绪升王凝瓜必客映'"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predict_rnn('分开', 10, rnn, params, init_rnn_state, num_hiddens, vocab_size,\n",
" device, idx_to_char, char_to_idx)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.4.5 裁剪梯度"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
"def grad_clipping(params, theta, device):\n",
" norm = torch.tensor([0.0], device=device)\n",
" for param in params:\n",
" norm += (param.grad.data ** 2).sum()\n",
" norm = norm.sqrt().item()\n",
" if norm > theta:\n",
" for param in params:\n",
" param.grad.data *= (theta / norm)"
]
},
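{
"cell_type": "markdown",
"metadata": {},
"source": [
"This implements the usual clipping rule: with $\\boldsymbol{g}$ the concatenation of all parameter gradients and threshold $\\theta$, the gradients are rescaled to $\\min\\left(\\frac{\\theta}{\\|\\boldsymbol{g}\\|}, 1\\right) \\boldsymbol{g}$, so the clipped gradient norm never exceeds $\\theta$."
]
},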
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.4.6 困惑度\n",
"## 6.4.7 定义模型训练函数"
]
},
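{
"cell_type": "markdown",
"metadata": {},
"source": [
"Perplexity is the exponential of the average cross-entropy loss per character, which is why the training function below reports `math.exp(l_sum / n)`. A model that guesses uniformly over the vocabulary has perplexity equal to `vocab_size` (1027 here), so values well below that indicate the model has learned something."
]
},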
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
"def train_and_predict_rnn(rnn, get_params, init_rnn_state, num_hiddens,\n",
" vocab_size, device, corpus_indices, idx_to_char,\n",
" char_to_idx, is_random_iter, num_epochs, num_steps,\n",
" lr, clipping_theta, batch_size, pred_period,\n",
" pred_len, prefixes):\n",
" if is_random_iter:\n",
" data_iter_fn = d2l.data_iter_random\n",
" else:\n",
" data_iter_fn = d2l.data_iter_consecutive\n",
" params = get_params()\n",
" loss = nn.CrossEntropyLoss()\n",
"\n",
" for epoch in range(num_epochs):\n",
" if not is_random_iter: # 如使用相邻采样在epoch开始时初始化隐藏状态\n",
" state = init_rnn_state(batch_size, num_hiddens, device)\n",
" l_sum, n, start = 0.0, 0, time.time()\n",
" data_iter = data_iter_fn(corpus_indices, batch_size, num_steps, device)\n",
" for X, Y in data_iter:\n",
" if is_random_iter: # 如使用随机采样,在每个小批量更新前初始化隐藏状态\n",
" state = init_rnn_state(batch_size, num_hiddens, device)\n",
" else: # 否则需要使用detach函数从计算图分离隐藏状态\n",
" for s in state:\n",
" s.detach_()\n",
" \n",
" inputs = to_onehot(X, vocab_size)\n",
" # outputs有num_steps个形状为(batch_size, vocab_size)的矩阵\n",
" (outputs, state) = rnn(inputs, state, params)\n",
" # 拼接之后形状为(num_steps * batch_size, vocab_size)\n",
" outputs = torch.cat(outputs, dim=0)\n",
" # Y的形状是(batch_size, num_steps),转置后再变成长度为\n",
" # batch * num_steps 的向量,这样跟输出的行一一对应\n",
" y = torch.transpose(Y, 0, 1).contiguous().view(-1)\n",
" # 使用交叉熵损失计算平均分类误差\n",
" l = loss(outputs, y.long())\n",
" \n",
" # 梯度清0\n",
" if params[0].grad is not None:\n",
" for param in params:\n",
" param.grad.data.zero_()\n",
" l.backward()\n",
" grad_clipping(params, clipping_theta, device) # 裁剪梯度\n",
" d2l.sgd(params, lr, 1) # 因为误差已经取过均值,梯度不用再做平均\n",
" l_sum += l.item() * y.shape[0]\n",
" n += y.shape[0]\n",
"\n",
" if (epoch + 1) % pred_period == 0:\n",
" print('epoch %d, perplexity %f, time %.2f sec' % (\n",
" epoch + 1, math.exp(l_sum / n), time.time() - start))\n",
" for prefix in prefixes:\n",
" print(' -', predict_rnn(prefix, pred_len, rnn, params, init_rnn_state,\n",
" num_hiddens, vocab_size, device, idx_to_char, char_to_idx))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.4.8 训练模型并创作歌词"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"num_epochs, num_steps, batch_size, lr, clipping_theta = 250, 35, 32, 1e2, 1e-2\n",
"pred_period, pred_len, prefixes = 50, 50, ['分开', '不分开']"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 50, perplexity 70.039647, time 0.11 sec\n",
" - 分开 我不要再想 我不能 想你的让我 我的可 你怎么 一颗四 一颗四 我不要 一颗两 一颗四 一颗四 我\n",
" - 不分开 我不要再 你你的外 在人 别你的让我 狂的可 语人两 我不要 一颗两 一颗四 一颗四 我不要 一\n",
"epoch 100, perplexity 9.726828, time 0.12 sec\n",
" - 分开 一直的美栈人 一起看 我不要好生活 你知不觉 我已好好生活 我知道好生活 后知不觉 我跟了这生活 \n",
" - 不分开堡 我不要再想 我不 我不 我不要再想你 不知不觉 你已经离开我 不知不觉 我跟了好生活 我知道好生\n",
"epoch 150, perplexity 2.864874, time 0.11 sec\n",
" - 分开 一只会停留 有不它元羞 这蝪什么奇怪的事都有 包括像猫的狗 印地安老斑鸠 平常话不多 除非是乌鸦抢\n",
" - 不分开扫 我不你再想 我不能再想 我不 我不 我不要再想你 不知不觉 你已经离开我 不知不觉 我跟了这节奏\n",
"epoch 200, perplexity 1.597790, time 0.11 sec\n",
" - 分开 有杰伦 干 载颗拳满的让空美空主 相爱还有个人 再狠狠忘记 你爱过我的证 有晶莹的手滴 让说些人\n",
" - 不分开扫 我叫你爸 你打我妈 这样对吗干嘛这样 何必让它牵鼻子走 瞎 说底牵打我妈要 难道球耳 快使用双截\n",
"epoch 250, perplexity 1.303903, time 0.12 sec\n",
" - 分开 有杰人开留 仙唱它怕羞 蜥蝪横著走 这里什么奇怪的事都有 包括像猫的狗 印地安老斑鸠 平常话不多 \n",
" - 不分开简 我不能再想 我不 我不 我不能 爱情走的太快就像龙卷风 不能承受我已无处可躲 我不要再想 我不能\n"
]
}
],
"source": [
"train_and_predict_rnn(rnn, get_params, init_rnn_state, num_hiddens,\n",
" vocab_size, device, corpus_indices, idx_to_char,\n",
" char_to_idx, True, num_epochs, num_steps, lr,\n",
" clipping_theta, batch_size, pred_period, pred_len,\n",
" prefixes)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 50, perplexity 59.514416, time 0.11 sec\n",
" - 分开 我想要这 我想了空 我想了空 我想了空 我想了空 我想了空 我想了空 我想了空 我想了空 我想了空\n",
" - 不分开 我不要这 全使了双 我想了这 我想了空 我想了空 我想了空 我想了空 我想了空 我想了空 我想了空\n",
"epoch 100, perplexity 6.801417, time 0.11 sec\n",
" - 分开 我说的这样笑 想你都 不着我 我想就这样牵 你你的回不笑多难的 它在云实 有一条事 全你了空 \n",
" - 不分开觉 你已经离开我 不知不觉 我跟好这节活 我该好好生活 不知不觉 你跟了离开我 不知不觉 我跟好这节\n",
"epoch 150, perplexity 2.063730, time 0.16 sec\n",
" - 分开 我有到这样牵着你的手不放开 爱可不可以简简单单没有伤 古有你烦 我有多烦恼向 你知带悄 回我的外\n",
" - 不分开觉 你已经很个我 不知不觉 我跟了这节奏 后知后觉 又过了一个秋 后哼哈兮 快使用双截棍 哼哼哈兮 \n",
"epoch 200, perplexity 1.300031, time 0.11 sec\n",
" - 分开 我想要这样牵着你的手不放开 爱能不能够永远单甜没有伤害 你 靠着我的肩膀 你 在我胸口睡著 像这样\n",
" - 不分开觉 你已经离开我 不知不觉 我跟了这节奏 后知后觉 又过了一个秋 后知后觉 我该好好生活 我该好好生\n",
"epoch 250, perplexity 1.164455, time 0.11 sec\n",
" - 分开 我有一这样布 对你依依不舍 连隔壁邻居都猜到我现在的感受 河边的风 在吹着头发飘动 牵着你的手 一\n",
" - 不分开觉 你已经离开我 不知不觉 我跟了这节奏 后知后觉 又过了一个秋 后知后觉 我该好好生活 我该好好生\n"
]
}
],
"source": [
"train_and_predict_rnn(rnn, get_params, init_rnn_state, num_hiddens,\n",
" vocab_size, device, corpus_indices, idx_to_char,\n",
" char_to_idx, False, num_epochs, num_steps, lr,\n",
" clipping_theta, batch_size, pred_period, pred_len,\n",
" prefixes)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,292 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 6.5 循环神经网络的简洁实现"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.0.0 cuda\n"
]
}
],
"source": [
"import time\n",
"import math\n",
"import numpy as np\n",
"import torch\n",
"from torch import nn, optim\n",
"import torch.nn.functional as F\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"(corpus_indices, char_to_idx, idx_to_char, vocab_size) = d2l.load_data_jay_lyrics()\n",
"\n",
"print(torch.__version__, device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.5.1 定义模型"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"num_hiddens = 256\n",
"# rnn_layer = nn.LSTM(input_size=vocab_size, hidden_size=num_hiddens) # 已测试\n",
"rnn_layer = nn.RNN(input_size=vocab_size, hidden_size=num_hiddens)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([35, 2, 256]) 1 torch.Size([2, 256])\n"
]
}
],
"source": [
"num_steps = 35\n",
"batch_size = 2\n",
"state = None\n",
"X = torch.rand(num_steps, batch_size, vocab_size)\n",
"Y, state_new = rnn_layer(X, state)\n",
"print(Y.shape, len(state_new), state_new[0].shape)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 本类已保存在d2lzh_pytorch包中方便以后使用\n",
"class RNNModel(nn.Module):\n",
" def __init__(self, rnn_layer, vocab_size):\n",
" super(RNNModel, self).__init__()\n",
" self.rnn = rnn_layer\n",
" self.hidden_size = rnn_layer.hidden_size * (2 if rnn_layer.bidirectional else 1) \n",
" self.vocab_size = vocab_size\n",
" self.dense = nn.Linear(self.hidden_size, vocab_size)\n",
" self.state = None\n",
"\n",
" def forward(self, inputs, state): # inputs: (batch, seq_len)\n",
" # 获取one-hot向量表示\n",
" X = d2l.to_onehot(inputs, vocab_size) # X是个list\n",
" Y, self.state = self.rnn(torch.stack(X), state)\n",
" # 全连接层会首先将Y的形状变成(num_steps * batch_size, num_hiddens),它的输出\n",
" # 形状为(num_steps * batch_size, vocab_size)\n",
" output = self.dense(Y.view(-1, Y.shape[-1]))\n",
" return output, self.state"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.5.2 训练模型"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
"def predict_rnn_pytorch(prefix, num_chars, model, vocab_size, device, idx_to_char,\n",
" char_to_idx):\n",
" state = None\n",
" output = [char_to_idx[prefix[0]]] # output会记录prefix加上输出\n",
" for t in range(num_chars + len(prefix) - 1):\n",
" X = torch.tensor([output[-1]], device=device).view(1, 1)\n",
" if state is not None:\n",
" if isinstance(state, tuple): # LSTM, state:(h, c) \n",
" state = (state[0].to(device), state[1].to(device))\n",
" else: \n",
" state = state.to(device)\n",
" \n",
" (Y, state) = model(X, state) # 前向计算不需要传入模型参数\n",
" if t < len(prefix) - 1:\n",
" output.append(char_to_idx[prefix[t + 1]])\n",
" else:\n",
" output.append(int(Y.argmax(dim=1).item()))\n",
" return ''.join([idx_to_char[i] for i in output])"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'分开戏想暖迎凉想征凉征征'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"model = RNNModel(rnn_layer, vocab_size).to(device)\n",
"predict_rnn_pytorch('分开', 10, model, vocab_size, device, idx_to_char, char_to_idx)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
"def train_and_predict_rnn_pytorch(model, num_hiddens, vocab_size, device,\n",
" corpus_indices, idx_to_char, char_to_idx,\n",
" num_epochs, num_steps, lr, clipping_theta,\n",
" batch_size, pred_period, pred_len, prefixes):\n",
" loss = nn.CrossEntropyLoss()\n",
" optimizer = torch.optim.Adam(model.parameters(), lr=lr)\n",
" model.to(device)\n",
" state = None\n",
" for epoch in range(num_epochs):\n",
" l_sum, n, start = 0.0, 0, time.time()\n",
" data_iter = d2l.data_iter_consecutive(corpus_indices, batch_size, num_steps, device) # 相邻采样\n",
" for X, Y in data_iter:\n",
" if state is not None:\n",
" # 使用detach函数从计算图分离隐藏状态, 这是为了\n",
" # 使模型参数的梯度计算只依赖一次迭代读取的小批量序列(防止梯度计算开销太大)\n",
" if isinstance (state, tuple): # LSTM, state:(h, c) \n",
" state = (state[0].detach(), state[1].detach())\n",
" else: \n",
" state = state.detach()\n",
" \n",
" (output, state) = model(X, state) # output: 形状为(num_steps * batch_size, vocab_size)\n",
" \n",
" # Y的形状是(batch_size, num_steps),转置后再变成长度为\n",
" # batch * num_steps 的向量,这样跟输出的行一一对应\n",
" y = torch.transpose(Y, 0, 1).contiguous().view(-1)\n",
" l = loss(output, y.long())\n",
" \n",
" optimizer.zero_grad()\n",
" l.backward()\n",
" # 梯度裁剪\n",
" d2l.grad_clipping(model.parameters(), clipping_theta, device)\n",
" optimizer.step()\n",
" l_sum += l.item() * y.shape[0]\n",
" n += y.shape[0]\n",
" \n",
" try:\n",
" perplexity = math.exp(l_sum / n)\n",
" except OverflowError:\n",
" perplexity = float('inf')\n",
" if (epoch + 1) % pred_period == 0:\n",
" print('epoch %d, perplexity %f, time %.2f sec' % (\n",
" epoch + 1, perplexity, time.time() - start))\n",
" for prefix in prefixes:\n",
" print(' -', predict_rnn_pytorch(\n",
" prefix, pred_len, model, vocab_size, device, idx_to_char,\n",
" char_to_idx))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 50, perplexity 10.658418, time 0.05 sec\n",
" - 分开始我妈 想要你 我不多 让我心到的 我妈妈 我不能再想 我不多再想 我不要再想 我不多再想 我不要\n",
" - 不分开 我想要你不你 我 你不要 让我心到的 我妈人 可爱女人 坏坏的让我疯狂的可爱女人 坏坏的让我疯狂的\n",
"epoch 100, perplexity 1.308539, time 0.05 sec\n",
" - 分开不会痛 不要 你在黑色幽默 开始了美丽全脸的梦滴 闪烁成回忆 伤人的美丽 你的完美主义 太彻底 让我\n",
" - 不分开不是我不要再想你 我不能这样牵着你的手不放开 爱可不可以简简单单没有伤害 你 靠着我的肩膀 你 在我\n",
"epoch 150, perplexity 1.070370, time 0.05 sec\n",
" - 分开不能去河南嵩山 学少林跟武当 快使用双截棍 哼哼哈兮 快使用双截棍 哼哼哈兮 习武之人切记 仁者无敌\n",
" - 不分开 在我会想通 是谁开没有全有开始 他心今天 一切人看 我 一口令秋软语的姑娘缓缓走过外滩 消失的 旧\n",
"epoch 200, perplexity 1.034663, time 0.05 sec\n",
" - 分开不能去吗周杰伦 才离 没要你在一场悲剧 我的完美主义 太彻底 分手的话像语言暴力 我已无能为力再提起\n",
" - 不分开 让我面到你 爱情来的太快就像龙卷风 离不开暴风圈来不及逃 我不能再想 我不能再想 我不 我不 我不\n",
"epoch 250, perplexity 1.021437, time 0.05 sec\n",
" - 分开 我我外的家边 你知道这 我爱不看的太 我想一个又重来不以 迷已文一只剩下回忆 让我叫带你 你你的\n",
" - 不分开 我我想想和 是你听没不 我不能不想 不知不觉 你已经离开我 不知不觉 我跟了这节奏 后知后觉 \n"
]
}
],
"source": [
"num_epochs, batch_size, lr, clipping_theta = 250, 32, 1e-3, 1e-2 # 注意这里的学习率设置\n",
"pred_period, pred_len, prefixes = 50, 50, ['分开', '不分开']\n",
"train_and_predict_rnn_pytorch(model, num_hiddens, vocab_size, device,\n",
" corpus_indices, idx_to_char, char_to_idx,\n",
" num_epochs, num_steps, lr, clipping_theta,\n",
" batch_size, pred_period, pred_len, prefixes)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,247 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 6.7 门控循环单元GRU\n",
"## 6.7.2 读取数据集"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.2.0 cpu\n"
]
}
],
"source": [
"import numpy as np\n",
"import torch\n",
"from torch import nn, optim\n",
"import torch.nn.functional as F\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"(corpus_indices, char_to_idx, idx_to_char, vocab_size) = d2l.load_data_jay_lyrics()\n",
"print(torch.__version__, device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.7.3 从零开始实现\n",
"### 6.7.3.1 初始化模型参数"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"will use cpu\n"
]
}
],
"source": [
"num_inputs, num_hiddens, num_outputs = vocab_size, 256, vocab_size\n",
"print('will use', device)\n",
"\n",
"def get_params():\n",
" def _one(shape):\n",
" ts = torch.tensor(np.random.normal(0, 0.01, size=shape), device=device, dtype=torch.float32)\n",
" return torch.nn.Parameter(ts, requires_grad=True)\n",
" def _three():\n",
" return (_one((num_inputs, num_hiddens)),\n",
" _one((num_hiddens, num_hiddens)),\n",
" torch.nn.Parameter(torch.zeros(num_hiddens, device=device, dtype=torch.float32), requires_grad=True))\n",
" \n",
" W_xz, W_hz, b_z = _three() # 更新门参数\n",
" W_xr, W_hr, b_r = _three() # 重置门参数\n",
" W_xh, W_hh, b_h = _three() # 候选隐藏状态参数\n",
" \n",
" # 输出层参数\n",
" W_hq = _one((num_hiddens, num_outputs))\n",
" b_q = torch.nn.Parameter(torch.zeros(num_outputs, device=device, dtype=torch.float32), requires_grad=True)\n",
" return nn.ParameterList([W_xz, W_hz, b_z, W_xr, W_hr, b_r, W_xh, W_hh, b_h, W_hq, b_q])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6.7.3.2 定义模型"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def init_gru_state(batch_size, num_hiddens, device):\n",
" return (torch.zeros((batch_size, num_hiddens), device=device), )"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def gru(inputs, state, params):\n",
" W_xz, W_hz, b_z, W_xr, W_hr, b_r, W_xh, W_hh, b_h, W_hq, b_q = params\n",
" H, = state\n",
" outputs = []\n",
" for X in inputs:\n",
" Z = torch.sigmoid(torch.matmul(X, W_xz) + torch.matmul(H, W_hz) + b_z)\n",
" R = torch.sigmoid(torch.matmul(X, W_xr) + torch.matmul(H, W_hr) + b_r)\n",
" H_tilda = torch.tanh(torch.matmul(X, W_xh) + torch.matmul(R * H, W_hh) + b_h)\n",
" H = Z * H + (1 - Z) * H_tilda\n",
" Y = torch.matmul(H, W_hq) + b_q\n",
" outputs.append(Y)\n",
" return outputs, (H,)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6.7.3.3 训练模型并创作歌词"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"num_epochs, num_steps, batch_size, lr, clipping_theta = 160, 35, 32, 1e2, 1e-2\n",
"pred_period, pred_len, prefixes = 40, 50, ['分开', '不分开']"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 40, perplexity 150.963116, time 1.11 sec\n",
" - 分开 我想你 我不你 我不你 我不你 我不你 我不你 我不你 我不你 我不你 我不你 我不你 我不你 我\n",
" - 不分开 我想你 我不你 我不你 我不你 我不你 我不你 我不你 我不你 我不你 我不你 我不你 我不你 我\n",
"epoch 80, perplexity 31.683252, time 1.16 sec\n",
" - 分开 我想要你的微笑 一定 \n",
" - 不分开 不知不觉 我不要再想 我不要再想 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不 我不\n",
"epoch 120, perplexity 5.855305, time 1.49 sec\n",
" - 分开我 想要你这样打我妈妈 难道你手不会痛吗 我想你这样打我妈妈 难道你手 你怎么在我想 说散 你说我久\n",
" - 不分开 没有你在我有多烦熬多烦恼 没有你烦 我有多烦恼 没有你在我有多难熬多难多 没有你烦 我有多\n",
"epoch 160, perplexity 1.815359, time 1.04 sec\n",
" - 分开 我想要这样牵 对你依依不舍 连隔壁邻居都猜到我现在的感受 河边的风 在吹着头发飘动 牵着你的手 一\n",
" - 不分开 是后过风 迷不知蒙 我给再这样活 我该好好生活 不知不觉 你已经离开我 不知不觉 我跟了这节奏 \n"
]
}
],
"source": [
"d2l.train_and_predict_rnn(gru, get_params, init_gru_state, num_hiddens,\n",
" vocab_size, device, corpus_indices, idx_to_char,\n",
" char_to_idx, False, num_epochs, num_steps, lr,\n",
" clipping_theta, batch_size, pred_period, pred_len,\n",
" prefixes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.7.4 简洁实现"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 40, perplexity 1.018485, time 0.79 sec\n",
" - 分开的快乐是你 想你想的都会笑 没有你在 我有多难熬 没有你在我有多难熬多烦恼 没有你烦 我有多烦恼\n",
" - 不分开不 我不 我不要再想你 爱情来的太快就像龙卷风 离不开暴风圈来不及逃 我不能再想 我不能再想 我不 \n",
"epoch 80, perplexity 1.028805, time 0.74 sec\n",
" - 分开始想像 爸和妈当年的模样 说著一口吴侬软语的姑娘缓缓走过外滩 消失的 旧时光 一九四三 回头看 的片\n",
" - 不分开不 我不 我不 我不要再想你 爱情来的太快就像龙卷风 离不开暴风圈来不及逃 我不能再想 我不能再想 \n",
"epoch 120, perplexity 1.012296, time 0.73 sec\n",
" - 分开的话像语言暴力 我已无能为力再提起 决定中断熟悉 然后在这里 不限日期 然后将过去 慢慢温习 让我爱\n",
" - 不分开不 我不 我不能 爱情走的太快就像龙卷风 不能承受我已无处可躲 我不要再想 我不要再想 我不 我不 \n",
"epoch 160, perplexity 1.184842, time 0.74 sec\n",
" - 分开的快乐是你 想我想大声宣布 对你依依不舍 连隔壁邻居都猜到我现在的感受 河边的风 在吹着头发飘动 牵\n",
" - 不分开 快使用双截棍 哼哼哈兮 如果我有轻功 飞檐走壁 为人耿直不屈 一身正气 他们儿子我习惯 从小就耳濡\n"
]
}
],
"source": [
"lr = 1e-2\n",
"gru_layer = nn.GRU(input_size=vocab_size, hidden_size=num_hiddens)\n",
"model = d2l.RNNModel(gru_layer, vocab_size).to(device)\n",
"d2l.train_and_predict_rnn_pytorch(model, num_hiddens, vocab_size, device,\n",
" corpus_indices, idx_to_char, char_to_idx,\n",
" num_epochs, num_steps, lr, clipping_theta,\n",
" batch_size, pred_period, pred_len, prefixes)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,252 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 6.8 长短期记忆LSTM\n",
"## 6.8.2 读取数据集"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.0.0 cpu\n"
]
}
],
"source": [
"import numpy as np\n",
"import torch\n",
"from torch import nn, optim\n",
"import torch.nn.functional as F\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"(corpus_indices, char_to_idx, idx_to_char, vocab_size) = d2l.load_data_jay_lyrics()\n",
"\n",
"print(torch.__version__, device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.8.3 从零开始实现\n",
"### 6.8.3.1 初始化模型参数"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"will use cpu\n"
]
}
],
"source": [
"num_inputs, num_hiddens, num_outputs = vocab_size, 256, vocab_size\n",
"print('will use', device)\n",
"\n",
"def get_params():\n",
" def _one(shape):\n",
" ts = torch.tensor(np.random.normal(0, 0.01, size=shape), device=device, dtype=torch.float32)\n",
" return torch.nn.Parameter(ts, requires_grad=True)\n",
" def _three():\n",
" return (_one((num_inputs, num_hiddens)),\n",
" _one((num_hiddens, num_hiddens)),\n",
" torch.nn.Parameter(torch.zeros(num_hiddens, device=device, dtype=torch.float32), requires_grad=True))\n",
" \n",
" W_xi, W_hi, b_i = _three() # 输入门参数\n",
" W_xf, W_hf, b_f = _three() # 遗忘门参数\n",
" W_xo, W_ho, b_o = _three() # 输出门参数\n",
" W_xc, W_hc, b_c = _three() # 候选记忆细胞参数\n",
" \n",
" # 输出层参数\n",
" W_hq = _one((num_hiddens, num_outputs))\n",
" b_q = torch.nn.Parameter(torch.zeros(num_outputs, device=device, dtype=torch.float32), requires_grad=True)\n",
" return nn.ParameterList([W_xi, W_hi, b_i, W_xf, W_hf, b_f, W_xo, W_ho, b_o, W_xc, W_hc, b_c, W_hq, b_q])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.8.4 定义模型"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def init_lstm_state(batch_size, num_hiddens, device):\n",
" return (torch.zeros((batch_size, num_hiddens), device=device), \n",
" torch.zeros((batch_size, num_hiddens), device=device))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def lstm(inputs, state, params):\n",
" [W_xi, W_hi, b_i, W_xf, W_hf, b_f, W_xo, W_ho, b_o, W_xc, W_hc, b_c, W_hq, b_q] = params\n",
" (H, C) = state\n",
" outputs = []\n",
" for X in inputs:\n",
" I = torch.sigmoid(torch.matmul(X, W_xi) + torch.matmul(H, W_hi) + b_i)\n",
" F = torch.sigmoid(torch.matmul(X, W_xf) + torch.matmul(H, W_hf) + b_f)\n",
" O = torch.sigmoid(torch.matmul(X, W_xo) + torch.matmul(H, W_ho) + b_o)\n",
" C_tilda = torch.tanh(torch.matmul(X, W_xc) + torch.matmul(H, W_hc) + b_c)\n",
" C = F * C + I * C_tilda\n",
" H = O * C.tanh()\n",
" Y = torch.matmul(H, W_hq) + b_q\n",
" outputs.append(Y)\n",
" return outputs, (H, C)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 6.8.4.1 训练模型并创作歌词"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"num_epochs, num_steps, batch_size, lr, clipping_theta = 160, 35, 32, 1e2, 1e-2\n",
"pred_period, pred_len, prefixes = 40, 50, ['分开', '不分开']"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 40, perplexity 211.416571, time 1.37 sec\n",
" - 分开 我不的我 我不的我 我不的我 我不的我 我不的我 我不的我 我不的我 我不的我 我不的我 我不的我\n",
" - 不分开 我不的我 我不的我 我不的我 我不的我 我不的我 我不的我 我不的我 我不的我 我不的我 我不的我\n",
"epoch 80, perplexity 67.048346, time 1.35 sec\n",
" - 分开 我想你你 我不要再想 我不要这我 我不要这我 我不要这我 我不要这我 我不要这我 我不要这我 我不\n",
" - 不分开 我想你你想你 我不要这不样 我不要这我 我不要这我 我不要这我 我不要这我 我不要这我 我不要这我\n",
"epoch 120, perplexity 15.552743, time 1.36 sec\n",
" - 分开 我想带你的微笑 像这在 你想我 我想你 说你我 说你了 说给怎么么 有你在空 你在在空 在你的空 \n",
" - 不分开 我想要你已经堡 一样样 说你了 我想就这样着你 不知不觉 你已了离开活 后知后觉 我该了这生活 我\n",
"epoch 160, perplexity 4.274031, time 1.35 sec\n",
" - 分开 我想带你 你不一外在半空 我只能够远远著她 这些我 你想我难难头 一话看人对落我一望望我 我不那这\n",
" - 不分开 我想你这生堡 我知好烦 你不的节我 后知后觉 我该了这节奏 后知后觉 又过了一个秋 后知后觉 我该\n"
]
}
],
"source": [
"d2l.train_and_predict_rnn(lstm, get_params, init_lstm_state, num_hiddens,\n",
" vocab_size, device, corpus_indices, idx_to_char,\n",
" char_to_idx, False, num_epochs, num_steps, lr,\n",
" clipping_theta, batch_size, pred_period, pred_len,\n",
" prefixes)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 6.8.5 简洁实现"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 40, perplexity 1.020401, time 1.54 sec\n",
" - 分开始想担 妈跟我 一定是我妈在 因为分手前那句抱歉 在感动 穿梭时间的画面的钟 从反方向开始移动 回到\n",
" - 不分开始想像 妈跟我 我将我的寂寞封闭 然后在这里 不限日期 然后将过去 慢慢温习 让我爱上你 那场悲剧 \n",
"epoch 80, perplexity 1.011164, time 1.34 sec\n",
" - 分开始想担 你的 从前的可爱女人 温柔的让我心疼的可爱女人 透明的让我感动的可爱女人 坏坏的让我疯狂的可\n",
" - 不分开 我满了 让我疯狂的可爱女人 漂亮的让我面红的可爱女人 温柔的让我心疼的可爱女人 透明的让我感动的可\n",
"epoch 120, perplexity 1.025348, time 1.39 sec\n",
" - 分开始共渡每一天 手牵手 一步两步三步四步望著天 看星星 一颗两颗三颗四颗 连成线背著背默默许下心愿 看\n",
" - 不分开 我不懂 说了没用 他的笑容 有何不同 在你心中 我不再受宠 我的天空 是雨是风 还是彩虹 你在操纵\n",
"epoch 160, perplexity 1.017492, time 1.42 sec\n",
" - 分开始乡相信命运 感谢地心引力 让我碰到你 漂亮的让我面红的可爱女人 温柔的让我心疼的可爱女人 透明的让\n",
" - 不分开 我不能再想 我不 我不 我不能 爱情走的太快就像龙卷风 不能承受我已无处可躲 我不要再想 我不要再\n"
]
}
],
"source": [
"lr = 1e-2 # 注意调整学习率\n",
"lstm_layer = nn.LSTM(input_size=vocab_size, hidden_size=num_hiddens)\n",
"model = d2l.RNNModel(lstm_layer, vocab_size)\n",
"d2l.train_and_predict_rnn_pytorch(model, num_hiddens, vocab_size, device,\n",
" corpus_indices, idx_to_char, char_to_idx,\n",
" num_epochs, num_steps, lr, clipping_theta,\n",
" batch_size, pred_period, pred_len, prefixes)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,122 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 8.1 命令式和符号式混合编程"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"10"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def add(a, b):\n",
" return a + b\n",
"\n",
"def fancy_func(a, b, c, d):\n",
" e = add(a, b)\n",
" f = add(c, d)\n",
" g = add(e, f)\n",
" return g\n",
"\n",
"fancy_func(1, 2, 3, 4)"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"def add(a, b):\n",
" return a + b\n",
"\n",
"def fancy_func(a, b, c, d):\n",
" e = add(a, b)\n",
" f = add(c, d)\n",
" g = add(e, f)\n",
" return g\n",
"\n",
"print(fancy_func(1, 2, 3, 4))\n",
"\n",
"10\n"
]
}
],
"source": [
"def add_str():\n",
" return '''\n",
"def add(a, b):\n",
" return a + b\n",
"'''\n",
"\n",
"def fancy_func_str():\n",
" return '''\n",
"def fancy_func(a, b, c, d):\n",
" e = add(a, b)\n",
" f = add(c, d)\n",
" g = add(e, f)\n",
" return g\n",
"'''\n",
"\n",
"def evoke_str():\n",
" return add_str() + fancy_func_str() + '''\n",
"print(fancy_func(1, 2, 3, 4))\n",
"'''\n",
"\n",
"prog = evoke_str()\n",
"print(prog)\n",
"y = compile(prog, '', 'exec')\n",
"exec(y)"
]
},
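  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面补充一个示意性的例子(非原书内容,仅作说明): 在PyTorch中可以用torch.jit把命令式代码转换成可整体优化、可序列化的脚本,思路上接近符号式编程。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# 示意(非原书内容): 用torch.jit.script把前面的fancy_func脚本化\n",
    "import torch\n",
    "\n",
    "@torch.jit.script\n",
    "def fancy_func_jit(a, b, c, d):\n",
    "    e = a + b\n",
    "    f = c + d\n",
    "    g = e + f\n",
    "    return g\n",
    "\n",
    "fancy_func_jit(torch.tensor(1), torch.tensor(2), torch.tensor(3), torch.tensor(4))"
   ]
  },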
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,192 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 8.3 自动并行计算"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-10T16:16:41.669018Z",
"start_time": "2019-05-10T16:16:36.457355Z"
}
},
"outputs": [],
"source": [
"import torch\n",
"import time\n",
"\n",
"assert torch.cuda.device_count() >= 2"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-10T16:17:29.013953Z",
"start_time": "2019-05-10T16:16:41.673871Z"
}
},
"outputs": [],
"source": [
"x_gpu1 = torch.rand(size=(100, 100), device='cuda:0')\n",
"x_gpu2 = torch.rand(size=(100, 100), device='cuda:2')"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-10T16:17:29.021652Z",
"start_time": "2019-05-10T16:17:29.017222Z"
}
},
"outputs": [],
"source": [
"class Benchmark(): # 本类已保存在d2lzh_pytorch包中方便以后使用\n",
" def __init__(self, prefix=None):\n",
" self.prefix = prefix + ' ' if prefix else ''\n",
"\n",
" def __enter__(self):\n",
" self.start = time.time()\n",
"\n",
" def __exit__(self, *args):\n",
" print('%stime: %.4f sec' % (self.prefix, time.time() - self.start))"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-10T16:17:29.069210Z",
"start_time": "2019-05-10T16:17:29.023602Z"
}
},
"outputs": [],
"source": [
"def run(x):\n",
" for _ in range(20000):\n",
" y = torch.mm(x, x)"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-10T16:17:29.767144Z",
"start_time": "2019-05-10T16:17:29.071262Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Run on GPU1. time: 0.2989 sec\n",
"Then run on GPU2. time: 0.3518 sec\n"
]
}
],
"source": [
"with Benchmark('Run on GPU1.'):\n",
" run(x_gpu1)\n",
" torch.cuda.synchronize()\n",
"\n",
"with Benchmark('Then run on GPU2.'):\n",
" run(x_gpu2)\n",
" torch.cuda.synchronize()"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-10T16:17:30.282318Z",
"start_time": "2019-05-10T16:17:29.770313Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Run on both GPU1 and GPU2 in parallel. time: 0.5076 sec\n"
]
}
],
"source": [
"with Benchmark('Run on both GPU1 and GPU2 in parallel.'):\n",
" run(x_gpu1)\n",
" run(x_gpu2)\n",
" torch.cuda.synchronize()"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:py36]",
"language": "python",
"name": "conda-env-py36-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,247 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-15T16:12:27.380643Z",
"start_time": "2019-05-15T16:12:25.699672Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Thu May 16 00:12:26 2019 \n",
"+-----------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 390.48 Driver Version: 390.48 |\n",
"|-------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |\n",
"|===============================+======================+======================|\n",
"| 0 TITAN X (Pascal) Off | 00000000:02:00.0 Off | N/A |\n",
"| 46% 75C P2 87W / 250W | 10995MiB / 12196MiB | 0% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
"| 1 TITAN X (Pascal) Off | 00000000:04:00.0 Off | N/A |\n",
"| 54% 83C P2 93W / 250W | 11671MiB / 12196MiB | 64% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
"| 2 TITAN X (Pascal) Off | 00000000:83:00.0 Off | N/A |\n",
"| 62% 83C P2 193W / 250W | 12096MiB / 12196MiB | 92% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
"| 3 TITAN X (Pascal) Off | 00000000:84:00.0 Off | N/A |\n",
"| 51% 82C P2 166W / 250W | 8144MiB / 12196MiB | 58% Default |\n",
"+-------------------------------+----------------------+----------------------+\n",
" \n",
"+-----------------------------------------------------------------------------+\n",
"| Processes: GPU Memory |\n",
"| GPU PID Type Process name Usage |\n",
"|=============================================================================|\n",
"| 0 44683 C python 3289MiB |\n",
"| 0 155760 C python 4345MiB |\n",
"| 0 158310 C python 2297MiB |\n",
"| 0 172338 C /home/yzs/anaconda3/bin/python 1031MiB |\n",
"| 1 139985 C python 11653MiB |\n",
"| 2 38630 C python 5547MiB |\n",
"| 2 43127 C python 5791MiB |\n",
"| 2 156710 C python3 725MiB |\n",
"| 3 14444 C python3 1891MiB |\n",
"| 3 43407 C python 5841MiB |\n",
"| 3 88478 C /home/tangss/.conda/envs/py36/bin/python 379MiB |\n",
"+-----------------------------------------------------------------------------+\n"
]
}
],
"source": [
"!nvidia-smi"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-15T16:12:29.958567Z",
"start_time": "2019-05-15T16:12:27.383299Z"
}
},
"outputs": [],
"source": [
"import torch"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-15T16:12:47.137875Z",
"start_time": "2019-05-15T16:12:29.962468Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"Linear(in_features=10, out_features=1, bias=True)"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"net = torch.nn.Linear(10, 1).cuda()\n",
"net"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-15T16:12:47.143709Z",
"start_time": "2019-05-15T16:12:47.139895Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"DataParallel(\n",
" (module): Linear(in_features=10, out_features=1, bias=True)\n",
")"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"net = torch.nn.DataParallel(net)\n",
"net"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-15T16:12:47.206714Z",
"start_time": "2019-05-15T16:12:47.145069Z"
}
},
"outputs": [],
"source": [
"torch.save(net.state_dict(), \"./8.4_model.pt\")"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-15T16:12:47.260076Z",
"start_time": "2019-05-15T16:12:47.208314Z"
}
},
"outputs": [],
"source": [
"new_net = torch.nn.Linear(10, 1)\n",
"# new_net.load_state_dict(torch.load(\"./8.4_model.pt\")) # 加载失败"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-15T16:12:47.317397Z",
"start_time": "2019-05-15T16:12:47.262131Z"
}
},
"outputs": [],
"source": [
"torch.save(net.module.state_dict(), \"./8.4_model.pt\")\n",
"new_net.load_state_dict(torch.load(\"./8.4_model.pt\")) # 加载成功"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2019-05-15T16:12:47.370299Z",
"start_time": "2019-05-15T16:12:47.319323Z"
}
},
"outputs": [],
"source": [
"torch.save(net.state_dict(), \"./8.4_model.pt\")\n",
"new_net = torch.nn.Linear(10, 1)\n",
"new_net = torch.nn.DataParallel(new_net)\n",
"new_net.load_state_dict(torch.load(\"./8.4_model.pt\")) # 加载成功"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,204 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 9.6.0 准备皮卡丘数据集"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import json\n",
"from tqdm import tqdm\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"from mxnet.gluon import utils as gutils # pip install mxnet\n",
"from mxnet import image\n",
"\n",
"data_dir = '../../data/pikachu'\n",
"os.makedirs(data_dir, exist_ok=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 1. 下载原始数据集\n",
"见http://zh.d2l.ai/chapter_computer-vision/object-detection-dataset.html"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def _download_pikachu(data_dir):\n",
" root_url = ('https://apache-mxnet.s3-accelerate.amazonaws.com/'\n",
" 'gluon/dataset/pikachu/')\n",
" dataset = {'train.rec': 'e6bcb6ffba1ac04ff8a9b1115e650af56ee969c8',\n",
" 'train.idx': 'dcf7318b2602c06428b9988470c731621716c393',\n",
" 'val.rec': 'd6c33f799b4d058e82f2cb5bd9a976f69d72d520'}\n",
" for k, v in dataset.items():\n",
" gutils.download(root_url + k, os.path.join(data_dir, k), sha1_hash=v)\n",
"\n",
"if not os.path.exists(os.path.join(data_dir, \"train.rec\")):\n",
" print(\"下载原始数据集到%s...\" % data_dir)\n",
" _download_pikachu(data_dir)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2. MXNet数据迭代器"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def load_data_pikachu(batch_size, edge_size=256): # edge_size输出图像的宽和高\n",
" train_iter = image.ImageDetIter(\n",
" path_imgrec=os.path.join(data_dir, 'train.rec'),\n",
" path_imgidx=os.path.join(data_dir, 'train.idx'),\n",
" batch_size=batch_size,\n",
" data_shape=(3, edge_size, edge_size), # 输出图像的形状\n",
"# shuffle=False, # 以随机顺序读取数据集\n",
"# rand_crop=1, # 随机裁剪的概率为1\n",
" min_object_covered=0.95, max_attempts=200)\n",
" val_iter = image.ImageDetIter(\n",
" path_imgrec=os.path.join(data_dir, 'val.rec'), batch_size=batch_size,\n",
" data_shape=(3, edge_size, edge_size), shuffle=False)\n",
" return train_iter, val_iter"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"((3, 256, 256), (1, 5))"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"batch_size, edge_size = 1, 256\n",
"train_iter, val_iter = load_data_pikachu(batch_size, edge_size)\n",
"batch = train_iter.next()\n",
"batch.data[0][0].shape, batch.label[0][0].shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 3. 转换成PNG图片并保存"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [],
"source": [
"def process(data_iter, save_dir):\n",
" \"\"\"batch size == 1\"\"\"\n",
" data_iter.reset() # 从头开始\n",
" all_label = dict()\n",
" id = 1\n",
" os.makedirs(os.path.join(save_dir, 'images'), exist_ok=True)\n",
" for sample in tqdm(data_iter):\n",
" x = sample.data[0][0].asnumpy().transpose((1,2,0))\n",
" plt.imsave(os.path.join(save_dir, 'images', str(id) + '.png'), x / 255.0)\n",
"\n",
" y = sample.label[0][0][0].asnumpy()\n",
"\n",
" label = {}\n",
" label[\"class\"] = int(y[0])\n",
" label[\"loc\"] = y[1:].tolist()\n",
"\n",
" all_label[str(id) + '.png'] = label.copy()\n",
"\n",
" id += 1\n",
"\n",
" with open(os.path.join(save_dir, 'label.json'), 'w') as f:\n",
" json.dump(all_label, f, indent=True)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"900it [00:40, 22.03it/s]\n"
]
}
],
"source": [
"process(data_iter = train_iter, save_dir = os.path.join(data_dir, \"train\"))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100it [00:04, 22.86it/s]\n"
]
}
],
"source": [
"process(data_iter = val_iter, save_dir = os.path.join(data_dir, \"val\"))"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,128 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 9.8 区域卷积神经网络R-CNN系列"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.0a0+6b959ee\n"
]
}
],
"source": [
"import torch\n",
"import torchvision\n",
"\n",
"print(torchvision.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 9.8.2 Fast R-CNN"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[[[ 0., 1., 2., 3.],\n",
" [ 4., 5., 6., 7.],\n",
" [ 8., 9., 10., 11.],\n",
" [12., 13., 14., 15.]]]])"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.arange(16, dtype=torch.float).view(1, 1, 4, 4)\n",
"X"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"rois = torch.tensor([[0, 0, 0, 20, 20], [0, 0, 10, 30, 30]], dtype=torch.float)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[[[ 5., 6.],\n",
" [ 9., 10.]]],\n",
"\n",
"\n",
" [[[ 9., 11.],\n",
" [13., 15.]]]])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"torchvision.ops.roi_pool(X, rois, output_size=(2, 2), spatial_scale=0.1)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,523 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 10.12 机器翻译"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.2.0 cpu\n"
]
}
],
"source": [
"import collections\n",
"import os\n",
"import io\n",
"import math\n",
"import torch\n",
"from torch import nn\n",
"import torch.nn.functional as F\n",
"import torchtext.vocab as Vocab\n",
"import torch.utils.data as Data\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"\n",
"PAD, BOS, EOS = '<pad>', '<bos>', '<eos>'\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"print(torch.__version__, device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.12.1 读取和预处理数据"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"# 将一个序列中所有的词记录在all_tokens中以便之后构造词典然后在该序列后面添加PAD直到序列\n",
"# 长度变为max_seq_len然后将序列保存在all_seqs中\n",
"def process_one_seq(seq_tokens, all_tokens, all_seqs, max_seq_len):\n",
" all_tokens.extend(seq_tokens)\n",
" seq_tokens += [EOS] + [PAD] * (max_seq_len - len(seq_tokens) - 1)\n",
" all_seqs.append(seq_tokens)\n",
"\n",
"# 使用所有的词来构造词典。并将所有序列中的词变换为词索引后构造Tensor\n",
"def build_data(all_tokens, all_seqs):\n",
" vocab = Vocab.Vocab(collections.Counter(all_tokens),\n",
" specials=[PAD, BOS, EOS])\n",
" indices = [[vocab.stoi[w] for w in seq] for seq in all_seqs]\n",
" return vocab, torch.tensor(indices)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"def read_data(max_seq_len):\n",
" # in和out分别是input和output的缩写\n",
" in_tokens, out_tokens, in_seqs, out_seqs = [], [], [], []\n",
" with io.open('../../data/fr-en-small.txt') as f:\n",
" lines = f.readlines()\n",
" for line in lines:\n",
" in_seq, out_seq = line.rstrip().split('\\t')\n",
" in_seq_tokens, out_seq_tokens = in_seq.split(' '), out_seq.split(' ')\n",
" if max(len(in_seq_tokens), len(out_seq_tokens)) > max_seq_len - 1:\n",
" continue # 如果加上EOS后长于max_seq_len则忽略掉此样本\n",
" process_one_seq(in_seq_tokens, in_tokens, in_seqs, max_seq_len)\n",
" process_one_seq(out_seq_tokens, out_tokens, out_seqs, max_seq_len)\n",
" in_vocab, in_data = build_data(in_tokens, in_seqs)\n",
" out_vocab, out_data = build_data(out_tokens, out_seqs)\n",
" return in_vocab, out_vocab, Data.TensorDataset(in_data, out_data)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(tensor([ 5, 4, 45, 3, 2, 0, 0]), tensor([ 8, 4, 27, 3, 2, 0, 0]))"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"max_seq_len = 7\n",
"in_vocab, out_vocab, dataset = read_data(max_seq_len)\n",
"dataset[0]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.12.2 含注意力机制的编码器—解码器\n",
"### 10.12.2.1 编码器"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"class Encoder(nn.Module):\n",
" def __init__(self, vocab_size, embed_size, num_hiddens, num_layers,\n",
" drop_prob=0, **kwargs):\n",
" super(Encoder, self).__init__(**kwargs)\n",
" self.embedding = nn.Embedding(vocab_size, embed_size)\n",
" self.rnn = nn.GRU(embed_size, num_hiddens, num_layers, dropout=drop_prob)\n",
"\n",
" def forward(self, inputs, state):\n",
" # 输入形状是(批量大小, 时间步数)。将输出互换样本维和时间步维\n",
" embedding = self.embedding(inputs.long()).permute(1, 0, 2) # (seq_len, batch, input_size)\n",
" return self.rnn(embedding, state)\n",
"\n",
" def begin_state(self):\n",
" return None"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(torch.Size([7, 4, 16]), torch.Size([2, 4, 16]))"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"encoder = Encoder(vocab_size=10, embed_size=8, num_hiddens=16, num_layers=2)\n",
"output, state = encoder(torch.zeros((4, 7)), encoder.begin_state())\n",
"output.shape, state.shape # GRU的state是h, 而LSTM的是一个元组(h, c)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.12.2.2 注意力机制"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"def attention_model(input_size, attention_size):\n",
" model = nn.Sequential(nn.Linear(input_size, attention_size, bias=False),\n",
" nn.Tanh(),\n",
" nn.Linear(attention_size, 1, bias=False))\n",
" return model"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [],
"source": [
"def attention_forward(model, enc_states, dec_state):\n",
" \"\"\"\n",
" enc_states: (时间步数, 批量大小, 隐藏单元个数)\n",
" dec_state: (批量大小, 隐藏单元个数)\n",
" \"\"\"\n",
" # 将解码器隐藏状态广播到和编码器隐藏状态形状相同后进行连结\n",
" dec_states = dec_state.unsqueeze(dim=0).expand_as(enc_states)\n",
" enc_and_dec_states = torch.cat((enc_states, dec_states), dim=2)\n",
" e = model(enc_and_dec_states) # 形状为(时间步数, 批量大小, 1)\n",
" alpha = F.softmax(e, dim=0) # 在时间步维度做softmax运算\n",
" return (alpha * enc_states).sum(dim=0) # 返回背景变量"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([4, 8])"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"seq_len, batch_size, num_hiddens = 10, 4, 8\n",
"model = attention_model(2*num_hiddens, 10) \n",
"enc_states = torch.zeros((seq_len, batch_size, num_hiddens))\n",
"dec_state = torch.zeros((batch_size, num_hiddens))\n",
"attention_forward(model, enc_states, dec_state).shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.12.2.3 含注意力机制的解码器"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [],
"source": [
"class Decoder(nn.Module):\n",
" def __init__(self, vocab_size, embed_size, num_hiddens, num_layers,\n",
" attention_size, drop_prob=0):\n",
" super(Decoder, self).__init__()\n",
" self.embedding = nn.Embedding(vocab_size, embed_size)\n",
" self.attention = attention_model(2*num_hiddens, attention_size)\n",
" # GRU的输入包含attention输出的c和实际输入, 所以尺寸是 num_hiddens+embed_size\n",
" self.rnn = nn.GRU(num_hiddens + embed_size, num_hiddens, \n",
" num_layers, dropout=drop_prob)\n",
" self.out = nn.Linear(num_hiddens, vocab_size)\n",
"\n",
" def forward(self, cur_input, state, enc_states):\n",
" \"\"\"\n",
" cur_input shape: (batch, )\n",
" state shape: (num_layers, batch, num_hiddens)\n",
" \"\"\"\n",
" # 使用注意力机制计算背景向量\n",
" c = attention_forward(self.attention, enc_states, state[-1])\n",
" # 将嵌入后的输入和背景向量在特征维连结, (批量大小, num_hiddens+embed_size)\n",
" input_and_c = torch.cat((self.embedding(cur_input), c), dim=1) \n",
" # 为输入和背景向量的连结增加时间步维时间步个数为1\n",
" output, state = self.rnn(input_and_c.unsqueeze(0), state)\n",
" # 移除时间步维,输出形状为(批量大小, 输出词典大小)\n",
" output = self.out(output).squeeze(dim=0)\n",
" return output, state\n",
"\n",
" def begin_state(self, enc_state):\n",
" # 直接将编码器最终时间步的隐藏状态作为解码器的初始隐藏状态\n",
" return enc_state"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.12.3 训练模型"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"def batch_loss(encoder, decoder, X, Y, loss):\n",
" batch_size = X.shape[0]\n",
" enc_state = encoder.begin_state()\n",
" enc_outputs, enc_state = encoder(X, enc_state)\n",
" # 初始化解码器的隐藏状态\n",
" dec_state = decoder.begin_state(enc_state)\n",
" # 解码器在最初时间步的输入是BOS\n",
" dec_input = torch.tensor([out_vocab.stoi[BOS]] * batch_size)\n",
" # 我们将使用掩码变量mask来忽略掉标签为填充项PAD的损失, 初始全1\n",
" mask, num_not_pad_tokens = torch.ones(batch_size,), 0\n",
" l = torch.tensor([0.0])\n",
" for y in Y.permute(1,0): # Y shape: (batch, seq_len)\n",
" dec_output, dec_state = decoder(dec_input, dec_state, enc_outputs)\n",
" l = l + (mask * loss(dec_output, y)).sum()\n",
" dec_input = y # 使用强制教学\n",
" num_not_pad_tokens += mask.sum().item()\n",
" # EOS后面全是PAD. 下面一行保证一旦遇到EOS接下来的循环中mask就一直是0\n",
" mask = mask * (y != out_vocab.stoi[EOS]).float()\n",
" return l / num_not_pad_tokens"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [],
"source": [
"def train(encoder, decoder, dataset, lr, batch_size, num_epochs):\n",
" enc_optimizer = torch.optim.Adam(encoder.parameters(), lr=lr)\n",
" dec_optimizer = torch.optim.Adam(decoder.parameters(), lr=lr)\n",
"\n",
" loss = nn.CrossEntropyLoss(reduction='none')\n",
" data_iter = Data.DataLoader(dataset, batch_size, shuffle=True)\n",
" for epoch in range(num_epochs):\n",
" l_sum = 0.0\n",
" for X, Y in data_iter:\n",
" enc_optimizer.zero_grad()\n",
" dec_optimizer.zero_grad()\n",
" l = batch_loss(encoder, decoder, X, Y, loss)\n",
" l.backward()\n",
" enc_optimizer.step()\n",
" dec_optimizer.step()\n",
" l_sum += l.item()\n",
" if (epoch + 1) % 10 == 0:\n",
" print(\"epoch %d, loss %.3f\" % (epoch + 1, l_sum / len(data_iter)))"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"epoch 10, loss 0.475\n",
"epoch 20, loss 0.245\n",
"epoch 30, loss 0.157\n",
"epoch 40, loss 0.052\n",
"epoch 50, loss 0.039\n"
]
}
],
"source": [
"embed_size, num_hiddens, num_layers = 64, 64, 2\n",
"attention_size, drop_prob, lr, batch_size, num_epochs = 10, 0.5, 0.01, 2, 50\n",
"encoder = Encoder(len(in_vocab), embed_size, num_hiddens, num_layers,\n",
" drop_prob)\n",
"decoder = Decoder(len(out_vocab), embed_size, num_hiddens, num_layers,\n",
" attention_size, drop_prob)\n",
"train(encoder, decoder, dataset, lr, batch_size, num_epochs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.12.4 预测不定长的序列"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def translate(encoder, decoder, input_seq, max_seq_len):\n",
" in_tokens = input_seq.split(' ')\n",
" in_tokens += [EOS] + [PAD] * (max_seq_len - len(in_tokens) - 1)\n",
" enc_input = torch.tensor([[in_vocab.stoi[tk] for tk in in_tokens]]) # batch=1\n",
" enc_state = encoder.begin_state()\n",
" enc_output, enc_state = encoder(enc_input, enc_state)\n",
" dec_input = torch.tensor([out_vocab.stoi[BOS]])\n",
" dec_state = decoder.begin_state(enc_state)\n",
" output_tokens = []\n",
" for _ in range(max_seq_len):\n",
" dec_output, dec_state = decoder(dec_input, dec_state, enc_output)\n",
" pred = dec_output.argmax(dim=1)\n",
" pred_token = out_vocab.itos[int(pred.item())]\n",
" if pred_token == EOS: # 当任一时间步搜索出EOS时输出序列即完成\n",
" break\n",
" else:\n",
" output_tokens.append(pred_token)\n",
" dec_input = pred\n",
" return output_tokens"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['they', 'are', 'watching', '.']"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"input_seq = 'ils regardent .'\n",
"translate(encoder, decoder, input_seq, max_seq_len)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.12.5 评价翻译结果"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [],
"source": [
"def bleu(pred_tokens, label_tokens, k):\n",
" len_pred, len_label = len(pred_tokens), len(label_tokens)\n",
" score = math.exp(min(0, 1 - len_label / len_pred))\n",
" for n in range(1, k + 1):\n",
" num_matches, label_subs = 0, collections.defaultdict(int)\n",
" for i in range(len_label - n + 1):\n",
" label_subs[''.join(label_tokens[i: i + n])] += 1\n",
" for i in range(len_pred - n + 1):\n",
" if label_subs[''.join(pred_tokens[i: i + n])] > 0:\n",
" num_matches += 1\n",
" label_subs[''.join(pred_tokens[i: i + n])] -= 1\n",
" score *= math.pow(num_matches / (len_pred - n + 1), math.pow(0.5, n))\n",
" return score"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [],
"source": [
"def score(input_seq, label_seq, k):\n",
" pred_tokens = translate(encoder, decoder, input_seq, max_seq_len)\n",
" label_tokens = label_seq.split(' ')\n",
" print('bleu %.3f, predict: %s' % (bleu(pred_tokens, label_tokens, k),\n",
" ' '.join(pred_tokens)))"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bleu 1.000, predict: they are watching .\n"
]
}
],
"source": [
"score('ils regardent .', 'they are watching .', k=2)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"bleu 0.658, predict: they are exhausted .\n"
]
}
],
"source": [
"score('ils sont canadienne .', 'they are canadian .', k=2)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:py36]",
"language": "python",
"name": "conda-env-py36-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,765 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 10.3 word2vec的实现"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.0.0\n"
]
}
],
"source": [
"import collections\n",
"import math\n",
"import random\n",
"import sys\n",
"import time\n",
"import os\n",
"import numpy as np\n",
"import torch\n",
"from torch import nn\n",
"import torch.utils.data as Data\n",
"\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.3.1 处理数据集"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"assert 'ptb.train.txt' in os.listdir(\"../../data/ptb\")"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'# sentences: 42068'"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"with open('../../data/ptb/ptb.train.txt', 'r') as f:\n",
" lines = f.readlines()\n",
" # st是sentence的缩写\n",
" raw_dataset = [st.split() for st in lines]\n",
"\n",
"'# sentences: %d' % len(raw_dataset)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"# tokens: 24 ['aer', 'banknote', 'berlitz', 'calloway', 'centrust']\n",
"# tokens: 15 ['pierre', '<unk>', 'N', 'years', 'old']\n",
"# tokens: 11 ['mr.', '<unk>', 'is', 'chairman', 'of']\n"
]
}
],
"source": [
"for st in raw_dataset[:3]:\n",
" print('# tokens:', len(st), st[:5])"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.3.1.1 建立词语索引"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# tk是token的缩写\n",
"counter = collections.Counter([tk for st in raw_dataset for tk in st])\n",
"counter = dict(filter(lambda x: x[1] >= 5, counter.items()))"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'# tokens: 887100'"
]
},
"execution_count": 6,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"idx_to_token = [tk for tk, _ in counter.items()]\n",
"token_to_idx = {tk: idx for idx, tk in enumerate(idx_to_token)}\n",
"dataset = [[token_to_idx[tk] for tk in st if tk in token_to_idx]\n",
" for st in raw_dataset]\n",
"num_tokens = sum([len(st) for st in dataset])\n",
"'# tokens: %d' % num_tokens"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.3.1.2 二次采样"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'# tokens: 375647'"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def discard(idx):\n",
" return random.uniform(0, 1) < 1 - math.sqrt(\n",
" 1e-4 / counter[idx_to_token[idx]] * num_tokens)\n",
"\n",
"subsampled_dataset = [[tk for tk in st if not discard(tk)] for st in dataset]\n",
"'# tokens: %d' % sum([len(st) for st in subsampled_dataset])"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'# the: before=50770, after=2043'"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def compare_counts(token):\n",
" return '# %s: before=%d, after=%d' % (token, sum(\n",
" [st.count(token_to_idx[token]) for st in dataset]), sum(\n",
" [st.count(token_to_idx[token]) for st in subsampled_dataset]))\n",
"\n",
"compare_counts('the')"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'# join: before=45, after=45'"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"compare_counts('join')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.3.1.3 提取中心词和背景词"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_centers_and_contexts(dataset, max_window_size):\n",
" centers, contexts = [], []\n",
" for st in dataset:\n",
" if len(st) < 2: # 每个句子至少要有2个词才可能组成一对“中心词-背景词”\n",
" continue\n",
" centers += st\n",
" for center_i in range(len(st)):\n",
" window_size = random.randint(1, max_window_size)\n",
" indices = list(range(max(0, center_i - window_size),\n",
" min(len(st), center_i + 1 + window_size)))\n",
" indices.remove(center_i) # 将中心词排除在背景词之外\n",
" contexts.append([st[idx] for idx in indices])\n",
" return centers, contexts"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"dataset [[0, 1, 2, 3, 4, 5, 6], [7, 8, 9]]\n",
"center 0 has contexts [1, 2]\n",
"center 1 has contexts [0, 2]\n",
"center 2 has contexts [0, 1, 3, 4]\n",
"center 3 has contexts [1, 2, 4, 5]\n",
"center 4 has contexts [3, 5]\n",
"center 5 has contexts [4, 6]\n",
"center 6 has contexts [4, 5]\n",
"center 7 has contexts [8, 9]\n",
"center 8 has contexts [7, 9]\n",
"center 9 has contexts [7, 8]\n"
]
}
],
"source": [
"tiny_dataset = [list(range(7)), list(range(7, 10))]\n",
"print('dataset', tiny_dataset)\n",
"for center, context in zip(*get_centers_and_contexts(tiny_dataset, 2)):\n",
" print('center', center, 'has contexts', context)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"all_centers, all_contexts = get_centers_and_contexts(subsampled_dataset, 5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.3.2 负采样"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_negatives(all_contexts, sampling_weights, K):\n",
" all_negatives, neg_candidates, i = [], [], 0\n",
" population = list(range(len(sampling_weights)))\n",
" for contexts in all_contexts:\n",
" negatives = []\n",
" while len(negatives) < len(contexts) * K:\n",
" if i == len(neg_candidates):\n",
" # 根据每个词的权重sampling_weights随机生成k个词的索引作为噪声词。\n",
" # 为了高效计算可以将k设得稍大一点\n",
" i, neg_candidates = 0, random.choices(\n",
" population, sampling_weights, k=int(1e5))\n",
" neg, i = neg_candidates[i], i + 1\n",
" # 噪声词不能是背景词\n",
" if neg not in set(contexts):\n",
" negatives.append(neg)\n",
" all_negatives.append(negatives)\n",
" return all_negatives\n",
"\n",
"sampling_weights = [counter[w]**0.75 for w in idx_to_token]\n",
"all_negatives = get_negatives(all_contexts, sampling_weights, 5)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.3.3 读取数据"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def batchify(data):\n",
" \"\"\"用作DataLoader的参数collate_fn: 输入是个长为batchsize的list, list中的每个元素都是__getitem__得到的结果\"\"\"\n",
" max_len = max(len(c) + len(n) for _, c, n in data)\n",
" centers, contexts_negatives, masks, labels = [], [], [], []\n",
" for center, context, negative in data:\n",
" cur_len = len(context) + len(negative)\n",
" centers += [center]\n",
" contexts_negatives += [context + negative + [0] * (max_len - cur_len)]\n",
" masks += [[1] * cur_len + [0] * (max_len - cur_len)]\n",
" labels += [[1] * len(context) + [0] * (max_len - len(context))]\n",
" return (torch.tensor(centers).view(-1, 1), torch.tensor(contexts_negatives),\n",
" torch.tensor(masks), torch.tensor(labels))"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"centers shape: torch.Size([512, 1])\n",
"contexts_negatives shape: torch.Size([512, 60])\n",
"masks shape: torch.Size([512, 60])\n",
"labels shape: torch.Size([512, 60])\n"
]
}
],
"source": [
"class MyDataset(torch.utils.data.Dataset):\n",
" def __init__(self, centers, contexts, negatives):\n",
" assert len(centers) == len(contexts) == len(negatives)\n",
" self.centers = centers\n",
" self.contexts = contexts\n",
" self.negatives = negatives\n",
" \n",
" def __getitem__(self, index):\n",
" return (self.centers[index], self.contexts[index], self.negatives[index])\n",
"\n",
" def __len__(self):\n",
" return len(self.centers)\n",
"\n",
"batch_size = 512\n",
"num_workers = 0 if sys.platform.startswith('win32') else 4\n",
"\n",
"dataset = MyDataset(all_centers, \n",
" all_contexts, \n",
" all_negatives)\n",
"data_iter = Data.DataLoader(dataset, batch_size, shuffle=True,\n",
" collate_fn=batchify, \n",
" num_workers=num_workers)\n",
"for batch in data_iter:\n",
" for name, data in zip(['centers', 'contexts_negatives', 'masks',\n",
" 'labels'], batch):\n",
" print(name, 'shape:', data.shape)\n",
" break"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.3.4 跳字模型\n",
"### 10.3.4.1 嵌入层"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"Parameter containing:\n",
"tensor([[-2.8935, 1.9747, -0.2081, -0.6574],\n",
" [ 1.3135, -1.7396, -1.4210, 1.3302],\n",
" [-0.0465, 1.0802, -0.5344, 0.5250],\n",
" [-0.6899, 1.1832, -0.1694, 0.1382],\n",
" [-1.3940, -1.4121, 0.1867, 0.7681],\n",
" [ 0.2224, -0.3751, 0.5170, 0.1359],\n",
" [-1.4377, 0.4700, 0.5167, 0.8427],\n",
" [ 1.5523, 0.0542, 1.2034, -0.1215],\n",
" [-0.4874, -0.7876, -1.1580, 0.0728],\n",
" [-1.4077, -0.8691, -0.8106, -0.0612],\n",
" [-0.4633, -1.8948, 0.1791, 2.1354],\n",
" [ 0.4180, 1.3088, 1.2537, 2.0183],\n",
" [ 1.5453, 1.3754, -0.3551, 0.4333],\n",
" [ 1.7966, -0.2033, -0.5374, -0.0457],\n",
" [ 1.7540, 0.3209, 0.9063, 1.0655],\n",
" [-0.2148, -0.0743, -1.9261, 1.1415],\n",
" [-0.6571, -0.7888, 0.6224, 1.0660],\n",
" [-1.5191, 1.7596, 0.8295, 0.8935],\n",
" [ 0.4348, -0.2445, -0.6763, 1.5176],\n",
" [ 0.2910, 0.4196, -1.6204, 1.8422]], requires_grad=True)"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"embed = nn.Embedding(num_embeddings=20, embedding_dim=4)\n",
"embed.weight"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([[[ 1.3135, -1.7396, -1.4210, 1.3302],\n",
" [-0.0465, 1.0802, -0.5344, 0.5250],\n",
" [-0.6899, 1.1832, -0.1694, 0.1382]],\n",
"\n",
" [[-1.3940, -1.4121, 0.1867, 0.7681],\n",
" [ 0.2224, -0.3751, 0.5170, 0.1359],\n",
" [-1.4377, 0.4700, 0.5167, 0.8427]]], grad_fn=<EmbeddingBackward>)"
]
},
"execution_count": 17,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.long)\n",
"embed(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.3.4.2 小批量乘法"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"torch.Size([2, 1, 6])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X = torch.ones((2, 1, 4))\n",
"Y = torch.ones((2, 4, 6))\n",
"torch.bmm(X, Y).shape"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.3.4.3 跳字模型前向计算"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def skip_gram(center, contexts_and_negatives, embed_v, embed_u):\n",
" v = embed_v(center)\n",
" u = embed_u(contexts_and_negatives)\n",
" pred = torch.bmm(v, u.permute(0, 2, 1))\n",
" return pred"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.3.5 训练模型\n",
"### 10.3.5.1 二元交叉熵损失函数"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [],
"source": [
"class SigmoidBinaryCrossEntropyLoss(nn.Module):\n",
" def __init__(self): # none mean sum\n",
" super(SigmoidBinaryCrossEntropyLoss, self).__init__()\n",
" def forward(self, inputs, targets, mask=None):\n",
" \"\"\"\n",
" input Tensor shape: (batch_size, len)\n",
" target Tensor of the same shape as input\n",
" \"\"\"\n",
" inputs, targets, mask = inputs.float(), targets.float(), mask.float()\n",
" res = nn.functional.binary_cross_entropy_with_logits(inputs, targets, reduction=\"none\", weight=mask)\n",
" return res.mean(dim=1)\n",
"\n",
"loss = SigmoidBinaryCrossEntropyLoss()"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"tensor([0.8740, 1.2100])"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"pred = torch.tensor([[1.5, 0.3, -1, 2], [1.1, -0.6, 2.2, 0.4]])\n",
"# 标签变量label中的1和0分别代表背景词和噪声词\n",
"label = torch.tensor([[1, 0, 0, 0], [1, 1, 0, 0]])\n",
"mask = torch.tensor([[1, 1, 1, 1], [1, 1, 1, 0]]) # 掩码变量\n",
"loss(pred, label, mask) * mask.shape[1] / mask.float().sum(dim=1)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.8740\n",
"1.2100\n"
]
}
],
"source": [
"def sigmd(x):\n",
" return - math.log(1 / (1 + math.exp(-x)))\n",
"\n",
"print('%.4f' % ((sigmd(1.5) + sigmd(-0.3) + sigmd(1) + sigmd(-2)) / 4)) # 注意1-sigmoid(x) = sigmoid(-x)\n",
"print('%.4f' % ((sigmd(1.1) + sigmd(-0.6) + sigmd(-2.2)) / 3))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.3.5.2 初始化模型参数"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"embed_size = 100\n",
"net = nn.Sequential(\n",
" nn.Embedding(num_embeddings=len(idx_to_token), embedding_dim=embed_size),\n",
" nn.Embedding(num_embeddings=len(idx_to_token), embedding_dim=embed_size)\n",
")"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.3.5.3 定义训练函数"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def train(net, lr, num_epochs):\n",
" device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
" print(\"train on\", device)\n",
" net = net.to(device)\n",
" optimizer = torch.optim.Adam(net.parameters(), lr=lr)\n",
" for epoch in range(num_epochs):\n",
" start, l_sum, n = time.time(), 0.0, 0\n",
" for batch in data_iter:\n",
" center, context_negative, mask, label = [d.to(device) for d in batch]\n",
" \n",
" pred = skip_gram(center, context_negative, net[0], net[1])\n",
" \n",
" # 使用掩码变量mask来避免填充项对损失函数计算的影响\n",
" l = (loss(pred.view(label.shape), label, mask) *\n",
" mask.shape[1] / mask.float().sum(dim=1)).mean() # 一个batch的平均loss\n",
" optimizer.zero_grad()\n",
" l.backward()\n",
" optimizer.step()\n",
" l_sum += l.cpu().item()\n",
" n += 1\n",
" print('epoch %d, loss %.2f, time %.2fs'\n",
" % (epoch + 1, l_sum / n, time.time() - start))"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"train on cpu\n",
"epoch 1, loss 1.97, time 74.53s\n",
"epoch 2, loss 0.62, time 81.85s\n",
"epoch 3, loss 0.45, time 74.49s\n",
"epoch 4, loss 0.39, time 72.04s\n",
"epoch 5, loss 0.37, time 72.21s\n",
"epoch 6, loss 0.35, time 71.81s\n",
"epoch 7, loss 0.34, time 72.00s\n",
"epoch 8, loss 0.33, time 74.45s\n",
"epoch 9, loss 0.32, time 72.08s\n",
"epoch 10, loss 0.32, time 72.05s\n"
]
}
],
"source": [
"train(net, 0.01, 10)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.3.6 应用词嵌入模型"
]
},
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cosine sim=0.478: hard-disk\n",
"cosine sim=0.446: intel\n",
"cosine sim=0.440: drives\n"
]
}
],
"source": [
"def get_similar_tokens(query_token, k, embed):\n",
" W = embed.weight.data\n",
" x = W[token_to_idx[query_token]]\n",
" # 添加的1e-9是为了数值稳定性\n",
" cos = torch.matmul(W, x) / (torch.sum(W * W, dim=1) * torch.sum(x * x) + 1e-9).sqrt()\n",
" _, topk = torch.topk(cos, k=k+1)\n",
" topk = topk.cpu().numpy()\n",
" for i in topk[1:]: # 除去输入词\n",
" print('cosine sim=%.3f: %s' % (cos[i], (idx_to_token[i])))\n",
" \n",
"get_similar_tokens('chip', 3, net[0])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,353 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 10.6 求近义词和类比词\n",
"## 10.6.1 使用预训练的词向量"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.0.0\n"
]
},
{
"data": {
"text/plain": [
"dict_keys(['charngram.100d', 'fasttext.en.300d', 'fasttext.simple.300d', 'glove.42B.300d', 'glove.840B.300d', 'glove.twitter.27B.25d', 'glove.twitter.27B.50d', 'glove.twitter.27B.100d', 'glove.twitter.27B.200d', 'glove.6B.50d', 'glove.6B.100d', 'glove.6B.200d', 'glove.6B.300d'])"
]
},
"execution_count": 1,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"import torch\n",
"import torchtext.vocab as vocab\n",
"\n",
"print(torch.__version__)\n",
"vocab.pretrained_aliases.keys()"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"['glove.42B.300d',\n",
" 'glove.840B.300d',\n",
" 'glove.twitter.27B.25d',\n",
" 'glove.twitter.27B.50d',\n",
" 'glove.twitter.27B.100d',\n",
" 'glove.twitter.27B.200d',\n",
" 'glove.6B.50d',\n",
" 'glove.6B.100d',\n",
" 'glove.6B.200d',\n",
" 'glove.6B.300d']"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"[key for key in vocab.pretrained_aliases.keys()\n",
" if \"glove\" in key]"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"cache_dir = \"/Users/tangshusen/Datasets/glove\"\n",
"# glove = vocab.pretrained_aliases[\"glove.6B.50d\"](cache=cache_dir)\n",
"glove = vocab.GloVe(name='6B', dim=50, cache=cache_dir) # 与上面等价"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"一共包含400000个词。\n"
]
}
],
"source": [
"print(\"一共包含%d个词。\" % len(glove.stoi))"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3366, 'beautiful')"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"glove.stoi['beautiful'], glove.itos[3366]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.6.2 应用预训练词向量\n",
"### 10.6.2.1 求近义词"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def knn(W, x, k):\n",
" # 添加的1e-9是为了数值稳定性\n",
" cos = torch.matmul(W, x.view((-1,))) / (\n",
" (torch.sum(W * W, dim=1) + 1e-9).sqrt() * torch.sum(x * x).sqrt())\n",
" _, topk = torch.topk(cos, k=k)\n",
" topk = topk.cpu().numpy()\n",
" return topk, [cos[i].item() for i in topk]"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_similar_tokens(query_token, k, embed):\n",
" topk, cos = knn(embed.vectors,\n",
" embed.vectors[embed.stoi[query_token]], k+1)\n",
" for i, c in zip(topk[1:], cos[1:]): # 除去输入词\n",
" print('cosine sim=%.3f: %s' % (c, (embed.itos[i])))"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cosine sim=0.856: chips\n",
"cosine sim=0.749: intel\n",
"cosine sim=0.749: electronics\n"
]
}
],
"source": [
"get_similar_tokens('chip', 3, glove)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cosine sim=0.839: babies\n",
"cosine sim=0.800: boy\n",
"cosine sim=0.792: girl\n"
]
}
],
"source": [
"get_similar_tokens('baby', 3, glove)"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"cosine sim=0.921: lovely\n",
"cosine sim=0.893: gorgeous\n",
"cosine sim=0.830: wonderful\n"
]
}
],
"source": [
"get_similar_tokens('beautiful', 3, glove)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.6.2.2 求类比词"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"def get_analogy(token_a, token_b, token_c, embed):\n",
" vecs = [embed.vectors[embed.stoi[t]] \n",
" for t in [token_a, token_b, token_c]]\n",
" x = vecs[1] - vecs[0] + vecs[2]\n",
" topk, cos = knn(embed.vectors, x, 1)\n",
" return embed.itos[topk[0]]"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'daughter'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_analogy('man', 'woman', 'son', glove)"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'japan'"
]
},
"execution_count": 13,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_analogy('beijing', 'china', 'tokyo', glove)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'biggest'"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_analogy('bad', 'worst', 'big', glove)"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"'went'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"get_analogy('do', 'did', 'go', glove)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.8"
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,546 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 10.7 文本情感分类:使用循环神经网络"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:26:23.247619Z",
"start_time": "2019-07-03T04:26:20.949830Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.1.0 cuda\n"
]
}
],
"source": [
"import collections\n",
"import os\n",
"import random\n",
"import tarfile\n",
"import torch\n",
"from torch import nn\n",
"import torchtext.vocab as Vocab\n",
"import torch.utils.data as Data\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"2\"\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"DATA_ROOT = \"/data1/tangss/Datasets\"\n",
"\n",
"print(torch.__version__, device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.7.1 文本情感分类数据\n",
"### 10.7.1.1 读取数据"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:26:23.255913Z",
"start_time": "2019-07-03T04:26:23.250957Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"fname = os.path.join(DATA_ROOT, \"aclImdb_v1.tar.gz\")\n",
"if not os.path.exists(os.path.join(DATA_ROOT, \"aclImdb\")):\n",
" print(\"从压缩包解压...\")\n",
" with tarfile.open(fname, 'r') as f:\n",
" f.extractall(DATA_ROOT)"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:26:39.257587Z",
"start_time": "2019-07-03T04:26:23.258808Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 12500/12500 [00:00<00:00, 34211.42it/s]\n",
"100%|██████████| 12500/12500 [00:00<00:00, 38506.48it/s]\n",
"100%|██████████| 12500/12500 [00:00<00:00, 31316.61it/s]\n",
"100%|██████████| 12500/12500 [00:00<00:00, 29664.72it/s]\n"
]
}
],
"source": [
"from tqdm import tqdm\n",
"def read_imdb(folder='train', data_root=\"/S1/CSCL/tangss/Datasets/aclImdb\"): # 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
" data = []\n",
" for label in ['pos', 'neg']:\n",
" folder_name = os.path.join(data_root, folder, label)\n",
" for file in tqdm(os.listdir(folder_name)):\n",
" with open(os.path.join(folder_name, file), 'rb') as f:\n",
" review = f.read().decode('utf-8').replace('\\n', '').lower()\n",
" data.append([review, 1 if label == 'pos' else 0])\n",
" random.shuffle(data)\n",
" return data\n",
"\n",
"data_root = os.path.join(DATA_ROOT, \"aclImdb\")\n",
"train_data, test_data = read_imdb('train', data_root), read_imdb('test', data_root)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.7.1.2 预处理数据"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:26:39.262666Z",
"start_time": "2019-07-03T04:26:39.259588Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"def get_tokenized_imdb(data): # 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
" \"\"\"\n",
" data: list of [string, label]\n",
" \"\"\"\n",
" def tokenizer(text):\n",
" return [tok.lower() for tok in text.split(' ')]\n",
" return [tokenizer(review) for review, _ in data]"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:26:42.010298Z",
"start_time": "2019-07-03T04:26:39.264464Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"('# words in vocab:', 46152)"
]
},
"execution_count": 5,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def get_vocab_imdb(data): # 本函数已保存在d2lzh_pytorch包中方便以后使用\n",
" tokenized_data = get_tokenized_imdb(data)\n",
" counter = collections.Counter([tk for st in tokenized_data for tk in st])\n",
" return Vocab.Vocab(counter, min_freq=5)\n",
"\n",
"vocab = get_vocab_imdb(train_data)\n",
"'# words in vocab:', len(vocab)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:26:42.016214Z",
"start_time": "2019-07-03T04:26:42.012406Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"def preprocess_imdb(data, vocab): # 本函数已保存在d2lzh_torch包中方便以后使用\n",
" max_l = 500 # 将每条评论通过截断或者补0使得长度变成500\n",
"\n",
" def pad(x):\n",
" return x[:max_l] if len(x) > max_l else x + [0] * (max_l - len(x))\n",
"\n",
" tokenized_data = get_tokenized_imdb(data)\n",
" features = torch.tensor([pad([vocab.stoi[word] for word in words]) for words in tokenized_data])\n",
" labels = torch.tensor([score for _, score in data])\n",
" return features, labels"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.7.1.3 创建数据迭代器"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:26:47.614720Z",
"start_time": "2019-07-03T04:26:42.017922Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"batch_size = 64\n",
"train_set = Data.TensorDataset(*preprocess_imdb(train_data, vocab))\n",
"test_set = Data.TensorDataset(*preprocess_imdb(test_data, vocab))\n",
"train_iter = Data.DataLoader(train_set, batch_size, shuffle=True)\n",
"test_iter = Data.DataLoader(test_set, batch_size)"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:26:47.624512Z",
"start_time": "2019-07-03T04:26:47.616891Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"X torch.Size([64, 500]) y torch.Size([64])\n"
]
},
{
"data": {
"text/plain": [
"('#batches:', 391)"
]
},
"execution_count": 8,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"for X, y in train_iter:\n",
" print('X', X.shape, 'y', y.shape)\n",
" break\n",
"'#batches:', len(train_iter)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.7.2 使用循环神经网络的模型"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:26:47.630109Z",
"start_time": "2019-07-03T04:26:47.625789Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"class BiRNN(nn.Module):\n",
" def __init__(self, vocab, embed_size, num_hiddens, num_layers):\n",
" super(BiRNN, self).__init__()\n",
" self.embedding = nn.Embedding(len(vocab), embed_size)\n",
" \n",
" # bidirectional设为True即得到双向循环神经网络\n",
" self.encoder = nn.LSTM(input_size=embed_size, \n",
" hidden_size=num_hiddens, \n",
" num_layers=num_layers,\n",
" bidirectional=True)\n",
" self.decoder = nn.Linear(4*num_hiddens, 2) # 初始时间步和最终时间步的隐藏状态作为全连接层输入\n",
"\n",
" def forward(self, inputs):\n",
" # inputs的形状是(批量大小, 词数)因为LSTM需要将序列长度(seq_len)作为第一维,所以将输入转置后\n",
" # 再提取词特征,输出形状为(词数, 批量大小, 词向量维度)\n",
" embeddings = self.embedding(inputs.permute(1, 0))\n",
" # rnn.LSTM只传入输入embeddings因此只返回最后一层的隐藏层在各时间步的隐藏状态。\n",
" # outputs形状是(词数, 批量大小, 2 * 隐藏单元个数)\n",
" outputs, _ = self.encoder(embeddings) # output, (h, c)\n",
" # 连结初始时间步和最终时间步的隐藏状态作为全连接层输入。它的形状为\n",
" # (批量大小, 4 * 隐藏单元个数)。\n",
" encoding = torch.cat((outputs[0], outputs[-1]), -1)\n",
" outs = self.decoder(encoding)\n",
" return outs"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:26:47.684133Z",
"start_time": "2019-07-03T04:26:47.631441Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"embed_size, num_hiddens, num_layers = 100, 100, 2\n",
"net = BiRNN(vocab, embed_size, num_hiddens, num_layers)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.7.2.1 加载预训练的词向量"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:26:47.895604Z",
"start_time": "2019-07-03T04:26:47.685801Z"
}
},
"outputs": [],
"source": [
"glove_vocab = Vocab.GloVe(name='6B', dim=100, cache=os.path.join(DATA_ROOT, \"glove\"))"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:26:48.102388Z",
"start_time": "2019-07-03T04:26:47.897582Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"There are 21202 oov words.\n"
]
}
],
"source": [
"def load_pretrained_embedding(words, pretrained_vocab):\n",
" \"\"\"从预训练好的vocab中提取出words对应的词向量\"\"\"\n",
" embed = torch.zeros(len(words), pretrained_vocab.vectors[0].shape[0]) # 初始化为0\n",
" oov_count = 0 # out of vocabulary\n",
" for i, word in enumerate(words):\n",
" try:\n",
" idx = pretrained_vocab.stoi[word]\n",
" embed[i, :] = pretrained_vocab.vectors[idx]\n",
" except KeyError:\n",
" oov_count += 1\n",
" if oov_count > 0:\n",
" print(\"There are %d oov words.\" % oov_count)\n",
" return embed\n",
"\n",
"net.embedding.weight.data.copy_(load_pretrained_embedding(vocab.itos, glove_vocab))\n",
"net.embedding.weight.requires_grad = False # 直接加载预训练好的, 所以不需要更新它"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.7.2.2 训练并评价模型"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:47:57.808046Z",
"start_time": "2019-07-03T04:26:48.104185Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"training on cuda\n",
"epoch 1, loss 0.5415, train acc 0.719, test acc 0.819, time 48.7 sec\n",
"epoch 2, loss 0.1897, train acc 0.837, test acc 0.852, time 53.0 sec\n",
"epoch 3, loss 0.1105, train acc 0.857, test acc 0.844, time 51.6 sec\n",
"epoch 4, loss 0.0719, train acc 0.881, test acc 0.865, time 52.1 sec\n",
"epoch 5, loss 0.0519, train acc 0.894, test acc 0.852, time 51.2 sec\n"
]
}
],
"source": [
"lr, num_epochs = 0.01, 5\n",
"optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=lr)\n",
"loss = nn.CrossEntropyLoss()\n",
"d2l.train(train_iter, test_iter, net, loss, optimizer, device, num_epochs)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:47:57.813888Z",
"start_time": "2019-07-03T04:47:57.810244Z"
},
"collapsed": true
},
"outputs": [],
"source": [
"# 本函数已保存在d2lzh包中方便以后使用\n",
"def predict_sentiment(net, vocab, sentence):\n",
" \"\"\"sentence是词语的列表\"\"\"\n",
" device = list(net.parameters())[0].device\n",
" sentence = torch.tensor([vocab.stoi[word] for word in sentence], device=device)\n",
" label = torch.argmax(net(sentence.view((1, -1))), dim=1)\n",
" return 'positive' if label.item() == 1 else 'negative'"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:47:57.829262Z",
"start_time": "2019-07-03T04:47:57.815487Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'positive'"
]
},
"execution_count": 15,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predict_sentiment(net, vocab, ['this', 'movie', 'is', 'so', 'great'])"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-03T04:47:57.838439Z",
"start_time": "2019-07-03T04:47:57.830707Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'negative'"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"predict_sentiment(net, vocab, ['this', 'movie', 'is', 'so', 'bad'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:py36_pytorch]",
"language": "python",
"name": "conda-env-py36_pytorch-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.2"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,425 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 10.8 文本情感分类使用卷积神经网络textCNN"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-04T15:24:30.611583Z",
"start_time": "2019-07-04T15:24:28.120724Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"1.0.0 cuda\n"
]
}
],
"source": [
"import os\n",
"import torch\n",
"from torch import nn\n",
"import torchtext.vocab as Vocab\n",
"import torch.utils.data as Data\n",
"import torch.nn.functional as F\n",
"\n",
"import sys\n",
"sys.path.append(\"..\") \n",
"import d2lzh_pytorch as d2l\n",
"\n",
"os.environ[\"CUDA_VISIBLE_DEVICES\"] = \"0\"\n",
"device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')\n",
"\n",
"DATA_ROOT = \"/S1/CSCL/tangss/Datasets\"\n",
"print(torch.__version__, device)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.8.1 一维卷积层"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-04T15:24:30.618608Z",
"start_time": "2019-07-04T15:24:30.614302Z"
}
},
"outputs": [],
"source": [
"def corr1d(X, K):\n",
" w = K.shape[0]\n",
" Y = torch.zeros((X.shape[0] - w + 1))\n",
" for i in range(Y.shape[0]):\n",
" Y[i] = (X[i: i + w] * K).sum()\n",
" return Y"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-04T15:24:30.634912Z",
"start_time": "2019-07-04T15:24:30.621140Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([ 2., 5., 8., 11., 14., 17.])"
]
},
"execution_count": 3,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"X, K = torch.tensor([0, 1, 2, 3, 4, 5, 6]), torch.tensor([1, 2])\n",
"corr1d(X, K)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-04T15:24:30.645344Z",
"start_time": "2019-07-04T15:24:30.637083Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"tensor([ 2., 8., 14., 20., 26., 32.])"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def corr1d_multi_in(X, K):\n",
" # 首先沿着X和K的第0维通道维遍历并计算一维互相关结果。然后将所有结果堆叠起来沿第0维累加\n",
" return torch.stack([corr1d(x, k) for x, k in zip(X, K)]).sum(dim=0)\n",
"\n",
"X = torch.tensor([[0, 1, 2, 3, 4, 5, 6],\n",
" [1, 2, 3, 4, 5, 6, 7],\n",
" [2, 3, 4, 5, 6, 7, 8]])\n",
"K = torch.tensor([[1, 2], [3, 4], [-1, -3]])\n",
"corr1d_multi_in(X, K)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.8.2 时序最大池化层"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-04T15:24:30.650834Z",
"start_time": "2019-07-04T15:24:30.647333Z"
}
},
"outputs": [],
"source": [
"class GlobalMaxPool1d(nn.Module):\n",
" def __init__(self):\n",
" super(GlobalMaxPool1d, self).__init__()\n",
" def forward(self, x):\n",
" # x shape: (batch_size, channel, seq_len)\n",
" return F.max_pool1d(x, kernel_size=x.shape[2]) # shape: (batch_size, channel, 1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.8.3 读取和预处理IMDb数据集"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-04T15:24:58.666425Z",
"start_time": "2019-07-04T15:24:30.652855Z"
}
},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"100%|██████████| 12500/12500 [00:02<00:00, 4376.39it/s]\n",
"100%|██████████| 12500/12500 [00:02<00:00, 4834.11it/s]\n",
"100%|██████████| 12500/12500 [00:02<00:00, 4556.64it/s]\n",
"100%|██████████| 12500/12500 [00:11<00:00, 1076.09it/s]\n"
]
}
],
"source": [
"batch_size = 64\n",
"train_data = d2l.read_imdb('train', data_root=os.path.join(DATA_ROOT, \"aclImdb\"))\n",
"test_data = d2l.read_imdb('test', data_root=os.path.join(DATA_ROOT, \"aclImdb\"))\n",
"vocab = d2l.get_vocab_imdb(train_data)\n",
"train_set = Data.TensorDataset(*d2l.preprocess_imdb(train_data, vocab))\n",
"test_set = Data.TensorDataset(*d2l.preprocess_imdb(test_data, vocab))\n",
"train_iter = Data.DataLoader(train_set, batch_size, shuffle=True)\n",
"test_iter = Data.DataLoader(test_set, batch_size)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 10.8.4 textCNN模型"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-04T15:24:58.674283Z",
"start_time": "2019-07-04T15:24:58.668832Z"
}
},
"outputs": [],
"source": [
"class TextCNN(nn.Module):\n",
" def __init__(self, vocab, embed_size, kernel_sizes, num_channels):\n",
" super(TextCNN, self).__init__()\n",
" self.embedding = nn.Embedding(len(vocab), embed_size)\n",
" # 不参与训练的嵌入层\n",
" self.constant_embedding = nn.Embedding(len(vocab), embed_size)\n",
" self.dropout = nn.Dropout(0.5)\n",
" self.decoder = nn.Linear(sum(num_channels), 2)\n",
" # 时序最大池化层没有权重,所以可以共用一个实例\n",
" self.pool = GlobalMaxPool1d()\n",
" self.convs = nn.ModuleList() # 创建多个一维卷积层\n",
" for c, k in zip(num_channels, kernel_sizes):\n",
" self.convs.append(nn.Conv1d(in_channels = 2*embed_size, \n",
" out_channels = c, \n",
" kernel_size = k))\n",
"\n",
" def forward(self, inputs):\n",
" # 将两个形状是(批量大小, 词数, 词向量维度)的嵌入层的输出按词向量连结\n",
" embeddings = torch.cat((\n",
" self.embedding(inputs), \n",
" self.constant_embedding(inputs)), dim=2) # (batch, seq_len, 2*embed_size)\n",
" # 根据Conv1D要求的输入格式将词向量维即一维卷积层的通道维(即词向量那一维),变换到前一维\n",
" embeddings = embeddings.permute(0, 2, 1)\n",
" # 对于每个一维卷积层,在时序最大池化后会得到一个形状为(批量大小, 通道大小, 1)的\n",
" # Tensor。使用flatten函数去掉最后一维然后在通道维上连结\n",
" encoding = torch.cat([self.pool(F.relu(conv(embeddings))).squeeze(-1) for conv in self.convs], dim=1)\n",
" # 应用丢弃法后使用全连接层得到输出\n",
" outputs = self.decoder(self.dropout(encoding))\n",
" return outputs"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-04T15:24:58.764854Z",
"start_time": "2019-07-04T15:24:58.675824Z"
}
},
"outputs": [],
"source": [
"embed_size, kernel_sizes, nums_channels = 100, [3, 4, 5], [100, 100, 100]\n",
"net = TextCNN(vocab, embed_size, kernel_sizes, nums_channels)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.8.4.1 加载预训练的词向量"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-04T15:25:00.616142Z",
"start_time": "2019-07-04T15:24:58.766569Z"
}
},
"outputs": [],
"source": [
"glove_vocab = Vocab.GloVe(name='6B', dim=100, cache=os.path.join(DATA_ROOT, \"glove\"))\n",
"net.embedding.weight.data.copy_(\n",
" d2l.load_pretrained_embedding(vocab.itos, glove_vocab))\n",
"net.constant_embedding.weight.data.copy_(\n",
" d2l.load_pretrained_embedding(vocab.itos, glove_vocab))\n",
"net.constant_embedding.weight.requires_grad = False"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 10.8.4.2 训练并评价模型"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-04T15:28:36.938512Z",
"start_time": "2019-07-04T15:25:00.618194Z"
}
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"training on cuda\n",
"epoch 1, loss 0.4811, train acc 0.762, test acc 0.848, time 42.6 sec\n",
"epoch 2, loss 0.1601, train acc 0.864, test acc 0.869, time 42.3 sec\n",
"epoch 3, loss 0.0714, train acc 0.915, test acc 0.879, time 42.3 sec\n",
"epoch 4, loss 0.0289, train acc 0.958, test acc 0.867, time 42.3 sec\n",
"epoch 5, loss 0.0124, train acc 0.979, test acc 0.861, time 42.3 sec\n"
]
}
],
"source": [
"lr, num_epochs = 0.001, 5\n",
"optimizer = torch.optim.Adam(filter(lambda p: p.requires_grad, net.parameters()), lr=lr)\n",
"loss = nn.CrossEntropyLoss()\n",
"d2l.train(train_iter, test_iter, net, loss, optimizer, device, num_epochs)"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-04T15:28:36.945999Z",
"start_time": "2019-07-04T15:28:36.940672Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'positive'"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d2l.predict_sentiment(net, vocab, ['this', 'movie', 'is', 'so', 'great'])"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {
"ExecuteTime": {
"end_time": "2019-07-04T15:28:36.954105Z",
"start_time": "2019-07-04T15:28:36.947516Z"
}
},
"outputs": [
{
"data": {
"text/plain": [
"'negative'"
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"d2l.predict_sentiment(net, vocab, ['this', 'movie', 'is', 'so', 'bad'])"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [conda env:py36]",
"language": "python",
"name": "conda-env-py36-py"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
},
"varInspector": {
"cols": {
"lenName": 16,
"lenType": 16,
"lenVar": 40
},
"kernels_config": {
"python": {
"delete_cmd_postfix": "",
"delete_cmd_prefix": "del ",
"library": "var_list.py",
"varRefreshCmd": "print(var_dic_list())"
},
"r": {
"delete_cmd_postfix": ") ",
"delete_cmd_prefix": "rm(",
"library": "var_list.r",
"varRefreshCmd": "cat(var_dic_list()) "
}
},
"types_to_exclude": [
"module",
"function",
"builtin_function_or_method",
"instance",
"_Feature"
],
"window_display": false
}
},
"nbformat": 4,
"nbformat_minor": 2
}

@ -0,0 +1,2 @@
from .utils import *

File diff suppressed because it is too large

@ -0,0 +1,170 @@
import torch
import torchvision
from torch.utils.data import DataLoader
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
n_epochs = 3
batch_size_train = 64
batch_size_test = 1000
learning_rate = 0.01
momentum = 0.5
log_interval = 10
random_seed = 1
torch.manual_seed(random_seed)
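# MNIST loaders; Normalize uses the training set's global mean/std (0.1307, 0.3081)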
train_loader = torch.utils.data.DataLoader(
torchvision.datasets.MNIST('./data/', train=True, download=True,
transform=torchvision.transforms.Compose([
torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize(
(0.1307,), (0.3081,))
])),
batch_size=batch_size_train, shuffle=True)
test_loader = torch.utils.data.DataLoader(
torchvision.datasets.MNIST('./data/', train=False, download=True,
transform=torchvision.transforms.Compose([
torchvision.transforms.ToTensor(),
torchvision.transforms.Normalize(
(0.1307,), (0.3081,))
])),
batch_size=batch_size_test, shuffle=True)
examples = enumerate(test_loader)
batch_idx, (example_data, example_targets) = next(examples)
print(example_targets)
print(example_data.shape)
fig = plt.figure()
for i in range(6):
plt.subplot(2,3,i+1)
plt.tight_layout()
plt.imshow(example_data[i][0], cmap='gray', interpolation='none')
plt.title("Ground Truth: {}".format(example_targets[i]))
plt.xticks([])
plt.yticks([])
plt.show()
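# A small CNN: two conv layers (the second followed by 2D dropout), each with max-pooling
# and ReLU, then two fully connected layers producing log-softmax class scores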
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
self.conv2_drop = nn.Dropout2d()
self.fc1 = nn.Linear(320, 50)
self.fc2 = nn.Linear(50, 10)
def forward(self, x):
x = F.relu(F.max_pool2d(self.conv1(x), 2))
x = F.relu(F.max_pool2d(self.conv2_drop(self.conv2(x)), 2))
x = x.view(-1, 320)
x = F.relu(self.fc1(x))
x = F.dropout(x, training=self.training)
x = self.fc2(x)
    return F.log_softmax(x, dim=1)
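# One training epoch: log the running loss every log_interval batches and checkpoint model/optimizer state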
def train(epoch):
network.train()
for batch_idx, (data, target) in enumerate(train_loader):
optimizer.zero_grad()
output = network(data)
loss = F.nll_loss(output, target)
loss.backward()
optimizer.step()
if batch_idx % log_interval == 0:
print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
epoch, batch_idx * len(data), len(train_loader.dataset),
100. * batch_idx / len(train_loader), loss.item()))
train_losses.append(loss.item())
train_counter.append(
(batch_idx*64) + ((epoch-1)*len(train_loader.dataset)))
torch.save(network.state_dict(), './model.pth')
torch.save(optimizer.state_dict(), './optimizer.pth')
#train(1)
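# Evaluate on the held-out test set: accumulate the summed NLL loss and count correct predictions (no gradients needed)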
def test():
network.eval()
test_loss = 0
correct = 0
with torch.no_grad():
for data, target in test_loader:
output = network(data)
      test_loss += F.nll_loss(output, target, reduction='sum').item()
pred = output.data.max(1, keepdim=True)[1]
correct += pred.eq(target.data.view_as(pred)).sum()
test_loss /= len(test_loader.dataset)
test_losses.append(test_loss)
print('\nTest set: Avg. loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
test_loss, correct, len(test_loader.dataset),
100. * correct / len(test_loader.dataset)))
network = Net()
optimizer = optim.SGD(network.parameters(), lr=learning_rate,
momentum=momentum)
train_losses = []
train_counter = []
test_losses = []
test_counter = [i*len(train_loader.dataset) for i in range(n_epochs + 1)]
test()  # evaluate once before training so test_losses lines up with test_counter (n_epochs + 1 points)
for epoch in range(1, n_epochs + 1):
  train(epoch)
  test()
fig = plt.figure()
plt.plot(train_counter, train_losses, color='blue')
plt.scatter(test_counter, test_losses, color='red')
plt.legend(['Train Loss', 'Test Loss'], loc='upper right')
plt.xlabel('number of training examples seen')
plt.ylabel('negative log likelihood loss')
plt.show()
examples = enumerate(test_loader)
batch_idx, (example_data, example_targets) = next(examples)
with torch.no_grad():
output = network(example_data)
fig = plt.figure()
for i in range(6):
plt.subplot(2,3,i+1)
plt.tight_layout()
plt.imshow(example_data[i][0], cmap='gray', interpolation='none')
plt.title("Prediction: {}".format(
output.data.max(1, keepdim=True)[1][i].item()))
plt.xticks([])
plt.yticks([])
plt.show()
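# Reload the saved checkpoints into fresh objects. Note that train()/test() still update the
# global `network` and `optimizer`, so this block mainly demonstrates loading state_dicts.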
continued_network = Net()
continued_optimizer = optim.SGD(continued_network.parameters(), lr=learning_rate,
                                momentum=momentum)
network_state_dict = torch.load('model.pth')
continued_network.load_state_dict(network_state_dict)
optimizer_state_dict = torch.load('optimizer.pth')
continued_optimizer.load_state_dict(optimizer_state_dict)
for i in range(4,9):
test_counter.append(i*len(train_loader.dataset))
train(i)
test()
fig = plt.figure()
plt.plot(train_counter, train_losses, color='blue')
plt.scatter(test_counter, test_losses, color='red')
plt.legend(['Train Loss', 'Test Loss'], loc='upper right')
plt.xlabel('number of training examples seen')
plt.ylabel('negative log likelihood loss')
plt.show()

Binary file not shown.

Binary file not shown.

File diff suppressed because it is too large

Binary file not shown.


@ -0,0 +1,20 @@
elle est vieille . she is old .
elle est tranquille . she is quiet .
elle a tort . she is wrong .
elle est canadienne . she is canadian .
elle est japonaise . she is japanese .
ils sont russes . they are russian .
ils se disputent . they are arguing .
ils regardent . they are watching .
ils sont acteurs . they are actors .
elles sont crevees . they are exhausted .
il est mon genre ! he is my type !
il a des ennuis . he is in trouble .
c est mon frere . he is my brother .
c est mon oncle . he is my uncle .
il a environ mon age . he is about my age .
elles sont toutes deux bonnes . they are both good .
elle est bonne nageuse . she is a good swimmer .
c est une personne adorable . he is a lovable person .
il fait du velo . he is riding a bicycle .
ils sont de grands amis . they are great friends .

Binary file not shown.

@ -0,0 +1,523 @@
MSSubClass: Identifies the type of dwelling involved in the sale.
20 1-STORY 1946 & NEWER ALL STYLES
30 1-STORY 1945 & OLDER
40 1-STORY W/FINISHED ATTIC ALL AGES
45 1-1/2 STORY - UNFINISHED ALL AGES
50 1-1/2 STORY FINISHED ALL AGES
60 2-STORY 1946 & NEWER
70 2-STORY 1945 & OLDER
75 2-1/2 STORY ALL AGES
80 SPLIT OR MULTI-LEVEL
85 SPLIT FOYER
90 DUPLEX - ALL STYLES AND AGES
120 1-STORY PUD (Planned Unit Development) - 1946 & NEWER
150 1-1/2 STORY PUD - ALL AGES
160 2-STORY PUD - 1946 & NEWER
180 PUD - MULTILEVEL - INCL SPLIT LEV/FOYER
190 2 FAMILY CONVERSION - ALL STYLES AND AGES
MSZoning: Identifies the general zoning classification of the sale.
A Agriculture
C Commercial
FV Floating Village Residential
I Industrial
RH Residential High Density
RL Residential Low Density
RP Residential Low Density Park
RM Residential Medium Density
LotFrontage: Linear feet of street connected to property
LotArea: Lot size in square feet
Street: Type of road access to property
Grvl Gravel
Pave Paved
Alley: Type of alley access to property
Grvl Gravel
Pave Paved
NA No alley access
LotShape: General shape of property
Reg Regular
IR1 Slightly irregular
IR2 Moderately Irregular
IR3 Irregular
LandContour: Flatness of the property
Lvl Near Flat/Level
Bnk Banked - Quick and significant rise from street grade to building
HLS Hillside - Significant slope from side to side
Low Depression
Utilities: Type of utilities available
AllPub All public Utilities (E,G,W,& S)
NoSewr Electricity, Gas, and Water (Septic Tank)
NoSeWa Electricity and Gas Only
ELO Electricity only
LotConfig: Lot configuration
Inside Inside lot
Corner Corner lot
CulDSac Cul-de-sac
FR2 Frontage on 2 sides of property
FR3 Frontage on 3 sides of property
LandSlope: Slope of property
Gtl Gentle slope
Mod Moderate Slope
Sev Severe Slope
Neighborhood: Physical locations within Ames city limits
Blmngtn Bloomington Heights
Blueste Bluestem
BrDale Briardale
BrkSide Brookside
ClearCr Clear Creek
CollgCr College Creek
Crawfor Crawford
Edwards Edwards
Gilbert Gilbert
IDOTRR Iowa DOT and Rail Road
MeadowV Meadow Village
Mitchel Mitchell
Names North Ames
NoRidge Northridge
NPkVill Northpark Villa
NridgHt Northridge Heights
NWAmes Northwest Ames
OldTown Old Town
SWISU South & West of Iowa State University
Sawyer Sawyer
SawyerW Sawyer West
Somerst Somerset
StoneBr Stone Brook
Timber Timberland
Veenker Veenker
Condition1: Proximity to various conditions
Artery Adjacent to arterial street
Feedr Adjacent to feeder street
Norm Normal
RRNn Within 200' of North-South Railroad
RRAn Adjacent to North-South Railroad
PosN Near positive off-site feature--park, greenbelt, etc.
       PosA	Adjacent to positive off-site feature
RRNe Within 200' of East-West Railroad
RRAe Adjacent to East-West Railroad
Condition2: Proximity to various conditions (if more than one is present)
Artery Adjacent to arterial street
Feedr Adjacent to feeder street
Norm Normal
RRNn Within 200' of North-South Railroad
RRAn Adjacent to North-South Railroad
PosN Near positive off-site feature--park, greenbelt, etc.
       PosA	Adjacent to positive off-site feature
RRNe Within 200' of East-West Railroad
RRAe Adjacent to East-West Railroad
BldgType: Type of dwelling
1Fam Single-family Detached
2FmCon Two-family Conversion; originally built as one-family dwelling
Duplx Duplex
TwnhsE Townhouse End Unit
TwnhsI Townhouse Inside Unit
HouseStyle: Style of dwelling
1Story One story
1.5Fin One and one-half story: 2nd level finished
1.5Unf One and one-half story: 2nd level unfinished
2Story Two story
2.5Fin Two and one-half story: 2nd level finished
2.5Unf Two and one-half story: 2nd level unfinished
SFoyer Split Foyer
SLvl Split Level
OverallQual: Rates the overall material and finish of the house
10 Very Excellent
9 Excellent
8 Very Good
7 Good
6 Above Average
5 Average
4 Below Average
3 Fair
2 Poor
1 Very Poor
OverallCond: Rates the overall condition of the house
10 Very Excellent
9 Excellent
8 Very Good
7 Good
6 Above Average
5 Average
4 Below Average
3 Fair
2 Poor
1 Very Poor
YearBuilt: Original construction date
YearRemodAdd: Remodel date (same as construction date if no remodeling or additions)
RoofStyle: Type of roof
Flat Flat
Gable Gable
       Gambrel	Gambrel (Barn)
Hip Hip
Mansard Mansard
Shed Shed
RoofMatl: Roof material
ClyTile Clay or Tile
CompShg Standard (Composite) Shingle
Membran Membrane
Metal Metal
Roll Roll
Tar&Grv Gravel & Tar
WdShake Wood Shakes
WdShngl Wood Shingles
Exterior1st: Exterior covering on house
AsbShng Asbestos Shingles
AsphShn Asphalt Shingles
BrkComm Brick Common
BrkFace Brick Face
CBlock Cinder Block
CemntBd Cement Board
HdBoard Hard Board
ImStucc Imitation Stucco
MetalSd Metal Siding
Other Other
Plywood Plywood
PreCast PreCast
Stone Stone
Stucco Stucco
VinylSd Vinyl Siding
Wd Sdng Wood Siding
WdShing Wood Shingles
Exterior2nd: Exterior covering on house (if more than one material)
AsbShng Asbestos Shingles
AsphShn Asphalt Shingles
BrkComm Brick Common
BrkFace Brick Face
CBlock Cinder Block
CemntBd Cement Board
HdBoard Hard Board
ImStucc Imitation Stucco
MetalSd Metal Siding
Other Other
Plywood Plywood
PreCast PreCast
Stone Stone
Stucco Stucco
VinylSd Vinyl Siding
Wd Sdng Wood Siding
WdShing Wood Shingles
MasVnrType: Masonry veneer type
BrkCmn Brick Common
BrkFace Brick Face
CBlock Cinder Block
None None
Stone Stone
MasVnrArea: Masonry veneer area in square feet
ExterQual: Evaluates the quality of the material on the exterior
Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor
ExterCond: Evaluates the present condition of the material on the exterior
Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor
Foundation: Type of foundation
BrkTil Brick & Tile
CBlock Cinder Block
       PConc	Poured Concrete
Slab Slab
Stone Stone
Wood Wood
BsmtQual: Evaluates the height of the basement
Ex Excellent (100+ inches)
Gd Good (90-99 inches)
TA Typical (80-89 inches)
Fa Fair (70-79 inches)
       Po	Poor (<70 inches)
NA No Basement
BsmtCond: Evaluates the general condition of the basement
Ex Excellent
Gd Good
TA Typical - slight dampness allowed
Fa Fair - dampness or some cracking or settling
Po Poor - Severe cracking, settling, or wetness
NA No Basement
BsmtExposure: Refers to walkout or garden level walls
Gd Good Exposure
Av Average Exposure (split levels or foyers typically score average or above)
       Mn	Minimum Exposure
No No Exposure
NA No Basement
BsmtFinType1: Rating of basement finished area
GLQ Good Living Quarters
ALQ Average Living Quarters
BLQ Below Average Living Quarters
Rec Average Rec Room
LwQ Low Quality
       Unf	Unfinished
NA No Basement
BsmtFinSF1: Type 1 finished square feet
BsmtFinType2: Rating of basement finished area (if multiple types)
GLQ Good Living Quarters
ALQ Average Living Quarters
BLQ Below Average Living Quarters
Rec Average Rec Room
LwQ Low Quality
       Unf	Unfinished
NA No Basement
BsmtFinSF2: Type 2 finished square feet
BsmtUnfSF: Unfinished square feet of basement area
TotalBsmtSF: Total square feet of basement area
Heating: Type of heating
Floor Floor Furnace
GasA Gas forced warm air furnace
GasW Gas hot water or steam heat
Grav Gravity furnace
OthW Hot water or steam heat other than gas
Wall Wall furnace
HeatingQC: Heating quality and condition
Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
Po Poor
CentralAir: Central air conditioning
N No
Y Yes
Electrical: Electrical system
SBrkr Standard Circuit Breakers & Romex
FuseA Fuse Box over 60 AMP and all Romex wiring (Average)
FuseF 60 AMP Fuse Box and mostly Romex wiring (Fair)
FuseP 60 AMP Fuse Box and mostly knob & tube wiring (poor)
Mix Mixed
1stFlrSF: First Floor square feet
2ndFlrSF: Second floor square feet
LowQualFinSF: Low quality finished square feet (all floors)
GrLivArea: Above grade (ground) living area square feet
BsmtFullBath: Basement full bathrooms
BsmtHalfBath: Basement half bathrooms
FullBath: Full bathrooms above grade
HalfBath: Half baths above grade
Bedroom: Bedrooms above grade (does NOT include basement bedrooms)
Kitchen: Kitchens above grade
KitchenQual: Kitchen quality
Ex Excellent
Gd Good
TA Typical/Average
Fa Fair
Po Poor
TotRmsAbvGrd: Total rooms above grade (does not include bathrooms)
Functional: Home functionality (Assume typical unless deductions are warranted)
Typ Typical Functionality
Min1 Minor Deductions 1
Min2 Minor Deductions 2
Mod Moderate Deductions
Maj1 Major Deductions 1
Maj2 Major Deductions 2
Sev Severely Damaged
Sal Salvage only
Fireplaces: Number of fireplaces
FireplaceQu: Fireplace quality
Ex Excellent - Exceptional Masonry Fireplace
Gd Good - Masonry Fireplace in main level
TA Average - Prefabricated Fireplace in main living area or Masonry Fireplace in basement
Fa Fair - Prefabricated Fireplace in basement
Po Poor - Ben Franklin Stove
NA No Fireplace
GarageType: Garage location
2Types More than one type of garage
Attchd Attached to home
Basment Basement Garage
BuiltIn Built-In (Garage part of house - typically has room above garage)
CarPort Car Port
Detchd Detached from home
NA No Garage
GarageYrBlt: Year garage was built
GarageFinish: Interior finish of the garage
Fin Finished
RFn Rough Finished
Unf Unfinished
NA No Garage
GarageCars: Size of garage in car capacity
GarageArea: Size of garage in square feet
GarageQual: Garage quality
Ex Excellent
Gd Good
TA Typical/Average
Fa Fair
Po Poor
NA No Garage
GarageCond: Garage condition
Ex Excellent
Gd Good
TA Typical/Average
Fa Fair
Po Poor
NA No Garage
PavedDrive: Paved driveway
Y Paved
P Partial Pavement
N Dirt/Gravel
WoodDeckSF: Wood deck area in square feet
OpenPorchSF: Open porch area in square feet
EnclosedPorch: Enclosed porch area in square feet
3SsnPorch: Three season porch area in square feet
ScreenPorch: Screen porch area in square feet
PoolArea: Pool area in square feet
PoolQC: Pool quality
Ex Excellent
Gd Good
TA Average/Typical
Fa Fair
NA No Pool
Fence: Fence quality
GdPrv Good Privacy
MnPrv Minimum Privacy
GdWo Good Wood
MnWw Minimum Wood/Wire
NA No Fence
MiscFeature: Miscellaneous feature not covered in other categories
Elev Elevator
Gar2 2nd Garage (if not described in garage section)
Othr Other
Shed Shed (over 100 SF)
TenC Tennis Court
NA None
MiscVal: $Value of miscellaneous feature
MoSold: Month Sold (MM)
YrSold: Year Sold (YYYY)
SaleType: Type of sale
WD Warranty Deed - Conventional
CWD Warranty Deed - Cash
VWD Warranty Deed - VA Loan
New Home just constructed and sold
COD Court Officer Deed/Estate
Con Contract 15% Down payment regular terms
ConLw Contract Low Down payment and low interest
ConLI Contract Low Interest
ConLD Contract Low Down
Oth Other
SaleCondition: Condition of sale
Normal Normal Sale
Abnorml Abnormal Sale - trade, foreclosure, short sale
AdjLand Adjoining Land Purchase
Alloca Allocation - two linked properties with separate deeds, typically condo with a garage unit
Family Sale between family members
Partial Home was not completed when last assessed (associated with New Homes)
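
For a quick look at these fields, a minimal pandas sketch (the file name train.csv follows the usual Kaggle layout and is an assumption, as is the choice of columns):

```python
import pandas as pd

# Illustrative path; point this at wherever the Kaggle house-price CSV is stored.
train = pd.read_csv('train.csv')
print(train.shape)
# Peek at a few of the fields documented above.
print(train[['MSSubClass', 'MSZoning', 'LotShape', 'SaleCondition']].head())
```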

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@ -0,0 +1,10 @@
Data description:
Penn Treebank Corpus
- should be free for research purposes
- the same processing of data as used in many LM papers, including "Empirical Evaluation and Combination of Advanced Language Modeling Techniques"
- ptb.train.txt: train set
- ptb.valid.txt: development set (should be used just for tuning hyper-parameters, but not for training)
- ptb.test.txt: test set for reporting perplexity
- ptb.char.*: the same data, just rewritten as sequences of characters, with spaces rewritten as '_' - useful for training character based models, as is shown in example 9
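
A minimal sketch of reading the training split (the relative path below is illustrative): each line is one whitespace-tokenized sentence, with rare words already mapped to '<unk>'.

```python
# Read ptb.train.txt into a list of tokenized sentences (path is an assumption).
with open('data/ptb/ptb.train.txt', 'r') as f:
    sentences = [line.split() for line in f]

print('# sentences:', len(sentences))  # 42068 on this training split
print(sentences[0][:5])
```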

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

Binary file not shown.


@ -0,0 +1,175 @@
<div align=center>
<img width="500" src="img/cover.png" alt="封面"/>
</div>
[本项目](https://tangshusen.me/Dive-into-DL-PyTorch)将[《动手学深度学习》](http://zh.d2l.ai/) 原书中MXNet代码实现改为PyTorch实现。原书作者阿斯顿·张、李沐、扎卡里 C. 立顿、亚历山大 J. 斯莫拉以及其他社区贡献者GitHub地址https://github.com/d2l-ai/d2l-zh
There are some differences between the [Chinese](https://zh.d2l.ai/) and [English](https://d2l.ai/) versions of this book. For a PyTorch rework of the English version, please refer to [this repo](https://github.com/dsgiitr/d2l-pytorch).
## Introduction
This repository mainly contains two folders, code and docs (plus some data stored in data). The code folder holds the Jupyter notebook code for each chapter (based on PyTorch); the docs folder holds the corresponding content of the book Dive into Deep Learning in Markdown format, which is deployed to GitHub Pages as a documentation site using [docsify](https://docsify.js.org/#/zh-cn/). Since the original book uses the MXNet framework, the docs content may differ slightly from the original, but the overall content is the same. Contributions and issues are welcome.
## Target Audience
This project is aimed at anyone interested in deep learning, especially those who want to use PyTorch for it. It does not require any background in deep learning or machine learning; you only need basic mathematics and programming, such as elementary linear algebra, calculus and probability, and basic Python programming.
## How to Use
### Method 1
This repository contains some LaTeX formulas, which GitHub's native Markdown rendering does not display. Since the docs folder has already been deployed to GitHub Pages via [docsify](https://docsify.js.org/#/zh-cn/), the easiest way to read the documentation is simply to visit the [web version of this project](https://tangshusen.me/Dive-into-DL-PyTorch). If you also want to run the code, clone this repository and run the notebooks in the code folder.
### Method 2
You can also browse the documentation locally. First install the `docsify-cli` tool:
``` shell
npm i docsify-cli -g
```
Then clone this project locally:
``` shell
git clone https://github.com/ShusenTang/Dive-into-DL-PyTorch.git
cd Dive-into-DL-PyTorch
```
Then start a local server, after which you can conveniently preview the rendered documentation in real time at `http://localhost:3000`:
``` shell
docsify serve docs
```
### Method 3
If you do not want to install `docsify-cli`, or do not even have `Node.js` installed, but for some reason still want to browse the documentation locally, you can serve the pages from a `docker` container.
First clone this project locally:
``` shell
git clone https://github.com/ShusenTang/Dive-into-DL-PyTorch.git
cd Dive-into-DL-PyTorch
```
Then build a `docker` image named "d2dl" with the following command:
``` shell
docker build -t d2dl .
```
Once the image is built, run the following command to start a new container:
``` shell
docker run -dp 3000:3000 d2dl
```
Finally, open `http://localhost:3000/#/` in your browser to read the documentation. This option suits readers who prefer not to install too many tools on their machine.
## Contents
* [Introduction]()
* [Reading Guide](read_guide.md)
* [1. Introduction to Deep Learning](chapter01_DL-intro/deep-learning-intro.md)
* 2\. Prerequisites
  * [2.1 Environment Setup](chapter02_prerequisite/2.1_install.md)
  * [2.2 Data Manipulation](chapter02_prerequisite/2.2_tensor.md)
  * [2.3 Automatic Differentiation](chapter02_prerequisite/2.3_autograd.md)
* 3\. Deep Learning Basics
  * [3.1 Linear Regression](chapter03_DL-basics/3.1_linear-regression.md)
  * [3.2 Linear Regression Implementation from Scratch](chapter03_DL-basics/3.2_linear-regression-scratch.md)
  * [3.3 Concise Implementation of Linear Regression](chapter03_DL-basics/3.3_linear-regression-pytorch.md)
  * [3.4 Softmax Regression](chapter03_DL-basics/3.4_softmax-regression.md)
  * [3.5 Image Classification Dataset (Fashion-MNIST)](chapter03_DL-basics/3.5_fashion-mnist.md)
  * [3.6 Softmax Regression Implementation from Scratch](chapter03_DL-basics/3.6_softmax-regression-scratch.md)
  * [3.7 Concise Implementation of Softmax Regression](chapter03_DL-basics/3.7_softmax-regression-pytorch.md)
  * [3.8 Multilayer Perceptron](chapter03_DL-basics/3.8_mlp.md)
  * [3.9 Multilayer Perceptron Implementation from Scratch](chapter03_DL-basics/3.9_mlp-scratch.md)
  * [3.10 Concise Implementation of Multilayer Perceptron](chapter03_DL-basics/3.10_mlp-pytorch.md)
  * [3.11 Model Selection, Underfitting and Overfitting](chapter03_DL-basics/3.11_underfit-overfit.md)
  * [3.12 Weight Decay](chapter03_DL-basics/3.12_weight-decay.md)
  * [3.13 Dropout](chapter03_DL-basics/3.13_dropout.md)
  * [3.14 Forward Propagation, Backpropagation and Computational Graphs](chapter03_DL-basics/3.14_backprop.md)
  * [3.15 Numerical Stability and Model Initialization](chapter03_DL-basics/3.15_numerical-stability-and-init.md)
  * [3.16 Hands-on Kaggle Competition: House Price Prediction](chapter03_DL-basics/3.16_kaggle-house-price.md)
* 4\. Deep Learning Computation
  * [4.1 Model Construction](chapter04_DL_computation/4.1_model-construction.md)
  * [4.2 Accessing, Initializing and Sharing Model Parameters](chapter04_DL_computation/4.2_parameters.md)
  * [4.3 Deferred Initialization of Model Parameters](chapter04_DL_computation/4.3_deferred-init.md)
  * [4.4 Custom Layers](chapter04_DL_computation/4.4_custom-layer.md)
  * [4.5 Reading and Saving](chapter04_DL_computation/4.5_read-write.md)
  * [4.6 GPU Computation](chapter04_DL_computation/4.6_use-gpu.md)
* 5\. Convolutional Neural Networks
  * [5.1 Two-Dimensional Convolutional Layers](chapter05_CNN/5.1_conv-layer.md)
  * [5.2 Padding and Strides](chapter05_CNN/5.2_padding-and-strides.md)
  * [5.3 Multiple Input and Output Channels](chapter05_CNN/5.3_channels.md)
  * [5.4 Pooling Layers](chapter05_CNN/5.4_pooling.md)
  * [5.5 Convolutional Neural Networks (LeNet)](chapter05_CNN/5.5_lenet.md)
  * [5.6 Deep Convolutional Neural Networks (AlexNet)](chapter05_CNN/5.6_alexnet.md)
  * [5.7 Networks Using Repeating Elements (VGG)](chapter05_CNN/5.7_vgg.md)
  * [5.8 Network in Network (NiN)](chapter05_CNN/5.8_nin.md)
  * [5.9 Networks with Parallel Concatenations (GoogLeNet)](chapter05_CNN/5.9_googlenet.md)
  * [5.10 Batch Normalization](chapter05_CNN/5.10_batch-norm.md)
  * [5.11 Residual Networks (ResNet)](chapter05_CNN/5.11_resnet.md)
  * [5.12 Densely Connected Networks (DenseNet)](chapter05_CNN/5.12_densenet.md)
* 6\. Recurrent Neural Networks
  * [6.1 Language Models](chapter06_RNN/6.1_lang-model.md)
  * [6.2 Recurrent Neural Networks](chapter06_RNN/6.2_rnn.md)
  * [6.3 Language Model Dataset (Jay Chou Album Lyrics)](chapter06_RNN/6.3_lang-model-dataset.md)
  * [6.4 Recurrent Neural Network Implementation from Scratch](chapter06_RNN/6.4_rnn-scratch.md)
  * [6.5 Concise Implementation of Recurrent Neural Networks](chapter06_RNN/6.5_rnn-pytorch.md)
  * [6.6 Backpropagation Through Time](chapter06_RNN/6.6_bptt.md)
  * [6.7 Gated Recurrent Units (GRU)](chapter06_RNN/6.7_gru.md)
  * [6.8 Long Short-Term Memory (LSTM)](chapter06_RNN/6.8_lstm.md)
  * [6.9 Deep Recurrent Neural Networks](chapter06_RNN/6.9_deep-rnn.md)
  * [6.10 Bidirectional Recurrent Neural Networks](chapter06_RNN/6.10_bi-rnn.md)
* 7\. Optimization Algorithms
  * [7.1 Optimization and Deep Learning](chapter07_optimization/7.1_optimization-intro.md)
  * [7.2 Gradient Descent and Stochastic Gradient Descent](chapter07_optimization/7.2_gd-sgd.md)
  * [7.3 Mini-Batch Stochastic Gradient Descent](chapter07_optimization/7.3_minibatch-sgd.md)
  * [7.4 Momentum](chapter07_optimization/7.4_momentum.md)
  * [7.5 AdaGrad](chapter07_optimization/7.5_adagrad.md)
  * [7.6 RMSProp](chapter07_optimization/7.6_rmsprop.md)
  * [7.7 AdaDelta](chapter07_optimization/7.7_adadelta.md)
  * [7.8 Adam](chapter07_optimization/7.8_adam.md)
* 8\. Computational Performance
  * [8.1 Hybrid Imperative and Symbolic Programming](chapter08_computational-performance/8.1_hybridize.md)
  * [8.2 Asynchronous Computation](chapter08_computational-performance/8.2_async-computation.md)
  * [8.3 Automatic Parallel Computation](chapter08_computational-performance/8.3_auto-parallelism.md)
  * [8.4 Multi-GPU Computation](chapter08_computational-performance/8.4_multiple-gpus.md)
* 9\. Computer Vision
  * [9.1 Image Augmentation](chapter09_computer-vision/9.1_image-augmentation.md)
  * [9.2 Fine-Tuning](chapter09_computer-vision/9.2_fine-tuning.md)
  * [9.3 Object Detection and Bounding Boxes](chapter09_computer-vision/9.3_bounding-box.md)
  * [9.4 Anchor Boxes](chapter09_computer-vision/9.4_anchor.md)
  * [9.5 Multiscale Object Detection](chapter09_computer-vision/9.5_multiscale-object-detection.md)
  * [9.6 Object Detection Dataset (Pikachu)](chapter09_computer-vision/9.6_object-detection-dataset.md)
  - [ ] 9.7 Single Shot Multibox Detection (SSD)
  * [9.8 Region-based CNNs (R-CNN Series)](chapter09_computer-vision/9.8_rcnn.md)
  * [9.9 Semantic Segmentation and Datasets](chapter09_computer-vision/9.9_semantic-segmentation-and-dataset.md)
  - [ ] 9.10 Fully Convolutional Networks (FCN)
  * [9.11 Neural Style Transfer](chapter09_computer-vision/9.11_neural-style.md)
  - [ ] 9.12 Hands-on Kaggle Competition: Image Classification (CIFAR-10)
  - [ ] 9.13 Hands-on Kaggle Competition: Dog Breed Identification (ImageNet Dogs)
* 10\. Natural Language Processing
  * [10.1 Word Embeddings (word2vec)](chapter10_natural-language-processing/10.1_word2vec.md)
  * [10.2 Approximate Training](chapter10_natural-language-processing/10.2_approx-training.md)
  * [10.3 Implementation of word2vec](chapter10_natural-language-processing/10.3_word2vec-pytorch.md)
  * [10.4 Subword Embeddings (fastText)](chapter10_natural-language-processing/10.4_fasttext.md)
  * [10.5 Word Embeddings with Global Vectors (GloVe)](chapter10_natural-language-processing/10.5_glove.md)
  * [10.6 Finding Synonyms and Analogies](chapter10_natural-language-processing/10.6_similarity-analogy.md)
  * [10.7 Text Sentiment Classification: Using Recurrent Neural Networks](chapter10_natural-language-processing/10.7_sentiment-analysis-rnn.md)
  * [10.8 Text Sentiment Classification: Using Convolutional Neural Networks (textCNN)](chapter10_natural-language-processing/10.8_sentiment-analysis-cnn.md)
  * [10.9 Encoder-Decoder (seq2seq)](chapter10_natural-language-processing/10.9_seq2seq.md)
  * [10.10 Beam Search](chapter10_natural-language-processing/10.10_beam-search.md)
  * [10.11 Attention Mechanism](chapter10_natural-language-processing/10.11_attention.md)
  * [10.12 Machine Translation](chapter10_natural-language-processing/10.12_machine-translation.md)
Continuously being updated...
## Original Book
Chinese version: [Dive into Deep Learning](https://zh.d2l.ai/) | [GitHub repo](https://github.com/d2l-ai/d2l-zh)
English version: [Dive into Deep Learning](https://d2l.ai/) | [GitHub repo](https://github.com/d2l-ai/d2l-en)
## Citation
If you use this project in your research, please cite the original book:
```
@book{zhang2019dive,
title={Dive into Deep Learning},
author={Aston Zhang and Zachary C. Lipton and Mu Li and Alexander J. Smola},
note={\url{http://www.d2l.ai}},
year={2020}
}
```

@ -0,0 +1,96 @@
* [Introduction]()
* [Reading Guide](read_guide.md)
* [1. Introduction to Deep Learning](chapter01_DL-intro/deep-learning-intro.md)
* 2\. Prerequisites
  * [2.1 Environment Setup](chapter02_prerequisite/2.1_install.md)
  * [2.2 Data Manipulation](chapter02_prerequisite/2.2_tensor.md)
  * [2.3 Automatic Differentiation](chapter02_prerequisite/2.3_autograd.md)
* 3\. Deep Learning Basics
  * [3.1 Linear Regression](chapter03_DL-basics/3.1_linear-regression.md)
  * [3.2 Linear Regression Implementation from Scratch](chapter03_DL-basics/3.2_linear-regression-scratch.md)
  * [3.3 Concise Implementation of Linear Regression](chapter03_DL-basics/3.3_linear-regression-pytorch.md)
  * [3.4 Softmax Regression](chapter03_DL-basics/3.4_softmax-regression.md)
  * [3.5 Image Classification Dataset (Fashion-MNIST)](chapter03_DL-basics/3.5_fashion-mnist.md)
  * [3.6 Softmax Regression Implementation from Scratch](chapter03_DL-basics/3.6_softmax-regression-scratch.md)
  * [3.7 Concise Implementation of Softmax Regression](chapter03_DL-basics/3.7_softmax-regression-pytorch.md)
  * [3.8 Multilayer Perceptron](chapter03_DL-basics/3.8_mlp.md)
  * [3.9 Multilayer Perceptron Implementation from Scratch](chapter03_DL-basics/3.9_mlp-scratch.md)
  * [3.10 Concise Implementation of Multilayer Perceptron](chapter03_DL-basics/3.10_mlp-pytorch.md)
  * [3.11 Model Selection, Underfitting and Overfitting](chapter03_DL-basics/3.11_underfit-overfit.md)
  * [3.12 Weight Decay](chapter03_DL-basics/3.12_weight-decay.md)
  * [3.13 Dropout](chapter03_DL-basics/3.13_dropout.md)
  * [3.14 Forward Propagation, Backpropagation and Computational Graphs](chapter03_DL-basics/3.14_backprop.md)
  * [3.15 Numerical Stability and Model Initialization](chapter03_DL-basics/3.15_numerical-stability-and-init.md)
  * [3.16 Hands-on Kaggle Competition: House Price Prediction](chapter03_DL-basics/3.16_kaggle-house-price.md)
* 4\. Deep Learning Computation
  * [4.1 Model Construction](chapter04_DL_computation/4.1_model-construction.md)
  * [4.2 Accessing, Initializing and Sharing Model Parameters](chapter04_DL_computation/4.2_parameters.md)
  * [4.3 Deferred Initialization of Model Parameters](chapter04_DL_computation/4.3_deferred-init.md)
  * [4.4 Custom Layers](chapter04_DL_computation/4.4_custom-layer.md)
  * [4.5 Reading and Saving](chapter04_DL_computation/4.5_read-write.md)
  * [4.6 GPU Computation](chapter04_DL_computation/4.6_use-gpu.md)
* 5\. Convolutional Neural Networks
  * [5.1 Two-Dimensional Convolutional Layers](chapter05_CNN/5.1_conv-layer.md)
  * [5.2 Padding and Strides](chapter05_CNN/5.2_padding-and-strides.md)
  * [5.3 Multiple Input and Output Channels](chapter05_CNN/5.3_channels.md)
  * [5.4 Pooling Layers](chapter05_CNN/5.4_pooling.md)
  * [5.5 Convolutional Neural Networks (LeNet)](chapter05_CNN/5.5_lenet.md)
  * [5.6 Deep Convolutional Neural Networks (AlexNet)](chapter05_CNN/5.6_alexnet.md)
  * [5.7 Networks Using Repeating Elements (VGG)](chapter05_CNN/5.7_vgg.md)
  * [5.8 Network in Network (NiN)](chapter05_CNN/5.8_nin.md)
  * [5.9 Networks with Parallel Concatenations (GoogLeNet)](chapter05_CNN/5.9_googlenet.md)
  * [5.10 Batch Normalization](chapter05_CNN/5.10_batch-norm.md)
  * [5.11 Residual Networks (ResNet)](chapter05_CNN/5.11_resnet.md)
  * [5.12 Densely Connected Networks (DenseNet)](chapter05_CNN/5.12_densenet.md)
* 6\. Recurrent Neural Networks
  * [6.1 Language Models](chapter06_RNN/6.1_lang-model.md)
  * [6.2 Recurrent Neural Networks](chapter06_RNN/6.2_rnn.md)
  * [6.3 Language Model Dataset (Jay Chou Album Lyrics)](chapter06_RNN/6.3_lang-model-dataset.md)
  * [6.4 Recurrent Neural Network Implementation from Scratch](chapter06_RNN/6.4_rnn-scratch.md)
  * [6.5 Concise Implementation of Recurrent Neural Networks](chapter06_RNN/6.5_rnn-pytorch.md)
  * [6.6 Backpropagation Through Time](chapter06_RNN/6.6_bptt.md)
  * [6.7 Gated Recurrent Units (GRU)](chapter06_RNN/6.7_gru.md)
  * [6.8 Long Short-Term Memory (LSTM)](chapter06_RNN/6.8_lstm.md)
  * [6.9 Deep Recurrent Neural Networks](chapter06_RNN/6.9_deep-rnn.md)
  * [6.10 Bidirectional Recurrent Neural Networks](chapter06_RNN/6.10_bi-rnn.md)
* 7\. Optimization Algorithms
  * [7.1 Optimization and Deep Learning](chapter07_optimization/7.1_optimization-intro.md)
  * [7.2 Gradient Descent and Stochastic Gradient Descent](chapter07_optimization/7.2_gd-sgd.md)
  * [7.3 Mini-Batch Stochastic Gradient Descent](chapter07_optimization/7.3_minibatch-sgd.md)
  * [7.4 Momentum](chapter07_optimization/7.4_momentum.md)
  * [7.5 AdaGrad](chapter07_optimization/7.5_adagrad.md)
  * [7.6 RMSProp](chapter07_optimization/7.6_rmsprop.md)
  * [7.7 AdaDelta](chapter07_optimization/7.7_adadelta.md)
  * [7.8 Adam](chapter07_optimization/7.8_adam.md)
* 8\. Computational Performance
  * [8.1 Hybrid Imperative and Symbolic Programming](chapter08_computational-performance/8.1_hybridize.md)
  * [8.2 Asynchronous Computation](chapter08_computational-performance/8.2_async-computation.md)
  * [8.3 Automatic Parallel Computation](chapter08_computational-performance/8.3_auto-parallelism.md)
  * [8.4 Multi-GPU Computation](chapter08_computational-performance/8.4_multiple-gpus.md)
* 9\. Computer Vision
  * [9.1 Image Augmentation](chapter09_computer-vision/9.1_image-augmentation.md)
  * [9.2 Fine-Tuning](chapter09_computer-vision/9.2_fine-tuning.md)
  * [9.3 Object Detection and Bounding Boxes](chapter09_computer-vision/9.3_bounding-box.md)
  * [9.4 Anchor Boxes](chapter09_computer-vision/9.4_anchor.md)
  * [9.5 Multiscale Object Detection](chapter09_computer-vision/9.5_multiscale-object-detection.md)
  * [9.6 Object Detection Dataset (Pikachu)](chapter09_computer-vision/9.6_object-detection-dataset.md)
  * 9.7 Single Shot Multibox Detection (SSD)
  * [9.8 Region-based CNNs (R-CNN Series)](chapter09_computer-vision/9.8_rcnn.md)
  * [9.9 Semantic Segmentation and Datasets](chapter09_computer-vision/9.9_semantic-segmentation-and-dataset.md)
  * 9.10 Fully Convolutional Networks (FCN)
  * [9.11 Neural Style Transfer](chapter09_computer-vision/9.11_neural-style.md)
  * 9.12 Hands-on Kaggle Competition: Image Classification (CIFAR-10)
  * 9.13 Hands-on Kaggle Competition: Dog Breed Identification (ImageNet Dogs)
* 10\. Natural Language Processing
  * [10.1 Word Embeddings (word2vec)](chapter10_natural-language-processing/10.1_word2vec.md)
  * [10.2 Approximate Training](chapter10_natural-language-processing/10.2_approx-training.md)
  * [10.3 Implementation of word2vec](chapter10_natural-language-processing/10.3_word2vec-pytorch.md)
  * [10.4 Subword Embeddings (fastText)](chapter10_natural-language-processing/10.4_fasttext.md)
  * [10.5 Word Embeddings with Global Vectors (GloVe)](chapter10_natural-language-processing/10.5_glove.md)
  * [10.6 Finding Synonyms and Analogies](chapter10_natural-language-processing/10.6_similarity-analogy.md)
  * [10.7 Text Sentiment Classification: Using Recurrent Neural Networks](chapter10_natural-language-processing/10.7_sentiment-analysis-rnn.md)
  * [10.8 Text Sentiment Classification: Using Convolutional Neural Networks (textCNN)](chapter10_natural-language-processing/10.8_sentiment-analysis-cnn.md)
  * [10.9 Encoder-Decoder (seq2seq)](chapter10_natural-language-processing/10.9_seq2seq.md)
  * [10.10 Beam Search](chapter10_natural-language-processing/10.10_beam-search.md)
  * [10.11 Attention Mechanism](chapter10_natural-language-processing/10.11_attention.md)
  * [10.12 Machine Translation](chapter10_natural-language-processing/10.12_machine-translation.md)

@ -0,0 +1,183 @@
# Introduction to Deep Learning
You may already have some programming experience and may have written a program or two. You have probably also read endless reports about deep learning or machine learning, even though they are often given the broader label of artificial intelligence. In fact, or perhaps fortunately, most programs do not need deep learning or artificial intelligence in the broader sense. For example, to write a user interface for a microwave oven, a little effort is enough to design a dozen buttons and a set of rules that precisely describe how the microwave should behave in every situation. Or suppose we are writing an email client. Such a program is more complex than a microwave, but we can still think it through step by step: the user interface needs a few input boxes for the recipient, the subject, the body of the message, and so on; the program listens for keyboard input, writes it to a buffer, and displays it in the corresponding box. When the user clicks "Send", we check that the recipient's email address is well formed, check that the subject is not empty (or warn the user if it is), and then send the message using the appropriate protocol.
Note that in both of these examples we do not need to collect any real-world data, nor do we need to systematically extract features from such data. Given enough time, our common sense and programming skills are sufficient for the task.
At the same time, it is easy to find simple problems that even the best programmers in the world cannot solve with programming skills alone. Suppose, for example, that we want to write a program that decides whether an image contains a cat. That sounds easy, doesn't it? The program only needs to output "true" (there is a cat) or "false" (there is no cat) for each input image. Surprisingly, however, even the best computer scientists and programmers in the world do not know how to write such a program directly.
Where should we start? Let us simplify the problem further: if we assume that every image has the same height and width of 400 pixels, and that a pixel consists of three values for red, green, and blue, then an image is represented by nearly half a million numbers. Which of those numbers carry the information we need? The average of all of them? The values at the four corners? Some particular point in the image? In fact, to interpret the content of an image we have to look for features that only emerge when thousands of values are combined, such as edges, textures, shapes, eyes, and noses, before we can finally judge whether the image contains a cat.
One way to tackle this problem is to think backwards. Rather than designing a program to solve the problem, we start from the desired outcome and look for a solution. This is in fact the core idea shared by today's machine learning and deep learning applications: we might call it "programming with data". Instead of sitting in a room thinking about how to design a cat-recognition program, we exploit the ability of the human eye to recognize cats in images. We collect real images that are known to contain cats and images known not to, and the goal becomes finding, from these images, a function that can infer whether an image contains a cat. The form of this function is usually chosen for the specific problem using our knowledge. For example, we might use a quadratic function to judge whether an image contains a cat, but the concrete values of the function's parameters, such as its quadratic coefficients, are determined from the data.
Broadly speaking, machine learning is the discipline that studies the many forms of functions suited to different problems, and how to use data to effectively determine the concrete values of their parameters. Deep learning refers to a class of functions within machine learning that usually take the form of multi-layer neural networks. In recent years, powered by large datasets and powerful hardware, deep learning has gradually become the dominant approach to processing complex, high-dimensional data such as images, text corpora, and audio signals.
We now live in an era in which programming benefits more and more from deep learning. This can be seen as a watershed in the history of computer science. For example, deep learning is already in your phone: spelling correction, speech recognition, recognizing your friends in social media photos, and more. Thanks to excellent algorithms, fast and cheap computation, unprecedented amounts of data, and powerful software tools, most software engineers today can build complex models that solve problems which, ten years ago, even the best scientists found intractable.
This book hopes to help readers ride the wave of deep learning. We want to make deep learning approachable by combining mathematics, code, and examples. The book does not require a deep background in mathematics or programming, and we introduce the necessary knowledge as the chapters progress. Better yet, every section of the book is a runnable Jupyter notebook. Readers can obtain these notebooks online and execute them on a personal computer or a cloud server, freely modify the code in the book, and get immediate feedback. We hope this book will help and inspire a new generation of programmers, entrepreneurs, statisticians, biologists, and everyone else interested in deep learning.
## Origins
Although deep learning seems to be a term of only the last few years, the neural network models it builds on and the core idea of programming with data have been studied for centuries. Humans have long wanted to extract from data the secret of predicting the future. In fact, data analysis is the essence of most natural science: we hope to extract rules from everyday observations and to seek out what is uncertain.
As early as the 17th century, [Jacob Bernoulli (1655--1705)](https://en.wikipedia.org/wiki/Jacob_Bernoulli) proposed the Bernoulli distribution, which describes random processes with only two outcomes, such as tossing a coin. About a century later, [Carl Friedrich Gauss (1777--1855)](https://en.wikipedia.org/wiki/Carl_Friedrich_Gauss) invented the method of least squares, still widely used today in fields ranging from insurance calculations to medical diagnosis. Tools such as probability theory, statistics, and pattern recognition helped experimentalists in the natural sciences work back from data to natural laws, discovering a series of laws that linear models express perfectly, such as Ohm's law (the law describing the relationship between the voltage across a resistor and the current flowing through it).
Even in the Middle Ages mathematicians were keen to use statistics to make estimates. For example, the geometry book of [Jacob Köbel (1460--1533)](https://www.maa.org/press/periodicals/convergence/mathematical-treasures-jacob-kobels-geometry) records how the average foot length of 16 men was used to estimate the average length of a man's foot.
<div align=center>
<img width="600" src="../img/chapter01/1.1_koebel.jpg"/>
</div>
<center>Figure 1.1 In the Middle Ages, the average foot length of 16 men was used to estimate the average length of a man's foot</center>
As Figure 1.1 shows, in this study 16 adult men were asked to line up as they left church and place their feet one behind another; the total length of their feet divided by 16 gave an estimate roughly equivalent to one foot today. The procedure was later refined to cope with unusually shaped feet: the longest and the shortest feet were excluded and only the remaining feet were averaged, an early form of the trimmed mean.
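As an aside not present in the original text, the trimmed-mean idea is easy to state in code; the sketch below uses made-up foot lengths purely for illustration.
``` python
# Toy illustration of a trimmed mean: drop the longest and shortest
# measurements, then average the rest. The numbers are invented.
foot_lengths_cm = [24.5, 30.2, 27.1, 28.0, 26.4, 31.0, 25.3, 27.8,
                   26.9, 29.4, 23.8, 27.5, 28.3, 26.1, 27.0, 28.8]

def trimmed_mean(values, trim=1):
    """Average after discarding the `trim` smallest and largest values."""
    ordered = sorted(values)
    kept = ordered[trim:len(ordered) - trim]
    return sum(kept) / len(kept)

print(sum(foot_lengths_cm) / len(foot_lengths_cm))  # plain mean
print(trimmed_mean(foot_lengths_cm))                # trimmed mean
```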
The real take-off of modern statistics in the 20th century owes much to the collection and publication of data. [Ronald Fisher (1890--1962)](https://en.wikipedia.org/wiki/Ronald_Fisher), one of the giants of statistics, contributed enormously to statistical theory and to the application of statistics in genetics. Many of the algorithms and formulas he invented, such as linear discriminant analysis and Fisher information, are still in frequent use. Even the Iris dataset he released in 1936 is still occasionally used to demonstrate machine learning algorithms.
The information theory of [Claude Shannon (1916--2001)](https://en.wikipedia.org/wiki/Claude_Shannon) and the theory of computation of [Alan Turing (1912--1954)](https://en.wikipedia.org/wiki/Allan_Turing) have also had a profound influence on machine learning. In his famous paper [Computing Machinery and Intelligence](https://www.jstor.org/stable/2251299), Turing asked "Can machines think?" [1]. In the "Turing test" he described, a machine can be considered intelligent if a human interacting with it through text cannot tell whether the conversation partner is human or machine. Today the development of intelligent machines is changing by the day.
Another field with a major influence on deep learning is neuroscience and psychology. Since humans clearly exhibit intelligence, it is natural to ask whether the mechanisms behind it can be explained and reverse-engineered. One of the earliest algorithms was formally proposed by [Donald Hebb (1904--1985)](https://en.wikipedia.org/wiki/Donald_O._Hebb). In his groundbreaking book [The Organization of Behavior](http://s-f-walker.org.uk/pubsebooks/pdfs/The_Organization_of_Behavior-Donald_O._Hebb.pdf) he proposed that neurons learn through positive reinforcement, now known as Hebbian theory [2]. Hebbian theory is the prototype of the perceptron learning algorithm and laid the foundation for the stochastic gradient descent algorithms that underpin deep learning today: reinforce desirable behavior, penalize undesirable behavior, and eventually obtain good neural network parameters.
Biology is where neural networks get their name. Researchers in this tradition go back more than a century to [Alexander Bain (1818--1903)](https://en.wikipedia.org/wiki/Alexander_Bain) and [Charles Scott Sherrington (1857--1952)](https://en.wikipedia.org/wiki/Charles_Scott_Sherrington), who tried to build computational circuits that mimic the interaction of neurons. Over time the biological interpretation of neural networks has been diluted, but the name remains. Today most neural networks share the following core principles:
* Alternate between linear and nonlinear processing units, which are often referred to as "layers".
* Use the chain rule (i.e., backpropagation) to update the network's parameters (a minimal sketch of both principles follows this list).
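The following is a minimal PyTorch sketch of these two principles, assuming arbitrary placeholder layer sizes and random data; it is only an illustration, not code from the book.
``` python
import torch
from torch import nn

# Alternating linear and nonlinear processing units ("layers").
net = nn.Sequential(
    nn.Linear(4, 8),   # linear unit
    nn.ReLU(),         # nonlinear unit
    nn.Linear(8, 1),   # linear unit
)

x = torch.randn(16, 4)   # a random mini-batch (placeholder data)
y = torch.randn(16, 1)   # random targets (placeholder data)

loss = nn.functional.mse_loss(net(x), y)
loss.backward()          # chain rule / backpropagation fills .grad

# One stochastic-gradient step on every parameter.
with torch.no_grad():
    for param in net.parameters():
        param -= 0.1 * param.grad
        param.grad.zero_()
```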
After the initial burst of rapid progress, from roughly 1995 to 2005 most machine learning researchers turned their attention away from neural networks, for several reasons. First, training neural networks demands enormous computing power; although memory was sufficient by the end of the 20th century, computing power was not. Second, the datasets of the time were comparatively small. Fisher's Iris dataset, released in 1936, has only 150 examples and was widely used to test the performance of algorithms. MNIST, with its 60,000 examples, was considered very large then, even though today it is regarded as a typically simple dataset. With data and computation scarce, statistical tools such as kernel methods, decision trees, and probabilistic graphical models were empirically superior: unlike neural networks they did not require lengthy training, and they delivered predictable results backed by strong theoretical guarantees.
## Development
The rise of the internet, inexpensive high-quality sensors, and cheap storage have made large amounts of data ever easier to obtain. Together with cheap computation, in particular GPUs originally designed for computer games, much of the situation described above has changed. Algorithms and models that once seemed impossible suddenly became within reach. This trend is evident in the following table:
|Decade|Data examples|Memory|Floating point operations per second|
|:--|:-:|:-:|:-:|
|1970|100 (Iris)|1 KB|100 K (Intel 8080)|
|1980|1 K (Boston house prices)|100 KB|1 M (Intel 80186)|
|1990|10 K (handwritten character recognition)|10 MB|10 M (Intel 80486)|
|2000|10 M (web pages)|100 MB|1 G (Intel Core)|
|2010|10 G (advertising)|1 GB|1 T (NVIDIA C2050)|
|2020|1 T (social networks)|100 GB|1 P (NVIDIA DGX-2)|
Clearly, storage capacity has not kept pace with the growth of data, while the growth of computing power has outstripped the growth of data. This trend allows statistical models to spend more computation on optimizing parameters, but it also requires using storage more efficiently, for example by means of nonlinear processing units. Accordingly, the method of choice in machine learning and statistics shifted from generalized linear models and kernel methods to deep multi-layer neural networks. This shift is exactly why the pillars of deep learning, such as multilayer perceptrons, convolutional neural networks, long short-term memory networks, and Q-learning, were "rediscovered" over the past decade after sitting on the bench for decades.
Recent progress in statistical models, applications, and algorithms is often compared to the Cambrian explosion (a period in history when the number of species exploded). But this progress is not just a matter of more resources letting us put old wine into new bottles. The list below covers only some of the reasons for deep learning's great strides over the past decade.
* Effective capacity-control techniques such as dropout have made it possible to train large networks without being held hostage by overfitting (where a large neural network simply memorizes most of the training data) [3]. This is achieved by injecting noise throughout the network, for example by randomly replacing weights with random numbers during training [4] (a minimal dropout sketch follows this list).
* Attention mechanisms solved another problem that had troubled statistics for over a century: how to increase the memory capacity and complexity of a system without increasing the number of parameters. They use a learnable pointer structure to construct an elegant solution [5]. That is, instead of memorizing an entire sentence in a task such as machine translation, the model memorizes a pointer to an intermediate state of the translation. Because the whole source sentence no longer needs to be stored before the translation is produced, such structures make accurate translation of long sentences possible.
* Multi-stage designs such as memory networks [6] and neural programmer-interpreters [7] made iterative modeling of reasoning processes possible. These models allow the internal state of a deep network to be modified repeatedly, simulating the individual steps along a chain of reasoning, much as a processor modifies memory during a computation.
* Another major development is the invention of generative adversarial networks [8]. Traditionally, statistical methods for density estimation and generative modeling concentrated on finding the right probability distribution and the right sampling algorithm. The key innovation of GANs was to replace the sampling component with an arbitrary algorithm containing differentiable parameters, which are trained until the discriminator can no longer tell real examples from generated ones. The fact that GANs can use arbitrary algorithms to generate output has opened the door to many new techniques; generated images of running zebras [9] and of synthetic celebrity photos [10] both bear witness to this progress.
* In many cases a single GPU is no longer enough to train on large datasets. Our ability to build distributed parallel training algorithms has improved dramatically over the past decade. The main bottleneck in designing scalable algorithms is that the workhorse of deep learning optimization, stochastic gradient descent, prefers relatively small batches, while small batches also reduce GPU efficiency. If 1,024 GPUs each use a batch of 32 examples, a single training step processes more than 32,000 examples. Recent work by Mu Li [11], Yang You et al. [12], and Xianyan Jia et al. [13] pushed the batch size up to as many as 64,000 examples and reduced the time to train ResNet-50 on the ImageNet dataset to 7 minutes; the original training time was measured in days.
* Parallel computation has also contributed to progress in reinforcement learning, at least wherever simulation is an option. It has helped computers reach superhuman performance in Go, Atari games, StarCraft, and physics simulation.
* Deep learning frameworks have played an important role in spreading the ideas of deep learning. First-generation frameworks such as [Caffe](https://github.com/BVLC/caffe), [Torch](https://github.com/torch), and [Theano](https://github.com/Theano/Theano) made modeling simpler, and many seminal papers were written with them. They have since been replaced by [TensorFlow](https://github.com/tensorflow/tensorflow) (often used through its high-level API [Keras](https://github.com/keras-team/keras)), [CNTK](https://github.com/Microsoft/CNTK), [Caffe 2](https://github.com/caffe2/caffe2), and [Apache MXNet](https://github.com/apache/incubator-mxnet). The third generation of imperative deep learning frameworks was pioneered by [Chainer](https://github.com/chainer/chainer), which defines models with NumPy-like syntax; the idea was later adopted by [PyTorch](https://github.com/pytorch/pytorch) and MXNet's [Gluon API](https://github.com/apache/incubator-mxnet), the latter being the tool the original book uses to teach deep learning.
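To make the noise-injection idea behind dropout (the first item in the list above) concrete, here is a minimal PyTorch sketch; the tensor shape and the dropout probability are arbitrary choices, not values from the original text.
``` python
import torch
from torch import nn

drop = nn.Dropout(p=0.5)   # each activation is zeroed with probability 0.5
x = torch.ones(2, 8)

drop.train()               # training mode: noise is injected
print(drop(x))             # roughly half the entries become 0, the rest are scaled by 2

drop.eval()                # evaluation mode: dropout is a no-op
print(drop(x))             # the input passes through unchanged
```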
Systems researchers build better tools, and statisticians build better models; this division of labor greatly simplifies the work. For example, in 2014 training a logistic regression model was a homework problem assigned to incoming machine learning PhD students at Carnegie Mellon University. Today the same problem takes fewer than 10 lines of code, and any ordinary programmer can do it.
## Success Stories
Machine learning has long been able to achieve goals that other methods could not. For example, mail sorting has used optical character recognition since the 1990s; this is in fact the origin of the well-known MNIST and USPS handwritten digit datasets. Machine learning is also the backbone of electronic payment systems, where it is used to read bank checks, score credit applications, and prevent financial fraud. Machine learning algorithms power search results, personalized recommendations, and page ranking on the web. Although long out of the public eye, machine learning has permeated every corner of our work and life. Only in recent years, after breakthroughs on problems previously considered unsolvable and on problems that directly concern consumers, has machine learning gradually moved into the public spotlight. These advances are largely due to deep learning.
* Intelligent assistants such as Apple's Siri, Amazon's Alexa, and Google Assistant can answer spoken questions with considerable accuracy, ranging from simple tasks such as switching lights on and off (a great help to people with disabilities) to providing voice-based conversational assistance. Their arrival can perhaps be taken as a sign that artificial intelligence has begun to affect our lives.
* The key to intelligent assistants is accurate speech recognition, and in some applications the accuracy of such systems has gradually grown to rival that of humans [14].
* Object recognition has likewise come a long way. In 2010, identifying the class of an object in an image was still a considerable challenge; that year a team from NEC, the University of Illinois at Urbana-Champaign, and Rutgers achieved a top-5 error rate of 28% on the ImageNet benchmark [15]. By 2017 the figure had fallen to 2.25% [16]. Researchers have achieved similarly stunning results in bird identification and skin cancer diagnosis.
* Games were once considered the last bastion of human intelligence. Starting with TD-Gammon, which used temporal-difference reinforcement learning to play backgammon, progress in algorithms and computing power has spawned a succession of new game-playing algorithms. Unlike backgammon, chess has a more complex state space and more possible moves; Deep Blue defeated Garry Kasparov with massive parallelism, special-purpose hardware, and efficient game-tree search [17]. Go was considered even harder because of its enormous state space; in 2016 AlphaGo reached human level by combining deep learning with Monte Carlo tree sampling [18]. In Texas hold'em poker, beyond the huge state space the bigger challenge is that the game's information is not fully observable, for example the opponents' cards cannot be seen; Libratus nevertheless surpassed human players with an efficiently structured strategy [19]. All of these examples show that advanced algorithms are an important reason for the improved performance of artificial intelligence in games.
* Another sign of progress in machine learning is the development of self-driving cars. Although full autonomy is still a long way off, products with partial autonomous-driving capabilities from companies such as [Tesla](http://www.tesla.com), [NVIDIA](http://www.nvidia.com), [MobilEye](http://www.mobileye.com), and [Waymo](http://www.waymo.com) demonstrate enormous progress in this area. What makes full autonomy hard is that it requires perception, reasoning, and rules to be integrated into a single system. At present deep learning is mainly applied to the computer vision part; the rest still requires a great deal of engineering tuning.
The list above is only the tip of the iceberg of what deep learning has achieved in recent years. Robotics, logistics, computational biology, particle physics, and astronomy also owe some of their recent progress to deep learning. Deep learning has thus gradually evolved into a general-purpose tool that engineers and scientists alike can use.
## Characteristics
Before describing the characteristics of deep learning, let us first review and summarize the relationship between machine learning and deep learning. Machine learning studies how computer systems can use experience to improve their performance; it is a branch of artificial intelligence and one means of realizing it. Among the many research directions within machine learning, representation learning focuses on automatically finding appropriate ways to represent data so that the input can be better transformed into the correct output. Deep learning, the focus of this book, is representation learning with multiple levels of representation. At each level (starting from the raw data), deep learning transforms that level's representation into a higher-level one through simple functions. A deep learning model can therefore be seen as a function composed of many simple functions; when enough of these functions are composed, the model can express very complex transformations.
Deep learning can represent increasingly abstract concepts or patterns level by level. Take images as an example, where the input is a pile of raw pixel values. In a deep learning model the image can be represented, level by level, as edges at particular positions and angles, as patterns formed by combining edges, as patterns of specific parts obtained by further combining those patterns, and so on. In the end the model can fairly easily complete a given task, such as identifying the objects in an image, from these higher-level representations. It is worth noting that, as a form of representation learning, deep learning automatically finds the appropriate way to represent the data at every level.
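A rough sketch of this "composition of simple functions" view, assuming an arbitrary stack of convolutional stages and a fake image; the channel counts, kernel sizes, and the comments about what each stage might learn after training are illustrative assumptions, not content from the original text.
``` python
import torch
from torch import nn

# Each stage is a simple function; stacking them composes a deep model
# that maps raw pixels to increasingly abstract feature maps.
stages = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),  # after training: low-level features such as edges
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), # after training: simple patterns built from edges
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), # after training: patterns of specific parts
)

image = torch.randn(1, 3, 64, 64)   # a fake RGB image
features = stages(image)            # the composition of all the simple functions
print(features.shape)               # torch.Size([1, 64, 16, 16])
```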
An outward characteristic of deep learning, therefore, is end-to-end training: rather than stitching together individually tuned parts, the whole system is assembled and then trained as one. For instance, computer vision researchers once treated feature extraction and model building as separate steps; approaches such as Canny edge detection [20] and SIFT feature extraction [21] dominated for more than a decade, and they were the best that humans could come up with. Once deep learning entered the field, these hand-crafted features were replaced by automatically optimized, stage-by-stage filters with better performance.
Similarly, in natural language processing the bag-of-words model was for many years considered the obvious choice [22]. A bag-of-words model maps a sentence to a vector of word frequencies, which completely ignores word order and punctuation. Unfortunately we also lack the ability to craft better features by hand. Automated algorithms, however, can search over all possible feature designs and find the best one, and this has brought great progress. For example, semantically meaningful word embeddings can carry out reasoning in vector space such as "Berlin - Germany + China = Beijing". Results like these, again, come from training the whole system end to end.
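The sketch below illustrates the vector arithmetic behind that analogy with four tiny hand-made vectors; real systems use trained embeddings and usually exclude the query words, so this is only a toy illustration.
``` python
import torch

# Hand-made toy embeddings, just to show the arithmetic of the analogy.
emb = {
    "Berlin":  torch.tensor([0.9, 0.1, 0.8]),
    "Germany": torch.tensor([0.8, 0.0, 0.1]),
    "China":   torch.tensor([0.1, 0.9, 0.1]),
    "Beijing": torch.tensor([0.2, 1.0, 0.8]),
}

query = emb["Berlin"] - emb["Germany"] + emb["China"]

# The candidate most similar (by cosine similarity) to the query wins.
best = max(emb, key=lambda w: torch.cosine_similarity(query, emb[w], dim=0).item())
print(best)  # "Beijing" for these toy vectors
```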
Beyond end-to-end training, we are also moving from parametric statistical models toward fully nonparametric models. When data is scarce we need simplifying assumptions about reality to obtain a usable model; when data is plentiful, parametric models can be replaced by nonparametric models that fit reality more closely. This yields more accurate models, at some cost in interpretability.
Compared with other classical machine learning methods, deep learning differs in its tolerance of suboptimal solutions, its use of nonconvex nonlinear optimization, and its willingness to try methods that have not been proven. This new empiricism in tackling statistical problems has drawn in a flood of talent and produced better solutions to many practical problems. Even though in most cases it has meant modifying or even reinventing tools that had existed for decades, this is without doubt meaningful and exciting work.
Finally, the deep learning community has long prided itself on sharing tools between academia and industry, and has open-sourced many excellent software libraries, statistical models, and pretrained networks. It is in this spirit of openness that this book and the teaching videos based on it can be freely downloaded and shared. We are committed to lowering the barrier to learning deep learning for everyone, and we hope you benefit from it.
## Summary
* Machine learning studies how computer systems can use experience to improve their performance. It is a branch of artificial intelligence and one means of realizing it.
* As a class of machine learning methods, representation learning focuses on automatically finding appropriate ways to represent data.
* Deep learning is representation learning with multiple levels of representation. It can represent increasingly abstract concepts or patterns level by level.
* The neural network models that deep learning builds on and the core idea of programming with data have in fact been studied for centuries.
* Deep learning has gradually evolved into a general-purpose tool that engineers and scientists alike can use.
## Exercises
* Is there any part of the code you are currently writing that could be "learned", that is, improved by machine learning?
* Have you encountered problems in daily life for which there are plenty of examples of how to solve them, but no algorithm that solves them automatically? They may be the best prey for deep learning.
* If we view the development of artificial intelligence as a new industrial revolution, is the relationship between deep learning and data like that between the steam engine and coal? Why, or why not?
* Where else can end-to-end training be applied? Physics? Engineering? Economics?
* Why should a deep network mimic the structure of the human brain? Why should it not?
## References
[1] Turing, A. M. (1950). Computing machinery and intelligence. Mind, 59(236), 433.
[2] Hebb, D. O. (1949). The organization of behavior; a neuropsycholocigal theory. A Wiley Book in Clinical Psychology., 62-78.
[3] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., & Salakhutdinov, R. (2014). Dropout: a simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1), 1929-1958.
[4] Bishop, C. M. (1995). Training with noise is equivalent to Tikhonov regularization. Neural computation, 7(1), 108-116.
[5] Bahdanau, D., Cho, K., & Bengio, Y. (2014). Neural machine translation by jointly learning to align and translate. arXiv preprint arXiv:1409.0473.
[6] Sukhbaatar, S., Weston, J., & Fergus, R. (2015). End-to-end memory networks. In Advances in neural information processing systems (pp. 2440-2448).
[7] Reed, S., & De Freitas, N. (2015). Neural programmer-interpreters. arXiv preprint arXiv:1511.06279.
[8] Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., … & Bengio, Y. (2014). Generative adversarial nets. In Advances in neural information processing systems (pp. 2672-2680).
[9] Zhu, J. Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint.
[10] Karras, T., Aila, T., Laine, S., & Lehtinen, J. (2017). Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196.
[11] Li, M. (2017). Scaling Distributed Machine Learning with System and Algorithm Co-design (Doctoral dissertation, PhD thesis, Intel).
[12] You, Y., Gitman, I., & Ginsburg, B. Large batch training of convolutional networks. ArXiv e-prints.
[13] Jia, X., Song, S., He, W., Wang, Y., Rong, H., Zhou, F., … & Chen, T. (2018). Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes. arXiv preprint arXiv:1807.11205.
[14] Xiong, W., Droppo, J., Huang, X., Seide, F., Seltzer, M., Stolcke, A., … & Zweig, G. (2017, March). The Microsoft 2016 conversational speech recognition system. In Acoustics, Speech and Signal Processing (ICASSP), 2017 IEEE International Conference on (pp. 5255-5259). IEEE.
[15] Lin, Y., Lv, F., Zhu, S., Yang, M., Cour, T., Yu, K., … & Huang, T. (2010). Imagenet classification: fast descriptor coding and large-scale svm training. Large scale visual recognition challenge.
[16] Hu, J., Shen, L., & Sun, G. (2017). Squeeze-and-excitation networks. arXiv preprint arXiv:1709.01507, 7.
[17] Campbell, M., Hoane Jr, A. J., & Hsu, F. H. (2002). Deep blue. Artificial intelligence, 134(1-2), 57-83.
[18] Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., … & Dieleman, S. (2016). Mastering the game of Go with deep neural networks and tree search. nature, 529(7587), 484.
[19] Brown, N., & Sandholm, T. (2017, August). Libratus: The superhuman ai for no-limit poker. In Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence.
[20] Canny, J. (1986). A computational approach to edge detection. IEEE Transactions on pattern analysis and machine intelligence, (6), 679-698.
[21] Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2), 91-110.
[22] Salton, G., & McGill, M. J. (1986). Introduction to modern information retrieval.
-----------
> Note: this section is essentially the same as the original book and has been carried over for completeness. [Link to the original section](https://zh.d2l.ai/chapter_introduction/deep-learning-intro.html)

Some files were not shown because too many files have changed in this diff
