{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 2.2 数据操作" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.4.1\n" ] } ], "source": [ "import torch\n", "\n", "torch.manual_seed(0)\n", "torch.cuda.manual_seed(0)\n", "print(torch.__version__)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2.1 创建`Tensor`\n", "\n", "创建一个5x3的未初始化的`Tensor`:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0.0000e+00, 1.0842e-19, 1.6162e+22],\n", " [2.8643e-42, 5.6052e-45, 0.0000e+00],\n", " [0.0000e+00, 0.0000e+00, 0.0000e+00],\n", " [0.0000e+00, 0.0000e+00, 0.0000e+00],\n", " [0.0000e+00, 1.0842e-19, 1.3314e+22]])\n" ] } ], "source": [ "x = torch.empty(5, 3)\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "创建一个5x3的随机初始化的`Tensor`:\n", "\n" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0.4963, 0.7682, 0.0885],\n", " [0.1320, 0.3074, 0.6341],\n", " [0.4901, 0.8964, 0.4556],\n", " [0.6323, 0.3489, 0.4017],\n", " [0.0223, 0.1689, 0.2939]])\n" ] } ], "source": [ "x = torch.rand(5, 3)\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "创建一个5x3的long型全0的`Tensor`:\n", "\n" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[0, 0, 0],\n", " [0, 0, 0],\n", " [0, 0, 0],\n", " [0, 0, 0],\n", " [0, 0, 0]])\n" ] } ], "source": [ "x = torch.zeros(5, 3, dtype=torch.long)\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "直接根据数据创建:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([5.5000, 3.0000])\n" ] } ], "source": [ "x = 
torch.tensor([5.5, 3])\n", "print(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also create a `Tensor` from an existing one. These methods reuse some properties of the input `Tensor` by default, such as its data type, unless new values are specified." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[1., 1., 1.],\n", " [1., 1., 1.],\n", " [1., 1., 1.],\n", " [1., 1., 1.],\n", " [1., 1., 1.]], dtype=torch.float64)\n", "tensor([[ 0.6035, 0.8110, -0.0451],\n", " [ 0.8797, 1.0482, -0.0445],\n", " [-0.7229, 2.8663, -0.5655],\n", " [ 0.1604, -0.0254, 1.0739],\n", " [ 2.2628, -0.9175, -0.2251]])\n" ] } ], "source": [ "x = x.new_ones(5, 3, dtype=torch.float64) # the returned tensor has the same torch.dtype and torch.device by default\n", "print(x)\n", "\n", "x = torch.randn_like(x, dtype=torch.float) # specify a new dtype\n", "print(x) " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can get the shape of a `Tensor` via `shape` or `size()`:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch.Size([5, 3])\n", "torch.Size([5, 3])\n" ] } ], "source": [ "print(x.size())\n", "print(x.shape)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> Note: the returned torch.Size is in fact a tuple and supports all tuple operations." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2.2 Operations\n", "### Arithmetic operations\n", "* **Addition, form 1**" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 1.3967, 1.0892, 0.4369],\n", " [ 1.6995, 2.0453, 0.6539],\n", " [-0.1553, 3.7016, -0.3599],\n", " [ 0.7536, 0.0870, 1.2274],\n", " [ 2.5046, -0.1913, 0.4760]])\n" ] } ], "source": [ "y = torch.rand(5, 3)\n", "print(x + y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **Addition, form 2**" ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 1.3967, 1.0892, 0.4369],\n", " [ 1.6995, 2.0453, 
0.6539],\n", " [-0.1553, 3.7016, -0.3599],\n", " [ 0.7536, 0.0870, 1.2274],\n", " [ 2.5046, -0.1913, 0.4760]])\n" ] } ], "source": [ "print(torch.add(x, y))" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 1.3967, 1.0892, 0.4369],\n", " [ 1.6995, 2.0453, 0.6539],\n", " [-0.1553, 3.7016, -0.3599],\n", " [ 0.7536, 0.0870, 1.2274],\n", " [ 2.5046, -0.1913, 0.4760]])\n" ] } ], "source": [ "result = torch.empty(5, 3)\n", "torch.add(x, y, out=result)\n", "print(result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "* **加法形式三、inplace**" ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 1.3967, 1.0892, 0.4369],\n", " [ 1.6995, 2.0453, 0.6539],\n", " [-0.1553, 3.7016, -0.3599],\n", " [ 0.7536, 0.0870, 1.2274],\n", " [ 2.5046, -0.1913, 0.4760]])\n" ] } ], "source": [ "# adds x to y\n", "y.add_(x)\n", "print(y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> **注:PyTorch操作inplace版本都有后缀\"_\", 例如`x.copy_(y), x.t_()`**\n", "\n", "### 索引\n", "我们还可以使用类似NumPy的索引操作来访问`Tensor`的一部分,需要注意的是:**索引出来的结果与原数据共享内存,也即修改一个,另一个会跟着修改。** " ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([1.6035, 1.8110, 0.9549])\n", "tensor([1.6035, 1.8110, 0.9549])\n" ] } ], "source": [ "y = x[0, :]\n", "y += 1\n", "print(y)\n", "print(x[0, :]) # 源tensor也被改了" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 改变形状\n", "用`view()`来改变`Tensor`的形状:" ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "torch.Size([5, 3]) torch.Size([15]) torch.Size([3, 5])\n" ] } ], "source": [ "y = x.view(15)\n", "z = x.view(-1, 5) # -1所指的维度可以根据其他维度的值推出来\n", "print(x.size(), y.size(), z.size())" ] }, { "cell_type": "markdown", 
"metadata": {}, "source": [ "**注意`view()`返回的新tensor与源tensor共享内存,也即更改其中的一个,另外一个也会跟着改变。**" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[2.6035, 2.8110, 1.9549],\n", " [1.8797, 2.0482, 0.9555],\n", " [0.2771, 3.8663, 0.4345],\n", " [1.1604, 0.9746, 2.0739],\n", " [3.2628, 0.0825, 0.7749]])\n", "tensor([2.6035, 2.8110, 1.9549, 1.8797, 2.0482, 0.9555, 0.2771, 3.8663, 0.4345,\n", " 1.1604, 0.9746, 2.0739, 3.2628, 0.0825, 0.7749])\n" ] } ], "source": [ "x += 1\n", "print(x)\n", "print(y) # 也加了1" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "如果不想共享内存,推荐先用`clone`创造一个副本然后再使用`view`。" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[ 1.6035, 1.8110, 0.9549],\n", " [ 0.8797, 1.0482, -0.0445],\n", " [-0.7229, 2.8663, -0.5655],\n", " [ 0.1604, -0.0254, 1.0739],\n", " [ 2.2628, -0.9175, -0.2251]])\n", "tensor([2.6035, 2.8110, 1.9549, 1.8797, 2.0482, 0.9555, 0.2771, 3.8663, 0.4345,\n", " 1.1604, 0.9746, 2.0739, 3.2628, 0.0825, 0.7749])\n" ] } ], "source": [ "x_cp = x.clone().view(15)\n", "x -= 1\n", "print(x)\n", "print(x_cp)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "另外一个常用的函数就是`item()`, 它可以将一个标量`Tensor`转换成一个Python number:" ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([2.3466])\n", "2.3466382026672363\n" ] } ], "source": [ "x = torch.randn(1)\n", "print(x)\n", "print(x.item())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2.3 广播机制" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([[1, 2]])\n", "tensor([[1],\n", " [2],\n", " [3]])\n", "tensor([[2, 3],\n", " [3, 4],\n", " [4, 5]])\n" ] } ], "source": [ "x = torch.arange(1, 3).view(1, 2)\n", 
"print(x)\n", "y = torch.arange(1, 4).view(3, 1)\n", "print(y)\n", "print(x + y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2.4 运算的内存开销" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "False\n" ] } ], "source": [ "x = torch.tensor([1, 2])\n", "y = torch.tensor([3, 4])\n", "id_before = id(y)\n", "y = y + x\n", "print(id(y) == id_before)" ] }, { "cell_type": "code", "execution_count": 19, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "x = torch.tensor([1, 2])\n", "y = torch.tensor([3, 4])\n", "id_before = id(y)\n", "y[:] = y + x\n", "print(id(y) == id_before)" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "True\n" ] } ], "source": [ "x = torch.tensor([1, 2])\n", "y = torch.tensor([3, 4])\n", "id_before = id(y)\n", "torch.add(x, y, out=y) # y += x, y.add_(x)\n", "print(id(y) == id_before)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2.5 `Tensor`和NumPy相互转换\n", "**`numpy()`和`from_numpy()`这两个函数产生的`Tensor`和NumPy array实际是使用的相同的内存,改变其中一个时另一个也会改变!!!**\n", "### `Tensor`转NumPy" ] }, { "cell_type": "code", "execution_count": 21, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "tensor([1., 1., 1., 1., 1.]) [1. 1. 1. 1. 1.]\n", "tensor([2., 2., 2., 2., 2.]) [2. 2. 2. 2. 2.]\n", "tensor([3., 3., 3., 3., 3.]) [3. 3. 3. 3. 3.]\n" ] } ], "source": [ "a = torch.ones(5)\n", "b = a.numpy()\n", "print(a, b)\n", "\n", "a += 1\n", "print(a, b)\n", "b += 1\n", "print(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### NumPy数组转`Tensor`" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1. 1. 1. 1. 1.] tensor([1., 1., 1., 1., 1.], dtype=torch.float64)\n", "[2. 2. 2. 2. 
2.] tensor([2., 2., 2., 2., 2.], dtype=torch.float64)\n", "[3. 3. 3. 3. 3.] tensor([3., 3., 3., 3., 3.], dtype=torch.float64)\n" ] } ], "source": [ "import numpy as np\n", "a = np.ones(5)\n", "b = torch.from_numpy(a)\n", "print(a, b)\n", "\n", "a += 1\n", "print(a, b)\n", "b += 1\n", "print(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can also use `torch.tensor()` to convert a NumPy array into a `Tensor`. This method always copies the data, so the returned `Tensor` no longer shares memory with the original array." ] }, { "cell_type": "code", "execution_count": 23, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[4. 4. 4. 4. 4.] tensor([3., 3., 3., 3., 3.], dtype=torch.float64)\n" ] } ], "source": [ "# converting with torch.tensor() does not share memory\n", "c = torch.tensor(a)\n", "a += 1\n", "print(a, c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2.2.6 `Tensor` on GPU" ] }, { "cell_type": "code", "execution_count": 24, "metadata": { "collapsed": true }, "outputs": [], "source": [ "# the following code only runs on a GPU build of PyTorch\n", "if torch.cuda.is_available():\n", " device = torch.device(\"cuda\") # GPU\n", " y = torch.ones_like(x, device=device) # directly create a Tensor on the GPU\n", " x = x.to(device) # equivalent to .to(\"cuda\")\n", " z = x + y\n", " print(z)\n", " print(z.to(\"cpu\", torch.double)) # to() can also change the dtype at the same time" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "collapsed": true }, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python [default]", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.3" } }, "nbformat": 4, "nbformat_minor": 1 }