{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# 2.2 数据操作"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"0.4.1\n"
]
}
],
"source": [
"import torch\n",
"\n",
"torch.manual_seed(0)\n",
"torch.cuda.manual_seed(0)\n",
"print(torch.__version__)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.1 创建`Tensor`\n",
"\n",
"创建一个5x3的未初始化的`Tensor`"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.0000e+00, 1.0842e-19, 1.6162e+22],\n",
" [2.8643e-42, 5.6052e-45, 0.0000e+00],\n",
" [0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [0.0000e+00, 0.0000e+00, 0.0000e+00],\n",
" [0.0000e+00, 1.0842e-19, 1.3314e+22]])\n"
]
}
],
"source": [
"x = torch.empty(5, 3)\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"创建一个5x3的随机初始化的`Tensor`:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0.4963, 0.7682, 0.0885],\n",
" [0.1320, 0.3074, 0.6341],\n",
" [0.4901, 0.8964, 0.4556],\n",
" [0.6323, 0.3489, 0.4017],\n",
" [0.0223, 0.1689, 0.2939]])\n"
]
}
],
"source": [
"x = torch.rand(5, 3)\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"创建一个5x3的long型全0的`Tensor`:\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[0, 0, 0],\n",
" [0, 0, 0],\n",
" [0, 0, 0],\n",
" [0, 0, 0],\n",
" [0, 0, 0]])\n"
]
}
],
"source": [
"x = torch.zeros(5, 3, dtype=torch.long)\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"直接根据数据创建:"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([5.5000, 3.0000])\n"
]
}
],
"source": [
"x = torch.tensor([5.5, 3])\n",
"print(x)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"还可以通过现有的`Tensor`来创建,此方法会默认重用输入`Tensor`的一些属性,例如数据类型,除非自定义数据类型。"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1., 1., 1.],\n",
" [1., 1., 1.],\n",
" [1., 1., 1.],\n",
" [1., 1., 1.],\n",
" [1., 1., 1.]], dtype=torch.float64)\n",
"tensor([[ 0.6035, 0.8110, -0.0451],\n",
" [ 0.8797, 1.0482, -0.0445],\n",
" [-0.7229, 2.8663, -0.5655],\n",
" [ 0.1604, -0.0254, 1.0739],\n",
" [ 2.2628, -0.9175, -0.2251]])\n"
]
}
],
"source": [
"x = x.new_ones(5, 3, dtype=torch.float64) # 返回的tensor默认具有相同的torch.dtype和torch.device\n",
"print(x)\n",
"\n",
"x = torch.randn_like(x, dtype=torch.float) # 指定新的数据类型\n",
"print(x) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"我们可以通过`shape`或者`size()`来获取`Tensor`的形状:"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([5, 3])\n",
"torch.Size([5, 3])\n"
]
}
],
"source": [
"print(x.size())\n",
"print(x.shape)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> 注意返回的torch.Size其实就是一个tuple, 支持所有tuple的操作。"
]
},
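{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, here is a quick sketch (reusing the `x` created above) of treating `torch.Size` like an ordinary tuple:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# torch.Size is a subclass of tuple, so it supports unpacking and indexing\n",
"rows, cols = x.size()  # tuple-style unpacking\n",
"print(rows, cols)      # 5 3\n",
"print(x.size()[0])     # tuple-style indexing: 5"
]
},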
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.2 操作\n",
"### 算术操作\n",
"* **加法形式一**"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 1.3967, 1.0892, 0.4369],\n",
" [ 1.6995, 2.0453, 0.6539],\n",
" [-0.1553, 3.7016, -0.3599],\n",
" [ 0.7536, 0.0870, 1.2274],\n",
" [ 2.5046, -0.1913, 0.4760]])\n"
]
}
],
"source": [
"y = torch.rand(5, 3)\n",
"print(x + y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* **加法形式二**"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 1.3967, 1.0892, 0.4369],\n",
" [ 1.6995, 2.0453, 0.6539],\n",
" [-0.1553, 3.7016, -0.3599],\n",
" [ 0.7536, 0.0870, 1.2274],\n",
" [ 2.5046, -0.1913, 0.4760]])\n"
]
}
],
"source": [
"print(torch.add(x, y))"
]
},
{
"cell_type": "code",
"execution_count": 10,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 1.3967, 1.0892, 0.4369],\n",
" [ 1.6995, 2.0453, 0.6539],\n",
" [-0.1553, 3.7016, -0.3599],\n",
" [ 0.7536, 0.0870, 1.2274],\n",
" [ 2.5046, -0.1913, 0.4760]])\n"
]
}
],
"source": [
"result = torch.empty(5, 3)\n",
"torch.add(x, y, out=result)\n",
"print(result)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"* **加法形式三、inplace**"
]
},
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 1.3967, 1.0892, 0.4369],\n",
" [ 1.6995, 2.0453, 0.6539],\n",
" [-0.1553, 3.7016, -0.3599],\n",
" [ 0.7536, 0.0870, 1.2274],\n",
" [ 2.5046, -0.1913, 0.4760]])\n"
]
}
],
"source": [
"# adds x to y\n",
"y.add_(x)\n",
"print(y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> **注PyTorch操作inplace版本都有后缀\"_\", 例如`x.copy_(y), x.t_()`**\n",
"\n",
"### 索引\n",
"我们还可以使用类似NumPy的索引操作来访问`Tensor`的一部分,需要注意的是:**索引出来的结果与原数据共享内存,也即修改一个,另一个会跟着修改。** "
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([1.6035, 1.8110, 0.9549])\n",
"tensor([1.6035, 1.8110, 0.9549])\n"
]
}
],
"source": [
"y = x[0, :]\n",
"y += 1\n",
"print(y)\n",
"print(x[0, :]) # 源tensor也被改了"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### 改变形状\n",
"用`view()`来改变`Tensor`的形状:"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"torch.Size([5, 3]) torch.Size([15]) torch.Size([3, 5])\n"
]
}
],
"source": [
"y = x.view(15)\n",
"z = x.view(-1, 5) # -1所指的维度可以根据其他维度的值推出来\n",
"print(x.size(), y.size(), z.size())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"**注意`view()`返回的新tensor与源tensor共享内存也即更改其中的一个另外一个也会跟着改变。**"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[2.6035, 2.8110, 1.9549],\n",
" [1.8797, 2.0482, 0.9555],\n",
" [0.2771, 3.8663, 0.4345],\n",
" [1.1604, 0.9746, 2.0739],\n",
" [3.2628, 0.0825, 0.7749]])\n",
"tensor([2.6035, 2.8110, 1.9549, 1.8797, 2.0482, 0.9555, 0.2771, 3.8663, 0.4345,\n",
" 1.1604, 0.9746, 2.0739, 3.2628, 0.0825, 0.7749])\n"
]
}
],
"source": [
"x += 1\n",
"print(x)\n",
"print(y) # 也加了1"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"如果不想共享内存,推荐先用`clone`创造一个副本然后再使用`view`。"
]
},
{
"cell_type": "code",
"execution_count": 15,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[ 1.6035, 1.8110, 0.9549],\n",
" [ 0.8797, 1.0482, -0.0445],\n",
" [-0.7229, 2.8663, -0.5655],\n",
" [ 0.1604, -0.0254, 1.0739],\n",
" [ 2.2628, -0.9175, -0.2251]])\n",
"tensor([2.6035, 2.8110, 1.9549, 1.8797, 2.0482, 0.9555, 0.2771, 3.8663, 0.4345,\n",
" 1.1604, 0.9746, 2.0739, 3.2628, 0.0825, 0.7749])\n"
]
}
],
"source": [
"x_cp = x.clone().view(15)\n",
"x -= 1\n",
"print(x)\n",
"print(x_cp)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"另外一个常用的函数就是`item()`, 它可以将一个标量`Tensor`转换成一个Python number"
]
},
{
"cell_type": "code",
"execution_count": 16,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([2.3466])\n",
"2.3466382026672363\n"
]
}
],
"source": [
"x = torch.randn(1)\n",
"print(x)\n",
"print(x.item())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.3 广播机制"
]
},
{
"cell_type": "code",
"execution_count": 17,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([[1, 2]])\n",
"tensor([[1],\n",
" [2],\n",
" [3]])\n",
"tensor([[2, 3],\n",
" [3, 4],\n",
" [4, 5]])\n"
]
}
],
"source": [
"x = torch.arange(1, 3).view(1, 2)\n",
"print(x)\n",
"y = torch.arange(1, 4).view(3, 1)\n",
"print(y)\n",
"print(x + y)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.4 运算的内存开销"
]
},
{
"cell_type": "code",
"execution_count": 18,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"False\n"
]
}
],
"source": [
"x = torch.tensor([1, 2])\n",
"y = torch.tensor([3, 4])\n",
"id_before = id(y)\n",
"y = y + x\n",
"print(id(y) == id_before)"
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n"
]
}
],
"source": [
"x = torch.tensor([1, 2])\n",
"y = torch.tensor([3, 4])\n",
"id_before = id(y)\n",
"y[:] = y + x\n",
"print(id(y) == id_before)"
]
},
{
"cell_type": "code",
"execution_count": 20,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"True\n"
]
}
],
"source": [
"x = torch.tensor([1, 2])\n",
"y = torch.tensor([3, 4])\n",
"id_before = id(y)\n",
"torch.add(x, y, out=y) # y += x, y.add_(x)\n",
"print(id(y) == id_before)"
]
},
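{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick sketch, the in-place alternatives mentioned in the comment above (`y += x` and `y.add_(x)`) likewise leave `y` at the same memory address:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"x = torch.tensor([1, 2])\n",
"y = torch.tensor([3, 4])\n",
"id_before = id(y)\n",
"y += x          # in-place addition\n",
"y.add_(x)       # also in-place\n",
"print(id(y) == id_before)  # True"
]
},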
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.5 `Tensor`和NumPy相互转换\n",
"**`numpy()`和`from_numpy()`这两个函数产生的`Tensor`和NumPy array实际是使用的相同的内存改变其中一个时另一个也会改变**\n",
"### `Tensor`转NumPy"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tensor([1., 1., 1., 1., 1.]) [1. 1. 1. 1. 1.]\n",
"tensor([2., 2., 2., 2., 2.]) [2. 2. 2. 2. 2.]\n",
"tensor([3., 3., 3., 3., 3.]) [3. 3. 3. 3. 3.]\n"
]
}
],
"source": [
"a = torch.ones(5)\n",
"b = a.numpy()\n",
"print(a, b)\n",
"\n",
"a += 1\n",
"print(a, b)\n",
"b += 1\n",
"print(a, b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### NumPy数组转`Tensor`"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[1. 1. 1. 1. 1.] tensor([1., 1., 1., 1., 1.], dtype=torch.float64)\n",
"[2. 2. 2. 2. 2.] tensor([2., 2., 2., 2., 2.], dtype=torch.float64)\n",
"[3. 3. 3. 3. 3.] tensor([3., 3., 3., 3., 3.], dtype=torch.float64)\n"
]
}
],
"source": [
"import numpy as np\n",
"a = np.ones(5)\n",
"b = torch.from_numpy(a)\n",
"print(a, b)\n",
"\n",
"a += 1\n",
"print(a, b)\n",
"b += 1\n",
"print(a, b)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"直接用`torch.tensor()`将NumPy数组转换成`Tensor`,该方法总是会进行数据拷贝,返回的`Tensor`和原来的数据不再共享内存。"
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"[4. 4. 4. 4. 4.] tensor([3., 3., 3., 3., 3.], dtype=torch.float64)\n"
]
}
],
"source": [
"# 用torch.tensor()转换时不会共享内存\n",
"c = torch.tensor(a)\n",
"a += 1\n",
"print(a, c)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## 2.2.6 `Tensor` on GPU"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {
"collapsed": true
},
"outputs": [],
"source": [
"# 以下代码只有在PyTorch GPU版本上才会执行\n",
"if torch.cuda.is_available():\n",
" device = torch.device(\"cuda\") # GPU\n",
" y = torch.ones_like(x, device=device) # 直接创建一个在GPU上的Tensor\n",
" x = x.to(device) # 等价于 .to(\"cuda\")\n",
" z = x + y\n",
" print(z)\n",
" print(z.to(\"cpu\", torch.double)) # to()还可以同时更改数据类型"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python [default]",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.3"
}
},
"nbformat": 4,
"nbformat_minor": 1
}