Tensor

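Tensors are a specialized data structure, very similar to arrays and matrices. In PyTorch, we use tensors to encode the inputs and outputs of a model, as well as the model's parameters.
Tensors are similar to NumPy's ndarrays, except that tensors can run on GPUs or other specialized hardware to accelerate computation.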

import torch
import numpy as np

Tensor Initialization

Tensors can be initialized in several ways.

Directly from data. The data type is inferred automatically. torch.tensor(data)
data = [[1, 2], [3, 4]]
x_data = torch.tensor(data)
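
To make the inference rule concrete, the sketch below inspects the dtype that torch.tensor picks for a few inputs. The variable names are illustrative only, and the float32 results assume the default floating-point type has not been changed.

```python
import torch

# Python ints are inferred as 64-bit integers
int_tensor = torch.tensor([[1, 2], [3, 4]])
print(int_tensor.dtype)       # torch.int64

# Python floats are inferred as the default floating-point type
float_tensor = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
print(float_tensor.dtype)     # torch.float32

# A mix of ints and floats is promoted to floating point
mixed_tensor = torch.tensor([1, 2.5])
print(mixed_tensor.dtype)     # torch.float32

# The inferred type can be overridden explicitly
explicit_tensor = torch.tensor([[1, 2], [3, 4]], dtype=torch.float64)
print(explicit_tensor.dtype)  # torch.float64
```
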
From a NumPy array. Tensors can be created from NumPy arrays, and vice versa. torch.from_numpy(np_array)
np_array = np.array(data)
x_np = torch.from_numpy(np_array)

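From another tensor. The new tensor retains the properties (shape, data type) of the argument tensor unless they are explicitly overridden. torch.ones_like(tensor, dtype=...) & torch.rand_like(tensor, dtype=...)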

x_ones = torch.ones_like(x_data)
# Retains the properties of x_data:
# torch.ones_like creates a new tensor with the same shape and data type
# as the given tensor, with every element set to 1.
print(f"Ones Tensor: \n {x_ones} \n")

x_rand = torch.rand_like(x_data, dtype=torch.float)
# Overrides the data type of x_data:
# torch.rand_like creates a new tensor with the same shape as the given tensor,
# filled with floats sampled uniformly from [0, 1). The dtype is overridden here
# because x_data holds integers, which cannot represent those samples.
print(f"Random Tensor: \n {x_rand} \n")

out:
Ones Tensor:
tensor([[1, 1],
    [1, 1]])

Random Tensor:
tensor([[0.8823, 0.9150],
    [0.3829, 0.9593]])

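With random or constant values. shape is a tuple of tensor dimensions; in the functions below, it determines the dimensionality of the output tensor. torch.rand(shape) & torch.ones(shape) & torch.zeros(shape)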

shape = (2, 3,)
rand_tensor = torch.rand(shape)
ones_tensor = torch.ones(shape)
zeros_tensor = torch.zeros(shape)

print(f"Random Tensor: \n {rand_tensor} \n")
print(f"Ones Tensor: \n {ones_tensor} \n")
print(f"Zeros Tensor: \n {zeros_tensor}")

out:
Random Tensor:
tensor([[0.3904, 0.6009, 0.2566],
    [0.7936, 0.9408, 0.1332]])

Ones Tensor:
tensor([[1., 1., 1.],
    [1., 1., 1.]])

Zeros Tensor:
tensor([[0., 0., 0.],
    [0., 0., 0.]])
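
The random values above differ from run to run. A minimal sketch of two related options, assuming reproducibility or a non-default dtype is wanted: seeding the global generator with torch.manual_seed, and passing dtype to a constant-value constructor. The names first, second, int_zeros and sevens are illustrative.

```python
import torch

shape = (2, 3)

# Seeding the global generator makes torch.rand reproducible
torch.manual_seed(0)
first = torch.rand(shape)
torch.manual_seed(0)
second = torch.rand(shape)
print(torch.equal(first, second))  # True

# Constant-value constructors accept an explicit dtype
int_zeros = torch.zeros(shape, dtype=torch.int32)
# torch.full fills the tensor with an arbitrary constant
sevens = torch.full(shape, 7.0)
print(int_zeros.dtype, sevens.shape)
```
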

Tensor Attributes

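Tensor attributes describe their shape, data type, and the device on which they are stored. These are attributes, not methods, so they are read without parentheses. tensor.shape & tensor.dtype & tensor.device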

tensor = torch.rand(3, 4)

print(f"Shape of tensor: {tensor.shape}")
print(f"Datatype of tensor: {tensor.dtype}")
print(f"Device tensor is stored on: {tensor.device}")

out:
Shape of tensor: torch.Size([3, 4])
Datatype of tensor: torch.float32
Device tensor is stored on: cpu
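
A short sketch of how the shape attribute behaves in practice. It relies on torch.Size being a tuple subclass; rows and cols are illustrative names.

```python
import torch

tensor = torch.rand(3, 4)

# shape is an attribute; size() is the equivalent method
print(tensor.shape == tensor.size())  # True

# torch.Size is a tuple subclass, so it unpacks and compares like a tuple
rows, cols = tensor.shape
print(rows, cols)                     # 3 4
print(tensor.shape == (3, 4))         # True

# Related helpers: number of dimensions and total number of elements
print(tensor.ndim, tensor.numel())    # 2 12
```
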

Tensor Operations

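Tensor operations include transposing, indexing, slicing, mathematical operations, linear algebra, random sampling, and more. Each of them can run on the GPU (typically faster than on a CPU). torch.cuda.is_available() & tensor.to('cuda')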

# We move our tensor to the GPU if available
if torch.cuda.is_available():
    tensor = tensor.to('cuda')
    print(f"Device tensor is stored on: {tensor.device}")

out:
Device tensor is stored on: cuda:0
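
The check above can be folded into a single device object, so the same code runs with or without a GPU. This is one common pattern rather than the only one; device, created and moved are illustrative names.

```python
import torch

# Pick the GPU when one is visible to PyTorch, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create a tensor directly on the chosen device
created = torch.rand(3, 4, device=device)

# Or move an existing CPU tensor to the chosen device
moved = torch.ones(3, 4).to(device)

print(created.device.type, moved.device.type)
```
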

Standard NumPy-like indexing and slicing: tensor[:, n] & tensor[n, :]

tensor = torch.ones(4, 4)
tensor[:,1] = 0
# In tensor[:, 1], the slice operator : selects every row of the tensor.
# 1 is a zero-based index, so it selects the second column.
# tensor[:,1] = 0 therefore sets every element of the second column to 0.
print(tensor)

out:
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])
# The column at index 1 (the second column) has been set to 0.
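
A few more selections, sketched on a tensor with distinct values so that each result is easy to recognize. The tensor t is illustrative and not part of the example above.

```python
import torch

# arange + reshape gives the values 0..15 laid out row by row
t = torch.arange(16).reshape(4, 4)

first_row = t[0]         # tensor([0, 1, 2, 3])
first_col = t[:, 0]      # tensor([ 0,  4,  8, 12])
last_col = t[..., -1]    # tensor([ 3,  7, 11, 15])
block = t[1:3, 1:3]      # rows 1-2 and columns 1-2: [[5, 6], [9, 10]]

print(first_row, first_col, last_col, block, sep="\n")
```
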

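Joining tensors: torch.cat concatenates a sequence of tensors along a given dimension. See also torch.stack, which differs subtly from torch.cat: it joins the tensors along a new dimension.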

t1 = torch.cat([tensor, tensor, tensor], dim=1)
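# dim=1 concatenates along dimension 1 (the column direction): the tensors are joined side by side.
# dim=0 would concatenate along dimension 0 (the row direction): the tensors are joined top to bottom.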
print(t1)

out:
tensor([[1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.],
        [1., 0., 1., 1., 1., 0., 1., 1., 1., 0., 1., 1.]])
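
The difference between torch.cat and torch.stack is easiest to see in the resulting shapes. A minimal sketch with two illustrative 4x4 tensors:

```python
import torch

a = torch.ones(4, 4)
b = torch.zeros(4, 4)

# cat joins along an existing dimension, so that dimension grows
cat_rows = torch.cat([a, b], dim=0)
cat_cols = torch.cat([a, b], dim=1)
print(cat_rows.shape)  # torch.Size([8, 4])
print(cat_cols.shape)  # torch.Size([4, 8])

# stack inserts a new dimension, so the inputs must have identical shapes
stacked = torch.stack([a, b], dim=0)
print(stacked.shape)   # torch.Size([2, 4, 4])
```
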

Multiplying tensors: * and tensor.mul(tensor) compute the element-wise product; @ and torch.matmul(tensor1, tensor2) compute the matrix product.

# This computes the element-wise product
print(f"tensor.mul(tensor) \n {tensor.mul(tensor)} \n")
# Alternative syntax:
print(f"tensor * tensor \n {tensor * tensor}")

out:
tensor.mul(tensor)
 tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

tensor * tensor
 tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

print(f"tensor.matmul(tensor.T) \n {tensor.matmul(tensor.T)} \n")
# Alternative syntax:
print(f"tensor @ tensor.T \n {tensor @ tensor.T}")
# tensor.T is shorthand for the transpose of a 2-D tensor

out:
tensor.matmul(tensor.T)
 tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]])

tensor @ tensor.T
 tensor([[3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.],
        [3., 3., 3., 3.]])
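
With a tensor of ones and zeros the two products are hard to tell apart. The sketch below repeats both on an illustrative 2x2 matrix m where the results differ.

```python
import torch

m = torch.tensor([[1., 2.], [3., 4.]])

# Element-wise product: each entry is multiplied by the entry in the same position
elementwise = m * m
print(elementwise)  # tensor([[ 1.,  4.], [ 9., 16.]])

# Matrix product: rows of the left operand dotted with columns of the right operand
matrix = m @ m
print(matrix)       # tensor([[ 7., 10.], [15., 22.]])
```
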

In-place operations: operations that have a _ suffix are in-place. For example, x.copy_(y), x.t_(), and x.add_(n) will change x.

print(tensor, "\n")
tensor.add_(5)
print(tensor)

out:
tensor([[1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.],
        [1., 0., 1., 1.]])

tensor([[6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.],
        [6., 5., 6., 6.]])

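In-place operations save some memory, but they can be problematic when computing derivatives because of an immediate loss of history. Hence, their use is discouraged.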
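
Both halves of that trade-off can be observed directly. The sketch below compares storage addresses via data_ptr() and then shows one case where autograd refuses an in-place update; the flag in_place_rejected is an illustrative name.

```python
import torch

x = torch.ones(3)
address = x.data_ptr()

# In-place: the result is written into the existing storage
x.add_(1)
print(x.data_ptr() == address)  # True

# Out-of-place: a new tensor with its own storage is allocated
y = x + 1
print(y.data_ptr() == address)  # False

# autograd rejects an in-place update of a leaf tensor that requires grad
w = torch.ones(3, requires_grad=True)
try:
    w.add_(1)
    in_place_rejected = False
except RuntimeError as err:
    in_place_rejected = True
    print(f"RuntimeError: {err}")
```
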

Bridge with NumPy

Tensors on the CPU and NumPy arrays can share their underlying memory locations, so changing one changes the other.

Tensor to NumPy array

t.numpy()

t = torch.ones(5)
print(f"t: {t}")
n = t.numpy()
print(f"n: {n}")

out:
t: tensor([1., 1., 1., 1., 1.])
n: [1. 1. 1. 1. 1.]

t.add_(1)
print(f"t: {t}")
print(f"n: {n}")

out:
t: tensor([2., 2., 2., 2., 2.])
n: [2. 2. 2. 2. 2.]

NumPy array to Tensor

torch.from_numpy(n)

n = np.ones(5)
t = torch.from_numpy(n)

np.add(n, 1, out=n)
print(f"t: {t}")
print(f"n: {n}")

out:
t: tensor([2., 2., 2., 2., 2.], dtype=torch.float64)
n: [2. 2. 2. 2. 2.]
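
When the shared memory is not wanted, the data can be copied instead. A minimal sketch contrasting torch.from_numpy (shares memory) with torch.tensor (copies); shared and copied are illustrative names.

```python
import numpy as np
import torch

n = np.ones(3)

shared = torch.from_numpy(n)  # shares memory with n
copied = torch.tensor(n)      # torch.tensor copies its input

np.add(n, 1, out=n)           # modify the NumPy array in place

print(shared)  # tensor([2., 2., 2.], dtype=torch.float64)
print(copied)  # tensor([1., 1., 1.], dtype=torch.float64)
```
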

A Gentle Introduction to torch.autograd

torch.autograd is PyTorch's automatic differentiation engine that powers neural network training.

Background

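Neural networks (NNs) are a collection of nested functions that are executed on some input data. These functions are defined by parameters (consisting of weights and biases), which in PyTorch are stored in tensors.
Training an NN happens in two steps:
1. Forward propagation: the NN makes its best guess about the correct output. It runs the input data through each of its functions to make this guess.
2. Backward propagation: the NN adjusts its parameters in proportion to the error in its guess. It does this by traversing backwards from the output, collecting the derivatives of the error with respect to the parameters of the functions (the gradients), and optimizing the parameters using gradient descent.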
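
The two steps above can be sketched on the smallest possible case. This is a hypothetical one-parameter model (prediction = w * x) with made-up data and a hand-written gradient descent update, not a realistic network.

```python
import torch

# Hypothetical data: the target is y = 6 for the input x = 2, so the ideal w is 3
x, y = torch.tensor(2.0), torch.tensor(6.0)
w = torch.tensor(0.0, requires_grad=True)
lr = 0.1

for _ in range(20):
    # Forward propagation: make a guess and measure its squared error
    loss = (w * x - y) ** 2
    # Backward propagation: compute d(loss)/dw and store it in w.grad
    loss.backward()
    # Gradient descent: step against the gradient without recording the update
    with torch.no_grad():
        w -= lr * w.grad
    # Clear the gradient so the next backward pass starts from zero
    w.grad.zero_()

print(w.item())  # close to 3.0
```
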

Usage in PyTorch

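Example: we load a pretrained resnet18 model from torchvision. We create a random data tensor to represent a single image with 3 channels and a height and width of 64, and initialize its corresponding label to some random values. Labels in pretrained models have shape (1, 1000).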

import torch
from torchvision.models import resnet18, ResNet18_Weights
model = resnet18(weights=ResNet18_Weights.DEFAULT)
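data = torch.rand(1, 3, 64, 64)  # 4-D tensor: (batch, channels, height, width)
labels = torch.rand(1, 1000)  # 2-D tensor: (batch, classes)

# Forward pass: run the input data through the model to get a prediction
prediction = model(data)

# Compute the error and backpropagate it with .backward(): autograd calculates the
# gradient for each model parameter and stores it in the parameter's .grad attribute
loss = (prediction - labels).sum()
# .sum() adds up all elements; .sum(dim=0) sums along dimension 0 (one total per column),
# and .sum(dim=1) sums along dimension 1 (one total per row)
loss.backward()  # backward pass

# Load an optimizer, here SGD with a learning rate of 0.01 and momentum of 0.9.
# The optimizer's job is to adjust the model parameters according to these gradients.
# model.parameters() returns all trainable parameters of the model, which are
# registered with the optimizer so that it can update them.
optim = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)

# Call .step() to initiate gradient descent. The optimizer adjusts each parameter
# by its gradient stored in .grad
optim.step()  # gradient descent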
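
The example above performs a single step. In a loop, gradients have to be cleared between iterations because .grad accumulates. Below is a minimal sketch of that loop, using a small nn.Linear as a hypothetical stand-in for resnet18 (so nothing is downloaded) and a mean squared error chosen only for illustration.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-ins for the pretrained model and the image batch
model = nn.Linear(4, 2)
data = torch.rand(8, 4)
labels = torch.rand(8, 2)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
weight_before = model.weight.detach().clone()

for _ in range(3):
    optimizer.zero_grad()                       # clear gradients from the previous step
    prediction = model(data)                    # forward pass
    loss = ((prediction - labels) ** 2).mean()  # mean squared error
    loss.backward()                             # backward pass fills .grad
    optimizer.step()                            # gradient descent update

weights_changed = not torch.equal(weight_before, model.weight.detach())
print(weights_changed)  # True
```
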

Differentiation in Autograd

How does autograd collect gradients?

import torch

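# 2. and 3. are floating-point literals; requires_grad=True marks the tensor
# as one whose gradient should be computed
a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)

Q = 3*a**3 - b**2

# Assume a and b are the parameters of an NN, and Q is the error

# When .backward() is called on Q, autograd calculates these gradients and stores
# them in the respective tensors' .grad attribute.
# Q is a vector, so a gradient argument must be passed explicitly to Q.backward().
# gradient is a tensor of the same shape as Q, and it represents the gradient of Q
# with respect to itself, i.e. dQ/dQ = 1.
# Equivalently, Q can be aggregated into a scalar and backward called implicitly,
# as in Q.sum().backward()
external_grad = torch.tensor([1., 1.])
Q.backward(gradient=external_grad)

# The gradients are now deposited in a.grad and b.grad
# check if collected gradients are correct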
print(9*a**2 == a.grad)
print(-2*b == b.grad)

out:
tensor([True, True])
tensor([True, True])
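
The scalar route mentioned in the comments can be checked the same way. A short sketch that rebuilds a, b and Q and calls backward on the sum, which should give the analytic gradients 9a^2 and -2b:

```python
import torch

a = torch.tensor([2., 3.], requires_grad=True)
b = torch.tensor([6., 4.], requires_grad=True)
Q = 3*a**3 - b**2

# Summing first yields a scalar, so backward() needs no gradient argument.
# This matches Q.backward(gradient=torch.tensor([1., 1.])).
Q.sum().backward()

print(a.grad)  # dQ/da = 9a^2 -> tensor([36., 81.])
print(b.grad)  # dQ/db = -2b  -> tensor([-12.,  -8.])
```
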

Optional Reading - Vector Calculus using autograd
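
As a brief sketch of the idea behind this heading: for a vector-valued function y = f(x), autograd does not build the full Jacobian J but computes a vector-Jacobian product with a supplied vector v. The symbols l, J and v are notation introduced here for illustration; in the example above, external_grad plays the role of v.

```latex
J = \begin{pmatrix}
\frac{\partial y_1}{\partial x_1} & \cdots & \frac{\partial y_1}{\partial x_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial y_m}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{pmatrix},
\qquad
\vec{v} = \frac{\partial l}{\partial \vec{y}},
\qquad
J^{T} \vec{v} = \frac{\partial l}{\partial \vec{x}}

% For Q_i = 3 a_i^3 - b_i^2 the Jacobian with respect to a is diagonal:
J_a = \operatorname{diag}\left(9 a_1^2,\; 9 a_2^2\right),
\qquad
\vec{v} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}
\;\Rightarrow\;
J_a^{T} \vec{v} = \begin{pmatrix} 9 a_1^2 \\ 9 a_2^2 \end{pmatrix}
```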

Computational Graph

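autograd keeps a record of data (tensors) and all executed operations (along with the resulting new tensors) in a directed acyclic graph (DAG) consisting of Function objects. In this DAG, leaves are the input tensors and roots are the output tensors. By tracing this graph from roots to leaves, you can automatically compute the gradients using the chain rule.
In a forward pass, autograd does two things simultaneously:
1. runs the requested operation to compute a resulting tensor, and
2. maintains the operation's gradient function in the DAG.
The backward pass kicks off when .backward() is called on the DAG root. autograd then:
1. computes the gradients from each .grad_fn,
2. accumulates them in the respective tensor's .grad attribute, and
3. using the chain rule, propagates all the way to the leaf tensors.
[Figure: visual representation of the DAG for the example above.] In the graph, the arrows point in the direction of the forward pass. The nodes represent the backward functions of each operation in the forward pass. The leaf nodes in blue represent the leaf tensors a and b.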
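
The pieces named above (leaf tensors, .grad_fn, accumulation in .grad) can be inspected directly. A minimal sketch on a single scalar; the names out, first and second are illustrative.

```python
import torch

a = torch.tensor(2.0, requires_grad=True)

# A tensor created by the user is a leaf and has no grad_fn;
# the result of an operation records the backward function that produced it
out = a ** 2
print(a.is_leaf, a.grad_fn)               # True None
print(out.is_leaf, out.grad_fn is None)   # False False

# d(a^2)/da = 2a = 4 at a = 2
out.backward()
first = a.grad.clone()

# A second backward pass adds to .grad rather than overwriting it
(a ** 2).backward()
second = a.grad.clone()
print(first, second)  # tensor(4.) tensor(8.)

# Reset the accumulated gradient before the next pass
a.grad.zero_()
```
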

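DAGs are dynamic in PyTorch. After each .backward() call, autograd starts populating a new graph.
torch.autograd tracks operations on all tensors whose requires_grad flag is set to True. For tensors that do not require gradients, setting this attribute to False excludes them from the gradient computation DAG. The output tensor of an operation requires gradients even if only a single input tensor has requires_grad=True.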

x = torch.rand(5, 5)
y = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)

a = x + y
print(f"Does `a` require gradients?: {a.requires_grad}")
b = x + z
print(f"Does `b` require gradients?: {b.requires_grad}")

out:
Does `a` require gradients?: False
Does `b` require gradients?: True
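
Besides the requires_grad flag, tracking can also be switched off for a block of code or for a single tensor. A short sketch using torch.no_grad() and detach(); the names untracked, detached and tracked are illustrative.

```python
import torch

x = torch.rand(5, 5)
z = torch.rand((5, 5), requires_grad=True)

# Inside no_grad, operations are not recorded even if an input requires grad
with torch.no_grad():
    untracked = x + z
print(untracked.requires_grad)  # False

# detach() returns a tensor that shares data with z but is cut off from the graph
detached = z.detach()
print(detached.requires_grad)   # False

# Outside the context manager, tracking resumes
tracked = x + z
print(tracked.requires_grad)    # True
```
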

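In an NN, parameters that do not compute gradients are usually called frozen parameters. In finetuning, we freeze most of the model and typically modify only the classifier layers to make predictions on new labels.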

from torch import nn, optim

model = resnet18(weights=ResNet18_Weights.DEFAULT)

# Freeze all the parameters in the network
for param in model.parameters():
    param.requires_grad = False

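# In resnet, the classifier is the last linear layer, model.fc. Replace it with a
# new linear layer (unfrozen by default) that acts as the classifier
model.fc = nn.Linear(512, 10)

# All parameters in the model are now frozen except those of model.fc. The only
# parameters that compute gradients are the weights and bias of model.fc
# Optimize only the classifier
optimizer = optim.SGD(model.parameters(), lr=1e-2, momentum=0.9)
# Although all parameters are registered in the optimizer, the only parameters that
# compute gradients (and hence are updated in gradient descent) are the
# classifier's weights and bias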
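
The effect of freezing can be verified on a network small enough to inspect. The sketch below uses a hypothetical two-layer nn.Sequential in place of resnet18 and, as one possible variation, passes only the trainable parameters to the optimizer.

```python
import torch
from torch import nn, optim

torch.manual_seed(0)

# Hypothetical small network standing in for resnet18
model = nn.Sequential(nn.Linear(8, 4), nn.ReLU(), nn.Linear(4, 2))

# Freeze everything, then swap in a fresh (unfrozen) classifier head
for param in model.parameters():
    param.requires_grad = False
model[2] = nn.Linear(4, 3)

# Hand only the trainable parameters to the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = optim.SGD(trainable, lr=1e-2, momentum=0.9)
num_trainable = sum(p.numel() for p in trainable)
print(num_trainable)  # 15 = 4*3 weights + 3 biases

loss = model(torch.rand(5, 8)).sum()
loss.backward()
optimizer.step()

print(model[0].weight.grad)              # None: frozen layers receive no gradient
print(model[2].weight.grad is not None)  # True
```

Filtering the parameters is optional here: parameters without a gradient are left unchanged by the optimizer either way, but the filtered list makes the intent explicit.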