Autoregressive Models

Explicit Density Estimation: Autoregressive Models

  • Goal: obtain an explicit density function p(x) = f(x, W)

  • Given a dataset x^(1), x^(2), …, x^(n), train the model by maximum likelihood estimation:

  W∗ = argmax_W ∑_{i=1}^{n} log p(x^(i); W)

  • Autoregressive Models:
    1. Assume each x consists of multiple sub-parts (dimensions): x = (x1, x2, …, xT)
    2. Decompose the probability with the chain rule:
      p(x) = ∏_{t=1}^{T} p(x_t | x_1, …, x_{t−1})
    3. Substitute this factorization into the loss above to solve for W∗: predict the next part, conditioning on the history as known context (see the sketch below)
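
A minimal runnable sketch of this idea (illustrative, not from the lecture): a small GRU models p(x_t | x_1, …, x_{t−1}) over 256 discrete values, and the training loss is the summed negative log-likelihood from the chain rule. The class name ARModel and all sizes are made up for the example.

import torch
import torch.nn as nn

class ARModel(nn.Module):
    def __init__(self, vocab_size=256, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)  # logits for the next symbol at each position

model = ARModel()
x = torch.randint(0, 256, (4, 16))    # a batch of discrete sequences
logits = model(x[:, :-1])             # condition on the history x_1..x_{t-1}
# NLL loss: sum over t of -log p(x_t | x_<t), i.e. maximum likelihood
loss = nn.functional.cross_entropy(logits.reshape(-1, 256), x[:, 1:].reshape(-1))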

PixelCNN and PixelRNN

  • Generate an image one pixel at a time, starting from the top-left corner

  • Use an RNN or CNN to compute the probability of each pixel, which depends on the hidden state and on the RGB values of the pixels to the left and above:

[Figure: PixelRNN/PixelCNN — each pixel depends on the pixels to its left and above]

  • At each pixel, predict the red channel, then green, then blue, each with a softmax over [0, 1, …, 255]

[Figure: per-channel 256-way softmax prediction]
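
The core PixelCNN mechanism can be sketched with a masked convolution (a hedged illustration; the mask conventions follow the PixelCNN paper, where type 'A' masks out the current pixel and type 'B' keeps it):

import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.ones_like(self.weight)
        # Zero out the current pixel ('A') or everything to its right ('B') ...
        mask[:, :, kH // 2, kW // 2 + (mask_type == 'B'):] = 0
        mask[:, :, kH // 2 + 1:] = 0   # ... and all rows below
        self.register_buffer('mask', mask)

    def forward(self, x):
        self.weight.data *= self.mask  # each output sees only left/above pixels
        return super().forward(x)

conv = MaskedConv2d('A', 1, 64, kernel_size=7, padding=3)
out = conv(torch.randn(1, 1, 28, 28))  # shape (1, 64, 28, 28)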

Generative Pretraining from Pixels

  • A Transformer used as an autoregressive model over pixels:

[Figure: Generative Pretraining from Pixels (Image GPT)]

Variational Autoencoder

Autoencoder

  • Autoencoder: an encoder compresses the data (e.g. a CNN with downsampling) and a decoder reconstructs it (upsampling); commonly used for dimensionality reduction on unlabeled data

  • The encoder e and decoder d are typically neural networks

  • PCA is a linear autoencoder:

  (e∗, d∗) = argmin_{e,d} ε(x, d(e(x)))

  • Here the loss ε can be an L2 loss or a cross-entropy loss; the aim is for the reconstruction d(e(x)) to recover the original input, compared against x:

[Figure: encoder–decoder reconstruction, x → e(x) → d(e(x)) ≈ x]
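
As a concrete illustration of the objective above, a minimal linear autoencoder trained with an L2 loss (all sizes are arbitrary; with purely linear maps like these, the optimum spans the same subspace PCA finds):

import torch
import torch.nn as nn

encoder = nn.Linear(784, 32)   # e: compress a flattened 28x28 image to 32 dims
decoder = nn.Linear(32, 784)   # d: reconstruct the image
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(128, 784)                  # stand-in batch of flattened images
x_hat = decoder(encoder(x))
loss = nn.functional.mse_loss(x_hat, x)   # epsilon(x, d(e(x))) with L2 loss
opt.zero_grad(); loss.backward(); opt.step()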

  • The autoencoder's embedding (yellow) is more structured than PCA's (blue) and works better

[Figure: autoencoder (yellow) vs. PCA (blue) embeddings]

Autoencoder is not a Generative Model

  • Without proper regularization, samples from the latent space may be meaningless: since the latent space is unconstrained, an input outside the training "dictionary" (e.g. a point lying in the middle ground between two encodings) can decode to an unreasonable output

[Figure: decoding arbitrary latent points in an unregularized autoencoder]

  • Reparameterization trick: z = μ_x + σ_x · δ, where δ ~ N(0, 1); sampling itself is not differentiable, but z written this way is differentiable with respect to μ_x and σ_x

  • With the reparameterization trick, the encoder predicts a distribution rather than a single point, so regions of the latent space all receive some coverage and smooth transitions between inputs can be reconstructed

  • δ is noise, which can lead to collapse, and the gaps in the middle of the latent space can widen; a KL-loss regularizer is therefore added to pull μ_x toward 0 and σ_x toward 1:

  KL(N(μ_x, σ_x²) ‖ N(0, 1)) = ½ (σ_x² + μ_x² − 1 − log σ_x²)
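
In code, the reparameterization trick and KL regularizer look like this (a minimal sketch; mu and logvar would come from the encoder, x_hat from the decoder, and the shapes are placeholders):

import torch

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term (L2 here; cross-entropy is also common)
    rec = torch.nn.functional.mse_loss(x_hat, x, reduction='sum')
    # KL(N(mu, sigma^2) || N(0, 1)) = 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2)
    kl = 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1 - logvar)
    return rec + kl

mu, logvar = torch.zeros(8, 16), torch.zeros(8, 16)
delta = torch.randn_like(mu)
z = mu + torch.exp(0.5 * logvar) * delta   # z = mu_x + sigma_x * delta, differentiable in mu, logvar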

  • Variational autoencoder: an autoencoder whose latent space has good properties, enabling a generative process

[Figure: VAE — encoder outputs (μ_x, σ_x), decoder reconstructs from sampled z]

Generative Adversarial Network (GAN)

  • Random number generation (generate uniform random numbers) with a linear congruential generator:

  X_{n+1} = (a · X_n + c) mod m
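
A minimal sketch of such a generator (the constants a, c, m here are the well-known Numerical Recipes parameters, chosen only for illustration):

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m           # scale to a uniform sample in [0, 1)

gen = lcg(seed=42)
samples = [next(gen) for _ in range(5)]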

  • Other distributions: X = F^−1(U), where F(X) is the cumulative distribution function of X

[Figure: inverse-transform sampling via the CDF]
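
For example, inverse-transform sampling for an exponential distribution, where F(x) = 1 − e^{−λx} gives F^−1(u) = −ln(1 − u)/λ:

import numpy as np

u = np.random.uniform(size=10000)   # U ~ Uniform(0, 1)
lam = 2.0
x = -np.log(1 - u) / lam            # X = F^{-1}(U) ~ Exponential(lam)
print(x.mean())                     # ~= 1 / lam = 0.5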

  • GAN: fit the inverse of F(X) with a neural network:

    1. G is the generator: a neural network that produces samples X intended to look as similar as possible to the real samples X(g)
    2. D is the discriminator: a binary classifier trained to output 0 for generated X and 1 for real X(g)
    3. Train G and D until the discriminator's accuracy drops to 1/2, meaning it can no longer tell generated X from real X(g) (the equilibrium)
  • If P(X(g)) were known, we could compute F^−1(x) directly; but it is unknown, and we only have samples drawn from this distribution
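
This adversarial game is usually written as the standard GAN minimax objective (implied but not written out above):

  min_G max_D V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p(z)}[log(1 − D(G(z)))]

At the equilibrium, D outputs 1/2 everywhere, matching the 1/2-accuracy stopping condition in step 3.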

  • GAN:

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

class Generator(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(True),
            nn.Linear(128, 256),
            nn.ReLU(True),
            nn.Linear(256, 512),
            nn.ReLU(True),
            nn.Linear(512, 1024),
            nn.ReLU(True),
            nn.Linear(1024, output_dim),
            nn.Tanh()
        )

    def forward(self, z):
        return self.model(z)

class Discriminator(nn.Module):
    def __init__(self, input_dim):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(input_dim, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

z_dim = 100       # dimensionality of the latent space
image_dim = 784   # dimensionality of an MNIST image (28x28)

# MNIST data, normalized to [-1, 1] to match the generator's Tanh output
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
train_set = torchvision.datasets.MNIST(root='./data', train=True,
                                       transform=transform, download=True)
train_loader = Data.DataLoader(train_set, batch_size=128, shuffle=True)

generator = Generator(z_dim, image_dim).to(device)
discriminator = Discriminator(image_dim).to(device)

criterion = nn.BCELoss()
optimizer_g = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_d = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

num_epochs = 100
for epoch in range(num_epochs):
    for i, (real_images, _) in enumerate(train_loader):
        real_images = real_images.view(-1, image_dim).to(device)
        batch_size = real_images.size(0)

        # Labels: 1 for real, 0 for fake
        real_labels = torch.ones(batch_size, 1).to(device)
        fake_labels = torch.zeros(batch_size, 1).to(device)

        # Train the discriminator
        optimizer_d.zero_grad()

        # Real images
        outputs = discriminator(real_images)
        d_loss_real = criterion(outputs, real_labels)
        d_loss_real.backward()

        # Generated (fake) images; detach so gradients skip the generator
        z = torch.randn(batch_size, z_dim).to(device)
        fake_images = generator(z)
        outputs = discriminator(fake_images.detach())
        d_loss_fake = criterion(outputs, fake_labels)
        d_loss_fake.backward()

        d_loss = d_loss_real + d_loss_fake
        optimizer_d.step()

        # Train the generator: make D label the fakes as real
        optimizer_g.zero_grad()
        outputs = discriminator(fake_images)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        optimizer_g.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], d_loss: {d_loss.item()}, g_loss: {g_loss.item()}')

# Sample and display a grid of generated digits
z = torch.randn(64, z_dim).to(device)
fake_images = generator(z)
fake_images = fake_images.view(-1, 1, 28, 28)
fake_images = fake_images.cpu().detach().numpy()

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for i, ax in enumerate(axes.flatten()):
    ax.imshow(fake_images[i][0], cmap='gray')
    ax.axis('off')
plt.show()

[Figure: MNIST digits generated by the trained GAN]

The Generator of DCGAN

[Figures: DCGAN generator architecture and generated samples]

StyleGAN

[Figure: StyleGAN generator architecture]

Diffusion Models

  • Diffusion models generate samples via a forward process and a reverse process:
    1. Forward process: gradually add noise until only Gaussian noise remains
    2. Reverse process: gradually remove the noise until a clean image is recovered

[Figures: forward (noising) and reverse (denoising) diffusion processes]

Forward Process

  • The forward process is a Markov chain, independent of earlier history; as t grows and the accumulated noise increases, x_t becomes Gaussian noise:

  q(x_t | x_{t−1}) = N(x_t; √(1 − β_t) · x_{t−1}, β_t I)

  q(x_t | x_0) = N(x_t; √ᾱ_t · x_0, (1 − ᾱ_t) I),  where α_t = 1 − β_t and ᾱ_t = ∏_{s=1}^{t} α_s
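
A short numeric check of the closed form above (illustrative linear β schedule):

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # beta_t schedule
alpha_bar = torch.cumprod(1 - betas, dim=0)      # alpha_bar_t = prod of alpha_s

x0 = torch.randn(1, 3, 32, 32)                   # stand-in clean image
t = 500
eps = torch.randn_like(x0)
x_t = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
print(alpha_bar[t])   # small by mid-schedule, so x_t is already mostly noise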

Inverse Process

  • Goal: estimate q(x_{t−1} | x_t); when T is large, each reverse step can be assumed Gaussian

  • Key observation: q(x_{t−1} | x_t, x_0) is also Gaussian:

  q(x_{t−1} | x_t, x_0) = N(x_{t−1}; μ̃_t(x_t, x_0), β̃_t I)

  μ̃_t(x_t, x_0) = (√ᾱ_{t−1} · β_t / (1 − ᾱ_t)) · x_0 + (√α_t · (1 − ᾱ_{t−1}) / (1 − ᾱ_t)) · x_t,   β̃_t = ((1 − ᾱ_{t−1}) / (1 − ᾱ_t)) · β_t

  • Diffusion Models (DDPM):
101
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class DiffusionModel(nn.Module):
    def __init__(self, image_size, channels, num_timesteps, betas=None):
        super(DiffusionModel, self).__init__()
        self.image_size = image_size
        self.channels = channels
        self.num_timesteps = num_timesteps

        # Beta schedule (linear here; other schedules also work)
        if betas is None:
            betas = torch.linspace(1e-4, 0.02, num_timesteps)
        self.register_buffer('betas', betas)
        self.register_buffer('alphas_cumprod', torch.cumprod(1.0 - betas, dim=0))

        # Noise-prediction network (a toy CNN; DDPM uses a time-conditioned U-Net)
        self.network = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, channels, 3, padding=1)
        )

    def forward(self, x_0, t):
        """Forward process: sample x_t ~ q(x_t | x_0) in closed form,
        then predict the added noise with the network."""
        noise = torch.randn_like(x_0)
        a_bar = self.alphas_cumprod[t].view(-1, 1, 1, 1)
        x_t = torch.sqrt(a_bar) * x_0 + torch.sqrt(1 - a_bar) * noise
        return self.network(x_t), noise

    def reverse_step(self, x_t, t):
        """One reverse (denoising) step: estimate x_{t-1} from x_t."""
        beta_t = self.betas[t]
        a_bar_t = self.alphas_cumprod[t]
        eps_pred = self.network(x_t)
        # Posterior mean of q(x_{t-1} | x_t, x_0), with x_0 expressed via eps_pred
        mean = (x_t - beta_t / torch.sqrt(1 - a_bar_t) * eps_pred) / torch.sqrt(1 - beta_t)
        if t > 0:
            mean = mean + torch.sqrt(beta_t) * torch.randn_like(x_t)
        return mean

def train_diffusion_model(model, dataloader, optimizer, num_epochs=10):
    for epoch in range(num_epochs):
        for batch_idx, (images, _) in enumerate(dataloader):
            images = images.to(device)

            # Draw a random timestep per image and noise it in closed form
            t = torch.randint(0, model.num_timesteps, (images.size(0),), device=device)
            eps_pred, eps = model(images, t)

            # Regress the predicted noise onto the true noise (DDPM objective)
            optimizer.zero_grad()
            loss = F.mse_loss(eps_pred, eps)
            loss.backward()
            optimizer.step()

            if batch_idx % 100 == 0:
                print(f"Epoch {epoch}/{num_epochs}, Batch {batch_idx}, Loss: {loss.item()}")

def sample_from_diffusion(model, shape=(1, 3, 32, 32)):
    # Start from pure Gaussian noise and denoise step by step
    x_t = torch.randn(shape).to(device)
    with torch.no_grad():
        for t in reversed(range(model.num_timesteps)):
            x_t = model.reverse_step(x_t, t)
    return x_t

# Example: assume you have a dataset DataLoader
dataloader = ...

# Initialize the model
model = DiffusionModel(image_size=32, channels=3, num_timesteps=1000).to(device)

# Set up the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Train the model
train_diffusion_model(model, dataloader, optimizer)

# Sample from the model
generated_images = sample_from_diffusion(model, shape=(16, 3, 32, 32))

DDPM 2020

[Figure: DDPM (2020) generated samples]

DALL·E 2022

[Figure: DALL·E (2022)]

Stable Diffusion 2022

[Figure: Stable Diffusion (2022) architecture]