Autoregressive Models

Explicit Density Estimation: Autoregressive Models

  • Goal: obtain an explicit density function p(x) = f(x, W)

  • Given a dataset x^(1), x^(2), …, x^(n), train the model by maximum likelihood estimation:

  W∗ = argmax_W ∑_{i=1}^{n} log p(x^(i); W)

  • Autoregressive Models:
    1. Assume each x consists of multiple sub-parts (dimensions): x = (x1, x2, …, xT)
    2. Decompose the probability with the chain rule:
      p(x) = ∏_{t=1}^{T} p(x_t | x_1, …, x_{t−1})
    3. Substitute this factorization into the loss above to solve for W∗: predict the next part, conditioning on the history as known context (see the sketch below)
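
A minimal runnable sketch of this idea (illustrative, not from the lecture): a small GRU models p(x_t | x_1, …, x_{t−1}) over 256 discrete values, and the training loss is the summed negative log-likelihood from the chain rule. The class name ARModel and all sizes are made up for the example.

import torch
import torch.nn as nn

class ARModel(nn.Module):
    def __init__(self, vocab_size=256, hidden=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)

    def forward(self, x):
        h, _ = self.rnn(self.embed(x))
        return self.head(h)  # logits for the next symbol at each position

model = ARModel()
x = torch.randint(0, 256, (4, 16))    # a batch of discrete sequences
logits = model(x[:, :-1])             # condition on the history x_1..x_{t-1}
# NLL loss: sum over t of -log p(x_t | x_<t), i.e. maximum likelihood
loss = nn.functional.cross_entropy(logits.reshape(-1, 256), x[:, 1:].reshape(-1))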

PixelCNN and PixelRNN

  • Generate an image one pixel at a time, starting from the top-left corner

  • Use an RNN or CNN to compute the probability of each pixel, which depends on the hidden state and on the RGB values of the pixels to the left and above:

[Figure: PixelRNN/PixelCNN — each pixel depends on the pixels to its left and above]

  • At each pixel, predict the red channel, then green, then blue, each with a softmax over [0, 1, …, 255]

[Figure: per-channel 256-way softmax prediction]
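
The core PixelCNN mechanism can be sketched with a masked convolution (a hedged illustration; the mask conventions follow the PixelCNN paper, where type 'A' masks out the current pixel and type 'B' keeps it):

import torch
import torch.nn as nn

class MaskedConv2d(nn.Conv2d):
    def __init__(self, mask_type, *args, **kwargs):
        super().__init__(*args, **kwargs)
        kH, kW = self.kernel_size
        mask = torch.ones_like(self.weight)
        # Zero out the current pixel ('A') or everything to its right ('B') ...
        mask[:, :, kH // 2, kW // 2 + (mask_type == 'B'):] = 0
        mask[:, :, kH // 2 + 1:] = 0   # ... and all rows below
        self.register_buffer('mask', mask)

    def forward(self, x):
        self.weight.data *= self.mask  # each output sees only left/above pixels
        return super().forward(x)

conv = MaskedConv2d('A', 1, 64, kernel_size=7, padding=3)
out = conv(torch.randn(1, 1, 28, 28))  # shape (1, 64, 28, 28)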

Generative Pretraining from Pixels

  • A Transformer used as an autoregressive model over pixels:

[Figure: Generative Pretraining from Pixels (Image GPT)]

Variational Autoencoder

Autoencoder

  • Autoencoder: an encoder compresses the data (e.g. a CNN with downsampling) and a decoder reconstructs it (upsampling); commonly used for dimensionality reduction on unlabeled data

  • The encoder e and decoder d are typically neural networks

  • PCA is a linear autoencoder:

  (e∗, d∗) = argmin_{e,d} ε(x, d(e(x)))

  • Here the loss ε can be an L2 loss or a cross-entropy loss; the aim is for the reconstruction d(e(x)) to recover the original input, compared against x:

[Figure: encoder–decoder reconstruction, x → e(x) → d(e(x)) ≈ x]
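
As a concrete illustration of the objective above, a minimal linear autoencoder trained with an L2 loss (all sizes are arbitrary; with purely linear maps like these, the optimum spans the same subspace PCA finds):

import torch
import torch.nn as nn

encoder = nn.Linear(784, 32)   # e: compress a flattened 28x28 image to 32 dims
decoder = nn.Linear(32, 784)   # d: reconstruct the image
opt = torch.optim.Adam(list(encoder.parameters()) + list(decoder.parameters()), lr=1e-3)

x = torch.rand(128, 784)                  # stand-in batch of flattened images
x_hat = decoder(encoder(x))
loss = nn.functional.mse_loss(x_hat, x)   # epsilon(x, d(e(x))) with L2 loss
opt.zero_grad(); loss.backward(); opt.step()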

  • The autoencoder's embedding (yellow) is more structured than PCA's (blue) and works better

[Figure: autoencoder (yellow) vs. PCA (blue) embeddings]

Autoencoder is not a Generative Model

  • Without proper regularization, samples from the latent space may be meaningless: since the latent space is unconstrained, an input outside the training "dictionary" (e.g. a point lying in the middle ground between two encodings) can decode to an unreasonable output

[Figure: decoding arbitrary latent points in an unregularized autoencoder]

  • Reparameterization trick: z = μ_x + σ_x · δ, where δ ~ N(0, 1); sampling itself is not differentiable, but z written this way is differentiable with respect to μ_x and σ_x

  • With the reparameterization trick, the encoder predicts a distribution rather than a single point, so regions of the latent space all receive some coverage and smooth transitions between inputs can be reconstructed

  • δ is noise, which can lead to collapse, and the gaps in the middle of the latent space can widen; a KL-loss regularizer is therefore added to pull μ_x toward 0 and σ_x toward 1:

  KL(N(μ_x, σ_x²) ‖ N(0, 1)) = ½ (σ_x² + μ_x² − 1 − log σ_x²)
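
In code, the reparameterization trick and KL regularizer look like this (a minimal sketch; mu and logvar would come from the encoder, x_hat from the decoder, and the shapes are placeholders):

import torch

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction term (L2 here; cross-entropy is also common)
    rec = torch.nn.functional.mse_loss(x_hat, x, reduction='sum')
    # KL(N(mu, sigma^2) || N(0, 1)) = 0.5 * (sigma^2 + mu^2 - 1 - log sigma^2)
    kl = 0.5 * torch.sum(logvar.exp() + mu.pow(2) - 1 - logvar)
    return rec + kl

mu, logvar = torch.zeros(8, 16), torch.zeros(8, 16)
delta = torch.randn_like(mu)
z = mu + torch.exp(0.5 * logvar) * delta   # z = mu_x + sigma_x * delta, differentiable in mu, logvar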

  • Variational autoencoder: an autoencoder whose latent space has good properties, enabling a generative process

[Figure: VAE — encoder outputs (μ_x, σ_x), decoder reconstructs from sampled z]

Generative Adversarial Network (GAN)

  • Random number generation (generate uniform random numbers) with a linear congruential generator:

  X_{n+1} = (a · X_n + c) mod m
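
A minimal sketch of such a generator (the constants a, c, m here are the well-known Numerical Recipes parameters, chosen only for illustration):

def lcg(seed, a=1664525, c=1013904223, m=2**32):
    x = seed
    while True:
        x = (a * x + c) % m
        yield x / m           # scale to a uniform sample in [0, 1)

gen = lcg(seed=42)
samples = [next(gen) for _ in range(5)]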

  • Other distributions: X = F^−1(U), where F(X) is the cumulative distribution function of X

[Figure: inverse-transform sampling via the CDF]
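
For example, inverse-transform sampling for an exponential distribution, where F(x) = 1 − e^{−λx} gives F^−1(u) = −ln(1 − u)/λ:

import numpy as np

u = np.random.uniform(size=10000)   # U ~ Uniform(0, 1)
lam = 2.0
x = -np.log(1 - u) / lam            # X = F^{-1}(U) ~ Exponential(lam)
print(x.mean())                     # ~= 1 / lam = 0.5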

  • GAN: fit the inverse of F(X) with a neural network:

    1. G is the generator: a neural network that produces samples X intended to look as similar as possible to the real samples X(g)
    2. D is the discriminator: a binary classifier trained to output 0 for generated X and 1 for real X(g)
    3. Train G and D until the discriminator's accuracy drops to 1/2, meaning it can no longer tell generated X from real X(g) (the equilibrium)
  • If P(X(g)) were known, we could compute F^−1(x) directly; but it is unknown, and we only have samples drawn from this distribution
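
This adversarial game is usually written as the standard GAN minimax objective (implied but not written out above):

  min_G max_D V(D, G) = E_{x~p_data}[log D(x)] + E_{z~p(z)}[log(1 − D(G(z)))]

At the equilibrium, D outputs 1/2 everywhere, matching the 1/2-accuracy stopping condition in step 3.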

  • GAN:

import torch
import torch.nn as nn
import torch.optim as optim
import torch.utils.data as Data
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np

class Generator(nn.Module):
    def __init__(self, input_dim, output_dim):
        super(Generator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(input_dim, 128),
            nn.ReLU(True),
            nn.Linear(128, 256),
            nn.ReLU(True),
            nn.Linear(256, 512),
            nn.ReLU(True),
            nn.Linear(512, 1024),
            nn.ReLU(True),
            nn.Linear(1024, output_dim),
            nn.Tanh()
        )

    def forward(self, z):
        return self.model(z)

class Discriminator(nn.Module):
    def __init__(self, input_dim):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Linear(input_dim, 1024),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(1024, 512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(512, 256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Linear(256, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        return self.model(x)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

z_dim = 100       # dimensionality of the latent space
image_dim = 784   # dimensionality of an MNIST image (28x28)

# MNIST data, normalized to [-1, 1] to match the generator's Tanh output
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((0.5,), (0.5,))])
train_set = torchvision.datasets.MNIST(root='./data', train=True,
                                       transform=transform, download=True)
train_loader = Data.DataLoader(train_set, batch_size=128, shuffle=True)

generator = Generator(z_dim, image_dim).to(device)
discriminator = Discriminator(image_dim).to(device)

criterion = nn.BCELoss()
optimizer_g = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_d = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))

num_epochs = 100
for epoch in range(num_epochs):
    for i, (real_images, _) in enumerate(train_loader):
        real_images = real_images.view(-1, image_dim).to(device)
        batch_size = real_images.size(0)

        # Labels: 1 for real, 0 for fake
        real_labels = torch.ones(batch_size, 1).to(device)
        fake_labels = torch.zeros(batch_size, 1).to(device)

        # Train the discriminator
        optimizer_d.zero_grad()

        # Real images
        outputs = discriminator(real_images)
        d_loss_real = criterion(outputs, real_labels)
        d_loss_real.backward()

        # Generated (fake) images; detach so gradients skip the generator
        z = torch.randn(batch_size, z_dim).to(device)
        fake_images = generator(z)
        outputs = discriminator(fake_images.detach())
        d_loss_fake = criterion(outputs, fake_labels)
        d_loss_fake.backward()

        d_loss = d_loss_real + d_loss_fake
        optimizer_d.step()

        # Train the generator: make D label the fakes as real
        optimizer_g.zero_grad()
        outputs = discriminator(fake_images)
        g_loss = criterion(outputs, real_labels)
        g_loss.backward()
        optimizer_g.step()

    print(f'Epoch [{epoch+1}/{num_epochs}], d_loss: {d_loss.item()}, g_loss: {g_loss.item()}')

# Sample and display a grid of generated digits
z = torch.randn(64, z_dim).to(device)
fake_images = generator(z)
fake_images = fake_images.view(-1, 1, 28, 28)
fake_images = fake_images.cpu().detach().numpy()

fig, axes = plt.subplots(8, 8, figsize=(8, 8))
for i, ax in enumerate(axes.flatten()):
    ax.imshow(fake_images[i][0], cmap='gray')
    ax.axis('off')
plt.show()

[Figure: MNIST digits generated by the trained GAN]

The Generator of DCGAN

[Figures: DCGAN generator architecture and generated samples]

StyleGAN

[Figure: StyleGAN generator architecture]

Diffusion Models

  • Diffusion models generate samples via a forward process and a reverse process:
    1. Forward process: gradually add noise until only Gaussian noise remains
    2. Reverse process: gradually remove the noise until a clean image is recovered

[Figures: forward (noising) and reverse (denoising) diffusion processes]

Forward Process

  • The forward process is a Markov chain, independent of earlier history; as t grows and the accumulated noise increases, x_t becomes Gaussian noise:

  q(x_t | x_{t−1}) = N(x_t; √(1 − β_t) · x_{t−1}, β_t I)

  q(x_t | x_0) = N(x_t; √ᾱ_t · x_0, (1 − ᾱ_t) I),  where α_t = 1 − β_t and ᾱ_t = ∏_{s=1}^{t} α_s
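
A short numeric check of the closed form above (illustrative linear β schedule):

import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)            # beta_t schedule
alpha_bar = torch.cumprod(1 - betas, dim=0)      # alpha_bar_t = prod of alpha_s

x0 = torch.randn(1, 3, 32, 32)                   # stand-in clean image
t = 500
eps = torch.randn_like(x0)
x_t = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * eps
print(alpha_bar[t])   # small by mid-schedule, so x_t is already mostly noise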

Inverse Process

  • Goal: estimate q(x_{t−1} | x_t); when T is large, each reverse step can be assumed Gaussian

  • Key observation: q(x_{t−1} | x_t, x_0) is also Gaussian:

  q(x_{t−1} | x_t, x_0) = N(x_{t−1}; μ̃_t(x_t, x_0), β̃_t I)

  μ̃_t(x_t, x_0) = (√ᾱ_{t−1} · β_t / (1 − ᾱ_t)) · x_0 + (√α_t · (1 − ᾱ_{t−1}) / (1 − ᾱ_t)) · x_t,   β̃_t = ((1 − ᾱ_{t−1}) / (1 − ᾱ_t)) · β_t

  • Diffusion Models (DDPM):
101
import torch
import torch.nn as nn
import torch.nn.functional as F

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

class DiffusionModel(nn.Module):
    def __init__(self, image_size, channels, num_timesteps, betas=None):
        super(DiffusionModel, self).__init__()
        self.image_size = image_size
        self.channels = channels
        self.num_timesteps = num_timesteps

        # Beta schedule (linear here; other schedules also work)
        if betas is None:
            betas = torch.linspace(1e-4, 0.02, num_timesteps)
        self.register_buffer('betas', betas)
        self.register_buffer('alphas_cumprod', torch.cumprod(1.0 - betas, dim=0))

        # Noise-prediction network (a toy CNN; DDPM uses a time-conditioned U-Net)
        self.network = nn.Sequential(
            nn.Conv2d(channels, 64, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1),
            nn.ReLU(),
            nn.Conv2d(128, channels, 3, padding=1)
        )

    def forward(self, x_0, t):
        """Forward process: sample x_t ~ q(x_t | x_0) in closed form,
        then predict the added noise with the network."""
        noise = torch.randn_like(x_0)
        a_bar = self.alphas_cumprod[t].view(-1, 1, 1, 1)
        x_t = torch.sqrt(a_bar) * x_0 + torch.sqrt(1 - a_bar) * noise
        return self.network(x_t), noise

    def reverse_step(self, x_t, t):
        """One reverse (denoising) step: estimate x_{t-1} from x_t."""
        beta_t = self.betas[t]
        a_bar_t = self.alphas_cumprod[t]
        eps_pred = self.network(x_t)
        # Posterior mean of q(x_{t-1} | x_t, x_0), with x_0 expressed via eps_pred
        mean = (x_t - beta_t / torch.sqrt(1 - a_bar_t) * eps_pred) / torch.sqrt(1 - beta_t)
        if t > 0:
            mean = mean + torch.sqrt(beta_t) * torch.randn_like(x_t)
        return mean

def train_diffusion_model(model, dataloader, optimizer, num_epochs=10):
    for epoch in range(num_epochs):
        for batch_idx, (images, _) in enumerate(dataloader):
            images = images.to(device)

            # Draw a random timestep per image and noise it in closed form
            t = torch.randint(0, model.num_timesteps, (images.size(0),), device=device)
            eps_pred, eps = model(images, t)

            # Regress the predicted noise onto the true noise (DDPM objective)
            optimizer.zero_grad()
            loss = F.mse_loss(eps_pred, eps)
            loss.backward()
            optimizer.step()

            if batch_idx % 100 == 0:
                print(f"Epoch {epoch}/{num_epochs}, Batch {batch_idx}, Loss: {loss.item()}")

def sample_from_diffusion(model, shape=(1, 3, 32, 32)):
    # Start from pure Gaussian noise and denoise step by step
    x_t = torch.randn(shape).to(device)
    with torch.no_grad():
        for t in reversed(range(model.num_timesteps)):
            x_t = model.reverse_step(x_t, t)
    return x_t

# Example: assume you have a dataset DataLoader
dataloader = ...

# Initialize the model
model = DiffusionModel(image_size=32, channels=3, num_timesteps=1000).to(device)

# Set up the optimizer
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Train the model
train_diffusion_model(model, dataloader, optimizer)

# Sample from the model
generated_images = sample_from_diffusion(model, shape=(16, 3, 32, 32))

DDPM 2020

[Figure: DDPM (2020) generated samples]

DALL·E 2022

[Figure: DALL·E (2022)]

Stable Diffusion 2022

[Figure: Stable Diffusion (2022) architecture]