Contents
10. Basic CNN
Recap: Fully Connected Neural Networks
This Lecture: 2D CNNs for Images
Images
Convolution
Convolutional Layer
Padding
Stride
Max Pooling Layer
A Simple CNN
How to Use the GPU?
Exercise
11. Advanced CNN
GoogLeNet
Inception Module
1×1 Convolution
Implementing the Inception Module
ResNet
Deep Residual Learning
Residual Network
A Simple Network Using Residual Blocks
Exercise
Where to Go Next
10. Basic CNN
CNN: Convolutional Neural Network
Recap: Fully Connected Neural Networks
- Definition: every layer in the network is a linear layer, and the layers are chained in series
- Every input node is connected by a weight to every output node, i.e. each input node takes part in computing every output node of the next layer
- Some of the original spatial information is lost: two points that are adjacent in the image may end up far apart after flattening
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = torch.nn.Linear(784, 512)
        self.l2 = torch.nn.Linear(512, 256)
        self.l3 = torch.nn.Linear(256, 128)
        self.l4 = torch.nn.Linear(128, 64)
        self.l5 = torch.nn.Linear(64, 10)

    def forward(self, x):
        x = x.view(-1, 784)
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x)

model = Net()
--------------------------------------------------------------------------------------------------------------------------------
This Lecture: 2D CNNs for Images
- Convolutional layers: preserve the spatial features of the image, keeping it in its original spatial layout
- Downsampling: leaves the channel count unchanged while changing the width and height of the image (goal: reduce the amount of data and the computational load)
Together these two form the feature-extraction stage: convolutions search for particular features.
After feature extraction, the result is flattened into a vector and passed to a fully connected network that performs classification (the classifier).
--------------------------------------------------------------------------------------------------------------------------------
Images
How images are represented:
- Raster images: an RGB image is a grid of cells, each cell holding color values
- Vector images
Convolution:
The number of output channels equals the number of convolution kernels.
---------------------------------------------------------------------------------------------------------------------------------
Convolution
(1) Single input channel
- Input: 1×5×5
- Kernel: 3×3

(2) 3 input channels
One patch of the image is a 3×3×3 tensor.
The number of channels in the kernel must equal the number of input channels.
(3) N input channels
(4) N input channels and M output channels
With m kernels, the output has m channels.
Summary:
- The number of channels in each kernel = the number of input channels
- The total number of kernels = the number of output channels
- The kernel size is a free design choice, independent of the image size
- The same kernel is applied to every image patch (weight sharing)
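These rules can be read directly off the weight shape of a Conv2d layer; a minimal sketch (the channel counts 5 and 10 here are arbitrary):

```python
import torch

# with n = 5 input channels and m = 10 output channels, the layer stores
# m kernels of n channels each: weight shape is (m, n, kernel_h, kernel_w)
conv = torch.nn.Conv2d(in_channels=5, out_channels=10, kernel_size=3)
print(conv.weight.shape)  # torch.Size([10, 5, 3, 3])
```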
--------------------------------------------------------------------------------------------------------------------------------
Convolutional Layer

import torch

in_channels, out_channels = 5, 10  # n = 5 input channels, m = 10 output channels
width, height = 100, 100           # image size
kernel_size = 3                    # an int 3 means a 3×3 kernel; a tuple (5,3) means 5×3
batch_size = 1

input = torch.randn(batch_size, in_channels, width, height)  # randn draws from a normal distribution
conv_layer = torch.nn.Conv2d(in_channels, out_channels, kernel_size=kernel_size)
output = conv_layer(input)
print(input.shape)
print(output.shape)
print(conv_layer.weight.shape)

A convolutional layer places no requirement on the input width and height, but it does require a specific number of input channels.
With a 3×3 kernel, the image width and height each shrink by 2.
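The shrink-by-2 rule is a special case of the standard output-size formula out = (in + 2·padding - kernel) // stride + 1; a small helper (the name conv_out_size is ours) makes it easy to check, and it also covers the padding and stride cases discussed next:

```python
def conv_out_size(size, kernel, padding=0, stride=1):
    # standard convolution output-size formula
    return (size + 2 * padding - kernel) // stride + 1

print(conv_out_size(100, 3))           # 98: a 3×3 kernel shrinks each side by 2
print(conv_out_size(5, 3, padding=1))  # 5: padding = kernel // 2 preserves the size
print(conv_out_size(5, 3, stride=2))   # 2: a larger stride shrinks the image quickly
```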
--------------------------------------------------------------------------------------------------------------------------------
Padding
If you want the output to keep the same size as the input, use padding:
- 3×3 kernel: 3 // 2 = 1, so padding=1
- 5×5 kernel: 5 // 2 = 2, so padding=2
- and so on

import torch

input = [3,4,6,5,7,
         2,4,6,8,2,
         1,6,7,8,4,
         9,7,4,6,2,
         3,7,5,4,1]
input = torch.Tensor(input).view(1, 1, 5, 5)  # B,C,W,H; batch_size=1 means one image is fed at a time
conv_layer = torch.nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)  # bias: adds a per-channel offset after the convolution
kernel = torch.Tensor([1,2,3,4,5,6,7,8,9]).view(1, 1, 3, 3)  # out_channels, in_channels, kernel_width, kernel_height
conv_layer.weight.data = kernel.data  # initialize the layer's weights; kernel is a tensor, hence .data
output = conv_layer(input)
print(output)
Stride
A larger stride effectively reduces the width and height of the image.

import torch

input = [3,4,6,5,7,
         2,4,6,8,2,
         1,6,7,8,4,
         9,7,4,6,2,
         3,7,5,4,1]
input = torch.Tensor(input).view(1, 1, 5, 5)
conv_layer = torch.nn.Conv2d(1, 1, kernel_size=3, stride=2, bias=False)
kernel = torch.Tensor([1,2,3,4,5,6,7,8,9]).view(1, 1, 3, 3)
conv_layer.weight.data = kernel.data
output = conv_layer(input)
print(output)
---------------------------------------------------------------------------------------------------------------------------------
Max Pooling Layer
A form of downsampling. Max pooling has no weights and leaves the channel count unchanged; 2×2 max pooling halves the width and height of the image.

import torch

input = [3,4,6,5,
         2,4,6,8,
         1,6,7,8,
         9,7,4,6]
input = torch.Tensor(input).view(1, 1, 4, 4)
maxpooling_layer = torch.nn.MaxPool2d(kernel_size=2)  # kernel_size=2 implies stride=2 by default
output = maxpooling_layer(input)
print(output)
---------------------------------------------------------------------------------------------------------------------------------
A Simple CNN
Convolution and pooling do not care about the input image size, but the final classifier does: it needs to know the number of elements per sample.
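For the 28×28 MNIST images used below, that element count can be traced layer by layer; a sketch with the same convolution and pooling sizes as the network shows where its 320 comes from:

```python
import torch
import torch.nn.functional as F

conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
pooling = torch.nn.MaxPool2d(2)

x = torch.randn(1, 1, 28, 28)   # one MNIST-sized image
x = pooling(F.relu(conv1(x)))   # 28 -> 24 (5×5 conv) -> 12 (2×2 pool)
x = pooling(F.relu(conv2(x)))   # 12 -> 8 -> 4
flat = x.view(1, -1)
print(flat.shape)               # torch.Size([1, 320]): 20 channels × 4 × 4
```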
import torch
import torch.nn.functional as F

class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        self.pooling = torch.nn.MaxPool2d(2)  # no weights, so one instance can be reused
        self.fc = torch.nn.Linear(320, 10)

    def forward(self, x):
        # Flatten data from (n,1,28,28) to (n,320)
        batch_size = x.size(0)
        x = self.pooling(F.relu(self.conv1(x)))
        x = self.pooling(F.relu(self.conv2(x)))
        x = x.view(batch_size, -1)  # flatten with view() into the shape the fully connected layer expects
        x = self.fc(x)
        return x  # no activation on the last layer, since the loss will be cross entropy

model = Net()

How to Use the GPU?
1. Move the model to the GPU

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # cuda:0 is the first GPU; valid indices depend on how many GPUs you have
model.to(device)

2. Move the tensors to the GPU
Full code:
# 0. Imports
import torch
from torchvision import transforms  # tools for preprocessing the raw images
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn.functional as F     # for relu()
import torch.optim as optim         # for building the optimizer

# 1. Prepare the data
batch_size = 64
transform = transforms.Compose([
    transforms.ToTensor(),                        # PIL Image -> Tensor
    transforms.Normalize((0.1307,), (0.3081,))])  # normalize with mean=0.1307, std=0.3081
train_dataset = datasets.MNIST(root='../dataset/mnist', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_dataset = datasets.MNIST(root='../dataset/mnist', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

# 2. Design the model
class Net(torch.nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = torch.nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = torch.nn.Conv2d(10, 20, kernel_size=5)
        self.pooling = torch.nn.MaxPool2d(2)  # no weights, so one instance can be reused
        self.fc = torch.nn.Linear(320, 10)

    def forward(self, x):
        batch_size = x.size(0)
        x = self.pooling(F.relu(self.conv1(x)))
        x = self.pooling(F.relu(self.conv2(x)))
        x = x.view(batch_size, -1)  # flatten into the shape the fully connected layer expects
        x = self.fc(x)
        return x  # no activation on the last layer, since the loss is cross entropy

model = Net()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # cuda:0 is the first GPU
model.to(device)

# 3. Loss and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

# 4. Training
def train(epoch):  # one epoch wrapped in a function
    running_loss = 0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        inputs, target = inputs.to(device), target.to(device)
        optimizer.zero_grad()
        # forward, backward, update
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if batch_idx % 300 == 299:  # report every 300 batches
            print('[%d,%5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 2000))
            running_loss = 0

# 5. Testing
epoch_list = []
accuracy_list = []

def test():
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            inputs, target = data
            inputs, target = inputs.to(device), target.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, dim=1)  # index of each row's max = predicted class (dim=1: rows, dim=0: columns)
            total += target.size(0)  # (N,1), take N
            correct += (predicted == target).sum().item()
    print('Accuracy on test set: %d %% [%d/%d]' % (100 * correct / total, correct, total))
    accuracy_list.append(correct / total)

if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        test()
        epoch_list.append(epoch)
Plotting:

import matplotlib.pyplot as plt

plt.plot(epoch_list, accuracy_list)
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.grid()
plt.show()
---------------------------------------------------------------------------------------------------------------------------------
Exercise
- CNNs, multilayer perceptrons, and fully connected networks are serial structures: each layer's output is the next layer's input.
The example in the figure uses 2 convolutional layers, 2 pooling layers, and 2 fully connected layers, close to LeNet-5.
- Advanced CNNs may have branches and other non-serial structures.
--------------------------------------------------------------------------------------------------------------------------------
11. Advanced CNN
GoogLeNet
- Inception module (block): the repeated sub-structure in the network diagram, encapsulated as a class
- To cut code redundancy and repetition: use functions (as in C) or define your own classes (the object-oriented approach)
Some hyperparameters are hard to pick when designing a network, e.g. the kernel size. GoogLeNet's idea: inside one block, apply several kinds of convolution at once; during training, the weights along whichever path works best grow while the others shrink. The block thus offers several candidate convolutional configurations, and training automatically finds the best combination.
The four paths yield four tensors that must be concatenated, so their widths and heights must be identical.
In (batch, channel, width, height) terms, channel may differ across paths, but width and height must match:
- the last three paths just need appropriate padding
- the first path uses average pooling with stride=1 and a matching padding so W and H are preserved (e.g. padding=1 for 3×3 averaging)
Ordinary max pooling would halve the image.
1×1 Convolution
The number of channels in a 1×1 kernel equals the number of channels of the input tensor.
Purpose: changing the number of channels.
- A C×W×H tensor passed through one 1×1 kernel becomes 1×W×H
- For m output channels, use m such kernels, each a stack of C 1×1 convolutions (C = 3 in the example)
Each output element blends the values at the same position across all input channels (information fusion).
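A quick shape check of the channel-changing effect (the 192 → 16 channel numbers here are just an example):

```python
import torch

x = torch.randn(1, 192, 28, 28)                    # feature map with C = 192
conv1x1 = torch.nn.Conv2d(192, 16, kernel_size=1)  # 16 kernels, each 192×1×1
y = conv1x1(x)
print(y.shape)  # torch.Size([1, 16, 28, 28]): W and H unchanged, channels 192 -> 16
```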
Why 1×1 convolution?
A 1×1 convolution is also called Network in Network.
- Lower computational cost: in the example in the figure, the cost drops to about one tenth
- Changes the number of channels
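The roughly tenfold saving can be reproduced with the usual multiply-count estimate for a convolution; the setting assumed here (a 28×28 feature map, 192 input channels, 32 output channels, a 16-channel 1×1 bottleneck) is the one commonly quoted with this figure:

```python
# multiplications of a conv layer ≈ W_out * H_out * K * K * C_in * C_out
def conv_mults(w, h, k, c_in, c_out):
    return w * h * k * k * c_in * c_out

# direct 5×5 convolution: 192 -> 32 channels on a 28×28 map (padding keeps the size)
direct = conv_mults(28, 28, 5, 192, 32)

# 1×1 bottleneck down to 16 channels first, then the 5×5 convolution
bottleneck = conv_mults(28, 28, 1, 192, 16) + conv_mults(28, 28, 5, 16, 32)

print(direct)      # 120422400
print(bottleneck)  # 12443648: roughly one tenth of the direct cost
```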
(1) The four branches
The four branches share (B, W, H) and differ only in C: their output channel counts are 24, 16, 24, and 24.


Code:

import torch
import torch.nn as nn
import torch.nn.functional as F

# branch 1: pooling branch
# __init__
self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)
# forward
branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)  # average pooling; stride and padding keep the width and height, since the four outputs will be concatenated
branch_pool = self.branch_pool(branch_pool)

# branch 2: 1x1 branch
self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)  # __init__
branch1x1 = self.branch1x1(x)                               # forward

# branch 3: 5x5 branch
# __init__
self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)    # its 16 outputs are the next layer's inputs
self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)  # padding=2 keeps H and W unchanged
# forward
branch5x5 = self.branch5x5_1(x)
branch5x5 = self.branch5x5_2(branch5x5)

# branch 4: 3x3 branch
# __init__
self.branch3x3_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
self.branch3x3_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)  # output channels must match the next layer's input channels
self.branch3x3_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)
# forward
branch3x3 = self.branch3x3_1(x)
branch3x3 = self.branch3x3_2(branch3x3)
branch3x3 = self.branch3x3_3(branch3x3)
(2) Concatenate

Code:

outputs = [branch1x1, branch5x5, branch3x3, branch_pool]  # a Python list
return torch.cat(outputs, dim=1)  # in (b,c,w,h), dim=1 is the channel dimension
--------------------------------------------------------------------------------------------------------------------------------
Key code:
(1) Inception
Abstract the Inception module into a class so it can be reused when building the network.

import torch
import torch.nn as nn
import torch.nn.functional as F

class InceptionA(nn.Module):
    # the input channel count is not hard-coded; it is a constructor argument,
    # so it can be specified when the module is instantiated
    def __init__(self, in_channels):
        super(InceptionA, self).__init__()
        # branch 1
        self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)
        # branch 2
        self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        # branch 3
        self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)
        # branch 4
        self.branch3x3_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch3x3_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
        self.branch3x3_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)

    def forward(self, x):
        # branch 1
        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)
        # branch 2
        branch1x1 = self.branch1x1(x)
        # branch 3
        branch5x5 = self.branch5x5_1(x)
        branch5x5 = self.branch5x5_2(branch5x5)
        # branch 4
        branch3x3 = self.branch3x3_1(x)
        branch3x3 = self.branch3x3_2(branch3x3)
        branch3x3 = self.branch3x3_3(branch3x3)
        outputs = [branch1x1, branch5x5, branch3x3, branch_pool]
        return torch.cat(outputs, dim=1)
(2) A network using two Inception modules

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(88, 20, kernel_size=5)  # the 88 comes from incep1
        self.incep1 = InceptionA(in_channels=10)
        self.incep2 = InceptionA(in_channels=20)
        self.mp = nn.MaxPool2d(2)  # each max pooling keeps shrinking the width and height
        self.fc = nn.Linear(1408, 10)  # fully connected

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp(self.conv1(x)))  # conv -> pool -> relu
        x = self.incep1(x)  # 10 channels in, 88 out (three branches give 24 each, one gives 16: 24*3+16 = 88)
        x = F.relu(self.mp(self.conv2(x)))  # 88 in, 20 out
        x = self.incep2(x)  # 88 out
        x = x.view(in_size, -1)  # flatten to a vector
        x = self.fc(x)  # fully connected classifier
        return x
- Where does 1408 come from? With MNIST's 28×28 width and height, by the time the data has passed through the network and reaches the fc layer, the output of incep2 holds 1408 elements per image.
- How is 1408 computed?
In practice you don't work it out by hand (to keep the network free of mistakes). Instead, when defining the module, first remove these three lines:

self.fc = nn.Linear(1408, 10)
x = x.view(in_size, -1)
x = self.fc(x)

Then build a random MNIST-sized input tensor, run it through an instance, and read the size off the output.
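A sketch of that trick. Defining the full InceptionA again here would be long, so each Inception module is replaced by a stand-in: a padded 3×3 convolution to 88 channels, which has the same shape behavior (W and H preserved, 88 output channels). The trick itself works identically with the real modules:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# stand-ins: an InceptionA keeps W and H and always outputs 88 channels, so for
# pure shape tracing a padded 3x3 conv to 88 channels behaves identically
incep1 = nn.Conv2d(10, 88, kernel_size=3, padding=1)
incep2 = nn.Conv2d(20, 88, kernel_size=3, padding=1)
conv1 = nn.Conv2d(1, 10, kernel_size=5)
conv2 = nn.Conv2d(88, 20, kernel_size=5)
mp = nn.MaxPool2d(2)

x = torch.randn(1, 1, 28, 28)     # random MNIST-sized input tensor
x = incep1(F.relu(mp(conv1(x))))  # 28 -> 24 -> 12, now 88 channels
x = incep2(F.relu(mp(conv2(x))))  # 12 -> 8 -> 4, 88 channels again
flat = x.view(1, -1)
print(flat.shape)                 # torch.Size([1, 1408]): 88 * 4 * 4 = 1408
```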
--------------------------------------------------------------------------------------------------------------------------------
Full code:

# 0. Imports
import torch
import torch.nn as nn
from torchvision import transforms  # tools for preprocessing the raw images
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn.functional as F     # for relu()
import torch.optim as optim         # for building the optimizer

# 1. Prepare the data
batch_size = 64
transform = transforms.Compose([
    transforms.ToTensor(),                        # PIL Image -> Tensor
    transforms.Normalize((0.1307,), (0.3081,))])  # normalize with mean=0.1307, std=0.3081
train_dataset = datasets.MNIST(root='../dataset/mnist', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_dataset = datasets.MNIST(root='../dataset/mnist', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

# Inception
class InceptionA(nn.Module):
    # the input channel count is a constructor argument rather than hard-coded,
    # so it can be chosen at instantiation time
    def __init__(self, in_channels):
        super(InceptionA, self).__init__()
        # branch 1
        self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)
        # branch 2
        self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        # branch 3
        self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)
        # branch 4
        self.branch3x3_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch3x3_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
        self.branch3x3_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)

    def forward(self, x):
        # branch 1
        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)
        # branch 2
        branch1x1 = self.branch1x1(x)
        # branch 3
        branch5x5 = self.branch5x5_1(x)
        branch5x5 = self.branch5x5_2(branch5x5)
        # branch 4
        branch3x3 = self.branch3x3_1(x)
        branch3x3 = self.branch3x3_2(branch3x3)
        branch3x3 = self.branch3x3_3(branch3x3)
        outputs = [branch1x1, branch5x5, branch3x3, branch_pool]
        return torch.cat(outputs, dim=1)

# 2. Design the model: two Inception modules
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(88, 20, kernel_size=5)  # the 88 comes from incep1
        self.incep1 = InceptionA(in_channels=10)
        self.incep2 = InceptionA(in_channels=20)
        self.mp = nn.MaxPool2d(2)  # each max pooling keeps shrinking the width and height
        self.fc = nn.Linear(1408, 10)  # fully connected

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp(self.conv1(x)))  # conv -> pool -> relu
        x = self.incep1(x)  # 10 channels in, 88 out (24*3+16 = 88)
        x = F.relu(self.mp(self.conv2(x)))  # 88 in, 20 out
        x = self.incep2(x)  # 88 out
        x = x.view(in_size, -1)  # flatten to a vector
        x = self.fc(x)  # fully connected classifier
        return x

model = Net()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # cuda:0 is the first GPU
model.to(device)

# 3. Loss and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

# 4. Training
def train(epoch):  # one epoch wrapped in a function
    running_loss = 0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        inputs, target = inputs.to(device), target.to(device)
        optimizer.zero_grad()
        # forward, backward, update
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if batch_idx % 300 == 299:  # report every 300 batches
            print('[%d,%5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 2000))
            running_loss = 0

# 5. Testing
epoch_list = []
accuracy_list = []

def test():
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            inputs, target = data
            inputs, target = inputs.to(device), target.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, dim=1)  # index of each row's max = predicted class
            total += target.size(0)  # (N,1), take N
            correct += (predicted == target).sum().item()
    print('Accuracy on test set: %d %% [%d/%d]' % (100 * correct / total, correct, total))
    accuracy_list.append(correct / total)

if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        test()
        epoch_list.append(epoch)
The main reason the accuracy gain is modest: the fully connected part at the end has few layers. The real leverage lies in how the convolutional layers are changed to improve performance.
import matplotlib.pyplot as plt

plt.plot(epoch_list, accuracy_list)
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.grid()
plt.show()

The plot shows that the peak accuracy does not occur in the last epoch, so:
- More training epochs are not always better; the network may overfit. Watch the test accuracy to decide how many epochs are appropriate.
- Whenever the test accuracy reaches a new peak, back up and save the current network parameters; that snapshot is the network with the best generalization.
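A minimal sketch of that checkpointing habit (the helper name maybe_checkpoint and the tiny placeholder model are ours):

```python
import torch

model = torch.nn.Linear(4, 2)  # placeholder standing in for the CNN
best_acc = 0.0

def maybe_checkpoint(acc, path='best_model.pt'):
    """Back up the weights whenever the test accuracy reaches a new peak."""
    global best_acc
    if acc > best_acc:
        best_acc = acc
        torch.save(model.state_dict(), path)

for acc in [0.90, 0.95, 0.93]:  # e.g. accuracies observed after each epoch
    maybe_checkpoint(acc)
print(best_acc)  # 0.95: best_model.pt now holds that epoch's weights
# restore later with: model.load_state_dict(torch.load('best_model.pt'))
```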
---------------------------------------------------------------------------------------------------------------------------------
ResNet
If we just keep stacking 3×3 convolutions, does performance keep improving?
The empirical finding: a 20-layer network beat a 56-layer one.
A likely cause: vanishing gradients.
- Backpropagation multiplies a chain of gradients together via the chain rule
- If every factor is less than 1, the product g tends toward 0
- In the update rule w = w - αg, a g near 0 means w is barely updated, so the blocks near the input never get trained properly
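The decay is easy to see numerically; with 50 layers each contributing a gradient factor of 0.8 (an illustrative value), the product is already tiny:

```python
g = 1.0
for _ in range(50):  # chain rule: multiply one factor per layer
    g *= 0.8
# g is now about 1.4e-05: with w = w - alpha*g, the early layers barely move
print(g)
```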
Classical remedies for vanishing gradients, such as freezing layers and greedy layer-wise training, are impractical given the many layers of deep networks; hence ResNet was proposed.
---------------------------------------------------------------------------------------------------------------------------------
Deep Residual Learning
Residual net
- Add x to the output first (the output and the input x must agree in every tensor dimension: C, H, and W must all match for the addition), then apply the activation
- This alleviates vanishing gradients, so the layers near the input can be trained adequately
Two structures within a residual network:
- plain serial connection
- skip connections (spanning two layers at a time)

If the sizes differ, x can be brought to the matching size with a max pooling layer.
--------------------------------------------------------------------------------------------------------------------------------
A Simple Network Using Residual Blocks
With kernel_size=5, a convolution shrinks the image width and height by 4 each.
The weight layers of a block:
- first weight layer: convolve, then activate
- second weight layer: convolve, add x, then activate (the layer's input and output channels must match those of x)
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.channels = channels
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)  # padding=1 (= 3 // 2) keeps the output size unchanged
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        y = F.relu(self.conv1(x))
        y = self.conv2(y)
        return F.relu(x + y)  # F(x) + x: sum first, then activate
Building a simple residual network from two residual blocks:

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=5)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5)
        self.mp = nn.MaxPool2d(2)
        self.rblock1 = ResidualBlock(16)  # the argument is the channel count
        self.rblock2 = ResidualBlock(32)
        self.fc = nn.Linear(512, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = self.mp(F.relu(self.conv1(x)))
        x = self.rblock1(x)
        x = self.mp(F.relu(self.conv2(x)))
        x = self.rblock2(x)
        x = x.view(in_size, -1)
        x = self.fc(x)
        return x
Notes:
- When the network structure gets very complex, wrap it in new classes; branches that run separately can be computed independently and concatenated at the end
- Work out the hyperparameters and tensor sizes by hand when designing the network, but to check that everything is right, build the network, comment out the later lines, run a simple test, and compare the output against the expected tensor size
- Incremental development: grow the network gradually, making sure after each newly added module that the output tensor is correct (so every layer's structure matches expectations)
Full code:

# 0. Imports
import torch
import torch.nn as nn
from torchvision import transforms  # tools for preprocessing the raw images
from torchvision import datasets
from torch.utils.data import DataLoader
import torch.nn.functional as F     # for relu()
import torch.optim as optim         # for building the optimizer

# 1. Prepare the data
batch_size = 64
transform = transforms.Compose([
    transforms.ToTensor(),                        # PIL Image -> Tensor
    transforms.Normalize((0.1307,), (0.3081,))])  # normalize with mean=0.1307, std=0.3081
train_dataset = datasets.MNIST(root='../dataset/mnist', train=True, download=True, transform=transform)
train_loader = DataLoader(train_dataset, shuffle=True, batch_size=batch_size)
test_dataset = datasets.MNIST(root='../dataset/mnist', train=False, download=True, transform=transform)
test_loader = DataLoader(test_dataset, shuffle=False, batch_size=batch_size)

# Residual block
class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super(ResidualBlock, self).__init__()
        self.channels = channels
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        y = F.relu(self.conv1(x))
        y = self.conv2(y)
        return F.relu(x + y)

# 2. Design the model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=5)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=5)
        self.mp = nn.MaxPool2d(2)
        self.rblock1 = ResidualBlock(16)
        self.rblock2 = ResidualBlock(32)
        self.fc = nn.Linear(512, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = self.mp(F.relu(self.conv1(x)))
        x = self.rblock1(x)
        x = self.mp(F.relu(self.conv2(x)))
        x = self.rblock2(x)
        x = x.view(in_size, -1)
        x = self.fc(x)
        return x

model = Net()
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")  # cuda:0 is the first GPU
model.to(device)

# 3. Loss and optimizer
criterion = torch.nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)

# 4. Training
def train(epoch):  # one epoch wrapped in a function
    running_loss = 0
    for batch_idx, data in enumerate(train_loader, 0):
        inputs, target = data
        inputs, target = inputs.to(device), target.to(device)
        optimizer.zero_grad()
        # forward, backward, update
        outputs = model(inputs)
        loss = criterion(outputs, target)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        if batch_idx % 300 == 299:  # report every 300 batches
            print('[%d,%5d] loss: %.3f' % (epoch + 1, batch_idx + 1, running_loss / 2000))
            running_loss = 0

# 5. Testing
epoch_list = []
accuracy_list = []

def test():
    correct = 0
    total = 0
    with torch.no_grad():
        for data in test_loader:
            inputs, target = data
            inputs, target = inputs.to(device), target.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs.data, dim=1)  # index of each row's max = predicted class
            total += target.size(0)  # (N,1), take N
            correct += (predicted == target).sum().item()
    print('Accuracy on test set: %d %% [%d/%d]' % (100 * correct / total, correct, total))
    accuracy_list.append(correct / total)

if __name__ == '__main__':
    for epoch in range(10):
        train(epoch)
        test()
        epoch_list.append(epoch)
import matplotlib.pyplot as plt

plt.plot(epoch_list, accuracy_list)
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.grid()
plt.show()
---------------------------------------------------------------------------------------------------------------------------------
Exercise
1. He K, Zhang X, Ren S, et al. Identity Mappings in Deep Residual Networks[C].
- on the design of the residual block
2. Huang G, Liu Z, van der Maaten L, et al. Densely Connected Convolutional Networks[J]. 2016: 2261-2269.
- DenseNet
--------------------------------------------------------------------------------------------------------------------------------
Where to Go Next
- Deepen your theoretical understanding from both the mathematical and the engineering angles; the Deep Learning textbook (the "flower book") is recommended
- Read through the PyTorch documentation (the API reference) once in full
- Reproduce classic work: downloading the code and merely getting it to run only shows you can set up an environment, which is far from enough; you need repeated cycles of reading and writing code (reading code means reading the system architecture, including the training setup, the testing setup, data loading, loss construction, and so on)
- Broaden your horizons: read papers from related areas and assemble small modules