ResNet (the residual network), proposed by Kaiming He et al., addresses the problem that the training error of very deep neural networks rises rather than falls. Its core component is the residual block, which simplifies optimization by fitting a residual mapping and lets the input propagate across layers. ResNet follows VGG's 3×3 convolution design and contains 4 modules built from residual blocks, giving a simple structure. ResNet-18, for example, has 18 layers, and deeper variants exist. Training on Fashion-MNIST verifies its effectiveness, and ResNet has deeply influenced the design of later deep neural networks.

Let us first consider a question: if we add new layers to a neural network model, can the fully trained model only become more effective at reducing the training error? In theory, the solution space of the original model is a subspace of the solution space of the new model. That is, if we can train the newly added layer into the identity mapping f(x) = x, the new model will be at least as effective as the original one. Since the new model may find an even better solution to fit the training data, adding layers seems to make it easier to reduce the training error. In practice, however, after too many layers are added the training error often rises rather than falls. Even with the numerical stability brought by batch normalization, which makes training deep models easier, the problem persists. To address it, Kaiming He et al. proposed the residual network (ResNet) [1]. It won the ImageNet image recognition challenge in 2015 and has deeply influenced the design of later deep neural networks.
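To make this nested-solution-space argument concrete, here is a minimal NumPy sketch (a hypothetical toy linear model, not code from the book): if the newly added layer realizes the identity mapping, the deeper model reproduces the original model's outputs exactly, so it can be no worse on the training data.

import numpy as np

W = np.random.rand(3, 3)          # parameters of some already-trained toy model
x = np.random.rand(4, 3)          # a batch of inputs
original_output = x @ W

identity_layer = np.eye(3)        # the newly added layer, trained to the identity mapping
new_output = original_output @ identity_layer

print(np.allclose(original_output, new_output))  # True: outputs, and thus training error, are unchanged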
Let us focus on a local part of the neural network. As shown in Figure 5.9, let the input be x and suppose the ideal mapping we want to learn is f(x), which serves as the input to the activation function at the top of Figure 5.9. The part inside the dashed box on the left must fit the mapping f(x) directly, while the part inside the dashed box on the right only needs to fit the residual mapping f(x) − x with respect to the identity. The residual mapping is often easier to optimize in practice. Take the identity mapping mentioned at the beginning of this section as the ideal mapping f(x) we want to learn: we only need to learn the weights and biases of the upper weighted operation (e.g. the affine transformation) inside the right dashed box of Figure 5.9 to be 0, and f(x) becomes the identity mapping. In practice, when the ideal mapping f(x) is very close to the identity, the residual formulation also makes it easier to capture small fluctuations around the identity. The right part of Figure 5.9 is the basic building block of ResNet, the residual block. In a residual block, the input can propagate forward faster through the cross-layer data path.
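As a rough illustration of this difference (hypothetical helper functions, not the book's implementation), the two dashed boxes in Figure 5.9 correspond to two different fitting targets:

def plain_block(x, layers):
    return layers(x)            # `layers` must approximate the ideal mapping f(x) directly

def residual_block(x, layers):
    return layers(x) + x        # `layers` only needs to approximate the residual f(x) - x

# If the ideal mapping is the identity, the residual block only has to drive `layers`
# towards the zero mapping, e.g. by learning all weights and biases to 0:
zero_layers = lambda v: 0.0 * v
print(residual_block(5.0, zero_layers))   # 5.0 -- the block behaves as the identity mapping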
ResNet follows VGG's design of using only 3×3 convolutional layers. A residual block first contains 2 3×3 convolutional layers with the same number of output channels, each followed by a batch normalization layer and a ReLU activation. The input then skips these 2 convolution operations and is added directly before the final ReLU activation. This design requires the output of the 2 convolutional layers to have the same shape as the input so that the two can be added. If we want to change the number of channels, an extra 1×1 convolutional layer must be introduced to transform the input into the required shape before the addition.
The residual block is implemented below. It lets us set the number of output channels, whether to use an extra 1×1 convolutional layer to change the number of channels, and the stride of the convolutional layers.
import paddle
import paddle.nn as nn
import numpy as np
import warnings

warnings.filterwarnings("ignore", category=Warning)  # suppress warning messages


class Residual(nn.Layer):
    def __init__(self, num_channels, num_filters, use_1x1conv=False, stride=1):
        super(Residual, self).__init__()
        self.use_1x1conv = use_1x1conv
        model = [
            nn.Conv2D(num_channels, num_filters, 3, stride=stride, padding=1),
            nn.BatchNorm2D(num_filters),
            nn.ReLU(),
            nn.Conv2D(num_filters, num_filters, 3, stride=1, padding=1),
            nn.BatchNorm2D(num_filters),
        ]
        self.model = nn.Sequential(*model)
        if use_1x1conv:
            model_1x1 = [nn.Conv2D(num_channels, num_filters, 1, stride=stride)]
            self.model_1x1 = nn.Sequential(*model_1x1)

    def forward(self, X):
        Y = self.model(X)
        if self.use_1x1conv:
            X = self.model_1x1(X)
        return paddle.nn.functional.relu(X + Y)

Let us first check the case where the input and output shapes are the same.
blk = Residual(3, 3)
X = paddle.to_tensor(np.random.uniform(-1., 1., [4, 3, 6, 6]).astype('float32'))
Y = blk(X)
print(Y.shape)

[4, 3, 6, 6]

We can also halve the output height and width while increasing the number of output channels.

blk = Residual(3, 6, use_1x1conv=True, stride=2)
X = paddle.to_tensor(np.random.uniform(-1., 1., [4, 3, 6, 6]).astype('float32'))
Y = blk(X)
print(Y.shape)

[4, 6, 3, 3]
GoogLeNet is followed by 4 modules made up of Inception blocks. ResNet instead uses 4 modules made up of residual blocks, where each module contains several residual blocks with the same number of output channels. The number of channels of the first module equals the number of input channels; since a max pooling layer with stride 2 has already been used, there is no need to further reduce the height and width. Each subsequent module doubles the number of channels of the previous module in its first residual block and halves the height and width.
Below we implement this module. Note that the first module is treated specially.
class ResnetBlock(nn.Layer):
    def __init__(self, num_channels, num_filters, num_residuals, first_block=False):
        super(ResnetBlock, self).__init__()
        model = []
        for i in range(num_residuals):
            if i == 0:
                if not first_block:
                    model += [Residual(num_channels, num_filters, use_1x1conv=True, stride=2)]
                else:
                    model += [Residual(num_channels, num_filters)]
            else:
                model += [Residual(num_filters, num_filters)]
        self.model = nn.Sequential(*model)

    def forward(self, X):
        return self.model(X)


class ResNet(nn.Layer):
    def __init__(self, num_classes=10):
        super(ResNet, self).__init__()
        # The first two layers of ResNet are the same as in the GoogLeNet introduced earlier:
        # a 7x7 convolutional layer with 64 output channels and stride 2,
        # followed by a 3x3 max pooling layer with stride 2.
        # The difference is the batch normalization layer ResNet adds after each convolutional layer.
        model = [
            nn.Conv2D(1, 64, 7, stride=2, padding=3),
            nn.BatchNorm2D(64),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
        ]
        # Next we add all the residual blocks; each module here uses 2 residual blocks.
        model += [
            ResnetBlock(64, 64, 2, first_block=True),
            ResnetBlock(64, 128, 2),
            ResnetBlock(128, 256, 2),
            ResnetBlock(256, 512, 2)
        ]
        # Finally, as in GoogLeNet, we add a global average pooling layer followed by a fully connected output layer.
        model += [
            nn.AdaptiveAvgPool2D(output_size=1),
            nn.Flatten(start_axis=1, stop_axis=-1),
            nn.Linear(512, num_classes),
        ]
        self.model = nn.Sequential(*model)

    def forward(self, X):
        Y = self.model(X)
        return Y


rn = ResNet()
X = paddle.to_tensor(np.random.uniform(-1., 1., [4, 1, 96, 96]).astype('float32'))
Y = rn(X)
print(Y.shape)

[4, 10]
Each module here contains 4 convolutional layers (not counting the 1×1 convolutional layers). Together with the first convolutional layer and the final fully connected layer, there are 18 layers in total, so this model is usually called ResNet-18. By configuring different numbers of channels and different numbers of residual blocks per module, we obtain different ResNet models, such as the deeper 152-layer ResNet-152. Although the main architecture of ResNet is similar to that of GoogLeNet, the ResNet structure is simpler and easier to modify. These factors led to ResNet quickly becoming widely used.
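As a quick sanity check of this count, the sketch below (assuming the ResNet class defined above; it inspects kernel widths to skip the 1×1 shortcut convolutions) counts the layers programmatically:

# Count the 7x7/3x3 convolutions plus the final fully connected layer of the model above,
# skipping the 1x1 shortcut convolutions, to recover the "18" in ResNet-18.
rn = ResNet()
convs = [l for l in rn.sublayers() if isinstance(l, nn.Conv2D) and l.weight.shape[-1] > 1]
fcs = [l for l in rn.sublayers() if isinstance(l, nn.Linear)]
print(len(convs) + len(fcs))  # 17 convolutions + 1 fully connected layer = 18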
Before training ResNet, let us observe how the input shape changes between the different modules of ResNet.
resnet = ResNet(10)
param_info = paddle.summary(resnet, (1, 1, 96, 96))
print(param_info)
-------------------------------------------------------------------------------
Layer (type) Input Shape Output Shape Param #
===============================================================================
Conv2D-26 [[1, 1, 96, 96]] [1, 64, 48, 48] 3,200
BatchNorm2D-22 [[1, 64, 48, 48]] [1, 64, 48, 48] 256
ReLU-12 [[1, 64, 48, 48]] [1, 64, 48, 48] 0
MaxPool2D-2 [[1, 64, 48, 48]] [1, 64, 24, 24] 0
Conv2D-27 [[1, 64, 24, 24]] [1, 64, 24, 24] 36,928
BatchNorm2D-23 [[1, 64, 24, 24]] [1, 64, 24, 24] 256
ReLU-13 [[1, 64, 24, 24]] [1, 64, 24, 24] 0
Conv2D-28 [[1, 64, 24, 24]] [1, 64, 24, 24] 36,928
BatchNorm2D-24 [[1, 64, 24, 24]] [1, 64, 24, 24] 256
Residual-11 [[1, 64, 24, 24]] [1, 64, 24, 24] 0
Conv2D-29 [[1, 64, 24, 24]] [1, 64, 24, 24] 36,928
BatchNorm2D-25 [[1, 64, 24, 24]] [1, 64, 24, 24] 256
ReLU-14 [[1, 64, 24, 24]] [1, 64, 24, 24] 0
Conv2D-30 [[1, 64, 24, 24]] [1, 64, 24, 24] 36,928
BatchNorm2D-26 [[1, 64, 24, 24]] [1, 64, 24, 24] 256
Residual-12 [[1, 64, 24, 24]] [1, 64, 24, 24] 0
ResnetBlock-5 [[1, 64, 24, 24]] [1, 64, 24, 24] 0
Conv2D-31 [[1, 64, 24, 24]] [1, 128, 12, 12] 73,856
BatchNorm2D-27 [[1, 128, 12, 12]] [1, 128, 12, 12] 512
ReLU-15 [[1, 128, 12, 12]] [1, 128, 12, 12] 0
Conv2D-32 [[1, 128, 12, 12]] [1, 128, 12, 12] 147,584
BatchNorm2D-28 [[1, 128, 12, 12]] [1, 128, 12, 12] 512
Conv2D-33 [[1, 64, 24, 24]] [1, 128, 12, 12] 8,320
Residual-13 [[1, 64, 24, 24]] [1, 128, 12, 12] 0
Conv2D-34 [[1, 128, 12, 12]] [1, 128, 12, 12] 147,584
BatchNorm2D-29 [[1, 128, 12, 12]] [1, 128, 12, 12] 512
ReLU-16 [[1, 128, 12, 12]] [1, 128, 12, 12] 0
Conv2D-35 [[1, 128, 12, 12]] [1, 128, 12, 12] 147,584
BatchNorm2D-30 [[1, 128, 12, 12]] [1, 128, 12, 12] 512
Residual-14 [[1, 128, 12, 12]] [1, 128, 12, 12] 0
ResnetBlock-6 [[1, 64, 24, 24]] [1, 128, 12, 12] 0
Conv2D-36 [[1, 128, 12, 12]] [1, 256, 6, 6] 295,168
BatchNorm2D-31 [[1, 256, 6, 6]] [1, 256, 6, 6] 1,024
ReLU-17 [[1, 256, 6, 6]] [1, 256, 6, 6] 0
Conv2D-37 [[1, 256, 6, 6]] [1, 256, 6, 6] 590,080
BatchNorm2D-32 [[1, 256, 6, 6]] [1, 256, 6, 6] 1,024
Conv2D-38 [[1, 128, 12, 12]] [1, 256, 6, 6] 33,024
Residual-15 [[1, 128, 12, 12]] [1, 256, 6, 6] 0
Conv2D-39 [[1, 256, 6, 6]] [1, 256, 6, 6] 590,080
BatchNorm2D-33 [[1, 256, 6, 6]] [1, 256, 6, 6] 1,024
ReLU-18 [[1, 256, 6, 6]] [1, 256, 6, 6] 0
Conv2D-40 [[1, 256, 6, 6]] [1, 256, 6, 6] 590,080
BatchNorm2D-34 [[1, 256, 6, 6]] [1, 256, 6, 6] 1,024
Residual-16 [[1, 256, 6, 6]] [1, 256, 6, 6] 0
ResnetBlock-7 [[1, 128, 12, 12]] [1, 256, 6, 6] 0
Conv2D-41 [[1, 256, 6, 6]] [1, 512, 3, 3] 1,180,160
BatchNorm2D-35 [[1, 512, 3, 3]] [1, 512, 3, 3] 2,048
ReLU-19 [[1, 512, 3, 3]] [1, 512, 3, 3] 0
Conv2D-42 [[1, 512, 3, 3]] [1, 512, 3, 3] 2,359,808
BatchNorm2D-36 [[1, 512, 3, 3]] [1, 512, 3, 3] 2,048
Conv2D-43 [[1, 256, 6, 6]] [1, 512, 3, 3] 131,584
Residual-17 [[1, 256, 6, 6]] [1, 512, 3, 3] 0
Conv2D-44 [[1, 512, 3, 3]] [1, 512, 3, 3] 2,359,808
BatchNorm2D-37 [[1, 512, 3, 3]] [1, 512, 3, 3] 2,048
ReLU-20 [[1, 512, 3, 3]] [1, 512, 3, 3] 0
Conv2D-45 [[1, 512, 3, 3]] [1, 512, 3, 3] 2,359,808
BatchNorm2D-38 [[1, 512, 3, 3]] [1, 512, 3, 3] 2,048
Residual-18 [[1, 512, 3, 3]] [1, 512, 3, 3] 0
ResnetBlock-8 [[1, 256, 6, 6]] [1, 512, 3, 3] 0
AdaptiveAvgPool2D-2 [[1, 512, 3, 3]] [1, 512, 1, 1] 0
Flatten-3 [[1, 512, 1, 1]] [1, 512] 0
Linear-2 [[1, 512]] [1, 10] 5,130
===============================================================================
Total params: 11,186,186
Trainable params: 11,170,570
Non-trainable params: 15,616
-------------------------------------------------------------------------------
Input size (MB): 0.04
Forward/backward pass size (MB): 10.77
Params size (MB): 42.67
Estimated Total Size (MB): 53.47
-------------------------------------------------------------------------------
{'total_params': 11186186, 'trainable_params': 11170570}

Below we train ResNet on the Fashion-MNIST dataset.
import paddle
import paddle.vision.transforms as T
from paddle.vision.datasets import FashionMNIST

# Dataset preprocessing
transform = T.Compose([
    T.Resize(96),
    T.Transpose(),
    T.Normalize([127.5], [127.5]),
])
train_dataset = FashionMNIST(mode='train', transform=transform)
val_dataset = FashionMNIST(mode='test', transform=transform)

# Model definition
model = paddle.Model(ResNet(10))

# Configure the optimizer, loss, and metric needed for training
model.prepare(
    paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters()),
    paddle.nn.CrossEntropyLoss(),
    paddle.metric.Accuracy(topk=(1, 5)))

# Start training and evaluation
model.fit(train_dataset, val_dataset, epochs=2, batch_size=64, log_freq=100)

The loss value printed in the log is the current step, and the metric is the average value of previous step.
Epoch 1/2
step 100/938 - loss: 0.3613 - acc_top1: 0.7612 - acc_top5: 0.9820 - 3s/step
step 200/938 - loss: 0.4060 - acc_top1: 0.7972 - acc_top5: 0.9881 - 3s/step
step 300/938 - loss: 0.3635 - acc_top1: 0.8171 - acc_top5: 0.9908 - 3s/step
step 400/938 - loss: 0.3369 - acc_top1: 0.8292 - acc_top5: 0.9926 - 3s/step
step 500/938 - loss: 0.2733 - acc_top1: 0.8390 - acc_top5: 0.9937 - 3s/step
step 600/938 - loss: 0.1964 - acc_top1: 0.8469 - acc_top5: 0.9943 - 3s/step
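After training, Paddle's high-level API can also evaluate and save the model; the following optional sketch (not part of the original section; the save path is arbitrary) shows one way to do so.

# Evaluate the trained model on the validation set and save its parameters (optional sketch).
eval_result = model.evaluate(val_dataset, batch_size=64, verbose=1)
print(eval_result)                       # e.g. a dict with 'loss', 'acc_top1', 'acc_top5'
model.save('resnet18_fashion_mnist')     # arbitrary output path prefix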
This concludes the walkthrough of the Dive into Deep Learning (Paddle edition) source code for Section 5.11 (ResNet).