The densely connected network (DenseNet) extends the cross-layer connection design of ResNet. Its modules are densely connected by concatenating their outputs along the channel dimension. DenseNet consists mainly of dense blocks and transition layers: the former define how inputs and outputs are concatenated, while the latter control the number of channels. This article also provides a PaddlePaddle implementation of DenseNet and its training procedure.

The cross-layer connection design in ResNet has inspired several follow-up works. In this section we introduce one of them: the densely connected network (DenseNet) [1]. Its main difference from ResNet is shown in Figure 5.10.
Figure 5.10 abstracts a few adjacent operations into module A and module B. The key difference from ResNet is that in DenseNet the output of module B is not added to the output of module A, but concatenated with it along the channel dimension. As a result, the output of module A is passed directly to the layers that follow module B. In this design, module A is connected directly to every layer after module B, which is why the network is called "densely connected".
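As a quick sanity check on this distinction, the following minimal sketch (with random feature maps, not from the original text) contrasts the two combination rules: ResNet-style addition keeps the channel count unchanged, while DenseNet-style concatenation along the channel dimension (axis=1) stacks the channels.

import paddle

A = paddle.randn([4, 3, 8, 8])   # output of module A: 3 channels
B = paddle.randn([4, 3, 8, 8])   # output of module B: 3 channels

print((A + B).shape)                        # addition (ResNet): [4, 3, 8, 8]
print(paddle.concat([A, B], axis=1).shape)  # concatenation (DenseNet): [4, 6, 8, 8]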
The main building blocks of DenseNet are the dense block and the transition layer. The former defines how inputs and outputs are concatenated, and the latter controls the number of channels so that it does not grow too large.
DenseNet uses the modified "batch normalization, activation, and convolution" structure from ResNet (see the exercise in the previous section). We first implement this structure in the BNConv class.
import paddle
import paddle.nn as nn
import numpy as np
import warnings

warnings.filterwarnings("ignore", category=Warning)  # suppress warning messages

class BNConv(nn.Layer):
    def __init__(self, num_channels, num_filters):
        super(BNConv, self).__init__()
        model = [
            nn.BatchNorm2D(num_channels),
            nn.ReLU(),
            nn.Conv2D(num_channels, num_filters, 3, stride=1, padding=1)
        ]
        self.model = nn.Sequential(*model)

    def forward(self, X):
        return self.model(X)

class DenseBlock(nn.Layer):
    def __init__(self, num_channels, num_layers, growth_rate):
        super(DenseBlock, self).__init__()
        self.dense_blocks = []
        for i in range(num_layers):
            block = self.add_sublayer(str(i), BNConv(num_channels + i * growth_rate, growth_rate))
            self.dense_blocks.append(block)

    def forward(self, X):
        for block in self.dense_blocks:
            X = paddle.concat([X, block(X)], axis=1)
        return X

In the example below, we define a dense block with 2 convolution blocks, each with 10 output channels. With an input of 3 channels, we get an output with 3 + 2 × 10 = 23 channels. The number of output channels of each convolution block controls how much the output channel count grows relative to the input, and is therefore called the growth rate.
blk = DenseBlock(3, 2, 10)
X = paddle.to_tensor(np.random.uniform(-1., 1., [4, 3, 8, 8]).astype('float32'))
Y = blk(X)
print(Y.shape)

[4, 23, 8, 8]
Since every dense block increases the number of channels, using too many of them makes the model overly complex. A transition layer is used to control model complexity: it reduces the number of channels with a 1×1 convolution layer and halves the height and width with an average pooling layer of stride 2, further reducing the model's complexity.
class TransitionLayer(nn.Layer):
    def __init__(self, num_channels, num_filters):
        super(TransitionLayer, self).__init__()
        model = [
            nn.BatchNorm2D(num_channels),
            nn.ReLU(),
            nn.Conv2D(num_channels, num_filters, 1, stride=1),
            nn.AvgPool2D(kernel_size=2, stride=2)
        ]
        self.model = nn.Sequential(*model)

    def forward(self, X):
        return self.model(X)

We apply a transition layer with 10 output channels to the output of the dense block in the previous example. The number of channels is reduced to 10, and the height and width are halved.

blk = TransitionLayer(23, 10)
Y2 = blk(Y)
print(Y2.shape)
[4, 10, 4, 4]
Now we construct the DenseNet model.
class DenseNet(nn.Layer):
    def __init__(self, num_classes=10):
        super(DenseNet, self).__init__()
        # DenseNet first uses the same single convolution layer and max pooling layer as ResNet.
        model = [
            nn.Conv2D(1, 64, 7, stride=2, padding=3),
            nn.BatchNorm2D(64),
            nn.ReLU(),
            nn.MaxPool2D(kernel_size=3, stride=2, padding=1)
        ]
        # Similar to the 4 residual blocks that follow in ResNet, DenseNet uses 4 dense blocks.
        # As with ResNet, we can set how many convolution layers each dense block uses.
        # Here we set it to 4 to stay consistent with the ResNet-18 of the previous section.
        # The number of channels of the convolution layers in a dense block (i.e. the growth rate)
        # is set to 32, so each dense block adds 128 channels.
        # ResNet reduces the height and width between modules with residual blocks of stride 2.
        # Here we instead use transition layers to halve the height and width and halve the number of channels.
        num_channels, growth_rate = 64, 32  # num_channels is the current number of channels
        num_convs_in_dense_blocks = [4, 4, 4, 4]
        for i, num_convs in enumerate(num_convs_in_dense_blocks):
            model += [DenseBlock(num_channels, num_convs, growth_rate)]
            # number of output channels of the previous dense block
            num_channels += num_convs * growth_rate
            # insert a transition layer that halves the number of channels between dense blocks
            if i != len(num_convs_in_dense_blocks) - 1:
                model += [TransitionLayer(num_channels, num_channels // 2)]
                num_channels //= 2
        # As with ResNet, we finish with a global pooling layer and a fully connected layer for the output.
        model += [
            nn.AdaptiveAvgPool2D(output_size=1),
            nn.Flatten(start_axis=1, stop_axis=-1),
            nn.Linear(num_channels, num_classes),
        ]
        self.model = nn.Sequential(*model)

    def forward(self, X):
        Y = self.model(X)
        return Y
dn = DenseNet(10)
X = paddle.to_tensor(np.random.uniform(-1., 1., [4, 1, 96, 96]).astype('float32'))
Y = dn(X)
print(Y.shape)

[4, 10]
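To see how the channel count grows inside the dense blocks and is halved by the transition layers, one convenient option is Paddle's paddle.summary utility, which prints a layer-by-layer table of output shapes and parameter counts. A minimal sketch, assuming the single-channel 96×96 input used above:

# Sketch only: print a per-layer summary of the network for a [1, 1, 96, 96] input.
info = paddle.summary(DenseNet(10), input_size=(1, 1, 96, 96))
print(info)  # dict with total and trainable parameter counts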
import paddle
import paddle.vision.transforms as T
from paddle.vision.datasets import FashionMNIST

# dataset preprocessing
transform = T.Compose([
    T.Resize(96),
    T.Transpose(),
    T.Normalize([127.5], [127.5]),
])
train_dataset = FashionMNIST(mode='train', transform=transform)
val_dataset = FashionMNIST(mode='test', transform=transform)

# model definition
model = paddle.Model(DenseNet(10))

# configure the optimizer, loss, and metric used for training
model.prepare(
    paddle.optimizer.Adam(learning_rate=0.001, parameters=model.parameters()),
    paddle.nn.CrossEntropyLoss(),
    paddle.metric.Accuracy(topk=(1, 5)))

# start training and evaluation
model.fit(train_dataset, val_dataset, epochs=2, batch_size=64, log_freq=100)

The loss value printed in the log is the current step, and the metric is the average value of previous step.
Epoch 1/2
step 100/938 - loss: 0.4821 - acc_top1: 0.7208 - acc_top5: 0.9803 - 2s/step
step 200/938 - loss: 0.3669 - acc_top1: 0.7677 - acc_top5: 0.9875 - 2s/step
step 300/938 - loss: 0.3581 - acc_top1: 0.7905 - acc_top5: 0.9905 - 2s/step
step 400/938 - loss: 0.3215 - acc_top1: 0.8042 - acc_top5: 0.9922 - 2s/step
step 500/938 - loss: 0.3757 - acc_top1: 0.8154 - acc_top5: 0.9933 - 2s/step
step 600/938 - loss: 0.2171 - acc_top1: 0.8244 - acc_top5: 0.9940 - 2s/step
step 700/938 - loss: 0.2634 - acc_top1: 0.8314 - acc_top5: 0.9946 - 2s/step
step 800/938 - loss: 0.5456 - acc_top1: 0.8378 - acc_top5: 0.9950 - 2s/step
step 900/938 - loss: 0.1972 - acc_top1: 0.8429 - acc_top5: 0.9955 - 2s/step
step 938/938 - loss: 0.3683 - acc_top1: 0.8438 - acc_top5: 0.9955 - 2s/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 100/157 - loss: 0.2342 - acc_top1: 0.8733 - acc_top5: 0.9972 - 584ms/step
step 157/157 - loss: 0.2860 - acc_top1: 0.8744 - acc_top5: 0.9976 - 582ms/step
Eval samples: 10000
Epoch 2/2
step 100/938 - loss: 0.2843 - acc_top1: 0.8853 - acc_top5: 0.9988 - 2s/step
step 200/938 - loss: 0.3403 - acc_top1: 0.8915 - acc_top5: 0.9990 - 2s/step
step 300/938 - loss: 0.1578 - acc_top1: 0.8936 - acc_top5: 0.9991 - 2s/step
step 400/938 - loss: 0.0841 - acc_top1: 0.8938 - acc_top5: 0.9989 - 2s/step
step 500/938 - loss: 0.3375 - acc_top1: 0.8961 - acc_top5: 0.9988 - 2s/step
step 600/938 - loss: 0.2915 - acc_top1: 0.8964 - acc_top5: 0.9988 - 2s/step
step 700/938 - loss: 0.1506 - acc_top1: 0.8967 - acc_top5: 0.9988 - 2s/step
step 800/938 - loss: 0.4000 - acc_top1: 0.8976 - acc_top5: 0.9988 - 2s/step
step 900/938 - loss: 0.1751 - acc_top1: 0.8983 - acc_top5: 0.9987 - 2s/step
step 938/938 - loss: 0.3205 - acc_top1: 0.8985 - acc_top5: 0.9987 - 2s/step
Eval begin...
The loss value printed in the log is the current batch, and the metric is the average value of previous step.
step 100/157 - loss: 0.1528 - acc_top1: 0.8883 - acc_top5: 0.9989 - 592ms/step
step 157/157 - loss: 0.1586 - acc_top1: 0.8918 - acc_top5: 0.9990 - 596ms/step
Eval samples: 10000
This concludes the walkthrough of the Paddle implementation of Section 5.12 (DenseNet) of Dive into Deep Learning.