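This article first explains the attention mechanism, covering its definition, motivation, formula and how it is computed, together with self-attention and multi-head attention. It then describes the Encoder-Decoder principle, followed by the Transformer: its overall architecture, the Encoder and the Decoder. Finally, it implements a ViT model with the PaddlePaddle framework to classify vehicle images, walking through data processing, model construction, training and evaluation; the model reaches an accuracy of 0.8416667 on the validation set.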

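The attention mechanism is a data-processing method in machine learning. It is widely used across many kinds of tasks, including natural language processing, image recognition and speech recognition.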
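The word "attention" originally describes something only humans do, so the attention mechanism can be seen as a form of bionics: the machine imitates the attention people pay while reading, listening and speaking. Why imitate it at all, when a computer could in principle have unlimited memory and attention? Because when the human brain reads text or looks at an image, it does not strictly decode every detail; the process is closer to pattern recognition, and the brain automatically ignores unlikely or low-value information. A computer that cannot imitate this may let irrelevant information interfere with its output and push the result away from what the real application needs. To make computers better suited to human communication, we have to teach them to selectively forget and to relate context; this is what the attention mechanism does.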
In the 2017 Google paper Attention Is All You Need, the authors gave an abstract definition of attention:
Attention(Q,K,V)=softmax(\frac{QK^T}{\sqrt{d_k}})V
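Attention maps a query and a set of key-value pairs to an output. Q, K and V are vectors (stacked into matrices when many are processed at once). The output is a weighted sum of V, where the weights are the similarities between Q and K, scaled by \sqrt{d_k} and normalized with softmax.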
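As a minimal NumPy sketch of the formula above (an illustration, not the paper's or Paddle's implementation; the sizes are arbitrary), each query is compared with every key, the scaled scores are turned into weights with softmax, and the weights average the values:

```python
import numpy as np


def softmax(x, axis=-1):
    # subtract the row-wise max for numerical stability; the result is unchanged
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Q: (n_q, d_k), K: (n_k, d_k), V: (n_k, d_v) -> (output (n_q, d_v), weights (n_q, n_k))."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every query with every key
    weights = softmax(scores, axis=-1)   # each row is a probability distribution over the keys
    return weights @ V, weights          # weighted sum of the values


rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))   # 2 queries, d_k = 4
K = rng.standard_normal((5, 4))   # 5 keys
V = rng.standard_normal((5, 3))   # 5 values, d_v = 3
out, w = scaled_dot_product_attention(Q, K, V)
print(out.shape, w.shape)  # (2, 3) (2, 5)
```

One way to check the intuition: if all keys are identical, every weight is 1/5 and each output row is simply the mean of the rows of V.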
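A typical attention module has three parts: Q, K and V. Q (Query) represents the input information; Keys and Values come in pairs and usually hold information that already exists, such as the source-language sentence or the original text. We compute the relevance between Q and each K (a matrix operation; two ways of computing it are described below) to find how important each K is for the output, then multiply these weights by the corresponding V and sum them. The result is the attention output (see the figure below).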
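At this point you may be wondering how the value a in the figure above (the relevance between Q and K) is actually computed. Below I explain two methods: Dot-product and Additive.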
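A minimal NumPy sketch of the two scoring functions named above, under stated assumptions: the parameter names W_q, W_k and v and all sizes are introduced here for illustration, and random values stand in for what would be learned weights. Dot-product scoring uses q·k directly (so q and k must have the same size); additive scoring passes q and k through a small tanh layer, which also allows queries and keys of different sizes. Either score vector would then be normalized with softmax to obtain the weights a.

```python
import numpy as np


def dot_product_scores(q, K):
    # score_i = q . k_i ; requires d_q == d_k.  q: (d,), K: (n, d) -> (n,)
    return K @ q


def additive_scores(q, K, W_q, W_k, v):
    # score_i = v^T tanh(W_q q + W_k k_i)  (additive, Bahdanau-style scoring)
    # q: (d_q,), K: (n, d_k), W_q: (h, d_q), W_k: (h, d_k), v: (h,) -> (n,)
    hidden = np.tanh(q @ W_q.T + K @ W_k.T)  # (h,) broadcast against (n, h)
    return hidden @ v


rng = np.random.default_rng(0)
q = rng.standard_normal(4)
K = rng.standard_normal((5, 4))
# hypothetical stand-ins for learned parameters
W_q = rng.standard_normal((8, 4))
W_k = rng.standard_normal((8, 4))
v = rng.standard_normal(8)

a_dot = dot_product_scores(q, K)
a_add = additive_scores(q, K, W_q, W_k, v)
print(a_dot.shape, a_add.shape)  # (5,) (5,)
```

The formula quoted earlier is the dot-product variant with an extra 1/\sqrt{d_k} scaling.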













Attention(Q,K,V)=softmax(\frac{QK^T}{\sqrt{d_k}})V






It consists of three parts.





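The image is split directly into fixed-size patches, and each patch is linearly projected to obtain a patch embedding, analogous to words and word embeddings in NLP. Because a Transformer takes a sequence of token embeddings as input, the patch embeddings can be fed into it to extract features, which are then used for classification.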
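Split the image into patches, flatten them and apply a linear projection (producing an input shaped like a sequence).
Prepend a virtual start block (the class token) to the patch sequence; it later serves as the image-classification feature.
Encode the patch sequence with the Transformer Encoder.
Use the output representation of the virtual block as the classification vector and classify it with an MLP.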
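To make these steps concrete, here is a NumPy sketch (an illustration under assumptions, not the article's Paddle code). A 3×120×120 image is cut into 8×8 patches, giving 15 × 15 = 225 patches of 3·8·8 = 192 values each. A random matrix stands in for the learned projection to 256 dimensions, a zero vector for the class token, and random values for the learned position embedding. The article's PatchEmbed layer performs the split-and-project step with a Conv2D whose kernel_size and stride both equal the patch size, which applies the same linear map (plus a bias) to every patch.

```python
import numpy as np


def patchify(img, p):
    """Split a (C, H, W) image into non-overlapping p x p patches.
    Returns (num_patches, C*p*p); patches are ordered row by row."""
    C, H, W = img.shape
    assert H % p == 0 and W % p == 0, "image size must be divisible by the patch size"
    x = img.reshape(C, H // p, p, W // p, p)   # (C, rows, p, cols, p)
    x = x.transpose(1, 3, 0, 2, 4)             # (rows, cols, C, p, p)
    return x.reshape((H // p) * (W // p), C * p * p)


rng = np.random.default_rng(0)
img = rng.random((3, 120, 120)).astype("float32")  # same shape as the article's input
patches = patchify(img, 8)                          # (225, 192)

embed_dim = 256
# stand-in for the learned projection weights
W = rng.standard_normal((patches.shape[1], embed_dim)).astype("float32") * 0.02
tokens = patches @ W                                # patch embeddings, (225, 256)

cls_token = np.zeros((1, embed_dim), dtype="float32")  # the "virtual start block"
seq = np.concatenate([cls_token, tokens], axis=0)      # (226, 256)
pos_embed = rng.standard_normal(seq.shape).astype("float32") * 0.02
seq = seq + pos_embed                                   # input to the Transformer encoder
print(patches.shape, seq.shape)  # (225, 192) (226, 256)
```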

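Below, we use the PaddlePaddle framework to build a Vision Transformer model that classifies images of different vehicles. Label values: 1 = "car", 2 = "motorcycle", 3 = "truck" (the code shifts them to 0, 1, 2).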
import os
import sys
import zipfile
import random
from multiprocessing import cpu_count

import numpy as np
import matplotlib.pyplot as plt
import PIL.Image as Image

import paddle
import paddle.nn as nn
from paddle.io import Dataset
from paddle.nn import MaxPool2D, Conv2D, BatchNorm2D, Linear
from paddle.nn import TransformerEncoderLayer, TransformerEncoder
from paddle.nn.initializer import TruncatedNormal, Constant
from paddle.regularizer import L2Decay
'''
Parameter configuration
'''
train_parameters = {
    "input_size": [3, 120, 120],                            # shape of the input images
    "class_dim": 3,                                         # number of classes
    "src_path": "/home/aistudio/data/data93289/Data.zip",   # path to the raw dataset
    "target_path": "/home/aistudio/work/",                  # directory to extract into
    "train_list_path": "/home/aistudio/data/train.txt",     # path to train.txt
    "eval_list_path": "/home/aistudio/data/eval.txt",       # path to eval.txt
    "label_dict": {'0': 'car', '1': 'motorcycle', '2': 'truck'},  # label dictionary
    "num_epochs": 40,                                       # number of training epochs
    "train_batch_size": 32,                                 # batch size during training
    "learning_strategy": {                                  # optimizer settings
        "lr": 1.0e-5                                        # learning rate (hyperparameter)
    },
    'skip_steps': 50,                                       # print results every N batches
    'save_steps': 500,                                      # save model parameters every N batches
    "checkpoints": "/home/aistudio/work/checkpoints"        # directory for saved checkpoints
}


def seed_paddle(seed=1024):
    # fix every random seed for reproducibility
    seed = int(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    np.random.seed(seed)
    paddle.seed(seed)


seed_paddle(seed=1024)
(1) Unzip the raw dataset and inspect the data
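(2) Split the data into training and validation sets by a fixed ratio (every 10th sample goes to validation, i.e. 9:1)
(3) Shuffle the data and generate the data lists
(4) Define the data reader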
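A stand-alone sketch of steps (2) and (3), assuming file names follow the "<class>_<id>.png" pattern seen in the dataset (for example 1_96.png). The helper build_lists and the file names are hypothetical and not part of the article's code; the label is the numeric prefix minus 1, and after shuffling every 10th item goes to the validation list. With 30 files, indices 0, 10 and 20 are held out, giving a 27/3 split.

```python
import random


def build_lists(filenames, seed=1024):
    """Hypothetical helper mirroring the split logic of get_data_list on plain file names."""
    items = [(name, int(name.split('_')[0]) - 1) for name in filenames]
    random.Random(seed).shuffle(items)   # local RNG so the global random state is untouched
    train, evals = [], []
    for ind, item in enumerate(items):
        (evals if ind % 10 == 0 else train).append(item)
    return train, evals


# hypothetical file names in the "<class>_<id>.png" pattern
names = [f"{c}_{i}.png" for c in (1, 2, 3) for i in range(10)]
train, evals = build_lists(names)
print(len(train), len(evals))  # 27 3
```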
# Unzip the raw dataset
def unzip_data(src_path, target_path):
    '''
    Unzip the zip archive at src_path into the target_path directory
    '''
    if not os.path.isdir(os.path.join(target_path, 'Data')):
        z = zipfile.ZipFile(src_path, 'r')
        z.extractall(path=target_path)
        z.close()
        print('Dataset extracted')
    else:
        print('Files already exist')


def get_data_list(target_path, train_list_path, eval_list_path):
    '''
    Generate the data lists
    '''
    # the images are extracted to <target_path>/Data
    data_dir = os.path.join(target_path, 'Data')
    all_data_list = []
    for im in os.listdir(data_dir):
        img_path = os.path.join(data_dir, im)
        # file names look like "1_96.png": the prefix is a 1-based class id, shifted to 0-based
        img_label = str(int(im.split('_')[0]) - 1)
        all_data_list.append(img_path + '\t' + img_label + '\n')
    # shuffle the list
    random.shuffle(all_data_list)
    with open(train_list_path, 'a') as f1:
        with open(eval_list_path, 'a') as f2:
            for ind, img_path_label in enumerate(all_data_list):
                # split into training and validation sets: every 10th sample goes to eval
                if ind % 10 == 0:
                    f2.write(img_path_label)
                else:
                    f1.write(img_path_label)
    print('Data lists generated!')


class dataset(Dataset):
    def __init__(self, data_path, mode='train'):
        """
        Data reader
        :param data_path: directory containing train.txt and eval.txt
        :param mode: 'train' or 'eval'
        """
        super().__init__()
        self.data_path = data_path
        self.img_paths = []
        self.labels = []
        list_file = "train.txt" if mode == 'train' else "eval.txt"
        with open(os.path.join(self.data_path, list_file), "r", encoding="utf-8") as f:
            self.info = f.readlines()
        for img_info in self.info:
            img_path, label = img_info.strip().split('\t')
            self.img_paths.append(img_path)
            self.labels.append(int(label))

    def __getitem__(self, index):
        """
        Return one sample
        :param index: sample index
        :return: (image, label)
        """
        # open the image file and get its label
        img_path = self.img_paths[index]
        img = Image.open(img_path)
        if img.mode != 'RGB':
            img = img.convert('RGB')
        # HWC uint8 -> CHW float32 scaled to [0, 1]
        img = np.array(img).astype('float32')
        img = img.transpose((2, 0, 1)) / 255
        label = self.labels[index]
        label = np.array([label], dtype="int64")
        return img, label

    def print_sample(self, index: int = 0):
        print("file name", self.img_paths[index], "\tlabel", self.labels[index])

    def __len__(self):
        return len(self.img_paths)


# parameter initialization
src_path = train_parameters['src_path']
target_path = train_parameters['target_path']
train_list_path = train_parameters['train_list_path']
eval_list_path = train_parameters['eval_list_path']

# unzip the raw data into the target path
unzip_data(src_path, target_path)
Files already exist
# view a few samples
path1 = '/home/aistudio/work/Data/1_96.png'
path2 = '/home/aistudio/work/Data/2_832.png'
path3 = '/home/aistudio/work/Data/3_1398.png'
# car
img1 = Image.open(path1)
plt.imshow(img1)  # draw the image from the array
plt.show()        # display the image
# motorcycle
img2 = Image.open(path2)
plt.imshow(img2)
plt.show()
# truck
img3 = Image.open(path3)
plt.imshow(img3)
plt.show()
(Figures: sample images of a car, a motorcycle and a truck.)
# before generating the data lists, clear train.txt and eval.txt
with open(train_list_path, 'w') as f:
    f.seek(0)
    f.truncate()
with open(eval_list_path, 'w') as f:
    f.seek(0)
    f.truncate()

# generate the data lists
get_data_list(target_path, train_list_path, eval_list_path)

# training data loader
train_dataset = dataset('/home/aistudio/data', mode='train')
train_loader = paddle.io.DataLoader(train_dataset,
                                    batch_size=train_parameters['train_batch_size'],
                                    shuffle=True)
# validation data loader
eval_dataset = dataset('/home/aistudio/data', mode='eval')
eval_loader = paddle.io.DataLoader(eval_dataset,
                                   batch_size=train_parameters['train_batch_size'],
                                   shuffle=False)
Data lists generated!
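The Attention class in the model code that follows computes multi-head self-attention with a single reshape/transpose sequence. Here is a rough NumPy sketch of the same shape bookkeeping, assuming biases are omitted and random matrices stand in for the learned weights (W_qkv and W_o are names introduced here). It relies on the fact that Paddle's nn.Linear stores its weight as (in_features, out_features) and computes x @ W + b. With the article's settings, 120 / 8 = 15, so 15 × 15 = 225 patches plus one class token give N = 226 tokens, and C = 256 split over 16 heads gives a head dimension of 16.

```python
import numpy as np


def multi_head_self_attention(x, W_qkv, W_o, num_heads):
    """NumPy sketch of Attention.forward (biases omitted).
    x: (B, N, C); W_qkv: (C, 3C); W_o: (C, C)."""
    B, N, C = x.shape
    head_dim = C // num_heads
    # one projection yields Q, K, V for all heads: (3, B, heads, N, head_dim)
    qkv = (x @ W_qkv).reshape(B, N, 3, num_heads, head_dim).transpose(2, 0, 3, 1, 4)
    q, k, v = qkv[0], qkv[1], qkv[2]
    attn = (q @ k.transpose(0, 1, 3, 2)) * head_dim ** -0.5   # (B, heads, N, N)
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn = attn / attn.sum(axis=-1, keepdims=True)            # softmax over the keys
    # weight the values, then place the heads side by side again: (B, N, C)
    out = (attn @ v).transpose(0, 2, 1, 3).reshape(B, N, C)
    return out @ W_o, attn


rng = np.random.default_rng(0)
B, N, C, heads = 2, 226, 256, 16
x = rng.standard_normal((B, N, C)).astype("float32")
W_qkv = rng.standard_normal((C, 3 * C)).astype("float32") * 0.02
W_o = rng.standard_normal((C, C)).astype("float32") * 0.02
out, attn = multi_head_self_attention(x, W_qkv, W_o, heads)
print(out.shape, attn.shape)  # (2, 226, 256) (2, 16, 226, 226)
```

Each head attends over all 226 tokens independently, using its own 16-dimensional slice of Q, K and V.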
# Define ViT
trunc_normal_ = TruncatedNormal(std=.02)
zeros_ = Constant(value=0.)
ones_ = Constant(value=1.)

dim = 256         # embedding dimension
heads = 16        # number of attention heads
patch_size = 8    # patch side length in pixels
num_layers = 3    # number of Transformer encoder blocks
num_patch = int((120 / patch_size) * (120 / patch_size))  # 15 * 15 = 225 patches


# x[int] -> tuple(x, x)
def to_2tuple(x):
    return tuple([x] * 2)


# Identity layer: a layer that performs no operation
class Identity(nn.Layer):
    def __init__(self):
        super(Identity, self).__init__()

    def forward(self, input):
        return input


class PatchEmbed(nn.Layer):
    def __init__(self, img_size=120, patch_size=patch_size, in_chans=3, embed_dim=dim):
        super().__init__()
        img_size = to_2tuple(img_size)
        patch_size = to_2tuple(patch_size)
        num_patches = (img_size[1] // patch_size[1]) * \
            (img_size[0] // patch_size[0])
        self.img_size = img_size
        self.patch_size = patch_size
        self.num_patches = num_patches
        # kernel_size == stride == patch_size: one linear projection per non-overlapping patch
        self.proj = nn.Conv2D(in_chans, embed_dim,
                              kernel_size=patch_size, stride=patch_size)

    def forward(self, x):
        B, C, H, W = x.shape
        # per-patch linear projection + flatten + transpose:
        # (B, C, H, W) -> (B, embed_dim, H/p, W/p) -> (B, embed_dim, N) -> (B, N, embed_dim)
        x = self.proj(x).flatten(2).transpose((0, 2, 1))
        return x


class Attention(nn.Layer):
    def __init__(self, dim, num_heads=8, qkv_bias=False, qk_scale=None, attn_drop=0., proj_drop=0.):
        super().__init__()
        self.num_heads = num_heads
        head_dim = dim // num_heads
        self.scale = qk_scale or head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias_attr=qkv_bias)
        self.attn_drop = nn.Dropout(attn_drop)
        self.proj = nn.Linear(dim, dim)
        self.proj_drop = nn.Dropout(proj_drop)

    def forward(self, x):
        B, N, C = x.shape
        # linear projection to Q, K, V, split into heads: (3, B, num_heads, N, head_dim)
        qkv = self.qkv(x).reshape((B, N, 3, self.num_heads, C //
                                   self.num_heads)).transpose((2, 0, 3, 1, 4))
        # split into query, key and value
        q, k, v = qkv[0], qkv[1], qkv[2]
        # Matmul + Scale
        attn = (q.matmul(k.transpose((0, 1, 3, 2)))) * self.scale
        # SoftMax
        attn = nn.functional.softmax(attn, axis=-1)
        # Attention Dropout
        attn = self.attn_drop(attn)
        # Matmul, then merge the heads back to (B, N, C)
        x = (attn.matmul(v)).transpose((0, 2, 1, 3)).reshape((B, N, C))
        # output linear projection
        x = self.proj(x)
        # Linear Dropout
        x = self.proj_drop(x)
        return x


class Mlp(nn.Layer):
    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = nn.Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = nn.Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x


def drop_path(x, drop_prob=0., training=False):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    """
    if drop_prob == 0. or not training:
        return x
    keep_prob = paddle.to_tensor(1 - drop_prob)
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)
    random_tensor = keep_prob + paddle.rand(shape, dtype=x.dtype)
    random_tensor = paddle.floor(random_tensor)  # binarize
    output = x.divide(keep_prob) * random_tensor
    return output


class DropPath(nn.Layer):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).
    """

    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)


# The Block class implements one layer of the Transformer encoder
class Block(nn.Layer):
    # Define one Transformer layer
    def __init__(self, dim, num_heads, mlp_ratio=4., qkv_bias=False, qk_scale=None, drop=0., attn_drop=0.,
                 drop_path=0., act_layer=nn.GELU, norm_layer='nn.LayerNorm', epsilon=1e-5):
        super().__init__()
        # norm_layer is a string such as 'nn.LayerNorm', turned into a class with eval()
        self.norml = eval(norm_layer)(dim, epsilon=epsilon)
        self.attn = Attention(
            dim, num_heads=num_heads, qkv_bias=qkv_bias,
            qk_scale=qk_scale, attn_drop=attn_drop, proj_drop=drop
        )
        self.drop_path = DropPath(drop_path) if drop_path > 0. else Identity()
        self.norm2 = eval(norm_layer)(dim, epsilon=epsilon)
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop)

    def forward(self, x):
        # pre-norm residual blocks: x + Attn(LN(x)), then x + MLP(LN(x))
        x = x + self.drop_path(self.attn(self.norml(x)))
        x = x + self.drop_path(self.mlp(self.norm2(x)))
        return x


class VisionTransformer(nn.Layer):
    def __init__(self, img_size=120, patch_size=patch_size, in_chans=3, class_dim=train_parameters['class_dim'], embed_dim=dim, depth=num_layers,
                 num_heads=heads, mlp_ratio=4, qkv_bias=False, qk_scale=None, drop_rate=0., attn_drop_rate=0.,
                 drop_path_rate=0., norm_layer='nn.LayerNorm', epsilon=1e-5, **args):
        super().__init__()
        self.class_dim = class_dim
        self.num_features = self.embed_dim = embed_dim
        self.patch_embed = PatchEmbed(
            img_size=img_size, patch_size=patch_size, in_chans=in_chans, embed_dim=embed_dim)
        num_patches = self.patch_embed.num_patches
        # learnable position embedding for the class token and every patch
        self.pos_embed = self.create_parameter(
            shape=(1, num_patches + 1, embed_dim), default_initializer=zeros_)
        self.add_parameter("pos_embed", self.pos_embed)
        # learnable class token
        self.cls_token = self.create_parameter(
            shape=(1, 1, embed_dim), default_initializer=zeros_)
        self.add_parameter("cls_token", self.cls_token)
        self.pos_drop = nn.Dropout(p=drop_rate)
        # stochastic-depth rate of each block, increasing linearly from 0 to drop_path_rate
        dpr = [x for x in paddle.linspace(0, drop_path_rate, depth)]
        self.blocks = nn.LayerList([
            Block(
                dim=embed_dim, num_heads=num_heads, mlp_ratio=mlp_ratio, qkv_bias=qkv_bias, qk_scale=qk_scale,
                drop=drop_rate, attn_drop=attn_drop_rate, drop_path=dpr[i], norm_layer=norm_layer, epsilon=epsilon)
            for i in range(depth)])
        self.norm = eval(norm_layer)(embed_dim, epsilon=epsilon)
        # Classifier head
        self.head = nn.Linear(
            embed_dim, class_dim) if class_dim > 0 else Identity()
        trunc_normal_(self.pos_embed)
        trunc_normal_(self.cls_token)
        self.apply(self._init_weights)

    # parameter initialization
    def _init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight)
            if isinstance(m, nn.Linear) and m.bias is not None:
                zeros_(m.bias)
        elif isinstance(m, nn.LayerNorm):
            zeros_(m.bias)
            ones_(m.weight)

    # extract image features
    def forward_features(self, x):
        B = x.shape[0]
        # Image Patch Embedding: (B, N, embed_dim)
        x = self.patch_embed(x)
        # one class token per image
        cls_tokens = self.cls_token.expand((B, -1, -1))
        # concatenate class tokens and patch embeddings: (B, N + 1, embed_dim)
        x = paddle.concat((cls_tokens, x), axis=1)
        # add the Position Embedding
        x = x + self.pos_embed
        # Embedding Dropout
        x = self.pos_drop(x)
        # Transformer Encoder, made up of several basic blocks
        for blk in self.blocks:
            x = blk(x)
        # Norm
        x = self.norm(x)
        # take the output at the class-token position
        return x[:, 0]

    def forward(self, x):
        x = paddle.reshape(x, shape=[-1, 3, 120, 120])
        # extract image features
        x = self.forward_features(x)
        # image classification
        x = self.head(x)
        return x


def draw_process(title, color, iters, data, label):
    plt.title(title, fontsize=24)
    plt.xlabel("iter", fontsize=20)
    plt.ylabel(label, fontsize=20)
    plt.plot(iters, data, color=color, label=label)
    plt.legend()
    plt.grid()
    plt.show()


model = VisionTransformer()
model.train()
cross_entropy = paddle.nn.CrossEntropyLoss()
# note: the learning rate is hard-coded to 1e-4 here; train_parameters["learning_strategy"]["lr"] is not used
optimizer = paddle.optimizer.Adam(learning_rate=0.0001, parameters=model.parameters())

steps = 0
Iters, total_loss, total_acc = [], [], []
for epo in range(train_parameters['num_epochs']):
    for _, data in enumerate(train_loader()):
        steps += 1
        x_data = data[0]
        y_data = data[1]
        predicts = model(x_data)
        loss = cross_entropy(predicts, y_data)
        acc = paddle.metric.accuracy(predicts, y_data)
        loss.backward()
        optimizer.step()
        optimizer.clear_grad()
        if steps % train_parameters["skip_steps"] == 0:
            Iters.append(steps)
            total_loss.append(loss.numpy()[0])
            total_acc.append(acc.numpy()[0])
            # print intermediate results
            print('epo: {}, step: {}, loss is: {}, acc is: {}'
                  .format(epo, steps, loss.numpy(), acc.numpy()))
        # save model parameters
        if steps % train_parameters["save_steps"] == 0:
            save_path = train_parameters["checkpoints"] + "/" + "save_dir_" + str(steps) + '.pdparams'
            print('save model to: ' + save_path)
            paddle.save(model.state_dict(), save_path)
paddle.save(model.state_dict(), train_parameters["checkpoints"] + "/" + "save_dir_final.pdparams")
draw_process("training loss", "red", Iters, total_loss, "training loss")
draw_process("training acc", "green", Iters, total_acc, "training acc")

epo: 1, step: 50, loss is: [1.0511264], acc is: [0.375]
epo: 2, step: 100, loss is: [1.0749162], acc is: [0.375]
epo: 3, step: 150, loss is: [0.88865685], acc is: [0.65625]
epo: 4, step: 200, loss is: [0.5504421], acc is: [0.78125]
epo: 5, step: 250, loss is: [0.48934796], acc is: [0.8125]
epo: 6, step: 300, loss is: [0.2699679], acc is: [0.875]
epo: 7, step: 350, loss is: [0.11708505], acc is: [0.9375]
epo: 8, step: 400, loss is: [0.22028327], acc is: [0.90625]
epo: 9, step: 450, loss is: [0.28658137], acc is: [0.8333333]
epo: 11, step: 500, loss is: [0.07324414], acc is: [0.96875]
save model to: /home/aistudio/work/checkpoints/save_dir_500.pdparams
epo: 12, step: 550, loss is: [0.04826813], acc is: [1.]
epo: 13, step: 600, loss is: [0.01948055], acc is: [1.]
epo: 14, step: 650, loss is: [0.03476083], acc is: [1.]
epo: 15, step: 700, loss is: [0.02033299], acc is: [1.]
epo: 16, step: 750, loss is: [0.19120577], acc is: [0.9375]
epo: 17, step: 800, loss is: [0.10908449], acc is: [0.9375]
epo: 18, step: 850, loss is: [0.14570343], acc is: [0.9375]
epo: 19, step: 900, loss is: [0.01192897], acc is: [1.]
epo: 21, step: 950, loss is: [0.00465367], acc is: [1.]
epo: 22, step: 1000, loss is: [0.02442005], acc is: [1.]
save model to: /home/aistudio/work/checkpoints/save_dir_1000.pdparams
epo: 23, step: 1050, loss is: [0.00399405], acc is: [1.]
epo: 24, step: 1100, loss is: [0.0027994], acc is: [1.]
epo: 25, step: 1150, loss is: [0.00248999], acc is: [1.]
epo: 26, step: 1200, loss is: [0.00211915], acc is: [1.]
epo: 27, step: 1250, loss is: [0.00202085], acc is: [1.]
epo: 28, step: 1300, loss is: [0.00186031], acc is: [1.]
epo: 29, step: 1350, loss is: [0.00167076], acc is: [1.]
epo: 31, step: 1400, loss is: [0.00155913], acc is: [1.]
epo: 32, step: 1450, loss is: [0.00164985], acc is: [1.]
epo: 33, step: 1500, loss is: [0.00145937], acc is: [1.]
save model to: /home/aistudio/work/checkpoints/save_dir_1500.pdparams
epo: 34, step: 1550, loss is: [0.00126303], acc is: [1.]
epo: 35, step: 1600, loss is: [0.00122269], acc is: [1.]
epo: 36, step: 1650, loss is: [0.00106174], acc is: [1.]
epo: 37, step: 1700, loss is: [0.00095195], acc is: [1.]
epo: 38, step: 1750, loss is: [0.00092876], acc is: [1.]
epo: 39, step: 1800, loss is: [0.00114523], acc is: [1.]
(Figures: training loss and training accuracy versus iteration.)
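The drop_path function and DropPath layer in the model code implement stochastic depth: during training, each sample's residual branch is zeroed with probability drop_prob, and the surviving samples are divided by keep_prob so the expected value stays the same. With the default drop_path_rate=0. used above, every Block receives an Identity instead of DropPath, so this path is inactive in the training run shown. The NumPy sketch below reproduces the same arithmetic; drop_path_np is a name introduced here for illustration, not part of Paddle.

```python
import numpy as np


def drop_path_np(x, drop_prob, rng, training=True):
    """NumPy sketch of the article's drop_path (stochastic depth)."""
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)      # one random draw per sample
    mask = np.floor(keep_prob + rng.random(shape))   # 1 with probability keep_prob, otherwise 0
    return x / keep_prob * mask                      # rescale so the expected value is unchanged


rng = np.random.default_rng(0)
x = np.ones((1000, 4, 3))
y = drop_path_np(x, drop_prob=0.2, rng=rng)
per_sample = y.reshape(len(y), -1)
print("fraction of samples kept:", (per_sample[:, 0] > 0).mean())
```

Every sample is either dropped entirely or kept and scaled by 1/keep_prob, so the mean over many samples stays close to the input value of 1.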
'''
Model evaluation
'''
model__state_dict = paddle.load(os.path.join(train_parameters["checkpoints"], "save_dir_final.pdparams"))
model_eval = VisionTransformer()
model_eval.set_state_dict(model__state_dict)
model_eval.eval()
accs = []
with paddle.no_grad():  # gradients are not needed for evaluation
    for _, data in enumerate(eval_loader()):
        x_data = data[0]
        y_data = data[1]
        predicts = model_eval(x_data)
        acc = paddle.metric.accuracy(predicts, y_data)
        accs.append(acc.numpy()[0])
print('Model accuracy on the validation set:', np.mean(accs))
Model accuracy on the validation set: 0.8416667
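The evaluation loop averages per-batch accuracies with np.mean(accs). When the last batch is smaller than train_batch_size, this average gives each of its samples more weight than the others, so it can differ from the overall fraction of correctly classified images. A minimal sketch with made-up counts (hypothetical numbers, not the article's data) shows the difference:

```python
import numpy as np


def mean_of_batch_acc(batch_correct, batch_sizes):
    # what np.mean(accs) computes: every batch counts equally, whatever its size
    return float(np.mean(np.asarray(batch_correct) / np.asarray(batch_sizes)))


def sample_weighted_acc(batch_correct, batch_sizes):
    # overall accuracy: total correct predictions / total samples
    return float(np.sum(batch_correct) / np.sum(batch_sizes))


# hypothetical evaluation: two full batches of 32 and a final partial batch of 8
correct = [28, 26, 4]
sizes = [32, 32, 8]
print(mean_of_batch_acc(correct, sizes))    # ≈ 0.7292
print(sample_weighted_acc(correct, sizes))  # ≈ 0.8056
```

When all batches have the same size, the two numbers coincide; accumulating correct counts and sample counts is one way to get the per-sample figure regardless of batch sizes.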
That concludes this tutorial on Transformer principles and applications.
Copyright 2014-2025 https://www.php.cn/ All Rights Reserved | php.cn | 湘ICP备2023035733号