PaddlePaddle Regular Competition: League of Legends Master Prediction, January 4th-Place Solution

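This article covers win/loss prediction for League of Legends matches, using 180,000 training records and 20,000 test records whose features span kills, damage, and other dimensions. After data preprocessing and exploratory data analysis (EDA), it trains logistic regression, random forest, and other models, fuses them, and blends the result with a neural network. Accuracy is the evaluation metric, and the final predictions are written out in the required submission format.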

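Competition Overview

Real-time battle games are a hot topic in AI research. Game complexity, partial observability, and a battlefield that changes dynamically in real time make them hard to study. We can predict the win probability at the champion-selection stage, or build a model from live data during the match. So while a League of Legends match is in progress, can we know our own chance of winning?

Task

The competition data comes from real game records of League of Legends players and logs each user's in-match statistics (such as kills and physical damage dealt). Participants are expected to mine patterns from the dataset and predict whether the player wins or loses the match.

An example of loading the training set is shown below:

Training set: 180,000 records;
Test set: 20,000 records;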

import pandas as pd
import numpy as np
train = pd.read_csv('train.csv.zip')

Each row of the dataset is one player's record for one game. The fields are as follows:

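id: player record ID
win: whether the player won (the label)
kills: number of kills
deaths: number of deaths
assists: number of assists
largestkillingspree: largest killing spree (a game term; a spree starts when you kill three enemy champions in a row without dying in between)
largestmultikill: largest multikill (a game term for several kills within a short time)
longesttimespentliving: longest time spent alive
doublekills: number of double kills
triplekills: number of triple kills
quadrakills: number of quadra kills
pentakills: number of penta kills
totdmgdealt: total damage dealt
magicdmgdealt: magic damage dealt
physicaldmgdealt: physical damage dealt
truedmgdealt: true damage dealt
largestcrit: largest critical strike damage
totdmgtochamp: total damage dealt to enemy champions
magicdmgtochamp: magic damage dealt to enemy champions
physdmgtochamp: physical damage dealt to enemy champions
truedmgtochamp: true damage dealt to enemy champions
totheal: total healing
totunitshealed: total number of units healed
dmgtoturrets: damage dealt to turrets
timecc: crowd-control time
totdmgtaken: total damage taken
magicdmgtaken: magic damage taken
physdmgtaken: physical damage taken
truedmgtaken: true damage taken
wardsplaced: number of wards placed
wardskilled: number of wards destroyed
firstblood: whether the player got first blood

In the test set the label field win is empty; participants must predict it.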

Evaluation Rules

Data description

Participants submit a predicted win label for every row of the test set, in the following format:

win
0
1
1
0

Evaluation metric

Submissions are scored by accuracy; a higher value means more accurate predictions. Reference evaluation code:

from sklearn.metrics import accuracy_score
y_pred = [0, 2, 1, 3]
y_true = [0, 1, 2, 3]
accuracy_score(y_true, y_pred)

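For a binary label such as win, accuracy is just the fraction of rows where the prediction matches the truth. A minimal NumPy sketch with made-up labels (illustrative only, not competition data):

```python
import numpy as np

# Hypothetical labels for six matches (illustrative only, not competition data)
y_true = np.array([0, 1, 1, 0, 1, 0])
y_pred = np.array([0, 1, 0, 0, 1, 1])

# Accuracy = number of correct predictions / number of predictions
accuracy = float((y_true == y_pred).mean())
print(accuracy)
```

Four of the six made-up predictions match, so the value printed is 4/6.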

1) Load the data

In [ ]

#!pip install numpy==1.19
#!pip install -U scikit-learn numpy

In [ ]

import sklearn

In [1]

import pandas as pd
import paddle
import numpy as np
%pylab inline
import seaborn as sns

train_df_raw = pd.read_csv('data/data137276/train.csv.zip')
test_df_raw = pd.read_csv('data/data137276/test.csv.zip')

train_df = train_df_raw.drop(['id', 'timecc'], axis=1)
test_df = test_df_raw.drop(['id', 'timecc'], axis=1)

In [ ]

train_df_raw

In [ ]

train_df

        win  kills  deaths  assists  largestkillingspree  largestmultikill  \
0         0      1       5        2                    0                 1   
1         0      5       8        7                    3                 1   
2         1      1       6       16                    0                 1   
3         0      1       2        0                    0                 1   
4         0      4      11       25                    0                 1   
...     ...    ...     ...      ...                  ...               ...   
179995    1      1       6       12                    0                 1   
179996    1      7       3        4                    5                 1   
179997    1      9       0        9                    9                 1   
179998    1     14       1        5                   10                 2   
179999    1      4       4        2                    2                 1   

        longesttimespentliving  doublekills  triplekills  quadrakills  ...  \
0                          569            0            0            0  ...   
1                          880            0            0            0  ...   
2                          593            0            0            0  ...   
3                          381            0            0            0  ...   
4                          455            0            0            0  ...   
...                        ...          ...          ...          ...  ...   
179995                     362            0            0            0  ...   
179996                     574            0            0            0  ...   
179997                       0            0            0            0  ...   
179998                     980            3            0            0  ...   
179999                     559            0            0            0  ...   

        totheal  totunitshealed  dmgtoturrets  totdmgtaken  magicdmgtaken  \
0           849               2             0         7819           2178   
1           642               4           303        24637           5607   
2          2326               3           329        18749           3651   
3          1555               1             0        12134           1739   
4          6630               8             0        27891          14068   
...         ...             ...           ...          ...            ...   
179995     3559               3          5751        14786           2374   
179996     2529               2          8907        11019           3933   
179997    11494               4          6627        14279           3661   
179998     6555               1          1943        19165           4818   
179999      608               1          1590        10992           7681   

        physdmgtaken  truedmgtaken  wardsplaced  wardskilled  firstblood  
0               5239           401            4            1           0  
1              17635          1394           10            0           0  
2              14834           263            7            1           0  
3              10318            76            8            1           0  
4              12749          1073           34            2           0  
...              ...           ...          ...          ...         ...  
179995         12309           102           12            1           0  
179996          6533           552            7            2           0  
179997         10617             0            7            2           1  
179998         14110           236            6            0           0  
179999          3065           246            7            1           0  

[180000 rows x 30 columns]

In [ ]

# Inspect the labels
train_df['win']

In [ ]

# Inspect the column names
train_df.columns

In [ ]

train_df.info()

2) Exploratory data analysis (EDA)

2.1 Outlier handling

In [ ]

# Missing values
print(type(train_df.isnull()))
train_df.isnull()

In [ ]

# Count missing values per column
train_df.isnull().sum()

In [ ]

# Fraction of missing values per column
train_df.isnull().mean(axis=0)

In [ ]

train_df['win'].value_counts().plot(kind='bar')

In [ ]

sns.distplot(train_df['kills'])

In [ ]

sns.distplot(train_df['deaths'])

In [ ]

sns.boxplot(y='kills', x='win', data=train_df)

In [ ]

plt.scatter(train_df['kills'], train_df['deaths'])
plt.xlabel('kills')
plt.ylabel('deaths')

In [ ]

for col in train_df.columns[1:]:
    train_df[col] /= train_df[col].max()
    test_df[col] /= test_df[col].max()

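The loop above scales the training and test sets by their own column maxima, so the same raw value can end up at different scaled values in the two sets. A common alternative, sketched here with made-up numbers and hypothetical names (train_feats, test_feats), reuses the training-set maxima for both. This is an illustration, not the scaling used in the submitted solution.

```python
import numpy as np

# Hypothetical feature matrices (rows = players, columns = features)
train_feats = np.array([[1.0, 200.0], [4.0, 800.0], [2.0, 400.0]])
test_feats = np.array([[2.0, 1000.0], [8.0, 100.0]])

# Compute the per-column maximum on the training set only
col_max = train_feats.max(axis=0)
col_max[col_max == 0] = 1.0  # guard against division by zero for all-zero columns

# Apply the same divisor to both sets so a raw value maps to one scaled value
train_scaled = train_feats / col_max
test_scaled = test_feats / col_max

print(train_scaled)
print(test_scaled)
```

With this choice, test values may exceed 1 when they are larger than anything seen in training, which is usually acceptable for the models used here.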

3) Dataset

In [47]

from sklearn.model_selection import train_test_split 
from sklearn.model_selection import KFold,cross_validate

In [41]

# Separate the label from the features
x=train_df.drop(['win'], axis=1)
y=train_df.win

In [42]

x

        kills  deaths  assists  largestkillingspree  largestmultikill  \
0           1       5        2                    0                 1   
1           5       8        7                    3                 1   
2           1       6       16                    0                 1   
3           1       2        0                    0                 1   
4           4      11       25                    0                 1   
...       ...     ...      ...                  ...               ...   
179995      1       6       12                    0                 1   
179996      7       3        4                    5                 1   
179997      9       0        9                    9                 1   
179998     14       1        5                   10                 2   
179999      4       4        2                    2                 1   

        longesttimespentliving  doublekills  triplekills  quadrakills  \
0                          569            0            0            0   
1                          880            0            0            0   
2                          593            0            0            0   
3                          381            0            0            0   
4                          455            0            0            0   
...                        ...          ...          ...          ...   
179995                     362            0            0            0   
179996                     574            0            0            0   
179997                       0            0            0            0   
179998                     980            3            0            0   
179999                     559            0            0            0   

        pentakills  ...  totheal  totunitshealed  dmgtoturrets  totdmgtaken  \
0                0  ...      849               2             0         7819   
1                0  ...      642               4           303        24637   
2                0  ...     2326               3           329        18749   
3                0  ...     1555               1             0        12134   
4                0  ...     6630               8             0        27891   
...            ...  ...      ...             ...           ...          ...   
179995           0  ...     3559               3          5751        14786   
179996           0  ...     2529               2          8907        11019   
179997           0  ...    11494               4          6627        14279   
179998           0  ...     6555               1          1943        19165   
179999           0  ...      608               1          1590        10992   

        magicdmgtaken  physdmgtaken  truedmgtaken  wardsplaced  wardskilled  \
0                2178          5239           401            4            1   
1                5607         17635          1394           10            0   
2                3651         14834           263            7            1   
3                1739         10318            76            8            1   
4               14068         12749          1073           34            2   
...               ...           ...           ...          ...          ...   
179995           2374         12309           102           12            1   
179996           3933          6533           552            7            2   
179997           3661         10617             0            7            2   
179998           4818         14110           236            6            0   
179999           7681          3065           246            7            1   

        firstblood  
0                0  
1                0  
2                0  
3                0  
4                0  
...            ...  
179995           0  
179996           0  
179997           1  
179998           0  
179999           0  

[180000 rows x 29 columns]

In [43]

y

0         0
1         0
2         1
3         0
4         0
         ..
179995    1
179996    1
179997    1
179998    1
179999    1
Name: win, Length: 180000, dtype: int64

In [44]

print('Feature matrix shape: {}'.format(x.shape))
print('Label shape: {}'.format(y.shape))
print('Label classes: {}'.format(np.unique(y)))
print('Test-set feature shape: {}'.format(test_df.shape))

Feature matrix shape: (180000, 29)
Label shape: (180000,)
Label classes: [0 1]
Test-set feature shape: (20000, 29)

In [48]

# Split the data; the "test" part split off here is used for a second round of validation
Xtrain,Xtest,Ytrain,Ytest=train_test_split(x,y,test_size=0.2,random_state=1412)

In [49]

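# "Validation" here means the validation split, not the competition test set.
print('Training feature shape: {}'.format(Xtrain.shape))
print('Training label shape: {}'.format(Ytrain.shape))
print('Validation feature shape: {}'.format(Xtest.shape))
print('Validation label shape: {}'.format(Ytest.shape))

Training feature shape: (144000, 29)
Training label shape: (144000,)
Validation feature shape: (36000, 29)
Validation label shape: (36000,)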

In [50]

def individual_estimators(estimators):
    # Cross-validate every (name, estimator) pair, then score it on the held-out split
    train_score=[]
    cv_mean=[]
    test_score=[]
    for estimator in estimators:
        cv=KFold(n_splits=5,shuffle=True,random_state=1412)
        results=cross_validate(estimator[1],Xtrain,Ytrain
                                ,cv=cv
                                ,scoring="accuracy"
                                ,n_jobs=8
                                ,return_train_score=True
                                ,verbose=False)
        test=estimator[1].fit(Xtrain,Ytrain).score(Xtest,Ytest)
        train_score.append(results["train_score"].mean())
        cv_mean.append(results["test_score"].mean())
        test_score.append(test)
    for i in range(len(estimators)):
        print("-------------------------------------------")
        print(
            estimators[i]
            ,"\n train_score_mean:{}".format(train_score[i])
            ,"\n cv_mean:{}".format(cv_mean[i])
            ,"\n test_score:{}".format(test_score[i])
            ,"\n")

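The evaluation helper relies on 5-fold cross-validation with shuffling. As a rough illustration of what such a split does, the sketch below builds the folds by hand with NumPy on ten dummy sample indices. The seed 1412 only mirrors the one used above; NumPy's generator will not reproduce scikit-learn's exact shuffle, and this is not scikit-learn's implementation.

```python
import numpy as np

n_samples, n_splits = 10, 5

# Shuffle the sample indices once with a fixed seed
rng = np.random.default_rng(1412)
indices = rng.permutation(n_samples)

# Cut the shuffled indices into n_splits nearly equal folds
folds = np.array_split(indices, n_splits)

splits = []
for k in range(n_splits):
    # Fold k is held out for validation; the remaining folds form the training part
    val_idx = folds[k]
    train_idx = np.concatenate([folds[j] for j in range(n_splits) if j != k])
    splits.append((train_idx, val_idx))
    print(f"fold {k}: train on {len(train_idx)} samples, validate on {len(val_idx)}")
```

Every sample is used for validation exactly once, which is why the mean of the five validation scores (cv_mean above) is a less noisy estimate than a single split.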

In [51]

def fusion_estimators(estimators):
    # Note: the argument is not used; this function evaluates and fits the global VotingClassifier clf
    cv=KFold(n_splits=5,shuffle=True,random_state=1412)
    results=cross_validate(clf,Xtrain,Ytrain
                            ,cv=cv
                            ,scoring="accuracy"
                            ,n_jobs=-1
                            ,return_train_score=True
                            ,verbose=False)
    test=clf.fit(Xtrain,Ytrain).score(Xtest,Ytest)
    print("++++++++++++++++++++++++++++++++++++++++++++++")
    print(
        "\n train_score_mean:{}".format(results["train_score"].mean())
        ,"\n cv_mean:{}".format(results["test_score"].mean())
        ,"\n test_score:{}".format(test)
        )

4) Models

In [46]

from sklearn.neighbors import KNeighborsClassifier as KNNC
from sklearn.tree import DecisionTreeClassifier as DTR
from sklearn.ensemble import RandomForestClassifier as RFC
from sklearn.ensemble import GradientBoostingClassifier as GBC
from sklearn.linear_model import LogisticRegression as LogiR
from sklearn.ensemble import VotingClassifier

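4.a Why can model fusion beat a single ensemble algorithm?

Each weak classifier is not strong on its own, but each one represents its own hypothesis space. Real-world data comes from a complex, multivariate random system, and a single hypothesis space often cannot approximate it well. Model fusion is a simple, brute-force remedy: it combines several such approximations. Fusion does not always give a better result, but it does most of the time.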

4.1 Weak classifiers and ensembling

In [52]

clf1=LogiR(max_iter=3000,random_state=1412,n_jobs=8)
clf2=RFC(n_estimators=100,random_state=1412,n_jobs=8)
clf3=GBC(n_estimators=100,random_state=1412)

In [53]

estimators=[("Logistic Regression",clf1),("RandomForest",clf2),("GBDT",clf3)]
clf=VotingClassifier(estimators,voting="soft")

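Conceptually, soft voting averages the class probabilities of the base models and picks the class with the highest average. The sketch below shows that arithmetic in NumPy with made-up probabilities; the arrays are not outputs of the models above, and the code is not scikit-learn's implementation. Hard (majority) voting is included for comparison.

```python
import numpy as np

# Hypothetical class-probability outputs of three classifiers for four samples
# (columns: P(win=0), P(win=1)); the numbers are made up for illustration.
p_lr = np.array([[0.60, 0.40], [0.30, 0.70], [0.55, 0.45], [0.20, 0.80]])
p_rf = np.array([[0.70, 0.30], [0.60, 0.40], [0.40, 0.60], [0.10, 0.90]])
p_gb = np.array([[0.80, 0.20], [0.40, 0.60], [0.45, 0.55], [0.30, 0.70]])

# Soft voting: average the probabilities, then take the most probable class
p_avg = (p_lr + p_rf + p_gb) / 3
soft_vote = p_avg.argmax(axis=1)

# Hard voting for comparison: each model votes with its own argmax,
# and class 1 wins when at least two of the three models choose it
votes = np.stack([p.argmax(axis=1) for p in (p_lr, p_rf, p_gb)])
hard_vote = (votes.sum(axis=0) >= 2).astype(int)

print(soft_vote, hard_vote)
```

Soft voting uses how confident each model is, not only which class it picks, so a single very confident model can outweigh two hesitant ones.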

4.1.1 Evaluate each weak classifier individually

In [ ]

individual_estimators(estimators)

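4.1.2 Evaluate the fused model

In [54]

logi=LogiR(max_iter=3000,n_jobs=8)
# fusion_estimators does not use its argument; it evaluates and fits the global VotingClassifier clf
fusion_estimators(logi)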

In [55]

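# Hard class predictions (overwritten on the next line; only the probabilities are used later)
test_predict_sklearn=clf.predict(test_df)
# Class probabilities, one row per test sample and one column per class
test_predict_sklearn=clf.predict_proba(test_df)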

In [56]

print(test_predict_sklearn.shape)
print(test_predict_sklearn)

(20000, 2)
[[0.87535621 0.12464379]
 [0.77675525 0.22324475]
 [0.16242339 0.83757661]
 ...
 [0.94152587 0.05847413]
 [0.90214731 0.09785269]
 [0.10380786 0.89619214]]

4.2 Neural network model

In [9]

import paddle.fluid

In [26]

class MyModel(paddle.nn.Layer):
    # self refers to the instance itself
    def __init__(self):
        # Initialize the parent class
        super(MyModel, self).__init__()
        self.fc1 = paddle.nn.Linear(in_features=29, out_features=30)
        self.hidden1=paddle.fluid.BatchNorm(30)
        self.relu1=paddle.nn.ReLU()
        self.fc2 = paddle.nn.Linear(in_features=30, out_features=8)
        self.relu2=paddle.nn.LeakyReLU()
        self.fc3 = paddle.nn.Linear(in_features=8, out_features=6)
        self.relu3=paddle.nn.Sigmoid()
        self.fc4 = paddle.nn.Linear(in_features=6, out_features=4)
        self.fc5=paddle.nn.Linear(in_features=4, out_features=2)
        self.softmax = paddle.nn.Softmax()

    # Forward pass of the network
    def forward(self, inputs):
        x = self.fc1(inputs)
        #x = self.relu1(x)
        x = self.hidden1(x)
        x = self.fc2(x)
        x = self.relu2(x)
        x = self.fc3(x)
        x = self.relu3(x)

        x=self.fc4(x)
        x=self.fc5(x)
        #x=self.fc6(x)
        x = self.softmax(x)
        return x

In [27]

model = MyModel()
model.train()
opt = paddle.optimizer.SGD(learning_rate=0.01, parameters=model.parameters())

In [37]

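EPOCH_NUM = 10   # number of epochs (outer loop)
BATCH_SIZE = 100  # batch size
training_data = train_df.iloc[:-1000,].values.astype(np.float32)
val_data = train_df.iloc[-1000:, ].values.astype(np.float32)
# Outer loop over epochs
for epoch_id in range(EPOCH_NUM):
    # Shuffle the training data at the start of every epoch
    np.random.shuffle(training_data)

    # Split the training data into mini-batches of BATCH_SIZE (100) rows each
    mini_batches = [training_data[k:k+BATCH_SIZE] for k in range(0, len(training_data), BATCH_SIZE)]

    # Inner loop over mini-batches
    for iter_id, mini_batch in enumerate(mini_batches):
        x_data = np.array(mini_batch[:, 1:]) # features of the current batch
        y_label = np.array(mini_batch[:, :1]) # labels of the current batch

        # Convert the NumPy data to Paddle dynamic-graph tensors
        features = paddle.to_tensor(x_data)
        y_label = paddle.to_tensor(y_label)
        # Build one-hot labels: column 0 for win=0, column 1 for win=1
        label=np.zeros([len(y_label),2])
        for i in range(len(y_label)):
            if y_label[i]==0:
                label[i,0]=1
            elif y_label[i]==1:
                label[i,1]=1
        # 'float32' as a string; the bare name float32 only exists because of %pylab
        label=paddle.to_tensor(label,dtype='float32')
        # Forward pass
        predicts = model(features)
        # Compute the loss. Note: softmax_with_cross_entropy applies softmax internally,
        # and the model already ends in Softmax, so softmax is applied twice here.
        loss = paddle.nn.functional.softmax_with_cross_entropy(predicts, label,soft_label=True)
        avg_loss = paddle.mean(loss)

        # Backward pass: compute the gradients of every layer's parameters
        avg_loss.backward()

        # Update the parameters by one step at the configured learning rate
        opt.step()
        # Clear the gradients before the next iteration
        opt.clear_grad()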

/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/data_feeder.py:51: DeprecationWarning: `np.bool` is a deprecated alias for the builtin `bool`. To silence this warning, use `bool` by itself. Doing this will not modify any behavior and is safe. If you specifically wanted the numpy scalar type, use `np.bool_` here.
Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  np.bool, np.float16, np.uint16, np.float32, np.float64, np.int8,

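The per-sample loop that builds the one-hot label in the training cell runs once per row in Python, which is slow for 100-row batches repeated over many iterations. One vectorized alternative, sketched here on a made-up batch (the y_label array below is hypothetical, and this is not the code the author ran), indexes an identity matrix by the integer labels:

```python
import numpy as np

# Hypothetical batch of labels shaped (batch, 1), like the slice mini_batch[:, :1]
y_label = np.array([[0.0], [1.0], [1.0], [0.0]], dtype=np.float32)

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing it with the integer labels yields one one-hot row per sample
label = np.eye(2, dtype=np.float32)[y_label.astype(np.int64).ravel()]

print(label)
```

The resulting array has shape (batch, 2) and dtype float32, so it could be passed to paddle.to_tensor in place of the loop-built array.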

In [38]

model.eval()
test_data = paddle.to_tensor(test_df.values.astype(np.float32))
test_predict_dl = model(test_data)

In [39]

test_predict_dl

Tensor(shape=[20000, 2], dtype=float32, place=CPUPlace, stop_gradient=False,
       [[0.31092143, 0.68907863],
        [0.89762008, 0.10237990],
        [0.00382155, 0.99617851],
        ...,
        [0.97896796, 0.02103199],
        [0.98377025, 0.01622973],
        [0.00828540, 0.99171454]])

In [58]

test_predict_sklearn

array([[0.87535621, 0.12464379],
       [0.77675525, 0.22324475],
       [0.16242339, 0.83757661],
       ...,
       [0.94152587, 0.05847413],
       [0.90214731, 0.09785269],
       [0.10380786, 0.89619214]])

In [65]

# Control the blend ratio: 1/4 neural network, 3/4 scikit-learn voting ensemble
test_predict_=(1/4*(np.array(test_predict_dl)))+(3/4*(test_predict_sklearn))

In [66]

test_predict=np.zeros([len(test_predict_)])
for i in range(len(test_predict_)):
    if test_predict_[i,0]>test_predict_[i,1]:
        test_predict[i]=0
    elif test_predict_[i,0]<test_predict_[i,1]:
        test_predict[i]=1

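The blend and the row-by-row comparison above can also be written as two array operations. The sketch below uses made-up probabilities (p_dl and p_sk are hypothetical stand-ins for the two models' outputs) with the same 1/4 to 3/4 ratio, and it keeps ties at 0 as the loop does:

```python
import numpy as np

# Hypothetical probabilities from the two models (columns: P(win=0), P(win=1))
p_dl = np.array([[0.31, 0.69], [0.004, 0.996], [0.50, 0.50]])
p_sk = np.array([[0.88, 0.12], [0.16, 0.84], [0.50, 0.50]])

# Weighted blend: 1/4 neural network, 3/4 voting ensemble
blend = 0.25 * p_dl + 0.75 * p_sk

# Predict 1 only when P(win=1) is strictly larger, so ties stay 0 as in the loop
pred = (blend[:, 1] > blend[:, 0]).astype(int)

print(pred)
```

In the first made-up row the two models disagree, and the ensemble's larger weight decides the outcome.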

In [67]

test_predict

array([0., 0., 1., ..., 0., 0., 1.])

In [70]

pd.DataFrame({'win':
              test_predict
             }).to_csv('submission.csv', index=None)

!zip submission.zip submission.csv

  adding: submission.csv (deflated 94%)
