Machine Learning Project 3: Predicting Human Calorie Expenditure with XGBoost - Artificial Intelligence - PHP中文网

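As health awareness grows, fitness enthusiasts increasingly want to measure the calories they burn during strength training. To meet that need, this project builds a calorie expenditure prediction system. The system is based on the XGBoost regression algorithm: it loads the relevant datasets, performs exploratory data analysis, trains a model, and evaluates its predictions, so that energy expenditure can be predicted in real time from a user's body data.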

1. Project Background

1.1 What Is a Calorie

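Anyone who enjoys fitness or sports, or who is trying to lose weight, will be familiar with this term. The calorie (abbreviated cal) is defined as the amount of heat required to raise the temperature of 1 gram of water by 1 °C at a pressure of 1 atmosphere. It is a unit of heat widely used in nutrition labeling and fitness guides; the international standard (SI) unit of energy is the joule.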
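
To make the relationship between calories and joules concrete, here is a minimal conversion sketch. It assumes the thermochemical calorie, which is fixed at exactly 4.184 J; other calorie definitions differ slightly depending on the starting water temperature. The function names are illustrative and are not part of the project code.

```python
# Conversion factor for the thermochemical calorie (an assumption of this sketch;
# the article does not say which calorie definition the dataset uses).
JOULES_PER_CALORIE = 4.184


def cal_to_joules(cal: float) -> float:
    """Convert (small) calories to joules."""
    return cal * JOULES_PER_CALORIE


def kcal_to_kilojoules(kcal: float) -> float:
    """Convert kilocalories to kilojoules; the factor is the same as cal -> J."""
    return kcal * JOULES_PER_CALORIE


print(cal_to_joules(1))         # one small calorie expressed in joules
print(kcal_to_kilojoules(100))  # one hundred kilocalories expressed in kilojoules
```

Nutrition labels usually quote kilocalories (1 kcal = 1000 cal), so it is worth checking which unit a dataset uses before comparing values.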

1.2 Project Overview

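As the idea of healthy living takes hold, more and more fitness enthusiasts want to measure how many calories they burn during strength training and to adjust their diet afterwards. Traditional methods of measuring energy expenditure during exercise, however, are cumbersome and require extra equipment. A convenient and effective way to estimate calorie consumption and recognize movements during strength training is therefore needed: once a set of body measurements has been collected from the user, the energy expended can be predicted online in real time.

The body burns a large number of calories during everyday workouts, especially strength training. To support post-workout nutrition and meal planning, and to give people who exercise regularly a fast and accurate real-time estimate of their energy expenditure, we built a system that predicts calories burned. The system uses the XGBoost regression algorithm and estimates a person's energy expenditure from seven inputs: gender, age, height, weight, exercise duration, heart rate, and body temperature.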

2. Import Dependencies

In [1]

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import metrics
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor

In [2]

# Load the dataset
calories = pd.read_csv(r"work/calories.csv")
calories.head()

    User_ID  Calories
0  14733363     231.0
1  14861698      66.0
2  11179863      26.0
3  16180408      71.0
4  17771927      35.0

In [3]

exercise = pd.read_csv("work/exercise.csv")
exercise.head()

    User_ID  Gender  Age  Height  Weight  Duration  Heart_Rate  Body_Temp
0  14733363    male   68   190.0    94.0      29.0       105.0       40.8
1  14861698  female   20   166.0    60.0      14.0        94.0       40.3
2  11179863    male   69   179.0    79.0       5.0        88.0       38.7
3  16180408  female   34   179.0    71.0      13.0       100.0       40.5
4  17771927  female   27   154.0    58.0      10.0        81.0       39.8

In [4]

# Merge the two datasets column-wise (this relies on both files sharing the same row order)
df = pd.concat([exercise,calories.Calories],axis=1)
df.head()

    User_ID  Gender  Age  Height  Weight  Duration  Heart_Rate  Body_Temp  \
0  14733363    male   68   190.0    94.0      29.0       105.0       40.8   
1  14861698  female   20   166.0    60.0      14.0        94.0       40.3   
2  11179863    male   69   179.0    79.0       5.0        88.0       38.7   
3  16180408  female   34   179.0    71.0      13.0       100.0       40.5   
4  17771927  female   27   154.0    58.0      10.0        81.0       39.8   

   Calories  
0     231.0  
1      66.0  
2      26.0  
3      71.0  
4      35.0
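
The column-wise concat above only pairs the right rows when both files list users in the same order. As a hedged alternative, the sketch below joins on the shared User_ID key instead. The two small frames are hypothetical stand-ins for exercise.csv and calories.csv, not the real data.

```python
import pandas as pd

# Tiny hypothetical frames that mimic the layout of the two CSV files.
exercise_demo = pd.DataFrame({
    "User_ID": [1, 2, 3],
    "Duration": [29.0, 14.0, 5.0],
})
calories_demo = pd.DataFrame({
    "User_ID": [3, 1, 2],            # deliberately in a different row order
    "Calories": [26.0, 231.0, 66.0],
})

# Join on the key; validate="one_to_one" raises if a User_ID repeats on either side.
merged_demo = exercise_demo.merge(
    calories_demo, on="User_ID", how="inner", validate="one_to_one"
)
print(merged_demo)
```

With a key-based merge, each calorie value stays attached to its own user even if one file has been sorted or filtered.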

In [5]

df.shape

(15000, 9)

3. Exploratory Data Analysis (EDA)

3.1 Descriptive Statistics

In [6]

df.describe()  # summary statistics for the numeric columns

            User_ID           Age        Height        Weight      Duration  \
count  1.500000e+04  15000.000000  15000.000000  15000.000000  15000.000000   
mean   1.497736e+07     42.789800    174.465133     74.966867     15.530600   
std    2.872851e+06     16.980264     14.258114     15.035657      8.319203   
min    1.000116e+07     20.000000    123.000000     36.000000      1.000000   
25%    1.247419e+07     28.000000    164.000000     63.000000      8.000000   
50%    1.499728e+07     39.000000    175.000000     74.000000     16.000000   
75%    1.744928e+07     56.000000    185.000000     87.000000     23.000000   
max    1.999965e+07     79.000000    222.000000    132.000000     30.000000   

         Heart_Rate     Body_Temp      Calories  
count  15000.000000  15000.000000  15000.000000  
mean      95.518533     40.025453     89.539533  
std        9.583328      0.779230     62.456978  
min       67.000000     37.100000      1.000000  
25%       88.000000     39.600000     35.000000  
50%       96.000000     40.200000     79.000000  
75%      103.000000     40.600000    138.000000  
max      128.000000     41.500000    314.000000

3.2 Check for Missing Values

In [7]

df.isnull().sum()

User_ID       0
Gender        0
Age           0
Height        0
Weight        0
Duration      0
Heart_Rate    0
Body_Temp     0
Calories      0
dtype: int64

In [8]

df.columns

Index(['User_ID', 'Gender', 'Age', 'Height', 'Weight', 'Duration',
       'Heart_Rate', 'Body_Temp', 'Calories'],
      dtype='object')

In [9]

# Continuous (numeric) feature columns
constant_features = ['Age', 'Height', 'Weight', 'Duration', 'Heart_Rate', 'Body_Temp']

In [10]

df.head()

    User_ID  Gender  Age  Height  Weight  Duration  Heart_Rate  Body_Temp  \
0  14733363    male   68   190.0    94.0      29.0       105.0       40.8   
1  14861698  female   20   166.0    60.0      14.0        94.0       40.3   
2  11179863    male   69   179.0    79.0       5.0        88.0       38.7   
3  16180408  female   34   179.0    71.0      13.0       100.0       40.5   
4  17771927  female   27   154.0    58.0      10.0        81.0       39.8   

   Calories  
0     231.0  
1      66.0  
2      26.0  
3      71.0  
4      35.0

3.3 Plot Probability Density Curves

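Two plotting approaches are used here: a kernel density plot and a histogram with a density curve, both drawn with seaborn on a matplotlib subplot grid.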

In [11]

def kde_plot_array(df):
    """
    Plot a grid of kernel density estimates.
    df: the DataFrame to plot
    Draws the density of each column in its own subplot, then shows the figure.
    """
    plt.figure(figsize = (24,20))
    n_rows = (len(df.columns) + 1) // 2    # subplot() needs an integer row count
    for num,col in enumerate(df.columns):
        plt.subplot(n_rows,2,num+1)
        # fill=True replaces the deprecated shade=True (seaborn 0.11 and later)
        sns.kdeplot(df[col],fill = True,label = col,alpha = 0.7)
        plt.legend()
        plt.title('{}'.format(col))
    return plt.show()

In [12]

kde_plot_array(df[constant_features])

<Figure size 2400x2000 with 6 Axes>

3.4 Inspect Feature Distributions

In [13]

sns.countplot(x=df['Gender'])  # the male and female counts are roughly equal

<matplotlib.axes._subplots.AxesSubplot at 0x7fa7213ee5d0>

<Figure size 640x480 with 1 Axes>

In [14]

def display(df):
    '''Show the distribution of each column as a histogram with a density curve.'''
    plt.figure(figsize = (24,20))
    n_rows = (len(df.columns) + 1) // 2    # subplot() needs an integer row count
    for num,col in enumerate(df.columns):
        plt.subplot(n_rows,2,num+1)
        # histplot(kde=True, stat="density") stands in for the deprecated sns.distplot
        sns.histplot(df[col],kde = True,stat = "density")
        plt.title('{}'.format(col))
    return plt.show()

In [15]

display(df[constant_features])

<Figure size 2400x2000 with 6 Axes>

In [16]

# Encode the categorical variable. LabelEncoder would also work; here we use the DataFrame's replace method directly, which is more convenient.
df.replace({'Gender':{'male':0,"female":1}}, inplace = True)
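
As an alternative sketch of the same encoding, Series.map with an explicit dictionary makes unexpected labels visible, because any value missing from the dictionary becomes NaN. The sample rows below are hypothetical; the codes match the replace call above.

```python
import pandas as pd

gender_codes = {"male": 0, "female": 1}  # same codes as the replace call

# Hypothetical sample rows standing in for the Gender column.
demo = pd.DataFrame({"Gender": ["male", "female", "female"]})
encoded = demo["Gender"].map(gender_codes)

# map() yields NaN for labels that are not in the dictionary, so check before casting.
if encoded.isna().any():
    raise ValueError("unexpected Gender label")
demo["Gender"] = encoded.astype(int)
print(demo)
```

This variant also avoids in-place mutation, which keeps the original column available until the check has passed.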

3.5 Create Features and Labels

In [17]

X = df.drop(['User_ID','Calories'],axis=1).values
y = df.Calories

In [18]

print(X)

[[  0.   68.  190.  ...  29.  105.   40.8]
 [  1.   20.  166.  ...  14.   94.   40.3]
 [  0.   69.  179.  ...   5.   88.   38.7]
 ...
 [  1.   43.  159.  ...  16.   90.   40.1]
 [  0.   78.  193.  ...   2.   84.   38.3]
 [  0.   63.  173.  ...  18.   92.   40.5]]

3.6 Split the Dataset

In [19]

X_train ,X_test ,y_train ,y_test = train_test_split(X,y,test_size=0.2,random_state=2)

In [20]

print(X_train.shape,X_test.shape)
print(y_train.shape,y_test.shape)

登录后复制

(12000, 7) (3000, 7)
(12000,) (3000,)
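
The shapes above follow from holding out 20% of the 15000 rows. The sketch below illustrates the idea of a seeded, shuffled hold-out split using only the standard library. It is a conceptual illustration, not scikit-learn's implementation, so the selected indices will differ from those chosen by train_test_split; the helper name is hypothetical.

```python
import random


def holdout_split(n_rows: int, test_size: float, seed: int):
    """Return (train_indices, test_indices) for a shuffled hold-out split."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)     # seeded, so the split is reproducible
    n_test = int(round(n_rows * test_size))  # number of rows held out for testing
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = holdout_split(15000, test_size=0.2, seed=2)
print(len(train_idx), len(test_idx))
```

Fixing the seed plays the same role as random_state in the notebook: rerunning the split gives the same partition, so metrics are comparable across runs.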

4. Model Training

In [32]

model = XGBRegressor(random_state=42)  # this project uses the XGBoost algorithm
model.fit(X_train,y_train)
X_preds = model.predict(X_train)

5. Model Prediction

Prediction simply calls the predict method of the XGBoost model to obtain the predicted values. We could also choose other…

In [22]

preds = model.predict(X_test)

In [23]

# Inspect the predicted values
preds

array([127.823784, 226.00154 ,  38.66253 , ..., 144.3636  ,  22.767195,
        89.87375 ], dtype=float32)

5.1 Visualize Predictions Against True Values

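The scatter plot shows that the predicted values lie very close to the true values, which supports the effectiveness of the model.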

In [24]

plt.scatter(y_test,preds)
plt.xlabel('y_test')
plt.ylabel('preds')
plt.title('y_test VS preds')
plt.show()

<Figure size 640x480 with 1 Axes>

5.2 Mean Absolute Error

In [25]

mae = metrics.mean_absolute_error(y_test,preds)
mae

1.4807048829992613

5.3 Root Mean Squared Error

In [26]

Rmse = np.sqrt(metrics.mean_squared_error(y_test,preds))
Rmse

2.12938076108955
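
To show what the two error metrics above measure, here is a minimal pure-Python sketch of both formulas. The helper names and the four toy values are illustrative assumptions; the notebook itself uses sklearn.metrics.

```python
import math


def mae_manual(y_true, y_pred):
    """Mean absolute error: the average of |true - predicted|."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse_manual(y_true, y_pred):
    """Root mean squared error: the square root of the average squared error."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Toy values for illustration only (not model output).
y_true_demo = [231.0, 66.0, 26.0, 71.0]
y_pred_demo = [229.0, 67.0, 26.0, 75.0]

print(mae_manual(y_true_demo, y_pred_demo))
print(rmse_manual(y_true_demo, y_pred_demo))
```

Because RMSE squares each error before averaging, it weights large misses more heavily than MAE does, which is why the RMSE reported above is larger than the MAE.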

5.4 R² Score

The R² score is very close to 1, which indicates that the model predicts well on the test set.

In [27]

preds_R2_score = metrics.r2_score(y_test,preds)
preds_R2_score

0.9988455491362879
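
For reference, the R² score compares the model's squared error with that of a baseline that always predicts the mean of the true values. The sketch below is a minimal pure-Python version with a hypothetical helper name and toy inputs; the notebook itself uses metrics.r2_score.

```python
def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # model's squared error
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)          # mean baseline's squared error
    return 1.0 - ss_res / ss_tot


y_demo = [1.0, 2.0, 3.0, 4.0]           # toy values for illustration only
print(r2_manual(y_demo, y_demo))        # perfect predictions
print(r2_manual(y_demo, [2.5] * 4))     # always predicting the mean
```

A score of 1 means the predictions match exactly, while a score of 0 means the model does no better than the mean baseline.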

6. Build the Prediction System

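The user enters the corresponding data and the model predicts the calories burned from that input. The model could also be deployed on a device to build an energy expenditure prediction system.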

In [33]

input_data = (1 , 20 , 166.0 ,  60.0 , 14.0 , 94.0 ,40.3)
# Convert to a NumPy array
input_data_as_numpy_array = np.asarray(input_data)
# Reshape into a 2-D array with one row
input_data_reshaped = input_data_as_numpy_array.reshape(1,-1)

prediction = model.predict(input_data_reshaped)
print(prediction)
print("This person's calorie expenditure is {} ".format(prediction[0]))

[64.68266]
This person's calorie expenditure is 64.68266296386719
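
Because the model was trained on a plain array, the input values must arrive in the same column order as the training features. The sketch below shows one way to build such a row from named fields with basic validation. The helper name and the dictionary-based input are assumptions made for illustration; the column order and gender codes follow the notebook.

```python
# Column order used when X was built (User_ID and Calories were dropped).
FEATURE_ORDER = ["Gender", "Age", "Height", "Weight", "Duration", "Heart_Rate", "Body_Temp"]
GENDER_CODES = {"male": 0, "female": 1}  # same encoding as in section 3.4


def build_feature_row(user: dict) -> list:
    """Hypothetical helper: validate named inputs and return them in training order."""
    missing = [name for name in FEATURE_ORDER if name not in user]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if user["Gender"] not in GENDER_CODES:
        raise ValueError(f"unknown gender: {user['Gender']!r}")
    row = dict(user)
    row["Gender"] = GENDER_CODES[row["Gender"]]
    return [float(row[name]) for name in FEATURE_ORDER]


sample_user = {
    "Gender": "female", "Age": 20, "Height": 166.0, "Weight": 60.0,
    "Duration": 14.0, "Heart_Rate": 94.0, "Body_Temp": 40.3,
}
print(build_feature_row(sample_user))
```

The returned list can be wrapped as np.asarray([row]) and passed to model.predict, which keeps the feature order in one place instead of relying on a hand-typed tuple.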

Project Summary

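This project uses only the XGBoost regression algorithm. Future work could try other regression algorithms or deep neural networks, and continue tuning the model to improve prediction accuracy.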
