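Pacemaker classifies errors into three recovery types: soft, hard, and fatal. The latter two indicate environment or configuration problems that cannot be repaired automatically without manual intervention. Ordinary failures, such as a crashed service process or an unreachable network, return OCF_ERR_GENERIC, which belongs to the soft type.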
Table B.3. Types of recovery performed by the cluster
| Type | Description | Action Taken by the Cluster |
|---|---|---|
| soft | A transient error occurred | Restart the resource or move it to a new location |
| hard | A non-transient error that may be specific to the current node occurred | Move the resource elsewhere and prevent it from being retried on the current node |
| fatal | A non-transient error that will be common to all cluster nodes (eg. a bad configuration was specified) | Stop the resource and prevent it from being started on any cluster node |
Table B.4. OCF Return Codes and their Recovery Types
| RC | OCF Alias | Description | RT |
|---|---|---|---|
| 0 | OCF_SUCCESS | Success. The command completed successfully. This is the expected result for all start, stop, promote and demote commands. | soft |
| 1 | OCF_ERR_GENERIC | Generic "there was a problem" error code. | soft |
| 2 | OCF_ERR_ARGS | The resource’s configuration is not valid on this machine. Eg. refers to a location/tool not found on the node. | hard |
| 3 | OCF_ERR_UNIMPLEMENTED | The requested action is not implemented. | hard |
| 4 | OCF_ERR_PERM | The resource agent does not have sufficient privileges to complete the task. | hard |
| 5 | OCF_ERR_INSTALLED | The tools required by the resource are not installed on this machine. | hard |
| 6 | OCF_ERR_CONFIGURED | The resource’s configuration is invalid. Eg. required parameters are missing. | fatal |
| 7 | OCF_NOT_RUNNING | The resource is safely stopped. The cluster will not attempt to stop a resource that returns this for any action. | N/A |
| 8 | OCF_RUNNING_MASTER | The resource is running in Master mode. | soft |
| 9 | OCF_FAILED_MASTER | The resource is in Master mode but has failed. The resource will be demoted, stopped and then started (and possibly promoted) again. | soft |
| other | NA | Custom error code. | soft |
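The mapping in Table B.4 can be written down as a small lookup, which is sometimes handy when reading logs or testing a Resource Agent. The following is only a sketch transcribed from the table above; the names `OCF_RECOVERY_TYPES` and `recovery_type` are made up for this illustration and are not part of Pacemaker.

```python
# Recovery type for each OCF return code, transcribed from Table B.4.
# The dictionary and function names are hypothetical helpers, not a Pacemaker API.
OCF_RECOVERY_TYPES = {
    0: ("OCF_SUCCESS", "soft"),
    1: ("OCF_ERR_GENERIC", "soft"),
    2: ("OCF_ERR_ARGS", "hard"),
    3: ("OCF_ERR_UNIMPLEMENTED", "hard"),
    4: ("OCF_ERR_PERM", "hard"),
    5: ("OCF_ERR_INSTALLED", "hard"),
    6: ("OCF_ERR_CONFIGURED", "fatal"),
    7: ("OCF_NOT_RUNNING", "N/A"),
    8: ("OCF_RUNNING_MASTER", "soft"),
    9: ("OCF_FAILED_MASTER", "soft"),
}


def recovery_type(rc: int) -> str:
    """Return the recovery type for a return code.

    Codes not listed in the table are custom error codes, which the
    table classifies as soft.
    """
    _alias, rtype = OCF_RECOVERY_TYPES.get(rc, ("NA", "soft"))
    return rtype


if __name__ == "__main__":
    for rc in (1, 5, 6, 42):
        print(rc, recovery_type(rc))
```

Only return codes 2 through 6 lead to hard or fatal recovery, so a Resource Agent that returns them for transient problems would keep the cluster from retrying on that node, or on any node in the fatal case.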
Each resource operation has an on-fail attribute that controls how a failure of that operation is handled.
http://clusterlabs.org/doc/en-US/Pacemaker/1.1-plugin/html-single/Pacemaker_Explained/index.html#_monitoring_resources_for_failure
Table 5.3. Properties of an Operation
| Field | Description |
|---|---|
| id | Your name for the action. Must be unique. |
| name | The action to perform. Common values: monitor, start, stop |
| interval | How frequently (in seconds) to perform the operation. Default value: 0, meaning never. |
| timeout | How long to wait before declaring the action has failed. |
| on-fail | The action to take if this action ever fails. Allowed values: ignore - Pretend the resource did not fail; block - Don’t perform any further operations on the resource; stop - Stop the resource and do not start it elsewhere; restart - Stop the resource and start it again (possibly on a different node); fence - STONITH the node on which the resource failed; standby - Move all resources away from the node on which the resource failed. The default for the stop operation is fence when STONITH is enabled and block otherwise. All other operations default to stop. |
| enabled | If false, the operation is treated as if it does not exist. Allowed values: true, false |
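To make the fields in Table 5.3 concrete, here is a minimal sketch that assembles an `op` element carrying those attributes using Python's standard `xml.etree.ElementTree`. The helper `build_op`, the id `dummy-monitor-10`, and the interval and timeout values are assumptions chosen for illustration; only the attribute names and the allowed on-fail values come from the table.

```python
import xml.etree.ElementTree as ET

# Allowed on-fail values, as listed in Table 5.3.
ALLOWED_ON_FAIL = {"ignore", "block", "stop", "restart", "fence", "standby"}


def build_op(op_id, name, interval, timeout, on_fail=None, enabled=True):
    """Build an <op> element from the operation properties in Table 5.3.

    This is a hypothetical helper for illustration, not a Pacemaker API.
    """
    attrs = {"id": op_id, "name": name, "interval": interval, "timeout": timeout}
    if on_fail is not None:
        if on_fail not in ALLOWED_ON_FAIL:
            raise ValueError(f"unsupported on-fail value: {on_fail!r}")
        attrs["on-fail"] = on_fail
    attrs["enabled"] = "true" if enabled else "false"
    return ET.Element("op", attrs)


if __name__ == "__main__":
    # Illustrative values: monitor every 10 seconds, 30 second timeout.
    op = build_op("dummy-monitor-10", "monitor", "10", "30", on_fail="restart")
    print(ET.tostring(op, encoding="unicode"))
```

Validating on-fail before building the element catches typos early, since a misspelled value would otherwise only surface when the configuration is loaded.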
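However, actual testing showed that no matter how on-fail was set, the behavior did not change; in other words, the default behavior always applied.
The following shows how the resource manager handles each Resource Agent operation when it returns OCF_ERR_GENERIC: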
| Operation | Error handling | Corresponding on-fail value |
|---|---|---|
| start | Set fail-count=1000000; call stop on this node; start the resource on another node. | restart |
| stop | Set fail-count=1000000; block further operations on the resource, which enters the unmanaged FAILED state, as in: dummy (ocf::heartbeat:Dummy2): Started srdsdevapp69 (unmanaged) FAILED | block |
| monitor | Set fail-count+=1; call stop, start, monitor in sequence on this node. If monitor still fails, repeat stop, start, monitor until fail-count reaches migration-threshold, then keep the resource stopped. | restart |
| promote | Set fail-count+=1; call demote, stop, start in sequence on this node; call promote on another node to promote the resource there to master. | restart |
| demote | Set fail-count+=1; call stop, start, demote in sequence on this node. If demote still fails, repeat stop, start, demote until fail-count reaches migration-threshold, then keep the resource stopped. | restart |
| notify | Ignored | ignore |
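Note 1: A timeout is handled the same way as OCF_ERR_GENERIC.
Note 2: Pacemaker does not call the post-stop notify on a resource that has already been stopped.
Note 3: Test environment: Pacemaker 1.1.7-6, CentOS 6.3.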
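The monitor case is the one most Resource Agents hit in practice, so it may help to replay it as a tiny simulation. The sketch below is a simplified model of the behavior observed in the test, not Pacemaker's actual scheduler: it looks at a single node only, and the function name `simulate_monitor_failures` and its parameters are hypothetical.

```python
def simulate_monitor_failures(monitor_results, migration_threshold):
    """Replay the observed handling of monitor failures on one node.

    monitor_results: sequence of booleans, True meaning the monitor call
    succeeded, False meaning it returned OCF_ERR_GENERIC (or timed out).
    Returns (fail_count, actions, final_state).

    Simplified model: each failure increments fail-count; below the
    threshold the resource is stopped and started again before the next
    monitor; at the threshold it is stopped and stays stopped.
    """
    fail_count = 0
    actions = []
    for ok in monitor_results:
        actions.append("monitor")
        if ok:
            continue
        fail_count += 1
        if fail_count >= migration_threshold:
            actions.append("stop")
            return fail_count, actions, "stopped"
        actions.extend(["stop", "start"])
    return fail_count, actions, "started"


if __name__ == "__main__":
    # Three consecutive monitor failures with migration-threshold=3.
    print(simulate_monitor_failures([False, False, False], 3))
```

In this model a monitor that fails once and then recovers costs one stop and start cycle and leaves fail-count at 1, which matches the restart behavior described in the table.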
The error-handling test results above offer several takeaways for Resource Agent authors: