我们有一个由三个成员组成的 MongoDB 复制集:
"members" : [
{
"_id" : 6,
"host" : "10.0.0.17:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 2,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 7,
"host" : "10.0.0.18:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 2,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
},
{
"_id" : 8,
"host" : "10.0.0.19:27017",
"arbiterOnly" : false,
"buildIndexes" : true,
"hidden" : false,
"priority" : 2,
"tags" : {
},
"slaveDelay" : NumberLong(0),
"votes" : 1
}
],
集群处于中等负载下,每秒不超过几十个请求。
db.serverStatus()
在主报告上,几乎所有事务都回滚:
"transaction begins" : 2625009877,
"transaction checkpoint currently running" : 0,
"transaction checkpoint generation" : 22618,
"transaction checkpoint max time (msecs)" : 5849,
"transaction checkpoint min time (msecs)" : 153,
"transaction checkpoint most recent time (msecs)" : 1869,
"transaction checkpoint scrub dirty target" : 0,
"transaction checkpoint scrub time (msecs)" : 0,
"transaction checkpoint total time (msecs)" : 11017082,
"transaction checkpoints" : 22617,
"transaction checkpoints skipped because database was clean" : 0,
"transaction failures due to cache overflow" : 0,
"transaction fsync calls for checkpoint after allocating the transaction ID" : 22617,
"transaction fsync duration for checkpoint after allocating the transaction ID (usecs)" : 354402,
"transaction range of IDs currently pinned" : 0,
"transaction range of IDs currently pinned by a checkpoint" : 0,
"transaction range of IDs currently pinned by named snapshots" : 0,
"transaction range of timestamps currently pinned" : 8589934583,
"transaction range of timestamps pinned by the oldest timestamp" : 8589934583,
"transaction sync calls" : 0,
"transactions committed" : 30213144,
"transactions rolled back" : 2594972913,
"update conflicts" : 578
基本上,我的问题是:这里发生了什么?有这么多事务和这么多回滚是正常的吗?如果不是,那么根本原因是什么并且需要修复它?
更新。:我们升级到3.6.8-2.0
(这是3.6系列中最新的Percona包),问题仍然存在。