Nicolas Payart
Asked: 2019-03-07 05:13:23 +0800 CST

MongoDB stepDown fails in a PSA architecture


I have set up a MongoDB cluster using a 3-member Primary-Secondary-Arbiter (PSA) architecture.

Environment:

  • LXC containers
  • Linux Debian Stretch (9.8)
  • MongoDB server version: 4.0.6

MongoDB containers:

  • lxc-mongodb-01 (primary)
  • lxc-mongodb-02 (secondary)
  • lxc-mongodb-03 (arbiter)

Replication status

Everything seems to be working fine, and replication is healthy:

np:PRIMARY> rs.printSlaveReplicationInfo()
source: lxc-mongodb-02:27017
    syncedTo: Wed Mar 06 2019 12:08:27 GMT+0100 (CET)
    0 secs (0 hrs) behind the primary 

Switchover failure

However, when I try to swap the primary and secondary with rs.stepDown(), it fails with a "No electable secondaries caught up" error message:

np:PRIMARY> rs.stepDown(60, 30)
{
    "operationTime" : Timestamp(1551870647, 1),
    "ok" : 0,
    "errmsg" : "No electable secondaries caught up as of 2019-03-06T12:11:19.140+0100Please use the replSetStepDown command with the argument {force: true} to force node to step down.",
    "code" : 262,
    "codeName" : "ExceededTimeLimit",
    "$clusterTime" : {
        "clusterTime" : Timestamp(1551870647, 1),
        "signature" : {
            "hash" : BinData(0,"+/jQR8cG+y/bPtoF7gnv2Pmn2BY="),
            "keyId" : NumberLong("6653042051040411649")
        }
    }
}
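For reference, the two positional arguments of the call above correspond to the fields of the underlying replSetStepDown command; the primary's log in this post records them as replSetStepDown: 60.0 and secondaryCatchUpPeriodSecs: 30.0, and the command fails after 29999ms, which matches the 30-second catch-up period. A minimal sketch of that mapping in plain JavaScript (the helper name buildStepDownCommand is hypothetical, not part of the shell):

```javascript
// Hypothetical helper: build the command document corresponding to
// rs.stepDown(stepDownSecs, secondaryCatchUpPeriodSecs).
function buildStepDownCommand(stepDownSecs, secondaryCatchUpPeriodSecs) {
  // Seconds during which the old primary will not seek re-election.
  const cmd = { replSetStepDown: stepDownSecs };
  // Seconds to wait for an electable secondary to catch up (optional).
  if (secondaryCatchUpPeriodSecs !== undefined) {
    cmd.secondaryCatchUpPeriodSecs = secondaryCatchUpPeriodSecs;
  }
  return cmd;
}

const stepDownCmd = buildStepDownCommand(60, 30);
console.log(JSON.stringify(stepDownCmd));
```

In the mongo shell, a document of this shape would be passed to db.adminCommand() on the primary.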

Note that this is a non-production cluster, so there are no transactions in progress.

Logs from server01 (primary):

2019-03-06T12:08:07.709+0100 I ACCESS   [conn17] Successfully authenticated as principal root on admin
2019-03-06T12:10:49.140+0100 I COMMAND  [conn17] Attempting to step down in response to replSetStepDown command
2019-03-06T12:11:19.140+0100 I COMMAND  [conn17] command admin.$cmd appName: "MongoDB Shell" command: replSetStepDown { replSetStepDown: 60.0, secondaryCatchUpPeriodSecs: 30.0, lsid: { id: UUID("8941645a-c582-4353-b216-6e5ee91c08b0") }, $clusterTime: { clusterTime: Timestamp(1551870507, 1), signature: { hash: BinData(0, 484DDC04A03F9CBEDA0E5FA5E4F438F414E43E8F), keyId: 6653042051040411649 } }, $db: "admin" } numYields:0 ok:0 errMsg:"No electable secondaries caught up as of 2019-03-06T12:11:19.140+0100Please use the replSetStepDown command with the argument {force: true} to force node to step down." errName:ExceededTimeLimit errCode:262 reslen:385 locks:{ Global: { acquireCount: { r: 2, W: 2 } } } protocol:op_msg 29999ms

Logs from server02 (secondary):

2019-03-06T12:10:52.278+0100 I REPL     [replication-1] Restarting oplog query due to error: InterruptedDueToReplStateChange: error in fetcher batch callback :: caused by :: operation was interrupted. Last fetched optime (with hash): { ts: Timestamp(1551870647, 1), t: 8 }[-3124663669138993987]. Restarts remaining: 1
2019-03-06T12:10:52.278+0100 I REPL     [replication-1] Scheduled new oplog query Fetcher source: lxc-mongodb-01:27017 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1551870647, 1) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 8, readConcern: { afterClusterTime: Timestamp(1551870647, 1) } } query metadata: { $replData: 1, $oplogQueryData: 1, $readPreference: { mode: "secondaryPreferred" } } active: 1 findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 6603 -- target:lxc-mongodb-01:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp(1551870647, 1) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 8, readConcern: { afterClusterTime: Timestamp(1551870647, 1) } } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
2019-03-06T12:10:52.279+0100 W REPL     [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InvalidSyncSource: Sync source cannot be behind me, and if I am up-to-date with the sync source, it must have a higher lastOpCommitted. My last fetched oplog optime: { ts: Timestamp(1551870647, 1), t: 8 }, latest oplog optime of sync source: { ts: Timestamp(1551870647, 1), t: 8 }, my lastOpCommitted: { ts: Timestamp(1551870647, 1), t: 8 }, lastOpCommitted of sync source: { ts: Timestamp(1551870647, 1), t: 8 }
2019-03-06T12:10:52.279+0100 I REPL     [rsBackgroundSync] Clearing sync source lxc-mongodb-01:27017 to choose a new one.
2019-03-06T12:10:52.279+0100 I REPL     [rsBackgroundSync] could not find member to sync from
2019-03-06T12:10:57.276+0100 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update to lxc-mongodb-01:27017: InvalidSyncSource: Sync source was cleared. Was lxc-mongodb-01:27017
2019-03-06T12:11:27.284+0100 I REPL     [rsBackgroundSync] sync source candidate: lxc-mongodb-01:27017
2019-03-06T12:11:27.286+0100 I REPL     [rsBackgroundSync] Changed sync source from empty to lxc-mongodb-01:27017
2019-03-06T12:11:28.833+0100 I NETWORK  [LogicalSessionCacheRefresh] Starting new replica set monitor for np/lxc-mongodb-01:27017,lxc-mongodb-02:27017

Logs from server03 (arbiter):

2019-03-06T12:11:29.428+0100 I NETWORK  [LogicalSessionCacheRefresh] Starting new replica set monitor for np/lxc-mongodb-01:27017,lxc-mongodb-02:27017
2019-03-06T12:11:29.429+0100 I NETWORK  [LogicalSessionCacheRefresh] Starting new replica set monitor for np/lxc-mongodb-01:27017,lxc-mongodb-02:27017

After reading the documentation and a few threads, I tried adjusting some settings, without success:

replication.enableMajorityReadConcern = false
writeConcernMajorityJournalDefault = false

Question

So, what am I missing to make stepDown work as expected?

Edit (2019-03-07)

Here is the output of rs.status() on the primary:

np:PRIMARY> rs.status()
{
    "set" : "np",
    "date" : ISODate("2019-03-07T08:08:17.623Z"),
    "myState" : 1,
    "term" : NumberLong(8),
    "syncingTo" : "",
    "syncSourceHost" : "",
    "syncSourceId" : -1,
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        "lastCommittedOpTime" : {
            "ts" : Timestamp(1551946089, 1),
            "t" : NumberLong(8)
        },
        "readConcernMajorityOpTime" : {
            "ts" : Timestamp(1551946089, 1),
            "t" : NumberLong(8)
        },
        "appliedOpTime" : {
            "ts" : Timestamp(1551946089, 1),
            "t" : NumberLong(8)
        },
        "durableOpTime" : {
            "ts" : Timestamp(1551946089, 1),
            "t" : NumberLong(8)
        }
    },
    "members" : [
        {
            "_id" : 0,
            "name" : "lxc-mongodb-01:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 75954,
            "optime" : {
                "ts" : Timestamp(1551946089, 1),
                "t" : NumberLong(8)
            },
            "optimeDate" : ISODate("2019-03-07T08:08:09Z"),
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "electionTime" : Timestamp(1551870155, 1),
            "electionDate" : ISODate("2019-03-06T11:02:35Z"),
            "configVersion" : 4,
            "self" : true,
            "lastHeartbeatMessage" : ""
        },
        {
            "_id" : 1,
            "name" : "lxc-mongodb-03:27017",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 75952,
            "lastHeartbeat" : ISODate("2019-03-07T08:08:16.005Z"),
            "lastHeartbeatRecv" : ISODate("2019-03-07T08:08:17.410Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "",
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "configVersion" : 4
        },
        {
            "_id" : 2,
            "name" : "lxc-mongodb-02:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 75952,
            "optime" : {
                "ts" : Timestamp(1551946089, 1),
                "t" : NumberLong(8)
            },
            "optimeDurable" : {
                "ts" : Timestamp(1551946089, 1),
                "t" : NumberLong(8)
            },
            "optimeDate" : ISODate("2019-03-07T08:08:09Z"),
            "optimeDurableDate" : ISODate("2019-03-07T08:08:09Z"),
            "lastHeartbeat" : ISODate("2019-03-07T08:08:16.008Z"),
            "lastHeartbeatRecv" : ISODate("2019-03-07T08:08:15.798Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "",
            "syncingTo" : "lxc-mongodb-01:27017",
            "syncSourceHost" : "lxc-mongodb-01:27017",
            "syncSourceId" : 0,
            "infoMessage" : "",
            "configVersion" : 4
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1551946089, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1551946089, 1),
        "signature" : {
            "hash" : BinData(0,"ZPnNWVwjB1K9jdaSHlnfnmRPqqM="),
            "keyId" : NumberLong("6653042051040411649")
        }
    }
}

Here is the output of rs.conf() on the primary:

np:PRIMARY> rs.conf()
{
    "_id" : "np",
    "version" : 4,
    "protocolVersion" : NumberLong(1),
    "writeConcernMajorityJournalDefault" : false,
    "members" : [
        {
            "_id" : 0,
            "host" : "lxc-mongodb-01:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 1,
            "host" : "lxc-mongodb-03:27017",
            "arbiterOnly" : true,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 0,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 2,
            "host" : "lxc-mongodb-02:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 0,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 0
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 10000,
        "catchUpTimeoutMillis" : -1,
        "catchUpTakeoverDelayMillis" : 30000,
        "getLastErrorModes" : {

        },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        },
        "replicaSetId" : ObjectId("5c545a7d4e358716c8129ac6")
    }
}
mongodb replication

1 Answer

  1. Best Answer
    Nicolas Payart
    2019-03-08T02:17:58+08:00

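    There was no electable secondary because the secondary's priority was set to 0 (see the rs.conf() output in the original post; thanks to Mani for the hint!).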
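As a rough illustration of that diagnosis, the check below filters an rs.conf()-shaped object for members that could be elected primary. It is a sketch in plain JavaScript, not MongoDB's own election logic: the helper name electableMembers is hypothetical, and the sample object is a trimmed copy of the configuration from the question.

```javascript
// Hypothetical helper: list members that could be elected primary,
// given an object shaped like the output of rs.conf().
function electableMembers(cfg) {
  return cfg.members.filter(
    // Arbiters hold no data, and priority 0 members never seek election.
    // MongoDB also requires non-voting members to have priority 0, so the
    // votes check is redundant for a valid config; it is kept for clarity.
    (m) => !m.arbiterOnly && m.priority > 0 && m.votes > 0
  );
}

// Trimmed copy of the configuration shown in the question.
const questionCfg = {
  _id: "np",
  members: [
    { _id: 0, host: "lxc-mongodb-01:27017", arbiterOnly: false, priority: 1, votes: 1 },
    { _id: 1, host: "lxc-mongodb-03:27017", arbiterOnly: true, priority: 0, votes: 1 },
    { _id: 2, host: "lxc-mongodb-02:27017", arbiterOnly: false, priority: 0, votes: 0 },
  ],
};

// Only the current primary qualifies, so stepDown has no one to hand over to.
const electableHosts = electableMembers(questionCfg).map((m) => m.host);
console.log(electableHosts);
```

In the mongo shell the same filter could be applied directly to the result of rs.conf().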

    I updated the priority (and votes) of lxc-mongodb-02 (_id = 2):

    cfg = rs.conf();
    cfg.members[0].priority = 2;
    cfg.members[2].votes = 1;
    cfg.members[2].priority = 1;
    rs.reconfig(cfg);
    

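    lxc-mongodb-02 is now electable as primary.

    That said, I just realized that a permanent switchover is done by changing the priorities rather than by using the rs.stepDown() command.

    So, to promote lxc-mongodb-02 to primary, I ran: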

    cfg = rs.conf();
    cfg.members[2].priority = 3;
    np:PRIMARY> rs.reconfig(cfg);
    {
        "ok" : 1,
        "operationTime" : Timestamp(1551953687, 1),
        "$clusterTime" : {
            "clusterTime" : Timestamp(1551953687, 1),
            "signature" : {
                "hash" : BinData(0,"r4jVzPM1nUnJ44THZ3E+cJA1SDU="),
                "keyId" : NumberLong("6653042051040411649")
            }
        }
    }
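The snippets in this answer address the member by array position (cfg.members[2]), which happens to coincide with _id 2 in this configuration. A minimal sketch of the same priority change with the member looked up by host instead, written in plain JavaScript against an rs.conf()-shaped object; the helper names withPriority and highestPriorityHost are hypothetical, and the sample values assume the priorities as they stand after the first reconfig above:

```javascript
// Hypothetical helper: return a copy of an rs.conf()-shaped object with one
// member's priority changed, looked up by host rather than by array index.
function withPriority(cfg, host, priority) {
  return {
    ...cfg,
    members: cfg.members.map((m) => (m.host === host ? { ...m, priority } : m)),
  };
}

// Hypothetical helper: host of the member with the highest priority,
// i.e. the member the replica set will prefer as primary.
function highestPriorityHost(cfg) {
  return cfg.members.reduce((best, m) => (m.priority > best.priority ? m : best)).host;
}

// Priorities after the first reconfig in this answer (trimmed fields).
const before = {
  _id: "np",
  members: [
    { _id: 0, host: "lxc-mongodb-01:27017", arbiterOnly: false, priority: 2, votes: 1 },
    { _id: 1, host: "lxc-mongodb-03:27017", arbiterOnly: true, priority: 0, votes: 1 },
    { _id: 2, host: "lxc-mongodb-02:27017", arbiterOnly: false, priority: 1, votes: 1 },
  ],
};

const after = withPriority(before, "lxc-mongodb-02:27017", 3);
console.log(highestPriorityHost(before)); // lxc-mongodb-01:27017
console.log(highestPriorityHost(after)); // lxc-mongodb-02:27017
```

In the mongo shell, the modified object would then be passed to rs.reconfig(), as in the commands above.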
    