Bons recursos para operar/administrar o MongoDB

Question

Nicolas Payart

Asked: 2019-03-07 05:13:23 +0800 CST2019-03-07 05:13:23 +0800 CST 2019-03-07 05:13:23 +0800 CST

O stepDown do MongoDB falha em uma arquitetura PSA

772

Configurei um cluster MongoDB usando uma arquitetura de árbitro primário-secundário de 3 membros

Meio Ambiente:

Contêineres LXC
Linux Debian Stretch (9.8)
Versão do servidor MongoDB: 4.0.6

Contêineres MongoDB:

lxc-mongodb-01 (primário)
lxc-mongodb-02 (secundário)
lxc-mongodb-03 (árbitro)

Status de replicação

Tudo parece funcionar bem e a replicação está funcionando:

np:PRIMARY> rs.printSlaveReplicationInfo()
source: lxc-mongodb-02:27017
    syncedTo: Wed Mar 06 2019 12:08:27 GMT+0100 (CET)
    0 secs (0 hrs) behind the primary

Falha na alternância

Mas quando tento alternar primário/secundário usando rs.stepDown(), ele falha com uma mensagem de erro "Nenhum secundário elegível capturado":

np:PRIMARY> rs.stepDown(60, 30)
{
    "operationTime" : Timestamp(1551870647, 1),
    "ok" : 0,
    "errmsg" : "No electable secondaries caught up as of 2019-03-06T12:11:19.140+0100Please use the replSetStepDown command with the argument {force: true} to force node to step down.",
    "code" : 262,
    "codeName" : "ExceededTimeLimit",
    "$clusterTime" : {
        "clusterTime" : Timestamp(1551870647, 1),
        "signature" : {
            "hash" : BinData(0,"+/jQR8cG+y/bPtoF7gnv2Pmn2BY="),
            "keyId" : NumberLong("6653042051040411649")
        }
    }
}

Observe que este é um cluster de não produção, portanto, não há transação em andamento.

Logs do server01 (primário):

2019-03-06T12:08:07.709+0100 I ACCESS   [conn17] Successfully authenticated as principal root on admin
2019-03-06T12:10:49.140+0100 I COMMAND  [conn17] Attempting to step down in response to replSetStepDown command
2019-03-06T12:11:19.140+0100 I COMMAND  [conn17] command admin.$cmd appName: "MongoDB Shell" command: replSetStepDown { replSetStepDown: 60.0, secondaryCatchUpPeriodSecs: 30.0, lsid: { id: UUID("8941645a-c582-4353-b216-6e5ee91c08b0") }, $clusterTime: { clusterTime: Timestamp(1551870507, 1), signature: { hash: BinData(0, 484DDC04A03F9CBEDA0E5FA5E4F438F414E43E8F), keyId: 6653042051040411649 } }, $db: "admin" } numYields:0 ok:0 errMsg:"No electable secondaries caught up as of 2019-03-06T12:11:19.140+0100Please use the replSetStepDown command with the argument {force: true} to force node to step down." errName:ExceededTimeLimit errCode:262 reslen:385 locks:{ Global: { acquireCount: { r: 2, W: 2 } } } protocol:op_msg 29999ms

Logs do server02 (secundário):

2019-03-06T12:10:52.278+0100 I REPL     [replication-1] Restarting oplog query due to error: InterruptedDueToReplStateChange: error in fetcher batch callback :: caused by :: operation was interrupted. Last fetched optime (with hash): { ts: Timestamp(1551870647, 1), t: 8 }[-3124663669138993987]. Restarts remaining: 1
2019-03-06T12:10:52.278+0100 I REPL     [replication-1] Scheduled new oplog query Fetcher source: lxc-mongodb-01:27017 database: local query: { find: "oplog.rs", filter: { ts: { $gte: Timestamp(1551870647, 1) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 8, readConcern: { afterClusterTime: Timestamp(1551870647, 1) } } query metadata: { $replData: 1, $oplogQueryData: 1, $readPreference: { mode: "secondaryPreferred" } } active: 1 findNetworkTimeout: 7000ms getMoreNetworkTimeout: 10000ms shutting down?: 0 first: 1 firstCommandScheduler: RemoteCommandRetryScheduler request: RemoteCommand 6603 -- target:lxc-mongodb-01:27017 db:local cmd:{ find: "oplog.rs", filter: { ts: { $gte: Timestamp(1551870647, 1) } }, tailable: true, oplogReplay: true, awaitData: true, maxTimeMS: 2000, batchSize: 13981010, term: 8, readConcern: { afterClusterTime: Timestamp(1551870647, 1) } } active: 1 callbackHandle.valid: 1 callbackHandle.cancelled: 0 attempt: 1 retryPolicy: RetryPolicyImpl maxAttempts: 1 maxTimeMillis: -1ms
2019-03-06T12:10:52.279+0100 W REPL     [rsBackgroundSync] Fetcher stopped querying remote oplog with error: InvalidSyncSource: Sync source cannot be behind me, and if I am up-to-date with the sync source, it must have a higher lastOpCommitted. My last fetched oplog optime: { ts: Timestamp(1551870647, 1), t: 8 }, latest oplog optime of sync source: { ts: Timestamp(1551870647, 1), t: 8 }, my lastOpCommitted: { ts: Timestamp(1551870647, 1), t: 8 }, lastOpCommitted of sync source: { ts: Timestamp(1551870647, 1), t: 8 }
2019-03-06T12:10:52.279+0100 I REPL     [rsBackgroundSync] Clearing sync source lxc-mongodb-01:27017 to choose a new one.
2019-03-06T12:10:52.279+0100 I REPL     [rsBackgroundSync] could not find member to sync from
2019-03-06T12:10:57.276+0100 I REPL     [SyncSourceFeedback] SyncSourceFeedback error sending update to lxc-mongodb-01:27017: InvalidSyncSource: Sync source was cleared. Was lxc-mongodb-01:27017
2019-03-06T12:11:27.284+0100 I REPL     [rsBackgroundSync] sync source candidate: lxc-mongodb-01:27017
2019-03-06T12:11:27.286+0100 I REPL     [rsBackgroundSync] Changed sync source from empty to lxc-mongodb-01:27017
2019-03-06T12:11:28.833+0100 I NETWORK  [LogicalSessionCacheRefresh] Starting new replica set monitor for np/lxc-mongodb-01:27017,lxc-mongodb-02:27017

Logs do server03 (árbitro):

2019-03-06T12:11:29.428+0100 I NETWORK  [LogicalSessionCacheRefresh] Starting new replica set monitor for np/lxc-mongodb-01:27017,lxc-mongodb-02:27017
2019-03-06T12:11:29.429+0100 I NETWORK  [LogicalSessionCacheRefresh] Starting new replica set monitor for np/lxc-mongodb-01:27017,lxc-mongodb-02:27017

Olhando para a documentação e alguns tópicos, tentei ajustar algumas configurações, sem sucesso:

replication.enableMajorityReadConcern = false
writeConcernMajorityJournalDefault = false

Pergunta

Então, o que estou perdendo para fazer o stepDown funcionar como esperado?

EDITAR 07/03/2019

Aqui está rs.status()a saída do primário:

np:PRIMARY> rs.status()
{
    "set" : "np",
    "date" : ISODate("2019-03-07T08:08:17.623Z"),
    "myState" : 1,
    "term" : NumberLong(8),
    "syncingTo" : "",
    "syncSourceHost" : "",
    "syncSourceId" : -1,
    "heartbeatIntervalMillis" : NumberLong(2000),
    "optimes" : {
        "lastCommittedOpTime" : {
            "ts" : Timestamp(1551946089, 1),
            "t" : NumberLong(8)
        },
        "readConcernMajorityOpTime" : {
            "ts" : Timestamp(1551946089, 1),
            "t" : NumberLong(8)
        },
        "appliedOpTime" : {
            "ts" : Timestamp(1551946089, 1),
            "t" : NumberLong(8)
        },
        "durableOpTime" : {
            "ts" : Timestamp(1551946089, 1),
            "t" : NumberLong(8)
        }
    },
    "members" : [
        {
            "_id" : 0,
            "name" : "lxc-mongodb-01:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 75954,
            "optime" : {
                "ts" : Timestamp(1551946089, 1),
                "t" : NumberLong(8)
            },
            "optimeDate" : ISODate("2019-03-07T08:08:09Z"),
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "electionTime" : Timestamp(1551870155, 1),
            "electionDate" : ISODate("2019-03-06T11:02:35Z"),
            "configVersion" : 4,
            "self" : true,
            "lastHeartbeatMessage" : ""
        },
        {
            "_id" : 1,
            "name" : "lxc-mongodb-03:27017",
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 75952,
            "lastHeartbeat" : ISODate("2019-03-07T08:08:16.005Z"),
            "lastHeartbeatRecv" : ISODate("2019-03-07T08:08:17.410Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "",
            "syncingTo" : "",
            "syncSourceHost" : "",
            "syncSourceId" : -1,
            "infoMessage" : "",
            "configVersion" : 4
        },
        {
            "_id" : 2,
            "name" : "lxc-mongodb-02:27017",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 75952,
            "optime" : {
                "ts" : Timestamp(1551946089, 1),
                "t" : NumberLong(8)
            },
            "optimeDurable" : {
                "ts" : Timestamp(1551946089, 1),
                "t" : NumberLong(8)
            },
            "optimeDate" : ISODate("2019-03-07T08:08:09Z"),
            "optimeDurableDate" : ISODate("2019-03-07T08:08:09Z"),
            "lastHeartbeat" : ISODate("2019-03-07T08:08:16.008Z"),
            "lastHeartbeatRecv" : ISODate("2019-03-07T08:08:15.798Z"),
            "pingMs" : NumberLong(0),
            "lastHeartbeatMessage" : "",
            "syncingTo" : "lxc-mongodb-01:27017",
            "syncSourceHost" : "lxc-mongodb-01:27017",
            "syncSourceId" : 0,
            "infoMessage" : "",
            "configVersion" : 4
        }
    ],
    "ok" : 1,
    "operationTime" : Timestamp(1551946089, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1551946089, 1),
        "signature" : {
            "hash" : BinData(0,"ZPnNWVwjB1K9jdaSHlnfnmRPqqM="),
            "keyId" : NumberLong("6653042051040411649")
        }
    }
}

Aqui está rs.conf()a saída do primário:

np:PRIMARY> rs.conf()
{
    "_id" : "np",
    "version" : 4,
    "protocolVersion" : NumberLong(1),
    "writeConcernMajorityJournalDefault" : false,
    "members" : [
        {
            "_id" : 0,
            "host" : "lxc-mongodb-01:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 1,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 1,
            "host" : "lxc-mongodb-03:27017",
            "arbiterOnly" : true,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 0,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 1
        },
        {
            "_id" : 2,
            "host" : "lxc-mongodb-02:27017",
            "arbiterOnly" : false,
            "buildIndexes" : true,
            "hidden" : false,
            "priority" : 0,
            "tags" : {

            },
            "slaveDelay" : NumberLong(0),
            "votes" : 0
        }
    ],
    "settings" : {
        "chainingAllowed" : true,
        "heartbeatIntervalMillis" : 2000,
        "heartbeatTimeoutSecs" : 10,
        "electionTimeoutMillis" : 10000,
        "catchUpTimeoutMillis" : -1,
        "catchUpTakeoverDelayMillis" : 30000,
        "getLastErrorModes" : {

        },
        "getLastErrorDefaults" : {
            "w" : 1,
            "wtimeout" : 0
        },
        "replicaSetId" : ObjectId("5c545a7d4e358716c8129ac6")
    }
}

1 respostas

Voted

Nicolas Payart · Answer 1 · 2019-03-08T02:17:58+08:00

Best Answer

Nicolas Payart

2019-03-08T02:17:58+08:002019-03-08T02:17:58+08:00

Não há secundários elegíveis porque a prioridade do nó secundário está definida como 0 (veja rs.conf()no post original. Obrigado a Mani por sua sugestão!).

Atualizei a prioridade (e votos ) para lxc-mongodb-02 (_id = 2):

cfg = rs.conf();
cfg.members[0].priority = 2;
cfg.members[2].votes = 1;
cfg.members[2].priority = 1;
rs.reconfig(cfg);

lxc-mongodb-02 agora é elegível como PRIMARY.

Dito isto, acabei de perceber que uma alternância permanente será realizada alterando a prioridade em vez de usar rs.stepDown()o comando.

Então, para promover o lxc-mongodb-02 como primário, eu corro:

cfg = rs.conf();
cfg.members[2].priority = 3;
np:PRIMARY> rs.reconfig(cfg);
{
    "ok" : 1,
    "operationTime" : Timestamp(1551953687, 1),
    "$clusterTime" : {
        "clusterTime" : Timestamp(1551953687, 1),
        "signature" : {
            "hash" : BinData(0,"r4jVzPM1nUnJ44THZ3E+cJA1SDU="),
            "keyId" : NumberLong("6653042051040411649")
        }
    }
}

5

O stepDown do MongoDB falha em uma arquitetura PSA

Meio Ambiente:

Contêineres MongoDB:

Status de replicação

Falha na alternância

Logs do server01 (primário):

Logs do server02 (secundário):

Logs do server03 (árbitro):

Pergunta

EDITAR 07/03/2019

conectar ao servidor PostgreSQL: FATAL: nenhuma entrada pg_hba.conf para o host

Como fazer a saída do sqlplus aparecer em uma linha?

Selecione qual tem data máxima ou data mais recente

Como faço para listar todos os esquemas no PostgreSQL?

Listar todas as colunas de uma tabela especificada

Como usar o sqlplus para se conectar a um banco de dados Oracle localizado em outro host sem modificar meu próprio tnsnames.ora

Como você mysqldump tabela (s) específica (s)?

Listar os privilégios do banco de dados usando o psql

Como inserir valores em uma tabela de uma consulta de seleção no PostgreSQL?

Como faço para listar todos os bancos de dados e tabelas usando o psql?

O stepDown do MongoDB falha em uma arquitetura PSA

Meio Ambiente:

Contêineres MongoDB:

Status de replicação

Falha na alternância

Logs do server01 (primário):

Logs do server02 (secundário):

Logs do server03 (árbitro):

Pergunta

EDITAR 07/03/2019

1 respostas

relate perguntas