我有一个 3 节点的 Patroni2.1.1 + PGSQL13.4 集群,其中一个节点在完成基本备份引导后表现得很奇怪。当通过相同的过程时,我有另一个相同的节点工作正常。
基本上,一旦 Patroni 尝试连接到 localhost:5432,服务器似乎就死机并重新分叉(据我所知)。您将在“循环”下方看到从connection received: host=::1 port=XXXXX
.
2021-11-01 12:01:43.395 EDT [242925] LOG: starting PostgreSQL 13.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.4.1 20200928 (Red Hat 8.4.1-1), 64-bit
2021-11-01 12:01:43.396 EDT [242925] LOG: listening on IPv6 address "::1", port 5432
2021-11-01 12:01:43.396 EDT [242925] LOG: listening on IPv4 address "127.0.0.1", port 5432
2021-11-01 12:01:43.397 EDT [242925] LOG: listening on IPv4 address "192.168.23.13", port 5432
2021-11-01 12:01:43.398 EDT [242925] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5432"
2021-11-01 12:01:43.401 EDT [242925] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2021-11-01 12:01:43.405 EDT [242928] LOG: database system was interrupted; last known up at 2021-11-01 12:01:25 EDT
2021-11-01 12:01:43.405 EDT [242928] DEBUG: removing all temporary WAL segments
2021-11-01 12:01:43.436 EDT [242928] DEBUG: executing restore command "cp /var/lib/pgsql/13/archive/00000077.history pg_wal/RECOVERYHISTORY"
cp: cannot stat '/var/lib/pgsql/13/archive/00000077.history': No such file or directory
2021-11-01 12:01:43.440 EDT [242928] DEBUG: could not restore file "00000077.history" from archive: child process exited with exit code 1
2021-11-01 12:01:43.441 EDT [242928] LOG: entering standby mode
2021-11-01 12:01:43.441 EDT [242928] DEBUG: backup time 2021-11-01 12:01:25 EDT in file "backup_label"
2021-11-01 12:01:43.441 EDT [242928] DEBUG: backup label pg_basebackup base backup in file "backup_label"
2021-11-01 12:01:43.441 EDT [242928] DEBUG: backup timeline 118 in file "backup_label"
2021-11-01 12:01:43.441 EDT [242928] DEBUG: executing restore command "cp /var/lib/pgsql/13/archive/00000076.history pg_wal/RECOVERYHISTORY"
2021-11-01 12:01:43.538 EDT [242928] LOG: restored log file "00000076.history" from archive
2021-11-01 12:01:43.540 EDT [242928] DEBUG: executing restore command "cp /var/lib/pgsql/13/archive/000000760000000100000000 pg_wal/RECOVERYXLOG"
2021-11-01 12:01:43.689 EDT [242928] LOG: restored log file "000000760000000100000000" from archive
2021-11-01 12:01:43.692 EDT [242928] DEBUG: got WAL segment from archive
2021-11-01 12:01:43.693 EDT [242928] DEBUG: checkpoint record is at 1/60
2021-11-01 12:01:43.693 EDT [242928] DEBUG: redo record is at 1/28; shutdown false
2021-11-01 12:01:43.693 EDT [242928] DEBUG: next transaction ID: 531; next OID: 24606
2021-11-01 12:01:43.693 EDT [242928] DEBUG: next MultiXactId: 1; next MultiXactOffset: 0
2021-11-01 12:01:43.693 EDT [242928] DEBUG: oldest unfrozen transaction ID: 478, in database 1
2021-11-01 12:01:43.693 EDT [242928] DEBUG: oldest MultiXactId: 1, in database 1
2021-11-01 12:01:43.693 EDT [242928] DEBUG: commit timestamp Xid oldest/newest: 0/0
2021-11-01 12:01:43.693 EDT [242928] DEBUG: transaction ID wrap limit is 2147484125, limited by database with OID 1
2021-11-01 12:01:43.693 EDT [242928] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2021-11-01 12:01:43.693 EDT [242928] DEBUG: starting up replication slots
2021-11-01 12:01:43.693 EDT [242928] DEBUG: starting up replication origin progress state
2021-11-01 12:01:43.697 EDT [242928] DEBUG: resetting unlogged relations: cleanup 1 init 0
2021-11-01 12:01:43.698 EDT [242928] DEBUG: initializing for hot standby
2021-11-01 12:01:43.698 EDT [242928] DEBUG: my backend ID is 1
2021-11-01 12:01:43.698 EDT [242928] LOG: redo starts at 1/28
2021-11-01 12:01:43.698 EDT [242928] DEBUG: prune KnownAssignedXids to 531
2021-11-01 12:01:43.698 EDT [242928] CONTEXT: WAL redo at 1/28 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT [242928] DEBUG: 0 KnownAssignedXids (num=0 tail=0 head=0)
2021-11-01 12:01:43.698 EDT [242928] CONTEXT: WAL redo at 1/28 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT [242928] DEBUG: recovery snapshots are now enabled
2021-11-01 12:01:43.698 EDT [242928] CONTEXT: WAL redo at 1/28 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT [242928] DEBUG: prune KnownAssignedXids to 531
2021-11-01 12:01:43.698 EDT [242928] CONTEXT: WAL redo at 1/D8 for Standby/RUNNING_XACTS: nextXid 531 latestCompletedXid 530 oldestRunningXid 531
2021-11-01 12:01:43.698 EDT [242928] DEBUG: end of backup reached
2021-11-01 12:01:43.698 EDT [242928] CONTEXT: WAL redo at 1/110 for XLOG/BACKUP_END: 1/28
2021-11-01 12:01:43.699 EDT [242932] DEBUG: checkpointer updated shared memory configuration values
2021-11-01 12:01:43.700 EDT [242928] LOG: consistent recovery state reached at 1/138
2021-11-01 12:01:43.700 EDT [242928] DEBUG: executing restore command "cp /var/lib/pgsql/13/archive/000000760000000100000001 pg_wal/RECOVERYXLOG"
2021-11-01 12:01:43.700 EDT [242925] LOG: database system is ready to accept read only connections
cp: cannot stat '/var/lib/pgsql/13/archive/000000760000000100000001': No such file or directory
2021-11-01 12:01:43.704 EDT [242928] DEBUG: could not restore file "000000760000000100000001" from archive: child process exited with exit code 1
2021-11-01 12:01:43.704 EDT [242928] DEBUG: could not open file "pg_wal/000000760000000100000001": No such file or directory
2021-11-01 12:01:43.704 EDT [242928] DEBUG: switched WAL source from archive to stream after failure
2021-11-01 12:01:43.705 EDT [242936] DEBUG: find_in_dynamic_libpath: trying "/usr/pgsql-13/lib/libpqwalreceiver"
2021-11-01 12:01:43.706 EDT [242936] DEBUG: find_in_dynamic_libpath: trying "/usr/pgsql-13/lib/libpqwalreceiver.so"
2021-11-01 12:01:44.370 EDT [242925] DEBUG: forked new backend, pid=242938 socket=12
2021-11-01 12:01:44.371 EDT [242938] LOG: connection received: host=::1 port=36714
2021-11-01 12:01:51.235 EDT [242938] DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT [242938] DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT [242938] DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:01:51.235 EDT [242938] DEBUG: exit(0)
2021-11-01 12:01:51.235 EDT [242938] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT [242938] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:51.235 EDT [242938] DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:01:51.236 EDT [242925] DEBUG: forked new backend, pid=242941 socket=12
2021-11-01 12:01:51.237 EDT [242941] LOG: connection received: host=127.0.0.1 port=43428
2021-11-01 12:01:51.237 EDT [242925] DEBUG: reaping dead processes
2021-11-01 12:01:51.237 EDT [242925] DEBUG: server process (PID 242938) exited with exit code 0
2021-11-01 12:01:58.284 EDT [242941] DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT [242941] DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT [242941] DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:01:58.284 EDT [242941] DEBUG: exit(0)
2021-11-01 12:01:58.284 EDT [242941] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT [242941] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:01:58.284 EDT [242941] DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:01:58.285 EDT [242925] DEBUG: reaping dead processes
2021-11-01 12:01:58.285 EDT [242925] DEBUG: server process (PID 242941) exited with exit code 0
2021-11-01 12:01:59.295 EDT [242925] DEBUG: forked new backend, pid=242944 socket=12
2021-11-01 12:01:59.295 EDT [242944] LOG: connection received: host=::1 port=36758
2021-11-01 12:02:06.301 EDT [242944] DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT [242944] DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT [242944] DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:02:06.301 EDT [242944] DEBUG: exit(0)
2021-11-01 12:02:06.301 EDT [242944] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT [242944] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:06.301 EDT [242944] DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:06.301 EDT [242925] DEBUG: forked new backend, pid=242948 socket=12
2021-11-01 12:02:06.302 EDT [242925] DEBUG: reaping dead processes
2021-11-01 12:02:06.302 EDT [242925] DEBUG: server process (PID 242944) exited with exit code 0
2021-11-01 12:02:06.302 EDT [242948] LOG: connection received: host=127.0.0.1 port=43470
2021-11-01 12:02:13.440 EDT [242948] DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT [242948] DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT [242948] DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:02:13.440 EDT [242948] DEBUG: exit(0)
2021-11-01 12:02:13.440 EDT [242948] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT [242948] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:13.440 EDT [242948] DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:13.441 EDT [242925] DEBUG: reaping dead processes
2021-11-01 12:02:13.441 EDT [242925] DEBUG: server process (PID 242948) exited with exit code 0
2021-11-01 12:02:14.148 EDT [242925] DEBUG: postmaster received signal 2
2021-11-01 12:02:14.148 EDT [242925] LOG: received fast shutdown request
2021-11-01 12:02:14.150 EDT [242925] LOG: aborting any active transactions
2021-11-01 12:02:14.150 EDT [242933] DEBUG: shmem_exit(0): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242933] DEBUG: shmem_exit(0): 5 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242933] DEBUG: proc_exit(0): 2 callbacks to make
2021-11-01 12:02:14.150 EDT [242933] DEBUG: exit(0)
2021-11-01 12:02:14.150 EDT [242933] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242933] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242933] DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:14.150 EDT [242928] DEBUG: shmem_exit(1): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242928] DEBUG: shmem_exit(1): 7 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242928] DEBUG: removing all KnownAssignedXids
2021-11-01 12:02:14.150 EDT [242928] DEBUG: release all standby locks
2021-11-01 12:02:14.150 EDT [242928] DEBUG: proc_exit(1): 2 callbacks to make
2021-11-01 12:02:14.150 EDT [242928] DEBUG: exit(1)
2021-11-01 12:02:14.150 EDT [242928] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242928] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:14.150 EDT [242928] DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:14.151 EDT [242925] DEBUG: reaping dead processes
2021-11-01 12:02:14.151 EDT [242925] DEBUG: reaping dead processes
2021-11-01 12:02:14.449 EDT [242925] DEBUG: forked new backend, pid=242954 socket=12
2021-11-01 12:02:14.450 EDT [242954] LOG: connection received: host=::1 port=36798
2021-11-01 12:02:15.915 EDT [242936] FATAL: terminating walreceiver process due to administrator command
2021-11-01 12:02:15.915 EDT [242936] DEBUG: shmem_exit(1): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:15.915 EDT [242936] DEBUG: shmem_exit(1): 6 on_shmem_exit callbacks to make
2021-11-01 12:02:15.915 EDT [242936] DEBUG: proc_exit(1): 2 callbacks to make
2021-11-01 12:02:15.915 EDT [242936] DEBUG: exit(1)
2021-11-01 12:02:15.916 EDT [242936] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:15.916 EDT [242936] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:15.916 EDT [242936] DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:15.917 EDT [242925] DEBUG: reaping dead processes
2021-11-01 12:02:15.917 EDT [242932] LOG: shutting down
2021-11-01 12:02:15.917 EDT [242932] DEBUG: skipping restartpoint, already performed at 1/28
2021-11-01 12:02:15.919 EDT [242932] DEBUG: updated min recovery point to 1/1000000 on timeline 118
2021-11-01 12:02:15.920 EDT [242932] DEBUG: shmem_exit(0): 1 before_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT [242932] DEBUG: shmem_exit(0): 5 on_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT [242932] DEBUG: proc_exit(0): 2 callbacks to make
2021-11-01 12:02:15.920 EDT [242932] DEBUG: exit(0)
2021-11-01 12:02:15.920 EDT [242932] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT [242932] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:15.920 EDT [242932] DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:15.921 EDT [242925] DEBUG: reaping dead processes
2021-11-01 12:02:15.921 EDT [242934] DEBUG: writing stats file "pg_stat/global.stat"
2021-11-01 12:02:15.921 EDT [242934] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:15.921 EDT [242934] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:15.921 EDT [242934] DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:15.922 EDT [242925] DEBUG: reaping dead processes
2021-11-01 12:02:17.890 EDT [242925] DEBUG: postmaster received signal 2
2021-11-01 12:02:18.617 EDT [242954] DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT [242954] DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT [242954] DEBUG: proc_exit(0): 1 callbacks to make
2021-11-01 12:02:18.617 EDT [242954] DEBUG: exit(0)
2021-11-01 12:02:18.617 EDT [242954] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT [242954] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.617 EDT [242954] DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:18.618 EDT [242925] DEBUG: reaping dead processes
2021-11-01 12:02:18.618 EDT [242925] DEBUG: server process (PID 242954) exited with exit code 0
2021-11-01 12:02:18.618 EDT [242925] DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.618 EDT [242925] DEBUG: shmem_exit(0): 5 on_shmem_exit callbacks to make
2021-11-01 12:02:18.618 EDT [242925] DEBUG: cleaning up dynamic shared memory control segment with ID 2104592565
2021-11-01 12:02:18.631 EDT [242925] DEBUG: proc_exit(0): 2 callbacks to make
2021-11-01 12:02:18.631 EDT [242925] LOG: database system is shut down
2021-11-01 12:02:18.631 EDT [242925] DEBUG: exit(0)
2021-11-01 12:02:18.631 EDT [242925] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.631 EDT [242925] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.631 EDT [242925] DEBUG: proc_exit(-1): 0 callbacks to make
2021-11-01 12:02:18.633 EDT [242927] DEBUG: logger shutting down
2021-11-01 12:02:18.633 EDT [242927] DEBUG: shmem_exit(0): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT [242927] DEBUG: shmem_exit(0): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT [242927] DEBUG: proc_exit(0): 0 callbacks to make
2021-11-01 12:02:18.633 EDT [242927] DEBUG: exit(0)
2021-11-01 12:02:18.633 EDT [242927] DEBUG: shmem_exit(-1): 0 before_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT [242927] DEBUG: shmem_exit(-1): 0 on_shmem_exit callbacks to make
2021-11-01 12:02:18.633 EDT [242927] DEBUG: proc_exit(-1): 0 callbacks to make
如果我停止 Patroni,服务器将开始正常运行并按预期加入复制。用户 postgres 可以使用 psql 实用程序甚至使用 pgadmin 进行本地或远程连接。
有人可以解释这里发生了什么吗?我仍在学习 PGSQL,但我对这些日志还不是很满意……谢谢!
有关更多上下文,您还可以查看我发布的 Patroni 问题#2104 。
OK 终于找到原因了!在 strace 上使用 strace 我发现了以下循环:
原来,postgres 用户的错误主体仍然被缓存,并且在 kdestroy -A 重新开始工作之后。