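I've recently been learning PostgreSQL, and right now I'm learning how to configure PG replication with replication slots, so I'm following this example - https://girders.org/postgresql/2021/11/05/setup-postgresql14-replication/
However, whenever a new transaction is issued on the primary, every attempt fails with "No such file or directory". I can't find the cause or a fix. Please take a look and help.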
Environment -
VM1 -- 100.70.224.70/23 -- primary side
VM2 -- 100.70.225.241/23 -- replica
PG version -- 14.10
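Steps:
- Both sides already have a PG instance. PGDATA is /var/lib/pgsql/data on both, the archive folder is /tmp/pgbak on both, and both are configured for archiving with the same settings --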
archive_command = 'test ! -f /tmp/pgbak/%f && cp %p /tmp/pgbak/%f'
archive_timeout = '1min'
archive_mode = 'on'
archive_cleanup_command = 'pg_archivecleanup archivelocation %r'
restore_command = 'cp /tmp/pgbak/%f %p'
- On the replica, clean up the PG data folder -- rm -rf /var/lib/pgsql/data/* and clean up the archive folder -- rm -rf /tmp/pgbak
- On the primary, edit postgresql.conf --
wal_level = replica
max_wal_senders = 10
wal_keep_size = '1GB'
wal_compression = on
- On the primary, create the replication user --
createuser -U postgres --replication repl
- On the primary, edit pg_hba.conf and add 2 lines --
host all all 0.0.0.0/0 trust
host replication all 0.0.0.0/0 trust
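and restart the PG instance on the primary. Then I tested the connection with psql from the replica, and it worked fine.
- On the primary, create the replication slot --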
select * from pg_create_physical_replication_slot('db02_repl_slot');
- On the replica, take the base backup --
pg_basebackup --pgdata /var/lib/pgsql/data --format=p --write-recovery-conf --checkpoint=fast --label=mffb --progress --host=100.70.224.70 -R --username=repl
- On the replica, add the replication slot to postgresql.auto.conf --
primary_conninfo = 'user=repl passfile=''/var/lib/pgsql/.pgpass'' channel_binding=prefer host=100.70.224.70 port=5432 sslmode=prefer sslcompression=0 sslcertmode=allow sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=prefer krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable application_name=db02.repl'
primary_slot_name = 'db02_repl_slot'
- Start the PG instance on the replica and check the slot status on the primary; it looks fine --
eisendb=# select slot_name, slot_type, active, wal_status from pg_replication_slots;
slot_name | slot_type | active | wal_status
----------------+-----------+--------+------------
db02_repl_slot | physical | t | reserved
- Then I tested a data modification on the primary and found that no data was transferred to the replica. In the replica's error log I found this error -
2024-01-01 12:30:07.066 UTC [4737]CONTEXT: WAL redo at D5/75000060 for Standby/RUNNING_XACTS: nextXid 10361 latestCompletedXid 10360 oldestRunningXid 10361
2024-01-01 12:30:07.066 UTC [4737]DEBUG: executing restore command "cp /tmp/pgbak/00000001000000D500000076 pg_wal/RECOVERYXLOG"
2024-01-01 12:30:07.068 UTC [4741]DEBUG: checkpointer updated shared memory configuration values
cp: cannot stat '/tmp/pgbak/00000001000000D500000076': No such file or directory
2024-01-01 12:30:07.069 UTC [4737]DEBUG: could not restore file "00000001000000D500000076" from archive: child process exited with exit code 1
2024-01-01 12:30:07.069 UTC [4737]DEBUG: prune KnownAssignedXids to 10361
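For reference, the file names in the restore command can be tied back to the LSNs printed in the log. Below is a minimal sketch of that mapping, assuming the default 16MB WAL segment size; `wal_segment_name` is a helper name made up for illustration, not a PostgreSQL tool.

```shell
# Map a timeline ID and an LSN as printed in the log (e.g. D5/83000000)
# to the WAL segment file name. Assumes the default 16MB segment size.
# wal_segment_name is a hypothetical helper, not part of PostgreSQL.
wal_segment_name() (
    tli=$1
    hi=${2%%/*}    # part before the slash: high 32 bits of the LSN
    lo=${2##*/}    # part after the slash: low 32 bits of the LSN
    seg_size=$((16 * 1024 * 1024))
    # File name = timeline, high 32 bits, segment index within that range.
    printf '%08X%08X%08X\n' "$tli" "$((0x$hi))" "$((0x$lo / seg_size))"
)

wal_segment_name 1 D5/75000060   # -> 00000001000000D500000075
wal_segment_name 1 D5/83000000   # -> 00000001000000D500000083
```

So the redo record at D5/75000060 lives in segment ...75, and the restore command in the log is asking the archive for the next segment, ...76.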
And these are the log records with the debug messages filtered out --
2024-01-01 13:09:12.622 UTC [8367]LOG: database system was interrupted; last known up at 2024-01-01 13:05:59 UTC
cp: cannot stat '/tmp/pgbak/00000002.history': No such file or directory
2024-01-01 13:09:12.634 UTC [8367]LOG: entering standby mode
cp: cannot stat '/tmp/pgbak/00000001000000D500000082': No such file or directory
2024-01-01 13:09:12.639 UTC [8367]LOG: redo starts at D5/82000028
2024-01-01 13:09:12.640 UTC [8367]LOG: consistent recovery state reached at D5/82000138
2024-01-01 13:09:12.640 UTC [8362]LOG: database system is ready to accept read-only connections
cp: cannot stat '/tmp/pgbak/00000001000000D500000083': No such file or directory
2024-01-01 13:09:12.649 UTC [8374]LOG: started streaming WAL from primary at D5/83000000 on timeline 1
2024-01-01 13:10:28.316 UTC [8367]LOG: recovery stopping before commit of transaction 10363, time 2024-01-01 13:10:28.315075+00
2024-01-01 13:10:28.316 UTC [8367]LOG: pausing at the end of recovery
2024-01-01 13:10:28.316 UTC [8367]HINT: Execute pg_wal_replay_resume() to promote.
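To narrow this down, one check is whether the files named in the `cp: cannot stat` lines exist in an archive folder at all. The following is a rough sketch of such a check, not a definitive diagnostic; `missing_from_log` and `check_archive` are hypothetical helper names, and the log file path passed in is whatever the replica's log file happens to be.

```shell
# Print the distinct archive paths that restore_command failed to find,
# taken from the "cp: cannot stat" lines of a replica log file.
# (Hypothetical helper; the log format is the one shown above.)
missing_from_log() {
    sed -n "s/^cp: cannot stat '\(.*\)': No such file or directory\$/\1/p" "$1" | sort -u
}

# For each of those files, report whether it exists in the given
# archive directory on the host where this is run.
check_archive() (
    archive_dir=$1
    logfile=$2
    missing_from_log "$logfile" | while IFS= read -r path; do
        name=${path##*/}    # keep only the file name
        if [ -f "$archive_dir/$name" ]; then
            echo "present $name"
        else
            echo "missing $name"
        fi
    done
)

# Example call (the log path is an assumption):
# check_archive /tmp/pgbak /var/lib/pgsql/data/log/postgresql.log
```

Since the archive_command is a plain `cp`, it writes to /tmp/pgbak on the host where it runs, so running the check on each VM (with a copy of the replica's log) would show which side, if either, actually holds the segments, unless /tmp/pgbak is a shared mount.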
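It looks like the replica is searching the archive folder for some archived WAL files and not finding them. I also checked /tmp/pgbak on the replica and found that it is empty... I'm not familiar with the details of PG replication, so I'm wondering whether there is a mistake in my configuration that keeps the archived WAL on the primary from being copied to the replica. If so, please correct me. Thanks in advance.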