Question

Aldar

Asked: 2021-07-30 08:27:21 +0800 CST

PostgreSQL 11: terminating walreceiver process due to administrator command

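One of my PostgreSQL 11 servers is throwing a strange error and refuses to start streaming replication properly after a base backup:

FATAL:  terminating walreceiver process due to administrator command

I tried enabling the highest log verbosity (DEBUG5), but that gave no further insight into why the process keeps dying: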

2021-07-29 15:33:30.862 UTC [39315] DEBUG:  postgres: PostmasterMain: initial environment dump:
2021-07-29 15:33:30.862 UTC [39315] DEBUG:  -----------------------------------------
2021-07-29 15:33:30.862 UTC [39315] DEBUG:      PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
2021-07-29 15:33:30.862 UTC [39315] DEBUG:      PG_GRANDPARENT_PID=39310
2021-07-29 15:33:30.862 UTC [39315] DEBUG:      PGLOCALEDIR=/usr/share/locale
2021-07-29 15:33:30.862 UTC [39315] DEBUG:      PGSYSCONFDIR=/etc/postgresql-common
2021-07-29 15:33:30.862 UTC [39315] DEBUG:      LANG=en_US.UTF-8
2021-07-29 15:33:30.862 UTC [39315] DEBUG:      PWD=/
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      PGDATA=/var/lib/postgresql/11/replica
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      LC_COLLATE=en_US.UTF-8
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      LC_CTYPE=en_US.UTF-8
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      LC_MESSAGES=en_US.UTF-8
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      LC_MONETARY=C
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      LC_NUMERIC=C
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      LC_TIME=C
2021-07-29 15:33:30.863 UTC [39315] DEBUG:  -----------------------------------------
2021-07-29 15:33:30.867 UTC [39315] DEBUG:  registering background worker "logical replication launcher"
2021-07-29 15:33:30.868 UTC [39315] LOG:  listening on IPv6 address "::1", port 5433
2021-07-29 15:33:30.868 UTC [39315] LOG:  listening on IPv4 address "127.0.0.1", port 5433
2021-07-29 15:33:30.868 UTC [39315] LOG:  listening on IPv4 address "*snip*", port 5433
2021-07-29 15:33:30.868 UTC [39315] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2021-07-29 15:33:30.869 UTC [39315] DEBUG:  invoking IpcMemoryCreate(size=156459008)
2021-07-29 15:33:30.869 UTC [39315] DEBUG:  mmap(157286400) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2021-07-29 15:33:30.885 UTC [39315] DEBUG:  SlruScanDirectory invoking callback on pg_notify/0000
2021-07-29 15:33:30.885 UTC [39315] DEBUG:  removing file "pg_notify/0000"
2021-07-29 15:33:30.885 UTC [39315] DEBUG:  dynamic shared memory system will support 588 segments
2021-07-29 15:33:30.885 UTC [39315] DEBUG:  created dynamic shared memory control segment 1597797971 (14128 bytes)
2021-07-29 15:33:30.887 UTC [39315] DEBUG:  max_safe_fds = 982, usable_fds = 1000, already_open = 8
2021-07-29 15:33:30.889 UTC [39316] LOG:  database system was shut down in recovery at 2021-07-29 15:31:23 UTC
2021-07-29 15:33:30.890 UTC [39316] DEBUG:  standby_mode = 'on'
2021-07-29 15:33:30.890 UTC [39316] DEBUG:  primary_conninfo = '*snip*'
2021-07-29 15:33:30.890 UTC [39316] DEBUG:  recovery_target_timeline = latest
2021-07-29 15:33:30.890 UTC [39316] LOG:  entering standby mode
2021-07-29 15:33:30.890 UTC [39316] LOG:  invalid resource manager ID 172 at 235/450000D0
2021-07-29 15:33:30.890 UTC [39316] DEBUG:  switched WAL source from archive to stream after failure
2021-07-29 15:33:30.891 UTC [39317] DEBUG:  find_in_dynamic_libpath: trying "/usr/lib/postgresql/11/lib/libpqwalreceiver"
2021-07-29 15:33:30.905 UTC [39317] DEBUG:  find_in_dynamic_libpath: trying "/usr/lib/postgresql/11/lib/libpqwalreceiver.so"
2021-07-29 15:33:30.918 UTC [39317] LOG:  started streaming WAL from primary at 235/45000000 on timeline 1
2021-07-29 15:33:30.920 UTC [39317] DEBUG:  sendtime 2021-07-29 15:33:30.917704+00 receipttime 2021-07-29 15:33:30.920718+00 replication apply delay (N/A) transfer latency 3 ms
2021-07-29 15:33:30.920 UTC [39317] DEBUG:  sending write 235/45020000 flush 0/0 apply 0/0
2021-07-29 15:33:30.921 UTC [39317] DEBUG:  sending write 235/45020000 flush 235/45020000 apply 0/0
2021-07-29 15:33:30.921 UTC [39316] LOG:  invalid resource manager ID 172 at 235/450000D0
2021-07-29 15:33:30.921 UTC [39317] FATAL:  terminating walreceiver process due to administrator command
2021-07-29 15:33:30.921 UTC [39317] DEBUG:  shmem_exit(1): 1 before_shmem_exit callbacks to make
2021-07-29 15:33:30.921 UTC [39317] DEBUG:  shmem_exit(1): 5 on_shmem_exit callbacks to make
2021-07-29 15:33:30.921 UTC [39317] DEBUG:  proc_exit(1): 2 callbacks to make
2021-07-29 15:33:30.921 UTC [39317] DEBUG:  exit(1)
2021-07-29 15:33:30.921 UTC [39316] DEBUG:  switched WAL source from stream to archive after failure

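The only entries that stand out are LOG-level messages such as

LOG:  invalid resource manager ID 172 at 235/450000D0

However, it turns out these are just Postgres's way of saying "reached the end of the valid WAL", and LOG-level messages like this can safely be ignored.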
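To see which WAL file the position in that message refers to, the LSN can be mapped to a segment file name. The sketch below is an illustration only: it assumes the default 16 MB WAL segment size and timeline 1 (the timeline shown in the log above), and the function name is made up for this example.

```python
def wal_segment_for_lsn(lsn, timeline=1, segment_size=16 * 1024 * 1024):
    """Map an LSN such as '235/450000D0' to (segment file name, byte offset).

    Assumes the default 16 MB segment size unless told otherwise.
    """
    high_hex, low_hex = lsn.split("/")
    high = int(high_hex, 16)   # upper 32 bits of the WAL position
    low = int(low_hex, 16)     # lower 32 bits of the WAL position

    # Number of segments that fit into one 4 GB "xlogid" range.
    segments_per_xlogid = 0x100000000 // segment_size
    segment_number = high * segments_per_xlogid + low // segment_size

    # File name = timeline, then the segment number split into two 8-digit hex parts.
    name = "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_xlogid,
        segment_number % segments_per_xlogid,
    )
    offset = low % segment_size
    return name, offset


name, offset = wal_segment_for_lsn("235/450000D0")
print(name, offset)
```

Under those assumptions the record at 235/450000D0 sits 208 bytes into segment 000000010000023500000045, which is the file to look at in pg_wal/ or in the archive.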

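I tried deleting the existing WAL files (from datadir/pg_wal/), thinking a file might be corrupted, but the server still would not start replication. The only solution was to take a brand-new base backup.

My question: is "terminating xyz process due to administrator command" the default message Postgres gives when one of its processes dies in a non-standard way?

Are there any further debugging options for a case like this? Even the highest logging verbosity did not provide much useful information.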

1 Answer

jjanes · Answer 1 · 2021-07-30T09:22:50+08:00

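What is the full version of PostgreSQL on both the primary and the replica?

This looks like it might be corruption, and all you can do is take a new base backup (and investigate your hardware to see if you can find the cause of the corruption).

I assume you removed the WAL from the replica, not from the primary. So it would just re-fetch the same WAL files and find that they are still corrupted. It doesn't have to be that way, but presumably it is, or you wouldn't be asking this question.

If the corruption were caused by a network glitch that hit the file in transit, a fresh copy might be fine. But evidently the corruption is in the original WAL file on the primary, so it stays corrupted no matter how many times you read it.

Also, if the corruption had only occurred on disk as the original WAL file was written out, the archive might not be corrupted: if the archive command that copies WAL files to the archive took the data from the filesystem cache in RAM, the archived copy would be intact. So either the corruption occurred in RAM itself, or the archive command had to read the (by then corrupted) data from disk to copy it to the archive. Or it happened even earlier, as with a software bug.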
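One way to narrow down where the damage was introduced is to checksum the same completed WAL segment on the primary, in the archive, and on the replica. This is a minimal sketch, not a recommended procedure: the paths and the segment name are hypothetical placeholders, and it assumes all copies are uncompressed and that the segment is no longer being written to.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(copies):
    """Group copy labels by digest; copies maps a label to a file path.

    Labels whose file does not exist are collected under the key 'missing'.
    """
    groups = {}
    for label, path in copies.items():
        if not Path(path).is_file():
            groups.setdefault("missing", []).append(label)
            continue
        groups.setdefault(sha256_of(path), []).append(label)
    return groups


if __name__ == "__main__":
    # Hypothetical locations, fetched to one machine beforehand; adjust as needed.
    segment = "000000010000023500000045"
    copies = {
        "primary": f"/tmp/wal-check/primary/{segment}",
        "archive": f"/tmp/wal-check/archive/{segment}",
        "replica": f"/tmp/wal-check/replica/{segment}",
    }
    for digest, labels in compare_copies(copies).items():
        print(digest, labels)
```

If all copies share one digest, the bad bytes were already present in the file on the primary when it was archived and streamed. If the archive copy differs from the primary's, that would fit the case where the archive command read good data from the filesystem cache.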

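There is no additional information to be had. The system can't figure out what it is looking at, so all it can do is spit out the garbage and leave us to speculate. But yes, it really did order the WAL receiver to exit: there is no point in keeping it running, because it cannot replay past the point of corruption. So it switches back to the WAL archive to see whether the file there is corrupted as well.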
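Since the server log is all there is to go on, a small script can at least confirm that recovery keeps stalling at the same position before the WAL receiver is told to exit. The following is a rough sketch that only recognizes the two message texts quoted in the question; the function name and the sample lines are chosen for illustration.

```python
import re
from collections import Counter

# Matches the end-of-valid-WAL message quoted in the question and captures its LSN.
INVALID_RECORD = re.compile(
    r"LOG:\s+invalid resource manager ID \d+ at ([0-9A-F]+/[0-9A-F]+)"
)
WALRECEIVER_KILLED = "terminating walreceiver process due to administrator command"


def summarize(log_lines):
    """Count how often recovery stopped at each LSN and how often walreceiver exited."""
    stuck_at = Counter()
    receiver_exits = 0
    for line in log_lines:
        match = INVALID_RECORD.search(line)
        if match:
            stuck_at[match.group(1)] += 1
        elif WALRECEIVER_KILLED in line:
            receiver_exits += 1
    return stuck_at, receiver_exits


sample = [
    '2021-07-29 15:33:30.890 UTC [39316] LOG:  invalid resource manager ID 172 at 235/450000D0',
    '2021-07-29 15:33:30.921 UTC [39316] LOG:  invalid resource manager ID 172 at 235/450000D0',
    '2021-07-29 15:33:30.921 UTC [39317] FATAL:  terminating walreceiver process due to administrator command',
]
stuck_at, receiver_exits = summarize(sample)
print(stuck_at.most_common(), receiver_exits)
```

The same LSN turning up repeatedly, from both the archive and the stream, points at the WAL data itself rather than at a transient transfer problem.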
