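One of my PostgreSQL 11 servers is throwing a strange error and refuses to start streaming replication properly after a base backup:
FATAL: terminating walreceiver process due to administrator command
I tried enabling the highest log verbosity (DEBUG5), but that didn't give much more insight into why it keeps dying: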
2021-07-29 15:33:30.862 UTC [39315] DEBUG: postgres: PostmasterMain: initial environment dump:
2021-07-29 15:33:30.862 UTC [39315] DEBUG: -----------------------------------------
2021-07-29 15:33:30.862 UTC [39315] DEBUG: PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
2021-07-29 15:33:30.862 UTC [39315] DEBUG: PG_GRANDPARENT_PID=39310
2021-07-29 15:33:30.862 UTC [39315] DEBUG: PGLOCALEDIR=/usr/share/locale
2021-07-29 15:33:30.862 UTC [39315] DEBUG: PGSYSCONFDIR=/etc/postgresql-common
2021-07-29 15:33:30.862 UTC [39315] DEBUG: LANG=en_US.UTF-8
2021-07-29 15:33:30.862 UTC [39315] DEBUG: PWD=/
2021-07-29 15:33:30.863 UTC [39315] DEBUG: PGDATA=/var/lib/postgresql/11/replica
2021-07-29 15:33:30.863 UTC [39315] DEBUG: LC_COLLATE=en_US.UTF-8
2021-07-29 15:33:30.863 UTC [39315] DEBUG: LC_CTYPE=en_US.UTF-8
2021-07-29 15:33:30.863 UTC [39315] DEBUG: LC_MESSAGES=en_US.UTF-8
2021-07-29 15:33:30.863 UTC [39315] DEBUG: LC_MONETARY=C
2021-07-29 15:33:30.863 UTC [39315] DEBUG: LC_NUMERIC=C
2021-07-29 15:33:30.863 UTC [39315] DEBUG: LC_TIME=C
2021-07-29 15:33:30.863 UTC [39315] DEBUG: -----------------------------------------
2021-07-29 15:33:30.867 UTC [39315] DEBUG: registering background worker "logical replication launcher"
2021-07-29 15:33:30.868 UTC [39315] LOG: listening on IPv6 address "::1", port 5433
2021-07-29 15:33:30.868 UTC [39315] LOG: listening on IPv4 address "127.0.0.1", port 5433
2021-07-29 15:33:30.868 UTC [39315] LOG: listening on IPv4 address "*snip*", port 5433
2021-07-29 15:33:30.868 UTC [39315] LOG: listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2021-07-29 15:33:30.869 UTC [39315] DEBUG: invoking IpcMemoryCreate(size=156459008)
2021-07-29 15:33:30.869 UTC [39315] DEBUG: mmap(157286400) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2021-07-29 15:33:30.885 UTC [39315] DEBUG: SlruScanDirectory invoking callback on pg_notify/0000
2021-07-29 15:33:30.885 UTC [39315] DEBUG: removing file "pg_notify/0000"
2021-07-29 15:33:30.885 UTC [39315] DEBUG: dynamic shared memory system will support 588 segments
2021-07-29 15:33:30.885 UTC [39315] DEBUG: created dynamic shared memory control segment 1597797971 (14128 bytes)
2021-07-29 15:33:30.887 UTC [39315] DEBUG: max_safe_fds = 982, usable_fds = 1000, already_open = 8
2021-07-29 15:33:30.889 UTC [39316] LOG: database system was shut down in recovery at 2021-07-29 15:31:23 UTC
2021-07-29 15:33:30.890 UTC [39316] DEBUG: standby_mode = 'on'
2021-07-29 15:33:30.890 UTC [39316] DEBUG: primary_conninfo = '*snip*'
2021-07-29 15:33:30.890 UTC [39316] DEBUG: recovery_target_timeline = latest
2021-07-29 15:33:30.890 UTC [39316] LOG: entering standby mode
2021-07-29 15:33:30.890 UTC [39316] LOG: invalid resource manager ID 172 at 235/450000D0
2021-07-29 15:33:30.890 UTC [39316] DEBUG: switched WAL source from archive to stream after failure
2021-07-29 15:33:30.891 UTC [39317] DEBUG: find_in_dynamic_libpath: trying "/usr/lib/postgresql/11/lib/libpqwalreceiver"
2021-07-29 15:33:30.905 UTC [39317] DEBUG: find_in_dynamic_libpath: trying "/usr/lib/postgresql/11/lib/libpqwalreceiver.so"
2021-07-29 15:33:30.918 UTC [39317] LOG: started streaming WAL from primary at 235/45000000 on timeline 1
2021-07-29 15:33:30.920 UTC [39317] DEBUG: sendtime 2021-07-29 15:33:30.917704+00 receipttime 2021-07-29 15:33:30.920718+00 replication apply delay (N/A) transfer latency 3 ms
2021-07-29 15:33:30.920 UTC [39317] DEBUG: sending write 235/45020000 flush 0/0 apply 0/0
2021-07-29 15:33:30.921 UTC [39317] DEBUG: sending write 235/45020000 flush 235/45020000 apply 0/0
2021-07-29 15:33:30.921 UTC [39316] LOG: invalid resource manager ID 172 at 235/450000D0
2021-07-29 15:33:30.921 UTC [39317] FATAL: terminating walreceiver process due to administrator command
2021-07-29 15:33:30.921 UTC [39317] DEBUG: shmem_exit(1): 1 before_shmem_exit callbacks to make
2021-07-29 15:33:30.921 UTC [39317] DEBUG: shmem_exit(1): 5 on_shmem_exit callbacks to make
2021-07-29 15:33:30.921 UTC [39317] DEBUG: proc_exit(1): 2 callbacks to make
2021-07-29 15:33:30.921 UTC [39317] DEBUG: exit(1)
2021-07-29 15:33:30.921 UTC [39316] DEBUG: switched WAL source from stream to archive after failure
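With DEBUG5 enabled, the few LOG and FATAL lines are buried among the DEBUG noise. The short Python filter below is a sketch that keeps only the non-DEBUG lines together with the PID that wrote them, so it is easier to tell which process (here 39316 or 39317) logged which message. It assumes the log_line_prefix format visible in the dump above (timestamp, timezone, [pid]); the function name non_debug_lines is hypothetical.

```python
import re

# Assumed prefix, as seen in the dump: "YYYY-MM-DD HH:MM:SS.mmm TZ [pid] LEVEL: message"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} \w+) "
    r"\[(?P<pid>\d+)\] (?P<level>[A-Z]+): (?P<msg>.*)$"
)


def non_debug_lines(lines):
    """Yield (pid, level, message) for every parsed line whose level is not DEBUG."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m and m.group("level") != "DEBUG":
            yield int(m.group("pid")), m.group("level"), m.group("msg")


if __name__ == "__main__":
    import sys

    # Usage: python filter_log.py < postgresql-11-replica.log
    for pid, level, msg in non_debug_lines(sys.stdin):
        print(f"[{pid}] {level}: {msg}")
```

Lines that do not match the assumed prefix (for example, continuation lines of multi-line messages) are silently skipped, so the pattern may need adjusting for a different log_line_prefix.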
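The only thing of note is the LOG-level messages, such as
LOG: invalid resource manager ID 172 at 235/450000D0
However, it turns out these are just Postgres' way of saying "reached the end of valid WAL", and these LOG-level messages can safely be ignored.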
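I tried deleting the existing WAL files (from datadir/pg_wal/), thinking they might be corrupted, but the server still wouldn't start replicating. The only solution was to take a completely fresh base backup.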
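Before deleting WAL files, it can help to know which segment file in pg_wal/ actually contains the LSN from the log (235/450000D0). The sketch below applies PostgreSQL's WAL segment naming scheme (timeline, then the high and low parts of the segment number, each as 8 hex digits). It assumes the default 16 MB wal_segment_size and timeline 1 (the timeline reported in the "started streaming WAL" line); both function names are hypothetical.

```python
# Assumption: default wal_segment_size of 16 MB; adjust if initdb used a different size.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN written as 'XXX/YYYYYYYY' (hex high/low halves) to a 64-bit integer."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(lsn: str, timeline: int = 1, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Return the name of the WAL segment file that contains the given LSN."""
    segno = parse_lsn(lsn) // seg_size
    segs_per_xlogid = 0x100000000 // seg_size  # segments per 4 GB "log id"
    return "%08X%08X%08X" % (timeline, segno // segs_per_xlogid, segno % segs_per_xlogid)


if __name__ == "__main__":
    print(wal_file_name("235/450000D0"))
```

Under these assumptions, 235/450000D0 falls in segment 000000010000023500000045, which is also where streaming restarted (235/45000000).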
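My question: is Postgres' "terminating xyz process due to administrator command" the default message it gives when a process dies in a non-standard way?
Are there any further debugging options in such a case? Even the highest logging verbosity didn't provide much useful information.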