Questions asked by Aldar - dba

Aldar

Asked: 2021-07-30 08:27:21 +0800 CST

PostgreSQL 11: terminating walreceiver process due to administrator command

One of my PostgreSQL 11 servers threw a strange error and, after a base backup, refused to start streaming replication properly:

FATAL:  terminating walreceiver process due to administrator command

I tried enabling the highest log verbosity (DEBUG5), but it gave no further insight into why the process keeps dying:

2021-07-29 15:33:30.862 UTC [39315] DEBUG:  postgres: PostmasterMain: initial environment dump:
2021-07-29 15:33:30.862 UTC [39315] DEBUG:  -----------------------------------------
2021-07-29 15:33:30.862 UTC [39315] DEBUG:      PG_OOM_ADJUST_FILE=/proc/self/oom_score_adj
2021-07-29 15:33:30.862 UTC [39315] DEBUG:      PG_GRANDPARENT_PID=39310
2021-07-29 15:33:30.862 UTC [39315] DEBUG:      PGLOCALEDIR=/usr/share/locale
2021-07-29 15:33:30.862 UTC [39315] DEBUG:      PGSYSCONFDIR=/etc/postgresql-common
2021-07-29 15:33:30.862 UTC [39315] DEBUG:      LANG=en_US.UTF-8
2021-07-29 15:33:30.862 UTC [39315] DEBUG:      PWD=/
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      PGDATA=/var/lib/postgresql/11/replica
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      LC_COLLATE=en_US.UTF-8
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      LC_CTYPE=en_US.UTF-8
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      LC_MESSAGES=en_US.UTF-8
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      LC_MONETARY=C
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      LC_NUMERIC=C
2021-07-29 15:33:30.863 UTC [39315] DEBUG:      LC_TIME=C
2021-07-29 15:33:30.863 UTC [39315] DEBUG:  -----------------------------------------
2021-07-29 15:33:30.867 UTC [39315] DEBUG:  registering background worker "logical replication launcher"
2021-07-29 15:33:30.868 UTC [39315] LOG:  listening on IPv6 address "::1", port 5433
2021-07-29 15:33:30.868 UTC [39315] LOG:  listening on IPv4 address "127.0.0.1", port 5433
2021-07-29 15:33:30.868 UTC [39315] LOG:  listening on IPv4 address "*snip*", port 5433
2021-07-29 15:33:30.868 UTC [39315] LOG:  listening on Unix socket "/var/run/postgresql/.s.PGSQL.5433"
2021-07-29 15:33:30.869 UTC [39315] DEBUG:  invoking IpcMemoryCreate(size=156459008)
2021-07-29 15:33:30.869 UTC [39315] DEBUG:  mmap(157286400) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2021-07-29 15:33:30.885 UTC [39315] DEBUG:  SlruScanDirectory invoking callback on pg_notify/0000
2021-07-29 15:33:30.885 UTC [39315] DEBUG:  removing file "pg_notify/0000"
2021-07-29 15:33:30.885 UTC [39315] DEBUG:  dynamic shared memory system will support 588 segments
2021-07-29 15:33:30.885 UTC [39315] DEBUG:  created dynamic shared memory control segment 1597797971 (14128 bytes)
2021-07-29 15:33:30.887 UTC [39315] DEBUG:  max_safe_fds = 982, usable_fds = 1000, already_open = 8
2021-07-29 15:33:30.889 UTC [39316] LOG:  database system was shut down in recovery at 2021-07-29 15:31:23 UTC
2021-07-29 15:33:30.890 UTC [39316] DEBUG:  standby_mode = 'on'
2021-07-29 15:33:30.890 UTC [39316] DEBUG:  primary_conninfo = '*snip*'
2021-07-29 15:33:30.890 UTC [39316] DEBUG:  recovery_target_timeline = latest
2021-07-29 15:33:30.890 UTC [39316] LOG:  entering standby mode
2021-07-29 15:33:30.890 UTC [39316] LOG:  invalid resource manager ID 172 at 235/450000D0
2021-07-29 15:33:30.890 UTC [39316] DEBUG:  switched WAL source from archive to stream after failure
2021-07-29 15:33:30.891 UTC [39317] DEBUG:  find_in_dynamic_libpath: trying "/usr/lib/postgresql/11/lib/libpqwalreceiver"
2021-07-29 15:33:30.905 UTC [39317] DEBUG:  find_in_dynamic_libpath: trying "/usr/lib/postgresql/11/lib/libpqwalreceiver.so"
2021-07-29 15:33:30.918 UTC [39317] LOG:  started streaming WAL from primary at 235/45000000 on timeline 1
2021-07-29 15:33:30.920 UTC [39317] DEBUG:  sendtime 2021-07-29 15:33:30.917704+00 receipttime 2021-07-29 15:33:30.920718+00 replication apply delay (N/A) transfer latency 3 ms
2021-07-29 15:33:30.920 UTC [39317] DEBUG:  sending write 235/45020000 flush 0/0 apply 0/0
2021-07-29 15:33:30.921 UTC [39317] DEBUG:  sending write 235/45020000 flush 235/45020000 apply 0/0
2021-07-29 15:33:30.921 UTC [39316] LOG:  invalid resource manager ID 172 at 235/450000D0
2021-07-29 15:33:30.921 UTC [39317] FATAL:  terminating walreceiver process due to administrator command
2021-07-29 15:33:30.921 UTC [39317] DEBUG:  shmem_exit(1): 1 before_shmem_exit callbacks to make
2021-07-29 15:33:30.921 UTC [39317] DEBUG:  shmem_exit(1): 5 on_shmem_exit callbacks to make
2021-07-29 15:33:30.921 UTC [39317] DEBUG:  proc_exit(1): 2 callbacks to make
2021-07-29 15:33:30.921 UTC [39317] DEBUG:  exit(1)
2021-07-29 15:33:30.921 UTC [39316] DEBUG:  switched WAL source from stream to archive after failure

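The only noteworthy entries are LOG-level messages such as

LOG:  invalid resource manager ID 172 at 235/450000D0

However, it turns out these are just Postgres's way of saying "reached the end of valid WAL", and such LOG-level messages can be safely ignored.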

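I tried deleting the existing WAL files (from datadir/pg_wal/), thinking they might be corrupted, but the server still would not start replication. The only fix was to take a brand-new base backup.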
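Positions such as 235/450000D0 are LSNs: the high and low 32 bits of a byte position, written in hex. Rather than deleting everything in pg_wal, it can help to know which single segment file holds the failing position. The sketch below maps an LSN to a segment file name, assuming the default 16 MB wal_segment_size and timeline 1 (the timeline reported in the "started streaming WAL" line). The helper names are hypothetical; on a primary that is not in recovery, the built-in pg_walfile_name() function performs the same mapping, and the resulting file could then be inspected with pg_waldump.

```python
# Assumption: default 16 MB WAL segments (wal_segment_size); adjust if the
# cluster was initialised with a different initdb --wal-segsize.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Turn an LSN such as '235/450000D0' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name(lsn: str, timeline: int = 1,
                  seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the pg_wal segment file that contains the given LSN."""
    segno = parse_lsn(lsn) // seg_size
    segs_per_id = 0x100000000 // seg_size  # segments per 4 GiB "log id"
    # Timeline, then the segment number split into two halves,
    # each printed as 8 upper-case hex digits.
    return f"{timeline:08X}{segno // segs_per_id:08X}{segno % segs_per_id:08X}"


def segment_offset(lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Byte offset of the LSN inside its segment file."""
    return parse_lsn(lsn) % seg_size
```

For the position in the log, wal_file_name("235/450000D0") gives 000000010000023500000045 and segment_offset gives 208 (0xD0): the invalid record sits near the start of the same segment the walreceiver began streaming from at 235/45000000.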

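My question: is "terminating xyz process due to administrator command" the default message Postgres gives when a process dies in a non-standard way?

Are there any further debugging options in a case like this? Even the highest logging verbosity did not provide much useful information.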
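One low-tech way to squeeze more out of DEBUG5 output is to look at what other processes logged immediately before the FATAL line. The sketch below (hypothetical helper names, assuming lines shaped like the ones above: timestamp, time zone, [pid], level) pairs each "due to administrator command" FATAL with the preceding lines written by other PIDs. On the log above it surfaces PID 39316, the process that logged "entering standby mode", reporting invalid resource manager ID 172 at 235/450000D0 in the same millisecond. It shows correlation only; it cannot tell which process sent the termination signal.

```python
import re
from typing import Iterable, List, NamedTuple, Tuple

# Matches e.g. "2021-07-29 15:33:30.921 UTC [39317] FATAL:  terminating ..."
LINE_RE = re.compile(r"^(\S+ \S+) \S+ \[(\d+)\] ([A-Z0-9]+):\s+(.*)$")


class Entry(NamedTuple):
    timestamp: str
    pid: int
    level: str
    message: str


def parse(lines: Iterable[str]) -> List[Entry]:
    """Parse log lines; lines that do not match the prefix are skipped."""
    entries = []
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if m:
            entries.append(Entry(m.group(1), int(m.group(2)), m.group(3), m.group(4)))
    return entries


def context_before_termination(
        lines: Iterable[str], window: int = 5) -> List[Tuple[Entry, List[Entry]]]:
    """For each admin-command FATAL, return nearby earlier lines from other PIDs."""
    entries = parse(lines)
    report = []
    for i, e in enumerate(entries):
        if e.level == "FATAL" and "due to administrator command" in e.message:
            # Keep only the last `window` lines, minus those from the dying process.
            before = [p for p in entries[max(0, i - window):i] if p.pid != e.pid]
            report.append((e, before))
    return report
```

It can be fed an open log file directly, since a file object iterates over its lines.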

Aldar

Asked: 2019-09-14 04:27:39 +0800 CST

Unable to drop orphaned InnoDB temporary table - missing .frm file

For the second time, I've run into a problem where our MySQL server (MariaDB v10.1) suddenly throws the error: Unable to drop temporary table: './database_name/#sql-4593_791', error: 120.

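Following guides such as https://mariadb.com/resources/blog/get-rid-of-orphaned-innodb-temporary-tables-the-right-way/ does not work; I still get an error that the table is unknown (ERROR 1051 (42S02): Unknown table 'database_name.#mysql50#sql-4593_1e9').

What sets this apart from other questions about dropping orphaned InnoDB temporary tables is that the database server seems to have already removed the .frm file from both disk and memory (running lsof | grep sql-4593_1e9 shows only the .ibd file open).

Could this be the root of the problem? If so, is there a way to recreate the .frm file for a table whose structure is unknown and which is practically inaccessible?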
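The condition described here (an .ibd file present with no matching .frm) can also be checked on disk rather than through lsof, which only lists files a process currently has open. A minimal read-only sketch, assuming you pass it the per-database directory (for example a hypothetical /var/lib/mysql/database_name) and that the temporary tables use the '#sql-' naming seen above:

```python
from pathlib import Path
from typing import List


def orphaned_temp_tables(db_dir: str) -> List[str]:
    """Return '#sql-*' table names that have an .ibd file but no .frm file.

    Only reads the directory listing; nothing is modified or deleted.
    """
    orphans = []
    for ibd in sorted(Path(db_dir).glob("#sql-*.ibd")):
        if not ibd.with_suffix(".frm").exists():
            orphans.append(ibd.stem)  # e.g. '#sql-4593_1e9'
    return orphans
```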

The only reference to it I can find in the database itself comes from running:

SELECT * FROM INFORMATION_SCHEMA.INNODB_SYS_TABLES WHERE NAME LIKE '%#sql%';

which outputs the following:

+----------+-----------------------------+------+--------+---------+-------------+------------+---------------+
| TABLE_ID | NAME                        | FLAG | N_COLS | SPACE   | FILE_FORMAT | ROW_FORMAT | ZIP_PAGE_SIZE |
+----------+-----------------------------+------+--------+---------+-------------+------------+---------------+
|  1576128 | database_name/#sql-4593_1e9 |    1 |    118 | 1576114 | Antelope    | Compact    |             0 |
|  1576129 | database_name/#sql-4593_78d |    1 |    118 | 1576115 | Antelope    | Compact    |             0 |
|  1576130 | database_name/#sql-4593_791 |    1 |    118 | 1576116 | Antelope    | Compact    |             0 |
+----------+-----------------------------+------+--------+---------+-------------+------------+---------------+
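For reference, the drop statement implied by the error message above takes a NAME value from INNODB_SYS_TABLES, splits it at "/" into schema and table, and prefixes the table name with #mysql50# so the server treats it as the literal on-disk name instead of encoding characters like "#" and "-". The helper below is a hypothetical sketch that builds that statement. Per the question, this is the kind of statement that fails with ERROR 1051 when the .frm file is missing, so it reproduces the attempted step; it is not a fix.

```python
def drop_statement(innodb_name: str) -> str:
    """Build a DROP TABLE statement from an INNODB_SYS_TABLES NAME value."""
    schema, table = innodb_name.split("/", 1)

    def quote(ident: str) -> str:
        # Backtick-quote an identifier, doubling any embedded backticks.
        return "`" + ident.replace("`", "``") + "`"

    # '#mysql50#' asks the server to use the name without filename encoding.
    return f"DROP TABLE {quote(schema)}.{quote('#mysql50#' + table)};"
```

For database_name/#sql-4593_1e9 this produces DROP TABLE `database_name`.`#mysql50##sql-4593_1e9`;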

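I'd rather not delete database files from disk by hand, because I'm not entirely sure it won't cause problems, even though the chance of a tablespace ID collision is practically zero.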


