Unable to restart PostgreSQL after trying to move pg_xlog to a different partition

Steve S

Asked: 2018-04-19 13:22:38 +0800 CST

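I'm having trouble getting PostgreSQL back up and running on a CentOS 7 server.

Some context: after noticing that I was quickly running out of storage on the server's /root partition, I did some quick digging and found that the culprit was that none of the files in pg_xlog were being deleted. Since /home had plenty of free space, I chose to move the pg_xlog directory to a new directory on /home, to buy enough time to fix the original problem of the log files not being cleared. Then, following some advice I found here, I made the original pg_xlog a symbolic link pointing to the new directory on the /home partition.

But this led to some strange errors from SELinux, which would not allow Postgres to read from the new directory even after chown-ing both the symlink and the new directory to postgres:postgres.

Since I'm not exactly well versed in SELinux, I decided to copy the log files back to their original directory and look into physically extending the partition instead.

But after copying all the files back, PostgreSQL would not start. The error messages don't seem very helpful (at least not to me). journalctl -xe reports: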

Apr 19 00:12:29 localhost.localdomain systemd[1]: Unit postgresql.service entered failed state.
Apr 19 00:12:29 localhost.localdomain systemd[1]: postgresql.service failed.
Apr 19 00:12:29 localhost.localdomain polkitd[1105]: Unregistered Authentication Agent for unix-process:17236:1124444 (system bus name :1.588, object path /org/freedesk
Apr 19 00:12:31 localhost.localdomain abrt-server[17258]: Email address of sender was not specified. Would you like to do so now? If not, 'user@localhost' is to be used
Apr 19 00:12:31 localhost.localdomain abrt-server[17258]: Email address of receiver was not specified. Would you like to do so now? If not, 'root@localhost' is to be us
Apr 19 00:12:31 localhost.localdomain abrt-server[17258]: Sending an e-mail...
Apr 19 00:12:31 localhost.localdomain abrt-server[17258]: Sending a notification email to: root@localhost
Apr 19 00:12:31 localhost.localdomain abrt-server[17258]: Email was sent to: root@localhost
Apr 19 00:12:31 localhost.localdomain postfix/pickup[10305]: AA0676622C: uid=0 from=<user@localhost>
Apr 19 00:12:31 localhost.localdomain postfix/cleanup[17283]: AA0676622C: message-id=<5ad7b4bf.TazSMT+2CbgyTlql%user@localhost>
Apr 19 00:12:31 localhost.localdomain postfix/qmgr[1963]: AA0676622C: from=<[email protected]>, size=45585, nrcpt=1 (queue active)
Apr 19 00:12:31 localhost.localdomain postfix/local[17285]: AA0676622C: to=<[email protected]>, orig_to=<root@localhost>, relay=local, delay=0.05, delays=0.04/
Apr 19 00:12:31 localhost.localdomain postfix/qmgr[1963]: AA0676622C: removed

systemctl status postgresql says:

● postgresql.service - PostgreSQL database server
   Loaded: loaded (/usr/lib/systemd/system/postgresql.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Thu 2018-04-19 00:12:29 EAT; 1min 34s ago
  Process: 17252 ExecStart=/usr/bin/pg_ctl start -D ${PGDATA} -s -o -p ${PGPORT} -w -t 300 (code=exited, status=1/FAILURE)
  Process: 17243 ExecStartPre=/usr/bin/postgresql-check-db-dir ${PGDATA} (code=exited, status=0/SUCCESS)

Apr 19 00:12:28 localhost.localdomain systemd[1]: Starting PostgreSQL database server...
Apr 19 00:12:29 localhost.localdomain systemd[1]: postgresql.service: control process exited, code=exited status=1
Apr 19 00:12:29 localhost.localdomain systemd[1]: Failed to start PostgreSQL database server.
Apr 19 00:12:29 localhost.localdomain systemd[1]: Unit postgresql.service entered failed state.
Apr 19 00:12:29 localhost.localdomain systemd[1]: postgresql.service failed.

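I must admit I'm at my wits' end. Any help, even if it's just a way to generate more useful error messages, would be greatly appreciated.

Edit:

It seems that while the pg_xlog files were being copied, their ownership changed to root. I have since re-copied the same files using rsync in order to preserve permissions. I am now including the PostgreSQL log, as suggested in the comments below. The error messages in the PostgreSQL log are: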

WARNING:  transaction log file "00000001000000470000008D" could not be archived: too many failures
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: false
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: false
LOG:  archive command failed with exit code 1
DETAIL:  The failed archive command was: false
WARNING:  transaction log file "00000001000000470000008D" could not be archived: too many failures
LOG:  database system was interrupted; last known up at 2018-04-18 19:21:53 EAT
PANIC:  could not open file "pg_xlog/000000010000006900000017" (log file 105, segment 23): Permission denied
LOG:  startup process (PID 17256) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
LOG:  database system was interrupted; last known up at 2018-04-18 19:21:53 EAT
PANIC:  could not open file "pg_xlog/000000010000006900000017" (log file 105, segment 23): Permission denied
LOG:  startup process (PID 20020) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
LOG:  database system was interrupted; last known up at 2018-04-18 19:21:53 EAT
FATAL:  the database system is starting up
PANIC:  could not open file "pg_xlog/000000010000006900000017" (log file 105, segment 23): Permission denied
LOG:  startup process (PID 25607) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
LOG:  database system was interrupted; last known up at 2018-04-18 19:21:53 EAT
FATAL:  the database system is starting up
FATAL:  the database system is starting up
LOG:  invalid magic number 0000 in log file 105, segment 23, offset 9617408
LOG:  invalid primary checkpoint record
LOG:  invalid magic number 0000 in log file 105, segment 23, offset 9601024
LOG:  invalid secondary checkpoint record
PANIC:  could not locate a valid checkpoint record
FATAL:  the database system is starting up
LOG:  startup process (PID 28108) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure
LOG:  database system was interrupted; last known up at 2018-04-18 19:21:53 EAT
LOG:  invalid magic number 0000 in log file 105, segment 23, offset 9617408
LOG:  invalid primary checkpoint record
LOG:  invalid magic number 0000 in log file 105, segment 23, offset 9601024
LOG:  invalid secondary checkpoint record
PANIC:  could not locate a valid checkpoint record
LOG:  startup process (PID 28529) was terminated by signal 6: Aborted
LOG:  aborting startup due to startup process failure

2 Answers

jjanes · Answer 1 · 2018-04-19T18:23:28+08:00

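LOG:  invalid magic number 0000 in log file 105, segment 23, offset 9617408

This looks bad. Your xlog files seem to have been zeroed out somehow while you were copying them back and forth. You may need to hire a professional services company for PostgreSQL data recovery. You should back up all the data you can still find and put it somewhere it cannot be changed or deleted. That includes both what is on the /root partition and whatever is left on /home, in case something went wrong while copying from /home back to /root.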

I believe the xlog file in question would be 000000xx0000006900000017 (where xx can be any hex digits). It would be interesting to know whether that file is all zeros or mostly zeros.
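One way to answer that question is sketched below in Python. It is only an illustration and not something posted in the original thread: `wal_segment_name` and `zero_fraction` are hypothetical helper names, and the naming assumes the file name is three 8-digit hex fields (timeline, log file, segment), which matches the PANIC message in the question (log file 105 = 0x69, segment 23 = 0x17).

```python
import sys


def wal_segment_name(timeline: int, log: int, segment: int) -> str:
    """Build the 24-hex-digit segment file name from its three parts.

    Assumption: each part is printed as 8 uppercase hex digits.
    """
    return f"{timeline:08X}{log:08X}{segment:08X}"


def zero_fraction(path: str, chunk_size: int = 1 << 20) -> float:
    """Return the fraction of bytes in the file that are 0x00.

    The file is read in chunks so a large segment is never held in
    memory at once. An empty file yields 0.0.
    """
    total = 0
    zeros = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
            zeros += chunk.count(b"\x00")
    return zeros / total if total else 0.0


if __name__ == "__main__":
    # Report the share of zero bytes for every file given on the command line.
    for path in sys.argv[1:]:
        print(f"{path}: {zero_fraction(path):.1%} zero bytes")
```

Saved under any name (say `check_zeros.py`, a made-up name), it could be run as `python3 check_zeros.py <path to a backup copy of the segment>`. Running it against a backup copy rather than the live data directory avoids touching anything PostgreSQL still needs.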

Also, do you have the server log file from the first time you tried to start the server? Or the first PANIC that appears in the log file?

Steve S · Answer 2 · 2018-04-20T09:31:00+08:00

Best Answer

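I ended up resorting to pg_resetxlog, as suggested by this StackOverflow post.

I should emphasize that I only did this as a last resort, after spending hours backing everything up to an external disk. I seem to have been lucky, as pg_resetxlog worked flawlessly and the database server is back up and running fine. According to comments on that SO post, using pg_resetxlog can potentially make everything worse.

Looking back, I can't help but feel that if I had used rsync rather than plain old cp and mv to move the xlog files, the xlog errors might have been prevented.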
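Whichever tool does the copying, a check before deleting the original would probably have caught both the ownership change and the zeroed content. Below is a minimal Python sketch of such a check, not something I actually ran at the time: `sha256_of` and `compare_copies` are hypothetical helper names, and it only looks at regular files at the top level of each directory (so a subdirectory such as archive_status would be skipped).

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large segments are not loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(src_dir, dst_dir):
    """Return a list of problems found; an empty list means the copy matches.

    Checks that every regular file in src_dir exists in dst_dir with the
    same owner, group, size and SHA-256 checksum.
    """
    problems = []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        dst = os.path.join(dst_dir, name)
        if not os.path.isfile(src):
            continue  # sketch only: subdirectories are not descended into
        if not os.path.isfile(dst):
            problems.append(f"{name}: missing in destination")
            continue
        s, d = os.stat(src), os.stat(dst)
        if (s.st_uid, s.st_gid) != (d.st_uid, d.st_gid):
            problems.append(
                f"{name}: owner {s.st_uid}:{s.st_gid} became {d.st_uid}:{d.st_gid}"
            )
        if s.st_size != d.st_size or sha256_of(src) != sha256_of(dst):
            problems.append(f"{name}: content differs")
    return problems
```

Calling `compare_copies(old_dir, new_dir)` with the server stopped, and removing the old directory only if the result is an empty list, is the idea. The directory arguments are whatever paths were used for the move.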
