We are running:
user@primary/client-n:~$ psql -d database -c "SELECT version();"
version
---------------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 10.7 (Ubuntu 10.7-1.pgdg16.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609, 64-bit
(1 row)
on:
user@primary/client-n:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.6 LTS
Release: 16.04
Codename: xenial
and have set up one primary and two streaming-replication clients, configured as follows:
user@client-n:~$ psql -d postgres -c "SELECT name, setting FROM pg_settings WHERE name IN ( 'hot_standby', 'hot_standby_feedback', 'max_standby_streaming_delay' );"
name | setting
----------------------------+---------
hot_standby | on
hot_standby_feedback | on
max_standby_streaming_delay | 150000
(3 rows)
We have only one database (apart from the standard ones) and a single table in that database. Roughly 3 to 4 times a day we run into the special-case autovacuum behaviour that the documentation describes as:
[...] it (autovacuum) doesn't return the space to the operating system, except in the special case where one or more pages at the end of a table become entirely free and an exclusive table lock can be easily obtained
We are monitoring pg_locks and have been able to observe the autovacuum daemon taking a table-level ACCESS EXCLUSIVE lock, which in turn blocks a large number of processes on the clients, as shown in the log entries below (a sketch of the kind of pg_locks query we use follows the log excerpts):
Primary:
...
2019-06-04 05:59:29.154 BST [8998-1] LOG: automatic vacuum of table "database.schema.table": index scans: 1
...
Client 1:
...
2019-06-04 05:59:03.660 BST [21167-858] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG: process 21167 still waiting for AccessShareLock on relation 16390 of database 16388 after 1000.222 ms
2019-06-04 05:59:03.660 BST [21167-859] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL: Process holding the lock: 2741. Wait queue: 21167, 1215, 26415.
2019-06-04 05:59:03.660 BST [21167-860] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT: SELECT ...
2019-06-04 05:59:03.730 BST [1215-51] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG: process 1215 still waiting for AccessShareLock on relation 16390 of database 16388 after 1000.188 ms at character 15
2019-06-04 05:59:03.730 BST [1215-52] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL: Process holding the lock: 2741. Wait queue: 21167, 1215, 26415.
2019-06-04 05:59:03.730 BST [1215-53] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT: SELECT ...
...
2019-06-04 05:59:19.975 BST [22242-4569] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG: process 22242 still waiting for AccessShareLock on relation 16390 of database 16388 after 1000.281 ms at character 15
2019-06-04 05:59:19.975 BST [22242-4570] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL: Process holding the lock: 2741. Wait queue: 21167, 1215, 26415, 2423, 1289, 24009, 22441, 2640, 1843, 1056, 23336, 28060, 1860, 1134, 19419, 14649, 2721, 29540, 20138, 22242.
2019-06-04 05:59:19.975 BST [22242-4571] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT: SELECT...
...
And the process holding the lock:
postgres=# SELECT pid, backend_type, wait_event_type, wait_event FROM pg_stat_activity WHERE pid = 2741;
pid | backend_type | wait_event_type | wait_event
------+--------------+-----------------+----------------
2741 | startup | Activity | RecoveryWalAll
(1 row)
Client 2:
...
2019-06-04 06:00:08.964 BST [16153-1] [PostgreSQL JDBC Driver@ip_address(port):role@database] | FATAL: terminating connection due to conflict with recovery
2019-06-04 06:00:08.964 BST [16153-2] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL: User was holding a relation lock for too long.
2019-06-04 06:00:08.964 BST [16153-3] [PostgreSQL JDBC Driver@ip_address(port):role@database] | HINT: In a moment you should be able to reconnect to the database and repeat your command.
2019-06-04 06:00:09.964 BST [5747-537] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG: process 5747 still waiting for AccessShareLock on relation 16390 of database 16388 after 1000.248 ms
2019-06-04 06:00:09.964 BST [5747-538] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL: Process holding the lock: 12709. Wait queue: 5747, 19765, 16036, 14617, 12280, 14513, 14728, 15398, 27611, 14542, 15948, 23398, 5853, 5098, 4324, 10760, 23480, 30192, 15300, 16228.
2019-06-04 06:00:09.964 BST [5747-539] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT: SELECT ...
2019-06-04 06:00:09.975 BST [19765-6847] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG: process 19765 still waiting for AccessShareLock on relation 16390 of database 16388 after 1000.180 ms
2019-06-04 06:00:09.975 BST [19765-6848] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL: Process holding the lock: 12709. Wait queue: 5747, 19765, 16036, 14617, 12280, 14513, 14728, 15398, 27611, 14542, 15948, 23398, 5853, 5098, 4324, 10760, 23480, 30192, 15300, 16228.
2019-06-04 06:00:09.975 BST [19765-6849] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT: SELECT ...
...
2019-06-04 06:01:25.487 BST [15873-1] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG: process 15873 still waiting for AccessShareLock on relation 16390 of database 16388 after 1000.218 ms at character 15
2019-06-04 06:01:25.487 BST [15873-2] [PostgreSQL JDBC Driver@ip_address(port):role@database] | DETAIL: Process holding the lock: 12709. Wait queue: 5747, 19765, 16036, 14617, 12280, 14513, 14728, 15398, 27611, 14542, 15948, 23398, 5853, 5098, 4324, 10760, 23480, 30192, 15300, 16228, 16127, 16285, 15873.
2019-06-04 06:01:25.487 BST [15873-3] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT: SELECT ...
...
2019-06-04 06:01:29.160 BST [16127-6] [PostgreSQL JDBC Driver@ip_address(port):role@database] | LOG: process 16127 acquired AccessShareLock on relation 16390 of database 16388 after 8560.748 ms at character 15
2019-06-04 06:01:29.160 BST [16127-7] [PostgreSQL JDBC Driver@ip_address(port):role@database] | STATEMENT: SELECT ...
...
Again, the process holding the lock:
postgres=# SELECT pid, backend_type, wait_event_type, wait_event FROM pg_stat_activity WHERE pid = 12709;
pid | backend_type | wait_event_type | wait_event
------+--------------+-----------------+----------------
12709 | startup | Activity | RecoveryWalAll
(1 row)
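A minimal sketch of that kind of pg_locks check (the filters shown are generic, not our exact monitoring query); on the primary this should surface the autovacuum worker, and on a client the startup process that replays the lock:
postgres=# SELECT l.pid, a.backend_type, l.relation::regclass, l.mode, l.granted FROM pg_locks l JOIN pg_stat_activity a USING (pid) WHERE l.locktype = 'relation' AND l.mode = 'AccessExclusiveLock';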
The blocked queries on the clients cause API latencies of between 10 and 20 seconds and, at times, a considerable number of 5xx responses. Our SRE team has been tasked with reducing the API latency during these events, and we are looking for a way to deal with what we consider a rather niche problem. We are currently experimenting with recovery_min_apply_delay = 120s on client 1 (hence the later log entries there), so that the two clients do not lock up at the same time. This has reduced the number of error responses somewhat and slightly lowered the latency spikes. We are not sure how to solve the problem completely, or indeed whether that is possible at all, and would greatly appreciate any advice. We did find this related post, but unfortunately it was never resolved either.
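For reference, a rough sketch of the relevant recovery.conf entries on client 1 (on PostgreSQL 10 recovery_min_apply_delay lives in recovery.conf; the standby and connection details below are placeholders, not our actual values):
# recovery.conf on client 1 -- connection details are placeholders
standby_mode = 'on'
primary_conninfo = 'host=primary_host port=5432 user=replication_user'
recovery_min_apply_delay = '120s'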
An undocumented way to keep VACUUM from truncating the table, and hence from taking the exclusive lock that truncation requires, is to set old_snapshot_threshold to a value other than -1. That gets rid of the lock and of the resulting recovery conflicts. From PostgreSQL v12 on there is a better option: you can disable truncation by VACUUM for an individual table:
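-- "some_table" is a placeholder for the affected table; vacuum_truncate is a v12+ storage parameter
ALTER TABLE some_table SET (vacuum_truncate = off);
(On v10, the old_snapshot_threshold route means editing postgresql.conf and restarting, since that parameter can only be changed at server start.)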