
Question

Jack Douglas

Asked: 2015-07-28 23:27:52 +0800 CST

Concurrent calls to the same function: how are deadlocks occurring?


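My function new_customer is called several times per second (but only once per session) by a web application. The very first thing it does is lock the customer table (to do an "insert if not exists", a simple variant of an upsert).

My understanding of the docs is that other calls to new_customer should simply queue until all previous calls have finished:

LOCK TABLE obtains a table-level lock, waiting if necessary for any conflicting locks to be released.

Why is it sometimes deadlocking instead?

Definition: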

create function new_customer(secret bytea) returns integer language sql 
                security definer set search_path = postgres,pg_temp as $$
  lock customer in exclusive mode;
  --
  with w as ( insert into customer(customer_secret,customer_read_secret)
              select secret,decode(md5(encode(secret, 'hex')),'hex') 
              where not exists(select * from customer where customer_secret=secret)
              returning customer_id )
  insert into collection(customer_id) select customer_id from w;
  --
  select customer_id from customer where customer_secret=secret;
$$;

Error from the log:

2015-07-28 08:02:58 BST DETAIL:  Process 12380 waits for ExclusiveLock on relation 16438 of database 12141; blocked by process 12379.
        Process 12379 waits for ExclusiveLock on relation 16438 of database 12141; blocked by process 12380.
        Process 12380: select new_customer(decode($1::text, 'hex'))
        Process 12379: select new_customer(decode($1::text, 'hex'))
2015-07-28 08:02:58 BST HINT:  See server log for query details.
2015-07-28 08:02:58 BST CONTEXT:  SQL function "new_customer" statement 1
2015-07-28 08:02:58 BST STATEMENT:  select new_customer(decode($1::text, 'hex'))

Relation:

postgres=# select relname from pg_class where oid=16438;
┌──────────┐
│ relname  │
├──────────┤
│ customer │
└──────────┘

Edit:

I've managed to get a simple reproducible test case. To me this looks like a bug caused by some kind of race condition.

Schema:

create table test( id serial primary key, val text );

create function f_test(v text) returns integer language sql security definer set search_path = postgres,pg_temp as $$
  lock test in exclusive mode;
  insert into test(val) select v where not exists(select * from test where val=v);
  select id from test where val=v;
$$;

Bash script, run simultaneously in two bash sessions:

for i in {1..1000}; do psql postgres postgres -c "select f_test('blah')"; done

Error log (usually a handful of deadlocks over the 1000 calls):

2015-07-28 16:46:19 BST ERROR:  deadlock detected
2015-07-28 16:46:19 BST DETAIL:  Process 9394 waits for ExclusiveLock on relation 65605 of database 12141; blocked by process 9393.
        Process 9393 waits for ExclusiveLock on relation 65605 of database 12141; blocked by process 9394.
        Process 9394: select f_test('blah')
        Process 9393: select f_test('blah')
2015-07-28 16:46:19 BST HINT:  See server log for query details.
2015-07-28 16:46:19 BST CONTEXT:  SQL function "f_test" statement 1
2015-07-28 16:46:19 BST STATEMENT:  select f_test('blah')

Edit 2:

@ypercube suggested a variant with the lock table outside the function:

for i in {1..1000}; do psql postgres postgres -c "begin; lock test in exclusive mode; select f_test('blah'); end"; done

Interestingly, this eliminates the deadlocks.

2 Answers


Jack Douglas · Answer 1 · 2015-07-29T09:56:35+08:00

Best Answer

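I posted this to pgsql-bugs, and the reply from Tom Lane indicates that this is a lock escalation issue, disguised by the mechanics of how SQL-language functions are processed. Essentially, the lock generated by the insert is obtained before the exclusive lock on the table:

I believe the issue with this is that a SQL function will do parsing (and maybe planning too; don't feel like checking the code right now) for the entire function body at once. This means that due to the INSERT command you acquire RowExclusiveLock on the "test" table during function body parsing, before the LOCK command actually executes. So the LOCK represents a lock escalation attempt, and deadlocks are to be expected.

This coding technique would be safe in plpgsql, but not in a SQL-language function.

There have been discussions of reimplementing SQL-language functions so that parsing occurs one statement at a time, but don't hold your breath about something happening in that direction; it doesn't seem to be a high priority concern for anybody.

regards, tom lane

This also explains why locking the table outside the function in a wrapping plpgsql block (as suggested by @ypercube) prevents the deadlocks.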
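Following Tom Lane's remark that this technique is safe in plpgsql, the test function could be rewritten roughly as below. This is a minimal sketch, assuming the same test table and the same function options as the SQL-language version in the question; the variable name result is introduced here for illustration only. The idea is that plpgsql prepares each statement when it is first executed, so the LOCK runs before the INSERT touches the table.

```sql
-- Sketch only: plpgsql variant of f_test from the question.
-- Assumption: same "test" table, same search_path settings as the original.
create or replace function f_test(v text) returns integer
language plpgsql security definer set search_path = postgres, pg_temp as $$
declare
  result integer;  -- hypothetical local variable, not in the original
begin
  -- Take the table lock first; nothing has referenced "test" yet in this call.
  lock table test in exclusive mode;

  -- Insert only if the value is not already present.
  insert into test(val)
  select v
  where not exists (select 1 from test t where t.val = v);

  -- Return the id of the existing or newly inserted row.
  select t.id into result from test t where t.val = v;
  return result;
end;
$$;
```

The bash loop from the question can be rerun in two sessions against this version to check whether the deadlocks still appear.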

10

alexk · Answer 2 · 2015-07-29T06:16:13+08:00

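Assuming you run another statement before calling new_customer, and that statement acquires a lock that conflicts with EXCLUSIVE (basically, any data modification in the customer table), the explanation is very simple.

The problem can be reproduced with a simple example (not even including a function):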

CREATE TABLE test(id INTEGER);

1st session:

BEGIN;

INSERT INTO test VALUES(1);

2nd session:

BEGIN;
INSERT INTO test VALUES(1);
LOCK TABLE test IN EXCLUSIVE MODE;

1st session:

LOCK TABLE test IN EXCLUSIVE MODE;

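When the first session does the insert, it acquires a ROW EXCLUSIVE lock on the table. Meanwhile, session 2 also acquires a ROW EXCLUSIVE lock and then tries to acquire an EXCLUSIVE lock. At that point it has to wait for the first session, because EXCLUSIVE conflicts with ROW EXCLUSIVE. Finally, session 1 also tries to acquire the EXCLUSIVE lock, but since locks are granted in order, it queues behind session 2. Session 2, in turn, is waiting for session 1, which produces a deadlock: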

DETAIL:  Process 28514 waits for ExclusiveLock on relation 58331454 of database 44697822; blocked by process 28084.
Process 28084 waits for ExclusiveLock on relation 58331454 of database 44697822; blocked by process 28514
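To watch the lock queue form, the pg_locks system view can be queried from a third session while session 2 is blocked (that is, before session 1 issues its own LOCK TABLE). This is a small sketch assuming the test table from the example above; the exact rows depend on timing.

```sql
-- Sketch: list table-level locks held or awaited on "test" in the current database.
select l.pid, l.mode, l.granted
from pg_locks l
where l.locktype = 'relation'
  and l.relation = 'test'::regclass
  and l.database = (select oid from pg_database
                    where datname = current_database())
order by l.pid, l.granted desc;
```

At that moment one would expect both sessions to show a granted RowExclusiveLock, and session 2 to additionally show an ExclusiveLock with granted set to false.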

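The solution to this problem is to acquire locks as early as possible, usually as the first thing in a transaction. On the other hand, PostgreSQL workloads need explicit locks only in some very rare cases, so I'd suggest reconsidering the way you do the upsert (take a look at this article: http://www.depesz.com/2012/06/10/why-is-upsert-so-complicated/).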
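As one possible way to avoid the table lock altogether, PostgreSQL 9.5 and later (released after this question was asked) support INSERT ... ON CONFLICT. The sketch below is an assumption-laden illustration, not part of the original answers: it requires a unique index on val, which the test table in the question does not have, and the index name test_val_key is hypothetical. Creating the index will fail if duplicate values already exist.

```sql
-- Assumption: val must be unique so ON CONFLICT has a constraint to arbitrate on.
create unique index test_val_key on test(val);  -- hypothetical index name

-- Sketch: lock-free variant of f_test for PostgreSQL 9.5+.
create or replace function f_test(v text) returns integer
language sql security definer set search_path = postgres, pg_temp as $$
  -- Insert the value, silently skipping it if another session got there first.
  insert into test(val) values (v)
  on conflict (val) do nothing;

  -- Return the id of the existing or newly inserted row.
  select id from test where val = v;
$$;
```

Because no explicit table lock is taken, the lock escalation described in the first answer should not arise, although behaviour under isolation levels other than the default READ COMMITTED would need to be checked separately.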
