I have a project in which I'm trying to use PostgreSQL's ON CONFLICT DO UPDATE clause, but I'm running into a huge number of deadlock issues.
My schema is as follows:
webarchive=# \d web_pages
Table "public.web_pages"
Column | Type | Modifiers
-------------------+-----------------------------+---------------------------------------------------------------------
id | integer | not null default nextval('web_pages_id_seq'::regclass)
state | dlstate_enum | not null
errno | integer |
url | text | not null
starturl | text | not null
netloc | text | not null
file | integer |
priority | integer | not null
distance | integer | not null
is_text | boolean |
limit_netloc | boolean |
title | citext |
mimetype | text |
type | itemtype_enum |
content | text |
fetchtime | timestamp without time zone |
addtime | timestamp without time zone |
tsv_content | tsvector |
normal_fetch_mode | boolean | default true
ignoreuntiltime | timestamp without time zone | not null default '1970-01-01 00:00:00'::timestamp without time zone
Indexes:
"web_pages_pkey" PRIMARY KEY, btree (id)
"ix_web_pages_url" UNIQUE, btree (url)
"idx_web_pages_title" gin (to_tsvector('english'::regconfig, title::text))
"ix_web_pages_distance" btree (distance)
"ix_web_pages_distance_filtered" btree (priority) WHERE state = 'new'::dlstate_enum AND distance < 1000000 AND normal_fetch_mode = true
"ix_web_pages_id" btree (id)
"ix_web_pages_netloc" btree (netloc)
"ix_web_pages_priority" btree (priority)
"ix_web_pages_state" btree (state)
"ix_web_pages_url_ops" btree (url text_pattern_ops)
"web_pages_state_netloc_idx" btree (state, netloc)
Foreign-key constraints:
"web_pages_file_fkey" FOREIGN KEY (file) REFERENCES web_files(id)
Triggers:
update_row_count_trigger BEFORE INSERT OR UPDATE ON web_pages FOR EACH ROW EXECUTE PROCEDURE web_pages_content_update_func()
My update command is as follows:
INSERT INTO
web_pages
(url, starturl, netloc, distance, is_text, priority, type, fetchtime, state)
VALUES
(:url, :starturl, :netloc, :distance, :is_text, :priority, :type, :fetchtime, :state)
ON CONFLICT (url) DO
UPDATE
SET
state = EXCLUDED.state,
starturl = EXCLUDED.starturl,
netloc = EXCLUDED.netloc,
is_text = EXCLUDED.is_text,
distance = EXCLUDED.distance,
priority = EXCLUDED.priority,
fetchtime = EXCLUDED.fetchtime
WHERE
web_pages.fetchtime < :threshtime
AND
web_pages.url = EXCLUDED.url
;
(Note: the parameters are escaped via SQLAlchemy's parameterized query style.)
I'm seeing dozens of deadlock errors, even under relatively light concurrency (6 workers):
Main.SiteArchiver.Process-5.MainThread - WARNING - SQLAlchemy OperationalError - Retrying.
Traceback (most recent call last):
File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
context)
File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
cursor.execute(statement, parameters)
psycopg2.extensions.TransactionRollbackError: deadlock detected
DETAIL: Process 11391 waits for ShareLock on transaction 40632808; blocked by process 11389.
Process 11389 waits for ShareLock on transaction 40632662; blocked by process 11391.
HINT: See server log for query details.
CONTEXT: while inserting index tuple (743427,2) in relation "web_pages"
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/media/Storage/Scripts/ReadableWebProxy/WebMirror/Engine.py", line 558, in upsertResponseLinks
self.db_sess.execute(cmd, params=new)
File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/orm/session.py", line 1034, in execute
bind, close_with_result=True).execute(clause, params or {})
File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 914, in execute
return meth(self, multiparams, params)
File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/sql/elements.py", line 323, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1010, in _execute_clauseelement
compiled_sql, distilled_params
File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1146, in _execute_context
context)
File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1341, in _handle_dbapi_exception
exc_info
File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 200, in raise_from_cause
reraise(type(exception), exception, tb=exc_tb, cause=cause)
File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/util/compat.py", line 183, in reraise
raise value.with_traceback(tb)
File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/base.py", line 1139, in _execute_context
context)
File "/media/Storage/Scripts/ReadableWebProxy/flask/lib/python3.4/site-packages/sqlalchemy/engine/default.py", line 450, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (psycopg2.extensions.TransactionRollbackError) deadlock detected
DETAIL: Process 11391 waits for ShareLock on transaction 40632808; blocked by process 11389.
Process 11389 waits for ShareLock on transaction 40632662; blocked by process 11391.
HINT: See server log for query details.
CONTEXT: while inserting index tuple (743427,2) in relation "web_pages"
[SQL: ' INSERT INTO web_pages (url, starturl, netloc, distance, is_text, priority, type, fetchtime, state) VALUES (%(url)s, %(starturl)s, %(netloc)s, %(distance)s, %(is_text)s, %(priority)s, %(type)s, %(fetchtime)s, %(state)s) ON CONFLICT (url) DO UPDATE SET state = EXCLUDED.state, starturl = EXCLUDED.starturl, netloc = EXCLUDED.netloc, is_text = EXCLUDED.is_text, distance = EXCLUDED.distance, priority = EXCLUDED.priority, fetchtime = EXCLUDED.fetchtime WHERE web_pages.fetchtime < %(threshtime)s ; '] [parameters: {'url': 'xxxxxx', 'is_text': True, 'netloc': 'xxxxxx', 'distance': 1000000, 'priority': 10000, 'threshtime': datetime.datetime(2016, 4, 24, 0, 38, 10, 778866), 'state': 'new', 'starturl': 'xxxxxxx', 'type': 'unknown', 'fetchtime': datetime.datetime(2016, 4, 24, 0, 38, 10, 778934)}]
My transaction isolation level is REPEATABLE READ, so my understanding of how the database should behave is that I would see lots of serialization errors, but deadlocks should not occur: if two transactions change the same row, the later one should simply fail.
My guess is that the UPDATE is somehow locking against the INSERT query (or something along those lines), and that I need to put a synchronization point (?) somewhere, but I don't understand the scope of the various query components well enough to do any real troubleshooting beyond randomly changing things and seeing what happens. I've done some reading, but the PostgreSQL documentation is quite abstract, and the ON CONFLICT xxx terminology doesn't seem to be in broad use yet, so there aren't many resources for practical troubleshooting, particularly for non-SQL experts.
How can I go about resolving this issue? I've also tried other isolation levels (READ COMMITTED, SERIALIZABLE) to no avail.
A deadlock isn't caused by any one statement; it's caused by concurrency. So basically, you should start by looking at how one session of your application interacts with the other sessions working at the same time.
Here are some general guidelines for avoiding deadlocks:
Always maintain a primary key on the table. The primary key should be the way a specific record in the table is identified, which keeps too many rows from falling into the scope of a lock.
Keep a consistent ordering across all transactions. For example, if one piece of your application logic inserts/updates data in table A and then table B, there should be no other logic that inserts/updates data in table B and then table A. The same idea applies to rows within a single table: if a transaction upserts several rows, process them in a consistent order (see the first sketch after this list).
Monitor and catch the culprit. PostgreSQL provides views such as pg_stat_activity and pg_stat_statements for monitoring sessions and queries. The wiki page https://wiki.postgresql.org/wiki/Lock_Monitoring has sample queries you can use to monitor blocking/deadlocks (the second sketch after this list is in that spirit). You may also need to tune the log_lock_waits and deadlock_timeout parameters.
Acquire the most restrictive lock first within a transaction, so that the transaction never has to upgrade from a weaker lock later on.
Last but not least, reduce the size of your transactions and commit more frequently. Long-running transactions are more likely to end up in a deadlock, and because of the way MVCC is implemented in Postgres, long transactions also keep a larger number of live tuple versions around.
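As a concrete illustration of the ordering point: if each worker upserts a batch of links in one transaction, two workers hitting an overlapping set of URLs in different orders can deadlock on the unique ix_web_pages_url index. Below is a minimal sketch, not your exact statement: the rows are supplied as a literal VALUES list (the URLs, values, and threshold timestamp are placeholders) and routed through a sub-SELECT with ORDER BY url so that every transaction touches the index in the same order. PostgreSQL doesn't formally guarantee insertion order from an ORDER BY here, but in practice rows are processed in the order the SELECT produces them.

-- Sketch: ordered batch upsert (all literal values below are placeholders)
INSERT INTO web_pages
    (url, starturl, netloc, distance, is_text, priority, type, fetchtime, state)
SELECT v.url, v.starturl, v.netloc, v.distance, v.is_text, v.priority,
       v.type::itemtype_enum, v.fetchtime, v.state::dlstate_enum
FROM (VALUES
        ('http://example.com/b', 'http://example.com/', 'example.com', 1, TRUE, 10000, 'unknown', now()::timestamp, 'new'),
        ('http://example.com/a', 'http://example.com/', 'example.com', 1, TRUE, 10000, 'unknown', now()::timestamp, 'new')
     ) AS v(url, starturl, netloc, distance, is_text, priority, type, fetchtime, state)
ORDER BY v.url                       -- same ordering in every worker
ON CONFLICT (url) DO UPDATE
SET state     = EXCLUDED.state,
    starturl  = EXCLUDED.starturl,
    netloc    = EXCLUDED.netloc,
    is_text   = EXCLUDED.is_text,
    distance  = EXCLUDED.distance,
    priority  = EXCLUDED.priority,
    fetchtime = EXCLUDED.fetchtime
WHERE web_pages.fetchtime < '2016-04-24 00:00:00'::timestamp;  -- threshtime placeholder

If you keep issuing the single-row statement from SQLAlchemy instead, sorting the batch of parameter dicts by url on the client before executing them accomplishes the same thing.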
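And for the monitoring point, here is a sketch in the spirit of the queries on the Lock_Monitoring wiki page (not copied from it): it joins pg_locks against pg_stat_activity to show which sessions are waiting on a lock and which sessions hold the conflicting lock. It assumes PostgreSQL 9.2 or later for pg_stat_activity.query. A deadlock itself is resolved within deadlock_timeout, so this is mostly useful for watching the blocking that precedes one; the deadlock report in the server log (as the HINT in your error suggests) shows the statements involved.

-- Sketch: who is waiting on whom
SELECT blocked.pid           AS blocked_pid,
       blocked_act.query     AS blocked_query,
       blocking.pid          AS blocking_pid,
       blocking_act.query    AS blocking_query
FROM pg_catalog.pg_locks blocked
JOIN pg_catalog.pg_stat_activity blocked_act
  ON blocked_act.pid = blocked.pid
JOIN pg_catalog.pg_locks blocking
  ON blocking.locktype = blocked.locktype
 AND blocking.database      IS NOT DISTINCT FROM blocked.database
 AND blocking.relation      IS NOT DISTINCT FROM blocked.relation
 AND blocking.transactionid IS NOT DISTINCT FROM blocked.transactionid
 AND blocking.virtualxid    IS NOT DISTINCT FROM blocked.virtualxid
 AND blocking.pid <> blocked.pid
JOIN pg_catalog.pg_stat_activity blocking_act
  ON blocking_act.pid = blocking.pid
WHERE NOT blocked.granted    -- the waiter
  AND blocking.granted;      -- the holder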