I am encountering a deadlock on 10.6.21-MariaDB-ubu2004. My schema is as follows:
CREATE TABLE INT_CHANNEL_MESSAGE
(
MESSAGE_ID CHAR(36) NOT NULL,
GROUP_KEY CHAR(36) NOT NULL,
CREATED_DATE BIGINT NOT NULL,
MESSAGE_PRIORITY BIGINT,
MESSAGE_SEQUENCE BIGINT NOT NULL AUTO_INCREMENT UNIQUE,
MESSAGE_BYTES BLOB,
REGION VARCHAR(100) NOT NULL,
PRIMARY KEY (REGION, GROUP_KEY, CREATED_DATE, MESSAGE_SEQUENCE)
) ENGINE = InnoDB;
CREATE INDEX INT_CHANNEL_MSG_DELETE_IDX ON INT_CHANNEL_MESSAGE (REGION, GROUP_KEY, MESSAGE_ID);
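Initially, the table contains a single row. Assume that the values are unique per row; only REGION and GROUP_KEY are constant.
Transaction #1 inserts two rows using separate INSERT statements: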
INSERT into INT_CHANNEL_MESSAGE(
MESSAGE_ID,
GROUP_KEY,
REGION,
CREATED_DATE,
MESSAGE_PRIORITY,
MESSAGE_BYTES)
values (?, ?, ?, ?, ?, ?)
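The first insert executes, and then the transaction hangs. The isolation level of transaction #1 is REPEATABLE_READ (although I also tried changing it to READ_COMMITTED).
Transaction #2 is started right after the first insert of transaction #1 executes (it is triggered by the application). Its isolation level is set to READ_COMMITTED. It selects the initial row for update, and then the transaction hangs on the DELETE call: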
SELECT INT_CHANNEL_MESSAGE.MESSAGE_ID, INT_CHANNEL_MESSAGE.MESSAGE_BYTES
from INT_CHANNEL_MESSAGE
where INT_CHANNEL_MESSAGE.GROUP_KEY = ? and INT_CHANNEL_MESSAGE.REGION = ?
order by CREATED_DATE, MESSAGE_SEQUENCE LIMIT 1 FOR UPDATE SKIP LOCKED
DELETE from INT_CHANNEL_MESSAGE where MESSAGE_ID=? and GROUP_KEY=? and REGION=?
SHOW ENGINE INNODB STATUS output:
| InnoDB | |
=====================================
2025-04-04 16:06:44 0x7fc6241b3700 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 21 seconds
-----------------
BACKGROUND THREAD
-----------------
srv_master_thread loops: 0 srv_active, 0 srv_shutdown, 2376 srv_idle
srv_master_thread log flush and writes: 2376
----------
SEMAPHORES
----------
------------
TRANSACTIONS
------------
Trx id counter 1646
Purge done for trx's n:o < 1646 undo n:o < 0 state: running
History list length 2
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 1643, ACTIVE 31 sec
2 lock struct(s), heap size 1128, 1 row lock(s), undo log entries 1
MariaDB thread id 76, OS thread handle 140489014748928, query id 8535 172.21.0.1 nbs
---TRANSACTION 1640, ACTIVE 31 sec fetching rows
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 3 row lock(s), undo log entries 1
MariaDB thread id 75, OS thread handle 140488995890944, query id 8537 172.21.0.1 nbs Updating
SET STATEMENT SQL_SELECT_LIMIT=1 FOR DELETE from INT_CHANNEL_MESSAGE where MESSAGE_ID='ce0ce618-2430-0b4c-727b-7250e5388f15' and GROUP_KEY='cb18446f-633c-3a46-b5ac-95ab539126d1' and REGION='DEFAULT'
------- TRX HAS BEEN WAITING 31293172 us FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 55 page no 3 n bits 320 index PRIMARY of table `nbs_biometric`.`INT_CHANNEL_MESSAGE` trx id 1640 lock_mode X locks rec but not gap waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 9; compact format; info bits 0
0: len 7; hex 44454641554c54; asc DEFAULT;;
1: len 30; hex 63623138343436662d363333632d336134362d623561632d393561623533; asc cb18446f-633c-3a46-b5ac-95ab53; (total 36 bytes);
2: len 8; hex 80000196018d82b2; asc ;;
3: len 8; hex 8000000000000012; asc ;;
4: len 6; hex 00000000066b; asc k;;
5: len 7; hex bf000001410110; asc A ;;
6: len 30; hex 63623533326436662d343362352d393164352d636561612d623965616434; asc cb532d6f-43b5-91d5-ceaa-b9ead4; (total 36 bytes);
7: SQL NULL;
8: len 30; hex aced0005737200346f72672e737072696e676672616d65776f726b2e6d65; asc sr 4org.springframework.me; (total 1252 bytes);
------------------
---TRANSACTION (0x7fc6388d7180), not started
0 lock struct(s), heap size 1128, 0 row lock(s)
--------
FILE I/O
--------
Pending flushes (fsync) log: 0; buffer pool: 0
166 OS file reads, 332 OS file writes, 594 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
-------------------------------------
INSERT BUFFER AND ADAPTIVE HASH INDEX
-------------------------------------
Ibuf: size 1, free list len 0, seg size 2, 0 merges
merged operations:
insert 0, delete mark 0, delete 0
discarded operations:
insert 0, delete mark 0, delete 0
0.00 hash searches/s, 0.00 non-hash searches/s
---
LOG
---
Log sequence number 891776
Log flushed up to 891776
Pages flushed up to 42676
Last checkpoint at 42664
0 pending log flushes, 0 pending chkp writes
334 log i/o's done, 0.00 log i/o's/second
----------------------
BUFFER POOL AND MEMORY
----------------------
Total large memory allocated 167772160
Dictionary memory allocated 931712
Buffer pool size 8112
Free buffers 7349
Database pages 763
Old database pages 261
Modified db pages 621
Percent of dirty pages(LRU & free pages): 7.654
Max dirty pages percent: 90.000
Pending reads 0
Pending writes: LRU 0, flush list 0
Pages made young 0, not young 0
0.00 youngs/s, 0.00 non-youngs/s
Pages read 152, created 611, written 0
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000
Pages read ahead 0.00/s, evicted without access 0.00/s, Random read ahead 0.00/s
LRU len: 763, unzip_LRU len: 0
I/O sum[0]:cur[0], unzip sum[0]:cur[0]
--------------
ROW OPERATIONS
--------------
0 read views open inside InnoDB
Process ID=0, Main thread ID=0, state: sleeping
Number of rows inserted 63, updated 0, deleted 18, read 1182
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
Number of system rows inserted 0, updated 0, deleted 0, read 0
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
----------------------------
END OF INNODB MONITOR OUTPUT
============================
|
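When transaction #1 inserts only a single row, the deadlock does not occur. How can I avoid this situation?
I am attaching the application logs for broader context: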
2025-04-04 18:06:13,426 [tx1]: TX START
2025-04-04 18:06:13,426 [tx1]: Executing prepared SQL update
2025-04-04 18:06:13,426 [tx1]: Executing prepared SQL statement [INSERT into INT_CHANNEL_MESSAGE(
MESSAGE_ID,
GROUP_KEY,
REGION,
CREATED_DATE,
MESSAGE_PRIORITY,
MESSAGE_BYTES)
values (?, ?, ?, ?, ?, ?)
]
2025-04-04 18:06:13,427 [tx2]: Executing prepared SQL query
2025-04-04 18:06:13,427 [tx2]: Executing prepared SQL statement [ SELECT INT_CHANNEL_MESSAGE.MESSAGE_ID, INT_CHANNEL_MESSAGE.MESSAGE_BYTES
from INT_CHANNEL_MESSAGE
where INT_CHANNEL_MESSAGE.GROUP_KEY = ? and INT_CHANNEL_MESSAGE.REGION = ?
order by CREATED_DATE, MESSAGE_SEQUENCE LIMIT 1 FOR UPDATE SKIP LOCKED]
2025-04-04 18:06:13,429 [tx2]: Executing prepared SQL update
2025-04-04 18:06:13,429 [tx2]: Executing prepared SQL statement [DELETE from INT_CHANNEL_MESSAGE where MESSAGE_ID=? and GROUP_KEY=? and REGION=?]
2025-04-04 18:07:03,430 [tx2]: Error: 1205-HY000: Lock wait timeout exceeded; try restarting transaction
2025-04-04 18:07:03,432 [tx2]: Extracted SQL state class 'HY' from value 'HY000'
2025-04-04 18:07:03,434 [tx1]: Executing prepared SQL update
2025-04-04 18:07:03,434 [tx1]: Executing prepared SQL statement [INSERT into INT_CHANNEL_MESSAGE(
MESSAGE_ID,
GROUP_KEY,
REGION,
CREATED_DATE,
MESSAGE_PRIORITY,
MESSAGE_BYTES)
values (?, ?, ?, ?, ?, ?)
]
2025-04-04 18:07:03,435 [tx2]: Resetting isolation level of JDBC Connection [HikariProxyConnection@1449683964 wrapping org.mariadb.jdbc.Connection@5b1420f9] to 4
2025-04-04 18:07:03,436 [tx1]: TX END
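First of all, the InnoDB storage engine does not actually have row-level locking; it has something more fine-grained: index record locking. If the table contains any secondary indexes (as it does here), it is easier to run into a deadlock between an INSERT and any locking read (including a read performed as part of a DELETE or UPDATE). An INSERT operates on the clustered index (the PRIMARY KEY index) first and then propagates the change to the secondary indexes. A locking read that uses a secondary index first acquires a lock on the secondary index record and then on the corresponding clustered index record. This inversion of lock order can easily lead to deadlocks, because it is a prerequisite for a cycle in the waits-for graph. While searching for an earlier example of this, I came across the bug report MDEV-23560.
The output that you shared seems to suggest that there could be something wrong with the InnoDB deadlock detector built into the MariaDB server. According to the output, one transaction has been blocked for 31.3 seconds. That is still within the default innodb_lock_wait_timeout of 50 seconds. The expectation is that if a deadlock occurs, one of the transactions is rolled back immediately, not after a lock wait timeout. Basically, the only cause of a lock wait timeout should be that another active transaction holds a conflicting lock and has not committed yet (for example, it is waiting for further input from the client connection). If the deadlock detector really is broken (rather than disabled with innodb_deadlock_detect=OFF), could you file a bug with a complete reproducible test case?
I would also like to mention the recently introduced parameter innodb_snapshot_isolation (MDEV-35124). In some cases, enabling it may replace a lock wait timeout with a different error, ER_CHECKREAD. If the parameter is not enabled, REPEATABLE READ is not actually repeatable, as pointed out in the Jepsen analysis of MySQL 8.0.34.
It turned out that this was actually an application-level deadlock. There is strong evidence for that: SHOW ENGINE INNODB STATUS did not report any deadlock. Nevertheless, the DELETE in tx2 really was waiting to acquire a lock that was held by tx1. Marko's answer explains why.
The fix was to get rid of the redundant application-level locking.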
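To tell a genuine InnoDB deadlock apart from a plain lock wait like the one described above, a minimal diagnostic sketch along the following lines may help. It assumes a session with the PROCESS privilege and relies on the information_schema.INNODB_LOCK_WAITS and information_schema.INNODB_TRX tables that MariaDB provides; the column aliases are only illustrative. Run it from a third connection while tx2 is blocked.

```sql
-- Confirm that the deadlock detector is enabled and check how long
-- a lock wait may last before error 1205 is raised.
SHOW GLOBAL VARIABLES LIKE 'innodb_deadlock_detect';
SHOW GLOBAL VARIABLES LIKE 'innodb_lock_wait_timeout';

-- List every current lock wait together with the waiting and the
-- blocking transaction.
SELECT
    w.requesting_trx_id,
    r.trx_mysql_thread_id AS waiting_thread,
    r.trx_query           AS waiting_query,
    w.blocking_trx_id,
    b.trx_mysql_thread_id AS blocking_thread,
    b.trx_state           AS blocking_state,
    b.trx_query           AS blocking_query
FROM information_schema.INNODB_LOCK_WAITS AS w
JOIN information_schema.INNODB_TRX AS r
    ON r.trx_id = w.requesting_trx_id
JOIN information_schema.INNODB_TRX AS b
    ON b.trx_id = w.blocking_trx_id;
```

If the blocking transaction is reported as RUNNING with no current query, it is idle inside the server and waiting for its client. That would be consistent with a lock wait caused by the application rather than with a cycle that InnoDB could detect on its own.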