本文档记录工作过程发现的死锁(DeadLock)问题的原因分析和处理方法
死锁日志:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 | ------------------------ LATEST DETECTED DEADLOCK ------------------------ 2020-10-24 15:40:22 0x7fcf7b820700 *** (1) TRANSACTION: TRANSACTION 49624342, ACTIVE 7 sec inserting mysql tables in use 1, locked 1 LOCK WAIT 5 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 5 MySQL thread id 241951, OS thread handle 140529097492224, query id 153688122 10.3.98.155 root update insert into unicom_biz_send_recv_middle (id,order_id,......) values ('7131cadade2e4b45b734a8c8eab4e44a','e30c7f8b104345a18fd5e705936efe36',...... *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 24755 page no 38 n bits 264 index idx_order_id of table `sceo_integration`.`unicom_biz_send_recv_middle` trx id 49624342 lock_mode X locks gap before rec insert intention waiting Record lock, heap no 190 PHYSICAL RECORD: n_fields 2; compact format; info bits 32 0: len 6; hex 653331653730; asc e31e70;; 1: len 30; hex 623466633037343535363064343465626263616139393562303036356437; asc b4fc0745560d44ebbcaa995b0065d7; (total 32 bytes); *** (2) TRANSACTION: TRANSACTION 49624312, ACTIVE 7 sec inserting mysql tables in use 1, locked 1 26 lock struct(s), heap size 3520, 20 row lock(s), undo log entries 26 MySQL thread id 241753, OS thread handle 140529107076864, query id 153688869 10.3.98.155 root update insert into unicom_biz_send_recv_middle (id,order_id,......) values ('0ed7678397e94202aa481757e2324d2a','e31e70f4fd1d454ab13e8d3f117656f7',...... *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 24755 page no 38 n bits 264 index idx_order_id of table `sceo_integration`.`unicom_biz_send_recv_middle` trx id 49624312 lock_mode X locks gap before rec Record lock, heap no 178 PHYSICAL RECORD: n_fields 2; compact format; info bits 0 0: len 6; hex 653336653339; asc e36e39;; 1: len 30; hex 343231353563333038343564346165376163323536653636323736333966; asc 42155c30845d4ae7ac256e6627639f; (total 32 bytes); Record lock, heap no 190 PHYSICAL RECORD: n_fields 2; compact format; info bits 32 0: len 6; hex 653331653730; asc e31e70;; 1: len 30; hex 623466633037343535363064343465626263616139393562303036356437; asc b4fc0745560d44ebbcaa995b0065d7; (total 32 bytes); *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 24755 page no 38 n bits 264 index idx_order_id of table `sceo_integration`.`unicom_biz_send_recv_middle` trx id 49624312 lock_mode X locks gap before rec insert intention waiting Record lock, heap no 190 PHYSICAL RECORD: n_fields 2; compact format; info bits 32 0: len 6; hex 653331653730; asc e31e70;; 1: len 30; hex 623466633037343535363064343465626263616139393562303036356437; asc b4fc0745560d44ebbcaa995b0065d7; (total 32 bytes); *** WE ROLL BACK TRANSACTION (1) |
上面日志中 lock_mode X locks gap before rec insert intention waiting
表示两个事务同时持有间隙锁,并且都在等待插入意向锁。根据索引字段 order_id
对当前数据库中已有数据做排序:
id | order_id |
---|---|
d080180e908a4a20959b8991e3fb2daa | e37f5317f2e1450a9cd69cfb00543f3d |
8ff5ce0eada34aeba20e651c60d0201f | e37d872e0a6d4be0877064f06f70a994 |
42155c30845d4ae7ac256e6627639f3f | e36e39e5a5674956961a95043ab0592d |
8de655e142ba4f2fb6346d5fbce7aee4 | e2d8fd5ecfa9484a881a2fb073ae3d9b |
73dc37233255495d904e81967792f954 | e27dcbdecc0546caa4bec447e33a8767 |
547d1fc2b1604c8c90595e8c4b78faeb | e27b85294ad8435897cce0029b829db0 |
414ab95ce70849ebabc2dc59c6a7a218 | e272a595a55642c79cbb600da6a0455f |
可以发现日志中两个事务准备insert的两条数据的 order_id
的值:e30c7f...
, e31e70...
排序后刚好都落在 e36e39...
和 e2d8fd...
之间,
这导致两个事务的上一步delete操作都能拿到 (e36e39...
,e2d8fd...
] 这个区间的间隙锁(Gap Lock)。
接下来事务(1)的insert操作发现当前事务持有间隙锁(Gap Lock)则会请求插入意向锁(Insert Intention Lock),因为 插入意向锁(Insert Intention Lock)与其他事务的间隙锁(Gap Lock)互斥,所以事务(1)请求的插入意向锁(Insert Intention Lock)会一直处于阻塞状态(即日志中的 insert intention waiting),并等待其他事务释放该区间的间隙锁(Gap Lock),INFORMATION_SCHEMA.INNODB_LOCKS 中能看到两个事务的持锁情况如下:
lock_trx_id | lock_mode | lock_type | lock_index | lock_space | lock_page | lock_rec | lock_data |
---|---|---|---|---|---|---|---|
49944201 | X,GAP | RECORD | idx_order_id | 34191 | 39 | 105 | 'e36e39', '42155c30845d4ae7ac256e6627639f3f' |
49944184 | X,GAP | RECORD | idx_order_id | 34191 | 39 | 105 | 'e36e39', '42155c30845d4ae7ac256e6627639f3f' |
此时如果事务(2)也请求该区间的插入意向锁(Insert Intention Lock),则MySQL直接认为出现死锁(Dead Lock),并选择一个其认为影响较小的事务进行回滚,以让另一个事务继续下去。
根据非唯一索引删除一条不存在的记录时才会产生间隙锁(Gap Lock),如果记录存在则不会产生间隙锁(Gap Lock),该案例的删除操作应改为根据主键删除,即删除前根据索引字段 order_id
查询,如果有记录返回再根据主键进行删除。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 | ------------------------ LATEST DETECTED DEADLOCK ------------------------ 2020-10-28 14:13:42 0x7f8051206700 *** (1) TRANSACTION: TRANSACTION 22272963, ACTIVE 27 sec starting index read mysql tables in use 1, locked 1 LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1 MySQL thread id 54270, OS thread handle 140189101999872, query id 67632913 hf-004239.hkhf.hkgp.net 10.3.99.20 zhaomj updating update inv_lot_inventory_copy set pick_qty =0,inv_qty =10 where id = '317270bd380b49869bdb8abad69e080a' *** (1) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 30877 page no 12 n bits 88 index PRIMARY of table `yongjia`.`inv_lot_inventory_copy` trx id 22272963 lock_mode X locks rec but not gap waiting Record lock, heap no 9 PHYSICAL RECORD: n_fields 82; compact format; info bits 0 0: len 30; hex 333137323730626433383062343938363962646238616261643639653038; asc 317270bd380b49869bdb8abad69e08; (total 32 bytes); *** (2) TRANSACTION: TRANSACTION 22272966, ACTIVE 23 sec starting index read mysql tables in use 1, locked 1 3 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1 MySQL thread id 54271, OS thread handle 140189093619456, query id 67632952 hf-004239.hkhf.hkgp.net 10.3.99.20 zhaomj updating update inv_lot_inventory_copy set pick_qty =0,inv_qty =10 where id = '23ea1059baf7441db2b3063836d6dad2' *** (2) HOLDS THE LOCK(S): RECORD LOCKS space id 30877 page no 12 n bits 88 index PRIMARY of table `yongjia`.`inv_lot_inventory_copy` trx id 22272966 lock_mode X locks rec but not gap Record lock, heap no 9 PHYSICAL RECORD: n_fields 82; compact format; info bits 0 0: len 30; hex 333137323730626433383062343938363962646238616261643639653038; asc 317270bd380b49869bdb8abad69e08; (total 32 bytes); *** (2) WAITING FOR THIS LOCK TO BE GRANTED: RECORD LOCKS space id 30877 page no 12 n bits 88 index PRIMARY of table `yongjia`.`inv_lot_inventory_copy` trx id 22272966 lock_mode X locks rec but not gap waiting Record lock, heap no 3 PHYSICAL RECORD: n_fields 82; compact format; info bits 0 0: len 30; hex 323365613130353962616637343431646232623330363338333664366461; asc 23ea1059baf7441db2b3063836d6da; (total 32 bytes); *** WE ROLL BACK TRANSACTION (2) |
日志中locks rec but not gap
表示这两个事务只是在等待行锁(Record Lock),其中事务(2)持有主键索引为317270...
的行锁(Record Lock),并等待主键索引为23ea10...
的行锁(Record Lock);而事务(1)的日志并不完整,只能看到在等待主键索引为317270b...
的行锁(Record Lock),结合业务日志发现事务(一)和事务(2)循环更新批号库存表(inv_lot_inventory_copy)的顺序分别是:
事务(1):
事务(2):
因此推断出事务(1)等待主键索引为 317270...
的行锁(Record Lock)同时也持有主键索引为 23ea10...
的行锁(Record Lock)。
当事务(2)开始处理第二条数据时会进入阻塞状态( locks rec but not gap waiting),等待事务一释放其持有的主键索引为 23ea10...
的行锁(Record Lock),等待的同时,事务(1)开始处理它的第二条数据,而事务(1)处理第二条数据所需的行锁(Record Lock)刚好被事务(2)持有,此时MySQL判定发生了死锁,回滚了事务(2)。
此类死锁,可以通过给需要处理的数据统一按主键或索引字段(取决于更新的条件)排序来解决,保证每个事务处理数据的顺序一致即可。例如此案例中将 两个事务要处理的数据都按主键排序:
事务(1):
事务(2):
这样,如果事务(1)先拿到主键索引为 23ea10...
的行锁(Record Lock),事务(2)开始就会阻塞,不会再拿到主键索引为 317270...
的行锁(Record Lock),直到事务(1)提交或回滚。
部分单据在提交、审批和通过时会对同一张表的同一批数据做更新操作,例如案例二的出入库单,审批时可以修改批号库存的捡料数量,通过时扣减批号库存的库存数量,先后的两次操作并没有问题,但是如果将流程配置为自动通过或者后台调用了流程的自动通过接口,则会导致所有的修改被合并到一个事务中,这时候事务要处理的数据顺序可能是这样的:
事务(1):
事务(2):
这样两个事务仍会死锁,解决方案是将中间过程要修改的数据或是差异记录到单据上,最终将修改内容合并到流程通过的UPDATE语句中。
参考阅读: