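I recently noticed that the database instance for my production application was sitting at about 100% CPU. When I looked at the stuck queries, I found that many of them involve a table with a column that holds a list of ancestors. They essentially run a regex substring to extract the parent ID from that column, then use a NOT IN to exclude parent rows...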
SELECT fp.*, admin_folders.id AS folder_id FROM folder_permissions fp RIGHT OUTER JOIN (
SELECT f.* FROM folders f WHERE f.deleted = FALSE AND ((SUBSTRING(f.ancestry FROM '([^/]*)$')::integer NOT IN (SELECT f2.id FROM folders f2 WHERE deleted = FALSE)) OR (f.ancestry IS NULL))
) admin_folders
ON admin_folders.id = fp.folder_id
AND fp.user_id = 12345 ORDER BY lower(admin_folders.name);
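To make the parent-ID extraction concrete, here is a minimal sketch of what the regex substring returns. The sample values are assumptions for illustration (a slash-separated list of ancestor IDs with the nearest parent last); the real contents of ancestry are not shown in this post.

```sql
-- Hypothetical ancestry values: '1/5/42' and '7' are illustrative only.
-- substring(string FROM pattern) returns the text matched by the first
-- parenthesized group; '([^/]*)$' captures everything after the last '/'.
SELECT SUBSTRING('1/5/42' FROM '([^/]*)$')::integer AS parent_id;   -- 42

-- A value with no slash is captured whole.
SELECT SUBSTRING('7' FROM '([^/]*)$')::integer AS parent_id;        -- 7

-- A NULL ancestry yields NULL; the query handles that case separately
-- with its "OR (f.ancestry IS NULL)" branch.
SELECT SUBSTRING(NULL::text FROM '([^/]*)$')::integer AS parent_id; -- NULL
```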
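Initially I thought the regex substring might be causing the problem, so I tried adding an ltree column and changing that expression to ((ltree2text(subpath(f.path, -2, 1))::integer
... This seemed to make it slightly faster in the lower environments, but in production the database again hung indefinitely when this query ran.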
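For comparison, here is a sketch of what the ltree variant extracts. It assumes the ltree extension is installed and that path stores the folder's own ID as its last label; the post does not show the column's contents, so the sample path '1.5.42' is hypothetical.

```sql
-- Requires the ltree extension: CREATE EXTENSION IF NOT EXISTS ltree;
-- subpath(path, offset, len) with a negative offset counts from the end,
-- so (-2, 1) picks the single label just before the last one.
SELECT ltree2text(subpath('1.5.42'::ltree, -2, 1))::integer AS parent_id;  -- 5
```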
What interests me is that in the lower environments, if I run this query with a user ID that has >100,000 folder permission records, it is instantaneous.
----------------------------------------------------------------------------------------------------------------------------------
Sort (cost=24.49..24.50 rows=5 width=73) (actual time=4.286..4.289 rows=8 loops=1)
Sort Key: (lower((f.name)::text))
Sort Method: quicksort Memory: 25kB
-> Nested Loop Left Join (cost=10.22..24.43 rows=5 width=73) (actual time=3.810..3.997 rows=8 loops=1)
Join Filter: (f.id = fp.folder_id)
-> Seq Scan on folders f (cost=10.22..20.62 rows=5 width=520) (actual time=3.448..3.624 rows=8 loops=1)
Filter: ((NOT deleted) AND ((NOT (hashed SubPlan 1)) OR (ancestry IS NULL)))
Rows Removed by Filter: 14
SubPlan 1
-> Seq Scan on folders f2 (cost=0.00..10.20 rows=10 width=4) (actual time=0.008..0.018 rows=22 loops=1)
Filter: (NOT deleted)
-> Materialize (cost=0.00..3.72 rows=1 width=37) (actual time=0.015..0.015 rows=0 loops=8)
-> Seq Scan on folder_permissions fp (cost=0.00..3.71 rows=1 width=37) (actual time=0.120..0.120 rows=0 loops=1)
Filter: (user_id = 12345)
Rows Removed by Filter: 175
Planning Time: 2.346 ms
Execution Time: 5.266 ms
(17 rows)
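In production, when I try it with a user who has only 8,000 folder permission records, it hangs indefinitely... I have let EXPLAIN ANALYZE run for more than 8 hours and never saw any output...
The lower and higher environments have the same indexes, and I have tried rebuilding them.
*** UPDATE ***
As requested, here is the EXPLAIN for the query that hangs: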
=> explain SELECT fp.*, admin_folders.id AS folder_id FROM folder_permissions fp RIGHT OUTER JOIN (SELECT f.* FROM folders f WHERE f.deleted = FALSE AND ((SUBSTRING(f.ancestry FROM '([^/]*)$')::integer NOT IN (SELECT f2.id FROM folders f2 WHERE deleted = FALSE)) OR (f.ancestry IS NULL))) admin_folders ON admin_folders.id = fp.folder_id AND fp.user_id = 12345 ORDER BY lower(admin_folders.name);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
Gather Merge (cost=332160509.79..332167328.27 rows=58440 width=73)
Workers Planned: 2
-> Sort (cost=332159509.77..332159582.82 rows=29220 width=73)
Sort Key: (lower((f.name)::text))
-> Merge Left Join (cost=17496.11..332157342.42 rows=29220 width=73)
Merge Cond: (f.id = fp.folder_id)
-> Parallel Index Scan using index_folder_id_on_undeleted_v2 on folders f (cost=0.42..332139670.56 rows=29220 width=20)
Filter: ((NOT (SubPlan 1)) OR (ancestry IS NULL))
SubPlan 1
-> Materialize (cost=0.00..11018.00 rows=140223 width=4)
-> Seq Scan on folders f2 (cost=0.00..9768.88 rows=140223 width=4)
Filter: (NOT deleted)
-> Sort (cost=17495.69..17507.71 rows=4807 width=37)
Sort Key: fp.folder_id
-> Bitmap Heap Scan on folder_permissions fp (cost=93.82..17201.72 rows=4807 width=37)
Recheck Cond: (user_id = 12345)
-> Bitmap Index Scan on index_folder_permissions_on_user_id (cost=0.00..92.62 rows=4807 width=0)
Index Cond: (user_id = 12345)
(18 rows)
And the EXPLAIN ANALYZE for the query without the NOT IN part:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=410.76..410.84 rows=33 width=73) (actual time=12.446..12.483 rows=43 loops=1)
Sort Key: (lower((f.name)::text))
Sort Method: quicksort Memory: 28kB
-> Nested Loop Left Join (cost=5.11..409.92 rows=33 width=73) (actual time=0.417..12.314 rows=43 loops=1)
-> Bitmap Heap Scan on folders f (cost=4.68..130.41 rows=33 width=20) (actual time=0.392..1.518 rows=43 loops=1)
Recheck Cond: ((ancestry IS NULL) AND (NOT deleted))
Heap Blocks: exact=43
-> Bitmap Index Scan on index_folders_on_ancestry_and_not_deleted (cost=0.00..4.67 rows=33 width=0) (actual time=0.220..0.221 rows=1620 loops=1)
Index Cond: (ancestry IS NULL)
-> Index Scan using index_folder_permissions_on_folder_id_and_user_id on folder_permissions fp (cost=0.44..8.46 rows=1 width=37) (actual time=0.246..0.246 rows=0 loops=43)
Index Cond: ((folder_id = f.id) AND (user_id = 12345))
Planning Time: 0.464 ms
Execution Time: 12.581 ms
(13 rows)
And the same query in the environment that does not hang:
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=556.42..556.53 rows=45 width=73) (actual time=0.685..0.726 rows=49 loops=1)
Sort Key: (lower((f.name)::text))
Sort Method: quicksort Memory: 29kB
-> Nested Loop Left Join (cost=5.20..555.19 rows=45 width=73) (actual time=0.045..0.589 rows=49 loops=1)
-> Bitmap Heap Scan on folders f (cost=4.77..174.15 rows=45 width=17) (actual time=0.025..0.178 rows=49 loops=1)
Recheck Cond: ((ancestry IS NULL) AND (NOT deleted))
Heap Blocks: exact=45
-> Bitmap Index Scan on index_folders_on_ancestry_and_not_deleted (cost=0.00..4.76 rows=45 width=0) (actual time=0.014..0.015 rows=51 loops=1)
Index Cond: (ancestry IS NULL)
-> Index Scan using index_folder_permissions_on_folder_id_and_user_id on folder_permissions fp (cost=0.43..8.46 rows=1 width=37) (actual time=0.004..0.005 rows=0 loops=49)
Index Cond: ((folder_id = f.id) AND (user_id = 12345))
Planning Time: 0.404 ms
Execution Time: 0.845 ms
(13 rows)
Finally, the EXPLAIN for the part of the query that causes the hang:
compass=> explain SELECT f.* FROM folders f WHERE f.deleted = FALSE AND ((SUBSTRING(f.ancestry FROM '([^/]*)$')::integer NOT IN (SELECT f2.id FROM folders f2 WHERE deleted = FALSE)) OR (f.ancestry IS NULL));
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Gather (cost=3547.07..332130118.64 rows=70128 width=486)
Workers Planned: 2
-> Parallel Bitmap Heap Scan on folders f (cost=2547.07..332122105.84 rows=29220 width=486)
Recheck Cond: (NOT deleted)
Filter: ((NOT (SubPlan 1)) OR (ancestry IS NULL))
-> Bitmap Index Scan on index_folder_id_on_undeleted_v2 (cost=0.00..2529.53 rows=140223 width=0)
SubPlan 1
-> Materialize (cost=0.00..11018.00 rows=140223 width=4)
-> Seq Scan on folders f2 (cost=0.00..9768.88 rows=140223 width=4)
Filter: (NOT deleted)
(10 rows)