我有一个 PostgreSQL 服务器版本。15.4 我有一个如下查询:
SELECT i."Id"
FROM "InventoryItems" i
INNER JOIN "Products" p ON i."ProductBaseId" = p."Id"
INNER JOIN (SELECT "CustomerMarketPlaceId", PLAINTO_TSQUERY('english', "Keyword") "Keyword"
FROM "RestrictedKeywords"
WHERE "CustomerMarketPlaceId" = 19100
LIMIT 158
) z
ON z."CustomerMarketPlaceId" = i."CustomerMarketPlaceId"
WHERE i."CustomerMarketPlaceId" = 19100
AND i."Status" IN (1, 2, 3, 4, 6, 7)
AND TO_TSVECTOR('english', COALESCE(p."Title", '')) @@ z."Keyword";
请注意子查询中奇怪的数字158 。这会产生下面的执行计划,这很酷,全部使用索引、仅索引扫描、堆扫描等。
Hash Join (cost=1704.06..102240.40 rows=5758 width=8)
Hash Cond: (p.""Id"" = i.""ProductBaseId"")"
-> Nested Loop (cost=59.43..97792.89 rows=244027 width=8)
-> Subquery Scan on z (cost=0.29..45.30 rows=79 width=36)
Filter: (z.""CustomerMarketPlaceId"" = 19100)"
-> Limit (cost=0.29..43.32 rows=158 width=36)
-> Index Only Scan using ""IX_RestrictedKeywords_CustomerMarketPlaceId"" on ""RestrictedKeywords"" (cost=0.29..3199.81 rows=11747 width=36)"
Index Cond: (""CustomerMarketPlaceId"" = 19100)"
-> Bitmap Heap Scan on ""Products"" p (cost=59.14..1206.42 rows=3089 width=132)"
Recheck Cond: (to_tsvector('english'::regconfig, (COALESCE(""Title"", ''::character varying))::text) @@ z.""Keyword"")"
-> Bitmap Index Scan on ""IX_Products_Title"" (cost=0.00..58.37 rows=3089 width=0)"
Index Cond: (to_tsvector('english'::regconfig, (COALESCE(""Title"", ''::character varying))::text) @@ z.""Keyword"")"
-> Hash (cost=1462.43..1462.43 rows=14576 width=16)
-> Index Scan using ""IX_InventoryItems_CustomerMarketPlaceId_Status"" on ""InventoryItems"" i (cost=0.29..1462.43 rows=14576 width=16)"
Index Cond: ((""CustomerMarketPlaceId"" = 19100) AND (""Status"" = ANY ('{1,2,3,4,6,7}'::integer[])))"
但是当我将该幻数更改为159 时,会出现以下查询计划:
Hash Join (cost=50296.29..104375.05 rows=6195 width=8)
Hash Cond: (i.""ProductBaseId"" = p.""Id"")"
Join Filter: (to_tsvector('english'::regconfig, (COALESCE(p.""Title"", ''::character varying))::text) @@ z.""Keyword"")"
-> Nested Loop (cost=0.58..16998.07 rows=1238960 width=44)
-> Index Scan using ""IX_InventoryItems_CustomerMarketPlaceId_Status"" on ""InventoryItems"" i (cost=0.29..1462.43 rows=14576 width=16)"
Index Cond: ((""CustomerMarketPlaceId"" = 19100) AND (""Status"" = ANY ('{1,2,3,4,6,7}'::integer[])))"
-> Materialize (cost=0.29..48.86 rows=85 width=36)
-> Subquery Scan on z (cost=0.29..48.43 rows=85 width=36)
Filter: (z.""CustomerMarketPlaceId"" = 19100)"
-> Limit (cost=0.29..46.32 rows=169 width=36)
-> Index Only Scan using ""IX_RestrictedKeywords_CustomerMarketPlaceId"" on ""RestrictedKeywords"" (cost=0.29..3199.81 rows=11747 width=36)"
Index Cond: (""CustomerMarketPlaceId"" = 19100)"
-> Hash (cost=30535.76..30535.76 rows=616876 width=132)
-> Seq Scan on ""Products"" p (cost=0.00..30535.76 rows=616876 width=132)"
突然,它开始认为对Products
表进行完整扫描在理论上会执行得更好,但在实践中,查询速度减慢了 100 倍(或者可能更多)。我尝试更改
default_statistics_target
为更大的数字,并重新分析查询中的相关表并没有改变结果。我在生产和测试环境中都尝试过这一点,唯一的区别是限制,因为生产环境有更多的 RAM 可供使用。是什么导致 PostgreSQL 改变主意?(我猜它认为查询不适合内存???)为什么它选择一个极其糟糕的执行计划?即使不使用gin
索引,查询仍然可以通过"InventoryItems"("CustomerMarketPlaceId","Status")
索引进行过滤。
我不知道这是否有帮助,但查询中使用的索引如下所示:
CREATE INDEX "IX_Products_Title"
ON "Products" USING gin (TO_TSVECTOR('english'::REGCONFIG, COALESCE("Title", ''::CHARACTER VARYING)::TEXT));
如果我像下面这样重构相同的查询(即使我删除了子查询中的限制),它将选择更好的执行计划,并且确实会产生更快的查询
SELECT i."Id"
FROM "InventoryItems" i
INNER JOIN "Products" p ON i."ProductBaseId" = p."Id"
INNER JOIN (SELECT "CustomerMarketPlaceId", PLAINTO_TSQUERY('english', "Keyword") "Keyword"
FROM "RestrictedKeywords"
-- WHERE "CustomerMarketPlaceId" = 19100
--- LIMIT 500
) z
ON TO_TSVECTOR('english', COALESCE(p."Title", '')) @@ z."Keyword" z."CustomerMarketPlaceId" = i."CustomerMarketPlaceId"
WHERE i."CustomerMarketPlaceId" = 19100
AND i."Status" IN (1, 2, 3, 4, 6, 7)
AND TO_TSVECTOR('english', COALESCE(p."Title", '')) @@ z."Keyword"; -- <-- removing this condition will cause full scans
产生这个执行计划:
Hash Join (cost=1646.19..238317.48 rows=4281 width=8)
Hash Cond: (p.""Id"" = i.""ProductBaseId"")"
-> Nested Loop (cost=1.56..234592.40 rows=181124 width=4)
-> Index Only Scan using ""IX_RestrictedKeywords_CustomerMarketPlaceId"" on ""RestrictedKeywords"" (cost=0.29..263.06 rows=11747 width=11)"
Index Cond: (""CustomerMarketPlaceId"" = 19100)"
-> Bitmap Heap Scan on ""Products"" p (cost=1.28..19.80 rows=15 width=132)"
Recheck Cond: ((to_tsvector('english'::regconfig, (COALESCE(""Title"", ''::character varying))::text) @@ plainto_tsquery('english'::regconfig, (""RestrictedKeywords"".""Keyword"")::text)) AND (to_tsvector('english'::regconfig, (COALESCE(""Title"", ''::character varying))::text) @@ plainto_tsquery('english'::regconfig, (""RestrictedKeywords"".""Keyword"")::text)))"
-> Bitmap Index Scan on ""IX_Products_Title"" (cost=0.00..1.27 rows=15 width=0)"
Index Cond: ((to_tsvector('english'::regconfig, (COALESCE(""Title"", ''::character varying))::text) @@ plainto_tsquery('english'::regconfig, (""RestrictedKeywords"".""Keyword"")::text)) AND (to_tsvector('english'::regconfig, (COALESCE(""Title"", ''::character varying))::text) @@ plainto_tsquery('english'::regconfig, (""RestrictedKeywords"".""Keyword"")::text)))"
-> Hash (cost=1462.43..1462.43 rows=14576 width=12)
-> Index Scan using ""IX_InventoryItems_CustomerMarketPlaceId_Status"" on ""InventoryItems"" i (cost=0.29..1462.43 rows=14576 width=12)"
Index Cond: ((""CustomerMarketPlaceId"" = 19100) AND (""Status"" = ANY ('{1,2,3,4,6,7}'::integer[])))"
任何帮助我理解这些执行计划选择的解释将不胜感激。
编辑:我已explain (analyze,buffers)
按照评论中的要求添加了前两个限制查询的输出:
Hash Join (cost=1684.53..99640.82 rows=5780 width=8) (actual time=15.972..333.366 rows=1953 loops=1)
Hash Cond: (p.""Id"" = i.""ProductBaseId"")"
Buffers: shared hit=41110 dirtied=1513
-> Nested Loop (cost=34.49..95801.34 rows=243616 width=8) (actual time=0.273..316.567 rows=47020 loops=1)
Buffers: shared hit=39674 dirtied=1513
-> Subquery Scan on z (cost=0.29..45.30 rows=79 width=36) (actual time=0.050..0.566 rows=158 loops=1)
Filter: (z.""CustomerMarketPlaceId"" = 19100)"
Buffers: shared hit=4
-> Limit (cost=0.29..43.32 rows=158 width=36) (actual time=0.049..0.544 rows=158 loops=1)
Buffers: shared hit=4
-> Index Only Scan using ""IX_RestrictedKeywords_CustomerMarketPlaceId"" on ""RestrictedKeywords"" (cost=0.29..3199.81 rows=11747 width=36) (actual time=0.048..0.528 rows=158 loops=1)"
Index Cond: (""CustomerMarketPlaceId"" = 19100)"
Heap Fetches: 0
Buffers: shared hit=4
-> Bitmap Heap Scan on ""Products"" p (cost=34.20..1181.26 rows=3084 width=132) (actual time=0.127..1.960 rows=298 loops=158)"
Recheck Cond: (to_tsvector('english'::regconfig, (COALESCE(""Title"", ''::character varying))::text) @@ z.""Keyword"")"
Heap Blocks: exact=37818
Buffers: shared hit=39670 dirtied=1513
-> Bitmap Index Scan on ""IX_Products_Title"" (cost=0.00..33.43 rows=3084 width=0) (actual time=0.096..0.096 rows=298 loops=158)"
Index Cond: (to_tsvector('english'::regconfig, (COALESCE(""Title"", ''::character varying))::text) @@ z.""Keyword"")"
Buffers: shared hit=1803
-> Hash (cost=1467.12..1467.12 rows=14634 width=16) (actual time=11.340..11.340 rows=19978 loops=1)
Buckets: 32768 (originally 16384) Batches: 1 (originally 1) Memory Usage: 1193kB
Buffers: shared hit=1436
-> Index Scan using ""IX_InventoryItems_CustomerMarketPlaceId_Status"" on ""InventoryItems"" i (cost=0.29..1467.12 rows=14634 width=16) (actual time=0.012..8.048 rows=19978 loops=1)"
Index Cond: ((""CustomerMarketPlaceId"" = 19100) AND (""Status"" = ANY ('{1,2,3,4,6,7}'::integer[])))"
Buffers: shared hit=1436
Planning:
Buffers: shared hit=29
Planning Time: 0.753 ms
Execution Time: 333.540 ms
Hash Join (cost=50366.43..107003.29 rows=6585 width=8) (actual time=414.038..86828.819 rows=2373 loops=1)
Hash Cond: (i.""ProductBaseId"" = p.""Id"")"
Join Filter: (to_tsvector('english'::regconfig, (COALESCE(p.""Title"", ''::character varying))::text) @@ z.""Keyword"")"
Rows Removed by Join Filter: 3593667
Buffers: shared hit=25883 dirtied=267, temp read=29576 written=29576"
-> Nested Loop (cost=0.58..17982.15 rows=1317060 width=44) (actual time=0.045..591.422 rows=3596040 loops=1)
Buffers: shared hit=1440
-> Index Scan using ""IX_InventoryItems_CustomerMarketPlaceId_Status"" on ""InventoryItems"" i (cost=0.29..1467.12 rows=14634 width=16) (actual time=0.013..7.770 rows=19978 loops=1)"
Index Cond: ((""CustomerMarketPlaceId"" = 19100) AND (""Status"" = ANY ('{1,2,3,4,6,7}'::integer[])))"
Buffers: shared hit=1436
-> Materialize (cost=0.29..52.01 rows=90 width=36) (actual time=0.000..0.008 rows=180 loops=19978)
Buffers: shared hit=4
-> Subquery Scan on z (cost=0.29..51.56 rows=90 width=36) (actual time=0.029..0.531 rows=180 loops=1)
Filter: (z.""CustomerMarketPlaceId"" = 19100)"
Buffers: shared hit=4
-> Limit (cost=0.29..49.31 rows=180 width=36) (actual time=0.028..0.507 rows=180 loops=1)
Buffers: shared hit=4
-> Index Only Scan using ""IX_RestrictedKeywords_CustomerMarketPlaceId"" on ""RestrictedKeywords"" (cost=0.29..3199.81 rows=11747 width=36) (actual time=0.028..0.491 rows=180 loops=1)"
Index Cond: (""CustomerMarketPlaceId"" = 19100)"
Heap Fetches: 0
Buffers: shared hit=4
-> Hash (cost=30610.49..30610.49 rows=616749 width=132) (actual time=349.356..349.356 rows=616749 loops=1)
Buckets: 524288 Batches: 4 Memory Usage: 26496kB
Buffers: shared hit=24443 dirtied=267, temp written=7668"
-> Seq Scan on ""Products"" p (cost=0.00..30610.49 rows=616749 width=132) (actual time=0.004..178.437 rows=616749 loops=1)"
Buffers: shared hit=24443 dirtied=267
Planning:
Buffers: shared hit=29
Planning Time: 0.423 ms
Execution Time: 86829.206 ms