(缩写)表摘要:
-- Table cases:
id SERIAL PRIMARY KEY,
application_number VARCHAR(30) NOT NULL,
publication_number VARCHAR(30) NOT NULL,
-- Table patents:
case_id INTEGER PRIMARY KEY, -- FK to cases(id)
pct_application_number VARCHAR(30) NOT NULL,
pct_publication_number VARCHAR(30) NOT NULL,
-- All character columns have working trigram indexes:
CREATE INDEX cases_application_number_trgm_idx ON cases
USING GIN (application_number gin_trgm_ops);
-- (etc)
以下查询很慢(~200 毫秒),因为它不使用索引:
SELECT c.id
FROM cases c
JOIN patents p ON p.case_id = c.id
WHERE c.application_number ILIKE '%1234%' OR p.pct_application_number ILIKE '%1234%'
以下查询都很快(1-2 毫秒):
-- AND instead of OR
WHERE c.application_number ILIKE '%1234%' AND p.pct_application_number ILIKE '%1234%'
-- OR, but only table "cases"
WHERE c.application_number ILIKE '%1234%' OR c.publication_number ILIKE '%1234%'
-- OR, but only table "patents"
WHERE p.pct_application_number ILIKE '%1234%' OR p.pct_publication_number ILIKE '%1234%'
-- Simulating the OR with a UNION
SELECT c.id
FROM cases c
JOIN patents p ON p.case_id = c.id
WHERE c.application_number ILIKE '%1234%'
UNION
SELECT c.id
FROM cases c
JOIN patents p ON p.case_id = c.id
WHERE p.pct_application_number ILIKE '%1234%'
这是EXPLAIN ANALYZE
慢速查询的输出:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=2329.80..10590.54 rows=7 width=4) (actual time=54.951..186.713 rows=35 loops=1)
Hash Cond: (c.id = p.case_id)
Join Filter: (((c.application_number)::text ~~* '%1234%'::text) OR ((p.pct_application_number)::text ~~* '%1234%'::text))
Rows Removed by Join Filter: 68223
-> Seq Scan on cases c (cost=0.00..4981.99 rows=142099 width=12) (actual time=0.011..32.875 rows=142099 loops=1)
-> Hash (cost=1142.58..1142.58 rows=68258 width=11) (actual time=31.105..31.105 rows=68258 loops=1)
Buckets: 131072 Batches: 2 Memory Usage: 2473kB
-> Seq Scan on patents p (cost=0.00..1142.58 rows=68258 width=11) (actual time=0.019..11.995 rows=68258 loops=1)
Planning time: 1.875 ms
Execution time: 186.780 ms
(10 rows)
此处发布的查询已大大减少以说明问题。实际查询更复杂,涉及在五个(或更多)表的六个(或更多)列中进行文本搜索,大约有 10 个输出列。我想我可以将所有这些重写为一系列查询并将它们连接成一个巨大的UNION
......有没有更好的方法来处理这个问题?
添加enable_seqscan
禁用的查询计划(根据要求):
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=0.71..18767.92 rows=7 width=4) (actual time=4.809..150.368 rows=35 loops=1)
Merge Cond: (c.id = p.case_id)
Join Filter: (((c.application_number)::text ~~* '%1234%'::text) OR ((p.pct_application_number)::text ~~* '%1234%'::text))
Rows Removed by Join Filter: 68223
-> Index Scan using cases_pkey on cases c (cost=0.42..14942.96 rows=142099 width=12) (actual time=0.004..32.695 rows=142097 loops=1)
-> Index Scan using patents_pkey on patents p (cost=0.29..2275.63 rows=68258 width=11) (actual time=0.003..11.942 rows=68258 loops=1)
Planning time: 1.007 ms
Execution time: 150.399 ms
(8 rows)
恐怕现在没有好的解决方案,除了重写成 UNION(或非规范化/重构数据)。
有一项提议将ORs 自动转换为 UNIONs,但它需要更多的测试和审查才能使其进入 v11 of
PostgreSQL
,但我已经证实它确实适用于像你这样的案例。