(Abreviado) resumo da tabela:
-- Table cases:
id SERIAL PRIMARY KEY,
application_number VARCHAR(30) NOT NULL,
publication_number VARCHAR(30) NOT NULL,
-- Table patents:
case_id INTEGER PRIMARY KEY, -- FK to cases(id)
pct_application_number VARCHAR(30) NOT NULL,
pct_publication_number VARCHAR(30) NOT NULL,
-- All character columns have working trigram indexes:
CREATE INDEX cases_application_number_trgm_idx ON cases
USING GIN (application_number gin_trgm_ops);
-- (etc)
A consulta a seguir é lenta (~200ms) porque não usa os índices:
SELECT c.id
FROM cases c
JOIN patents p ON p.case_id = c.id
WHERE c.application_number ILIKE '%1234%' OR p.pct_application_number ILIKE '%1234%'
As seguintes consultas são todas rápidas (1-2ms):
-- AND instead of OR
WHERE c.application_number ILIKE '%1234%' AND p.pct_application_number ILIKE '%1234%'
-- OR, but only table "cases"
WHERE c.application_number ILIKE '%1234%' OR c.publication_number ILIKE '%1234%'
-- OR, but only table "patents"
WHERE p.pct_application_number ILIKE '%1234%' OR p.pct_publication_number ILIKE '%1234%'
-- Simulating the OR with a UNION
SELECT c.id
FROM cases c
JOIN patents p ON p.case_id = c.id
WHERE c.application_number ILIKE '%1234%'
UNION
SELECT c.id
FROM cases c
JOIN patents p ON p.case_id = c.id
WHERE p.pct_application_number ILIKE '%1234%'
Aqui está a EXPLAIN ANALYZE
saída para a consulta lenta:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=2329.80..10590.54 rows=7 width=4) (actual time=54.951..186.713 rows=35 loops=1)
Hash Cond: (c.id = p.case_id)
Join Filter: (((c.application_number)::text ~~* '%1234%'::text) OR ((p.pct_application_number)::text ~~* '%1234%'::text))
Rows Removed by Join Filter: 68223
-> Seq Scan on cases c (cost=0.00..4981.99 rows=142099 width=12) (actual time=0.011..32.875 rows=142099 loops=1)
-> Hash (cost=1142.58..1142.58 rows=68258 width=11) (actual time=31.105..31.105 rows=68258 loops=1)
Buckets: 131072 Batches: 2 Memory Usage: 2473kB
-> Seq Scan on patents p (cost=0.00..1142.58 rows=68258 width=11) (actual time=0.019..11.995 rows=68258 loops=1)
Planning time: 1.875 ms
Execution time: 186.780 ms
(10 rows)
A consulta como postada aqui é bastante reduzida para ilustrar o problema. A consulta real é mais complexa e envolve uma pesquisa de texto em seis (ou mais) colunas em cinco (ou mais) tabelas, com cerca de 10 colunas de saída. Acho que eu poderia reescrever tudo isso como uma série de consultas e conectá-las em um enorme UNION
... existe uma maneira melhor de lidar com esse problema?
Adicionando plano de consulta com enable_seqscan
desabilitado (conforme solicitado):
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=0.71..18767.92 rows=7 width=4) (actual time=4.809..150.368 rows=35 loops=1)
Merge Cond: (c.id = p.case_id)
Join Filter: (((c.application_number)::text ~~* '%1234%'::text) OR ((p.pct_application_number)::text ~~* '%1234%'::text))
Rows Removed by Join Filter: 68223
-> Index Scan using cases_pkey on cases c (cost=0.42..14942.96 rows=142099 width=12) (actual time=0.004..32.695 rows=142097 loops=1)
-> Index Scan using patents_pkey on patents p (cost=0.29..2275.63 rows=68258 width=11) (actual time=0.003..11.942 rows=68258 loops=1)
Planning time: 1.007 ms
Execution time: 150.399 ms
(8 rows)