我有一张简单的平板桌子,大约有。1800 万行。现在我需要使用 ILIKE 运算符对字段进行计数。此查询返回大约 50 万行。当搜索项“较窄”(较长的搜索字符串)时,一切都会很快。
SELECT count(*)::int
FROM "KbaData" AS k
WHERE k."KbaKey" ILIKE '1313%'
但这运行了 30 秒,太长了。该字段的索引定义为:
CREATE INDEX IF NOT EXISTS "IX_KbaData_KbaKey"
ON public."KbaData" USING gin
("KbaKey" COLLATE pg_catalog."default" gin_trgm_ops)
TABLESPACE pg_default;
分析返回:
"Finalize Aggregate (cost=3370561.68..3370561.69 rows=1 width=4) (actual time=35821.869..35832.270 rows=1 loops=1)"
" -> Gather (cost=3370561.46..3370561.67 rows=2 width=8) (actual time=35821.690..35832.235 rows=3 loops=1)"
" Workers Planned: 2"
" Workers Launched: 2"
" -> Partial Aggregate (cost=3369561.46..3369561.47 rows=1 width=8) (actual time=35787.156..35787.158 rows=1 loops=3)"
" -> Parallel Seq Scan on ""KbaData"" k (cost=0.00..3364441.38 rows=2048033 width=0) (actual time=104.308..35644.469 rows=1498715 loops=3)"
" Filter: (""KbaKey"" ~~* '1313%'::text)"
" Rows Removed by Filter: 3271997"
"Planning Time: 0.231 ms"
"JIT:"
" Functions: 14"
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 1.512 ms, Inlining 200.964 ms, Optimization 62.820 ms, Emission 48.232 ms, Total 313.528 ms"
"Execution Time: 35832.920 ms"
我怎样才能做得更好?
计算表的 1/3 很可能比计算表的 1/100000 慢。
对于模式中仅包含以下通配符的情况,使用带有 text_pattern_ops 运算符的 btree 索引可能会更快,因为它可以使用仅索引扫描。(不过还是占桌子的1/3,所以不要抱太大希望)
位图扫描可以使用仅索引扫描进行计数查询,但带有三元组的 LIKE 运算符始终需要重新检查,因此这不适用于此处。