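I have a table named my_table that contains about 30M records. The table has an index on column_a. However, I noticed that the index is not used when the query returns a large number of rows.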
Index not used:
explain (buffers, analyze) select * from my_table where column_a in (1);
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on my_table  (cost=0.00..1306095.85 rows=7772384 width=648) (actual time=0.009..6231.312 rows=7720995 loops=1)
   Filter: (column_a = 1)
   Rows Removed by Filter: 7398657
   Buffers: shared hit=463612 read=653528
 Planning time: 0.840 ms
 Execution time: 7717.923 ms
(6 rows)
Index used:
explain (buffers, analyze) select * from my_table where column_a in (8);
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on my_table  (cost=1378.58..233236.51 rows=73567 width=648) (actual time=14.258..67.795 rows=74756 loops=1)
   Recheck Cond: (column_a = 8)
   Heap Blocks: exact=36425
   Buffers: shared hit=36632
   ->  Bitmap Index Scan on my_table_column_a_idx  (cost=0.00..1360.19 rows=73567 width=0) (actual time=8.595..8.596 rows=74756 loops=1)
         Index Cond: (column_a = 8)
         Buffers: shared hit=207
 Planning time: 0.855 ms
 Execution time: 82.253 ms
(9 rows)
I have vacuumed the table:
select relallvisible, relpages, relallvisible/relpages as ratio from pg_class where relname='my_table';
 relallvisible | relpages | ratio
---------------+----------+-------
       1117140 |  1117140 |     1
(1 row)
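That is as it should be.
If the query returns a significant part of the table, the overhead of the bitmap index scan is not worth paying, because most of the heap blocks have to be visited anyway.
So PostgreSQL simply skips that step and goes straight to scanning the heap. The result is a sequential scan.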
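
As a rough check of this reasoning against the two plans in the question: column_a = 1 matched 7,720,995 of the 15,119,652 rows the sequential scan read (about 51%), while column_a = 8 matched 74,756 rows (about 0.5%) spread over 36,425 of the table's 1,117,140 pages (about 3%). The sketch below is one way to inspect this yourself. It assumes my_table is on your search path and that its statistics are reasonably current. It is a diagnostic experiment, not a recommended setting.

```sql
-- 1. Look at what the planner knows about the value distribution of column_a.
--    pg_stats is a standard system view; frequent values such as 1 should
--    show up in most_common_vals with a large entry in most_common_freqs.
SELECT n_distinct, most_common_vals, most_common_freqs
FROM pg_stats
WHERE tablename = 'my_table'
  AND attname = 'column_a';

-- 2. Discourage sequential scans for one transaction only, then compare the
--    resulting plan and timing with the Seq Scan shown in the question.
--    SET LOCAL is undone automatically when the transaction ends.
BEGIN;
SET LOCAL enable_seqscan = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM my_table WHERE column_a IN (1);
ROLLBACK;
```

If the forced index plan turns out no faster than the sequential scan, the planner's original choice was reasonable. Note that enable_seqscan = off only discourages sequential scans rather than forbidding them, so it is best kept to experiments like this one.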