Question asked by richie - dba

Asked: 2023-10-25 00:17:39 +0800 CST

Can PostgreSQL use an index on the column in the query condition and a two-column index for the order by clause in a single query?

I am running PostgreSQL 11 on a Mac with shared_buffers set to 3 GB. I have a table, job, that contains 5 million rows. The table structure is:

                           Table "public.job"
   Column   |           Type           | Collation | Nullable | Default
------------+--------------------------+-----------+----------+---------
 id         | uuid                     |           | not null |
 name       | text                     |           |          |
 created_on | timestamp with time zone |           |          |
 updated_on | timestamp with time zone |           |          |
Indexes:
    "job_pkey" PRIMARY KEY, btree (id)
    "job_created_on_idx" btree (created_on)
    "job_name_idx" btree (name)
    "job_updated_on_idx" btree (updated_on)
    "job_updated_on_name_compound_asc_idx" btree (updated_on, upper(name))
    "job_updated_on_name_compound_desc_idx" btree (updated_on DESC, upper(name))

Note that I have created compound indexes on the updated_on and name columns.

When I run the query select name, created_on from job where created_on >= '2023-10-08 00:00:00+08'::timestamp with time zone AND created_on < '2023-10-16 00:00:00+08' ORDER BY updated_on ASC, UPPER(name::text) ASC limit 25, PostgreSQL uses the compound index job_updated_on_name_compound_asc_idx, and the query takes more than 4 seconds.

Execution plan:

Limit  (cost=0.43..102.29 rows=25 width=61) (actual time=4549.668..4550.235 rows=25 loops=1)
   Buffers: shared hit=4859940
   ->  Index Scan using job_updated_on_name_compound_asc_idx on job  (cost=0.43..416764.16 rows=102293 width=61) (actual time=4549.667..4550.230 rows=25 loops=1)
         Filter: ((created_on >= '2023-10-08 00:00:00+08'::timestamp with time zone) AND (created_on < '2023-10-16 00:00:00+08'::timestamp with time zone))
         Rows Removed by Filter: 4828894
         Buffers: shared hit=4859940
 Planning Time: 0.218 ms
 Execution Time: 4550.260 ms
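
As a rough check on the plan above, the Limit node's estimated cost of 102.29 can be reproduced from the index scan's own numbers. The sketch below assumes the planner scales the scan's run cost by the fraction of estimated matching rows that the LIMIT needs, which implicitly treats the matching rows as spread evenly through the index order. The 5,000,000 row count is the approximate table size stated earlier, and all variable names are illustrative.

```python
# Figures copied from the first execution plan.
scan_startup_cost = 0.43
scan_total_cost = 416764.16
estimated_matching_rows = 102293   # planner's estimate for the created_on filter
limit_rows = 25
rows_removed_by_filter = 4828894   # actual rows visited and discarded

# Assumption: approximate table size taken from the question text.
table_rows = 5_000_000

# Assumed cost model: startup cost plus the share of the run cost
# needed to produce the first `limit_rows` matching rows.
fraction_needed = limit_rows / estimated_matching_rows
limit_cost = scan_startup_cost + (scan_total_cost - scan_startup_cost) * fraction_needed

# Index entries the planner would expect to visit if matching rows
# were evenly distributed in (updated_on, upper(name)) order.
expected_visits = limit_rows * table_rows / estimated_matching_rows

# Index entries the scan really visited before it found 25 matches.
actual_visits = rows_removed_by_filter + limit_rows
visited_share = actual_visits / table_rows

print(f"estimated Limit cost:   {limit_cost:.2f}")
print(f"expected index visits:  {expected_visits:.0f}")
print(f"actual index visits:    {actual_visits}")
print(f"share of table visited: {visited_share:.1%}")
```

Under these assumptions the planner expected to visit roughly 1,200 index entries before finding 25 matches, whereas the scan actually visited about 4.8 million. That gap is consistent with the rows in the requested created_on range sitting late in updated_on order.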

There is an index on the created_on column, but it is not used. I can force PostgreSQL to use the created_on index by appending id to the order by clause. The query is select name, created_on from job where created_on >= '2023-10-08 00:00:00+08'::timestamp with time zone AND created_on < '2023-10-16 00:00:00+08' ORDER BY updated_on ASC, UPPER(name::text) ASC, id limit 25; This time PostgreSQL uses the index on created_on and returns the result very quickly.

Execution plan:

Limit  (cost=52190.61..52193.52 rows=25 width=77) (actual time=125.192..138.055 rows=25 loops=1)
   Buffers: shared hit=42788
   ->  Gather Merge  (cost=52190.61..62136.44 rows=85244 width=77) (actual time=125.191..138.049 rows=25 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=42788
         ->  Sort  (cost=51190.58..51297.14 rows=42622 width=77) (actual time=119.359..119.362 rows=20 loops=3)
               Sort Key: updated_on, (upper(name)), id
               Sort Method: top-N heapsort  Memory: 30kB
               Worker 0:  Sort Method: top-N heapsort  Memory: 31kB
               Worker 1:  Sort Method: top-N heapsort  Memory: 31kB
               Buffers: shared hit=42788
               ->  Parallel Bitmap Heap Scan on job  (cost=2512.94..49987.82 rows=42622 width=77) (actual time=19.915..109.984 rows=36562 loops=3)
                     Recheck Cond: ((created_on >= '2023-10-08 00:00:00+08'::timestamp with time zone) AND (created_on < '2023-10-16 00:00:00+08'::timestamp with time zone))
                     Heap Blocks: exact=24557
                     Buffers: shared hit=42738
                     ->  Bitmap Index Scan on job_created_on_idx  (cost=0.00..2487.36 rows=102293 width=0) (actual time=16.909..16.909 rows=109685 loops=1)
                           Index Cond: ((created_on >= '2023-10-08 00:00:00+08'::timestamp with time zone) AND (created_on < '2023-10-16 00:00:00+08'::timestamp with time zone))
                           Buffers: shared hit=395
 Planning Time: 0.168 ms
 Execution Time: 138.115 ms
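
To put the two plans side by side, the following sketch compares their buffer hits and execution times using only the figures reported in the plans. The 8 kB page size is an assumption (the PostgreSQL default block size), and the variable names are illustrative. Note that shared hits count page accesses, so the same page may be counted many times.

```python
# Assumption: default PostgreSQL block size of 8 kB.
BLOCK_SIZE_BYTES = 8192

# Top-level figures from the two execution plans.
compound_index_plan = {"shared_hit": 4859940, "exec_ms": 4550.260}
created_on_index_plan = {"shared_hit": 42788, "exec_ms": 138.115}


def accessed_gib(page_accesses: int) -> float:
    """Convert a count of page accesses into GiB of buffer traffic."""
    return page_accesses * BLOCK_SIZE_BYTES / 2**30


buffer_ratio = compound_index_plan["shared_hit"] / created_on_index_plan["shared_hit"]
time_ratio = compound_index_plan["exec_ms"] / created_on_index_plan["exec_ms"]
slow_gib = accessed_gib(compound_index_plan["shared_hit"])
fast_gib = accessed_gib(created_on_index_plan["shared_hit"])

print(f"buffer accesses: {buffer_ratio:.1f}x more in the compound-index plan")
print(f"execution time:  {time_ratio:.1f}x slower in the compound-index plan")
print(f"buffer traffic:  {slow_gib:.2f} GiB vs {fast_gib:.2f} GiB")
```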

If the database is busy updating rows with large columns, the difference in execution time becomes even larger.

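The compound indexes were created to improve sort performance, and they are very useful in some cases. Because my system generates SQL dynamically from user selections, the query conditions and the sort order can vary. In this particular case, appending id to the order by clause to avoid the compound index improves performance, but in other cases the compound index may be the better choice, so I cannot simply drop it.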

I also checked the pg_stats view; the results are as follows:

  attname   | inherited | n_distinct | most_common_vals
------------+-----------+------------+------------------
 id         | f         |         -1 |
 name       | f         |         -1 |
 created_on | f         |  -0.908167 |
 updated_on | f         |         -1 |
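
For reading the n_distinct column: PostgreSQL documents a negative value as the number of distinct values divided by the row count, negated, so -1 means every value is unique. A minimal sketch of that conversion follows, assuming the table holds about 5,000,000 rows as stated earlier; the helper name is hypothetical.

```python
def estimated_distinct(n_distinct: float, row_count: int) -> int:
    """Turn a pg_stats n_distinct value into an approximate distinct count.

    Negative values are a fraction of the row count; positive values
    are already an absolute estimate.
    """
    if n_distinct < 0:
        return int(round(-n_distinct * row_count))
    return int(round(n_distinct))


# Assumption: approximate table size taken from the question text.
table_rows = 5_000_000

# n_distinct values copied from the pg_stats output.
stats = {"id": -1, "name": -1, "created_on": -0.908167, "updated_on": -1}

for column, n_distinct in stats.items():
    print(f"{column:<10} ~{estimated_distinct(n_distinct, table_rows):,} distinct values")
```

By this reading, created_on has slightly fewer distinct values than rows, while the other three columns are estimated to be unique.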

I have two questions:

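1. For the query above, using the index on created_on is clearly better. Why does PostgreSQL choose the compound index that matches the order by clause? Is there anything I can configure in PostgreSQL to make it use the right index?
2. It looks like PostgreSQL does not use one index for the query condition and another for order by at the same time. Although the column in the query condition is indexed, it appears only as a Filter under the compound index scan. Can PostgreSQL use both the compound index for order by and the index on the query condition column in a single query?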
