Asked: 2024-10-15 23:01:34 +0800 CST

How can I predict whether a query plan will use an index?

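I'm using the latest PostgreSQL docker image to create a local database (on an Apple M1 Pro machine running macOS Sonoma 14.5). I create a table table0 with a single column col0 and fill it with random strings of 2 to 16 characters. I then create a trigram index on col0 and run vacuum (analyze). The exact steps: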

create table public.table0 (
    col0 varchar(25)
);

select setseed(0.12343);

insert into table0 (col0)
select substring(md5(random()::text), 1, (2 + (random() * 14))::int)
from generate_series(1, 12345678);

create extension pg_trgm;
create index col0_gin_trgm_idx on table0 using gin (col0 gin_trgm_ops);

vacuum (analyze) table0;

I checked the query plan for selecting 200 rows containing the string abc:

explain analyze
select * from table0 where col0 like '%abc%' limit 200;

The output confirms that the trigram index is not used. A sequential scan is used instead:

                                                     QUERY PLAN                                                     
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..351.78 rows=200 width=10) (actual time=0.313..15.640 rows=200 loops=1)
   ->  Seq Scan on table0  (cost=0.00..216614.79 rows=123154 width=10) (actual time=0.312..15.620 rows=200 loops=1)
         Filter: ((col0)::text ~~ '%abc%'::text)
         Rows Removed by Filter: 115643
 Planning Time: 4.401 ms
 Execution Time: 15.841 ms
(6 rows)

However, if I search for rows containing bcd instead of abc:

explain analyze
select * from table0 where col0 like '%bcd%' limit 200;

then I get a different query plan, which now includes an index scan:

                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52.34..764.83 rows=200 width=10) (actual time=7.032..7.230 rows=200 loops=1)
   ->  Bitmap Heap Scan on table0  (cost=52.34..4394.94 rows=1219 width=10) (actual time=7.031..7.220 rows=200 loops=1)
         Recheck Cond: ((col0)::text ~~ '%bcd%'::text)
         Heap Blocks: exact=169
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.04 rows=1219 width=0) (actual time=5.100..5.100 rows=21264 loops=1)
               Index Cond: ((col0)::text ~~ '%bcd%'::text)
 Planning Time: 0.521 ms
 Execution Time: 7.366 ms
(8 rows)

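Even with setseed(0.12343);, this setup may not reproduce exactly on the first attempt, because analyze "collects statistics based on its own random selection of rows" (see the second paragraph here). I have reproduced the situation above several times and have never needed more than 4 attempts at the setup steps. I therefore expect it to be easy to reproduce, even though the code I provide is not fully deterministic. (I removed and restarted the docker container between attempts.)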

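This answer gives a basic explanation of why a sequential scan is used in some cases and an index scan in others. It also makes 2 suggestions for avoiding the sequential scan: changing random_page_cost and changing the STATISTICS value.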

I set random_page_cost to 1.1 (via ALTER DATABASE postgres SET random_page_cost = 1.1;). I also "increased the amount of statistics collected" by analyze (via ALTER TABLE table0 ALTER COLUMN col0 SET STATISTICS 1000;). I then re-ran vacuum (analyze) table0; and once again ran:

explain analyze
select * from table0 where col0 like '%abc%' limit 200;

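This time the trigram index col0_gin_trgm_idx was used. Afterwards I recreated the scenario above and re-ran vacuum (analyze) table0; without changing random_page_cost or STATISTICS. This also changed the behavior, switching from the sequential scan to the index scan. I believe this is due to the nondeterminism of the statistics collected by analyze.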

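This time I am not asking how to trigger the use of the index; I can now do that, mostly thanks to the answer mentioned above. Instead, I want to understand in detail how the choice between a sequential scan and an index scan is made. Ideally, I would like to be able to predict whether the query: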

explain analyze
select * from table0 where col0 like '%xyz%' limit 200;

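will trigger an index scan or a sequential scan, given xyz and anything related to the database statistics or settings. Previously, in the context of a similar question, I was advised to check SELECT name, setting FROM pg_settings WHERE name = ANY ( '{shared_buffers, effective_cache_size, random_page_cost, effective_io_concurrency, work_mem}'::text[]);. It returned the following (before I changed any defaults):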

           name           | setting 
--------------------------+---------
 effective_cache_size     | 524288
 effective_io_concurrency | 1
 random_page_cost         | 4
 shared_buffers           | 16384
 work_mem                 | 4096
(5 rows)

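I assume these values influence the choice between a sequential scan and an index scan. I hope there is a function f with two possible outputs: sequential scan or index scan. I imagine f takes xyz, random_page_cost, the statistics collected by analyze, etc. as inputs. I would like to understand the full list of inputs (i.e., what does "etc." cover?) and how f processes them.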
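Below is a minimal sketch of only the last step of such an f. It rests on one assumption: the planner costs a Limit node as the child path's startup cost plus the fraction limit / estimated_rows of the child's remaining cost, and then keeps the cheapest candidate. This assumption reproduces both Limit costs in the plans above (351.78 and 764.83). Everything that feeds the per-path costs is not modeled. That includes random_page_cost, the other cost settings, the page and row counts, and the pg_trgm selectivity estimate for the pattern. The function names are hypothetical. The Seq Scan total used for the '%bcd%' case is borrowed from the '%abc%' plan, on the assumption that a sequential scan's cost barely depends on the pattern.

```python
def limit_cost(startup: float, total: float, est_rows: float, limit: int) -> float:
    """Simplified cost of a LIMIT node on top of a scan path.

    Assumption: the planner charges the path's startup cost plus the fraction
    limit / est_rows of its remaining (run) cost, capped at the full cost.
    """
    fraction = min(1.0, limit / est_rows)
    return startup + (total - startup) * fraction


def cheapest_path(paths: dict[str, tuple[float, float]], est_rows: float, limit: int) -> str:
    """Return the name of the path with the lowest LIMIT cost.

    All candidate paths share one row estimate: they read the same table
    with the same LIKE filter, so only their startup/total costs differ.
    """
    return min(paths, key=lambda name: limit_cost(*paths[name], est_rows, limit))


# '%bcd%' plan: estimate of 1219 rows; the Bitmap Heap Scan costs are copied from it.
# The Seq Scan total is NOT in that plan (hypothetical: borrowed from the '%abc%' plan).
bcd_paths = {
    "Seq Scan": (0.00, 216614.79),
    "Bitmap Heap Scan": (52.34, 4394.94),
}

print(round(limit_cost(0.00, 216614.79, 123154, 200), 2))  # 351.78, the Limit cost in the '%abc%' plan
print(round(limit_cost(52.34, 4394.94, 1219, 200), 2))     # 764.83, the Limit cost in the '%bcd%' plan
print(cheapest_path(bcd_paths, est_rows=1219, limit=200))   # Bitmap Heap Scan
```

In this model, the row estimate for the pattern is the deciding input. With the '%abc%' estimate of 123154 rows, the Seq Scan is expected to hit 200 matches almost immediately (Limit cost 351.78). With the '%bcd%' estimate of 1219 rows, the same scan would cost about 35540, far above the bitmap path's 764.83.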

How can I predict whether a query plan will use an index?

zabop

Asked: 2024-10-12 05:02:23 +0800 CST

Query plan seems to depend on the time elapsed between table setup and the query?

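I'm using PostgreSQL 17. As an example, I have a table table0 with a column col0 holding randomly generated strings, indexed with a GIN index. I create such a table with the following setup.sql: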

create table public.table0 (
    col0 varchar(25)
);

select setseed(0.12345);

insert into table0 (col0)
select substring(md5(random()::text), 1, (2 + (random() * 14))::int)
from generate_series(1, 12345678);

create extension pg_trgm;
create index col0_gin_trgm_idx on table0 using gin (col0 gin_trgm_ops);

vacuum (full, analyze) table0;

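Using select setseed(0.12345); ensures that the same table is created every time setup.sql is executed. I want to observe the execution plan of a simple partial string match query. For this I use query.sql: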

explain (analyze, buffers)
select * from table0 where col0 like '%abc%' limit 500;

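To my surprise, the execution plan of the query is not constant: it seems to depend on the time between table creation and query execution. To demonstrate this, I created the following bash script: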

#!/bin/bash

set -o errexit
set -o nounset
set -o pipefail

rm -f records.txt

for i in {1..9}; do

    docker run \
        --name postgres-db \
        --env POSTGRES_DB=postgres \
        --env POSTGRES_USER=postgres \
        --env POSTGRES_PASSWORD=mysecretpassword \
        --publish 5432:5432 \
        --detach postgres

    sleep 2 # wait for docker container to start

    PGPASSWORD=mysecretpassword psql \
        --host=localhost \
        --port=5432 \
        --username=postgres \
        --dbname=postgres \
        --set=ON_ERROR_STOP=1 \
        --file=setup.sql

    sleep $((RANDOM % 10))

    PGPASSWORD=mysecretpassword psql \
        --host=localhost \
        --port=5432 \
        --username=postgres \
        --dbname=postgres \
        --set=ON_ERROR_STOP=1 \
        --file=query.sql >> records.txt

    docker rm --force postgres-db

done

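This script sets up the table, waits a random 0 to 9 seconds, and then runs the query.sql above. It repeats this several times and appends the query.sql output to records.txt. Looking at records.txt, I found that the query is sometimes executed with a sequential scan and sometimes with an index scan. Here is a filtered version of records.txt (via cat records.txt | grep "\->"):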

   ->  Bitmap Heap Scan on table0  (cost=52.34..4394.93 rows=1219 width=10) (actual time=11.526..15.200 rows=500 loops=1)
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.04 rows=1219 width=0) (actual time=9.559..9.559 rows=20852 loops=1)
   ->  Seq Scan on table0  (cost=0.00..216612.01 rows=122742 width=9) (actual time=0.068..17.788 rows=500 loops=1)
   ->  Bitmap Heap Scan on table0  (cost=52.34..4394.93 rows=1219 width=9) (actual time=5.963..8.939 rows=500 loops=1)
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.04 rows=1219 width=0) (actual time=4.144..4.144 rows=20852 loops=1)
   ->  Bitmap Heap Scan on table0  (cost=52.32..4377.97 rows=1214 width=9) (actual time=7.447..11.406 rows=500 loops=1)
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.01 rows=1214 width=0) (actual time=5.594..5.594 rows=20852 loops=1)
   ->  Bitmap Heap Scan on table0  (cost=52.33..4384.75 rows=1216 width=9) (actual time=6.660..11.991 rows=500 loops=1)
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.02 rows=1216 width=0) (actual time=4.744..4.745 rows=20852 loops=1)
   ->  Bitmap Heap Scan on table0  (cost=52.34..4394.93 rows=1219 width=9) (actual time=9.153..13.563 rows=500 loops=1)
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.04 rows=1219 width=0) (actual time=7.141..7.141 rows=20852 loops=1)
   ->  Bitmap Heap Scan on table0  (cost=52.31..4374.58 rows=1213 width=10) (actual time=10.078..13.199 rows=500 loops=1)
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.01 rows=1213 width=0) (actual time=8.108..8.108 rows=20852 loops=1)
   ->  Bitmap Heap Scan on table0  (cost=52.33..4384.76 rows=1216 width=10) (actual time=7.322..12.073 rows=500 loops=1)
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.02 rows=1216 width=0) (actual time=5.526..5.527 rows=20852 loops=1)
   ->  Seq Scan on table0  (cost=0.00..216610.51 rows=245232 width=9) (actual time=0.073..23.047 rows=500 loops=1)
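To check whether the plan flip tracks the planner's row estimate rather than the elapsed time, it may help to record the estimate next to the chosen scan for every run. The following is a small Python sketch that extracts both from records.txt. The regex and the function name are hypothetical, and the sketch assumes the plan text format shown above.

```python
import re

# Matches the table-level scan directly under Limit in records.txt, e.g.
#   ->  Seq Scan on table0  (cost=0.00..216612.01 rows=122742 width=9) (actual ...)
# "Bitmap Index Scan" lines are deliberately not matched.
SCAN_RE = re.compile(
    r"->\s+(?P<node>Seq Scan|Bitmap Heap Scan) on (?P<rel>\w+)\s+"
    r"\(cost=\d+\.\d+\.\.\d+\.\d+ rows=(?P<rows>\d+)"
)


def scan_estimates(lines):
    """Return (scan node, planner's estimated rows) for every matching plan line."""
    found = []
    for line in lines:
        match = SCAN_RE.search(line)
        if match:
            found.append((match.group("node"), int(match.group("rows"))))
    return found


# Three lines copied from the filtered records.txt; in practice pass open("records.txt").
sample = [
    "   ->  Bitmap Heap Scan on table0  (cost=52.34..4394.93 rows=1219 width=10) (actual time=11.526..15.200 rows=500 loops=1)",
    "         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.04 rows=1219 width=0) (actual time=9.559..9.559 rows=20852 loops=1)",
    "   ->  Seq Scan on table0  (cost=0.00..216612.01 rows=122742 width=9) (actual time=0.068..17.788 rows=500 loops=1)",
]

for node, est_rows in scan_estimates(sample):
    print(f"{node:<17} estimated rows = {est_rows}")
```

Applied to the nine runs above, this sketch would show the following:

- Every Bitmap Heap Scan run has an estimate between 1213 and 1219 rows.
- The two Seq Scan runs have estimates of 122742 and 245232 rows.
- The Bitmap Index Scan returned the same 20852 rows in every run where it was used.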

The full records.txt is:

                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52.34..1833.55 rows=500 width=10) (actual time=11.527..15.249 rows=500 loops=1)
   Buffers: shared hit=4 read=435
   ->  Bitmap Heap Scan on table0  (cost=52.34..4394.93 rows=1219 width=10) (actual time=11.526..15.200 rows=500 loops=1)
         Recheck Cond: ((col0)::text ~~ '%abc%'::text)
         Heap Blocks: exact=428
         Buffers: shared hit=4 read=435
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.04 rows=1219 width=0) (actual time=9.559..9.559 rows=20852 loops=1)
               Index Cond: ((col0)::text ~~ '%abc%'::text)
               Buffers: shared hit=4 read=7
 Planning:
   Buffers: shared hit=51 read=13 dirtied=1
 Planning Time: 4.666 ms
 Execution Time: 15.657 ms
(13 rows)

                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..882.39 rows=500 width=9) (actual time=0.068..17.831 rows=500 loops=1)
   Buffers: shared read=1439
   ->  Seq Scan on table0  (cost=0.00..216612.01 rows=122742 width=9) (actual time=0.068..17.788 rows=500 loops=1)
         Filter: ((col0)::text ~~ '%abc%'::text)
         Rows Removed by Filter: 282649
         Buffers: shared read=1439
 Planning:
   Buffers: shared hit=55 read=9 dirtied=1
 Planning Time: 2.427 ms
 Execution Time: 17.918 ms
(10 rows)

                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52.34..1833.55 rows=500 width=9) (actual time=5.965..8.988 rows=500 loops=1)
   Buffers: shared hit=4 read=435
   ->  Bitmap Heap Scan on table0  (cost=52.34..4394.93 rows=1219 width=9) (actual time=5.963..8.939 rows=500 loops=1)
         Recheck Cond: ((col0)::text ~~ '%abc%'::text)
         Heap Blocks: exact=428
         Buffers: shared hit=4 read=435
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.04 rows=1219 width=0) (actual time=4.144..4.144 rows=20852 loops=1)
               Index Cond: ((col0)::text ~~ '%abc%'::text)
               Buffers: shared hit=4 read=7
 Planning:
   Buffers: shared hit=51 read=13 dirtied=1
 Planning Time: 2.427 ms
 Execution Time: 9.234 ms
(13 rows)

                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52.32..1833.89 rows=500 width=9) (actual time=7.448..11.454 rows=500 loops=1)
   Buffers: shared hit=4 read=435
   ->  Bitmap Heap Scan on table0  (cost=52.32..4377.97 rows=1214 width=9) (actual time=7.447..11.406 rows=500 loops=1)
         Recheck Cond: ((col0)::text ~~ '%abc%'::text)
         Heap Blocks: exact=428
         Buffers: shared hit=4 read=435
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.01 rows=1214 width=0) (actual time=5.594..5.594 rows=20852 loops=1)
               Index Cond: ((col0)::text ~~ '%abc%'::text)
               Buffers: shared hit=4 read=7
 Planning:
   Buffers: shared hit=81 read=13 dirtied=1
 Planning Time: 2.835 ms
 Execution Time: 11.721 ms
(13 rows)

                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52.33..1833.75 rows=500 width=9) (actual time=6.662..12.059 rows=500 loops=1)
   Buffers: shared hit=4 read=435
   ->  Bitmap Heap Scan on table0  (cost=52.33..4384.75 rows=1216 width=9) (actual time=6.660..11.991 rows=500 loops=1)
         Recheck Cond: ((col0)::text ~~ '%abc%'::text)
         Heap Blocks: exact=428
         Buffers: shared hit=4 read=435
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.02 rows=1216 width=0) (actual time=4.744..4.745 rows=20852 loops=1)
               Index Cond: ((col0)::text ~~ '%abc%'::text)
               Buffers: shared hit=4 read=7
 Planning:
   Buffers: shared hit=51 read=13 dirtied=1
 Planning Time: 3.332 ms
 Execution Time: 12.511 ms
(13 rows)

                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52.34..1833.55 rows=500 width=9) (actual time=9.154..13.621 rows=500 loops=1)
   Buffers: shared hit=4 read=435
   ->  Bitmap Heap Scan on table0  (cost=52.34..4394.93 rows=1219 width=9) (actual time=9.153..13.563 rows=500 loops=1)
         Recheck Cond: ((col0)::text ~~ '%abc%'::text)
         Heap Blocks: exact=428
         Buffers: shared hit=4 read=435
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.04 rows=1219 width=0) (actual time=7.141..7.141 rows=20852 loops=1)
               Index Cond: ((col0)::text ~~ '%abc%'::text)
               Buffers: shared hit=4 read=7
 Planning:
   Buffers: shared hit=51 read=13 dirtied=1
 Planning Time: 4.113 ms
 Execution Time: 14.018 ms
(13 rows)

                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52.31..1833.95 rows=500 width=10) (actual time=10.079..13.249 rows=500 loops=1)
   Buffers: shared hit=4 read=435
   ->  Bitmap Heap Scan on table0  (cost=52.31..4374.58 rows=1213 width=10) (actual time=10.078..13.199 rows=500 loops=1)
         Recheck Cond: ((col0)::text ~~ '%abc%'::text)
         Heap Blocks: exact=428
         Buffers: shared hit=4 read=435
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.01 rows=1213 width=0) (actual time=8.108..8.108 rows=20852 loops=1)
               Index Cond: ((col0)::text ~~ '%abc%'::text)
               Buffers: shared hit=4 read=7
 Planning:
   Buffers: shared hit=51 read=13 dirtied=1
 Planning Time: 3.596 ms
 Execution Time: 13.682 ms
(13 rows)

                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52.33..1833.75 rows=500 width=10) (actual time=7.323..12.126 rows=500 loops=1)
   Buffers: shared hit=4 read=435
   ->  Bitmap Heap Scan on table0  (cost=52.33..4384.76 rows=1216 width=10) (actual time=7.322..12.073 rows=500 loops=1)
         Recheck Cond: ((col0)::text ~~ '%abc%'::text)
         Heap Blocks: exact=428
         Buffers: shared hit=4 read=435
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.02 rows=1216 width=0) (actual time=5.526..5.527 rows=20852 loops=1)
               Index Cond: ((col0)::text ~~ '%abc%'::text)
               Buffers: shared hit=4 read=7
 Planning:
   Buffers: shared hit=51 read=13 dirtied=1
 Planning Time: 5.907 ms
 Execution Time: 12.485 ms
(13 rows)

                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..441.64 rows=500 width=9) (actual time=0.074..23.096 rows=500 loops=1)
   Buffers: shared read=1439
   ->  Seq Scan on table0  (cost=0.00..216610.51 rows=245232 width=9) (actual time=0.073..23.047 rows=500 loops=1)
         Filter: ((col0)::text ~~ '%abc%'::text)
         Rows Removed by Filter: 282649
         Buffers: shared read=1439
 Planning:
   Buffers: shared hit=51 read=13 dirtied=1
 Planning Time: 2.087 ms
 Execution Time: 23.218 ms
(10 rows)

I used dockerised Postgres 17 for easy reproducibility.

After restarting the docker container and running setup.sql, I ran:

explain (analyze, buffers, settings)
select * from table0 where col0 like '%abc%' limit 500;

                                                               QUERY PLAN                                                               
----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=52.32..1833.89 rows=500 width=9) (actual time=9.476..17.923 rows=500 loops=1)
   Buffers: shared hit=4 read=435
   ->  Bitmap Heap Scan on table0  (cost=52.32..4377.97 rows=1214 width=9) (actual time=9.475..17.858 rows=500 loops=1)
         Recheck Cond: ((col0)::text ~~ '%abc%'::text)
         Heap Blocks: exact=428
         Buffers: shared hit=4 read=435
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..52.01 rows=1214 width=0) (actual time=7.662..7.662 rows=20852 loops=1)
               Index Cond: ((col0)::text ~~ '%abc%'::text)
               Buffers: shared hit=4 read=7
 Planning:
   Buffers: shared hit=55 read=9 dirtied=1
 Planning Time: 5.597 ms
 Execution Time: 18.334 ms
(13 rows)

However, I don't understand why a sequential scan is used sometimes and an index scan at other times.

Why does the query plan seem to depend on the time between table setup and the query?

Below I answer some questions that came up in the comments.

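Is there any other write activity on the database?
I did not start any write activity. Since this is just a container running on my local machine, there should be no write activity unless some automated process I'm not aware of is running.

How busy is your server?
I didn't notice anything unusual. It's an Apple M1 Pro running macOS Sonoma 14.5.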

zabop

Asked: 2024-10-11 17:17:17 +0800 CST

What should I do to make returning 200 rows at least as fast as returning 1000 rows in a partial-match query on an indexed column?

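I have a PostgreSQL 17 table table0. It has a column col0 containing random strings of 2 to 16 characters. I want to run partial-match queries, so I created a GIN index with gin_trgm_ops. To my surprise, selecting up to 1000 rows containing abc is much faster than selecting up to 200 rows containing abc. Reproducible setup: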

Start the database with Docker:

docker run \
    --name postgres-db \
    -e POSTGRES_DB=postgres \
    -e POSTGRES_USER=postgres \
    -e POSTGRES_PASSWORD=mysecretpassword \
    -p 5432:5432 \
    -d postgres

I run the queries in DBeaver, or save them to query.sql and then run:

PGPASSWORD=mysecretpassword psql \
    -h localhost \
    -p 5432 \
    -U postgres \
    -d postgres \
    -v ON_ERROR_STOP=1 \
    -f query.sql

Set up the table:

create table public.table0 (
    col0 varchar(25)
);

select setseed(0.12343);

insert into table0 (col0)
select substring(md5(random()::text), 1, (2 + (random() * 14))::int)
from generate_series(1, 12345678);

create extension pg_trgm;
create index col0_gin_trgm_idx on table0 using gin (col0 gin_trgm_ops);

vacuum (full, analyze) table0;

Check the execution plan and run time for selecting 200 rows containing abc:

explain analyze
select * from table0 where col0 like '%abc%' limit 200;

Output:

                                                    QUERY PLAN                                                     
-------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..352.27 rows=200 width=9) (actual time=0.672..49.646 rows=200 loops=1)
   ->  Seq Scan on table0  (cost=0.00..216621.29 rows=122985 width=9) (actual time=0.671..49.599 rows=200 loops=1)
         Filter: ((col0)::text ~~ '%abc%'::text)
         Rows Removed by Filter: 114081
 Planning Time: 4.540 ms
 Execution Time: 50.960 ms
(6 rows)

Check the execution plan and run time for selecting up to 1000 rows containing abc:

explain analyze
select * from table0 where col0 like '%abc%' limit 1000;

Output:

                                                                 QUERY PLAN                                                                  
---------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=848.36..1369.03 rows=1000 width=9) (actual time=17.373..26.987 rows=1000 loops=1)
   ->  Bitmap Heap Scan on table0  (cost=848.36..64883.21 rows=122985 width=9) (actual time=17.371..26.931 rows=1000 loops=1)
         Recheck Cond: ((col0)::text ~~ '%abc%'::text)
         Heap Blocks: exact=846
         ->  Bitmap Index Scan on col0_gin_trgm_idx  (cost=0.00..817.62 rows=122985 width=0) (actual time=14.689..14.690 rows=21318 loops=1)
               Index Cond: ((col0)::text ~~ '%abc%'::text)
 Planning Time: 2.165 ms
 Execution Time: 27.356 ms
(8 rows)

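As can be seen, with LIMIT 200 the engine performs a Seq Scan on table0, but with LIMIT 1000 it uses a Bitmap Index Scan on col0_gin_trgm_idx. On the surface, this explains the timings. The LIMIT 200 query took 4.540 + 50.960 = 55.5 ms, while the LIMIT 1000 query took less: 2.165 + 27.356 = 29.521 ms.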
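Below is a rough sketch of the arithmetic behind the switch. It assumes the planner costs a Limit node as the child's startup cost plus the fraction LIMIT / estimated_rows of its remaining cost. This reproduces both Limit costs shown above (352.27 and 1369.03). Both plans carry the same estimate of 122985 matching rows. The LIMIT 200 plan does not show what the bitmap path would have cost, so its costs are taken from the LIMIT 1000 plan. This assumes the scan path's own cost does not depend on the LIMIT. The names SEQ, BITMAP, limit_cost and crossover_limit are hypothetical.

```python
# Costs copied from the two plans above; both use the same row estimate for '%abc%'.
EST_ROWS = 122985
SEQ = (0.00, 216621.29)       # Seq Scan on table0 (startup, total), LIMIT 200 plan
BITMAP = (848.36, 64883.21)   # Bitmap Heap Scan on table0 (startup, total), LIMIT 1000 plan


def limit_cost(path, limit, est_rows=EST_ROWS):
    """Assumed Limit cost: startup + (total - startup) * min(1, limit / est_rows)."""
    startup, total = path
    return startup + (total - startup) * min(1.0, limit / est_rows)


def crossover_limit(a, b, est_rows=EST_ROWS):
    """LIMIT at which paths a and b get equal Limit costs.

    Solves a_start + a_run * L / N == b_start + b_run * L / N for L
    (meaningful while L <= N and the two run costs differ).
    """
    a_run = a[1] - a[0]
    b_run = b[1] - b[0]
    return (b[0] - a[0]) * est_rows / (a_run - b_run)


for limit in (200, 1000):
    seq, bitmap = limit_cost(SEQ, limit), limit_cost(BITMAP, limit)
    winner = "Seq Scan" if seq < bitmap else "Bitmap Heap Scan"
    print(limit, round(seq, 2), round(bitmap, 2), winner)
# 200 352.27 952.49 Seq Scan
# 1000 1761.36 1369.03 Bitmap Heap Scan

print(round(crossover_limit(SEQ, BITMAP)))  # 684
```

In this model, any LIMIT below roughly 684 favors the Seq Scan. The estimate of 122985 matches is what makes the Seq Scan look cheap for small limits. In the LIMIT 200 run, the scan actually discarded 114081 rows before finding 200. The Bitmap Index Scan produced 21318 candidate rows, far fewer than the 122985 estimated.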

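I have read (see this or this) that ideally I should not force the use of an index in production. Naively, finding 200 rows containing abc via the index should be faster than the sequential scan currently used. After all, finding 1000 rows via the index is faster than finding 200 rows with a sequential scan.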

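In my real scenario (an Aurora PostgreSQL instance running on AWS RDS), this difference is a real limitation. When I need to select 25 rows from the table, it is much faster to select 100 rows and then filter those 100 rows down by other means. Alternatively, I could change the application so that selecting 100 rows instead of 25 is acceptable.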

I wonder whether I'm doing something suboptimal with the index, or whether I'm missing something.

What should I do to make the LIMIT 200 query at least as fast as the LIMIT 1000 query?

I'm mainly interested in approaches that are recommended for production. It is safe to assume that the contents of table0 never need to be edited.

kadircancetin

Asked: 2024-10-10 17:34:28 +0800 CST

WHERE A=x DISTINCT ON (B), with a composite index on (A, B, C)

I have a huge table with a composite index on (A, B, C).

-- psql (13.16 (Debian 13.16-0+deb11u1), server 14.12)

\d index_a_b_c
         Index "public.index_a_b_c"
  Column  |         Type          | Key? | 
----------+-----------------------+------+
 A        | character varying(44) | yes  |
 B        | numeric(20,0)         | yes  |
 C        | numeric(20,0)         | yes  |
btree, for table "public.table_a_b_c"

1) I need all distinct `B`s.

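This query runs with an Index Only Scan, but it scans all entries matching A. That doesn't work in my case, because for some As there are millions of rows, and an Index Only Scan over millions of rows is slow.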

EXPLAIN (ANALYZE true) 
SELECT DISTINCT ON ("B") "B"
  FROM "table_a_b_c"
 WHERE "A" = 'astring'

-- Execution time: 0.172993s
-- Unique  (cost=0.83..105067.18 rows=1123 width=5) (actual time=0.037..19.468 rows=67 loops=1)
--  ->  Index Only Scan using index_a_b_c on table_a_b_c  (cost=0.83..104684.36 rows=153129 width=5) (actual time=0.036..19.209 rows=1702 loops=1)
--        Index Cond: (A = 'astring'::text)
--        Heap Fetches: 351
-- Planning Time: 0.091 ms
-- Execution Time: 19.499 ms

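As you can see, it goes through 1.7k rows and filters them down to the 67 rows it returns. Scaled from 1.7k rows to millions, those 20 ms become tens of seconds.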

2) I also need the largest `C` for each distinct `B`.

Same as 1). In theory, Postgres could know the possible Bs without going through the entire list of entries matching A.

EXPLAIN (ANALYZE true)
SELECT DISTINCT ON ("B") *
  FROM "table_a_b_c"
 WHERE "A" = 'astring'
 ORDER BY "B" DESC,
          "C" DESC

-- Execution time: 0.822705s 
-- Unique  (cost=0.83..621264.51 rows=1123 width=247) (actual time=0.957..665.927 rows=67 loops=1)
--   ->  Index Scan using index_a_b_c on table_a_b_c  (cost=0.83..620881.69 rows=153130 width=247) (actual time=0.955..664.408 rows=1702 loops=1)
--         Index Cond: (a = 'astring'::text)
-- Planning Time: 0.116 ms
-- Execution Time: 665.978 ms

But this, for example, is fast:

(SELECT * FROM "table_a_b_c" WHERE "A" = 'x' AND "B" = 1 ORDER BY "C" DESC LIMIT 1)
  UNION
(SELECT * FROM "table_a_b_c" WHERE "A" = 'x' AND "B" = 2 ORDER BY "C" DESC LIMIT 1)
  UNION
....

for all possible Bs. It's like a loop with one iteration per B.
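The intuition in 1), 2) and the UNION workaround is to jump straight from one B to the next within a given A instead of visiting every index entry. This can be illustrated with a toy model. A sorted Python list of (A, B, C) tuples stands in for the B-tree, and binary search stands in for an index descent. This is a hypothetical sketch of the access pattern, not PostgreSQL code, and the function name is made up.

```python
from bisect import bisect_left
from math import inf


def distinct_b_with_max_c(index, a):
    """Toy 'skip scan' over a sorted list of (A, B, C) tuples.

    For one value of A, jump from each B to the next with a binary search
    and return {B: largest C}. The number of searches is one per distinct B
    (plus one to find A), not one per row.
    """
    result = {}
    pos = bisect_left(index, (a,))  # first entry whose A is >= a
    while pos < len(index) and index[pos][0] == a:
        b = index[pos][1]
        # (a, b, inf) sorts after every (a, b, c), so this is the first entry past this B.
        nxt = bisect_left(index, (a, b, inf))
        result[b] = index[nxt - 1][2]  # entries are sorted by C, so the last one has the max C
        pos = nxt  # skip the remaining (a, b, *) entries entirely
    return result


# Hypothetical data: three Bs under "astring", each with many C values.
rows = [("astring", b, c) for b in (1, 2, 3) for c in range(b * 1000)]
rows.append(("bstring", 7, 1))
index = sorted(rows)

print(distinct_b_with_max_c(index, "astring"))  # {1: 999, 2: 1999, 3: 2999}
```

Each loop iteration costs one binary search. The work therefore grows with the number of distinct Bs (67 in the plan above) rather than with the number of index entries for A (1702 there, millions in the worst case). This is roughly the per-B probing that the UNION query performs by hand.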

Questions

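a) In theory, shouldn't an index on (A, B, C) be a superset of an index on (A, B)? An index on (A, B) would be very fast for a distinct query.

b) Why is it hard for Postgres to find the distinct Bs?

c) How can this be handled without a new index?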

mtbossa

Asked: 2024-10-07 20:30:50 +0800 CST

Same query, same data: the query planner estimates a cost of 50000 on one server and 100 on the other

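I'm running two Postgres 15 RDS instances in AWS (PostgreSQL 15.7 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 7.3.1 20180712 (Red Hat 7.3.1-12), 64-bit), one for my staging environment and one for production. We're running a query that takes much longer in production than in staging. Production has even less data than staging, at least in the selected/joined tables. Production is also barely used. We're in an early testing phase, so essentially only one person uses it in the evenings for testing.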

Here is the query:

SELECT arenas.id,
       arenas.display_name,
       arenas.cover_image_path,
       arenas.slug,
       addresses.zip_code,
       addresses.street,
       addresses.number,
       addresses.complement,
       addresses.district,
       addresses.latitude,
       addresses.longitude,
       cities.NAME                                       AS city_name,
       states.NAME                                       AS state_name,
       states.uf                                         AS state_uf,
       Array_to_json(Array_agg(DISTINCT ss2.sport_code)) AS available_sports,
       Earth_distance(Ll_to_earth (addresses.latitude, addresses.longitude),
       Ll_to_earth (-10.5555, -41.2751))                 AS
       meters_distance_between_user_and_arena
FROM   "arenas"
       INNER JOIN "addresses"
               ON "arenas"."id" = "addresses"."addressable_id"
                  AND "addresses"."addressable_type" = 'App\Models\Arena'
                  AND "addresses"."type" = 'COMMERCIAL'
                  AND Earth_distance(Ll_to_earth (addresses.latitude,
                                     addresses.longitude),
                          Ll_to_earth (-10.5555, -41.2751)) < 20000
       INNER JOIN "services"
               ON "services"."arena_id" = "arenas"."id"
                  AND "services"."status" = 'A'
                  AND "services"."deleted_at" IS NULL
                  AND "is_private" = false
       INNER JOIN "service_sport"
               ON "service_sport"."service_id" = "services"."id"
                  AND "service_sport"."sport_code" = 'BEACH_TENNIS'
       INNER JOIN "service_prices"
               ON "service_prices"."service_id" = "services"."id"
                  AND "service_prices"."is_default" = true
       INNER JOIN "field_service"
               ON "field_service"."service_id" = "services"."id"
       INNER JOIN "fields"
               ON "fields"."arena_id" = "arenas"."id"
                  AND "fields"."status" = 'A'
                  AND "fields"."deleted_at" IS NULL
       INNER JOIN "contacts"
               ON "contacts"."contactable_id" = "arenas"."id"
                  AND "contacts"."contactable_type" = 'App\Models\Arena'
                  AND "contacts"."is_main" = true
       INNER JOIN "field_time_slots"
               ON "field_time_slots"."arena_id" = "arenas"."id"
       INNER JOIN "cities"
               ON "cities"."ibge_code" = "addresses"."city_ibge_code"
       INNER JOIN "states"
               ON "states"."ibge_code" = "cities"."state_ibge_code"
       INNER JOIN "sports"
               ON "sports"."code" = "service_sport"."sport_code"
       INNER JOIN "service_sport" AS "ss2"
               ON "ss2"."arena_id" = "arenas"."id"
WHERE  "approved_at" IS NOT NULL
       AND EXISTS (SELECT *
                   FROM   "subscriptions"
                   WHERE  "arenas"."id" = "subscriptions"."arena_id"
                          AND "type" = 'access'
                          AND ( "ends_at" IS NULL
                                 OR ( "ends_at" IS NOT NULL
                                      AND "ends_at" > '2024-10-06 01:31:18' ) )
                          AND "stripe_status" != 'incomplete_expired'
                          AND "stripe_status" != 'unpaid'
                          AND "stripe_status" != 'past_due'
                          AND "stripe_status" != 'incomplete')
       AND "business_hours_data" IS NOT NULL
       AND "arenas"."deleted_at" IS NULL
GROUP  BY "arenas"."id",
          "arenas"."cover_image_path",
          "addresses"."latitude",
          "addresses"."longitude",
          "addresses"."zip_code",
          "addresses"."street",
          "addresses"."number",
          "addresses"."complement",
          "addresses"."district",
          "cities"."name",
          "states"."name",
          "states"."uf"
ORDER  BY "meters_distance_between_user_and_arena" ASC;

Here is the EXPLAIN ANALYZE output from production:

Sort  (cost=55657.12..55795.57 rows=55380 width=315) (actual time=563.084..563.104 rows=1 loops=1)
  Sort Key: (sec_to_gc(cube_distance((ll_to_earth((addresses.latitude)::double precision, (addresses.longitude)::double precision))::cube, '(3491544.0649759113, -4339378.172513269, -3108045.069568795)'::cube)))
  Sort Method: quicksort  Memory: 25kB
  ->  GroupAggregate  (cost=12417.08..43152.98 rows=55380 width=315) (actual time=563.077..563.097 rows=1 loops=1)
        Group Key: arenas.id, addresses.latitude, addresses.longitude, addresses.zip_code, addresses.street, addresses.number, addresses.complement, addresses.district, cities.name, states.name, states.uf
        ->  Sort  (cost=12417.08..12555.53 rows=55380 width=286) (actual time=222.049..445.141 rows=102240 loops=1)
              Sort Key: arenas.id, addresses.latitude, addresses.longitude, addresses.zip_code, addresses.street, addresses.number, addresses.complement, addresses.district, cities.name, states.name, states.uf
              Sort Method: external merge  Disk: 28144kB
              ->  Hash Join  (cost=17.39..668.95 rows=55380 width=286) (actual time=0.709..15.847 rows=102240 loops=1)
                    Hash Cond: (arenas.id = field_time_slots.arena_id)
                    ->  Hash Join  (cost=9.60..37.48 rows=260 width=382) (actual time=0.604..1.425 rows=480 loops=1)
                          Hash Cond: (arenas.id = ss2.arena_id)
                          ->  Nested Loop  (cost=7.39..32.21 rows=52 width=339) (actual time=0.523..1.121 rows=96 loops=1)
                                ->  Seq Scan on sports  (cost=0.00..1.16 rows=1 width=9) (actual time=0.006..0.009 rows=1 loops=1)
                                      Filter: (code = 'BEACH_TENNIS'::text)
                                      Rows Removed by Filter: 12
                                ->  Hash Join  (cost=7.39..30.53 rows=52 width=350) (actual time=0.515..1.070 rows=96 loops=1)
                                      Hash Cond: (cities.state_ibge_code = states.ibge_code)
                                      ->  Nested Loop  (cost=5.78..28.77 rows=52 width=340) (actual time=0.488..0.953 rows=96 loops=1)
                                            ->  Nested Loop  (cost=5.49..11.03 rows=52 width=332) (actual time=0.466..0.640 rows=96 loops=1)
                                                  Join Filter: (arenas.id = services.arena_id)
                                                  ->  Nested Loop  (cost=2.05..4.72 rows=4 width=305) (actual time=0.422..0.444 rows=4 loops=1)
                                                        Join Filter: (arenas.id = fields.arena_id)
                                                        ->  Nested Loop  (cost=2.05..3.62 rows=1 width=289) (actual time=0.412..0.417 rows=1 loops=1)
                                                              Join Filter: (arenas.id = addresses.addressable_id)
                                                              ->  Merge Join  (cost=2.05..2.08 rows=1 width=174) (actual time=0.023..0.027 rows=1 loops=1)
                                                                    Merge Cond: (arenas.id = contacts.contactable_id)
                                                                    ->  Sort  (cost=1.02..1.02 rows=1 width=158) (actual time=0.012..0.013 rows=1 loops=1)
                                                                          Sort Key: arenas.id
                                                                          Sort Method: quicksort  Memory: 25kB
                                                                          ->  Seq Scan on arenas  (cost=0.00..1.01 rows=1 width=158) (actual time=0.006..0.006 rows=1 loops=1)
                                                                                Filter: ((approved_at IS NOT NULL) AND (business_hours_data IS NOT NULL) AND (deleted_at IS NULL))
                                                                    ->  Sort  (cost=1.03..1.04 rows=1 width=16) (actual time=0.008..0.009 rows=1 loops=1)
                                                                          Sort Key: contacts.contactable_id
                                                                          Sort Method: quicksort  Memory: 25kB
                                                                          ->  Seq Scan on contacts  (cost=0.00..1.02 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=1)
                                                                                Filter: (is_main AND ((contactable_type)::text = 'App\Models\Arena'::text))
                                                                                Rows Removed by Filter: 1
                                                              ->  Seq Scan on addresses  (cost=0.00..1.52 rows=1 width=115) (actual time=0.386..0.387 rows=1 loops=1)
                                                                    Filter: (((addressable_type)::text = 'App\Models\Arena'::text) AND ((type)::text = 'COMMERCIAL'::text) AND (sec_to_gc(cube_distance((ll_to_earth((latitude)::double precision, (longitude)::double precision))::cube, '(3491544.0649759113, -4339378.172513269, -3108045.069568795)'::cube)) < '20000'::double precision))
                                                        ->  Seq Scan on fields  (cost=0.00..1.05 rows=4 width=16) (actual time=0.009..0.020 rows=4 loops=1)
                                                              Filter: ((deleted_at IS NULL) AND ((status)::text = 'A'::text))
                                                  ->  Materialize  (cost=3.43..5.57 rows=13 width=27) (actual time=0.011..0.035 rows=24 loops=4)
                                                        ->  Hash Join  (cost=3.43..5.50 rows=13 width=27) (actual time=0.041..0.092 rows=24 loops=1)
                                                              Hash Cond: (service_prices.service_id = service_sport.service_id)
                                                              ->  Seq Scan on service_prices  (cost=0.00..1.75 rows=36 width=8) (actual time=0.006..0.032 rows=36 loops=1)
                                                                    Filter: is_default
                                                                    Rows Removed by Filter: 39
                                                              ->  Hash  (cost=3.41..3.41 rows=2 width=51) (actual time=0.030..0.036 rows=3 loops=1)
                                                                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                                    ->  Hash Join  (cost=2.20..3.41 rows=2 width=51) (actual time=0.025..0.034 rows=3 loops=1)
                                                                          Hash Cond: (service_sport.service_id = services.id)
                                                                          ->  Hash Join  (cost=1.07..2.27 rows=3 width=27) (actual time=0.014..0.019 rows=3 loops=1)
                                                                                Hash Cond: (field_service.service_id = service_sport.service_id)
                                                                                ->  Seq Scan on field_service  (cost=0.00..1.13 rows=13 width=8) (actual time=0.003..0.004 rows=13 loops=1)
                                                                                ->  Hash  (cost=1.06..1.06 rows=1 width=19) (actual time=0.005..0.006 rows=1 loops=1)
                                                                                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                                                      ->  Seq Scan on service_sport  (cost=0.00..1.06 rows=1 width=19) (actual time=0.003..0.004 rows=1 loops=1)
                                                                                            Filter: (sport_code = 'BEACH_TENNIS'::text)
                                                                                            Rows Removed by Filter: 4
                                                                          ->  Hash  (cost=1.07..1.07 rows=4 width=24) (actual time=0.008..0.009 rows=4 loops=1)
                                                                                Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                                                ->  Seq Scan on services  (cost=0.00..1.07 rows=4 width=24) (actual time=0.004..0.006 rows=4 loops=1)
                                                                                      Filter: ((deleted_at IS NULL) AND (NOT is_private) AND ((status)::text = 'A'::text))
                                                                                      Rows Removed by Filter: 2
                                            ->  Memoize  (cost=0.29..8.31 rows=1 width=24) (actual time=0.001..0.001 rows=1 loops=96)
                                                  Cache Key: addresses.city_ibge_code
                                                  Cache Mode: logical
                                                  Hits: 95  Misses: 1  Evictions: 0  Overflows: 0  Memory Usage: 1kB
                                                  ->  Index Scan using cities_ibge_code_unique on cities  (cost=0.28..8.30 rows=1 width=24) (actual time=0.016..0.016 rows=1 loops=1)
                                                        Index Cond: (ibge_code = addresses.city_ibge_code)
                                      ->  Hash  (cost=1.27..1.27 rows=27 width=16) (actual time=0.020..0.020 rows=27 loops=1)
                                            Buckets: 1024  Batches: 1  Memory Usage: 10kB
                                            ->  Seq Scan on states  (cost=0.00..1.27 rows=27 width=16) (actual time=0.006..0.010 rows=27 loops=1)
                          ->  Hash  (cost=2.15..2.15 rows=5 width=43) (actual time=0.076..0.078 rows=5 loops=1)
                                Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                ->  Nested Loop  (cost=1.03..2.15 rows=5 width=43) (actual time=0.070..0.074 rows=5 loops=1)
                                      Join Filter: (ss2.arena_id = subscriptions.arena_id)
                                      ->  HashAggregate  (cost=1.03..1.04 rows=1 width=16) (actual time=0.060..0.061 rows=1 loops=1)
                                            Group Key: subscriptions.arena_id
                                            Batches: 1  Memory Usage: 24kB
                                            ->  Seq Scan on subscriptions  (cost=0.00..1.02 rows=1 width=16) (actual time=0.008..0.009 rows=1 loops=1)
                                                  Filter: (((ends_at IS NULL) OR ((ends_at IS NOT NULL) AND (ends_at > '2024-10-06 01:31:18'::timestamp without time zone))) AND ((stripe_status)::text <> 'incomplete_expired'::text) AND ((stripe_status)::text <> 'unpaid'::text) AND ((stripe_status)::text <> 'past_due'::text) AND ((stripe_status)::text <> 'incomplete'::text) AND ((type)::text = 'access'::text))
                                      ->  Seq Scan on service_sport ss2  (cost=0.00..1.05 rows=5 width=27) (actual time=0.006..0.007 rows=5 loops=1)
                    ->  Hash  (cost=5.13..5.13 rows=213 width=16) (actual time=0.098..0.099 rows=213 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 18kB
                          ->  Seq Scan on field_time_slots  (cost=0.00..5.13 rows=213 width=16) (actual time=0.023..0.055 rows=213 loops=1)
Planning Time: 8.780 ms
Execution Time: 568.033 ms

This is the EXPLAIN ANALYSE from the staging environment:

Sort  (cost=102.30..102.31 rows=2 width=405) (actual time=85.416..85.430 rows=1 loops=1)
  Sort Key: (sec_to_gc(cube_distance((ll_to_earth((addresses.latitude)::double precision, (addresses.longitude)::double precision))::cube, '(3491544.0649759113, -4339378.172513269, -3108045.069568795)'::cube)))
  Sort Method: quicksort  Memory: 25kB
  ->  GroupAggregate  (cost=101.18..102.29 rows=2 width=405) (actual time=85.406..85.420 rows=1 loops=1)
        Group Key: arenas.id, addresses.latitude, addresses.longitude, addresses.zip_code, addresses.street, addresses.number, addresses.complement, addresses.district, cities.name, states.name, states.uf
        ->  Sort  (cost=101.18..101.19 rows=2 width=397) (actual time=65.212..66.575 rows=10800 loops=1)
              Sort Key: arenas.id, addresses.latitude, addresses.longitude, addresses.zip_code, addresses.street, addresses.number, addresses.complement, addresses.district, cities.name, states.name, states.uf
              Sort Method: quicksort  Memory: 3448kB
              ->  Nested Loop  (cost=72.78..101.17 rows=2 width=397) (actual time=34.249..43.485 rows=10800 loops=1)
                    ->  Index Only Scan using sports_pkey on sports  (cost=0.15..8.17 rows=1 width=32) (actual time=0.019..0.024 rows=1 loops=1)
                          Index Cond: (code = 'BEACH_TENNIS'::text)
                          Heap Fetches: 1
                    ->  Hash Join  (cost=72.63..92.98 rows=2 width=429) (actual time=34.228..41.163 rows=10800 loops=1)
                          Hash Cond: (ss2.arena_id = arenas.id)
                          ->  Seq Scan on service_sport ss2  (cost=0.00..17.50 rows=750 width=48) (actual time=0.004..0.011 rows=5 loops=1)
                          ->  Hash  (cost=72.62..72.62 rows=1 width=493) (actual time=34.210..34.221 rows=2700 loops=1)
                                Buckets: 4096 (originally 1024)  Batches: 1 (originally 1)  Memory Usage: 1053kB
                                ->  Nested Loop  (cost=50.45..72.62 rows=1 width=493) (actual time=0.641..31.447 rows=2700 loops=1)
                                      ->  Nested Loop  (cost=50.30..72.44 rows=1 width=452) (actual time=0.629..23.566 rows=2700 loops=1)
                                            ->  Nested Loop Semi Join  (cost=50.01..64.14 rows=1 width=468) (actual time=0.617..7.491 rows=2700 loops=1)
                                                  Join Filter: (arenas.id = subscriptions.arena_id)
                                                  ->  Hash Join  (cost=49.88..61.77 rows=8 width=452) (actual time=0.605..2.421 rows=2700 loops=1)
                                                        Hash Cond: (field_time_slots.arena_id = arenas.id)
                                                        ->  Seq Scan on field_time_slots  (cost=0.00..10.23 rows=423 width=16) (actual time=0.004..0.047 rows=423 loops=1)
                                                        ->  Hash  (cost=49.86..49.86 rows=1 width=436) (actual time=0.589..0.596 rows=18 loops=1)
                                                              Buckets: 1024  Batches: 1  Memory Usage: 14kB
                                                              ->  Nested Loop  (cost=9.47..49.86 rows=1 width=436) (actual time=0.317..0.582 rows=18 loops=1)
                                                                    Join Filter: (arenas.id = contacts.contactable_id)
                                                                    Rows Removed by Join Filter: 18
                                                                    ->  Nested Loop  (cost=9.47..48.81 rows=1 width=420) (actual time=0.312..0.543 rows=18 loops=1)
                                                                          Join Filter: (services.id = service_sport.service_id)
                                                                          ->  Nested Loop  (cost=9.32..48.58 rows=1 width=412) (actual time=0.299..0.474 rows=58 loops=1)
                                                                                Join Filter: (services.id = field_service.service_id)
                                                                                ->  Nested Loop  (cost=0.57..34.38 rows=1 width=404) (actual time=0.289..0.377 rows=34 loops=1)
                                                                                      Join Filter: (arenas.id = fields.arena_id)
                                                                                      Rows Removed by Join Filter: 30
                                                                                      ->  Nested Loop  (cost=0.42..26.21 rows=1 width=388) (actual time=0.279..0.335 rows=16 loops=1)
                                                                                            Join Filter: (services.id = service_prices.service_id)
                                                                                            Rows Removed by Join Filter: 76
                                                                                            ->  Nested Loop  (cost=0.42..25.01 rows=1 width=380) (actual time=0.272..0.305 rows=4 loops=1)
                                                                                                  Join Filter: (addresses.addressable_id = arenas.id)
                                                                                                  ->  Nested Loop  (cost=0.28..16.84 rows=1 width=268) (actual time=0.262..0.288 rows=4 loops=1)
                                                                                                        Join Filter: (addresses.addressable_id = services.arena_id)
                                                                                                        Rows Removed by Join Filter: 4
                                                                                                        ->  Index Scan using addresses_addressable_type_addressable_id_index on addresses  (cost=0.14..8.67 rows=1 width=244) (actual time=0.254..0.272 rows=2 loops=1)
                                                                                                              Index Cond: ((addressable_type)::text = 'App\Models\Arena'::text)
                                                                                                              Filter: (((type)::text = 'COMMERCIAL'::text) AND (sec_to_gc(cube_distance((ll_to_earth((latitude)::double precision, (longitude)::double precision))::cube, '(3491544.0649759113, -4339378.172513269, -3108045.069568795)'::cube)) < '20000'::double precision))
                                                                                                        ->  Index Scan using services_arena_id_name_deleted_at_unique on services  (cost=0.14..8.16 rows=1 width=24) (actual time=0.004..0.006 rows=4 loops=2)
                                                                                                              Filter: ((NOT is_private) AND ((status)::text = 'A'::text))
                                                                                                              Rows Removed by Filter: 1
                                                                                                  ->  Index Scan using arenas_pkey on arenas  (cost=0.14..8.16 rows=1 width=112) (actual time=0.003..0.003 rows=1 loops=4)
                                                                                                        Index Cond: (id = services.arena_id)
                                                                                                        Filter: ((approved_at IS NOT NULL) AND (business_hours_data IS NOT NULL) AND (deleted_at IS NULL))
                                                                                            ->  Seq Scan on service_prices  (cost=0.00..1.12 rows=6 width=8) (actual time=0.002..0.004 rows=23 loops=4)
                                                                                                  Filter: is_default
                                                                                      ->  Index Scan using fields_arena_id_name_deleted_at_unique on fields  (cost=0.14..8.16 rows=1 width=16) (actual time=0.001..0.002 rows=4 loops=16)
                                                                                            Filter: ((status)::text = 'A'::text)
                                                                                ->  Bitmap Heap Scan on field_service  (cost=8.76..14.14 rows=5 width=8) (actual time=0.001..0.001 rows=2 loops=34)
                                                                                      Recheck Cond: (service_id = service_prices.service_id)
                                                                                      Heap Blocks: exact=34
                                                                                      ->  Bitmap Index Scan on field_service_arena_id_service_id_field_id_unique  (cost=0.00..8.76 rows=5 width=0) (actual time=0.001..0.001 rows=2 loops=34)
                                                                                            Index Cond: (service_id = service_prices.service_id)
                                                                          ->  Index Only Scan using service_sport_service_id_sport_code_unique on service_sport  (cost=0.15..0.22 rows=1 width=40) (actual time=0.001..0.001 rows=0 loops=58)
                                                                                Index Cond: ((service_id = field_service.service_id) AND (sport_code = 'BEACH_TENNIS'::text))
                                                                                Heap Fetches: 18
                                                                    ->  Seq Scan on contacts  (cost=0.00..1.04 rows=1 width=16) (actual time=0.001..0.001 rows=2 loops=18)
                                                                          Filter: (is_main AND ((contactable_type)::text = 'App\Models\Arena'::text))
                                                                          Rows Removed by Filter: 1
                                                  ->  Index Scan using subscriptions_arena_id_stripe_status_index on subscriptions  (cost=0.14..0.28 rows=1 width=16) (actual time=0.001..0.001 rows=1 loops=2700)
                                                        Index Cond: (arena_id = field_time_slots.arena_id)
                                                        Filter: (((ends_at IS NULL) OR ((ends_at IS NOT NULL) AND (ends_at > '2024-10-06 01:31:18'::timestamp without time zone))) AND ((stripe_status)::text <> 'incomplete_expired'::text) AND ((stripe_status)::text <> 'unpaid'::text) AND ((stripe_status)::text <> 'past_due'::text) AND ((stripe_status)::text <> 'incomplete'::text) AND ((type)::text = 'access'::text))
                                            ->  Index Scan using cities_ibge_code_unique on cities  (cost=0.28..8.30 rows=1 width=24) (actual time=0.005..0.005 rows=1 loops=2700)
                                                  Index Cond: (ibge_code = addresses.city_ibge_code)
                                      ->  Index Scan using states_pkey on states  (cost=0.15..0.18 rows=1 width=56) (actual time=0.002..0.002 rows=1 loops=2700)
                                            Index Cond: (ibge_code = cities.state_ibge_code)
Planning Time: 6.426 ms
Execution Time: 85.641 ms

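Any idea why this happens? I haven't been paying attention to performance because we are at an early stage, and I assumed it didn't matter yet since we have so little data.

We already tried upgrading the server instance from db.t3.micro to db.t3.small, but nothing changed. We also tried restoring it in another availability zone, with no effect. I restored a production dump locally and ran the query: its cost was 6000, which is still far less than 50000. Running the query in the local development environment also gives a cost of 100.

Obviously I can rewrite and improve this query, and I will do so soon. But I would really like to understand what is going on here.

Edit: data volume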

WITH tbl AS
  (SELECT table_schema,
          TABLE_NAME
   FROM information_schema.tables
   WHERE TABLE_NAME not like 'pg_%'
     AND table_schema in ('public'))
SELECT sum((xpath('/row/c/text()', query_to_xml(format('select count(*) as c from %I.%I', table_schema, TABLE_NAME), FALSE, TRUE, '')))[1]::text::int) AS rows_n
FROM tbl
ORDER BY rows_n DESC;

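Row count per table, production: https://pastebin.com/7gj5PxRw

Row count per table, staging: https://pastebin.com/507x3mP0

Edit 2: Laurenz's suggestion did help and the execution time improved, but I still don't understand why the plan has so many rows. I would really like to dig into this. For now, I have split this query into two separate queries, and performance improved dramatically.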

Di Si

Asked: 2024-10-09 02:34:40 +0800 CST

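Is it possible to use pg_createcluster to add a PostgreSQL hot standby to a primary that was created with initdb?

Hello, we run PostgreSQL 16 on Ubuntu 22 LTS and built a cluster with initdb. I'd like to know whether we can rebuild the standby servers with pg_createcluster, so that we can take advantage of the various management tools, and join them to the cluster, so that we can eventually fail over (we use repmgr), then remove the old nodes and rebuild them with pg_createcluster.

As far as I know, pg_createcluster is essentially a wrapper around initdb. Unfortunately, simply rebuilding the whole cluster is not an option in production, so our choices are either to keep building nodes with initdb, or to add and remove nodes until the whole cluster has been rebuilt with pg_createcluster.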

M.A. Heshmat Khah

Asked: 2024-10-09 00:55:28 +0800 CST

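PostgreSQL unaccent and Arabic/Persian full-text search

I'm working with an application that uses PostgreSQL as its database and relies on the unaccent extension to normalize text. I want to improve its search by modifying the unaccent.rules file.

I edited /usr/share/postgresql/16/tsearch_data/unaccent.rules and added some rules for the Arabic Unicode block (U+0600 to U+06FF):

It works well: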

SELECT unaccent('سَلام ۱۳۲');
 unaccent
----------
 سلام 123
(1 row)

Problem 1:

The problem is the zero-width non-joiner (ZWNJ, U+200C), which should be replaced with a space (U+0020).

سَلام‌علیکم->سلام علیکم

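What I tried:

I tried the following lines, but they either don't work or produce an error:

"‌" " ": gives the warning invalid syntax: more than two strings in unaccent rule, and it doesn't work.
‌ " ": gives the warning invalid syntax: more than two strings in unaccent rule, and it doesn't work.
\u200C \u0020: suggested by ChatGPT, but it didn't work.
\u200C " ": suggested by ChatGPT, but it didn't work.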

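Note 1:

The first two lines above contain an invisible ZWNJ character, which VIM displays as <200c> but which does not show up in this post.

Note 2:

I didn't add all of these lines at the same time; I tried them one by one.

Note 3:

There are no other rules for ZWNJ in unaccent.rules.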
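For reference, the character-level result the rules above are aiming for (drop the fatha U+064E, turn ZWNJ U+200C into a space, map the Persian digits U+06F0 to U+06F9 to ASCII) can be sketched with Python's str.translate. This is only an illustration of the target mapping under those assumptions; the RULES table and normalize function are hypothetical and this is not how the unaccent extension parses its rules file.

```python
# Hypothetical mapping mirroring the rules described in the question.
# Keys are code points; None deletes the character, a string replaces it.
RULES = {
    0x064E: None,  # ARABIC FATHA: drop the diacritic
    0x200C: " ",   # ZERO WIDTH NON-JOINER -> SPACE
}
# EXTENDED ARABIC-INDIC (Persian) digits U+06F0..U+06F9 -> ASCII 0..9
RULES.update({0x06F0 + d: str(d) for d in range(10)})


def normalize(text: str) -> str:
    """Apply all character-level rules in a single pass."""
    return text.translate(RULES)


if __name__ == "__main__":
    # The greeting from the example, written with escapes so the ZWNJ is visible
    word = "\u0633\u064e\u0644\u0627\u0645\u200c\u0639\u0644\u06cc\u06a9\u0645"
    print(normalize(word))  # the ZWNJ is now a plain space
```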

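Problem 2:

Is there a way to add a new rules file instead of editing the default one? I can't edit the application's source code or change its queries.

If I add something like /usr/share/postgresql/16/tsearch_data/arabic.stop or /usr/share/postgresql/16/tsearch_data/arabic.rules and restart the service, will PostgreSQL pick it up?

Do I need to run some queries to reload the file?

Do I need to change the way the application requests searches?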

Jeff

Asked: 2024-10-08 07:33:56 +0800 CST

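What is the relationship between an SQL query and max_stack_depth?

Most questions about max_stack_depth are about how to work around it, how to raise it, and so on. I know the default limit is 2MB and that it can't be set higher than ulimit -s. What I want to know is how exactly Postgres determines that a query exceeds that limit.

Is it the number of nodes in the explain plan? The number of characters in the raw SQL query (after parameter substitution, with or without bound parameters)? Something else?

For example, I'd like to take a query such as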

SELECT a, b, c 
FROM table 
WHERE x IN (1, 2, 3);

and convert it into a numeric value (in bytes) that tells me how far I am from max_stack_depth.

philomathic_life

Asked: 2024-10-08 04:37:12 +0800 CST

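Is it possible to define a nullable composite type whose fields are NOT NULL?

I want to define a composite type whose fields are NOT NULL, while still allowing the value itself to be NULL in a table column. My first attempt was to define a DOMAIN over the composite type with a CHECK constraint ensuring the fields are NOT NULL; unfortunately, this prevents NULL itself from being INSERTed into the table: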

BEGIN;
    CREATE TYPE foo AS (x int, y int);
    CREATE DOMAIN non_null_foo AS foo CHECK((VALUE).x IS NOT NULL AND (VALUE).y IS NOT NULL);
    CREATE TABLE bar(y non_null_foo);
    INSERT INTO bar VALUES (NULL);
ROLLBACK;

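This fails with: ERROR: value for domain non_null_foo violates check constraint "non_null_foo_check".

My second attempt was to allow VALUE to be NULL in the DOMAIN, but that doesn't work either, because it now accepts a value whose fields are all NULL: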

BEGIN;
    CREATE TYPE foo AS (x int, y int);
    CREATE DOMAIN non_null_foo AS foo CHECK(VALUE IS NULL OR ((VALUE).x IS NOT NULL AND (VALUE).y IS NOT NULL));
    CREATE TABLE bar(y non_null_foo);
    INSERT INTO bar VALUES ((NULL, NULL)); --succeeds
    INSERT INTO bar VALUES ((1, NULL)); --fails
ROLLBACK;

It's as if PostgreSQL can't distinguish NULL from a value whose fields are all NULL. Am I missing something?
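The PostgreSQL documentation states that, for a row-valued expression, IS NULL is true when the row itself is null or when all of its fields are null, while IS NOT NULL is true only when the row is non-null and all of its fields are non-null. The toy model below applies those two documented rules to the second domain's CHECK, which may help show why (NULL, NULL) is accepted. It is a sketch, not PostgreSQL code: composite values are modelled as Python tuples, None stands for SQL NULL, and all function names are hypothetical.

```python
from typing import Optional, Tuple

# A composite value is modelled as a tuple of fields; None stands for SQL NULL.
Composite = Optional[Tuple[Optional[int], ...]]


def row_is_null(value: Composite) -> bool:
    # IS NULL on a row: true if the row itself is NULL or ALL fields are NULL
    return value is None or all(f is None for f in value)


def row_is_not_null(value: Composite) -> bool:
    # IS NOT NULL on a row: true only if the row is non-NULL and ALL fields are non-NULL
    return value is not None and all(f is not None for f in value)


def field_is_not_null(value: Composite, i: int) -> bool:
    # (VALUE).x of a NULL composite is NULL, and NULL IS NOT NULL is false
    return value is not None and value[i] is not None


def second_domain_check(value: Composite) -> bool:
    # CHECK(VALUE IS NULL OR ((VALUE).x IS NOT NULL AND (VALUE).y IS NOT NULL))
    # IS [NOT] NULL never yields NULL, so plain booleans suffice here.
    return row_is_null(value) or (
        field_is_not_null(value, 0) and field_is_not_null(value, 1)
    )


if __name__ == "__main__":
    for v in [None, (None, None), (1, None), (1, 2)]:
        print(v, row_is_null(v), row_is_not_null(v), second_domain_check(v))
```

Under these rules a NULL composite and (NULL, NULL) both satisfy IS NULL, so the second CHECK cannot tell them apart, while (1, NULL) is neither IS NULL nor IS NOT NULL.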

user1708730

Asked: 2024-10-05 22:09:39 +0800 CST

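Rolling sum with a partial value for the last day

I can't figure out how to write a query that returns a rolling sum per item, based on rules stored in another table.

Below is a table listing, in chronological order, the stock value of each item on specific dates.

Table 1: Stock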

Item	Stock	Date
Blade	10	2020-01-03
Blade	20	2020-01-04
Blade	30	2020-01-05
Blade	40	2020-01-06
Blade	50	2020-01-07
Blade	60	2020-01-08
Blade	70	2020-01-09
Table	10	2020-01-03
Table	20	2020-01-04
Table	30	2020-01-05
Table	40	2020-01-06
Table	50	2020-01-07
Table	60	2020-01-08
Table	70	2020-01-09

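The other table holds two rules per item, giving the number of days used to compute each rolling-sum value.

Table 2: Rules

Item	Rule	Value
Blade	cum_sum	2.5
Blade	lead_sum	2.5
Table	cum_sum	3
Table	lead_sum	3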

Output: cum_sum: for Blade on 2020-01-03 the rule is 2.5, so value = 10 + 20 + 30 * 0.5. lead_sum: for Blade on 2020-01-03 the rule is 2.5, so value = 20 + 30 + 40 * 0.5.

How can I write a query that takes the partial value of the last day into account?

Item	Stock	Date	cum_sum	lead_sum
Blade	10	2020-01-03	45	70
Blade	20	2020-01-04	70	95
Blade	30	2020-01-05	95	120
Blade	40	2020-01-06	120	145
Blade	50	2020-01-07	145	130
Blade	60	2020-01-08	130	70
Blade	70	2020-01-09	70	0

Table	10	2020-01-03	60	90
Table	20	2020-01-04	90	120
Table	30	2020-01-05	120	150
Table	40	2020-01-06	150	180
Table	50	2020-01-07	180	130
Table	60	2020-01-08	130	70
Table	70	2020-01-09	70	0

https://sqlfiddle.com/postgresql/online-compiler?id=c87e6a47-0949-4781-b8b5-3559929a063d
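The expected output implies the arithmetic: cum_sum takes the current row and the following rows, weighting the first floor(value) rows fully and the next row by the fractional part of the rule value; lead_sum does the same starting one row later, and rows past the end contribute nothing. A minimal Python sketch of that arithmetic follows, assuming the stock values of one item are already sorted by date; it is not a SQL answer, and partial_window_sum and rolling_sums are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple


def partial_window_sum(values: Sequence[float], start: int, width: float) -> float:
    """Sum values[start:], giving full weight to the first floor(width) items
    and weight (width - floor(width)) to the item right after them."""
    whole = math.floor(width)
    frac = width - whole
    total = 0.0
    for offset, v in enumerate(values[start:]):
        if offset < whole:
            total += v
        elif offset == whole:
            total += v * frac  # partial contribution of the last day
        else:
            break
    return total


def rolling_sums(
    stock: Sequence[float], cum_width: float, lead_width: float
) -> List[Tuple[float, float]]:
    """Return (cum_sum, lead_sum) per row of one item, rows sorted by date.
    cum_sum starts at the current row, lead_sum at the next row."""
    return [
        (partial_window_sum(stock, i, cum_width),
         partial_window_sum(stock, i + 1, lead_width))
        for i in range(len(stock))
    ]


if __name__ == "__main__":
    blade = [10, 20, 30, 40, 50, 60, 70]
    for cum, lead in rolling_sums(blade, 2.5, 2.5):
        print(cum, lead)
```

With the Blade rules (2.5, 2.5) this reproduces the cum_sum and lead_sum columns above, and with the Table rules (3, 3) the second group. In SQL the same idea would presumably combine window functions over each item's rows with a join to the rules table; the sketch only pins down the arithmetic.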
