Questions asked by James Healy - dba

James Healy

Asked: 2022-10-14 20:21:22 +0800 CST

Why isn't the ordered partition scan happening as often as I expected?

5

I have a table in Postgres 13 that uses declarative range partitioning by ID.

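I'm selecting a small number of rows in descending ID order, and I'd like the query to reliably use an ordered partition scan, where the partitions are searched in reverse order until the required number of rows is found. I know that in nearly every case the results will be found in the most recent partition, and the older partitions can be skipped.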

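Sometimes I get an ordered partition scan, but sometimes the planner decides to query every partition, which performs worse than checking only the most recent partition. I'm interested in understanding why, and whether there is anything I can do to influence the planner.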

Test setup:

CREATE SEQUENCE public.measurements_id_seq
    START WITH 1
    INCREMENT BY 1
    NO MINVALUE
    NO MAXVALUE
    CACHE 1;

ALTER SEQUENCE measurements_id_seq RESTART WITH 1;
    
CREATE TABLE measurements (
    id integer DEFAULT nextval('public.measurements_id_seq'::regclass) PRIMARY KEY,
    uuid uuid NOT NULL,
    num integer NOT NULL,
    created_at timestamp without time zone NOT NULL
)
PARTITION BY RANGE (id);

CREATE INDEX ON measurements (num);
CREATE INDEX ON measurements (uuid);


CREATE TABLE measurements_p0 PARTITION OF measurements FOR VALUES FROM (0) TO (1000000);
CREATE TABLE measurements_p1 PARTITION OF measurements FOR VALUES FROM (1000000) TO (2000000);
CREATE TABLE measurements_p2 PARTITION OF measurements FOR VALUES FROM (2000000) TO (3000000);
CREATE TABLE measurements_p3 PARTITION OF measurements FOR VALUES FROM (3000000) TO (4000000);
CREATE TABLE measurements_p4 PARTITION OF measurements FOR VALUES FROM (4000000) TO (5000000);
CREATE TABLE measurements_p5 PARTITION OF measurements FOR VALUES FROM (5000000) TO (6000000);

Then I insert bulk sample data, with 100 random values of num (each about 1% of the table) and 10 random UUIDs (each about 10% of the table):

with uuids AS (
  select gen_random_uuid() as uuid from generate_series(1, 10) s(i)
)
insert into measurements (
    num, uuid, created_at
)
select
    random() * 100, 
    (select array_agg(uuid) from uuids)[floor(random() * 10 + 1)],
    clock_timestamp()
from generate_series(1, 4999999) s(i);

Finally, I add some extra sample data whose UUIDs each cover far less than 10% of the rows, then analyze:

with uuids AS (
  select gen_random_uuid() as uuid from generate_series(1, 10) s(i)
)
insert into measurements (
    num, uuid, created_at
)
select
    random() * 100, 
    (select array_agg(uuid) from uuids)[floor(random() * 10 + 1)],
    clock_timestamp()
from generate_series(1, 10000) s(i);

analyze measurements;

✅ If I select 1% of the rows, I get an ordered partition scan, ordered by id desc:

# explain (analyze, buffers) select * from measurements where num=5 order by id desc limit 4;
                                                                                  QUERY PLAN                                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2.41..16.66 rows=4 width=32) (actual time=0.266..0.540 rows=4 loops=1)
   Buffers: shared hit=11
   ->  Append  (cost=2.41..179806.71 rows=50464 width=32) (actual time=0.260..0.531 rows=4 loops=1)
         Buffers: shared hit=11
         ->  Index Scan Backward using measurements_p5_pkey on measurements_p5 measurements_6  (cost=0.29..372.29 rows=97 width=32) (actual time=0.257..0.525 rows=4 loops=1)
               Filter: (num = 5)
               Rows Removed by Filter: 533
               Buffers: shared hit=11
         ->  Index Scan Backward using measurements_p4_pkey on measurements_p4 measurements_5  (cost=0.42..35836.43 rows=8400 width=32) (never executed)
               Filter: (num = 5)
         ->  Index Scan Backward using measurements_p3_pkey on measurements_p3 measurements_4  (cost=0.42..35836.43 rows=10900 width=32) (never executed)
               Filter: (num = 5)
         ->  Index Scan Backward using measurements_p2_pkey on measurements_p2 measurements_3  (cost=0.42..35836.43 rows=9867 width=32) (never executed)
               Filter: (num = 5)
         ->  Index Scan Backward using measurements_p1_pkey on measurements_p1 measurements_2  (cost=0.42..35836.43 rows=10200 width=32) (never executed)
               Filter: (num = 5)
         ->  Index Scan Backward using measurements_p0_pkey on measurements_p0 measurements_1  (cost=0.42..35836.41 rows=11000 width=32) (never executed)
               Filter: (num = 5)
 Planning Time: 1.227 ms
 Execution Time: 0.827 ms

✅ If I select 10% of the rows, I get an ordered partition scan, ordered by id desc:

# explain (analyze, buffers) select * from measurements where uuid='0a246187-edf6-44f3-8517-2a899667db0f' order by id desc limit 4;
                                                                                    QUERY PLAN                                                                                     
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=2.41..3.88 rows=4 width=32) (actual time=5.378..5.395 rows=4 loops=1)
   Buffers: shared hit=135
   ->  Append  (cost=2.41..182034.40 rows=496001 width=32) (actual time=5.374..5.389 rows=4 loops=1)
         Buffers: shared hit=135
         ->  Index Scan Backward using measurements_p5_pkey on measurements_p5 measurements_6  (cost=0.29..372.29 rows=1 width=32) (actual time=5.293..5.294 rows=0 loops=1)
               Filter: (uuid = '0a246187-edf6-44f3-8517-2a899667db0f'::uuid)
               Rows Removed by Filter: 10000
               Buffers: shared hit=131
         ->  Index Scan Backward using measurements_p4_pkey on measurements_p4 measurements_5  (cost=0.42..35836.43 rows=99400 width=32) (actual time=0.076..0.088 rows=4 loops=1)
               Filter: (uuid = '0a246187-edf6-44f3-8517-2a899667db0f'::uuid)
               Rows Removed by Filter: 42
               Buffers: shared hit=4
         ->  Index Scan Backward using measurements_p3_pkey on measurements_p3 measurements_4  (cost=0.42..35836.43 rows=100733 width=32) (never executed)
               Filter: (uuid = '0a246187-edf6-44f3-8517-2a899667db0f'::uuid)
         ->  Index Scan Backward using measurements_p2_pkey on measurements_p2 measurements_3  (cost=0.42..35836.43 rows=97567 width=32) (never executed)
               Filter: (uuid = '0a246187-edf6-44f3-8517-2a899667db0f'::uuid)
         ->  Index Scan Backward using measurements_p1_pkey on measurements_p1 measurements_2  (cost=0.42..35836.43 rows=97833 width=32) (never executed)
               Filter: (uuid = '0a246187-edf6-44f3-8517-2a899667db0f'::uuid)
         ->  Index Scan Backward using measurements_p0_pkey on measurements_p0 measurements_1  (cost=0.42..35836.41 rows=100467 width=32) (never executed)
               Filter: (uuid = '0a246187-edf6-44f3-8517-2a899667db0f'::uuid)
 Planning Time: 0.728 ms
 Execution Time: 5.630 ms

❌ If I select a UUID that has only a small number of rows in the final partition (ordered by id desc), I don't get an ordered partition scan:

# explain (analyze, buffers) select * from measurements where uuid='1d58534d-c795-4f9b-b1d9-ab6316a8fb9a' order by id desc limit 4;
                                                                                 QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=164.31..164.32 rows=4 width=32) (actual time=1.952..1.960 rows=4 loops=1)
   Buffers: shared hit=91
   ->  Sort  (cost=164.31..166.80 rows=996 width=32) (actual time=1.949..1.954 rows=4 loops=1)
         Sort Key: measurements.id DESC
         Sort Method: top-N heapsort  Memory: 25kB
         Buffers: shared hit=91
         ->  Append  (cost=0.42..149.37 rows=996 width=32) (actual time=0.456..1.470 rows=991 loops=1)
               Buffers: shared hit=91
               ->  Index Scan using measurements_p0_uuid_idx on measurements_p0 measurements_1  (cost=0.42..8.39 rows=1 width=32) (actual time=0.057..0.058 rows=0 loops=1)
                     Index Cond: (uuid = '1d58534d-c795-4f9b-b1d9-ab6316a8fb9a'::uuid)
                     Buffers: shared hit=3
               ->  Index Scan using measurements_p1_uuid_idx on measurements_p1 measurements_2  (cost=0.42..8.41 rows=1 width=32) (actual time=0.073..0.073 rows=0 loops=1)
                     Index Cond: (uuid = '1d58534d-c795-4f9b-b1d9-ab6316a8fb9a'::uuid)
                     Buffers: shared hit=3
               ->  Index Scan using measurements_p2_uuid_idx on measurements_p2 measurements_3  (cost=0.42..8.40 rows=1 width=32) (actual time=0.038..0.038 rows=0 loops=1)
                     Index Cond: (uuid = '1d58534d-c795-4f9b-b1d9-ab6316a8fb9a'::uuid)
                     Buffers: shared hit=3
               ->  Index Scan using measurements_p3_uuid_idx on measurements_p3 measurements_4  (cost=0.42..8.40 rows=1 width=32) (actual time=0.047..0.047 rows=0 loops=1)
                     Index Cond: (uuid = '1d58534d-c795-4f9b-b1d9-ab6316a8fb9a'::uuid)
                     Buffers: shared hit=3
               ->  Index Scan using measurements_p4_uuid_idx on measurements_p4 measurements_5  (cost=0.42..8.44 rows=1 width=32) (actual time=0.041..0.042 rows=0 loops=1)
                     Index Cond: (uuid = '1d58534d-c795-4f9b-b1d9-ab6316a8fb9a'::uuid)
                     Buffers: shared hit=3
               ->  Bitmap Heap Scan on measurements_p5 measurements_6  (cost=15.97..102.35 rows=991 width=32) (actual time=0.195..0.961 rows=991 loops=1)
                     Recheck Cond: (uuid = '1d58534d-c795-4f9b-b1d9-ab6316a8fb9a'::uuid)
                     Heap Blocks: exact=74
                     Buffers: shared hit=76
                     ->  Bitmap Index Scan on measurements_p5_uuid_idx  (cost=0.00..15.72 rows=991 width=0) (actual time=0.146..0.146 rows=991 loops=1)
                           Index Cond: (uuid = '1d58534d-c795-4f9b-b1d9-ab6316a8fb9a'::uuid)
                           Buffers: shared hit=2
 Planning Time: 1.024 ms
 Execution Time: 2.280 ms

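If I change that last query to target only the most recent partition, we can see that it hits only a few buffers and would be a good candidate for an ordered partition scan: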

# explain (analyze, buffers) select * from measurements_p5 where uuid='1d58534d-c795-4f9b-b1d9-ab6316a8fb9a' order by id desc limit 4;
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..1.79 rows=4 width=32) (actual time=0.127..0.143 rows=4 loops=1)
   Buffers: shared hit=3
   ->  Index Scan Backward using measurements_p5_pkey on measurements_p5  (cost=0.29..372.29 rows=991 width=32) (actual time=0.123..0.137 rows=4 loops=1)
         Filter: (uuid = '1d58534d-c795-4f9b-b1d9-ab6316a8fb9a'::uuid)
         Rows Removed by Filter: 37
         Buffers: shared hit=3
 Planning Time: 0.245 ms
 Execution Time: 0.234 ms

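One difference I can see is that all the query plans with an ordered partition scan use the primary key index, which is also the partition key. Does an ordered partition scan only happen when the planner thinks it can do a backward scan on the partition key index?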
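One way to probe that hypothesis is to re-plan the failing query under two changed conditions and compare the output with the plans above. This is only a sketch and not part of the original test setup. The composite index on (uuid, id) is an assumption introduced here for illustration, and whether the planner then picks an ordered Append has to be read from the EXPLAIN output. Both experiments run inside a transaction that is rolled back, so nothing persists.

```sql
-- Experiment 1: discourage the Sort-based plan for this transaction only,
-- then see which plan the planner falls back to and what it costs.
BEGIN;
SET LOCAL enable_sort = off;
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM measurements
WHERE uuid = '1d58534d-c795-4f9b-b1d9-ab6316a8fb9a'
ORDER BY id DESC
LIMIT 4;
ROLLBACK;

-- Experiment 2 (assumption: a composite index is acceptable for testing).
-- An index on (uuid, id) can return rows for one uuid already ordered by id,
-- so it gives the planner a pre-sorted path that does not rely on the pkey.
BEGIN;
CREATE INDEX ON measurements (uuid, id);
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM measurements
WHERE uuid = '1d58534d-c795-4f9b-b1d9-ab6316a8fb9a'
ORDER BY id DESC
LIMIT 4;
ROLLBACK;
```

If the second plan shows an Append over backward index scans on the new index, that would suggest the ordered scan is not tied to the primary key index specifically.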

James Healy

Asked: 2021-10-26 23:22:07 +0800 CST

Detecting inline, inline-compressed and TOAST storage

7

Imagine I have a table like this in Postgres 13:

CREATE TABLE public.people (
    id integer PRIMARY KEY,
    full_name character varying(255),
    bio text
);

Then I insert a row with enough characters that bio is written to the TOAST table (4000 random bytes, which should compress to > 2 kB):

# insert into people values (1, 'joe toast', (SELECT array_to_string(ARRAY(SELECT chr((65 + round(random() * 25)) :: integer) FROM generate_series(1,4000)), '')));
INSERT 0 1

Then I insert a row with enough characters that bio is compressed but still fits inline (3000 repeated bytes, which should compress to < 2 kB):

# insert into people values (2, 'joe compressed', (SELECT array_to_string(ARRAY(SELECT chr(65) FROM generate_series(1,3000)), '')));
INSERT 0 1

Finally, I insert a row with only a few characters in bio, so that it is stored inline (10 repeated bytes):

# insert into people values (3, 'joe inline', 'aaaaaaaaaa');
INSERT 0 1

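Is there any way I can detect the storage strategy used for bio in each tuple? Can I report the percentage of rows stored inline versus in TOAST ("22% of tuples store bio inline, 78% in TOAST")?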

A related question: can I find out the number of bytes on disk for the tuples, broken down by inline, inline-compressed and TOAST storage?

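Context: I'm working with a partitioned table totalling over 1 billion rows, and I'd like to know how often a particular column is stored inline versus in TOAST.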

Research

I can get the on-disk size of each bio, and in one case it is clearly the inline-compressed size:

# select id, full_name, pg_column_size(bio) from people order by id;
 id |   full_name    | pg_column_size 
----+----------------+----------------
  1 | joe toast      |           4000
  2 | joe compressed |             44
  3 | joe inline     |             11
(3 rows)

Comparing that size to the size of the uncompressed data tells us something about compression, but can it tell us anything about TOAST status?

# select id, full_name, pg_column_size(bio), length(bio) from people order by id;
 id |   full_name    | pg_column_size | length 
----+----------------+----------------+--------
  1 | joe toast      |           4000 |   4000
  2 | joe compressed |             44 |   3000
  3 | joe inline     |             11 |     10

I can manually check that there are some rows in the TOAST table:

# select relname from pg_class where oid = (select reltoastrelid from pg_class where relname='people');
    relname     
----------------
 pg_toast_20138

# select chunk_id, sum(length(chunk_data)) from pg_toast.pg_toast_20138 group by chunk_id;
 chunk_id | sum  
----------+------
    20149 | 4000

Is the following correct in the general case?

# select id, full_name, pg_column_size(bio), length(bio),
case
  when pg_column_size(bio) < length(bio) then 'inline-compressed'
  when pg_column_size(bio) = length(bio) then 'toast'
  else 
    'inline'
end as storage_strategy
from people order by id;

 id |   full_name    | pg_column_size | length | storage_strategy  
----+----------------+----------------+--------+-------------------
  1 | joe toast      |           4000 |   4000 | toast
  2 | joe compressed |             44 |   3000 | inline-compressed
  3 | joe inline     |             11 |     10 | inline
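If that classification does hold, the percentage report and the byte totals asked about earlier could be built on top of it. The sketch below reuses the CASE expression from the question unchanged, so it inherits whatever blind spots that heuristic has. It also assumes single-byte characters, because length() counts characters rather than bytes. The column aliases are illustrative.

```sql
-- Aggregate the per-row heuristic into counts, percentages and byte totals.
SELECT storage_strategy,
       count(*)                                           AS tuples,
       round(100.0 * count(*) / sum(count(*)) OVER (), 1) AS pct_of_tuples,
       sum(col_bytes)                                     AS total_bytes
FROM (
    SELECT pg_column_size(bio) AS col_bytes,
           CASE
             WHEN pg_column_size(bio) < length(bio) THEN 'inline-compressed'
             WHEN pg_column_size(bio) = length(bio) THEN 'toast'
             ELSE 'inline'
           END AS storage_strategy
    FROM people
    WHERE bio IS NOT NULL  -- NULLs have no stored datum to classify
) AS classified
GROUP BY storage_strategy
ORDER BY storage_strategy;
```

On a table with a billion rows this is a full scan that reads every bio value, so it may be worth running against a single partition or a sample first.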

James Healy

Asked: 2021-10-26 18:57:39 +0800 CST

Are TOAST rows written for UPDATEs that don't change the TOASTable column?

6

Imagine I have a table like this in Postgres 13:

CREATE TABLE public.people (
    id integer PRIMARY KEY,
    full_name character varying(255),
    bio text
);

Then I insert a row with enough characters that bio is written to the TOAST table:

# insert into people values (1, 'joe user', (SELECT array_to_string(ARRAY(SELECT chr((65 + round(random() * 25)) :: integer) FROM generate_series(1,4000)), '')));
INSERT 0 1

Finally, I update the row without changing the TOASTed column:

# update people set full_name='jane user' where id=1;
UPDATE 1

Does the UPDATE change any rows in the associated TOAST table (or require any writes to it at all)?

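Context: I'm working with some database tables that see thousands of transactions per second, and I'm observing very high write load on the server. I'm wondering whether UPDATEs to tuples that have large values in TOAST, where the TOAST values themselves are mostly unchanged, contribute to the write load and are worth optimising.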
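One empirical check, offered as a sketch rather than an answer, is to read the statistics counters for the TOAST relation before and after the UPDATE. The query uses the pg_stat_all_tables view, which also covers TOAST tables. It assumes the people table from the example above and that no other session is writing to it.

```sql
-- Run once before and once after the UPDATE, then compare the counters.
-- Statistics are reported asynchronously, so wait a moment after the UPDATE.
SELECT s.schemaname,
       s.relname,
       s.n_tup_ins,   -- chunks inserted into the TOAST table
       s.n_tup_upd,
       s.n_tup_del    -- chunks deleted from the TOAST table
FROM pg_class AS c
JOIN pg_stat_all_tables AS s ON s.relid = c.reltoastrelid
WHERE c.relname = 'people';
```

Unchanged counters across the UPDATE would indicate that no TOAST rows were written for it.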

James Healy

Asked: 2020-01-17 06:04:20 +0800 CST

Can I improve the way pg_dump exports inherited tables?

1

Given a PostgreSQL 12 database with some table inheritance, where the child table adds no extra columns:

CREATE TABLE parent (
    id integer NOT NULL PRIMARY KEY,
    name text NOT NULL
);

CREATE TABLE child () INHERITS (parent);

If I run pg_dump (with pg_dump foo), the tables in the export look like this:

CREATE TABLE public.parent (
    id integer NOT NULL,
    name text NOT NULL
);

CREATE TABLE public.child (
)
INHERITS (public.parent);

If I then detach and re-attach the child:

$ psql foo
psql (12.1 (Debian 12.1-2))
Type "help" for help.

foo=# ALTER TABLE child NO INHERIT parent;
ALTER TABLE

foo=# ALTER TABLE child INHERIT parent;
ALTER TABLE

...and try another pg_dump, the export has changed. The child table now explicitly lists the columns inherited from the parent table:

CREATE TABLE public.parent (
    id integer NOT NULL,
    name text NOT NULL
);

CREATE TABLE public.child (
    id integer,
    name text
)
INHERITS (public.parent);

Is there anything I can do to make pg_dump output the child table without the explicit columns?

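This might sound like a contrived example, but I have a multi-TB database with child tables that were detached and re-attached in the mists of time, and I'd like the schema export to go back to being as simple (and readable) as possible.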

In particular, it would be great if everyone could see at a glance that the child tables add no extra columns.
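To see what differs between a freshly created child and a re-attached one, the column metadata in the pg_attribute catalog can be inspected. This read-only sketch assumes the parent and child tables above. The attislocal and attinhcount columns are real catalog columns, but the idea that pg_dump's output follows attislocal is an assumption to verify, not something the post confirms.

```sql
-- Compare this output for a child created with INHERITS and for one that
-- went through NO INHERIT / INHERIT.
SELECT a.attrelid::regclass AS table_name,
       a.attname,
       a.attislocal,   -- true when the column is defined locally in the table
       a.attinhcount   -- number of direct parents the column is inherited from
FROM pg_attribute AS a
WHERE a.attrelid = 'public.child'::regclass
  AND a.attnum > 0          -- skip system columns
  AND NOT a.attisdropped
ORDER BY a.attnum;
```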

James Healy

Asked: 2016-03-28 03:19:51 +0800 CST

Why is performance non-linear if I make multiple updates in a single transaction?

6

An older question covers why the performance of multiple INSERTs in a single transaction is non-linear as the INSERT count grows.

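Following some of the advice there, I've been trying to optimise running many UPDATEs in a single transaction. In the real scenario we're batch-processing data from another system, but I have a smaller test scenario.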

Given this table on PostgreSQL 9.5.1:

\d+ foo
                                         Table "public.foo"
 Column |  Type   |                    Modifiers                     | Storage | Stats target | Description 
--------+---------+--------------------------------------------------+---------+--------------+-------------
 id     | bigint  | not null default nextval('foo_id_seq'::regclass) | plain   |              | 
 count  | integer | not null                                         | plain   |              |

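I have the following test files: 100.sql, 1000.sql, 10000.sql, 50000.sql and 100000.sql. Each contains the following lines, with the UPDATE line repeated according to the file name: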

BEGIN;
UPDATE foo SET count=count+1 WHERE id=1;
...
UPDATE foo SET count=count+1 WHERE id=1;
COMMIT;

When I benchmark loading each file, the results look like this:

              user     system      total        real   ms/update
100       0.000000   0.010000   0.040000 (  0.044277)  0.44277
1000      0.000000   0.000000   0.040000 (  0.097175)  0.09717
10000     0.020000   0.020000   0.230000 (  1.717170)  0.17171
50000     0.160000   0.130000   1.840000 ( 30.991350)  0.61982
100000    0.440000   0.380000   5.320000 (149.199524)  1.49199

The average time per UPDATE increases as the transaction includes more statements, which indicates that performance is non-linear.

The older question I linked to suggests that indexes may be an issue, but this table has no indexes and only a single row.

Is this just a case of "that's how it works", or is there some setting I can tune to improve the situation?

Update

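Based on the theory in the current answer, I ran an additional test. The table structure is the same, but each UPDATE changes a different row. The input files now look like this: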

BEGIN;
UPDATE foo SET count=count+1 WHERE id=1;
UPDATE foo SET count=count+1 WHERE id=2;
...
UPDATE foo SET count=count+1 WHERE id=n;
COMMIT;
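The post does not show how these input files were produced. As a minimal sketch, the rows and the UPDATE lines for the 10,000-statement case could be generated in SQL itself. It assumes foo starts empty and that its sequence starts at 1, so that the ids run from 1 to n.

```sql
-- Seed one row per id; id is filled in by the sequence default on foo.
INSERT INTO foo (count)
SELECT 0 FROM generate_series(1, 10000);

-- Emit one UPDATE statement per row; wrap the output in BEGIN; / COMMIT;
-- when saving it to a file such as 10000.sql.
SELECT format('UPDATE foo SET count=count+1 WHERE id=%s;', i)
FROM generate_series(1, 10000) AS s(i);
```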

When I benchmark loading these files, the results look like this:

              user     system      total        real   ms/update
100       0.000000   0.000000   0.030000 (  0.044876)  0.44876
1000      0.010000   0.000000   0.050000 (  0.102998)  0.10299
10000     0.000000   0.040000   0.140000 (  0.666050)  0.06660
50000     0.070000   0.140000   0.550000 (  3.150734)  0.06301
100000    0.130000   0.280000   1.110000 (  6.458655)  0.06458

From 10,000 updates onwards (once the setup cost is amortised), performance is linear.
