I am performance testing Postgres 12 with partitioning and a large amount of data. Each partition holds a single site with 400k rows, and I generated about 1k partitioned tables.
For the first test suite I used UUIDs as ids, but I figured that changing the id type to bigint would use less space and therefore perform better. After populating the tables, I ran the following SELECT a hundred times with different data:
SELECT site, SUM(amount)
FROM test_table
WHERE date >= '2021-02-06'
AND date <= '2021-02-07'
AND site IN ('c3b3771c-4b48-41a9-88eb-4c47d1630644', 'cbb11cdc-cd31-4da2-b14e-9ef878ce03c5', '2609ac86-995b-4320-a3b7-46ba175aa5e2') -- randomly picked from the site pool
GROUP BY site
ORDER BY site;
UUID test suite, without an index on date:
CREATE TABLE public.test_table
(
id UUID NOT NULL,
site UUID,
archive UUID,
location UUID,
col_1 UUID,
col_2 UUID,
col_3 UUID,
amount numeric(8,2),
date timestamp with time zone,
...
) PARTITION BY LIST (site);
CREATE TABLE test_table_${site} PARTITION OF test_table FOR VALUES IN ('${site}');
One table size: "265 MB"
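The post does not include the population script; below is a minimal sketch of how ~1k partitions with 400k rows each could have been generated. It assumes the pgcrypto extension for gen_random_uuid() (needed on Postgres 12) and only fills the columns used in the test query; the value ranges are illustrative.

CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- provides gen_random_uuid() on Postgres 12

DO $$
DECLARE
    s uuid;
BEGIN
    FOR i IN 1..1000 LOOP
        s := gen_random_uuid();
        -- partition names like test_table_c3b3771c_4b48_41a9_88eb_4c47d1630644
        EXECUTE format(
            'CREATE TABLE test_table_%s PARTITION OF test_table FOR VALUES IN (%L)',
            replace(s::text, '-', '_'), s);
        -- 400k rows per site, dates spread over a few months
        INSERT INTO test_table (id, site, amount, date)
        SELECT gen_random_uuid(), s,
               round((random() * 1000)::numeric, 2),
               timestamptz '2021-01-01' + random() * interval '90 days'
        FROM generate_series(1, 400000);
    END LOOP;
END $$;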
BIGINT test suite, without an index on date:
CREATE TABLE public.test_table
(
id bigint NOT NULL,
site bigint,
archive bigint,
location bigint,
col_1 bigint,
col_2 bigint,
col_3 bigint,
amount numeric(8,2),
date timestamp with time zone,
...
) PARTITION BY LIST (site);
CREATE TABLE test_table_${site} PARTITION OF test_table FOR VALUES IN ('${site}');
One table size: "118 MB"
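The post does not say how the sizes were measured; one way to get the per-partition figure quoted above (the partition name is taken from the BIGINT run below):

-- total size of one partition, including its indexes and TOAST data
SELECT pg_size_pretty(pg_total_relation_size('test_table_121'));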
Test results
UUID test results (ms) for 100 serial selects:
median: 1,425.00
95th percentile: 1,930.05

BIGINT test results (ms) for 100 serial selects:
median: 4,456.00
95th percentile: 9,037.50
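For reference, if the 100 measured run times are collected into a table, both statistics can be computed with ordered-set aggregates; a minimal sketch, assuming a hypothetical timings(ms) table:

-- median and 95th percentile over the recorded run times
SELECT percentile_cont(0.5)  WITHIN GROUP (ORDER BY ms) AS median_ms,
       percentile_cont(0.95) WITHIN GROUP (ORDER BY ms) AS p95_ms
FROM timings;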
The corresponding EXPLAIN plans:
UUID
"GroupAggregate (cost=61944.56..61947.03 rows=90 width=88)"
" Group Key: test_table_c3b3771c_4b48_41a9_88eb_4c47d1630644.site"
" -> Sort (cost=61944.56..61944.78 rows=90 width=48)"
" Sort Key: test_table_c3b3771c_4b48_41a9_88eb_4c47d1630644.site"
" -> Gather (cost=1000.00..61941.63 rows=90 width=48)"
" Workers Planned: 3"
" -> Parallel Append (cost=0.00..60932.63 rows=30 width=48)"
" -> Parallel Seq Scan on test_table_c3b3771c_4b48_41a9_88eb_4c47d1630644 (cost=0.00..20311.16 rows=10 width=48)"
" Filter: ((date_fiscal >= '2021-02-06 00:00:00+00'::timestamp with time zone) AND (date_fiscal <= '2021-02-07 00:00:00+00'::timestamp with time zone) AND (site = ANY ('{c3b3771c-4b48-41a9-88eb-4c47d1630644,cbb11cdc-cd31-4da2-b14e-9ef878ce03c5,2609ac86-995b-4320-a3b7-46ba175aa5e2}'::uuid[])))"
" -> Parallel Seq Scan on test_table_cbb11cdc_cd31_4da2_b14e_9ef878ce03c5 (cost=0.00..20311.16 rows=10 width=48)"
" Filter: ((date_fiscal >= '2021-02-06 00:00:00+00'::timestamp with time zone) AND (date_fiscal <= '2021-02-07 00:00:00+00'::timestamp with time zone) AND (site = ANY ('{c3b3771c-4b48-41a9-88eb-4c47d1630644,cbb11cdc-cd31-4da2-b14e-9ef878ce03c5,2609ac86-995b-4320-a3b7-46ba175aa5e2}'::uuid[])))"
" -> Parallel Seq Scan on test_table_2609ac86_995b_4320_a3b7_46ba175aa5e2 (cost=0.00..20310.16 rows=10 width=48)"
" Filter: ((date_fiscal >= '2021-02-06 00:00:00+00'::timestamp with time zone) AND (date_fiscal <= '2021-02-07 00:00:00+00'::timestamp with time zone) AND (site = ANY ('{c3b3771c-4b48-41a9-88eb-4c47d1630644,cbb11cdc-cd31-4da2-b14e-9ef878ce03c5,2609ac86-995b-4320-a3b7-46ba175aa5e2}'::uuid[])))"
BIGINT
"Finalize GroupAggregate (cost=47951.35..47954.22 rows=21 width=80)"
" Group Key: test_table_121.site"
" -> Gather Merge (cost=47951.35..47953.63 rows=18 width=80)"
" Workers Planned: 3"
" -> Partial GroupAggregate (cost=46951.31..46951.48 rows=6 width=80)"
" Group Key: test_table_121.site"
" -> Sort (cost=46951.31..46951.33 rows=6 width=40)"
" Sort Key: test_table_121.site"
" -> Parallel Append (cost=0.00..46951.24 rows=6 width=40)"
" -> Parallel Seq Scan on test_table_121 (cost=0.00..15651.09 rows=2 width=40)"
" Filter: ((date_fiscal >= '2021-02-06 00:00:00+00'::timestamp with time zone) AND (date_fiscal <= '2021-02-07 00:00:00+00'::timestamp with time zone) AND (site = ANY ('{121,122,242}'::bigint[])))"
" -> Parallel Seq Scan on test_table_242 (cost=0.00..15651.09 rows=2 width=40)"
" Filter: ((date_fiscal >= '2021-02-06 00:00:00+00'::timestamp with time zone) AND (date_fiscal <= '2021-02-07 00:00:00+00'::timestamp with time zone) AND (site = ANY ('{121,122,242}'::bigint[])))"
" -> Parallel Seq Scan on test_table_122 (cost=0.00..15649.02 rows=2 width=40)"
" Filter: ((date_fiscal >= '2021-02-06 00:00:00+00'::timestamp with time zone) AND (date_fiscal <= '2021-02-07 00:00:00+00'::timestamp with time zone) AND (site = ANY ('{121,122,242}'::bigint[])))"
How can there be such a large difference in select time when the data is smaller? Or did I make a mistake somewhere in the test?
Thanks in advance!
My guess is that the difference comes from how you ran the tests. After looking at your sample test query, I believe one dataset may simply be hitting much more favorable test parameters than the other, in particular in your WHERE clause. Without seeing the equivalent test query you ran against the BIGINT dataset it is hard to compare, but I think the test could be unbalanced for the following possible reasons.

The date range you used may strongly favor the data partitioned by the UUID site values over the BIGINT ones, especially since the BIGINT values presumably order very differently from the UUIDs.

The way you pick the site values for the WHERE predicate may also favor the UUID test partitions over the BIGINT ones. You say you pick them randomly from the site pool, but that depends on how truly random the selection is, plus the fact that the ordering of partitions keyed by UUID will differ greatly from the ordering of those keyed by BIGINT. Again, without seeing the equivalent sample query for the BIGINT test, and how you randomly choose that predicate in both cases, it is hard to say how much impact this has.

All in all, I don't see anything else that would account for such a large difference in the results, which is what leads me to the theory above. Unfortunately, if the problem is in how you tested the data, as I suspect, then there is no reputable source that can hand you an answer. Instead, start by simplifying the test to eliminate the variables that could be skewing the results, and go from there.
For example, start with boundary tests: manually pick the first partition value, the last one, and one near the middle, then also run the test against all partitions, with no date predicate in any of these cases, to rule out the sources of error described above. Then, for the specific site predicates you are testing, introduce a date range predicate that you know covers an equal number of partitions with an equal number of rows in each; a sketch follows.
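A sketch of such a controlled run. The :'first_site', :'middle_site', :'last_site', :'range_start', and :'range_end' parameters are psql variables set by hand beforehand with \set, holding boundary values picked from each dataset's site pool:

-- Boundary test: fixed first/middle/last partition values, no date predicate
SELECT site, SUM(amount)
FROM test_table
WHERE site IN (:'first_site', :'middle_site', :'last_site')
GROUP BY site
ORDER BY site;

-- Then reintroduce a date range known to cover an equal number of rows
-- in every partition under test
SELECT site, SUM(amount)
FROM test_table
WHERE date >= :'range_start'
  AND date <= :'range_end'
  AND site IN (:'first_site', :'middle_site', :'last_site')
GROUP BY site
ORDER BY site;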
Essentially, controlled tests like this will give you far more meaningful information here than randomized ones.

I ran the same tests against the same data:
Test, 1 day interval, 3 site ids:
Test, 1 week interval, 3 site ids:
Test, 1 month interval, 3 site ids:
When I added 10 site ids to the select, the difference became even more pronounced:
Test, 1 day interval, 10 site ids:
Test, 1 week interval, 10 site ids:
Test, 1 month interval, 10 site ids:
I think the problem was that the first round of tests ran right after the data was generated, while the Postgres server was presumably still running some data reorganization tasks in the background.
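One way to rule that out before re-running the timings, as a sketch: force the maintenance work up front and check that no autovacuum workers are still busy:

VACUUM ANALYZE test_table;  -- on a partitioned table this processes every partition

-- lingering autovacuum workers show up in pg_stat_activity
SELECT pid, query
FROM pg_stat_activity
WHERE query LIKE 'autovacuum:%';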