Question asked by Kyle Chamberlin - dba

Asked: 2024-02-15 11:34:47 +0800 CST

Need help identifying improvement opportunities in table configuration and query design

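I'm working on a project with a fairly large dataset. We need to run arbitrary aggregations over this data, generated on user request. Below is a basic description of our current setup in PostgreSQL v11 (yes, we know it is end-of-life; an upgrade is planned for next quarter).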

The basic table structure is as follows:

create table if not exists sales
(
    category_a  smallint,    -- sequential integer values from 0 - 10000
    category_b  varchar(3),  -- 3-digit ids (all numeric, padded with zeros)
    product     varchar(14), -- essentially random 14 character identifiers
    location_id varchar(5),  -- location id, 5-digit number (left padded with zeros)
    units       int,         -- value of interest
    sales       float,       -- second value of interest
    primary key (category_a, category_b, product, location_id)
) partition by range (category_a);

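We currently partition on category_a because its values rotate out after about 200 values and are removed from the dataset. Each category_a partition is further sub-partitioned by category_b. Each category_a/category_b partition contains roughly 50-70 million rows.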
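The question does not show the partition DDL. Below is a minimal sketch of the two-level layout described, applied to the sales table defined above. It assumes one range partition per category_a value and LIST sub-partitioning on category_b; both are assumptions, and the names sales_a1, sales_a1_b001 and sales_a1_b010 are hypothetical.

```sql
-- Hypothetical sketch: one category_a partition, sub-partitioned by category_b.
-- Assumes one range partition per category_a value and LIST sub-partitioning;
-- partition names are illustrative only.
create table sales_a1
    partition of sales
    for values from (1) to (2)      -- upper bound is exclusive: holds category_a = 1 only
    partition by list (category_b);

-- Leaf partitions: one per category_b value that exists for this category_a.
create table sales_a1_b001
    partition of sales_a1
    for values in ('001');

create table sales_a1_b010
    partition of sales_a1
    for values in ('010');

-- When category_a = 1 rotates out, all of its category_b leaves go at once:
-- drop table sales_a1;
```

With this layout each leaf holds a single (category_a, category_b) pair, so a retired category_a value can be removed with one DROP TABLE (or DETACH PARTITION) instead of a large DELETE.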

The values of category_b are non-sequential and have gaps.

There are many distinct product values, roughly one million.

For location_id, there are about 50-100 locations per category_b value, and each location carries most of the products.
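As a sanity check on the partition sizes: the primary key allows at most one row per (product, location_id) within a given (category_a, category_b) leaf. Writing P for the number of distinct products and L_b for the number of locations per category_b value (labels introduced here for illustration), and assuming each leaf holds one (category_a, category_b) pair:

```latex
% P   = distinct products (~10^6)
% L_b = locations per category_b value (50 to 100)
\[
N_{a,b} \;\le\; P \cdot L_b \;\approx\; 10^{6} \times [50,\,100] \;=\; [5 \times 10^{7},\; 10^{8}]
\]
```

Since each location carries most, but not all, products, the reported 50-70 million rows per partition is consistent with this bound.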

An example query looks like this:

select category_a, category_b, product, sum(units), sum(sales)
from sales
where category_a between 1 and 100
  and sales.category_b in ('001', '010', '018', '019', '024')
  and product in ('00000000000147', '00000000000900', '00000000000140', '00000000009999')
group by category_a, category_b, product;

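EXPLAIN for this query shows a full sequential scan of every partition in the dataset. This seems odd, because we have a unique index whose three leftmost columns are exactly the three columns used in the WHERE and GROUP BY clauses. I don't understand why the index isn't used.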
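For scale: if each category_a partition holds a single value, this query can touch up to 100 × 5 = 500 leaf partitions, and scanning them sequentially at 50-70 million rows each means reading roughly 25-35 billion rows. A minimal diagnostic sketch for seeing what the planner is doing, assuming the sales table above and otherwise default settings. EXPLAIN ANALYZE really executes the query, so it takes as long as the slow query itself. The enable_seqscan change is session-only and is meant to reveal the cost the planner assigns to an index plan, not to serve as a production fix.

```sql
-- Refresh planner statistics (for example after a bulk load); stale or
-- missing statistics are a common cause of unexpected sequential scans.
analyze sales;

-- Actual plan with row counts, timings and buffer usage.
-- Note: EXPLAIN ANALYZE executes the query.
explain (analyze, buffers)
select category_a, category_b, product, sum(units), sum(sales)
from sales
where category_a between 1 and 100
  and category_b in ('001', '010', '018', '019', '024')
  and product in ('00000000000147', '00000000000900', '00000000000140', '00000000009999')
group by category_a, category_b, product;

-- Partition pruning is on by default in PostgreSQL 11; confirm it is still enabled.
show enable_partition_pruning;

-- Session-only experiment: penalize sequential scans to see whether an index
-- plan is possible at all and what the planner estimates it would cost.
set enable_seqscan = off;
explain
select category_a, category_b, product, sum(units), sum(sales)
from sales
where category_a between 1 and 100
  and category_b in ('001', '010', '018', '019', '024')
  and product in ('00000000000147', '00000000000900', '00000000000140', '00000000009999')
group by category_a, category_b, product;
reset enable_seqscan;
```

Comparing the estimated cost of the forced index plan with the sequential-scan plan shows whether the planner rejected the index on cost grounds or could not use it at all.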

Here is a query that loads sample data into the table:

insert into sales
    (category_a, category_b, product, location_id, units, sales)
select cat_a,
       lpad(cat_b::varchar, 3, '0'),
       lpad(product::varchar, 14, '0'),
       lpad(location_id::varchar, 5, '0'),
       (random() * 10000)::int,
       (random() * 100000)::int
from generate_series(1, 50) cat_a
         cross join generate_series(1, 25) cat_b
         cross join generate_series(1, 10) location_id
         cross join generate_series(1, 5000) product;
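The loader produces 50 × 25 × 10 × 5,000 = 62,500,000 rows in total, i.e. 10 × 5,000 = 50,000 rows for each (category_a, category_b) pair. That is about a thousand times smaller per pair than the 50-70 million rows in a production partition, so plans on the sample data may differ from production. A quick sketch for checking how the rows landed, assuming the table was empty before the load and that a partition exists for every generated (category_a, category_b) pair (otherwise the INSERT fails):

```sql
-- Total rows loaded (62,500,000 if the table started empty).
select count(*) as total_rows
from sales;

-- Rows per leaf partition; tableoid identifies the partition each row is stored in.
select tableoid::regclass as leaf_partition,
       count(*)           as row_count
from sales
group by tableoid
order by tableoid::regclass::text;
```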

The EXPLAIN output for this query is long, but I can provide it if you think it would help.

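These queries can be very slow (several minutes, sometimes more than 10 minutes). I'm happy to provide more details, but this is the basic information (or so I think).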

What changes can or should we make to the table or the query to improve its performance?
