Asked by A_V on 2019-02-08 18:33:28 +0800 CST

Postgres CTE optimization with nested json_build_object


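I'm trying to write a query that returns data from multiple tables and aggregates it into a nested JSON field. I think this would perform well on SQL Server, but as Brent Ozar wrote in this post, the Postgres optimizer plans each CTE in isolation. This forces me to put a WHERE clause at the first CTE level; otherwise the whole data set is loaded every time. That, plus the specific JSON functions that I'm not very used to, makes me wonder whether this could perform better.

I tried writing this without CTEs, but I'm not sure how to nest the subqueries.

Am I missing any Postgres tricks here? Are the indexes valid?

The output looks like this: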

[{
    "item_property_id": 1001010,
    "property_name": "aadb480d8716e52da33ed350b00d6cef",
    "values": [
        "1f64450fae03b127cf95f9b06fca4bca",
        "9a6883b8a87a5028bf7dfc27412c2de8"
    ]
},{
    "item_property_id": 501010,
    "property_name": "e870e8d81e16ee46c75493856b4c6b66",
    "values": [
        "a6bed25b407c515bb8a55f2e239066ec",
        "feb10299fd6408e0d37a8761e334c97a"
    ]
},{
    "item_property_id": 1010,
    "property_name": "f2d7b27c50a059d9337c949c13aa3396",
    "values": [
        "56674c1c3d66c832abf87b436a4fd095",
        "ff88fe69f4438a6277c792faaf485368"
    ]
}]

Here is the script that generates the schema and the test data:

--create schema
drop table if exists public.items;
drop table if exists public.items_properties;
drop table if exists public.items_properties_values;
create table public.items(
    item_id integer primary key,
    item_name varchar(250));                      
create table public.items_properties(
    item_property_id serial primary key,
    item_id integer,
    property_name varchar(250));                      
create table public.items_properties_values(
    item_property_value_id serial primary key,
    item_property_id integer,
    property_value varchar(250));
CREATE INDEX items_index
    ON public.items USING btree
    (item_id ASC NULLS LAST,item_name asc nulls last)
    TABLESPACE pg_default; 
CREATE INDEX properties_index
    ON public.items_properties USING btree
    (item_property_id ASC NULLS LAST,item_id asc nulls last,property_name asc nulls last)
    TABLESPACE pg_default;
CREATE INDEX values_index
    ON public.items_properties_values USING btree
    (item_property_value_id ASC NULLS LAST,item_property_id asc nulls last,property_value asc nulls last)
    TABLESPACE pg_default;

--insert dummy data
insert into public.items                        
SELECT generate_series(1,500000),md5(random()::text);

insert into public.items_properties (item_id,property_name)
SELECT item_id,md5(random()::text) from public.items;
insert into public.items_properties (item_id,property_name)
SELECT item_id,md5(random()::text) from public.items;
insert into public.items_properties (item_id,property_name)
SELECT item_id,md5(random()::text) from public.items;


insert into public.items_properties_values (item_property_id,property_value)
select item_property_id,md5(random()::text) from public.items_properties;
insert into public.items_properties_values (item_property_id,property_value)
select item_property_id,md5(random()::text) from public.items_properties;

--Query returned successfully in 22 secs 704 msec.

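Here is the SQL query

Without the WHERE on the third line, it takes about 15 seconds to load. I know this is loading thousands of records, so it may be performing fine, but I'd really like a second opinion.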

with cte_items as (
    select item_id,item_name from public.items  
    --where item_id between 1000 and 1010
),cte_properties as (
    select ip.item_id,ip.item_property_id,ip.property_name from public.items_properties ip
    inner join cte_items i on i.item_id=ip.item_id
),cte_values as (
    select ipv.item_property_value_id,ipv.item_property_id,ipv.property_value from public.items_properties_values ipv
    inner join cte_properties p on ipv.item_property_id=p.item_property_id
)
select i.item_id,i.item_name,json_agg(json_build_object('item_property_id',prop.item_property_id,'property_name',prop.property_name,'values',prop.values))
from cte_items i
left join (
    select cp.item_id,cp.item_property_id,cp.property_name,json_agg(to_json(cv.property_value)) "values"
    from cte_properties cp
    left join ( select val.item_property_id,val.property_value from cte_values val ) cv on cv.item_property_id=cp.item_property_id
    group by cp.item_id,cp.item_property_id,cp.property_name
) prop
on i.item_id=prop.item_id
group by i.item_id,i.item_name

2 Answers


jjanes · Answer 1 · 2019-02-10T07:05:07+08:00


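You (or Brent) are correct that CTEs are an optimization fence in PostgreSQL. Work is actively under way to remove that limitation, ~~but I'm not very optimistic that this work will make it into the next release, v12~~.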
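For readers on a newer server: as far as I know, PostgreSQL 12 and later inline a side-effect-free, non-recursive CTE automatically when it is referenced only once, and accept the keywords MATERIALIZED and NOT MATERIALIZED to control this explicitly. The following is only a sketch against the schema from the question; the property_count column is made up for illustration, and the query will not parse on v11 or older.

```sql
-- Sketch only: requires PostgreSQL 12 or later.
-- NOT MATERIALIZED asks the planner to fold the CTE into the outer query,
-- so the WHERE clause below can be applied to public.items directly
-- instead of after the whole table has been read.
WITH cte_items AS NOT MATERIALIZED (
    SELECT item_id, item_name
    FROM   public.items
)
SELECT i.item_id,
       i.item_name,
       count(ip.item_property_id) AS property_count  -- illustrative column
FROM   cte_items i
LEFT   JOIN public.items_properties ip ON ip.item_id = i.item_id
WHERE  i.item_id BETWEEN 1000 AND 1010
GROUP  BY i.item_id, i.item_name;
```

Here the keyword is redundant because cte_items is referenced once. In the query from the question, cte_items is referenced twice, which is the case where it would have to be spelled out. Comparing the EXPLAIN output with and without the keyword should show whether the filter reaches the base table.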

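I rarely use select-only CTEs in production code. If a CTE is select-only and contains no substitutable parameters, I usually just create a view from it. I think that makes for better code, as well as getting rid of the optimization fence problem. In fact, the only places in my production code where you will find select-only CTEs are where I specifically want the optimization fence behavior, to keep the planner from mis-optimizing a query based on correlations that I know about but the planner does not.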
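A minimal sketch of the view approach, assuming the schema from the question. The view name v_item_properties is hypothetical, and the view covers only the items-to-properties join rather than the full JSON aggregation.

```sql
-- Hypothetical view standing in for the cte_items / cte_properties pair.
CREATE VIEW public.v_item_properties AS
SELECT i.item_id,
       i.item_name,
       ip.item_property_id,
       ip.property_name
FROM   public.items i
JOIN   public.items_properties ip ON ip.item_id = i.item_id;

-- A view is expanded into the calling query before planning, so this
-- filter can be pushed down to the base tables; no fence is involved.
SELECT item_id, item_name, item_property_id, property_name
FROM   public.v_item_properties
WHERE  item_id BETWEEN 1000 AND 1010;
```

The filter stays outside the view definition, which is the part that cannot be done with a fenced CTE without moving the WHERE clause inside it.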


Erwin Brandstetter · Answer 2 · 2019-02-10T16:46:55+08:00

Best Answer


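What @jjanes wrote about CTEs being optimization fences applies.

Your particular query doesn't need CTEs to begin with, nor most of the other noise it contains. What I see can be reduced to a SELECT with two levels of nested subqueries: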

SELECT item_id, item_name, js
FROM   items i
LEFT   JOIN (
   SELECT item_id, json_agg(json_build_object('item_property_id',item_property_id,'property_name',property_name,'values',values)) AS js
   FROM   items_properties
   LEFT   JOIN (
      SELECT item_property_id, json_agg(property_value) AS values
      FROM   items_properties_values
      GROUP  BY 1
      ) ipv USING (item_property_id)
   GROUP  BY 1
   ) ip USING (item_id)
ORDER  BY 1, 2;

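db<>fiddle here

More than twice as fast in my quick test.

Aggregating first and joining later is also faster when querying the whole table. All the more so with more than the 2 or 3 rows per aggregate in your demo, which may be over-simplified.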
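For the opposite case, where only a handful of items is selected, aggregating every row up front is wasted work. One possible variant, offered as a sketch rather than something measured here, filters first and aggregates per item with LATERAL subqueries. It assumes two additional indexes whose names are made up for illustration. The indexes from the question lead with the serial key columns, so they are unlikely to help lookups by item_id or item_property_id.

```sql
-- Hypothetical supporting indexes (names are illustrative).
CREATE INDEX items_properties_item_id_idx
    ON public.items_properties (item_id);
CREATE INDEX items_properties_values_property_id_idx
    ON public.items_properties_values (item_property_id);

-- Filter items first, then build the nested JSON for the selected rows only.
SELECT i.item_id, i.item_name, ip.js
FROM   public.items i
LEFT   JOIN LATERAL (
   SELECT json_agg(json_build_object(
             'item_property_id', p.item_property_id,
             'property_name',    p.property_name,
             'values',           v.vals)) AS js
   FROM   public.items_properties p
   LEFT   JOIN LATERAL (
      -- An aggregate without GROUP BY always returns exactly one row,
      -- so vals is NULL for a property that has no values.
      SELECT json_agg(pv.property_value) AS vals
      FROM   public.items_properties_values pv
      WHERE  pv.item_property_id = p.item_property_id
      ) v ON true
   WHERE  p.item_id = i.item_id
   ) ip ON true
WHERE  i.item_id BETWEEN 1000 AND 1010
ORDER  BY i.item_id;
```

Which form wins depends on how selective the filter is, so it is worth checking both with EXPLAIN (ANALYZE) on real data.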

