假设您有一个多商店平台的下表
CREATE TABLE orders (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
store_id BIGINT NOT NULL,
ordered_at TIMESTAMPTZ NOT NULL
);
CREATE INDEX ON orders (store_id);
CREATE TABLE order_lines (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
order_id BIGINT NOT NULL REFERENCES orders (id) ON DELETE CASCADE,
item_id BIGINT NOT NULL REFERENCES items (id),
quantity INT
);
CREATE INDEX ON order_lines (order_id);
CREATE INDEX ON order_lines (item_id);
CREATE TABLE items (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
store_id BIGINT NOT NULL,
color VARCHAR(255) NOT NULL,
size VARCHAR(255) NOT NULL,
category VARCHAR(255)
);
CREATE TABLE returns (
id BIGINT PRIMARY KEY GENERATED ALWAYS AS IDENTITY,
order_line_id BIGINT NOT NULL REFERENCES order_lines (id) ON DELETE CASCADE,
returned_at TIMESTAMPTZ NOT NULL,
quantity INT NOT NULL,
reason VARCHAR(255)
);
CREATE INDEX ON returns (order_line_id);
由此,我想获取某个日期范围内所有订单的列表,并计算某些指标,例如购买了多少商品,退回了多少商品等。我还想根据颜色对商品的子集执行此操作或大小,但我也希望它们被“排名”。例如,我想显示总体上返回次数最多的颜色中颜色的项目的这些指标。
到目前为止,我想到的是通过两个查询来完成此操作。第一个是遍历所有退货并按颜色对它们进行分组并对项目进行求和,退货,如下所示:
select
i.color,
trunc(sum(r.quantity)::numeric / sum(ol.quantity)::numeric, 2) as return_rate
from orders o
inner join
order_lines ol on ol.order_id = o.id
inner join
items i on i.id = ol.item_id
left outer join
returns r on r.order_line_id = ol.id
group by i.color
order by return_rate desc nulls last
limit 4;
color | return_rate
--------------+-------------
Black | 0.43
Blue | 0.41
White | 0.40
Yellow | 0.39
然后,基于此查询,我将执行一个新查询,按天(日期)对所有订单进行分组,然后对某个时间范围内的总退货率和返回次数最多的颜色的退货率进行求和。我还希望能够通过项目的尺寸、类别等进行过滤。它将用于动态报告,人们可以在其中查看订单总数、退货、比率以及平均退货率的折线图、选定时间范围内最高退货颜色。
有更好的方法吗?遍历并执行所有连接两次感觉有点错误。阅读一些有关窗口函数的内容,但无法确定它是否适用于此处。
我开始用我幼稚的方法以及 CTE 的方法来尝试重用批量数据。https://www.db-fiddle.com/f/4jyoMCicNSZpjMt4jFYoz5/10700
显然,它在本地执行的操作与大约 100k 行有所不同,这是本地运行的 CTE 查询的解释分析计划。
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=50730.74..50809.91 rows=2262 width=68) (actual time=350.787..351.912 rows=722 loops=1)
Group Key: ((order_return_items.date)::date), order_return_items.color
CTE order_return_items
-> HashAggregate (cost=33884.69..39240.19 rows=226239 width=33) (actual time=211.390..258.041 rows=194309 loops=1)
Group Key: o.ordered_at, i.color
Planned Partitions: 4 Batches: 5 Memory Usage: 9265kB Disk Usage: 7736kB
-> Hash Left Join (cost=5740.52..14725.07 rows=226239 width=25) (actual time=44.080..158.610 rows=226902 loops=1)
Hash Cond: (ol.id = r.order_line_id)
-> Hash Join (cost=3754.69..9321.56 rows=226239 width=29) (actual time=31.592..107.385 rows=226239 loops=1)
Hash Cond: (ol.item_id = i.id)
-> Hash Join (cost=3572.53..8544.81 rows=226239 width=28) (actual time=29.966..79.287 rows=226239 loops=1)
Hash Cond: (ol.order_id = o.id)
-> Seq Scan on order_lines ol (cost=0.00..4378.39 rows=226239 width=28) (actual time=0.012..10.556 rows=226239 loops=1)
-> Hash (cost=2363.90..2363.90 rows=96690 width=16) (actual time=29.931..29.931 rows=96690 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 5557kB
-> Seq Scan on orders o (cost=0.00..2363.90 rows=96690 width=16) (actual time=0.005..13.289 rows=96690 loops=1)
-> Hash (cost=137.63..137.63 rows=3563 width=17) (actual time=1.620..1.621 rows=4462 loops=1)
Buckets: 8192 (originally 4096) Batches: 1 (originally 1) Memory Usage: 300kB
-> Seq Scan on items i (cost=0.00..137.63 rows=3563 width=17) (actual time=0.014..0.754 rows=4462 loops=1)
-> Hash (cost=1248.70..1248.70 rows=58970 width=12) (actual time=12.385..12.385 rows=61098 loops=1)
Buckets: 65536 Batches: 1 Memory Usage: 3376kB
-> Seq Scan on returns r (cost=0.00..1248.70 rows=58970 width=12) (actual time=0.007..5.392 rows=61098 loops=1)
-> Sort (cost=11490.55..11496.20 rows=2262 width=52) (actual time=350.765..351.008 rows=6567 loops=1)
Sort Key: ((order_return_items.date)::date), order_return_items.color
Sort Method: quicksort Memory: 626kB
-> Hash Join (cost=6227.62..11364.52 rows=2262 width=52) (actual time=323.014..349.528 rows=6567 loops=1)
Hash Cond: (order_return_items.color = most_returned_colors.color)
-> CTE Scan on order_return_items (cost=0.00..4524.78 rows=226239 width=56) (actual time=211.393..223.913 rows=194309 loops=1)
-> Hash (cost=6227.60..6227.60 rows=2 width=32) (actual time=111.569..111.571 rows=2 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Subquery Scan on most_returned_colors (cost=6227.57..6227.60 rows=2 width=32) (actual time=111.564..111.566 rows=2 loops=1)
-> Limit (cost=6227.57..6227.58 rows=2 width=64) (actual time=111.564..111.565 rows=2 loops=1)
-> Sort (cost=6227.57..6228.07 rows=200 width=64) (actual time=111.562..111.563 rows=2 loops=1)
Sort Key: (trunc((sum(order_return_items_1.r_q) / sum(order_return_items_1.ol_q)), 3)) DESC NULLS LAST
Sort Method: top-N heapsort Memory: 25kB
-> HashAggregate (cost=6221.57..6225.57 rows=200 width=64) (actual time=111.538..111.550 rows=44 loops=1)
Group Key: order_return_items_1.color
Batches: 1 Memory Usage: 48kB
-> CTE Scan on order_return_items order_return_items_1 (cost=0.00..4524.78 rows=226239 width=48) (actual time=0.000..85.098 rows=194309 loops=1)
如果有关于整体结构和索引的反馈,我们也将不胜感激!