I have the following table:
CREATE TABLE items
(
id NUMERIC(20, 0) NOT NULL DEFAULT NEXTVAL('items_sequence') PRIMARY KEY,
item_price NUMERIC(19, 2) DEFAULT NULL,
status NUMERIC(2, 0) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL
);
with the following index:
CREATE INDEX items_dash_idx ON items (status, DATE(created_at));
I want to group my items by status and by day over roughly the last 30 days. That is, I want the item count and total price per status per day for the last 30 days, including days where the count/amount is 0. I have 5 statuses, one of which (50) is irrelevant and has far too many rows (status 50 has ~400k rows, while statuses 10, 20, 30 and 40 have ~1k rows over the last 30 days).
I have the following query:
SELECT COUNT(i.id) AS count,
COALESCE(SUM(i.item_price), 0) AS amount,
dates_table.status,
dates_table.created_at
FROM (SELECT created_at::DATE AS created_at, 10 AS status
FROM GENERATE_SERIES('2024-09-18'::DATE, '2024-10-18'::DATE, INTERVAL '1 DAY') AS created_at
UNION
SELECT created_at::DATE AS created_at, 20 AS status
FROM GENERATE_SERIES('2024-09-18'::DATE, '2024-10-18'::DATE, INTERVAL '1 DAY') AS created_at
UNION
SELECT created_at::DATE AS created_at, 30 AS status
FROM GENERATE_SERIES('2024-09-18'::DATE, '2024-10-18'::DATE, INTERVAL '1 DAY') AS created_at
UNION
SELECT created_at::DATE AS created_at, 40 AS status
FROM GENERATE_SERIES('2024-09-18'::DATE, '2024-10-18'::DATE, INTERVAL '1 DAY') AS created_at
) AS dates_table
LEFT JOIN items i
ON i.status = dates_table.status
AND DATE(i.created_at) = dates_table.created_at
GROUP BY dates_table.created_at, dates_table.status
ORDER BY dates_table.created_at, dates_table.status;
This query takes more than 10 seconds, with the following EXPLAIN (ANALYZE, BUFFERS) output:
QUERY PLAN
Sort (cost=2242005.05..2242006.05 rows=400 width=48) (actual time=21950.589..21950.601 rows=72 loops=1)
Sort Key: dates_table.created_at, dates_table.status
Sort Method: quicksort Memory: 29kB
Buffers: shared hit=676950 read=747852 dirtied=755, temp read=28515 written=28531
-> HashAggregate (cost=2241982.76..2241987.76 rows=400 width=48) (actual time=21950.436..21950.492 rows=72 loops=1)
Group Key: dates_table.created_at, dates_table.status
Batches: 1 Memory Usage: 61kB
Buffers: shared hit=676947 read=747852 dirtied=755, temp read=28515 written=28531
-> Merge Left Join (cost=2161026.21..2239512.33 rows=247043 width=20) (actual time=21834.112..21948.382 rows=11066 loops=1)
Merge Cond: ((dates_table.created_at = (date(i.created_at))) AND (((dates_table.status)::numeric) = i.status))
Buffers: shared hit=676947 read=747852 dirtied=755, temp read=28515 written=28531
-> Sort (cost=449.35..459.35 rows=4000 width=8) (actual time=895.905..895.933 rows=72 loops=1)
Sort Key: dates_table.created_at, ((dates_table.status)::numeric)
Sort Method: quicksort Memory: 28kB
Buffers: shared hit=4
-> Subquery Scan on dates_table (cost=130.03..210.03 rows=4000 width=8) (actual time=895.792..895.846 rows=72 loops=1)
-> HashAggregate (cost=130.03..170.03 rows=4000 width=8) (actual time=895.788..895.831 rows=72 loops=1)
Group Key: ((created_at.created_at)::date), (10)
Batches: 1 Memory Usage: 217kB
-> Append (cost=0.01..110.03 rows=4000 width=8) (actual time=895.697..895.749 rows=72 loops=1)
-> Function Scan on generate_series created_at (cost=0.01..12.51 rows=1000 width=8) (actual time=895.694..895.697 rows=18 loops=1)
-> Function Scan on generate_series created_at_1 (cost=0.01..12.51 rows=1000 width=8) (actual time=0.012..0.014 rows=18 loops=1)
-> Function Scan on generate_series created_at_2 (cost=0.01..12.51 rows=1000 width=8) (actual time=0.010..0.012 rows=18 loops=1)
-> Function Scan on generate_series created_at_3 (cost=0.01..12.51 rows=1000 width=8) (actual time=0.010..0.012 rows=18 loops=1)
-> Materialize (cost=2160576.87..2185898.76 rows=5064379 width=25) (actual time=19123.895..20601.926 rows=5066445 loops=1)
Buffers: shared hit=676943 read=747852 dirtied=755, temp read=28515 written=28531
-> Sort (cost=2160576.87..2173237.82 rows=5064379 width=25) (actual time=19123.888..20125.620 rows=5066445 loops=1)
Sort Key: (date(i.created_at)), i.status
Sort Method: external merge Disk: 228120kB
Buffers: shared hit=676943 read=747852 dirtied=755, temp read=28515 written=28531
-> Seq Scan on items i (cost=0.00..1475438.79 rows=5064379 width=25) (actual time=0.064..16526.846 rows=5066445 loops=1)
Buffers: shared hit=676943 read=747852 dirtied=755
Planning Time: 0.399 ms
JIT:
Functions: 44
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 2.096 ms, Inlining 293.474 ms, Optimization 383.558 ms, Emission 218.758 ms, Total 897.885 ms
Execution Time: 21989.150 ms
When I run this query, my cache hit rate drops from 99.9% to 50%. The same index in Oracle (with TRUNC(created_at) instead of DATE(created_at), obviously) and the same query take about 500 ms.
Update:
SELECT COUNT(i.id) AS count,
SUM(i.item_price) AS amount,
DATE(i.created_at) AS created_at,
i.status AS status
FROM items i
WHERE i.status = 10
AND DATE(i.created_at) >= '2024-10-01'
GROUP BY i.status, DATE(i.created_at)
ORDER BY i.status, DATE(i.created_at)
is fast, with the following EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SETTINGS) output:
QUERY PLAN
GroupAggregate (cost=0.43..27.72 rows=9 width=49) (actual time=0.044..0.117 rows=9 loops=1)
Output: count(id), sum(item_price), (date(created_at)), status
Group Key: i.status, date(i.created_at)
Buffers: shared hit=26
-> Index Scan using items_dash_idx on tenant_xxx.items i (cost=0.43..27.49 rows=9 width=21) (actual time=0.025..0.097 rows=24 loops=1)
Output: date(created_at), status, id, item_price
Index Cond: ((i.status = '10'::numeric) AND (date(i.created_at) >= '2024-10-01'::date))
Buffers: shared hit=26
Settings: effective_cache_size = '16GB', search_path = 'tenant_xxx', work_mem = '32MB'
Planning:
Buffers: shared hit=3
Planning Time: 0.255 ms
Execution Time: 0.140 ms
Update #2:
If I pre-filter the items first and then perform the join:
WITH items_table AS (SELECT COUNT(i.id) AS count,
SUM(i.item_price) AS amount,
DATE(i.created_at) AS created_at,
i.status AS status
FROM items i
WHERE i.status IN (10, 20, 30, 40)
AND DATE(i.created_at) >= '2024-10-01'
AND DATE(i.created_at) < '2024-10-18'::DATE + INTERVAL '1 DAY'
GROUP BY i.status, DATE(i.created_at)
ORDER BY i.status, DATE(i.created_at))
SELECT dates_table.created_at AS created_at,
dates_table.status AS status,
COALESCE(items_table.count, 0) AS count,
COALESCE(items_table.amount, 0) AS amount
FROM (SELECT created_at::DATE AS created_at
, UNNEST(ARRAY [10, 20, 30, 40]) AS status
FROM GENERATE_SERIES('2024-10-01'::DATE, '2024-10-18'::DATE, '1 DAY') AS created_at) AS dates_table
LEFT JOIN items_table ON items_table.status = dates_table.status AND
items_table.created_at = dates_table.created_at
ORDER BY dates_table.created_at, dates_table.status
it returns in about 100 ms.
Answer:

VACUUM ANALYZE items. If I set up a data set similar to the one you describe, I get an index scan. This answer is mostly here to invite an edit: you can add your EXPLAIN (ANALYZE, VERBOSE, BUFFERS, SETTINGS) output, explain how it differs from this one, and fill in whatever details make your data set different from this one.

You can shorten, simplify and slightly speed up the query: generate the calendar once, then unnest() the target statuses onto each day. You can also aggregate before joining to the calendar and then coalesce() the misses to 0, as @Dunes suggested: demo on db<>fiddle
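Combining those suggestions, a condensed sketch of the full query (single calendar, unnest()ed statuses, aggregation before the join, coalesce() for the misses; the date range and status list are the same ones used above, and I have not benchmarked this exact form):

```sql
WITH per_day AS (
    -- aggregate first, so the join only sees at most 4 statuses x 31 days
    SELECT i.status,
           DATE(i.created_at) AS created_at,
           COUNT(i.id)        AS count,
           SUM(i.item_price)  AS amount
    FROM items i
    WHERE i.status IN (10, 20, 30, 40)
      AND DATE(i.created_at) >= '2024-09-18'
      AND DATE(i.created_at) <  '2024-10-18'::DATE + INTERVAL '1 DAY'
    GROUP BY i.status, DATE(i.created_at)
)
SELECT cal.created_at,
       cal.status,
       COALESCE(p.count, 0)  AS count,
       COALESCE(p.amount, 0) AS amount
FROM (SELECT d::DATE                         AS created_at,
             UNNEST(ARRAY [10, 20, 30, 40]) AS status
      FROM GENERATE_SERIES('2024-09-18'::DATE, '2024-10-18'::DATE,
                           INTERVAL '1 DAY') AS d) AS cal
LEFT JOIN per_day p
       ON p.status = cal.status
      AND p.created_at = cal.created_at
ORDER BY cal.created_at, cal.status;
```

The WHERE clause in the CTE matches the `(status, DATE(created_at))` expression index, so the expensive scan over the ~400k status-50 rows never happens.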