I'm running into performance problems querying an events table that contains more than a million rows. Our application logic dictates that a user can only access the events reachable through their "groups". Groups have a many-to-many relationship with assets, and assets have a many-to-many relationship with events. So to find the events a user can access, we currently join events -> events_assets -> assets -> groups_assets.
I have indexes on events.id, events_assets.event_id, assets.id, and groups_assets.asset_id.
By the nature of the application, the number of events (and with it the number of events_assets rows) grows continuously, while assets and groups_assets stay relatively small and stable.
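Conceptually, that access chain boils down to a single run of inner joins across the two link tables (this is just a simplified sketch of the relationship, not the actual query we run, which follows below; 'default' is an example group id):

```sql
-- Events reachable by one group, via the two many-to-many link tables
SELECT DISTINCT e.*
FROM events e
JOIN events_assets ea ON ea.event_id = e.id
JOIN groups_assets ga ON ga.asset_id = ea.asset_id
WHERE ga.group_id = 'default';
```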
The schema is as follows:
create table events (id text, last_updated TIMESTAMP);
create table events_assets (event_id text, asset_id text);
create table assets (id text);
create table groups_assets (group_id text, asset_id text);
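For reference, the indexes mentioned above look roughly like this (the names events_id and assets_id appear in the query plan below; the other names are illustrative):

```sql
create index events_id on events (id);
create index events_assets_event_id on events_assets (event_id);
create index assets_id on assets (id);
create index groups_assets_asset_id on groups_assets (asset_id);
```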
The query:
EXPLAIN ANALYZE
SELECT
    events.*
FROM
    events
WHERE
    events.id IN (
        SELECT
            events.id
        FROM
            events
            INNER JOIN events_assets ON (events.id = events_assets.event_id)
            INNER JOIN assets ON (assets.id = events_assets.asset_id)
        WHERE
            assets.id IN (
                SELECT
                    id
                FROM
                    assets
                    LEFT JOIN groups_assets ON (groups_assets.asset_id = assets.id)
                WHERE
                    groups_assets.group_id IN ('default')
            )
    )
ORDER BY
    last_updated DESC
LIMIT 25 OFFSET 0
The query plan is as follows:
Limit (cost=77402.59..77402.66 rows=25 width=1970) (actual time=2147.720..2148.033 rows=25 loops=1)
  ->  Sort (cost=77402.59..77462.96 rows=24147 width=1970) (actual time=2147.714..2148.024 rows=25 loops=1)
        Sort Key: events.last_updated DESC
        Sort Method: top-N heapsort Memory: 109kB
        ->  Nested Loop (cost=15979.68..76721.18 rows=24147 width=1970) (actual time=1074.494..2097.334 rows=144882 loops=1)
              ->  HashAggregate (cost=15979.26..16220.73 rows=24147 width=74) (actual time=1074.361..1225.749 rows=144882 loops=1)
                    Group Key: events_1.id
                    Batches: 5 Memory Usage: 8241kB Disk Usage: 11304kB
                    ->  Gather (cost=1013.12..15918.90 rows=24147 width=74) (actual time=0.872..99.711 rows=144882 loops=1)
                          Workers Planned: 1
                          Workers Launched: 1
                          ->  Nested Loop (cost=13.12..12504.20 rows=14204 width=74) (actual time=1.567..278.523 rows=72441 loops=2)
                                ->  Hash Join (cost=12.70..3543.54 rows=14254 width=37) (actual time=1.019..34.109 rows=72441 loops=2)
                                      Hash Cond: (events_assets.asset_id = assets.id)
                                      ->  Parallel Seq Scan on events_assets (cost=0.00..3103.23 rows=85523 width=74) (actual time=0.183..18.023 rows=72441 loops=2)
                                      ->  Hash (cost=12.68..12.68 rows=1 width=100) (actual time=0.808..0.813 rows=8 loops=2)
                                            Buckets: 1024 Batches: 1 Memory Usage: 10kB
                                            ->  Nested Loop (cost=9.37..12.68 rows=1 width=100) (actual time=0.777..0.793 rows=8 loops=2)
                                                  Join Filter: (assets.id = groups_assets.asset_id)
                                                  ->  HashAggregate (cost=9.24..9.25 rows=1 width=66) (actual time=0.755..0.759 rows=8 loops=2)
                                                        Group Key: assets_1.id
                                                        Batches: 1 Memory Usage: 24kB
                                                        Worker 0: Batches: 1 Memory Usage: 24kB
                                                        ->  Nested Loop (cost=0.13..9.23 rows=1 width=66) (actual time=0.702..0.730 rows=9 loops=2)
                                                              ->  Seq Scan on groups_assets (cost=0.00..1.07 rows=1 width=32) (actual time=0.220..0.222 rows=9 loops=2)
                                                                    Filter: (group_id = 'default'::text)
                                                              ->  Index Only Scan using assets_id on assets assets_1 (cost=0.13..8.15 rows=1 width=34) (actual time=0.055..0.056 rows=1 loops=18)
                                                                    Index Cond: (id = groups_assets.asset_id)
                                                                    Heap Fetches: 18
                                                  ->  Index Only Scan using assets_id on assets (cost=0.13..3.42 rows=1 width=34) (actual time=0.003..0.003 rows=1 loops=16)
                                                        Index Cond: (id = assets_1.id)
                                                        Heap Fetches: 16
                                ->  Index Only Scan using events_id on events events_1 (cost=0.42..0.62 rows=1 width=37) (actual time=0.003..0.003 rows=1 loops=144882)
                                      Index Cond: (id = events_assets.event_id)
                                      Heap Fetches: 18621
              ->  Index Scan using events_id on events (cost=0.42..2.50 rows=1 width=1970) (actual time=0.006..0.006 rows=1 loops=144882)
                    Index Cond: (id = events_1.id)
Planning Time: 1.994 ms
Execution Time: 2159.146 ms