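I have a recursive query that takes a long time, 30+ ms, whereas the individual queries that fetch the same data manually take < 0.12 ms in total. So we are talking about a factor of 250.
I have the following database structure, which allows a DAG of group memberships (db-fiddle here):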
create table subjects
(
    subject_id bigint not null
        constraint pk_subjects
            primary key
);
create table subject_group_members
(
    subject_group_id bigint not null
        constraint fk_subject_group_members_subject_group_id_subjects_subject_id
            references subjects(subject_id)
            on delete cascade,
    subject_id bigint not null
        constraint fk_subject_group_members_subject_id_subjects_subject_id
            references subjects(subject_id)
            on delete cascade,
    constraint pk_subject_group_members
        primary key (subject_group_id, subject_id)
);
create index idx_subject_group_members_subject_id
    on subject_group_members (subject_id);
create index idx_subject_group_members_subject_group_id
    on subject_group_members (subject_group_id);
The data might look like this:
subject_group_id | subject_id |
---|---|
1 | 2 |
1 | 3 |
1 | 4 |
2 | 5 |
3 | 5 |
I want to know all groups that 5 belongs to (1 through inheritance, 2 and 3 directly; not 4 or any other subject ID).
This query works as expected:
with recursive flat_members(subject_group_id, subject_id) as (
    select subject_group_id, subject_id
    from subject_group_members gm
    union
    select
        flat_members.subject_group_id as subject_group_id,
        subject_group_members.subject_id as subject_id
    from subject_group_members
    join flat_members on flat_members.subject_id = subject_group_members.subject_group_id
)
select * from flat_members where subject_id = 5
But when I run it against real data, I get this query plan:
CTE Scan on flat_members (cost=36759729.47..59962757.76 rows=5156229 width=16) (actual time=26.526..55.166 rows=3 loops=1)
  Filter: (subject_id = 30459)
  Rows Removed by Filter: 48984
  CTE flat_members
    -> Recursive Union (cost=0.00..36759729.47 rows=1031245702 width=16) (actual time=0.022..47.638 rows=48987 loops=1)
          -> Seq Scan on subject_group_members gm (cost=0.00..745.82 rows=48382 width=16) (actual time=0.019..4.286 rows=48382 loops=1)
          -> Merge Join (cost=63629.74..1613406.96 rows=103119732 width=16) (actual time=10.897..11.038 rows=320 loops=2)
                Merge Cond: (subject_group_members.subject_group_id = flat_members_1.subject_id)
                -> Index Scan using idx_subject_group_members_subject_group_id on subject_group_members (cost=0.29..1651.02 rows=48382 width=16) (actual time=0.009..1.987 rows=24192 loops=2)
                -> Materialize (cost=63629.45..66048.55 rows=483820 width=16) (actual time=4.124..6.592 rows=24668 loops=2)
                      -> Sort (cost=63629.45..64839.00 rows=483820 width=16) (actual time=4.120..5.034 rows=24494 loops=2)
                            Sort Key: flat_members_1.subject_id
                            Sort Method: quicksort Memory: 53kB
                            -> WorkTable Scan on flat_members flat_members_1 (cost=0.00..9676.40 rows=483820 width=16) (actual time=0.001..0.916 rows=24494 loops=2)
Planning Time: 0.296 ms
Execution Time: 56.735 ms
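Now, if I do it manually, running the query
select subject_group_id from subject_group_members where subject_id = 30459
and following the tree, it takes 4 queries, each of which takes about 0.02 ms.
Is there a way to make the recursive query come close to the speed of doing the recursion manually?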
It looks like you got the join condition backwards by accident.
fiddle
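The query in the question walks down from every group to its members, so the CTE builds the whole membership closure (48987 rows in the plan) before the outer filter keeps 3 of them. A minimal sketch of the opposite direction, assuming the table from the question and the sample subject 5: start from the rows for that subject and climb to the groups above it. The aliases and the single-column CTE are illustrative choices, not necessarily the exact query behind the fiddle link.

```sql
-- Sketch: start at subject 5 and climb to every group above it.
WITH RECURSIVE flat_members AS (
   -- Non-recursive term: groups that contain subject 5 directly.
   SELECT gm.subject_group_id
   FROM   subject_group_members gm
   WHERE  gm.subject_id = 5
   UNION
   -- Recursive term: groups that contain a group found so far.
   SELECT gm.subject_group_id
   FROM   flat_members fm
   JOIN   subject_group_members gm ON gm.subject_id = fm.subject_group_id
)
SELECT subject_group_id
FROM   flat_members;
```

With the sample data from the question this should return groups 2, 3 and 1, in no guaranteed order. UNION (rather than UNION ALL) removes group 1 being reached twice, once via 2 and once via 3.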
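Also, move the filter WHERE subject_id = 5 into the initial SELECT to filter out irrelevant rows early, and to allow an optimized query plan, typically using an index. Speaking of which, a multicolumn index on (subject_id, subject_group_id) would be better, allowing index-only scans. (It might as well be UNIQUE.) That would be in addition to your PK on (subject_group_id, subject_id). Or reverse the columns in the PK definition; either may be useful.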
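As a rough sketch, such an index could be created like this. The index name is hypothetical; only the table and column names come from the question.

```sql
-- Hypothetical index name. subject_id comes first because the lookup
-- filters on it; subject_group_id is included so that the query can be
-- answered from the index alone (index-only scan).
CREATE UNIQUE INDEX idx_subject_group_members_subject_id_group_id
    ON subject_group_members (subject_id, subject_group_id);
```

Declaring it UNIQUE is safe here because the existing primary key already guarantees that each (subject_group_id, subject_id) pair occurs only once.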
Typically, it is best to have just a PK on (subject_id, subject_group_id) and one more multicolumn index on (subject_group_id, subject_id), and then drop the two indexes on just (subject_id) and (subject_group_id).
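One way this layout could be reached from the schema in the question is sketched below. The constraint and single-column index names are the ones from the question; the name of the new index is hypothetical. Treat it as an outline to adapt and test, not a migration script: rebuilding a primary key locks the table while the new index is built.

```sql
BEGIN;

-- Swap the column order of the primary key.
ALTER TABLE subject_group_members
    DROP CONSTRAINT pk_subject_group_members;
ALTER TABLE subject_group_members
    ADD CONSTRAINT pk_subject_group_members
        PRIMARY KEY (subject_id, subject_group_id);

-- Second multicolumn index for lookups by group (hypothetical name).
CREATE UNIQUE INDEX idx_subject_group_members_group_id_subject_id
    ON subject_group_members (subject_group_id, subject_id);

-- The single-column indexes are now covered by the leading columns
-- of the two multicolumn indexes.
DROP INDEX idx_subject_group_members_subject_id;
DROP INDEX idx_subject_group_members_subject_group_id;

COMMIT;
```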