我正在使用 PostgreSQL 17
我正在为 Haskell 生态系统建模一个包索引,一个有用的功能是确定传递依赖关系。Haskell 包可以规范化为:
Package
(name + package-specific metadata)
\-> Releases
(version + release-specific metadata like synopsis, attached data files)
\-> Components
(library, executable, test suite, benchmark suite)
\-> Dependencies
(Each component declares a dependency of a package name and version expression).
(每个部分都是一个表,它们通过一对多关系链接在一起。一个包链接到多个版本,每个版本链接到多个组件,每个组件链接到多个依赖项)
为了我自己的启发,我首先降低了模型的复杂性,以创建一个符合我期望的 CTE。特别是,我没有使用 bigint 作为代码库中表的 PK,而是使用 UUID。
(完整 dbfiddle 可在https://dbfiddle.uk/hVOmMdYQ上找到)
-- Data model where packages and versions are combined,
-- and dependencies refer to packages
create table packages (
package_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text unique not null,
version int[] not null
);
create unique index on packages(name, version);
create table dependencies (
dependency_id bigint generated always as identity PRIMARY KEY,
dependent_id bigint references packages(package_id),
depended_id bigint references packages(package_id)
);
create unique index on dependencies(dependent_id, depended_id);
以下是数据:
insert into packages (name, version) values ('base', '{1,0,0,0}');
insert into packages (name, version) values ('vector', '{0,0,7,0}');
insert into packages (name, version) values ('random', '{0,1,5,8}');
insert into packages (name, version) values ('unix', '{1,2,1,0}');
insert into packages (name, version) values ('time', '{3,14,1,2}');
insert into dependencies (dependent_id, depended_id) values (2, 1);
insert into dependencies (dependent_id, depended_id) values (3, 1);
insert into dependencies (dependent_id, depended_id) values (3, 2);
insert into dependencies (dependent_id, depended_id) values (4, 1);
insert into dependencies (dependent_id, depended_id) values (5, 1);
insert into dependencies (dependent_id, depended_id) values (5, 3);
insert into dependencies (dependent_id, depended_id) values (5, 4);
初步结果如下:
select dependent.package_id, dependent.name as dependent, depended.name as depended
from dependencies as d1
inner join packages as dependent on d1.dependent_id = dependent.package_id
inner join packages as depended on d1.depended_id = depended.package_id;
软件包 ID | 受抚养人 | 依赖 |
---|---|---|
2 | 向量 | 根据 |
3 | 随机的 | 根据 |
3 | 随机的 | 向量 |
4 | unix | 根据 |
5 | 时间 | 根据 |
5 | 时间 | 随机的 |
5 | 时间 | unix |
到目前为止,一切看起来都很好。然后我制作了这个递归 CTE 来创建传递依赖关系的视图,其中包含面包屑:
with recursive transitive_dependencies ( dependent_id, dependent, depended_id, breadcrumbs) as
( select dependent.package_id as dependent_id
, dependent.name as dependent
, depended.package_id as depended_id
, concat_ws(' > ', dependent.name, depended.name) as breadcrumbs
from dependencies as d1
inner join packages as dependent on d1.dependent_id = dependent.package_id
inner join packages as depended on d1.depended_id = depended.package_id
where dependent_id = 5
union all
select dependent.package_id as dependent_id
, dependent.name as dependent
, depended.package_id as depended_id
, concat_ws(' > ', t2.breadcrumbs, depended.name) as breadcrumbs
from dependencies as d1
inner join packages as dependent on d1.dependent_id = dependent.package_id
inner join packages as depended on d1.depended_id = depended.package_id
inner join transitive_dependencies as t2 on t2.depended_id = dependent.package_id -- ← This is where we refer to the CTE
)
cycle dependent_id set is_cycle using path
select t3.dependent_id
, t3.dependent
, t3.depended_id
, t3.breadcrumbs
from transitive_dependencies as t3;
依赖项 ID | 受抚养人 | 依赖的 ID | 面包屑 |
---|---|---|---|
5 | 时间 | 1 | 时间 > 基准 |
5 | 时间 | 3 | 时间 > 随机 |
5 | 时间 | 4 | 时间 > unix |
3 | 随机的 | 1 | 时间 > 随机 > 基础 |
3 | 随机的 | 2 | 时间 > 随机 > 矢量 |
4 | unix | 1 | 时间 > unix > 基础 |
2 | 向量 | 1 | 时间 > 随机 > 矢量 > 基数 |
瞧,它成功了!
现在,我正在考虑进一步拆分。也就是说,软件包和发布版本将被分开。这是因为 Haskell 生态系统中有一些特定元数据针对“软件包”的概念,而有些元数据仅与“发布版本”相关,并且它们不能互换。
-- Data model where packages and releases are separated
create table packages2 (
package_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
name text unique not null
);
create table releases2 (
release_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
package_id bigint references packages2,
version text not null
);
create unique index on releases2(package_id, version);
create table dependencies2 (
dependency_id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
release_id bigint references releases2 not null,
package_id bigint references packages2 not null,
requirement int[] not null
);
以下是数据
insert into packages2 (name) values ('base'); -- 1
insert into packages2 (name) values ('vector'); -- 2
insert into packages2 (name) values ('random'); -- 3
insert into packages2 (name) values ('unix'); -- 4
insert into packages2 (name) values ('time'); -- 5
insert into releases2 (package_id, version) values (1, '{1,0,0,0}');
insert into releases2 (package_id, version) values (2, '{0,0,7,0}');
insert into releases2 (package_id, version) values (3, '{0,1,5,8}');
insert into releases2 (package_id, version) values (4, '{1,2,1,0}');
insert into releases2 (package_id, version) values (5, '{3,14,1,2}');
insert into dependencies2 (release_id, package_id, requirement) values ( 2, 1, '== 1.0.0.0' );
insert into dependencies2 (release_id, package_id, requirement) values ( 3, 1, '== 1.0.0.0' );
insert into dependencies2 (release_id, package_id, requirement) values ( 3, 2, '>= 0.0.7.0' );
insert into dependencies2 (release_id, package_id, requirement) values ( 4, 1, '== 1.0.0.0' );
insert into dependencies2 (release_id, package_id, requirement) values ( 5, 1, '== 1.0.0.0' );
insert into dependencies2 (release_id, package_id, requirement) values ( 5, 3, '<= 0.1.5.8' );
insert into dependencies2 (release_id, package_id, requirement) values ( 5, 4, '== 1.2.1.0' );
我尝试将上述 CTE 的教训应用到这个模式中:
with recursive transitive_dependencies2 ( dependent_id, dependent, dependency_id, breadcrumbs) as
(select p2.package_id as dependent_id
, p2.name as dependent
, p3.package_id as dependency_id
, concat_ws(' > ', p2.name, p3.name) as breadcrumbs
from dependencies2 as d0
-- Dependent releases
inner join releases2 as r1 on d0.release_id = r1.release_id
-- Dependent packages
inner join packages2 as p2 on r1.package_id = p2.package_id
-- Dependencies packages
inner join packages2 as p3 on d0.package_id = p3.package_id
where r1.release_id = 5
union
select p2.package_id as dependent_id
, p2.name as dependent
, p3.package_id as dependency_id
, concat_ws(' > ', p2.name, p3.name) as breadcrumbs
from dependencies2 as d0
-- Dependent releases
inner join releases2 as r1 on d0.release_id = r1.release_id
-- Dependent packages
inner join packages2 as p2 on r1.package_id = p2.package_id
-- Dependencies packages
inner join packages2 as p3 on d0.package_id = p3.package_id
inner join transitive_dependencies2 as t2 on t2.dependency_id = p2.package_id ← This is where we refer to the CTE
)
cycle dependent_id set is_cycle using path
select t3.dependent_id
, t3.dependent
, t3.dependency_id
, t3.breadcrumbs
from transitive_dependencies2 as t3;
不幸的是,这并没有给出预期的结果:
依赖项 ID | 受抚养人 | 依赖项 ID | 面包屑 |
---|---|---|---|
5 | 时间 | 1 | 时间 > 基准 |
5 | 时间 | 3 | 时间 > 随机 |
5 | 时间 | 4 | 时间 > unix |
3 | 随机的 | 1 | 随机 > 基础 |
3 | 随机的 | 2 | 随机 > 向量 |
4 | unix | 1 | unix > 基础 |
2 | 向量 | 1 | 矢量 > 基数 |
我的问题如下:我如何建立直觉来进一步将有效的 CTE 拆分到更细粒度的表上?我对这一切还很陌生,这是我的第一个“真实世界” CTE 用例。
很高兴澄清或消除歧义。
通过这种方式,我也对数据建模方面的最佳实践感兴趣。例如,过去有人警告我不要存储外键数组,并努力达到范式并拆分具有不同生命周期的实体。