AskOverflow.Dev

Accepted
Parker
Asked: 2018-10-13 11:31:09 +0800 CST

Creating a two-dimensional truth table from the "cross product" of two tables via a PostgreSQL view or function

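I have a working Excel-based method for creating a truth table from two vectors exported from a PostgreSQL database. Because of the large number of VLOOKUP and COUNTIFS operations, the process takes about 4 hours to complete, so I am looking for a way to implement it directly in the database as a view.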

The input vectors are generated from two existing views in my database, which have no foreign keys.

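To keep this question and its solution as generic as possible, I created a parallel problem using two simple tables with sample data that covers all the possible cases: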

CREATE TABLE group_membership
(
  member character varying(6) NOT NULL,
  group_name character varying(64) NOT NULL
);

INSERT INTO group_membership VALUES ('000001','A');
INSERT INTO group_membership VALUES ('000001','B');
INSERT INTO group_membership VALUES ('000001','B'); -- A value may occur more than once.
INSERT INTO group_membership VALUES ('000001','D'); -- A value may not necessarily have a corresponding row in the group table.
INSERT INTO group_membership VALUES ('000001','D');

INSERT INTO group_membership VALUES ('000002','B');
INSERT INTO group_membership VALUES ('000002','C');
INSERT INTO group_membership VALUES ('000002','E');

INSERT INTO group_membership VALUES ('000003','A');
INSERT INTO group_membership VALUES ('000003','C');

INSERT INTO group_membership VALUES ('000004','D');
INSERT INTO group_membership VALUES ('000004','E');

CREATE TABLE groups
(
  name character varying(64) NOT NULL
);

INSERT INTO groups VALUES ('A');
INSERT INTO groups VALUES ('B');
INSERT INTO groups VALUES ('C');
INSERT INTO groups VALUES ('C'); -- A value may occur more than once.
INSERT INTO groups VALUES ('Z');
-- 'D' and 'E' not present in this table

There is no relationship between these two tables.

I am trying to build a view that produces a binary truth table (matrix) like the following:

member A B C Z
000001 t t f f
000002 f t t f
000003 t f t f
000004 f f f f

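where the first column holds the distinct members from the group_membership table, and each subsequent column shows only whether that member is present in one of the groups defined in the groups table. The result should contain boolean values only (TRUE if the member appears at least once in a tuple with the group, FALSE otherwise).

For example, specific "cells" in the table above would correspond to the following: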

SELECT COUNT(*) > 0 AS value FROM group_membership WHERE group_name='A' AND member='000001';
 value
-------
 t
(1 row)

SELECT COUNT(*) > 0 AS value FROM group_membership WHERE group_name='Z' AND member='000001';
 value
-------
 f
(1 row)

And to create the second column (the "A" column):

SELECT COUNT(*) > 0 AS A FROM group_membership WHERE group_name='A' AND member='000001'
 UNION ALL
SELECT COUNT(*) > 0 AS A FROM group_membership WHERE group_name='A' AND member='000002'
 UNION ALL
SELECT COUNT(*) > 0 AS A FROM group_membership WHERE group_name='A' AND member='000003'
 UNION ALL
SELECT COUNT(*) > 0 AS A FROM group_membership WHERE group_name='A' AND member='000004'
;

Even better would be something like this (1 and 0 instead of TRUE and FALSE):

member A B C Z
000001 1 1 0 0
000002 0 1 1 0
000003 1 0 1 0
000004 0 0 0 0

The query for each individual "cell" could then take the following form:

SELECT CASE WHEN COUNT(*) > 0 THEN 1 ELSE 0 END FROM group_membership WHERE group_name='A' AND member='000001';

My group_membership table has about 50,000 rows, and my groups table has about 200 rows.


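Note: if you do something like the following to ignore groups that are not common to both tables, you end up eliminating rows such as 000004 in the sample result set above, which is not what I am looking for (member 000004 and group Z should both appear in the result set):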

SELECT * FROM group_membership WHERE group_name IN (SELECT DISTINCT(name) FROM groups);
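
One way to keep every member and every group in play, sketched here on the assumption that a long-format result (one row per member/group pair) is an acceptable intermediate step, is to cross join the distinct members with the distinct group names and test each pair with EXISTS. The aliases m, g and gm and the column name is_member are illustrative, not part of the original schema.

```sql
-- Long-format sketch: one row per (member, group) pair,
-- including pairs that have no matching membership row.
SELECT m.member,
       g.name AS group_name,
       EXISTS (
         SELECT 1
           FROM group_membership AS gm
          WHERE gm.member = m.member
            AND gm.group_name = g.name
       )::int AS is_member               -- boolean cast to 1/0
  FROM (SELECT DISTINCT member FROM group_membership) AS m
 CROSS JOIN (SELECT DISTINCT name FROM groups) AS g
 ORDER BY m.member, g.name;
```

With the sample data this yields 16 rows (4 members times 4 groups), and member 000004 is kept, with is_member = 0 for every group.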

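As a first step toward solving this, I am looking into creating a FUNCTION that relies on a recursive JOIN against the groups table to build the result table.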

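Update: a FUNCTION needs a RETURNS TABLE definition, which does not look like a viable solution given the variable number of columns in the result set. Another idea I have is to create a function that performs a series of UNIONs along one dimension, and then wrap it in a view that performs a UNION over a crosstab() of the result of SELECT DISTINCT(name) FROM groups ORDER BY name ASC;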
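
For reference, a hedged sketch of what the crosstab() route might look like, assuming the tablefunc extension can be installed. It shows the limitation noted in the update: the output columns still have to be spelled out in the AS ct(...) list, so the column set is fixed when the query is written. The alias ct and the inner aliases are illustrative.

```sql
CREATE EXTENSION IF NOT EXISTS tablefunc;

SELECT *
  FROM crosstab(
         -- Source query: row name, category, value, sorted by row name.
         $src$
           SELECT m.member, g.name,
                  EXISTS (SELECT 1
                            FROM group_membership AS gm
                           WHERE gm.member = m.member
                             AND gm.group_name = g.name)::int
             FROM (SELECT DISTINCT member FROM group_membership) AS m
            CROSS JOIN (SELECT DISTINCT name FROM groups) AS g
            ORDER BY 1, 2
         $src$,
         -- Category query: one row per output column, in column order.
         $cat$ SELECT DISTINCT name FROM groups ORDER BY 1 $cat$
       ) AS ct(member text, "A" int, "B" int, "C" int, "Z" int);
```

The hard-coded "A" to "Z" columns match the sample data only; with about 200 groups the list would have to be generated, which leads back to dynamic SQL.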

postgresql pivot

2 Answers

  1. Best Answer
    Jasen
    2018-10-14T12:05:22+08:00

    It looks like you basically want this, but without having to write it out by hand:

    SELECT member
          ,bool_or(group_name='A')::int as "A"
          ,bool_or(group_name='B')::int as "B"
          ,bool_or(group_name='C')::int as "C"
          ,bool_or(group_name='Z')::int as "Z" 
      FROM group_membership
      GROUP BY member
      ORDER BY member;
    

    Postgres is not structured in a way that makes dynamic pivot tables easy.

     CREATE or replace FUNCTION prepare() returns void language plpgsql as $F$
     BEGIN
         execute (
              WITH g AS ( select name from groups group by name order by name)
              SELECT
              $$
              create or replace function pg_temp.pivot () 
                  RETURNS TABLE ( member text,
              $$ || string_agg( quote_ident( name )||' INT',',') || $$ 
              ) LANGUAGE SQL AS $X$
                  SELECT member
        $$ || string_agg( ',bool_or(group_name=' || quote_literal( name ) ||
        ')::int AS '|| quote_ident( name ),e'\n') || $$
                     FROM group_membership
                     GROUP BY member
                     ORDER BY member; $X$; $$
        FROM g ) ;
    END;
    $F$;
    
    
    select prepare();
    select * from pg_temp.pivot();
    
     member | A | B | C | Z
    --------+---+---+---+---
     000001 | 1 | 1 | 0 | 0
     000002 | 0 | 1 | 1 | 0
     000003 | 1 | 0 | 1 | 0
     000004 | 0 | 0 | 0 | 0
    (4 rows)
    

    Here I use SQL to build the query above into a temporary function, and then pull the result stream from it.
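
    For the sample data, the statement that prepare() assembles and executes should look roughly like the following. This is reconstructed by hand from the string concatenation above, with whitespace tidied, rather than captured from a server.

    ```sql
    create or replace function pg_temp.pivot ()
        RETURNS TABLE ( member text,
            "A" INT,"B" INT,"C" INT,"Z" INT
        ) LANGUAGE SQL AS $X$
        SELECT member
            ,bool_or(group_name='A')::int AS "A"
            ,bool_or(group_name='B')::int AS "B"
            ,bool_or(group_name='C')::int AS "C"
            ,bool_or(group_name='Z')::int AS "Z"
          FROM group_membership
         GROUP BY member
         ORDER BY member; $X$;
    ```

    quote_ident() adds the double quotes because these names contain upper-case letters, and quote_literal() produces the single-quoted comparison values.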

    In the main function I used a sub-select, which lets me use a CTE, which in turn means I can sort the column names.

    I could have created a temporary view in the main function instead, but that did not occur to me until now.
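
    A minimal sketch of that temporary-view idea, not the code from this answer: a DO block builds the column list with format() and creates the view through EXECUTE. The view name pivot_view and the variable cols are hypothetical, and the sketch assumes groups contains at least one row (otherwise cols is NULL and the generated statement is invalid).

    ```sql
    DO $do$
    DECLARE
      cols text;  -- generated list of output columns
    BEGIN
      -- One "bool_or(...)::int AS <group>" expression per distinct group name.
      SELECT string_agg(
               format('bool_or(group_name = %L)::int AS %I', name, name),
               ', ' ORDER BY name)
        INTO cols
        FROM (SELECT DISTINCT name FROM groups) AS g;

      -- Drop first so a changed column list does not clash with an old definition.
      DROP VIEW IF EXISTS pg_temp.pivot_view;
      EXECUTE format(
        'CREATE TEMP VIEW pivot_view AS
           SELECT member, %s
             FROM group_membership
            GROUP BY member
            ORDER BY member', cols);
    END
    $do$;

    SELECT * FROM pivot_view;
    ```

    The view's columns reflect the contents of groups at the time the block runs, so the block has to be rerun whenever the set of groups changes.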

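    I am assuming that the values in group_name do not exceed 64 octets. varchar(64) does not enforce this (its limit is counted in characters, not bytes); the name type does, and might be better suited to this task.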

    • 3
  2. Parker
    2018-10-14T06:38:53+08:00

    Here is a function for generating any cell value (I use text rather than int to avoid a type conflict when merging in the header later):

    CREATE OR REPLACE FUNCTION is_group_member(mname text, gname text)
    RETURNS text
    AS $$
      SELECT CASE WHEN COUNT(*) > 0 THEN '1' ELSE '0' END FROM group_membership WHERE group_name=gname AND member=mname
    $$ LANGUAGE SQL STABLE;
    
    SELECT is_group_member('000001','A');
    SELECT is_group_member('000001','Z');
    

    With the function above, we can do:

    SELECT DISTINCT(name) AS name,is_group_member('000001',name) FROM groups ORDER BY name ASC;
     name | is_group_member
    ------+-----------------
     A    | 1
     B    | 1
     C    | 0
     Z    | 0
    (4 rows)
    

    We can turn the query above into one that returns an ordered array:

    SELECT array_agg(is_member) AS membership FROM (SELECT DISTINCT(name) AS name,is_group_member('000001',name) AS is_member FROM groups ORDER BY name ASC) g;
    
     membership
    ------------
     {1,1,0,0}
    (1 row)
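
    A small variation, offered as a sketch: the ordering can be stated inside the aggregate itself, so the result does not rely on the subquery's ORDER BY being carried through to array_agg. It assumes the is_group_member function defined above.

    ```sql
    -- ORDER BY inside the aggregate fixes the element order explicitly.
    SELECT array_agg(is_group_member('000001', name) ORDER BY name) AS membership
      FROM (SELECT DISTINCT name FROM groups) AS g;
    ```

    The same ORDER BY clause works inside string_agg.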
    

    However, I don't really want the braces, so I'll use string_agg instead:

    SELECT string_agg(is_member,',') AS membership FROM (SELECT DISTINCT(name) AS name,is_group_member('000001',name) AS is_member FROM groups ORDER BY name ASC) g;
    

    Turning the above query into a function:

    CREATE OR REPLACE FUNCTION group_memberships(mname text)
    RETURNS text
    AS $$
      SELECT string_agg(is_member,',') AS membership FROM (SELECT DISTINCT(name) AS name,is_group_member(mname,name) AS is_member FROM groups ORDER BY name ASC) g
    $$ LANGUAGE SQL STABLE;
    
    SELECT group_memberships('000001');
    
     group_memberships
    -------------------
     1,1,0,0
    (1 row)
    

    Then call the function in a query:

    SELECT DISTINCT(member),group_memberships(member) FROM group_membership ORDER BY member;
     member | group_memberships
    --------+-------------------
     000001 | 1,1,0,0
     000002 | 0,1,1,0
     000003 | 1,0,1,0
     000004 | 0,0,0,0
    (4 rows)
    

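    The result above is just what I was looking for, although I feel it could be improved by:

    • Simplifying it to use fewer functions, or even collapsing it into a single query
    • Adding a first row containing the ordered array of group names
    • Expanding the result into a table (as in this answer, or possibly this answer)

    Of course, in hindsight, my group names cannot be used as column names anyway, because of case, special characters, spaces, and so on.

    So, continuing with the array-based approach, I will try to get the next best thing.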

    Get a row of ordered group names:

    SELECT '' AS member,string_agg(DISTINCT(name),',' ORDER BY name ASC) AS group_memberships FROM groups;
    
     member | group_memberships
    --------+-------------------
            | A,B,C,Z
    (1 row)
    

    Then I can create a view that merges the result sets to produce the complete matrix:

    CREATE VIEW group_memberships AS
      SELECT '' AS member,string_agg(DISTINCT(name),',' ORDER BY name ASC) AS group_memberships FROM groups
       UNION
      SELECT DISTINCT(member),group_memberships(member) FROM group_membership ORDER BY member
    ;
    
     member | group_memberships
    --------+-------------------
            | A,B,C,Z
     000001 | 1,1,0,0
     000002 | 0,1,1,0
     000003 | 1,0,1,0
     000004 | 0,0,0,0
    (5 rows)
    

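    With this solution there are no temporary tables or materialized views. The view produces results in a form that is easy to import into Excel, so it works for my purposes. I would still like to solve this with fewer functions (or even none).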
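
    As a sketch of the "no functions" variant, and not something tested against the real 50,000-row data: the per-cell test can be inlined with EXISTS over the cross join of distinct members and distinct group names, and the flags aggregated per member. The names cells and flag are illustrative.

    ```sql
    SELECT member,
           string_agg(flag, ',' ORDER BY name) AS group_memberships
      FROM (
            -- One row per (member, group) pair with a '1'/'0' flag.
            SELECT m.member,
                   g.name,
                   CASE WHEN EXISTS (SELECT 1
                                       FROM group_membership AS gm
                                      WHERE gm.member = m.member
                                        AND gm.group_name = g.name)
                        THEN '1' ELSE '0' END AS flag
              FROM (SELECT DISTINCT member FROM group_membership) AS m
             CROSS JOIN (SELECT DISTINCT name FROM groups) AS g
           ) AS cells
     GROUP BY member
     ORDER BY member;
    ```

    On the sample tables this produces the same four member rows as the function-based query above, without is_group_member or group_memberships being defined.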

    Exporting on the command line, I can drop the header from the result set:

    C:\temp>C:\Progra~1\PostgreSQL\9.6\bin\psql.exe -U user -d testdb -c "\copy (SELECT * FROM group_memberships) to 'group_membership.csv' WITH (FORMAT CSV, HEADER FALSE);"
    

    This produces the following file:

    "","A,B,C,Z"
    000001,"1,1,0,0"
    000002,"0,1,1,0"
    000003,"1,0,1,0"
    000004,"0,0,0,0"
    

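    Because of the quoting this is not perfect, but it is close enough to import into Excel after some minimal search/replace in a text editor.

    To get closer to the desired output file: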

    DROP VIEW group_memberships;
    DROP FUNCTION group_memberships(text);
    DROP FUNCTION is_group_member(text,text);
    
    CREATE OR REPLACE FUNCTION is_group_member(mname text, gname text)
    RETURNS text
    AS $$
      SELECT CASE WHEN COUNT(*) > 0 THEN '"1"' ELSE '"0"' END FROM group_membership WHERE group_name=gname AND member=mname
    $$ LANGUAGE SQL STABLE;
    
    CREATE OR REPLACE FUNCTION group_memberships(mname text)
    RETURNS text
    AS $$
      SELECT '"' || mname || '"' || ',' || string_agg(is_member,',') AS membership FROM (SELECT DISTINCT(name) AS name,is_group_member(mname,name) AS is_member FROM groups ORDER BY name ASC) g
    $$ LANGUAGE SQL STABLE;
    
    CREATE OR REPLACE VIEW group_memberships AS
      SELECT '' AS member, '""' || ',"' || string_agg(DISTINCT(name),'","' ORDER BY name ASC) || '"' AS group_memberships FROM groups
       UNION
      SELECT DISTINCT(member),group_memberships(member) FROM group_membership ORDER BY member
    ;
    
    SELECT group_memberships FROM group_memberships;
    
        group_memberships
    --------------------------
     "","A","B","C","Z"
     "000001","1","1","0","0"
     "000002","0","1","1","0"
     "000003","1","0","1","0"
     "000004","0","0","0","0"
    (5 rows)
    

    And running:

    C:\temp>C:\Progra~1\PostgreSQL\9.6\bin\psql.exe -U user -d testdb -c "\copy (SELECT group_memberships FROM group_memberships) to 'group_membership.csv' WITH (FORMAT CSV, HEADER FALSE);"
    

    produces the following file:

    """"",""A"",""B"",""C"",""Z"""
    """000001"",""1"",""1"",""0"",""0"""
    """000002"",""0"",""1"",""1"",""0"""
    """000003"",""1"",""0"",""1"",""0"""
    """000004"",""0"",""0"",""0"",""0"""
    

    Getting closer, but still not perfect. However, two search/replace passes of "" to " yield:

    "","A","B","C","Z"
    "000001","1","1","0","0"
    "000002","0","1","1","0"
    "000003","1","0","1","0"
    "000004","0","0","0","0"
    

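    This imports directly into Excel. The approach may cause problems if any group or member names contain double quotes, so if anyone has a better way to handle the quoting, I would like to hear it.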
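
    One possible way around the quoting problem, sketched under the assumption that each output line is built as a finished CSV record inside the query: embedded double quotes are doubled (the usual CSV convention) by a small helper. The function csv_quote and the names cells, flag, line and sort_key are hypothetical.

    ```sql
    -- Hypothetical helper: wrap a value in double quotes, doubling any embedded ones.
    CREATE OR REPLACE FUNCTION csv_quote(val text)
    RETURNS text
    AS $$
      SELECT '"' || replace(val, '"', '""') || '"'
    $$ LANGUAGE SQL IMMUTABLE;

    WITH cells AS (
      -- One row per (member, group) pair with a '1'/'0' flag.
      SELECT m.member,
             g.name,
             CASE WHEN EXISTS (SELECT 1
                                 FROM group_membership AS gm
                                WHERE gm.member = m.member
                                  AND gm.group_name = g.name)
                  THEN '1' ELSE '0' END AS flag
        FROM (SELECT DISTINCT member FROM group_membership) AS m
       CROSS JOIN (SELECT DISTINCT name FROM groups) AS g
    )
    SELECT line
      FROM (
            -- Header line: empty first field, then the ordered group names.
            SELECT 0 AS sort_key, '' AS member,
                   csv_quote('') || ',' ||
                   string_agg(csv_quote(name), ',' ORDER BY name) AS line
              FROM (SELECT DISTINCT name FROM groups) AS g
            UNION ALL
            -- One line per member, flags in the same group order.
            SELECT 1, member,
                   csv_quote(member) || ',' ||
                   string_agg(csv_quote(flag), ',' ORDER BY name)
              FROM cells
             GROUP BY member
           ) AS lines
     ORDER BY sort_key, member;
    ```

    Because each line is already a complete CSV record, it should be written out verbatim, for example with psql's unaligned, tuples-only output (-A -t, with -o for the file name), rather than through COPY with FORMAT CSV, which would quote the whole line a second time.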

    • 1
