Arun's questions - dba

Arun

Asked: 2023-10-04 22:12:07 +0800 CST

PostgreSQL recursive query performance improvement

I'm building a database that is used to derive hierarchical relationships, and I chose a PostgreSQL recursive CTE for this.

My table structure is:

table_name             |column_name           |data_type               |
-----------------------+----------------------+------------------------+
source_relationship_dev|relationship_id       |character varying       |
source_relationship_dev|subject               |character varying       |
source_relationship_dev|predicate             |character varying       |
source_relationship_dev|object                |character varying       |
source_relationship_dev|rel_date              |character varying       |
source_relationship_dev|provided_by           |character varying       |
source_relationship_dev|harvested_date        |character varying       |
source_relationship_dev|rel_status            |character varying       |
source_relationship_dev|subject_status        |character varying       |
source_relationship_dev|object_status         |character varying       |
source_relationship_dev|source_relationship_id|integer                 |
source_relationship_dev|correction_id         |character varying       |
source_relationship_dev|harvested_date        |character varying       |
source_relationship_dev|obj_status            |character varying       |
source_relationship_dev|obj                   |character varying       |
source_relationship_dev|predicate             |character varying       |
source_relationship_dev|provided_by           |character varying       |
source_relationship_dev|rel_status            |character varying       |
source_relationship_dev|start_date            |character varying       |
source_relationship_dev|sub_status            |character varying       |
source_relationship_dev|subject               |character varying       |
source_relationship_dev|audit_created_date    |timestamp with time zone|
source_relationship_dev|audit_created_by      |character varying       |
source_relationship_dev|audit_modified_date   |timestamp with time zone|
source_relationship_dev|audit_modified_by     |character varying       |
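
For anyone trying to reproduce this, a minimal DDL sketch restricted to the columns the query below actually references might look like the following. The types are taken from the listing above. The relation_inferences definition is an assumption: its structure isn't shown, and only its relationship and inverse_relationship columns are used by the query.

```sql
-- Sketch only: the source_relationship_dev columns the query reads,
-- with the types shown in the listing above.
CREATE TABLE source_relationship_dev (
    correction_id  character varying,
    subject        character varying,
    predicate      character varying,
    obj            character varying,
    start_date     character varying,
    provided_by    character varying,
    harvested_date character varying,
    obj_status     character varying,
    rel_status     character varying,
    sub_status     character varying
);

-- Assumed definition: only these two columns appear in the query.
CREATE TABLE relation_inferences (
    relationship         character varying,
    inverse_relationship character varying
);
```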

My query:

select 
    correction_id as id,
    subject,
    obj,
    predicate,
    start_date
    from
(
with recursive cte as (
        select
            correction_id,
            subject ,
            predicate ,
            obj ,
            start_date,
            provided_by,
            harvested_date
        from
            source_relationship_dev
        where
            obj_status != 'DELETE'
            and rel_status != 'DELETE'
            and sub_status != 'DELETE'
    union all
        select
            c.correction_id,
            c.subject ,
            c.predicate ,
            c.obj ,
            c.start_date,
            c.provided_by,
            c.harvested_date
        from
            source_relationship_dev c
        join cte e on
            c.subject = e.predicate
        where
            c.obj_status != 'DELETE'
            and c.rel_status != 'DELETE'
            and c.sub_status != 'DELETE'
 )
        select 
            correction_id,
            subject,
            predicate,
            obj,
            harvested_date,
            provided_by,
            start_date,
            subject || '-' || predicate || '-' || obj as stored_relationship
        from
            cte
        join relation_inferences on
            relationship = predicate
    union all
        select
            correction_id,
            obj,
            relationship,
            subject,
            harvested_date,
            provided_by,
            start_date,
            subject || '-' || predicate || '-' || obj as stored_relationship
        from
            cte
        join relation_inferences on
            inverse_relationship = predicate
    union all
        select
            correction_id||'_inverse' as relationship_id,
            obj,
            inverse_relationship,
            subject,
            harvested_date,
            provided_by,
            start_date,
            subject || '-' || predicate || '-' || obj as stored_relationship
        from
            cte
        join relation_inferences on
            relationship = predicate
    union all
        select
            correction_id||'_inverse' as relationship_id,
            subject,
            inverse_relationship,
            obj,
            harvested_date,
            provided_by,
            start_date,
            subject || '-' || predicate || '-' || obj as stored_relationship
        from
            cte
        join relation_inferences on
            inverse_relationship = predicate) as chk
            where  provided_by = (select provided_by from source_relationship_dev 
                                where subject=chk.subject
                                or obj=chk.subject
                                order by harvested_Date desc fetch first row only);

My EXPLAIN ANALYZE plan is:

Subquery Scan on chk  (cost=313222.30..107250816.83 rows=866 width=1154) (actual time=0.043..52208.024 rows=28960 loops=1)
  Filter: ((chk.provided_by)::text = ((SubPlan 1))::text)
  ->  Append  (cost=313222.30..483994.99 rows=173152 width=1522) (actual time=0.030..103.675 rows=28960 loops=1)
        CTE cte
          ->  Recursive Union  (cost=0.00..313221.19 rows=1731501 width=284) (actual time=0.009..28.418 rows=14481 loops=1)
                ->  Seq Scan on source_relationship_dev source_relationship_dev_1  (cost=0.00..833.42 rows=14481 width=284) (actual time=0.007..7.088 rows=14481 loops=1)
                      Filter: (((obj_status)::text <> 'DELETE'::text) AND ((rel_status)::text <> 'DELETE'::text) AND ((sub_status)::text <> 'DELETE'::text))
                ->  Hash Join  (cost=1566.43..27775.77 rows=171702 width=284) (actual time=16.391..16.393 rows=0 loops=1)
                      Hash Cond: ((e.predicate)::text = (c.subject)::text)
                      ->  WorkTable Scan on cte e  (cost=0.00..2896.20 rows=144810 width=516) (actual time=0.005..1.590 rows=14481 loops=1)
                      ->  Hash  (cost=833.42..833.42 rows=14481 width=284) (actual time=10.504..10.505 rows=14481 loops=1)
                            Buckets: 16384  Batches: 2  Memory Usage: 1973kB
                            ->  Seq Scan on source_relationship_dev c  (cost=0.00..833.42 rows=14481 width=284) (actual time=0.008..4.008 rows=14481 loops=1)
                                  Filter: (((obj_status)::text <> 'DELETE'::text) AND ((rel_status)::text <> 'DELETE'::text) AND ((sub_status)::text <> 'DELETE'::text))
        ->  Hash Join  (cost=1.11..41990.02 rows=43288 width=2490) (actual time=0.029..49.124 rows=7239 loops=1)
              Hash Cond: ((cte.predicate)::text = (relation_inferences.relationship)::text)
              ->  CTE Scan on cte  (cost=0.00..34630.02 rows=1731501 width=2458) (actual time=0.010..38.674 rows=14481 loops=1)
              ->  Hash  (cost=1.05..1.05 rows=5 width=218) (actual time=0.006..0.007 rows=5 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on relation_inferences  (cost=0.00..1.05 rows=5 width=218) (actual time=0.002..0.004 rows=5 loops=1)
        ->  Hash Join  (cost=1.11..41990.02 rows=43288 width=2192) (actual time=0.033..15.323 rows=7241 loops=1)
              Hash Cond: ((cte_1.predicate)::text = (relation_inferences_1.inverse_relationship)::text)
              ->  CTE Scan on cte cte_1  (cost=0.00..34630.02 rows=1731501 width=2458) (actual time=0.001..4.770 rows=14481 loops=1)
              ->  Hash  (cost=1.05..1.05 rows=5 width=436) (actual time=0.016..0.016 rows=5 loops=1)
                    Buckets: 1024  Batches: 1  Memory Usage: 9kB
                    ->  Seq Scan on relation_inferences relation_inferences_1  (cost=0.00..1.05 rows=5 width=436) (actual time=0.010..0.012 rows=5 loops=1)
        ->  Subquery Scan on "*SELECT* 3"  (cost=1.11..42531.12 rows=43288 width=1708) (actual time=0.038..17.136 rows=7239 loops=1)
              ->  Hash Join  (cost=1.11..42098.24 rows=43288 width=1708) (actual time=0.037..15.375 rows=7239 loops=1)
                    Hash Cond: ((cte_2.predicate)::text = (relation_inferences_2.relationship)::text)
                    ->  CTE Scan on cte cte_2  (cost=0.00..34630.02 rows=1731501 width=2458) (actual time=0.001..4.601 rows=14481 loops=1)
                    ->  Hash  (cost=1.05..1.05 rows=5 width=436) (actual time=0.014..0.015 rows=5 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on relation_inferences relation_inferences_2  (cost=0.00..1.05 rows=5 width=436) (actual time=0.010..0.011 rows=5 loops=1)
        ->  Subquery Scan on "*SELECT* 4"  (cost=1.11..42531.12 rows=43288 width=1708) (actual time=0.042..16.969 rows=7241 loops=1)
              ->  Hash Join  (cost=1.11..42098.24 rows=43288 width=1708) (actual time=0.041..15.283 rows=7241 loops=1)
                    Hash Cond: ((cte_3.predicate)::text = (relation_inferences_3.inverse_relationship)::text)
                    ->  CTE Scan on cte cte_3  (cost=0.00..34630.02 rows=1731501 width=2458) (actual time=0.001..4.816 rows=14481 loops=1)
                    ->  Hash  (cost=1.05..1.05 rows=5 width=218) (actual time=0.014..0.015 rows=5 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 9kB
                          ->  Seq Scan on relation_inferences relation_inferences_3  (cost=0.00..1.05 rows=5 width=218) (actual time=0.008..0.010 rows=5 loops=1)
  SubPlan 1
    ->  Limit  (cost=0.29..616.59 rows=1 width=33) (actual time=1.798..1.798 rows=1 loops=28960)
          ->  Index Scan using source_hrvst_idx on source_relationship_dev  (cost=0.29..1232.90 rows=2 width=33) (actual time=1.797..1.797 rows=1 loops=28960)
                Filter: (((subject)::text = (chk.subject)::text) OR ((obj)::text = (chk.subject)::text))
                Rows Removed by Filter: 6783
Planning Time: 0.765 ms
Execution Time: 52211.281 ms

That's 52,211.281 ms for 14,441 records.
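
The plan already shows where the time goes: SubPlan 1 (the correlated provided_by lookup) runs once per row of chk. Multiplying its per-loop time and its discarded rows by the loop count, using only figures from the plan above:

```latex
\begin{aligned}
28960 \times 1.798\ \text{ms} &\approx 52070\ \text{ms} \approx 99.7\%\ \text{of}\ 52211\ \text{ms} \\
28960 \times 6783 &= 196435680\ \text{rows removed by filter}
\end{aligned}
```

By comparison, the recursive CTE and the four UNION ALL branches finish in about 104 ms (the Append node). Nearly all of the runtime is the per-row subquery, which walks source_hrvst_idx (presumably an index on harvested_date) in descending order and discards roughly 6,783 rows per call before it finds a row matching subject or obj.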

Can I improve the performance of the query as it stands, without changing my logic?
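
One possible direction that leaves the query text unchanged, offered as an untested sketch: the correlated subquery filters on subject OR obj, but it is currently answered by scanning source_hrvst_idx in harvested_date order. Indexes on the filtered columns may let PostgreSQL fetch the few matching rows directly, typically as a BitmapOr of two bitmap index scans followed by a top-1 sort. The index names are hypothetical, and whether the planner picks them depends on the table statistics, so the plan should be re-checked with EXPLAIN ANALYZE afterwards.

```sql
-- Hypothetical index names. They target the filter of the correlated subquery:
--   where subject = chk.subject or obj = chk.subject
-- PostgreSQL can combine two separate indexes for an OR condition
-- (BitmapOr) and then sort the few matching rows by harvested_date.
CREATE INDEX source_rel_subject_idx ON source_relationship_dev (subject);
CREATE INDEX source_rel_obj_idx     ON source_relationship_dev (obj);

-- Refresh planner statistics so the new indexes are costed on current data.
ANALYZE source_relationship_dev;
```

Because harvested_date is character varying, both the existing ORDER BY and any index on it sort the values as text; that matches chronological order only if the stored format sorts lexically (for example ISO 8601).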
