
Question

ankush chauhan

Asked: 2024-02-27 18:58:03 +0800 CST

How can I improve the performance of this query? Any suggestions?


Query:

with firstData as (
   select r.display_name as counter,
          r.createdat,
          r.status,
          r.id as runid
   from run r
      inner join customer c on c.id = r.badmasterid
      inner join bank b on b.id = c.registerid
   where b.displayname in ('abc', 'pqr')
   order by r.createdat desc limit 100
),
secondData as (
   select fi.runid,
          count(1) filter (where action = 'Add' and type = 'Bank') as addcount,
          count(1) filter (where action = 'Modify' and type = 'Bank') as modifycount,
          count(1) filter (where action = 'Delete' and type = 'Bank') as deletecount,
          count(1) filter (where action is null and type = 'Bank') as pendingcount,
          count(1) filter (where (status = 'FAILED' or status = 'FAILED') and type = 'Bank') as failcount
   from bigTable fi
   where fi.runid in
      (select runid from firstData limit 100)
   group by fi.runid
)
select r1.*,
       coalesce(c1.addcount, 0) as addcount,
       coalesce(c1.modifycount, 0) as modifycount,
       coalesce(c1.deletecount, 0) as deletecount,
       coalesce(c1.pendingcount, 0) as pendingcount,
       coalesce(c1.failcount, 0) as failcount
from firstData r1
   left join secondData c1 on r1.runid = c1.runid
order by r1.createdat;

**Indexes on the tables:**
run : id column
customer : id column
bank : id column
bigTable : (id, status, type, runid) columns

Execution plan:

Sort  (cost=839936.19..839939.32 rows=1250 width=116) (actual time=7838.265..7838.275 rows=100 loops=1)
  Sort Key: r1.createdat
  Sort Method: quicksort  Memory: 34kB
  CTE result
    ->  Limit  (cost=79.27..79.52 rows=100 width=52) (actual time=1.601..1.615 rows=100 loops=1)
          ->  Sort  (cost=79.27..79.90 rows=250 width=52) (actual time=1.600..1.607 rows=100 loops=1)
                Sort Key: br.createdat DESC
                Sort Method: top-N heapsort  Memory: 35kB
                ->  Hash Join  (cost=2.84..69.72 rows=250 width=52) (actual time=0.101..1.083 rows=2500 loops=1)
                      Hash Cond: (r.runid = b.id)
                      ->  Seq Scan on run r  (cost=0.00..55.00 rows=2500 width=28) (actual time=0.029..0.213 rows=2500 loops=1)
                      ->  Hash  (cost=2.79..2.79 rows=4 width=36) (actual time=0.066..0.068 rows=40 loops=1)
                            Buckets: 1024  Batches: 1  Memory Usage: 10kB
                            ->  Hash Join  (cost=1.27..2.79 rows=4 width=36) (actual time=0.046..0.061 rows=40 loops=1)
                                  Hash Cond: (b.counterid = rm.id)
                                  ->  Seq Scan on bank b  (cost=0.00..1.40 rows=40 width=12) (actual time=0.011..0.014 rows=40 loops=1)
                                  ->  Hash  (cost=1.25..1.25 rows=2 width=36) (actual time=0.026..0.027 rows=20 loops=1)
                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                        ->  Seq Scan on customer c  (cost=0.00..1.25 rows=2 width=36) (actual time=0.012..0.018 rows=20 loops=1)
                                              Filter: (display_name = ANY ('{abc,pqr}'::text[]))
  ->  Hash Right Join  (cost=839720.50..839792.38 rows=1250 width=116) (actual time=7838.151..7838.230 rows=100 loops=1)
        Hash Cond: (fi.runid = r1.runid)
        ->  HashAggregate  (cost=839717.25..839742.25 rows=2500 width=48) (actual time=7836.481..7836.521 rows=100 loops=1)
              Group Key: fi.runid
              Batches: 1  Memory Usage: 145kB
              ->  Hash Semi Join  (cost=3.25..810917.25 rows=720000 width=31) (actual time=615.639..7395.089 rows=719888 loops=1)
                    Hash Cond: (fi.runid = firstData.runid)
                    ->  Seq Scan on bigTable fi  (cost=0.00..755654.00 rows=18000000 width=31) (actual time=20.792..5093.711 rows=18000000 loops=1)
                    ->  Hash  (cost=2.00..2.00 rows=100 width=4) (actual time=0.046..0.047 rows=100 loops=1)
                          Buckets: 1024  Batches: 1  Memory Usage: 12kB
                          ->  Limit  (cost=0.00..2.00 rows=100 width=4) (actual time=0.004..0.023 rows=100 loops=1)
                                ->  CTE Scan on firstData   (cost=0.00..2.00 rows=100 width=4) (actual time=0.003..0.017 rows=100 loops=1)
        ->  Hash  (cost=2.00..2.00 rows=100 width=76) (actual time=1.660..1.661 rows=100 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 15kB
              ->  CTE Scan on firstData r1  (cost=0.00..2.00 rows=100 width=76) (actual time=1.603..1.630 rows=100 loops=1)

Planning Time: 4.387 ms
Execution Time: 7838.458 ms
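A small runnable check of the LEFT JOIN + COALESCE pattern in the query, using Python's built-in `sqlite3` module rather than PostgreSQL (the aggregate `FILTER` clause needs SQLite 3.30 or newer). The tables and rows are hypothetical and cut down to the columns the CTEs use. The final SELECT has to refer to the names `secondData` actually defines (`addcount`, not `add_count`), and `coalesce(x, 0)` is the short form of `case when x is null then 0 else x end`.

```python
import sqlite3

# Hypothetical, cut-down schema: only the columns the CTEs need.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE run (id INTEGER PRIMARY KEY, createdat TEXT);
    CREATE TABLE bigTable (runid INTEGER, type TEXT, action TEXT, status TEXT);
    INSERT INTO run VALUES (1, '2024-01-01'), (2, '2024-01-02');
    -- run 2 deliberately has no rows in bigTable
    INSERT INTO bigTable VALUES
        (1, 'Bank', 'Add',  NULL),
        (1, 'Bank', 'Add',  'FAILED'),
        (1, 'Bank', NULL,   NULL),
        (1, 'Foo',  'Add',  NULL);
""")

rows = conn.execute("""
    WITH firstData AS (
        SELECT id AS runid, createdat FROM run ORDER BY createdat DESC LIMIT 100
    ),
    secondData AS (
        SELECT runid,
               count(*) FILTER (WHERE action = 'Add'    AND type = 'Bank') AS addcount,
               count(*) FILTER (WHERE action = 'Modify' AND type = 'Bank') AS modifycount,
               count(*) FILTER (WHERE action = 'Delete' AND type = 'Bank') AS deletecount,
               count(*) FILTER (WHERE action IS NULL    AND type = 'Bank') AS pendingcount,
               count(*) FILTER (WHERE status = 'FAILED' AND type = 'Bank') AS failcount
        FROM bigTable
        WHERE runid IN (SELECT runid FROM firstData)
        GROUP BY runid
    )
    SELECT r1.runid,
           -- use the names secondData defines, and let COALESCE turn
           -- the NULLs of unmatched runs into 0
           coalesce(c1.addcount, 0)     AS addcount,
           coalesce(c1.modifycount, 0)  AS modifycount,
           coalesce(c1.deletecount, 0)  AS deletecount,
           coalesce(c1.pendingcount, 0) AS pendingcount,
           coalesce(c1.failcount, 0)    AS failcount
    FROM firstData r1
    LEFT JOIN secondData c1 ON r1.runid = c1.runid
    ORDER BY r1.createdat
""").fetchall()

print(rows)  # [(1, 2, 0, 0, 1, 1), (2, 0, 0, 0, 0, 0)]
```

Run 2 has no rows in bigTable, so the LEFT JOIN produces NULLs for it and COALESCE reports them as 0. The 'Foo' row of run 1 is excluded by the `type = 'Bank'` part of every filter.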


bobflux · Answer 1 · 2024-02-28T00:05:10+08:00

Creating a test setup:

CREATE UNLOGGED TABLE bigtable (
    runid       INTEGER NOT NULL,
    type        TEXT NOT NULL,
    action      TEXT NULL,
    status      TEXT NULL
);

INSERT INTO bigtable
SELECT n, t.column1, a.column1, CASE WHEN random()<0.01 THEN 'FAILED' ELSE NULL END FROM 
    (VALUES ('Bank'),('Foo'),('Bar')) t,
    (VALUES ('Add'),('Modify'),('Delete'),(NULL)) a,
    generate_series(1,200) x,
    generate_series(1,25000) n;
VACUUM ANALYZE bigtable;
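A quick count of what this INSERT generates explains the row counts in the plans that follow. This is a Python sketch with the values copied from the statement; the `status` column is random, so only its expected value is computed.

```python
from itertools import product

# Values copied from the INSERT ... SELECT above.
TYPES = ["Bank", "Foo", "Bar"]
ACTIONS = ["Add", "Modify", "Delete", None]
REPEATS = 200        # generate_series(1,200) x
N_RUNIDS = 25000     # generate_series(1,25000) n

# Rows generated for a single runid (status left out: it is random).
one_run = list(product(TYPES, ACTIONS, range(REPEATS)))

rows_per_runid = len(one_run)
bank_rows_per_runid = sum(1 for t, _, _ in one_run if t == "Bank")
bank_add_per_runid = sum(1 for t, a, _ in one_run if t == "Bank" and a == "Add")
total_rows = rows_per_runid * N_RUNIDS

# random() < 0.01 marks about 1% of rows as FAILED, so on average:
expected_bank_failed_per_runid = bank_rows_per_runid / 100

print(total_rows, rows_per_runid, bank_add_per_runid, expected_bank_failed_per_runid)
# 60000000 2400 200 8.0
```

With 100 runids in firstData that is 100 x 2400 = 240,000 rows fed to the aggregate (the rows=240000 of the Nested Loop), 200 index entries per (runid, 'Bank', action), and roughly 8 FAILED 'Bank' rows per runid, in line with the rows=7 measured for the status index.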

Let's try the problematic query:

EXPLAIN ANALYZE with firstData as MATERIALIZED (SELECT generate_series(1,100) runid)
    select fi.runid,
             count(1) filter (where action = 'Add' and type = 'Bank') as addcount,
             count(1) filter (where action = 'Modify' and type = 'Bank') as modifycount,
             count(1) filter (where action = 'Delete' and type = 'Bank') as deletecount,
             count(1) filter (where action is null and type = 'Bank') as pendingcount,
             count(1) filter (where (status = 'FAILED' or status = 'FAILED') and type = 'Bank') as failcount
    from bigTable fi JOIN firstData USING (runid)
    group by fi.runid;

Without any index, I get a plan similar to the one in the question, with a seq scan on bigtable. It's very slow.

CREATE INDEX ON bigtable( runid, type, action );
CREATE INDEX ON bigtable( runid, type, status );

Same query:

 HashAggregate  (cost=426190.72..426359.97 rows=16925 width=44) (actual time=111.246..111.331 rows=100 loops=1)
    Group Key: fi.runid
    Batches: 1  Memory Usage: 817kB
    CTE firstdata
      ->  ProjectSet  (cost=0.00..0.52 rows=100 width=4) (actual time=21.951..21.962 rows=100 loops=1)
              ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=21.946..21.946 rows=1 loops=1)
    ->  Nested Loop  (cost=0.56..412010.00 rows=354505 width=20) (actual time=21.978..66.854 rows=240000 loops=1)
            ->  CTE Scan on firstdata  (cost=0.00..2.00 rows=100 width=4) (actual time=21.954..21.993 rows=100 loops=1)
            ->  Index Scan using bigtable_runid_type_action_idx on bigtable fi  (cost=0.56..4084.63 rows=3545 width=20) (actual time=0.004..0.273 rows=2400 loops=100)
                    Index Cond: (runid = firstdata.runid)
 Planning Time: 0.408 ms
 Execution Time: 119.600 ms

That's about 100x faster. It uses the index I just created, but only to find "runid=...". It doesn't use the other columns, because Postgres doesn't know how to use them for several aggregate FILTER clauses at once. So a plain index on (runid) would work just as well.
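To see why the extra index columns don't help this plan, here is a toy Python model (not PostgreSQL internals; the data is hypothetical and scaled down from the test setup). A sorted list stands in for a B-tree on (runid): it narrows the search to one runid, then every row of that runid is fetched and each FILTER condition is evaluated on it.

```python
from bisect import bisect_left
from itertools import product

# Hypothetical data shaped like the test setup:
# 3 types x 4 actions x 5 repeats x 10 runids = 600 rows.
rows = [
    {"runid": n, "type": t, "action": a, "status": "FAILED" if x == 0 else None}
    for t, a, x, n in product(["Bank", "Foo", "Bar"],
                              ["Add", "Modify", "Delete", None],
                              range(5), range(1, 11))
]

# A B-tree on (runid) modeled as a sorted list of (runid, row position).
runid_index = sorted((r["runid"], pos) for pos, r in enumerate(rows))

def aggregate_one_run(runid):
    """Index condition on runid only; every FILTER is then checked row by row."""
    lo = bisect_left(runid_index, (runid,))
    hi = bisect_left(runid_index, (runid + 1,))
    counts = {"add": 0, "modify": 0, "delete": 0, "pending": 0, "fail": 0}
    for _, pos in runid_index[lo:hi]:
        r = rows[pos]  # fetch the row: type/action/status are not in this index
        if r["type"] != "Bank":
            continue
        counts["add"] += r["action"] == "Add"
        counts["modify"] += r["action"] == "Modify"
        counts["delete"] += r["action"] == "Delete"
        counts["pending"] += r["action"] is None
        counts["fail"] += r["status"] == "FAILED"
    return hi - lo, counts

visited, counts = aggregate_one_run(3)
print(visited, counts)
# 60 {'add': 5, 'modify': 5, 'delete': 5, 'pending': 5, 'fail': 4}
```

All 60 entries of the runid are visited (2400 per loop in the real test data) regardless of which conditions end up matching, so an index on (runid, type, action) would do the same amount of work here as one on (runid).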

If you have a multicolumn index on (id, status, type, runid), it's useless for this query, because runid is the last column. Also, since id is the first column and unique, this multicolumn index can't do much more than an index on (id) alone, which you already have if id is the primary key. Unless you are using it for something very specific, I think it can be removed.
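A toy Python model of the leading-column rule (hypothetical data; sorted lists stand in for B-trees, and the way runids interleave with ids is an assumption for illustration). When runid leads the key, all its entries form one contiguous range that a binary search finds directly. When runid is the last column behind a unique id, its entries are scattered, so a condition on runid alone cannot narrow the range and nearly the whole index would have to be read.

```python
from bisect import bisect_left

# Hypothetical rows: unique id, runid cycling 1..10 as id increases.
rows = [{"id": i, "status": None, "type": "Bank", "runid": i % 10 + 1}
        for i in range(600)]

# Index keyed (id, status, type, runid): ordered by id first. Since id is
# unique, the later key columns never affect the order.
idx_id_first = sorted((r["id"], r["status"], r["type"], r["runid"]) for r in rows)
# Index keyed (runid, id): runid is the leading column.
idx_runid_first = sorted((r["runid"], r["id"]) for r in rows)

def positions(index, runid_col, runid):
    """Positions, in index order, of the entries with the given runid."""
    return [p for p, key in enumerate(index) if key[runid_col] == runid]

lead = positions(idx_runid_first, 0, 3)
trail = positions(idx_id_first, 3, 3)

# Leading column: one contiguous run that bisect finds without scanning.
lo = bisect_left(idx_runid_first, (3,))
hi = bisect_left(idx_runid_first, (4,))
print(lead == list(range(lo, hi)), len(lead))   # True 60
# Trailing column: matches spread over almost the whole index.
print(trail[:3], trail[-1] - trail[0] + 1)      # [2, 12, 22] 591
```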

Now, the reason I created those two indexes is to actually use them:

EXPLAIN ANALYZE with firstData as MATERIALIZED (SELECT generate_series(1,100) runid)
    select f.runid,
            (SELECT count(*) FROM bigtable b WHERE b.runid=f.runid AND action = 'Add'     and type = 'Bank') as addcount,
            (SELECT count(*) FROM bigtable b WHERE b.runid=f.runid AND action = 'Modify'  and type = 'Bank') as modifycount,
            (SELECT count(*) FROM bigtable b WHERE b.runid=f.runid AND action = 'Delete'  and type = 'Bank') as deletecount,
            (SELECT count(*) FROM bigtable b WHERE b.runid=f.runid AND action is null     and type = 'Bank') as pendingcount,
            (SELECT count(*) FROM bigtable b WHERE b.runid=f.runid AND status = 'FAILED'  and type = 'Bank') as failcount
    FROM firstData f;

 CTE Scan on firstdata f  (cost=0.52..5282.52 rows=100 width=44) (actual time=0.529..11.163 rows=100 loops=1)
    CTE firstdata
      ->  ProjectSet  (cost=0.00..0.52 rows=100 width=4) (actual time=0.006..0.017 rows=100 loops=1)
              ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.002 rows=1 loops=1)
    SubPlan 2
      ->  Aggregate  (cost=12.12..12.12 rows=1 width=8) (actual time=0.027..0.027 rows=1 loops=100)
              ->  Index Only Scan using bigtable_runid_type_action_idx on bigtable b  (cost=0.56..11.36 rows=302 width=0) (actual time=0.004..0.019 rows=200 loops=100)
                      Index Cond: ((runid = f.runid) AND (type = 'Bank'::text) AND (action = 'Add'::text))
                      Heap Fetches: 0
    SubPlan 3
      ->  Aggregate  (cost=11.92..11.93 rows=1 width=8) (actual time=0.027..0.027 rows=1 loops=100)
              ->  Index Only Scan using bigtable_runid_type_action_idx on bigtable b_1  (cost=0.56..11.18 rows=294 width=0) (actual time=0.003..0.018 rows=200 loops=100)
                      Index Cond: ((runid = f.runid) AND (type = 'Bank'::text) AND (action = 'Modify'::text))
                      Heap Fetches: 0
    SubPlan 4
      ->  Aggregate  (cost=12.01..12.02 rows=1 width=8) (actual time=0.025..0.025 rows=1 loops=100)
              ->  Index Only Scan using bigtable_runid_type_action_idx on bigtable b_2  (cost=0.56..11.27 rows=298 width=0) (actual time=0.003..0.017 rows=200 loops=100)
                      Index Cond: ((runid = f.runid) AND (type = 'Bank'::text) AND (action = 'Delete'::text))
                      Heap Fetches: 0
    SubPlan 5
      ->  Aggregate  (cost=11.87..11.88 rows=1 width=8) (actual time=0.026..0.026 rows=1 loops=100)
              ->  Index Only Scan using bigtable_runid_type_action_idx on bigtable b_3  (cost=0.56..11.13 rows=292 width=0) (actual time=0.003..0.017 rows=200 loops=100)
                      Index Cond: ((runid = f.runid) AND (type = 'Bank'::text) AND (action IS NULL))
                      Heap Fetches: 0
    SubPlan 6
      ->  Aggregate  (cost=4.84..4.85 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=100)
              ->  Index Only Scan using bigtable_runid_type_status_idx on bigtable b_4  (cost=0.56..4.81 rows=11 width=0) (actual time=0.003..0.003 rows=7 loops=100)
                      Index Cond: ((runid = f.runid) AND (type = 'Bank'::text) AND (status = 'FAILED'::text))
                      Heap Fetches: 0
 Planning Time: 0.680 ms
 Execution Time: 11.313 ms
(31 rows)

This time it uses index-only scans for everything, and the HashAggregate is gone too, so it's about 10x faster than the previous version (11.3 ms vs 119.6 ms), for a total speedup of about 1000x.
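A toy Python model of why each scalar subquery is cheap (hypothetical, scaled-down data; sorted lists stand in for the two B-tree indexes, and NULL is modeled as a key that sorts after any string, like the default NULLS LAST). With equality on all three key columns, the matching entries form one contiguous run, and the count comes from the index entries alone without reading table rows, similar in spirit to the Index Only Scan with Heap Fetches: 0. The model takes the width of the run directly, whereas PostgreSQL still steps through each matching entry (rows=200 per loop in the plan).

```python
from bisect import bisect_left, bisect_right
from itertools import product

def nulls_last(v):
    # Model of a nullable key: NULL sorts after every non-NULL string.
    return (v is None, v or "")

# Hypothetical data: 3 types x 4 actions x 5 repeats x 10 runids = 600 rows;
# status is 'FAILED' for one repeat out of five.
rows = [
    (n, t, a, "FAILED" if x == 0 else None)
    for t, a, x, n in product(["Bank", "Foo", "Bar"],
                              ["Add", "Modify", "Delete", None],
                              range(5), range(1, 11))
]

# Models of the two multicolumn indexes: entries hold only key columns.
idx_action = sorted((n, t, nulls_last(a)) for n, t, a, _ in rows)
idx_status = sorted((n, t, nulls_last(s)) for n, t, _, s in rows)

def count_key(index, key):
    # Equality on every key column: matches are one contiguous run.
    return bisect_right(index, key) - bisect_left(index, key)

def counts_for(runid):
    return {
        "addcount": count_key(idx_action, (runid, "Bank", nulls_last("Add"))),
        "modifycount": count_key(idx_action, (runid, "Bank", nulls_last("Modify"))),
        "deletecount": count_key(idx_action, (runid, "Bank", nulls_last("Delete"))),
        "pendingcount": count_key(idx_action, (runid, "Bank", nulls_last(None))),
        "failcount": count_key(idx_status, (runid, "Bank", nulls_last("FAILED"))),
    }

print(counts_for(3))
# {'addcount': 5, 'modifycount': 5, 'deletecount': 5, 'pendingcount': 5, 'failcount': 4}
```

Compared with the FILTER version, only entries that match every condition are touched, which is where the extra 10x comes from.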

But it needs two multicolumn indexes, which take disk space and add overhead to every write on bigtable. Maybe you can use these indexes for other queries as well. The previous version only needs a smaller index on runid (or one of the multicolumn indexes).
