p.matsinopoulos

Asked: 2023-01-31 05:41:12 +0800 CST

Why does the query take ages when I filter by an indexed boolean column?

I have a query that filters on a boolean column that has an index, but the query takes ages to finish. When I don't use that filter, the query returns very quickly.

Here are the explain plans. The first one includes processed is true and takes ages to finish. The second one does not and returns immediately.

explain select count(*) from listen_events where (started_at >='2021-12-26' and started_at <'2021-12-27') and processed is true;
                                                                                 QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=212405.62..212405.63 rows=1 width=8)
   ->  Bitmap Heap Scan on listen_events  (cost=187657.78..212390.09 rows=6213 width=0)
         Recheck Cond: ((started_at >= '2021-12-26 00:00:00'::timestamp without time zone) AND (started_at < '2021-12-27 00:00:00'::timestamp without time zone))
         Filter: (processed IS TRUE)
         ->  BitmapAnd  (cost=187657.78..187657.78 rows=6213 width=0)
               ->  Bitmap Index Scan on index_listen_events_on_started_at  (cost=0.00..17323.56 rows=813898 width=0)
                     Index Cond: ((started_at >= '2021-12-26 00:00:00'::timestamp without time zone) AND (started_at < '2021-12-27 00:00:00'::timestamp without time zone))
               ->  Bitmap Index Scan on listen_events_processed_idx  (cost=0.00..170330.87 rows=9125639 width=0)
                     Index Cond: (processed = true)
(9 rows)

=> explain select count(*) from listen_events where (started_at >='2021-12-26' and started_at <'2021-12-27');
                                                                                 QUERY PLAN                                                                                 
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=24549.13..24549.14 rows=1 width=8)
   ->  Gather  (cost=24548.92..24549.13 rows=2 width=8)
         Workers Planned: 2
         ->  Partial Aggregate  (cost=23548.92..23548.93 rows=1 width=8)
               ->  Parallel Index Only Scan using index_listen_events_on_started_at on listen_events  (cost=0.58..22701.11 rows=339124 width=0)
                     Index Cond: ((started_at >= '2021-12-26 00:00:00'::timestamp without time zone) AND (started_at < '2021-12-27 00:00:00'::timestamp without time zone))
(6 rows)

Here is the table definition:

Table "public.listen_events"
     Column     |            Type             | Collation | Nullable |                  Default                  | Storage  | Stats target | Description 
----------------+-----------------------------+-----------+----------+-------------------------------------------+----------+--------------+-------------
 id             | integer                     |           | not null | nextval('listen_events_id_seq'::regclass) | plain    |              | 
 event_type     | text                        |           |          |                                           | extended |              | 
 stream_type    | text                        |           |          |                                           | extended |              | 
 event_id       | text                        |           |          |                                           | extended |              | 
 broadcast_uid  | text                        |           |          |                                           | extended |              | 
 user_agent     | text                        |           |          |                                           | extended |              | 
 city           | text                        |           |          |                                           | extended |              | 
 country        | text                        |           |          |                                           | extended |              | 
 referrer       | text                        |           |          |                                           | extended |              | 
 country_code   | character varying(2)        |           |          |                                           | extended |              | 
 continent_code | character varying(2)        |           |          |                                           | extended |              | 
 user_id        | integer                     |           |          |                                           | plain    |              | 
 started_at     | timestamp without time zone |           |          |                                           | plain    |              | 
 created_at     | timestamp without time zone |           |          |                                           | plain    |              | 
 updated_at     | timestamp without time zone |           |          |                                           | plain    |              | 
 ip_address     | cidr                        |           |          |                                           | main     |              | 
 location       | point                       |           |          |                                           | plain    |              | 
 ended_at       | timestamp without time zone |           |          |                                           | plain    |              | 
 server_id      | text                        |           |          |                                           | extended |              | 
 channel_id     | integer                     |           |          |                                           | plain    |              | 
 id_bigint      | bigint                      |           |          |                                           | plain    |              | 
 processed      | boolean                     |           | not null | false                                     | plain    |              | 
Indexes:
    "listen_events_pkey" PRIMARY KEY, btree (id)
    "index_listen_events_event_id" btree (event_id)
    "index_listen_events_on_broadcast_uid" btree (broadcast_uid)
    "index_listen_events_on_started_at" btree (started_at)
    "index_listen_events_on_user_id" btree (user_id)
    "listen_events_processed_idx" btree (processed)
Options: autovacuum_enabled=true, autovacuum_vacuum_scale_factor=0, autovacuum_vacuum_threshold=30000, autovacuum_vacuum_cost_delay=0, autovacuum_analyze_scale_factor=0, autovacuum_analyze_threshold=30000, toast.autovacuum_enabled=true

The table currently has 1.9 billion rows, and most of them have processed = false.

Any clue why this is happening?

1 Answer

Laurenz Albe · Answer 1 · 2023-01-31T05:56:30+08:00

Best Answer

You don't show the EXPLAIN (ANALYZE, BUFFERS) output, so I am reduced to guessing. Anyway, there are two main differences:

Since there is no single index that covers the query, PostgreSQL combines two indexes. That is somewhat more work than scanning a single index.
The main difference is that the fast query can use an index-only scan, while the slow query cannot: processed is not part of index_listen_events_on_started_at, so the table itself has to be visited to check it.
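
To replace the guessing with measurements, the slow query could be run as sketched below. Note that EXPLAIN (ANALYZE, BUFFERS) really executes the statement, so it will take as long as the slow query itself does.

```sql
-- Executes the query for real and reports actual row counts, timings,
-- and how many buffers were hit in cache versus read from disk.
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*)
FROM listen_events
WHERE started_at >= '2021-12-26'
  AND started_at <  '2021-12-27'
  AND processed IS TRUE;
```

Comparing the estimated and actual row counts of the two Bitmap Index Scan nodes, together with the buffer counts, should show where the time goes.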

I would create a two-column index like this:

CREATE INDEX ON listen_events (processed, started_at);
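
On a table with 1.9 billion rows that is still being written to, a plain CREATE INDEX blocks inserts, updates, and deletes for the duration of the build. A possible variant, sketched here under the assumption that the table is in production use, is to build the index with CONCURRENTLY and then look at the plan again. The index name is hypothetical and not taken from the question.

```sql
-- Assumption: the index name below is made up; any unused name works.
-- CONCURRENTLY does not block writes during the build, but it cannot run
-- inside a transaction block and takes longer than a plain CREATE INDEX.
CREATE INDEX CONCURRENTLY listen_events_processed_started_at_idx
    ON listen_events (processed, started_at);

-- Plain EXPLAIN does not run the query; it only shows whether the planner
-- now chooses a single scan of the new index instead of the BitmapAnd.
EXPLAIN
SELECT count(*)
FROM listen_events
WHERE started_at >= '2021-12-26'
  AND started_at <  '2021-12-27'
  AND processed IS TRUE;
```

Whether the result is an index-only scan also depends on how much of the table is marked all-visible, which is maintained by VACUUM.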

If you only ever query for rows with processed IS TRUE, you can also create a smaller and faster index:

CREATE INDEX ON listen_events (started_at) WHERE processed IS TRUE;
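
To see how much smaller the partial index actually is, and whether the planner uses it, the indexes on the table can be listed with their sizes and scan counts. This is only a sketch using the standard statistics view pg_stat_user_indexes; the counters reflect activity since statistics were last reset.

```sql
-- Lists every index on listen_events with its on-disk size and the number
-- of index scans recorded since statistics were last reset.
SELECT indexrelname                                 AS index_name,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size,
       idx_scan                                     AS scans
FROM pg_stat_user_indexes
WHERE relname = 'listen_events'
ORDER BY pg_relation_size(indexrelid) DESC;
```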
