Tenho um cenário em que o postgres está fazendo uma quantidade enorme de trabalho extra.
Estou tentando consultar uma view que é computacionalmente muito pesada. No entanto, nunca pretendi retornar grandes conjuntos de dados dela, então imaginei que provavelmente seria ok...
No entanto, quando tento filtrá-lo, parece que o PG Planner está fazendo uma materialização completa da visualização e só então aplicando a filtragem.
SELECT * FROM record_view ORDER BY last_updated desc LIMIT 1
Então pensei que talvez pudesse forçá-lo a ter um desempenho melhor.
SELECT * FROM record_view WHERE id IN (
SELECT record_id FROM record_field GROUP BY record_id ORDER BY MAX(updated_at) desc LIMIT 1
)
Demora cerca de 6x mais do que se eu executasse as duas consultas sequencialmente.
Eu até tentei adicionar índices bem específicos para tentar priorizar a primeira consulta:
CREATE INDEX idx_record_field_recordid_updatedat_desc
ON record_field (record_id, updated_at DESC);
Detalhes
-- Object Definitions
create table "public"."object" (
"id" text not null,
"created_at" timestamp with time zone not null default now(),
"label" text
);
-- Field Definitions
create table "public"."field" (
"id" text not null,
"object_id" text not null,
"label" text not null,
"created_at" timestamp with time zone not null default now(),
"type" text not null default 'string'::text,
"is_unique" boolean not null default false
);
alter table "public"."object" enable row level security;
alter table "public"."field" enable row level security;
CREATE UNIQUE INDEX "object_pkey" ON public.object USING btree (id);
CREATE UNIQUE INDEX field_pkey ON public.field USING btree (id, object_id);
alter table "public"."object" add constraint "object_pkey" PRIMARY KEY using index "object_pkey";
alter table "public"."field" add constraint "field_pkey" PRIMARY KEY using index "field_pkey";
alter table "public"."field" add constraint "field_object_id_fkey" FOREIGN KEY (object_id) REFERENCES object(id) ON UPDATE CASCADE ON DELETE CASCADE not valid;
alter table "public"."field" validate constraint "field_object_id_fkey";
create table "public"."record" (
"id" uuid not null default gen_random_uuid(),
"created_at" timestamp with time zone not null default now(),
"object_id" text not null
);
alter table "public"."record" enable row level security;
CREATE UNIQUE INDEX record_pkey ON public.record USING btree (id, object_id);
create table "public"."record_field" (
"field_id" text not null,
"object_id" text not null,
"record_id" uuid not null,
"updated_at" timestamp with time zone not null default now(),
"data" jsonb
);
alter table "public"."record_field" enable row level security;
CREATE UNIQUE INDEX record_field_pkey ON public.record_field USING btree (field_id, object_id, record_id);
alter table "public"."record" add constraint "record_pkey" PRIMARY KEY using index "record_pkey";
alter table "public"."record_field" add constraint "record_field_pkey" PRIMARY KEY using index "record_field_pkey";
alter table "public"."record" add constraint "record_object_id_fkey" FOREIGN KEY (object_id) REFERENCES object(id) ON UPDATE CASCADE ON DELETE CASCADE not valid;
alter table "public"."record" validate constraint "record_object_id_fkey";
alter table "public"."record_field" add constraint "record_field_field_id_object_id_fkey" FOREIGN KEY (field_id, object_id) REFERENCES field(id, object_id) ON UPDATE CASCADE ON DELETE CASCADE not valid;
alter table "public"."record_field" validate constraint "record_field_field_id_object_id_fkey";
alter table "public"."record_field" add constraint "record_field_object_id_record_id_fkey" FOREIGN KEY (object_id, record_id) REFERENCES record(object_id, id) not valid;
alter table "public"."record_field" validate constraint "record_field_object_id_record_id_fkey";
CREATE OR REPLACE VIEW "public"."record_view"
WITH (security_invoker = true) AS
SELECT
rf.object_id,
rf.record_id AS id,
json_object_agg(f.id, rf.data) AS data,
MAX(rf.updated_at) AS last_updated
FROM
record_field rf
JOIN
field f
ON rf.field_id = f.id
AND rf.object_id = f.object_id
GROUP BY
rf.object_id,
rf.record_id;
Explique os resultados
explain analyse SELECT * FROM record_view WHERE id IN (
SELECT record_id FROM record_field GROUP BY record_id ORDER BY max(updated_at) desc LIMIT 1
)
| QUERY PLAN |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Hash Join (cost=430728.66..482205.64 rows=562 width=63) (actual time=2267.232..2879.726 rows=1 loops=1) |
| Hash Cond: (rf.record_id = "ANY_subquery".record_id) |
| -> HashAggregate (cost=296770.54..346821.66 rows=112444 width=63) (actual time=1468.243..2332.244 rows=112011 loops=1) |
| Group Key: rf.object_id, rf.record_id |
| Planned Partitions: 256 Batches: 257 Memory Usage: 3113kB Disk Usage: 245328kB |
| -> Hash Join (cost=20.25..48817.12 rows=1779038 width=86) (actual time=0.041..598.819 rows=1779038 loops=1) |
| Hash Cond: ((rf.field_id = f.id) AND (rf.object_id = f.object_id)) |
| -> Seq Scan on record_field rf (cost=0.00..39456.38 rows=1779038 width=63) (actual time=0.012..254.655 rows=1779038 loops=1) |
| -> Hash (cost=14.10..14.10 rows=410 width=64) (actual time=0.018..0.021 rows=25 loops=1) |
| Buckets: 1024 Batches: 1 Memory Usage: 10kB |
| -> Seq Scan on field f (cost=0.00..14.10 rows=410 width=64) (actual time=0.008..0.010 rows=25 loops=1) |
| -> Hash (cost=133958.11..133958.11 rows=1 width=16) (actual time=538.110..539.750 rows=1 loops=1) |
| Buckets: 1024 Batches: 1 Memory Usage: 9kB |
| -> Subquery Scan on "ANY_subquery" (cost=133958.10..133958.11 rows=1 width=16) (actual time=538.105..539.745 rows=1 loops=1) |
| -> Limit (cost=133958.10..133958.10 rows=1 width=24) (actual time=538.103..539.577 rows=1 loops=1) |
| -> Sort (cost=133958.10..134239.21 rows=112444 width=24) (actual time=538.094..539.562 rows=1 loops=1) |
| Sort Key: (max(record_field.updated_at)) DESC |
| Sort Method: top-N heapsort Memory: 25kB |
| -> Finalize GroupAggregate (cost=104908.23..133395.88 rows=112444 width=24) (actual time=497.679..531.461 rows=112011 loops=1) |
| Group Key: record_field.record_id |
| -> Gather Merge (cost=104908.23..131147.00 rows=224888 width=24) (actual time=497.673..515.269 rows=115025 loops=1) |
| Workers Planned: 2 |
| Workers Launched: 2 |
| -> Sort (cost=103908.21..104189.32 rows=112444 width=24) (actual time=481.532..484.165 rows=38342 loops=3) |
| Sort Key: record_field.record_id |
| Sort Method: quicksort Memory: 3654kB |
| Worker 0: Sort Method: quicksort Memory: 3332kB |
| Worker 1: Sort Method: quicksort Memory: 3556kB |
| -> Partial HashAggregate (cost=82357.15..92168.30 rows=112444 width=24) (actual time=452.259..458.555 rows=38342 loops=3) |
| Group Key: record_field.record_id |
| Planned Partitions: 4 Batches: 1 Memory Usage: 4881kB |
| Worker 0: Batches: 1 Memory Usage: 4369kB |
| Worker 1: Batches: 1 Memory Usage: 4625kB |
| -> Parallel Seq Scan on record_field (cost=0.00..29078.66 rows=741266 width=24) (actual time=0.296..355.722 rows=593013 loops=3) |
| Planning Time: 11.046 ms |
| Execution Time: 2919.026 ms |
Consultas por conta própria
explain analyse SELECT record_id FROM record_field GROUP BY record_id ORDER BY MAX(updated_at) desc LIMIT 1
| QUERY PLAN |
| ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Limit (cost=133958.10..133958.10 rows=1 width=24) (actual time=547.848..550.734 rows=1 loops=1) |
| -> Sort (cost=133958.10..134239.21 rows=112444 width=24) (actual time=547.846..550.732 rows=1 loops=1) |
| Sort Key: (max(updated_at)) DESC |
| Sort Method: top-N heapsort Memory: 25kB |
| -> Finalize GroupAggregate (cost=104908.23..133395.88 rows=112444 width=24) (actual time=499.786..542.183 rows=112011 loops=1) |
| Group Key: record_id |
| -> Gather Merge (cost=104908.23..131147.00 rows=224888 width=24) (actual time=499.776..525.336 rows=115000 loops=1) |
| Workers Planned: 2 |
| Workers Launched: 2 |
| -> Sort (cost=103908.21..104189.32 rows=112444 width=24) (actual time=485.506..489.133 rows=38333 loops=3) |
| Sort Key: record_id |
| Sort Method: quicksort Memory: 3579kB |
| Worker 0: Sort Method: quicksort Memory: 3486kB |
| Worker 1: Sort Method: quicksort Memory: 3476kB |
| -> Partial HashAggregate (cost=82357.15..92168.30 rows=112444 width=24) (actual time=467.843..473.231 rows=38333 loops=3) |
| Group Key: record_id |
| Planned Partitions: 4 Batches: 1 Memory Usage: 4881kB |
| Worker 0: Batches: 1 Memory Usage: 4625kB |
| Worker 1: Batches: 1 Memory Usage: 4625kB |
| -> Parallel Seq Scan on record_field (cost=0.00..29078.66 rows=741266 width=24) (actual time=0.114..380.557 rows=593013 loops=3) |
| Planning Time: 2.946 ms |
| Execution Time: 554.166 ms |
explain analyse SELECT * FROM record_view WHERE id IN ('91381946-82fd-480b-a910-16e5e8ae0461')
| QUERY PLAN |
| ----------------------------------------------------------------------------------------------------------------------------------------------------------- |
| GroupAggregate (cost=84.26..84.66 rows=16 width=63) (actual time=8.467..8.515 rows=1 loops=1) |
| Group Key: rf.object_id, rf.record_id |
| -> Sort (cost=84.26..84.30 rows=16 width=86) (actual time=8.174..8.224 rows=16 loops=1) |
| Sort Key: rf.object_id |
| Sort Method: quicksort Memory: 26kB |
| -> Hash Join (cost=67.69..83.94 rows=16 width=86) (actual time=8.064..8.127 rows=16 loops=1) |
| Hash Cond: ((f.id = rf.field_id) AND (f.object_id = rf.object_id)) |
| -> Seq Scan on field f (cost=0.00..14.10 rows=410 width=64) (actual time=3.815..3.820 rows=25 loops=1) |
| -> Hash (cost=67.45..67.45 rows=16 width=63) (actual time=1.477..1.486 rows=16 loops=1) |
| Buckets: 1024 Batches: 1 Memory Usage: 10kB |
| -> Bitmap Heap Scan on record_field rf (cost=4.55..67.45 rows=16 width=63) (actual time=0.449..0.476 rows=16 loops=1) |
| Recheck Cond: (record_id = '91381946-82fd-480b-a910-16e5e8ae0461'::uuid) |
| Heap Blocks: exact=1 |
| -> Bitmap Index Scan on record_field_record_id_idx (cost=0.00..4.55 rows=16 width=0) (actual time=0.436..0.443 rows=16 loops=1) |
| Index Cond: (record_id = '91381946-82fd-480b-a910-16e5e8ae0461'::uuid) |
| Planning Time: 4.332 ms |
| Execution Time: 8.961 ms |
Embora o planejador "saiba" que sua subconsulta com LIMITE 1 retornará apenas 1 linha, ele aparentemente não sabe disso com certeza suficiente para criar um plano cuja correção depende desse fato.
Você pode contornar essa limitação mudando para igualdade, usando
id = (subquery)
em vez deid IN (subquery)
. A única diferença entre eles pode ser se a subconsulta retornar 0 linhas, embora eu ache que seria o mesmo nesse caso também.Observe que a subconsulta em si também é excessivamente complexa e lenta.
deve dar resultados idênticos aos
Mas o primeiro deve ser mais rápido em geral, e muito mais rápido se houver um índice btree começando com updated_at.