Esquema :
CREATE TABLE "applications" (
"id" SERIAL NOT NULL PRIMARY KEY,
"country" VARCHAR(2) NOT NULL,
"created" TIMESTAMP WITH TIME ZONE NOT NULL,
"is_preliminary" BOOLEAN NOT NULL,
"first_name" VARCHAR(128) NOT NULL,
"last_name" VARCHAR(128) NOT NULL,
"birth_number" VARCHAR(11) NULL
);
CREATE TABLE "persons" (
"id" UUID NOT NULL PRIMARY KEY,
"created" TIMESTAMP WITH TIME ZONE NOT NULL,
"modified" TIMESTAMP WITH TIME ZONE NOT NULL
);
ALTER TABLE "applications" ADD COLUMN "physical_person_id" UUID NULL;
CREATE INDEX "physical_person_id_idx" ON "applications" ("physical_person_id");
ALTER TABLE "applications" ADD CONSTRAINT "physical_person_id_fk" FOREIGN KEY ("physical_person_id") REFERENCES "persons" ("id") DEFERRABLE INITIALLY DEFERRED;
CREATE INDEX "country_created" ON "applications" (country, created);
Notas : O valor de persons.created
deve ser o mesmo do primeiro application.created
para esta pessoa, independentemente do is_preliminary
valor.
Consulta :
SELECT
to_char(created, 'YYYY-MM-DD') AS "Date",
COUNT(*) AS "Total",
COALESCE(
COUNT(*) FILTER(
WHERE applications.is_preliminary = false
AND NOT EXISTS(
SELECT 1
FROM applications A
WHERE A.physical_person_id = applications.physical_person_id
AND A.created < applications.created
LIMIT 1
)
)
, 0
) AS "Is first app"
FROM applications
WHERE
created >= '2017-01-01'::TIMESTAMP AND created < '2017-07-01'::TIMESTAMP
AND country = 'CZ'
GROUP BY 1
ORDER BY 1
Objetivo : Meu objetivo é ver o número total de solicitações versus o número de primeiras solicitações por dia em determinado país. Por primeira aplicação quero dizer um número de aplicações em um determinado dia, que foram registradas pela primeira vez e não tiveram nenhuma aplicação antes.
Problema : Desempenho da consulta. O número de linhas está crescendo e o desempenho agora não está em um bom nível.
Amostra de dados : aqui ( xz
saída compactada de pg_dump
)
Os seguintes planos de consulta são retirados do meu laptop (na produção não houve "fusão externa")
Plano de consulta :
GroupAggregate (cost=54186.11..2391221.59 rows=186832 width=48) (actual time=2137.029..3224.937 rows=181 loops=1)
Group Key: (to_char(applications.created, 'YYYY-MM-DD'::text))
-> Sort (cost=54186.11..54653.19 rows=186832 width=57) (actual time=2128.554..2370.798 rows=186589 loops=1)
Sort Key: (to_char(applications.created, 'YYYY-MM-DD'::text))
Sort Method: external merge Disk: 8176kB
-> Bitmap Heap Scan on applications (cost=5262.54..30803.18 rows=186832 width=57) (actual time=93.993..411.096 rows=186589 loops=1)
Recheck Cond: (((country)::text = 'CZ'::text) AND (created >= '2017-01-01 00:00:00'::timestamp without time zone) AND (created < '2017-07-01 00:00:00'::timestamp without time zone))
Heap Blocks: exact=19640
-> Bitmap Index Scan on country_created (cost=0.00..5215.83 rows=186832 width=0) (actual time=90.945..90.945 rows=186589 loops=1)
Index Cond: (((country)::text = 'CZ'::text) AND (created >= '2017-01-01 00:00:00'::timestamp without time zone) AND (created < '2017-07-01 00:00:00'::timestamp without time zone))
SubPlan 1
-> Index Scan using physical_person_id_idx on applications a (cost=0.43..72.77 rows=6 width=0) (actual time=0.006..0.006 rows=1 loops=127558)
Index Cond: (physical_person_id = applications.physical_person_id)
Filter: (created < applications.created)
Rows Removed by Filter: 0
Planning time: 0.235 ms
Execution time: 3261.530 ms
Pergunta : Como posso melhorar o desempenho da consulta? Suponho que seja possível se livrar da subconsulta em "É o primeiro aplicativo", mas não sei como.
Versão do PostgreSQL : 9.6.3
Plano de consulta após atualização de Evan Carroll:
Subquery Scan on t (cost=51624.73..2390836.50 rows=186782 width=52) (actual time=291.726..1129.435 rows=181 loops=1)
-> GroupAggregate (cost=51624.73..2388034.77 rows=186782 width=20) (actual time=291.707..1128.057 rows=181 loops=1)
Group Key: ((applications.created)::date)
-> Sort (cost=51624.73..52091.69 rows=186782 width=29) (actual time=280.283..334.391 rows=186589 loops=1)
Sort Key: ((applications.created)::date)
Sort Method: external merge Disk: 6720kB
-> Bitmap Heap Scan on applications (cost=5261.90..30801.54 rows=186782 width=29) (actual time=42.944..181.325 rows=186589 loops=1)
Recheck Cond: (((country)::text = 'CZ'::text) AND (created >= '2017-01-01 00:00:00+01'::timestamp with time zone) AND (created <= '2017-07-01 00:00:00+02'::timestamp with time zone))
Heap Blocks: exact=19640
-> Bitmap Index Scan on country_created (cost=0.00..5215.20 rows=186782 width=0) (actual time=40.003..40.003 rows=186589 loops=1)
Index Cond: (((country)::text = 'CZ'::text) AND (created >= '2017-01-01 00:00:00+01'::timestamp with time zone) AND (created <= '2017-07-01 00:00:00+02'::timestamp with time zone))
SubPlan 1
-> Index Scan using physical_person_id_idx on applications a (cost=0.43..72.77 rows=6 width=0) (actual time=0.006..0.006 rows=1 loops=127558)
Index Cond: (physical_person_id = applications.physical_person_id)
Filter: (created < applications.created)
Rows Removed by Filter: 0
Planning time: 0.232 ms
Execution time: 1145.761 ms
A consulta inicial sem is_first_app
coluna leva ~300 ms.
Plano de consulta para uma solução alternativa de Erwin Brandstetter:
GroupAggregate (cost=51356.14..55562.83 rows=186964 width=20) (actual time=562.470..620.993 rows=181 loops=1)
Group Key: ((a.created)::date)
Buffers: shared hit=2137 read=4491, temp read=2491 written=2485
-> Sort (cost=51356.14..51823.55 rows=186964 width=20) (actual time=562.216..592.226 rows=186589 loops=1)
Sort Key: ((a.created)::date)
Sort Method: external merge Disk: 2640kB
Buffers: shared hit=2137 read=4491, temp read=2491 written=2485
-> Hash Right Join (cost=13394.71..31149.19 rows=186964 width=20) (actual time=119.488..464.407 rows=186589 loops=1)
Hash Cond: ((p.id = a.physical_person_id) AND (p.created = a.created))
Join Filter: (NOT a.is_preliminary)
Buffers: shared hit=2137 read=4491, temp read=2159 written=2153
-> Seq Scan on persons p (cost=0.00..9003.04 rows=364404 width=24) (actual time=3.800..73.486 rows=364404 loops=1)
Buffers: shared hit=868 read=4491
-> Hash (cost=9311.25..9311.25 rows=186964 width=25) (actual time=115.213..115.213 rows=186589 loops=1)
Buckets: 65536 Batches: 4 Memory Usage: 2875kB
Buffers: shared hit=1269, temp written=681
-> Index Only Scan using app_country_created_person_preliminary_idx on applications a (cost=0.56..9311.25 rows=186964 width=25) (actual time=0.054..64.392 rows=186589 loops=1)
reated < '2017-07-01 00:00:00+02'::timestamp with time zone))
Heap Fetches: 0
Buffers: shared hit=1269
Planning time: 0.401 ms
Execution time: 628.100 ms