To represent a sparse 2D matrix of floating-point values, I use a plain Postgres 13 database table that stores a single matrix cell value per row. I would like to improve a query that computes the dense version of a matrix as a 2D array (e.g. real[][]), with empty cells defaulting to zero.
The matrix table schema looks like this:
Table "public.similarity_score"
Column | Type | Collation | Nullable | Default
------------------+---------+-----------+----------+---------
query_object_id | bigint | | not null |
target_object_id | bigint | | not null |
score | real | | not null |
similarity_id | integer | | not null |
Multiple matrices can be stored in the same table (similarity_id), and for each row (query_object_id) and column (target_object_id) a single value can be stored (score).
Besides many other fields, the similarity table (referenced by the table above) contains a list of all possible row and column choices:
Table "public.similarity"
Column | Type | Collation | Nullable | Default
------------------------+--------------------------+-----------+----------+-------------------
id | integer | | not null | generated by default as identity
query_objects | bigint[] | | |
target_objects | bigint[] | | |
…
When assembling the dense matrix for a similarity, I would like the rows and columns to be in the order of these query_objects and target_objects, i.e. the result rows can be mapped onto those arrays. Currently I do this with the following query:
WITH qt AS MATERIALIZED (
    SELECT query.id AS query_id, query.o AS query_o,
           target.id AS target_id, target.o AS target_o
    FROM similarity s,
         UNNEST(s.query_objects) WITH ORDINALITY AS query(id, o),
         UNNEST(s.target_objects) WITH ORDINALITY AS target(id, o)
    WHERE s.id = %(similarity_id)s
),
scores AS (
    SELECT qt.*, COALESCE(nss.score, 0) AS score
    FROM qt
    LEFT JOIN similarity_score nss
        ON nss.query_object_id = qt.query_id
        AND nss.target_object_id = qt.target_id
        AND nss.similarity_id = %(similarity_id)s
),
score_rows AS (
    SELECT query_o, array_agg(score ORDER BY target_o) AS scores
    FROM scores
    GROUP BY query_o
)
SELECT array_agg(scores ORDER BY query_o)
FROM score_rows;
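To illustrate what the query computes, here is a self-contained toy version of the same logic with hypothetical IDs (rows 10 and 11, columns 20, 21 and 22) and only two stored cells, using inline VALUES instead of the real tables:

SELECT array_agg(row_scores ORDER BY query_o) AS dense_matrix
FROM (
    SELECT q.o AS query_o,
           array_agg(COALESCE(s.score, 0) ORDER BY t.o) AS row_scores
    FROM unnest(ARRAY[10, 11]) WITH ORDINALITY AS q(id, o)
    CROSS JOIN unnest(ARRAY[20, 21, 22]) WITH ORDINALITY AS t(id, o)
    LEFT JOIN (VALUES (10, 21, 0.5::real),
                      (11, 20, 0.25::real)) AS s(query_object_id, target_object_id, score)
        ON s.query_object_id = q.id AND s.target_object_id = t.id
    GROUP BY q.o
) per_row;
-- Returns: {{0,0.5,0},{0.25,0,0}}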
The plan for a 1025*9097-element matrix looks like this:
Aggregate (cost=485.99..486.01 rows=1 width=32) (actual time=53124.869..53124.874 rows=1 loops=1)
Buffers: shared hit=26513337
CTE qt
-> Nested Loop (cost=0.29..6.91 rows=100 width=32) (actual time=0.781..2812.748 rows=9324425 loops=1)
Buffers: shared hit=9232
-> Nested Loop (cost=0.28..2.91 rows=10 width=36) (actual time=0.085..1.624 rows=1025 loops=1)
Buffers: shared hit=7
-> Index Scan using nblast_similarity_pkey on nblast_similarity s (cost=0.28..2.51 rows=1 width=51) (actual time=0.010..0.012 rows=1 loops=1)
Index Cond: (id = 2282)
Buffers: shared hit=3
-> Function Scan on unnest query (cost=0.00..0.20 rows=10 width=16) (actual time=0.075..0.935 rows=1025 loops=1)
Buffers: shared hit=4
-> Function Scan on unnest target (cost=0.00..0.20 rows=10 width=16) (actual time=0.662..1.803 rows=9097 loops=1025)
Buffers: shared hit=9225
-> GroupAggregate (cost=473.82..476.82 rows=100 width=40) (actual time=50217.415..53062.468 rows=1025 loops=1)
Group Key: qt.query_o
Buffers: shared hit=26513337
-> Sort (cost=473.82..474.07 rows=100 width=20) (actual time=50214.748..50650.740 rows=9324425 loops=1)
Sort Key: qt.query_o
Sort Method: quicksort Memory: 832749kB
Buffers: shared hit=26513337
-> Nested Loop Left Join (cost=3.52..470.50 rows=100 width=20) (actual time=0.790..48459.027 rows=9324425 loops=1)
Buffers: shared hit=26513337
-> CTE Scan on qt (cost=0.00..4.00 rows=100 width=32) (actual time=0.782..5483.806 rows=9324425 loops=1)
Buffers: shared hit=9232
-> Bitmap Heap Scan on nblast_similarity_score nss (cost=3.52..4.65 rows=1 width=20) (actual time=0.004..0.004 rows=0 loops=9324425)
Recheck Cond: ((target_object_id = qt.target_id) AND (query_object_id = qt.query_id))
Filter: (similarity_id = 2282)
Heap Blocks: exact=78412
Buffers: shared hit=26504105
-> BitmapAnd (cost=3.52..3.52 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=9324425)
Buffers: shared hit=26425693
-> Bitmap Index Scan on nblast_similarity_score_target_object_id (cost=0.00..1.43 rows=21 width=0) (actual time=0.002..0.002 rows=9 loops=9324425)
Index Cond: (target_object_id = qt.target_id)
Buffers: shared hit=18648850
-> Bitmap Index Scan on nblast_similarity_score_query_object_id (cost=0.00..1.84 rows=77 width=0) (actual time=0.003..0.003 rows=76 loops=3871425)
Index Cond: (query_object_id = qt.query_id)
Buffers: shared hit=7776843
Planning:
Buffers: shared hit=2
Planning Time: 0.474 ms
Execution Time: 53362.434 ms
(42 rows)
Time: 53363.526 ms (00:53.364)
It would be great if this time could be reduced from the current 53 seconds. I compare it with the alternatives of storing the dense matrix explicitly as real[][] (6 seconds) or as a large object (4.5 seconds), which are obviously faster when reading the data in full (they come with downsides such as size limits and slow SQL access to individual entries). The plan above suggests that some result-size estimates are far too low, perhaps leading to bad plans (yes, the table is ANALYZEd and the default statistics target is 5000)?
For context: this query is run from a Django (Python) back-end and the result is returned to a front-end as the response to an HTTP request.
Edit: As suggested by Laurenz Albe, some improvements are possible with better estimates. I tried both a custom large_unnest function and disabling nested loop joins for the query. Both are useful suggestions and lead to better plans; a sketch of the large_unnest function is shown below.
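I did not include the function definition above; a minimal sketch of what it could look like, assuming it simply wraps unnest with a fixed, larger planner row estimate (the rows=2000 in the plan below suggests ROWS 2000), is:

CREATE FUNCTION large_unnest(anyarray) RETURNS SETOF anyelement
    LANGUAGE sql IMMUTABLE PARALLEL SAFE
    ROWS 2000  -- hypothetical reconstruction: the whole point is a row estimate suited to large arrays
AS 'SELECT unnest($1)';

In the qt CTE, UNNEST(s.query_objects) and UNNEST(s.target_objects) are then replaced by large_unnest(...), keeping WITH ORDINALITY.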
The custom large_unnest function leads to an execution time of 19 seconds:
Aggregate (cost=1348212.99..1348213.01 rows=1 width=32) (actual time=18804.368..18804.374 rows=1 loops=1)
Buffers: shared hit=9732
CTE qt
-> Nested Loop (cost=0.78..160083.01 rows=4000000 width=32) (actual time=2.180..3113.809 rows=9324425 loops=1)
Buffers: shared hit=9232
-> Nested Loop (cost=0.53..82.76 rows=2000 width=36) (actual time=0.391..2.654 rows=1025 loops=1)
Buffers: shared hit=7
-> Index Scan using nblast_similarity_pkey on nblast_similarity s (cost=0.28..2.51 rows=1 width=51) (actual time=0.018..0.021 rows=1 loops=1)
Index Cond: (id = 2282)
Buffers: shared hit=3
-> Function Scan on large_unnest query (cost=0.25..40.25 rows=2000 width=16) (actual time=0.365..1.633 rows=1025 loops=1)
Buffers: shared hit=4
-> Function Scan on large_unnest target (cost=0.25..40.25 rows=2000 width=16) (actual time=1.834..2.252 rows=9097 loops=1025)
Buffers: shared hit=9225
-> GroupAggregate (cost=1158120.98..1188125.48 rows=200 width=40) (actual time=15175.754..18745.797 rows=1025 loops=1)
Group Key: scores.query_o
Buffers: shared hit=9732
-> Sort (cost=1158120.98..1168120.98 rows=4000000 width=20) (actual time=15172.036..15584.416 rows=9324425 loops=1)
Sort Key: scores.query_o
Sort Method: quicksort Memory: 1121687kB
Buffers: shared hit=9732
-> Subquery Scan on scores (cost=607270.06..719489.61 rows=4000000 width=20) (actual time=10514.543..13821.419 rows=9324425 loops=1)
Buffers: shared hit=9732
-> Limit (cost=607270.06..639489.61 rows=4000000 width=36) (actual time=10514.536..13183.841 rows=9324425 loops=1)
Buffers: shared hit=9732
-> Merge Right Join (cost=607270.06..639489.61 rows=4000000 width=36) (actual time=10261.048..12382.913 rows=9324425 loops=1)
Merge Cond: ((nss.query_object_id = qt.query_id) AND (nss.target_object_id = qt.target_id))
Buffers: shared hit=9732
-> Sort (cost=8638.69..8834.72 rows=78412 width=20) (actual time=33.814..43.757 rows=78412 loops=1)
Sort Key: nss.query_object_id, nss.target_object_id
Sort Method: quicksort Memory: 9198kB
Buffers: shared hit=500
-> Seq Scan on nblast_similarity_score nss (cost=0.00..2264.27 rows=78412 width=20) (actual time=0.019..6.504 rows=78412 loops=1)
Filter: (similarity_id = 2282)
Buffers: shared hit=500
-> Sort (cost=598631.37..608631.37 rows=4000000 width=32) (actual time=10227.173..11103.484 rows=9324425 loops=1)
Sort Key: qt.query_id, qt.target_id
Sort Method: quicksort Memory: 1121687kB
Buffers: shared hit=9232
-> CTE Scan on qt (cost=0.00..160000.00 rows=4000000 width=32) (actual time=2.185..5524.025 rows=9324425 loops=1)
Buffers: shared hit=9232
Planning:
Buffers: shared hit=2
Planning Time: 0.680 ms
JIT:
Functions: 36
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.907 ms, Inlining 6.672 ms, Optimization 151.239 ms, Emission 95.707 ms, Total 255.524 ms
Execution Time: 19135.939 ms
(49 rows)
Time: 19137.368 ms (00:19.137)
Interestingly, disabling the nested loop join entirely for the query and removing MATERIALIZED from the first CTE leads to an even better plan with an execution time of 8 seconds, but only if the original UNNEST is used; a sketch of how the join method can be disabled for just this query follows.
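One way to turn off nested loop joins for only this one statement (a sketch; in my case the query is issued from the application, so the exact mechanism may differ) is to scope the setting to a transaction:

BEGIN;
-- Affects only the current transaction; other sessions and later
-- transactions keep the default join methods.
SET LOCAL enable_nestloop = off;
-- ... run the matrix query from above here, with "WITH qt AS ("
-- instead of "WITH qt AS MATERIALIZED (" ...
COMMIT;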
The plan then looks like this:
Aggregate (cost=20000004304.55..20000004304.57 rows=1 width=32) (actual time=8460.589..8460.594 rows=1 loops=1)
Buffers: shared hit=9732
-> GroupAggregate (cost=20000004303.34..20000004304.32 rows=10 width=40) (actual time=5628.507..8402.370 rows=1025 loops=1)
Group Key: query.o
Buffers: shared hit=9732
-> Sort (cost=20000004303.34..20000004303.59 rows=100 width=20) (actual time=5625.952..6067.150 rows=9324425 loops=1)
Sort Key: query.o
Sort Method: quicksort Memory: 832749kB
Buffers: shared hit=9732
-> Hash Left Join (cost=20000004224.85..20000004300.02 rows=100 width=20) (actual time=201.473..4356.689 rows=9324425 loops=1)
Hash Cond: ((query.id = nss.query_object_id) AND (target.id = nss.target_object_id))
Buffers: shared hit=9732
-> Nested Loop (cost=20000000000.28..20000000006.91 rows=100 width=32) (actual time=181.110..2089.076 rows=9324425 loops=1)
Buffers: shared hit=9232
-> Nested Loop (cost=10000000000.28..10000000002.91 rows=10 width=36) (actual time=180.056..181.406 rows=1025 loops=1)
Buffers: shared hit=7
-> Index Scan using nblast_similarity_pkey on nblast_similarity s (cost=0.28..2.51 rows=1 width=51) (actual time=179.859..179.861 rows=1 loops=1)
Index Cond: (id = 2282)
Buffers: shared hit=3
-> Function Scan on unnest query (cost=0.00..0.20 rows=10 width=16) (actual time=0.186..0.939 rows=1025 loops=1)
Buffers: shared hit=4
-> Function Scan on unnest target (cost=0.00..0.20 rows=10 width=16) (actual time=0.596..1.192 rows=9097 loops=1025)
Buffers: shared hit=9225
-> Hash (cost=2264.27..2264.27 rows=78412 width=20) (actual time=20.286..20.287 rows=78412 loops=1)
Buckets: 131072 Batches: 1 Memory Usage: 5313kB
Buffers: shared hit=500
-> Seq Scan on nblast_similarity_score nss (cost=0.00..2264.27 rows=78412 width=20) (actual time=0.015..6.740 rows=78412 loops=1)
Filter: (similarity_id = 2282)
Buffers: shared hit=500
Planning:
Buffers: shared hit=2
Planning Time: 0.650 ms
JIT:
Functions: 29
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 1.608 ms, Inlining 6.629 ms, Optimization 102.178 ms, Emission 71.173 ms, Total 181.588 ms
Execution Time: 8519.436 ms
(37 rows)
Time: 8520.675 ms (00:08.521)
I suppose there is no real way around the sort, though.