AskOverflow.Dev

AskOverflow.Dev Logo AskOverflow.Dev Logo

AskOverflow.Dev Navigation

  • 主页
  • 系统&网络
  • Ubuntu
  • Unix
  • DBA
  • Computer
  • Coding
  • LangChain

Mobile menu

Close
  • 主页
  • 系统&网络
    • 最新
    • 热门
    • 标签
  • Ubuntu
    • 最新
    • 热门
    • 标签
  • Unix
    • 最新
    • 标签
  • DBA
    • 最新
    • 标签
  • Computer
    • 最新
    • 标签
  • Coding
    • 最新
    • 标签
主页 / user-137478

Jukurrpa's questions

Martin Hope
Jukurrpa
Asked: 2025-04-07 23:16:01 +0800 CST

在较小的 Postgres 表上,不同的计划和较慢的查询

  • 5

在两个仅行数不同的表(约 7.8M vs 约 1.4M)上运行相同的查询,会得到两个不同的计划,这听起来很合理。但是在较小的表上执行速度要慢 4 到 5 倍,我想知道原因。

表格定义如下:

   Column   |           Type           | Collation | Nullable | Default 
------------+--------------------------+-----------+----------+---------
 image_id   | bigint                   |           | not null | 
 h3_cell    | h3index                  |           | not null | 
 created_at | timestamp with time zone |           | not null | 
 location   | geometry(PointZ,4326)    |           | not null | 
Indexes:
    "images_a_pkey" PRIMARY KEY, btree (image_id)
    "images_a_created_at_idx" btree (created_at)
    "images_a_h3_cell_idx" btree (h3_cell)

查询如下

h3_cells AS (
    SELECT UNNEST(h3_linestring_to_cells(:line_string, 13, 1)) AS cell
)
SELECT COUNT(*)
FROM images
JOIN h3_cells hc ON images.h3_cell = hc.cell

该h3_linestring_to_cells()函数返回一个数组,h3index其大小在某些情况下可能高达数万个值。在下面的示例中,它返回的值约为 50,000 个。

在具有 7.8M 行的表中,计划和执行条目如下(为简洁起见,删除了数组值):

Aggregate  (cost=347404.47..347404.48 rows=1 width=8) (actual time=74.311..74.312 rows=1 loops=1)
  Buffers: shared hit=154681 read=328
  I/O Timings: shared read=1.362
  ->  Nested Loop  (cost=0.43..346724.23 rows=272093 width=0) (actual time=0.051..74.246 rows=833 loops=1)
        Buffers: shared hit=154681 read=328
        I/O Timings: shared read=1.362
        ->  ProjectSet  (cost=0.00..256.90 rows=51377 width=8) (actual time=0.002..4.113 rows=51377 loops=1)
              ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
        ->  Index Only Scan using images_a_h3_cell_idx on images_a  (cost=0.43..6.68 rows=5 width=8) (actual time=0.001..0.001 rows=0 loops=51377)
              Index Cond: (h3_cell = (unnest('{...}'::h3index[])))
              Heap Fetches: 354
              Buffers: shared hit=154681 read=328
              I/O Timings: shared read=1.362
Planning Time: 139.421 ms
Execution Time: 74.345 ms

而在较小的 1.4M 行表上,计划和执行如下:

Aggregate  (cost=105040.78..105040.79 rows=1 width=8) (actual time=327.586..327.587 rows=1 loops=1)
  Buffers: shared hit=148358 read=6315 written=41
  I/O Timings: shared read=26.521 write=0.327
  ->  Merge Join  (cost=4791.05..104802.14 rows=95455 width=0) (actual time=321.174..327.575 rows=118 loops=1)
        Merge Cond: (ptilmi.h3_cell = (unnest('{...}'::h3index[])))
        Buffers: shared hit=148358 read=6315 written=41
        I/O Timings: shared read=26.521 write=0.327
        ->  Index Only Scan using images_b_h3_cell_idx on images_b ptilmi  (cost=0.43..95041.10 rows=1415438 width=8) (actual time=0.026..245.897 rows=964987 loops=1)
              Heap Fetches: 469832
              Buffers: shared hit=148358 read=6315 written=41
              I/O Timings: shared read=26.521 write=0.327
        ->  Sort  (cost=4790.62..4919.07 rows=51377 width=8) (actual time=11.181..13.551 rows=51390 loops=1)
              Sort Key: (unnest('{...}'::h3index[]))
              Sort Method: quicksort  Memory: 1537kB
              ->  ProjectSet  (cost=0.00..256.90 rows=51377 width=8) (actual time=0.002..3.716 rows=51377 loops=1)
                    ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.000..0.001 rows=1 loops=1)
Planning Time: 146.617 ms
Execution Time: 327.626 ms

对于较小的源数组(例如大小为 25,000),较小表上的计划更改为第一个(嵌套循环),并且其执行时间变得更符合预期(比较大的表更快)。

我不明白是什么促使计划改变为效率更低的计划。

请注意,我使用的是 CTE+JOIN 而不是 eg WHERE h3_cell = ANY(h3_linestring_to_cells(:line_string, 13, 1)),因为生成的数组通常很大,而且我发现在这种情况下前者通常更高效。有趣的是,对于包含 50,000 个条目的数组,这种= ANY()方法在较小的表上速度更快,而对于包含 25,000 个条目的数组,这种方法速度较慢。

postgresql
  • 1 个回答
  • 25 Views
Martin Hope
Jukurrpa
Asked: 2021-06-18 09:31:20 +0800 CST

在 WHERE 中使用子查询会使查询非常慢

  • 0

由于我无法弄清楚的原因,我有这个相当基本的查询非常慢:

SELECT s.id 
FROM segments s
WHERE
    ST_DWithin(
        s.geom::GEOGRAPHY,
        ST_Envelope((SELECT ST_COLLECT(s2.geom) FROM segments s2 WHERE s2.id IN (407820025,  407820024,  407817407,  407817408,  407816908,  407816909,  407817413,  407817414,  407817409,  407817410,  407817405,  407817406,  407816905,  407816907,  407817412,  407817411,  407816906,  407816904,  407816764,  407816765)))::GEOGRAPHY,
        30
    );

                                                                                                                           QUERY PLAN                                                                                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Seq Scan on segments s  (cost=55.58..48476381.06 rows=7444984 width=4)
   Filter: st_dwithin((geom)::geography, (st_astext(st_envelope($0)))::geography, '30'::double precision)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=55.57..55.58 rows=1 width=32)
           ->  Index Scan using segments_pkey on segments s2  (cost=0.44..55.52 rows=20 width=113)
                 Index Cond: (id = ANY ('{407820025,407820024,407817407,407817408,407816908,407816909,407817413,407817414,407817409,407817410,407817405,407817406,407816905,407816907,407817412,407817411,407816906,407816904,407816764,407816765}'::integer[]))

我真正感到困惑的是带有子查询的 ST_Envelope 本身非常快

SELECT ST_Envelope((SELECT ST_COLLECT(geom) FROM segments WHERE id IN (407820025,  407820024,  407817407,  407817408,  407816908,  407816909,  407817413,  407817414,  407817409,  407817410,  407817405,  407817406,  407816905,  407816907,  407817412,  407817411,  407816906,  407816904,  407816764,  407816765)))::GEOGRAPHY;

                                                                                                                           QUERY PLAN                                                                                                                            
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=55.58..55.60 rows=1 width=32)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=55.57..55.58 rows=1 width=32)
           ->  Index Scan using segments_pkey on segments  (cost=0.44..55.52 rows=20 width=113)
                 Index Cond: (id = ANY ('{407820025,407820024,407817407,407817408,407816908,407816909,407817413,407817414,407817409,407817410,407817405,407817406,407816905,407816907,407817412,407817411,407816906,407816904,407816764,407816765}'::integer[]))

如果我插入 ST_Envelope 的结果,主查询也是如此

SELECT id 
FROM segments
WHERE
    st_dwithin(
        geom::geography,
        '0103000020E61000000100000005000000C87B6E0D8FB85EC04BFD8462B9C34640C87B6E0D8FB85EC0929B35C16DC44640BBF8DDA6F2B75EC0929B35C16DC44640BBF8DDA6F2B75EC04BFD8462B9C34640C87B6E0D8FB85EC04BFD8462B9C34640'::GEOGRAPHY,
        30
    );

                                                                                                                                                                                                                                                                                QUERY PLAN                                                                                                                                                                                                                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Index Scan using segments_geom_geo_idx on segments  (cost=0.42..4.82 rows=1 width=4)
   Index Cond: ((geom)::geography && '0103000020E61000000100000005000000C87B6E0D8FB85EC04BFD8462B9C34640C87B6E0D8FB85EC0929B35C16DC44640BBF8DDA6F2B75EC0929B35C16DC44640BBF8DDA6F2B75EC04BFD8462B9C34640C87B6E0D8FB85EC04BFD8462B9C34640'::geography)
   Filter: (('0103000020E61000000100000005000000C87B6E0D8FB85EC04BFD8462B9C34640C87B6E0D8FB85EC0929B35C16DC44640BBF8DDA6F2B75EC0929B35C16DC44640BBF8DDA6F2B75EC04BFD8462B9C34640C87B6E0D8FB85EC04BFD8462B9C34640'::geography && _st_expand((geom)::geography, '30'::double precision)) AND _st_dwithin((geom)::geography, '0103000020E61000000100000005000000C87B6E0D8FB85EC04BFD8462B9C34640C87B6E0D8FB85EC0929B35C16DC44640BBF8DDA6F2B75EC0929B35C16DC44640BBF8DDA6F2B75EC04BFD8462B9C34640C87B6E0D8FB85EC04BFD8462B9C34640'::geography, '30'::double precision, true))

Postgres 不应该计算一次 ST_Envelope 然后将其用于 WHERE 条件,有效地执行我手动执行的操作吗?我也不明白为什么在原始查询中没有使用索引来执行过滤器。

我尝试将子查询放在 CTE 中,但这并没有解决问题。

postgresql subquery
  • 1 个回答
  • 131 Views

Sidebar

Stats

  • 问题 205573
  • 回答 270741
  • 最佳答案 135370
  • 用户 68524
  • 热门
  • 回答
  • Marko Smith

    连接到 PostgreSQL 服务器:致命:主机没有 pg_hba.conf 条目

    • 12 个回答
  • Marko Smith

    如何让sqlplus的输出出现在一行中?

    • 3 个回答
  • Marko Smith

    选择具有最大日期或最晚日期的日期

    • 3 个回答
  • Marko Smith

    如何列出 PostgreSQL 中的所有模式?

    • 4 个回答
  • Marko Smith

    列出指定表的所有列

    • 5 个回答
  • Marko Smith

    如何在不修改我自己的 tnsnames.ora 的情况下使用 sqlplus 连接到位于另一台主机上的 Oracle 数据库

    • 4 个回答
  • Marko Smith

    你如何mysqldump特定的表?

    • 4 个回答
  • Marko Smith

    使用 psql 列出数据库权限

    • 10 个回答
  • Marko Smith

    如何从 PostgreSQL 中的选择查询中将值插入表中?

    • 4 个回答
  • Marko Smith

    如何使用 psql 列出所有数据库和表?

    • 7 个回答
  • Martin Hope
    Jin 连接到 PostgreSQL 服务器:致命:主机没有 pg_hba.conf 条目 2014-12-02 02:54:58 +0800 CST
  • Martin Hope
    Stéphane 如何列出 PostgreSQL 中的所有模式? 2013-04-16 11:19:16 +0800 CST
  • Martin Hope
    Mike Walsh 为什么事务日志不断增长或空间不足? 2012-12-05 18:11:22 +0800 CST
  • Martin Hope
    Stephane Rolland 列出指定表的所有列 2012-08-14 04:44:44 +0800 CST
  • Martin Hope
    haxney MySQL 能否合理地对数十亿行执行查询? 2012-07-03 11:36:13 +0800 CST
  • Martin Hope
    qazwsx 如何监控大型 .sql 文件的导入进度? 2012-05-03 08:54:41 +0800 CST
  • Martin Hope
    markdorison 你如何mysqldump特定的表? 2011-12-17 12:39:37 +0800 CST
  • Martin Hope
    Jonas 如何使用 psql 对 SQL 查询进行计时? 2011-06-04 02:22:54 +0800 CST
  • Martin Hope
    Jonas 如何从 PostgreSQL 中的选择查询中将值插入表中? 2011-05-28 00:33:05 +0800 CST
  • Martin Hope
    Jonas 如何使用 psql 列出所有数据库和表? 2011-02-18 00:45:49 +0800 CST

热门标签

sql-server mysql postgresql sql-server-2014 sql-server-2016 oracle sql-server-2008 database-design query-performance sql-server-2017

Explore

  • 主页
  • 问题
    • 最新
    • 热门
  • 标签
  • 帮助

Footer

AskOverflow.Dev

关于我们

  • 关于我们
  • 联系我们

Legal Stuff

  • Privacy Policy

Language

  • Pt
  • Server
  • Unix

© 2023 AskOverflow.DEV All Rights Reserve