Accepted
Divick
Asked: 2017-07-09 00:24:09 +0800 CST

Ordering by two columns is much slower than ordering by a single column


I'm using Postgres, and I'm seeing that a query with ORDER BY on two columns is orders of magnitude slower than the same query with ORDER BY on just one column. The table in question has about 29.5 million rows.

Here are the results for three different queries:

Ordering by id only:

EXPLAIN ANALYZE SELECT "api_meterdata"."id", "api_meterdata"."meter_id", "api_meterdata"."datetime", "api_meter"."id" FROM "api_meterdata" INNER JOIN "api_meter" ON ( "api_meterdata"."meter_id" = "api_meter"."id" ) ORDER BY "api_meterdata"."id" DESC LIMIT 100;
                                                                               QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..321.49 rows=100 width=20) (actual time=0.407..30.424 rows=100 loops=1)
   ->  Nested Loop  (cost=0.44..94824299.30 rows=29535145 width=20) (actual time=0.402..30.090 rows=100 loops=1)
         Join Filter: (api_meterdata.meter_id = api_meter.id)
         Rows Removed by Join Filter: 8147
         ->  Index Scan Backward using api_meterdata_pkey on api_meterdata  (cost=0.44..58053041.74 rows=29535145 width=16) (actual time=0.103..0.867 rows=100 loops=1)
         ->  Materialize  (cost=0.00..2.25 rows=83 width=4) (actual time=0.002..0.144 rows=82 loops=100)
               ->  Seq Scan on api_meter  (cost=0.00..1.83 rows=83 width=4) (actual time=0.008..0.153 rows=83 loops=1)
 Planning time: 0.491 ms
 Execution time: 30.701 ms
(9 rows)

Ordering by datetime only:

EXPLAIN ANALYZE SELECT "api_meterdata"."id", "api_meterdata"."meter_id", "api_meterdata"."datetime", "api_meter"."id" FROM "api_meterdata" INNER JOIN "api_meter" ON ( "api_meterdata"."meter_id" = "api_meter"."id" ) ORDER BY "api_meterdata"."datetime" ASC LIMIT 100;
                                                                               QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..321.50 rows=100 width=20) (actual time=1.245..37.054 rows=100 loops=1)
   ->  Nested Loop  (cost=0.44..94825493.68 rows=29535313 width=20) (actual time=1.238..36.652 rows=100 loops=1)
         Join Filter: (api_meterdata.meter_id = api_meter.id)
         Rows Removed by Join Filter: 8148
         ->  Index Scan using api_meterdata_datetime_index on api_meterdata  (cost=0.44..58054026.95 rows=29535313 width=16) (actual time=0.851..1.501 rows=100 loops=1)
         ->  Materialize  (cost=0.00..2.25 rows=83 width=4) (actual time=0.002..0.172 rows=82 loops=100)
               ->  Seq Scan on api_meter  (cost=0.00..1.83 rows=83 width=4) (actual time=0.013..0.192 rows=83 loops=1)
 Planning time: 0.483 ms
 Execution time: 37.340 ms
(9 rows)

Ordering by datetime and id:

EXPLAIN ANALYZE SELECT "api_meterdata"."id", "api_meterdata"."meter_id", "api_meterdata"."datetime", "api_meter"."id" FROM "api_meterdata" INNER JOIN "api_meter" ON ( "api_meterdata"."meter_id" = "api_meter"."id" ) ORDER BY "api_meterdata"."datetime" ASC, "api_meterdata"."id" DESC LIMIT 100;
                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3064122.28..3064122.53 rows=100 width=20) (actual time=146772.167..146772.372 rows=100 loops=1)
   ->  Sort  (cost=3064122.28..3137955.90 rows=29533446 width=20) (actual time=146772.164..146772.242 rows=100 loops=1)
         Sort Key: api_meterdata.datetime, api_meterdata.id
         Sort Method: top-N heapsort  Memory: 32kB
         ->  Hash Join  (cost=2.87..1935375.21 rows=29533446 width=20) (actual time=0.394..113349.364 rows=29535544 loops=1)
               Hash Cond: (api_meterdata.meter_id = api_meter.id)
               ->  Seq Scan on api_meterdata  (cost=0.00..1529287.46 rows=29533446 width=16) (actual time=0.220..47537.991 rows=29535544 loops=1)
               ->  Hash  (cost=1.83..1.83 rows=83 width=4) (actual time=0.160..0.160 rows=83 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 3kB
                     ->  Seq Scan on api_meter  (cost=0.00..1.83 rows=83 width=4) (actual time=0.005..0.071 rows=83 loops=1)
 Planning time: 0.290 ms
 Execution time: 146772.500 ms
(12 rows)

Here are the indexes on the table:

SELECT * FROM pg_indexes WHERE tablename = 'api_meterdata';
 schemaname |   tablename   |                  indexname                   | tablespace |                                            indexdef
------------+---------------+----------------------------------------------+------------+--------------------------------------------------------------------------------------------------------------------
 public     | api_meterdata | api_meterdata_meter_id_36fe63013b50049f_uniq |            | CREATE UNIQUE INDEX api_meterdata_meter_id_36fe63013b50049f_uniq ON api_meterdata USING btree (meter_id, datetime)
 public     | api_meterdata | api_meterdata_pkey                           |            | CREATE UNIQUE INDEX api_meterdata_pkey ON api_meterdata USING btree (id)
 public     | api_meterdata | api_meterdata_f7a5de1d                       |            | CREATE INDEX api_meterdata_f7a5de1d ON api_meterdata USING btree (meter_id)
 public     | api_meterdata | api_meterdata_datetime_index                 |            | CREATE INDEX api_meterdata_datetime_index ON api_meterdata USING btree (datetime)
(4 rows)

I can see that it is the sort step that takes the longest, but I'm not sure why.
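For reference, one way to see in more detail where the time and I/O go is EXPLAIN with the BUFFERS option (a sketch; available since PostgreSQL 9.0, it additionally reports how many blocks each plan node touched):

EXPLAIN (ANALYZE, BUFFERS)
SELECT "api_meterdata"."id", "api_meterdata"."meter_id", "api_meterdata"."datetime", "api_meter"."id"
FROM "api_meterdata"
INNER JOIN "api_meter" ON ( "api_meterdata"."meter_id" = "api_meter"."id" )
ORDER BY "api_meterdata"."datetime" ASC, "api_meterdata"."id" DESC
LIMIT 100;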

postgresql performance

1 Answer

  1. Best Answer
     joanolo
     Answered: 2017-07-09T07:22:37+08:00

    The time difference comes down to several facts:

    1. Your queries have no WHERE clause to filter which results are retrieved.
    2. The speed of the queries depends on the fact that you do have a LIMIT clause.
    3. If the planner can find an index through which it can retrieve the rows of the query already in the order your ORDER BY specifies, it will start picking them one by one until it has read 100 (the number specified by your LIMIT clause). If the ORDER BY is multicolumn, the index needs to have the same columns, in the same order, with the same ASC/DESC sort directions (or all of them reversed).
    4. If the planner cannot find an index to play this role, it has to perform a sort step, putting all the rows in order into a (temporary, virtual) table, and then retrieve the first 100.
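    As a sketch of point 3, an index matching ORDER BY datetime ASC, id DESC needs the same mixed directions (the index name here is illustrative):

    -- Matches ORDER BY datetime ASC, id DESC; scanned backwards, it also
    -- matches the fully reversed ORDER BY datetime DESC, id ASC.
    CREATE INDEX api_meterdata_datetime_asc_id_desc_idx
        ON api_meterdata (datetime ASC, id DESC) ;

    -- Note: a plain (datetime, id) index is implicitly (ASC, ASC), so it does
    -- NOT match the mixed-direction ORDER BY, and a sort step would remain.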

    Having to retrieve all the data (not just the 100 already-sorted rows), join all of it, and only then perform the sort step is what causes such a big difference in performance. This can be seen clearly with explain.depesz.com.


    You can find a simulation of your scenario in this dbfiddle, which covers and explains the different cases, taking into account a suggestion from @ypercube for another index. Note also that some of your indexes are redundant.
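    Regarding the redundancy, a sketch using the index names from your pg_indexes output: the single-column meter_id index is covered by the leading column of the two-column unique index, so it could be dropped:

    -- meter_id alone is already the first column of
    -- api_meterdata_meter_id_36fe63013b50049f_uniq ON (meter_id, datetime)
    DROP INDEX api_meterdata_f7a5de1d ;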

    The DDL for your scenario, plus some simulated data:

    CREATE TABLE api_meter
    (
        id INTEGER PRIMARY KEY
    ) ;
    INSERT INTO 
        api_meter
        (id)
    SELECT
        generate_series(1, 83) ;
    

    ... and for the table holding your meter data:

    CREATE TABLE api_meterdata
    (
        id serial /* integer */ PRIMARY KEY,
        meter_id integer REFERENCES api_meter(id),
        datetime timestamp NOT NULL default now()
    ) ;
    
    -- The PK will have made an implicit index ON (id)
    
    -- Index on (meter_id, datetime); which is probably the *NATURAL KEY*
    CREATE UNIQUE INDEX api_meterdata_meter_id_datetime_unique 
        ON api_meterdata (meter_id, datetime) ;
    
    -- The following index is redundant, the column meter_id is already the first in
    -- the previous one.
    -- CREATE INDEX api_meterdata_meter_id_idx 
    --    ON api_meterdata (meter_id) ;
    
    CREATE INDEX api_meterdata_datetime_idx 
        ON api_meterdata (datetime) ;
    

    ... and some simulated data (648001 rows, to make it realistic). That's less data than you have, but DBFiddle hits its limits if I try to add more:

    INSERT INTO 
        api_meterdata
        (meter_id, datetime)
    SELECT
        random()*82+1, d
    FROM
        generate_series(timestamp '2017-01-01', timestamp '2017-01-31', 
                        interval '4 second') AS s(d);
    
    -- Make sure statistics are good
    ANALYZE api_meterdata;
    ANALYZE api_meter;
    

    Analysis of your first query:

    -- This query doesn't have a WHERE clause, so, indexes will be used based on 
    -- ORDER BY + LIMIT (and, eventually, column coverage)
    --
    -- * The index helping this case is the one corresponding to the PK of 
    --   api_meter_data, used in DESC order
    -- * A second index will help: the one used for the JOIN condition
    -- * How does postgresql choose to JOIN will depend on specific data values 
    --   distribution, sizes, etc.
    EXPLAIN ANALYZE
    SELECT 
        api_meterdata.id, api_meterdata.meter_id, api_meterdata.datetime, 
        api_meter.id 
    FROM 
        api_meterdata 
        INNER JOIN api_meter ON ( api_meterdata.meter_id = api_meter.id ) 
    ORDER BY 
        api_meterdata.id DESC 
    LIMIT 100;
    
    QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.57..20.71 rows=100 width=20) (actual time=0.033..0.188 rows=100 loops=1)
       ->  Nested Loop  (cost=0.57..130514.61 rows=648001 width=20) (actual time=0.031..0.175 rows=100 loops=1)
             ->  Index Scan Backward using api_meterdata_pkey on api_meterdata  (cost=0.42..20342.44 rows=648001 width=16) (actual time=0.023..0.038 rows=100 loops=1)
             ->  Index Only Scan using api_meter_pkey on api_meter  (cost=0.14..0.16 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=100)
                   Index Cond: (id = api_meterdata.meter_id)
                   Heap Fetches: 100
     Planning time: 0.331 ms
     Execution time: 0.216 ms
    

    Analysis of your second query:

    -- This query doesn't have either a WHERE clause, so, indexes will be used 
    -- based on ORDER BY + LIMIT (and, eventually, column coverage).
    -- * The index helping this case is the one corresponding to 
    --   ON api_meterdata (datetime), because that's the only column used in the
    --   ORDER BY.
    -- * A second index will help: the one used for the JOIN condition
    -- * How does postgresql choose to JOIN will depend on specific data values 
    --   distribution
    EXPLAIN ANALYZE 
    SELECT 
        api_meterdata.id, api_meterdata.meter_id, api_meterdata.datetime, 
        api_meter.id 
    FROM 
        api_meterdata 
        INNER JOIN api_meter ON ( api_meterdata.meter_id = api_meter.id ) 
    ORDER BY 
        api_meterdata.datetime ASC 
    LIMIT 100;
    
    QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.57..20.71 rows=100 width=20) (actual time=0.041..0.201 rows=100 loops=1)
       ->  Nested Loop  (cost=0.57..130514.61 rows=648001 width=20) (actual time=0.040..0.182 rows=100 loops=1)
             ->  Index Scan using api_meterdata_datetime_idx on api_meterdata  (cost=0.42..20342.44 rows=648001 width=16) (actual time=0.036..0.048 rows=100 loops=1)
             ->  Index Only Scan using api_meter_pkey on api_meter  (cost=0.14..0.16 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=100)
                   Index Cond: (id = api_meterdata.meter_id)
                   Heap Fetches: 100
     Planning time: 0.113 ms
     Execution time: 0.224 ms
    

    Analysis of your 3rd query, without and with the suggested index:

    -- This query doesn't have either a WHERE clause.
    -- Again indexes will be used based on ORDER BY + LIMIT 
    -- * The index that would mostly help this case would be one with 
    --   (datetime ASC, id DESC). 
    --   But there is none in place. An index on (datetime) alone will not be
    --   good enough, because the second ORDER BY column would still need to be
    --   evaluated before the LIMIT can be computed. That is, a SORT will be needed.
    -- * A second index will help: the one used for the JOIN condition
    -- * How does postgresql choose to JOIN will depend on specific data values 
    --   distribution, as always.
    --
    -- This query performs MUCH WORSE than the previous one.
    
    EXPLAIN ANALYZE 
    SELECT api_meterdata.id, api_meterdata.meter_id, api_meterdata.datetime, api_meter.id 
    FROM api_meterdata 
        INNER JOIN api_meter ON ( api_meterdata.meter_id = api_meter.id ) 
    ORDER BY api_meterdata.datetime ASC, api_meterdata.id DESC 
    LIMIT 100;
    
    QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=43662.02..43662.27 rows=100 width=20) (actual time=377.202..377.222 rows=100 loops=1)
       ->  Sort  (cost=43662.02..45282.03 rows=648001 width=20) (actual time=377.202..377.210 rows=100 loops=1)
             Sort Key: api_meterdata.datetime, api_meterdata.id DESC
             Sort Method: top-N heapsort  Memory: 32kB
             ->  Hash Join  (cost=2.87..18895.89 rows=648001 width=20) (actual time=0.034..270.809 rows=648001 loops=1)
                   Hash Cond: (api_meterdata.meter_id = api_meter.id)
                   ->  Seq Scan on api_meterdata  (cost=0.00..9983.01 rows=648001 width=16) (actual time=0.007..75.104 rows=648001 loops=1)
                   ->  Hash  (cost=1.83..1.83 rows=83 width=4) (actual time=0.023..0.023 rows=83 loops=1)
                         Buckets: 1024  Batches: 1  Memory Usage: 11kB
                         ->  Seq Scan on api_meter  (cost=0.00..1.83 rows=83 width=4) (actual time=0.002..0.009 rows=83 loops=1)
     Planning time: 0.123 ms
     Execution time: 377.251 ms
    

    Index creation (and dropping the redundant one):

    -- We DROP one of the indexes... which will become redundant
    -- CREATE INDEX api_meterdata_datetime_idx ON api_meterdata (datetime) ;
    DROP INDEX api_meterdata_datetime_idx ;
    
    -- And create one with two columns, ordered in the same fashion needed by the query
    CREATE INDEX api_meterdata_datetime_idx 
        ON api_meterdata (datetime ASC, id DESC) ;
    

    Query analysis in the new scenario:

    --
    -- We put in place the required index
    --
    -- This query is again fast, and has an execution plan equivalent in 
    -- structure to the two first ones. No SORT phase is needed, because rows are
    -- already retrieved in the correct order, and once the LIMIT is reached, no
    -- more rows are read from (disk/cache)
    --
    
    EXPLAIN ANALYZE 
    SELECT 
        api_meterdata.id, api_meterdata.meter_id, api_meterdata.datetime, 
        api_meter.id 
    FROM 
        api_meterdata 
        INNER JOIN api_meter ON ( api_meterdata.meter_id = api_meter.id ) 
    ORDER BY 
        api_meterdata.datetime ASC, api_meterdata.id DESC 
    LIMIT 100;
    
    QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------------
     Limit  (cost=0.57..21.86 rows=100 width=20) (actual time=0.019..0.229 rows=100 loops=1)
       ->  Nested Loop  (cost=0.57..137986.99 rows=648001 width=20) (actual time=0.018..0.214 rows=100 loops=1)
             ->  Index Scan using api_meterdata_datetime_idx on api_meterdata  (cost=0.42..27814.81 rows=648001 width=16) (actual time=0.013..0.040 rows=100 loops=1)
             ->  Index Only Scan using api_meter_pkey on api_meter  (cost=0.14..0.16 rows=1 width=4) (actual time=0.001..0.001 rows=1 loops=100)
                   Index Cond: (id = api_meterdata.meter_id)
                   Heap Fetches: 100
     Planning time: 0.218 ms
     Execution time: 0.262 ms
    
