Questions asked by Divick - dba

Divick

Asked: 2017-07-09 00:24:09 +0800 CST

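Sorting on two columns is very slow compared to sorting on a single column

I am using Postgres, and my query is several orders of magnitude slower with an ORDER BY on two columns than with an ORDER BY on just one column. The table in question has about 29.5 million rows.

Here are the results of three different queries:

Ordering by id only: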

EXPLAIN ANALYZE SELECT "api_meterdata"."id", "api_meterdata"."meter_id", "api_meterdata"."datetime", "api_meter"."id" FROM "api_meterdata" INNER JOIN "api_meter" ON ( "api_meterdata"."meter_id" = "api_meter"."id" ) ORDER BY "api_meterdata"."id" DESC LIMIT 100;
                                                                               QUERY PLAN                                                            

------------------------------------------------------------------------------------------------------------------------------------------------------------------------  
 Limit  (cost=0.44..321.49 rows=100 width=20) (actual time=0.407..30.424 rows=100 loops=1)    
   ->  Nested Loop  (cost=0.44..94824299.30 rows=29535145 width=20) (actual time=0.402..30.090 rows=100 loops=1)
         Join Filter: (api_meterdata.meter_id = api_meter.id)
         Rows Removed by Join Filter: 8147
         ->  Index Scan Backward using api_meterdata_pkey on api_meterdata  (cost=0.44..58053041.74 rows=29535145 width=16) (actual time=0.103..0.867 rows=100 loops=1)
         ->  Materialize  (cost=0.00..2.25 rows=83 width=4) (actual time=0.002..0.144 rows=82 loops=100)
               ->  Seq Scan on api_meter  (cost=0.00..1.83 rows=83 width=4) (actual time=0.008..0.153 rows=83 loops=1)
 Planning time: 0.491 ms
 Execution time: 30.701 ms
(9 rows)

Ordering by datetime only:

EXPLAIN ANALYZE SELECT "api_meterdata"."id", "api_meterdata"."meter_id", "api_meterdata"."datetime", "api_meter"."id" FROM "api_meterdata" INNER JOIN "api_meter" ON ( "api_meterdata"."meter_id" = "api_meter"."id" ) ORDER BY "api_meterdata"."datetime" ASC LIMIT 100;
                                                                               QUERY PLAN                                                                                
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.44..321.50 rows=100 width=20) (actual time=1.245..37.054 rows=100 loops=1)
   ->  Nested Loop  (cost=0.44..94825493.68 rows=29535313 width=20) (actual time=1.238..36.652 rows=100 loops=1)
         Join Filter: (api_meterdata.meter_id = api_meter.id)
         Rows Removed by Join Filter: 8148
         ->  Index Scan using api_meterdata_datetime_index on api_meterdata  (cost=0.44..58054026.95 rows=29535313 width=16) (actual time=0.851..1.501 rows=100 loops=1)
         ->  Materialize  (cost=0.00..2.25 rows=83 width=4) (actual time=0.002..0.172 rows=82 loops=100)
               ->  Seq Scan on api_meter  (cost=0.00..1.83 rows=83 width=4) (actual time=0.013..0.192 rows=83 loops=1)
 Planning time: 0.483 ms
 Execution time: 37.340 ms
(9 rows)

Ordering by datetime and id:

EXPLAIN ANALYZE SELECT "api_meterdata"."id", "api_meterdata"."meter_id", "api_meterdata"."datetime", "api_meter"."id" FROM "api_meterdata" INNER JOIN "api_meter" ON ( "api_meterdata"."meter_id" = "api_meter"."id" ) ORDER BY "api_meterdata"."datetime" ASC, "api_meterdata"."id" DESC LIMIT 100;
                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=3064122.28..3064122.53 rows=100 width=20) (actual time=146772.167..146772.372 rows=100 loops=1)
   ->  Sort  (cost=3064122.28..3137955.90 rows=29533446 width=20) (actual time=146772.164..146772.242 rows=100 loops=1)
         Sort Key: api_meterdata.datetime, api_meterdata.id
         Sort Method: top-N heapsort  Memory: 32kB
         ->  Hash Join  (cost=2.87..1935375.21 rows=29533446 width=20) (actual time=0.394..113349.364 rows=29535544 loops=1)
               Hash Cond: (api_meterdata.meter_id = api_meter.id)
               ->  Seq Scan on api_meterdata  (cost=0.00..1529287.46 rows=29533446 width=16) (actual time=0.220..47537.991 rows=29535544 loops=1)
               ->  Hash  (cost=1.83..1.83 rows=83 width=4) (actual time=0.160..0.160 rows=83 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 3kB
                     ->  Seq Scan on api_meter  (cost=0.00..1.83 rows=83 width=4) (actual time=0.005..0.071 rows=83 loops=1)
 Planning time: 0.290 ms
 Execution time: 146772.500 ms
(12 rows)

Here are the indexes on the table:

SELECT * FROM pg_indexes WHERE tablename = 'api_meterdata';
 schemaname |   tablename   |                  indexname                   | tablespace | indexdef
------------+---------------+----------------------------------------------+------------+----------
 public     | api_meterdata | api_meterdata_meter_id_36fe63013b50049f_uniq |            | CREATE UNIQUE INDEX api_meterdata_meter_id_36fe63013b50049f_uniq ON api_meterdata USING btree (meter_id, datetime)
 public     | api_meterdata | api_meterdata_pkey                           |            | CREATE UNIQUE INDEX api_meterdata_pkey ON api_meterdata USING btree (id)
 public     | api_meterdata | api_meterdata_f7a5de1d                       |            | CREATE INDEX api_meterdata_f7a5de1d ON api_meterdata USING btree (meter_id)
 public     | api_meterdata | api_meterdata_datetime_index                 |            | CREATE INDEX api_meterdata_datetime_index ON api_meterdata USING btree (datetime)
(4 rows)

I can see that the sort step is what takes the longest, but I am not sure why.
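
A possible experiment, which is not part of the original post: in the plans above, each single-column ordering is read straight from an existing index (api_meterdata_pkey or api_meterdata_datetime_index), while the mixed ordering falls back to a sequential scan of all 29.5 million rows followed by a top-N sort. None of the four listed indexes matches (datetime ASC, id DESC). The sketch below assumes an extra composite index is acceptable on this table. The index name api_meterdata_datetime_id_idx is hypothetical, and whether the planner actually chooses the index depends on the Postgres version and table statistics.

```sql
-- Hypothetical index name. Column order and sort directions mirror the
-- ORDER BY clause of the slow query.
-- CONCURRENTLY avoids blocking writes during the build, but it cannot be
-- run inside a transaction block.
CREATE INDEX CONCURRENTLY api_meterdata_datetime_id_idx
    ON api_meterdata (datetime ASC, id DESC);

-- Re-run the slow query to see whether the plan changes.
EXPLAIN ANALYZE
SELECT "api_meterdata"."id",
       "api_meterdata"."meter_id",
       "api_meterdata"."datetime",
       "api_meter"."id"
FROM "api_meterdata"
INNER JOIN "api_meter"
        ON ("api_meterdata"."meter_id" = "api_meter"."id")
ORDER BY "api_meterdata"."datetime" ASC,
         "api_meterdata"."id" DESC
LIMIT 100;
```

If the new plan shows an index scan on the composite index feeding the Limit node directly, the sort step is gone. If it still shows Seq Scan plus Sort, the index is not being used and the cause lies elsewhere.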

Divick

Asked: 2017-02-13 19:55:30 +0800 CST

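Left outer join with ORDER BY on a foreign key returns duplicates with pagination

I have two tables, api_user and api_userprofile, where api_userprofile has a foreign key to the user table. The schemas of the two tables are listed below.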

                                 Table "public.api_user"
   Column    |           Type           |                       Modifiers                       
--------------+--------------------------+-------------------------------------------------------
 id           | integer                  | not null default nextval('api_user_id_seq'::regclass)
 is_admin     | boolean                  | not null
 is_agent     | boolean                  | not null
 is_guide     | boolean                  | not null
Indexes:
    "api_user_pkey" PRIMARY KEY, btree (id)
Referenced by:
    TABLE "api_userprofile" CONSTRAINT "api_userprofile_user_id_5a1c1c92_fk_api_user_id" FOREIGN KEY (user_id) REFERENCES api_user(id) DEFERRABLE INITIALLY DEFERRED


                                         Table "public.api_userprofile"
         Column         |          Type           |                          Modifiers                           
------------------------+-------------------------+--------------------------------------------------------------
 id                     | integer                 | not null default nextval('api_userprofile_id_seq'::regclass)
 percent_complete       | numeric(3,0)            | not null
 display_name           | character varying(128)  | not null
 city                   | character varying(64)   | not null
 user_id                | integer                 | not null
Indexes:
    "api_userprofile_pkey" PRIMARY KEY, btree (id)
    "api_userprofile_user_id_key" UNIQUE CONSTRAINT, btree (user_id)
Foreign-key constraints:
    "api_userprofile_user_id_5a1c1c92_fk_api_user_id" FOREIGN KEY (user_id) REFERENCES api_user(id) DEFERRABLE INITIALLY DEFERRED

When I run the following query:

select 
    api_user.id, 
    api_userprofile.display_name, 
    api_userprofile.city
FROM "api_user" 
LEFT OUTER JOIN "api_userprofile" ON ("api_user"."id" = "api_userprofile"."user_id") 
WHERE ((("api_user"."is_admin" = false 
    AND "api_userprofile"."percent_complete" >= 60.0 
    AND "api_userprofile"."id" IS NOT NULL)) 
    AND "api_user"."is_guide" = true)
ORDER BY "api_userprofile"."city" ASC LIMIT 20;

It returns:

 id  |       display_name        |           city           
-----+---------------------------+--------------------------
 299 | Mohsin Khan               | Agra
  93 | Rizwan Mohd               | Agra
 126 | Abdhesh Sharma            | Agra
  39 | Rashid Ahmed              | Agra
 244 | Nishkam Sharma            | Ajmer
  42 | Parminder Mahla           | Amritsar
 131 | Prashant Hullatti         | Ballry
 241 | Pankaj Anand              | Bangalore
  89 | Niraj K. Singh            | Bodhgaya, Nalanda, Patna
 204 | Ravi Rocks                | Bokaro
  19 | Ian Lotriet               | Cape Town
  15 | Ivy Almacin               | Cape Town
  38 | Dr Brahm Prakaah Tripathi | Delhi
 130 | Virendra Singh            | Delhi
 271 | Satish Jain               | Delhi
 110 | Vikas Agarwal             | Delhi
 114 | Devi Singh Rathore        | Delhi
  58 | Dilip Singh Chanpawat     | Delhi
  95 | Anam Kumar Dhasmana       | Delhi
  51 | Gopal Sharma              | Delhi

Running the query again with an offset of 20 returns:

 id  |       display_name        |    city    
-----+---------------------------+------------
  95 | Anam Kumar Dhasmana       | Delhi
 114 | Devi Singh Rathore        | Delhi
 252 | Tarun Pratap Singh        | Delhi
 258 | Rajesh Kumar Pal          | Delhi
 255 | Chandan Singh Shekhawat   | Delhi
 268 | Amit Kumar                | Delhi
 100 | Ketan Mehra               | Delhi
 286 | Vikash Poonia             | Delhi
  61 | Belinda Schempers         | Durban
  67 | Pieter Janse Van Rensburg | Hoedspruit
 140 | Dr Hari Krishna Somanchi  | Hyderabad
 197 | Sstya Prabha              | Hyderabad
 118 | Dalpat Jodha              | Jaipur
 253 | Yash Shekhawat            | Jaipur
 120 | Govind Sharma             | Jaipur
 257 | Abhimanyu Singh           | Jaipur
  99 | Ghanshyam Singh           | Jaisalmer
 308 | Nitin Lobo                | Jodhpur
 124 | Rajendra Singh            | Jodhpur
  55 | Umed Gehlot               | Jodhpur

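As the output shows, some rows are returned both by the first query and by the next query with offset 20 (see the user with ID 114).

Using DISTINCT seems to work fine, but why does the query return duplicates when ordering by a field on the related table (userprofile)?

Apparently, if I order by user.id, it also seems to work correctly and returns no duplicates.

The relationship between user and userprofile here is one-to-one: no user.id is referenced more than once in userprofile.user_id (enforced by the framework, Django).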
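
One thing worth checking, which is not part of the original post: ORDER BY city alone does not define a total order. Many rows share the same city (Delhi spans the page boundary in the output above), and Postgres may return tied rows in a different order for each query, so the same row can land on two pages. A minimal sketch, assuming a unique column such as api_user.id is acceptable as a tie-breaker:

```sql
SELECT
    api_user.id,
    api_userprofile.display_name,
    api_userprofile.city
FROM "api_user"
LEFT OUTER JOIN "api_userprofile"
        ON ("api_user"."id" = "api_userprofile"."user_id")
WHERE "api_user"."is_admin" = false
  AND "api_user"."is_guide" = true
  AND "api_userprofile"."percent_complete" >= 60.0
  AND "api_userprofile"."id" IS NOT NULL
ORDER BY "api_userprofile"."city" ASC,
         "api_user"."id" ASC   -- unique tie-breaker: makes the ordering total
LIMIT 20 OFFSET 20;
```

With the tie-breaker in place, every page is cut from the same fully determined sequence, so the pages should not overlap as long as the underlying data does not change between requests.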
