Question

Hassan Baig

Asked: 2019-01-31 12:26:32 +0800 CST

Optimizing a SELECT query that consistently shows up in the slow log

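In an app called "Links", users post links and photos of interesting content they have recently discovered (and others comment on those posts).

The comments posted under photos are saved in a table called links_photocomment in my PostgreSQL 9.6.5 database.

One SELECT query on the links_photocomment table consistently shows up in the slow_log. It takes more than 500 ms and is about 10x slower than most other PostgreSQL operations I run into.

Here is an example of the corresponding SQL from my slow log:

LOG:  duration: 5071.112 ms  statement: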

SELECT "links_photocomment"."abuse",
       "links_photocomment"."text",
       "links_photocomment"."id",
       "links_photocomment"."submitted_by_id",
       "links_photocomment"."submitted_on",
       "auth_user"."username",
       "links_userprofile"."score"
FROM   "links_photocomment"
       INNER JOIN "auth_user"
               ON ( "links_photocomment"."submitted_by_id" = "auth_user"."id" )
       LEFT OUTER JOIN "links_userprofile"
                    ON ( "auth_user"."id" = "links_userprofile"."user_id" )
WHERE  "links_photocomment"."which_photo_id" = 3115087
ORDER  BY "links_photocomment"."id" DESC
LIMIT  25;

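See the EXPLAIN ANALYZE results: https://explain.depesz.com/s/UuCk

According to that, the query ends up filtering out 19,100,179 rows!

What I've tried:

My hunch is that Postgres is basing this query plan on misleading statistics, so I ran VACUUM ANALYZE on the table in question. However, that didn't change anything.

As an accidental DBA of sorts, I'm looking for some quick expert guidance on the subject. Thanks in advance, and apologies for the noob question (if it is one).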

Addendum:

Here is the full output of \d links_photocomment:

                                      Table "public.links_photocomment"
     Column      |           Type           |                            Modifiers                            
-----------------+--------------------------+-----------------------------------------------------------------
 id              | integer                  | not null default nextval('links_photocomment_id_seq'::regclass)
 which_photo_id  | integer                  | not null
 text            | text                     | not null
 device          | character varying(10)    | not null
 submitted_by_id | integer                  | not null
 submitted_on    | timestamp with time zone | not null
 image_comment   | character varying(100)   | not null
 has_image       | boolean                  | not null
 abuse           | boolean                  | default false
Indexes:
    "links_photocomment_pkey" PRIMARY KEY, btree (id)
    "links_photocomment_submitted_by_id" btree (submitted_by_id)
    "links_photocomment_which_photo_id" btree (which_photo_id)
Foreign-key constraints:
    "links_photocomment_submitted_by_id_fkey" FOREIGN KEY (submitted_by_id) REFERENCES auth_user(id) DEFERRABLE INITIALLY DEFERRED
    "links_photocomment_which_photo_id_fkey" FOREIGN KEY (which_photo_id) REFERENCES links_photo(id) DEFERRABLE INITIALLY DEFERRED
Referenced by:
    TABLE "links_photo" CONSTRAINT "latest_comment_id_refs_id_f2566197" FOREIGN KEY (latest_comment_id) REFERENCES links_photocomment(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_report" CONSTRAINT "links_report_which_photocomment_id_fkey" FOREIGN KEY (which_photocomment_id) REFERENCES links_photocomment(id) DEFERRABLE INITIALLY DEFERRED
    TABLE "links_photo" CONSTRAINT "second_latest_comment_id_refs_id_f2566197" FOREIGN KEY (second_latest_comment_id) REFERENCES links_photocomment(id) DEFERRABLE INITIALLY DEFERRED

1 Answer

ypercubeᵀᴹ · Answer 1 · 2019-01-31T14:06:24+08:00

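The plan doesn't use the index on (which_photo_id) but the PK index on (id) instead, so it has to read a large part of that index (all of it, if fewer than 25 rows match the filter). In this particular execution that takes about 4.4 seconds, and it finds the 25 rows only after reading and rejecting 19M rows: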

-> Index Scan Backward using links_photocomment_pkey on links_photocomment  
   (cost=0.57..2,819,246.22 rows=7,195 width=41)
   (actual time=555.830..4,929.154 rows=25 loops=1)

   Filter: (which_photo_id = 3115087)
   Rows Removed by Filter: 19100179
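A rough worked check of why the planner found this plan attractive, assuming PostgreSQL's usual LIMIT costing (a Limit node is charged the fraction of its input's run cost that matches the share of rows it expects to fetch) and the planner's implicit assumption that the 7,195 estimated matching rows are spread evenly over id. The share of this scan's cost it would expect to pay for LIMIT 25 is:

```latex
\text{cost}_{\text{LIMIT 25}} \approx 0.57 + (2{,}819{,}246.22 - 0.57)\cdot\frac{25}{7{,}195} \approx 9{,}796
```

In other words, the planner expected to stop after about 25/7,195 ≈ 0.35% of the backward index scan. In reality it examined 19,100,179 + 25 = 19,100,204 rows, because this photo's newest comments sit far below the top of the id range rather than being spread evenly across it.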

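I would try these:

Replace the index on (which_photo_id) with an index on (which_photo_id, id).
Rewrite the INNER join as a LEFT join (the FOREIGN KEY constraint on submitted_by_id, which is also NOT NULL, ensures that both queries produce the same results).
Rewrite with a subquery (a derived table or a CTE) and move the WHERE filter inside it, so that the 25 ids are found first (hopefully with an index-only scan), and only then join the other 2 tables.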
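A minimal sketch of the first suggestion, run against a hypothetical temporary table (photocomment_demo, filled with made-up data) so it can be tried without touching production; the index names are made up as well. With the equality column first and the sort column second, PostgreSQL can walk the index backward inside a single which_photo_id value and stop after 25 entries. The composite index still serves lookups on which_photo_id alone, which is why the old single-column index can be dropped.

```sql
-- Hypothetical scratch table with only the columns the inner query needs.
CREATE TEMP TABLE photocomment_demo (
    id             integer PRIMARY KEY,
    which_photo_id integer NOT NULL
);

-- Made-up data: 200,000 comments spread round-robin over 1,000 photos.
INSERT INTO photocomment_demo (id, which_photo_id)
SELECT g, g % 1000
FROM   generate_series(1, 200000) AS g;

-- The existing single-column index, as in the \d output above.
CREATE INDEX photocomment_demo_which_photo_id
    ON photocomment_demo (which_photo_id);

-- Step 1: composite index, equality column first, ORDER BY column second.
CREATE INDEX photocomment_demo_which_photo_id_id
    ON photocomment_demo (which_photo_id, id);

-- Step 2: drop the now-redundant single-column index; the composite index
-- covers any lookup on which_photo_id alone, since it is the leading column.
DROP INDEX photocomment_demo_which_photo_id;

ANALYZE photocomment_demo;

-- Inspect the plan for the inner query of the rewrite.
EXPLAIN
SELECT id
FROM   photocomment_demo
WHERE  which_photo_id = 7
ORDER  BY id DESC
LIMIT  25;

-- On the real table, the same change without blocking writes
-- (hypothetical index name; CONCURRENTLY cannot run inside a transaction block):
-- CREATE INDEX CONCURRENTLY links_photocomment_which_photo_id_id
--     ON links_photocomment (which_photo_id, id);
-- DROP INDEX CONCURRENTLY links_photocomment_which_photo_id;
```

The plan that EXPLAIN prints depends on the data and statistics, so treat a backward scan of the composite index as the expected outcome to check for, not a guarantee.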

Query (with a derived table):

SELECT "links_photocomment"."abuse",
       "links_photocomment"."text",
       "links_photocomment"."id",
       "links_photocomment"."submitted_by_id",
       "links_photocomment"."submitted_on",
       "auth_user"."username",
       "links_userprofile"."score"
FROM   
       (
         SELECT id
         FROM links_photocomment
         WHERE which_photo_id = 3115087
         ORDER BY id DESC
         LIMIT 25
       ) AS lim
       INNER JOIN "links_photocomment"
           ON ( "links_photocomment"."id" = lim.id )
       LEFT OUTER JOIN "auth_user"
           ON ( "links_photocomment"."submitted_by_id" = "auth_user"."id" )
       LEFT OUTER JOIN "links_userprofile"
           ON ( "auth_user"."id" = "links_userprofile"."user_id" )
ORDER  BY lim.id DESC
LIMIT  25;

Query (with a CTE):

WITH lim AS
       (
         SELECT id
         FROM links_photocomment
         WHERE which_photo_id = 3115087
         ORDER BY id DESC
         LIMIT 25
       )
SELECT "links_photocomment"."abuse",
       "links_photocomment"."text",
       "links_photocomment"."id",
       "links_photocomment"."submitted_by_id",
       "links_photocomment"."submitted_on",
       "auth_user"."username",
       "links_userprofile"."score"
FROM   
       lim
       INNER JOIN "links_photocomment"
           ON ( "links_photocomment"."id" = lim.id )
       LEFT OUTER JOIN "auth_user"
           ON ( "links_photocomment"."submitted_by_id" = "auth_user"."id" )
       LEFT OUTER JOIN "links_userprofile"
           ON ( "auth_user"."id" = "links_userprofile"."user_id" )
ORDER  BY lim.id DESC
LIMIT  25;
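A minimal sketch for trying the rewrite on hypothetical temporary tables (demo_user, demo_userprofile and demo_comment are made-up stand-ins for auth_user, links_userprofile and links_photocomment, with made-up data). As with links_photocomment.submitted_by_id, the demo's submitted_by_id is NOT NULL and has a foreign key, so the LEFT join to users returns the same rows as an INNER join. In PostgreSQL 9.6 a CTE is always materialized and acts as an optimization fence, so the 25-id lookup is evaluated on its own first; from PostgreSQL 12 on, the planner may inline such a CTE unless it is written as `WITH lim AS MATERIALIZED (...)`.

```sql
-- Hypothetical scratch tables; only the columns the query touches.
CREATE TEMP TABLE demo_user (
    id       integer PRIMARY KEY,
    username text    NOT NULL
);
CREATE TEMP TABLE demo_userprofile (
    user_id integer PRIMARY KEY REFERENCES demo_user (id),
    score   integer NOT NULL
);
CREATE TEMP TABLE demo_comment (
    id              integer PRIMARY KEY,
    which_photo_id  integer NOT NULL,
    -- NOT NULL + FOREIGN KEY: every comment has exactly one matching user,
    -- so INNER and LEFT joins to demo_user return the same rows.
    submitted_by_id integer NOT NULL REFERENCES demo_user (id)
);
CREATE INDEX demo_comment_which_photo_id_id ON demo_comment (which_photo_id, id);

-- Made-up data: 50 users, profiles only for even-numbered users,
-- 10,000 comments spread over 100 photos.
INSERT INTO demo_user (id, username)
SELECT u, 'user' || u FROM generate_series(1, 50) AS u;
INSERT INTO demo_userprofile (user_id, score)
SELECT u, u * 10 FROM generate_series(2, 50, 2) AS u;
INSERT INTO demo_comment (id, which_photo_id, submitted_by_id)
SELECT g, g % 100, 1 + (g / 100) % 50 FROM generate_series(1, 10000) AS g;
ANALYZE demo_user;
ANALYZE demo_userprofile;
ANALYZE demo_comment;

-- Same shape as the CTE rewrite: find the 25 ids first, then join.
WITH lim AS (
    SELECT id
    FROM   demo_comment
    WHERE  which_photo_id = 7
    ORDER  BY id DESC
    LIMIT  25
)
SELECT c.id, u.username, p.score
FROM   lim
JOIN   demo_comment c          ON c.id = lim.id
LEFT   JOIN demo_user u        ON c.submitted_by_id = u.id
LEFT   JOIN demo_userprofile p ON u.id = p.user_id
ORDER  BY lim.id DESC;
```

Running EXPLAIN (ANALYZE, BUFFERS) on the final SELECT should show whether the lim step really reads only about 25 index entries before the joins run.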