Question

Asked: 2014-12-10 19:56:39 +0800 CST

Why is a CTE so much worse than inline subqueries?

I'm trying to better understand how the query planner works in PostgreSQL.

I have this query:

select id from users 
    where id <> 2
    and gender = (select gender from users where id = 2)
    order by latest_location::geometry <-> (select latest_location from users where id = 2) ASC
    limit 50

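It runs in under 10 ms on my database, where the users table has about 500k rows.

Then I figured that, to avoid the duplicated subselects, I could rewrite the query as a CTE, like this: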

with me as (
    select * from users where id = 2
)
select u.id, u.popularity from users u, me 
    where u.gender = me.gender
    order by  u.latest_location::geometry <-> me.latest_location::geometry ASC
    limit 50;

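However, this rewritten query takes about 1 second! Why is that? I can see in the EXPLAIN output that it doesn't use the geometry index, but can anything be done about it? Thanks!

Another way to write the query is: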

select u.id, u.popularity from users u, (select gender, latest_location from users where id = 2) as me 
    where u.gender = me.gender
    order by  u.latest_location::geometry <-> me.latest_location::geometry ASC
    limit 50;

However, this is just as slow as the CTE.

On the other hand, if I extract the values from me and insert them statically, the query is fast again:

select u.id, u.popularity from users u
    where u.gender = 'male'
    order by  u.latest_location::geometry <-> '0101000000A49DE61DA71C5A403D0AD7A370F54340'::geometry ASC
    limit 50;

EXPLAIN output for the first (fast) query

 Limit  (cost=5.69..20.11 rows=50 width=36) (actual time=0.512..8.114 rows=50 loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using users_pkey on users users_1  (cost=0.42..2.64 rows=1 width=32) (actual time=0.032..0.033 rows=1 loops=1)
           Index Cond: (id = 2)
   InitPlan 2 (returns $1)
     ->  Index Scan using users_pkey on users users_2  (cost=0.42..2.64 rows=1 width=4) (actual time=0.009..0.010 rows=1 loops=1)
           Index Cond: (id = 2)
   ->  Index Scan using users_latest_location_gix on users  (cost=0.41..70796.51 rows=245470 width=36) (actual time=0.509..8.100 rows=50 loops=1)
         Order By: (latest_location <-> $0)
         Filter: (gender = $1)
         Rows Removed by Filter: 20
 Total runtime: 8.211 ms
(12 rows)

EXPLAIN output for the second (slow) query

Limit  (cost=62419.82..62419.95 rows=50 width=76) (actual time=1024.963..1024.970 rows=50 loops=1)
   CTE me
     ->  Index Scan using users_pkey on users  (cost=0.42..2.64 rows=1 width=221) (actual time=0.037..0.038 rows=1 loops=1)
           Index Cond: (id = 2)
   ->  Sort  (cost=62417.18..63030.86 rows=245470 width=76) (actual time=1024.959..1024.963 rows=50 loops=1)
         Sort Key: ((u.latest_location <-> me.latest_location))
         Sort Method: top-N heapsort  Memory: 28kB
         ->  Hash Join  (cost=0.03..54262.85 rows=245470 width=76) (actual time=0.122..938.131 rows=288646 loops=1)
               Hash Cond: (u.gender = me.gender)
               ->  Seq Scan on users u  (cost=0.00..49353.41 rows=490941 width=48) (actual time=0.021..465.025 rows=490994 loops=1)
               ->  Hash  (cost=0.02..0.02 rows=1 width=36) (actual time=0.054..0.054 rows=1 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 1kB
                     ->  CTE Scan on me  (cost=0.00..0.02 rows=1 width=36) (actual time=0.047..0.049 rows=1 loops=1)
 Total runtime: 1025.096 ms

1 Answer

Noah Yetter · Answer 1 · 2015-03-24T15:37:38+08:00

Try this:

with me as (
    select * from users where id = 2
)
select u.id, u.popularity from users u, me 
    where u.gender = (select gender from me)
    order by  u.latest_location::geometry <-> (select latest_location from me)::geometry ASC
    limit 50;

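Looking at the fast plan, this is what jumps out at me (the $0 and $1 parameters in the Order By and Filter lines):

 Limit  (cost=5.69..20.11 rows=50 width=36) (actual time=0.512..8.114 rows=50 loops=1)
   InitPlan 1 (returns $0)
     ->  Index Scan using users_pkey on users users_1  (cost=0.42..2.64 rows=1 width=32) (actual time=0.032..0.033 rows=1 loops=1)
           Index Cond: (id = 2)
   InitPlan 2 (returns $1)
     ->  Index Scan using users_pkey on users users_2  (cost=0.42..2.64 rows=1 width=4) (actual time=0.009..0.010 rows=1 loops=1)
           Index Cond: (id = 2)
   ->  Index Scan using users_latest_location_gix on users  (cost=0.41..70796.51 rows=245470 width=36) (actual time=0.509..8.100 rows=50 loops=1)
         Order By: (latest_location <-> $0)
         Filter: (gender = $1)
         Rows Removed by Filter: 20
 Total runtime: 8.211 ms
(12 rows)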

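In the slow version, the query planner evaluates the equality operator on gender and the geometry operator on latest_location in the context of a join, where the value from me could vary with each row (even though it has correctly estimated only 1 row). In the fast version, the values of gender and latest_location are treated as scalars because they are emitted by inline subqueries, which tells the query planner it has only one value of each to deal with. This is the same reason you get the fast plan when you paste in literal values.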
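
One way to check this on your own data is to run the scalar-subquery form under EXPLAIN ANALYZE and see whether the plan goes back to an index scan on users_latest_location_gix with parameters in the Order By and Filter lines. The sketch below is a variation, not the exact query above: it assumes the same users table and columns as in the question, drops me from the FROM list (it is only read through the scalar subqueries), and puts back the id <> 2 filter from the original fast query. I have not run it against your schema, so treat the expected plan shape as a guess to verify.

```sql
-- Sketch only: table, column, and index names are taken from the question.
-- The CTE is read solely through uncorrelated scalar subqueries, so each
-- value should be computed once and handed to the main scan as a parameter.
EXPLAIN ANALYZE
WITH me AS (
    SELECT gender, latest_location FROM users WHERE id = 2
)
SELECT u.id, u.popularity
FROM users u
WHERE u.id <> 2
  AND u.gender = (SELECT gender FROM me)
ORDER BY u.latest_location::geometry <-> (SELECT latest_location FROM me)::geometry
LIMIT 50;
```

If the output still shows a Seq Scan feeding a Sort, the planner is not treating the values as scalars and the index ordering is not being used.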
