Questions tagged [execution-plan] - Page 1

Asked: 2022-09-14 14:04:19 +0800 CST

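Why does SQL Server scan all rows when updating a bit column, even by primary key, via a linked server?

I am using a simple UPDATE statement against a specific primary key on a SQL linked server, as shown below: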

UPDATE t
SET
    processed = 1,
    processed_on = GETDATE()
FROM [LINKED\SERVER].DATABASE.dbo.FileQueue t
WHERE t.FileId = '3b33eff6-fde1-4e8c-9c23-2dbd45f50222'

Both servers are SQL Server 2019. The table definition is:

CREATE TABLE dbo.FileQueue
(
    FileId UNIQUEIDENTIFIER NOT NULL,
    Processed BIT NOT NULL,
    Processed_on DATETIME NULL,
 CONSTRAINT PK_FileQueue PRIMARY KEY CLUSTERED 
 (
    FileId ASC
 )
)

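The Processed column has the bit type. The query is slow because of a full table scan.

Why does this happen? When I remove the bit column from the statement, everything works fine: a single remote row is read and updated.

The FileId column is the clustered primary key. I have a lot of tables with similar keys.

I tried the CONVERT and CAST functions; the result is the same.

For the query without the bit column, the execution plan is very good: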

UPDATE t
SET
    --processed = 1,
    -- any other columns can be added to be updated except bit
    processed_on = GETDATE()
FROM [LINKED\SERVER].DATABASE.dbo.FileQueue t
WHERE t.FileId = 'ABD4442F-8560-43B5-8B04-000000B2A626'

Geezer

Asked: 2022-08-08 07:51:04 +0800 CST

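All procs have WITH RECOMPILE, so how do I maintain a bloated plan cache?

I'm working with a star schema data warehouse built in Azure SQL Database, where the previous developer included WITH RECOMPILE in all of the procs.

I assume this is because the ETL only executes these procs twice a day, so the recompilation overhead is small.

However, is there a risk of the cache becoming bloated with all these plans? If so, what is the best way to maintain the plan cache so that it stays lean and as efficient as possible?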

Erik Darling

Asked: 2022-07-18 09:32:23 +0800 CST

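Why can SQL Server accurately track time in some multi-statement table-valued function query plans and not others?

Setup

For this demo, I'm using the 2013 version of the Stack Overflow database and SQL Server 2022 CTP2, though the behavior goes back to SQL Server 2017, which is as far back as I cared to check.

Function One

For this function, SQL Server tracks the execution time spent in the function: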

CREATE OR ALTER FUNCTION
    dbo.ScoreStats
(
    @UserId int
)
RETURNS
    @out table
    (
        TotalScore bigint
    )
WITH SCHEMABINDING
AS 
BEGIN

    INSERT
        @out
    (
        TotalScore
    )
    SELECT
        TotalScore = 
            SUM(x.Score)
    FROM 
    (
        SELECT
            Score = 
                SUM(p.Score)
        FROM dbo.Posts AS p
        WHERE p.OwnerUserId = @UserId

        UNION ALL

        SELECT
            Score = 
                SUM(c.Score)
        FROM dbo.Comments AS c
        WHERE c.UserId = @UserId    
    ) AS x;

    RETURN;

END;

Here is the query and its execution plan:

SELECT
    u.DisplayName,
    TotalScore = 
        (
            SELECT
                ss.TotalScore
            FROM dbo.ScoreStats(u.Id) AS ss
        )
FROM dbo.Users AS u
WHERE u.Reputation >= 1000000;

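You can see that time is accurately tracked both in the query plan and in the Query Time Stats properties.

Function Two

Here is the second function, where that does not happen: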

CREATE OR ALTER FUNCTION
    dbo.VoteStats()
RETURNS
    @out table
    (
        PostId int,
        UpVotes int,
        DownVotes int,
        UpMultipier AS 
             UpVotes * 2
    )
WITH SCHEMABINDING
AS 
BEGIN

    INSERT
        @out
    (
        PostId,
        UpVotes,
        DownVotes
    )
    SELECT
        v.PostId,
        UpVotes = 
            SUM
            (
                CASE v.VoteTypeId
                     WHEN 2
                     THEN 1
                     ELSE 0
                END
            ),
        DownVotes = 
            SUM
            (
                CASE v.VoteTypeId
                     WHEN 3
                     THEN 1
                     ELSE 0
                END
            )
    FROM dbo.Votes AS v
    GROUP BY 
        v.PostId;

    RETURN;

END;

Here is the query and its execution plan:

SELECT TOP (100)
     p.Id,
     vs.UpVotes,
     vs.DownVotes
FROM dbo.VoteStats() AS vs
JOIN dbo.Posts AS p
    ON vs.PostId = p.Id
WHERE vs.DownVotes > vs.UpMultipier
AND   p.CommunityOwnedDate IS NULL
AND   p.ClosedDate IS NULL
ORDER BY vs.UpVotes DESC;

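In this query, time is not accurately tracked in the graphical execution plan, but it is in the Query Time Stats properties.

Function Two at MAXDOP 1

Even when the plan is forced serial, time is not tracked accurately: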

SELECT TOP (100)
     p.Id,
     vs.UpVotes,
     vs.DownVotes
FROM dbo.VoteStats() AS vs
JOIN dbo.Posts AS p
    ON vs.PostId = p.Id
WHERE vs.DownVotes > vs.UpMultipier
AND   p.CommunityOwnedDate IS NULL
AND   p.ClosedDate IS NULL
ORDER BY vs.UpVotes DESC
OPTION(MAXDOP 1);

Question

Back to the question at hand: why is time accurately tracked in one query plan, but not in the other?

Sylvia

Asked: 2022-06-07 08:01:11 +0800 CST

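In my query plan, multiple nodes say their cost is 100%

In this query plan:

https://www.brentozar.com/pastetheplan/?id=HydtQjsO5

...multiple nodes have a cost of 100%. How can this happen? And how do you decide where to start optimizing when so many nodes show costs that must be wrong? Is there something else to use instead of the cost percentages?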

Data Dill

Asked: 2022-06-03 10:22:21 +0800 CST

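Possibly incorrect row counts read in a parallel bitmap hash match plan

I am trying to determine whether this execution plan is reporting incorrectly or whether this is an actual feature working as intended.

In my execution plan, when using MAXDOP 1, I see the entire table being scanned (https://www.brentozar.com/pastetheplan/?id=HkPFquLdc - at the very bottom of the plan, Object 11 shows the whole table of approximately 2.5 million rows being read).

However, when I let the engine pick its own plan without hints, it goes parallel (https://www.brentozar.com/pastetheplan/?id=r12p5OUdc) and does a bitmap/hash match, and the same Object 11 shows only ~534k rows read, despite doing an index scan and having a bunch of predicates on other columns that are not in the clustered index.

I would expect SQL Server to have to read every row in the table to evaluate each predicate, but maybe the PROBE IN on Object 11 in the parallel plan (you can't see this in Paste The Plan) is able to "filter out pages" because the probe is on the PK/CX.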

ahmed elbarbary

Asked: 2022-02-28 16:46:44 +0800 CST

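Why does selecting the top 5 rows from a table take so much time?

I work on SQL Server 2019 and have a problem selecting the top 5 rows: it takes too much time.

The table Z2DataCore.parts.SourcingNotMappedParts has 70 million rows.

Running the statement that selects the top 5 rows takes more than 15 minutes.

How can I make it faster?

The statement with the problem: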

SELECT top 5 GivenPartNumber_Non,vcompanyid
into #GetSupplierAndOther
FROM Z2DataCore.parts.SourcingNotMappedParts with(nolock)
Where  PriorityLevel in ('A3','A4') and vcompanyid is not null and sourcetypeid=484456
group by GivenPartNumber_Non,vcompanyid
having count(distinct sourcetypeid)=2

My estimated execution plan:

https://www.brentozar.com/pastetheplan/?id=r1EPmqFx5

Note: I tried selecting the columns above without using select into, but it is still slow.
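One detail of the statement that may be worth checking: the WHERE clause pins sourcetypeid to a single value, so count(distinct sourcetypeid) inside any group can be at most 1, and a HAVING condition of = 2 cannot be satisfied. If that reading is right, TOP 5 never collects five rows and cannot stop early. The sketch below reproduces only that logic on a made-up four-row table, using Python's built-in sqlite3 as a stand-in for SQL Server; the table name parts and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical miniature of Parts.SourcingNotMappedParts: only the columns the
# query touches are modelled, and SQLite stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE parts ("
    " GivenPartNumber_Non TEXT, VCompanyId INTEGER,"
    " SourceTypeID INTEGER, PriorityLevel TEXT)"
)
conn.executemany(
    "INSERT INTO parts VALUES (?, ?, ?, ?)",
    [
        ("ABC", 1, 484456, "A3"),
        ("ABC", 1, 484456, "A4"),
        ("ABC", 1, 999999, "A3"),  # a second source type, removed by WHERE
        ("XYZ", 2, 484456, "A4"),
    ],
)

base_query = """
SELECT GivenPartNumber_Non, VCompanyId,
       COUNT(DISTINCT SourceTypeID) AS distinct_source_types
FROM parts
WHERE PriorityLevel IN ('A3', 'A4')
  AND VCompanyId IS NOT NULL
  AND SourceTypeID = 484456
GROUP BY GivenPartNumber_Non, VCompanyId
"""

# Every group sees exactly one distinct SourceTypeID after the WHERE filter.
groups = conn.execute(base_query).fetchall()

# Adding the HAVING clause from the question leaves no groups at all.
qualifying = conn.execute(
    base_query + " HAVING COUNT(DISTINCT SourceTypeID) = 2"
).fetchall()

print(groups)
print(qualifying)
```

If the intent was to find parts that appear under two different source types, the sourcetypeid filter would need to allow both values; that is a guess about intent, not something the question states.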

Sample table script and indexes:

CREATE TABLE [Parts].[SourcingNotMappedParts](
    [SourcingNotMappedPartsID] [int] IDENTITY(1,1) NOT NULL,
    [SearchPart] [nvarchar](200) NULL,
    [GivenManufacture] [nvarchar](200) NULL,
    [CompanyId] [int] NULL,
    [SourceTypeID] [int] NULL,
    [RevisionId] [bigint] NULL,
    [ExtractionDate] [date] NULL,
    [Taxonomy] [nvarchar](250) NULL,
    [PartStatus] [nvarchar](50) NULL,
    [Datasheet] [nvarchar](2000) NULL,
    [ROHS] [nvarchar](250) NULL,
    [StockId] [int] NULL,
    [SourceUrl] [nvarchar](2000) NULL,
    [Description] [nvarchar](2000) NULL,
    [CreatedBy] [int] NULL,
    [ModifiedBy] [int] NULL,
    [CreatedDate] [datetime] NULL,
    [ModifiedDate] [datetime] NULL,
    [Comment] [nvarchar](2000) NULL,
    [Reason] [nvarchar](2000) NULL,
    [PartId] [int] NULL,
    [GroupID] [int] NULL,
    [PartStatusID] [int] NULL,
    [ManufactureStatus] [int] NULL,
    [EditStatus] [int] NULL,
    [FamilyID] [int] NULL,
    [LookupId] [int] NULL,
    [ValidationReasonId] [int] NULL,
    [MatchStatus] [nvarchar](200) NULL,
    [GivenPartNumber_Non] [nvarchar](200) NULL,
    [GivenManufacturer_Non] [nvarchar](200) NULL,
    [signatureID] [int] NULL,
    [VCompanyId] [int] NULL,
    [PriorityLevel] [nvarchar](10) NULL,
    [NotMappedCode] [int] NULL,
    [PCPartStatus] [nvarchar](50) NULL,
 CONSTRAINT [PK_Parts.SourcingNotMappedParts] PRIMARY KEY CLUSTERED 
(
    [SourcingNotMappedPartsID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO

ALTER TABLE [Parts].[SourcingNotMappedParts] ADD  CONSTRAINT [DF_SourcingNotMappedParts_CreatedDate]  DEFAULT (getdate()) FOR [CreatedDate]
GO

ALTER TABLE [Parts].[SourcingNotMappedParts] ADD  CONSTRAINT [DF_SourcingNotMappedParts_ModifiedDate]  DEFAULT (getdate()) FOR [ModifiedDate]
GO

ALTER TABLE [Parts].[SourcingNotMappedParts] ADD  CONSTRAINT [PK_Parts.SourcingNotMappedParts] PRIMARY KEY CLUSTERED 
(
    [SourcingNotMappedPartsID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_NotMapped_SourceType] ON [Parts].[SourcingNotMappedParts]
(
    [SourceTypeID] ASC,
    [CompanyId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_NotMapped_PriorityLevel] ON [Parts].[SourcingNotMappedParts]
(
    [PriorityLevel] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IX_NotMapped_NonalphaPartCompany] ON [Parts].[SourcingNotMappedParts]
(
    [GivenPartNumber_Non] ASC,
    [VCompanyId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO
CREATE NONCLUSTERED INDEX [IDX_SourcingNotMappedParts_VCompanyId] ON [Parts].[SourcingNotMappedParts]
(
    [VCompanyId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO

SEarle1986

Asked: 2022-01-28 15:11:23 +0800 CST

Bitmap creation in the execution plan causes a bad estimate on a clustered index scan

Given the following simple query on the StackOverflow2010 database:

SELECT  u.DisplayName,
        u.Reputation
FROM    Users u
        JOIN Posts p
            ON u.id = p.OwnerUserId
WHERE   u.DisplayName = 'alex' AND
        p.CreationDate >= '2010-01-01' AND
        p.CreationDate <= '2010-03-01'

I am trying to understand why creating the index

CREATE INDEX IX_CreationDate ON Posts
(
    CreationDate
)
INCLUDE (OwnerUserId)

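produces a better estimate on Posts.CreationDate.

When I run the query without the index, I get Plan 1. In this plan, SQL Server estimates 298,910 rows coming out of the CI scan on Posts, when in fact 552 rows come back; the estimate is a long way off.

After adding the index, I get Plan 2, which results in an index seek and a much more accurate estimate.

I was curious why adding the index gives a better estimate, since statistics are created when a column is used in a WHERE predicate regardless of whether it is indexed.

On further inspection, I can see that the predicates on Posts.CreationDate differ between Plan 1 and Plan 2:

Plan 1 predicate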

[StackOverflow2010].[dbo].[Posts].[CreationDate] as [p].[CreationDate]>='2010-01-01 00:00:00.000' AND [StackOverflow2010].[dbo].[Posts].[CreationDate] as [p].[CreationDate]<='2010-03-01 00:00:00.000' AND PROBE([Bitmap1002],[StackOverflow2010].[dbo].[Posts].[OwnerUserId] as [p].[OwnerUserId],N'[IN ROW]')

Plan 2 predicate

Seek Keys[1]: Start: [StackOverflow2010].[dbo].[Posts].CreationDate >= Scalar Operator('2010-01-01 00:00:00.000'), End: [StackOverflow2010].[dbo].[Posts].CreationDate <= Scalar Operator('2010-03-01 00:00:00.000')

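So I can see that Plan 2 simply uses the histogram to find the number of rows between the two dates, whereas Plan 1 has a slightly more complicated predicate involving a bitmap probe.

This (I think) explains why the estimate on the seek is more accurate, but now I am wondering what the bitmap probe is. I can see in the plan that a bitmap is created from the user ids matching the 'alex' predicate, and that is what is being probed.

I wondered: "Without the index, why is Plan 1 not the same as Plan 2, with the only difference being a CI scan instead of an index seek on CreationDate?"

I did some further testing and found that if I run the query without the index but force the plan to go serial using OPTION (MAXDOP 1), I get Plan 3, which has a much better estimate on CreationDate despite now doing a CI scan on Posts. Looking at the predicate, I can see that the probe has gone and the bitmap is no longer in the plan, which leads me to believe the bitmap has something to do with the plan going parallel.

So my question is: why is the bitmap created when the plan goes parallel, and why does it cause such a bad estimate on Posts.CreationDate?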
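As a rough picture of what a PROBE against a bitmap does: a bitmap is built from the join keys on one side of the hash join and then checked at the scan on the other side, so rows whose key cannot possibly match are dropped before they leave the scan. The Python sketch below is a toy model under stated assumptions, not SQL Server's implementation; the bitmap size, the modulo hashing, and the sample rows are all hypothetical.

```python
BITMAP_SIZE = 64  # hypothetical size, kept small so collisions are conceivable


def build_bitmap(build_keys):
    """Set one bit per join key seen on the build side (the Users rows)."""
    bitmap = 0
    for key in build_keys:
        bitmap |= 1 << (key % BITMAP_SIZE)
    return bitmap


def probe(bitmap, key):
    """True if the key may exist on the build side (false positives possible)."""
    return bool(bitmap & (1 << (key % BITMAP_SIZE)))


# Hypothetical data: ids of users named 'alex', and (OwnerUserId, CreationDate) posts.
alex_user_ids = [3, 17]
posts = [
    (3, "2010-01-15"),
    (5, "2010-02-01"),
    (17, "2010-02-20"),
    (8, "2010-01-05"),
    (3, "2011-06-01"),  # outside the date range
]

bitmap = build_bitmap(alex_user_ids)

# Rows that satisfy the date predicate alone.
in_date_range = [p for p in posts if "2010-01-01" <= p[1] <= "2010-03-01"]

# Rows that also survive the bitmap probe on OwnerUserId.
after_probe = [p for p in in_date_range if probe(bitmap, p[0])]

print(len(in_date_range), len(after_probe))
```

In this model the number of rows leaving the scan reflects both the date range and the probe, so a row count derived from the date predicate alone would be expected to overshoot; whether that is what happens in Plan 1 is an assumption, not something the plan text confirms.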

Vadim Samokhin

Asked: 2022-01-24 03:57:59 +0800 CST

PostgreSQL 10: Bitmap Heap Scan with exact heap blocks

I have the following query:

select ro.*
from courier c1
    join courier c2 on c2.real_physical_courier_1c_id = c1.real_physical_courier_1c_id
    join restaurant_order ro on ro.courier_id = c2.id
    left join jsonb_array_elements(items) jae on true
    left join jsonb_array_elements(jae->'options') ji on true
    inner join catalogue c on c.id in ((jae->'id')::int, (ji->'id')::int)
    join restaurant r on r.id = ro.restaurant_id
where c1.id = '7b35cdab-b423-472a-bde1-d6699f6cefd3' and ro.status in (70, 73)
group by ro.order_id, r.id ;

Here is the part of the query plan that takes about 95% of the time:

->  Parallel Bitmap Heap Scan on restaurant_order ro  (cost=23.87..2357.58 rows=1244 width=1257) (actual time=11.931..38.163 rows=98 loops=2)
      Recheck Cond: (status = ANY ('{70,73}'::integer[]))
      Heap Blocks: exact=28755
      ->  Bitmap Index Scan on ro__status  (cost=0.00..23.34 rows=2115 width=0) (actual time=9.168..9.168 rows=51540 loops=1)
            Index Cond: (status = ANY ('{70,73}'::integer[]))
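A note on reading the figures in this excerpt: in EXPLAIN ANALYZE output, rows and actual time on a node are averages per loop, and on a parallel node loops counts the processes that ran the node rather than repeated passes over the table. Assuming the leader and one worker both took part (the full plan further down reports Workers Launched: 1), the totals can be recovered as in this small sketch; the helper name total_rows is hypothetical.

```python
def total_rows(rows_per_loop, loops):
    """EXPLAIN ANALYZE prints rows as a per-loop average; scale it back up."""
    return rows_per_loop * loops


# Figures copied from the Parallel Bitmap Heap Scan line in the excerpt.
heap_scan_total = total_rows(98, 2)
print(heap_scan_total)  # 196
```

That figure is close to the loops=195 reported for the courier_pkey index scan in the full plan, which is driven once per row coming out of the heap scan; the small gap is consistent with the per-loop average being rounded.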

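I have a few questions.

First, about the Bitmap Index Scan part. Postgres goes through 51540 ro__status index records matching Index Cond: (status = ANY ('{70,73}'::integer[])) and creates a bitmap with 28755 elements. Its keys are the physical locations of the corresponding table rows (indicated by exact in the Heap Blocks line). Is that correct?
Second, this bitmap is passed to the Bitmap Heap Scan stage. Recheck Cond is not actually performed, because the heap blocks are not lossy. Bitmap Heap Scan sorts the bitmap by the tuples' physical location to enable sequential access. It then reads the table data sequentially in two passes (loops=2) and obtains no more than 196 table rows. Is that right?
The bitmap size reflected in the line Heap Blocks: exact=28755 varies a lot over time. The difference is two orders of magnitude. Yesterday, for example, it was around 500. Why is that?
Now, why does the bitmap created during the Bitmap Index Scan stage have so many keys? There is the ro__status index, which can tell that there are only about 200 records with status 70 and 73. I cannot think of any reason preventing Postgres from keeping only those that actually satisfy the index cond. The overhead seems huge: instead of ~200 keys there are 28755!
Why does the Bitmap Heap Scan take so long? As far as I can see, there are two sequential reads (loops=2), so it should take far less time, shouldn't it? Or is the bitmap being sorted by the tuples' physical location the culprit?
Should I be worried about the poor estimates? If so, increasing default_statistics_target should help, right? It is currently at the default of 100.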
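To make the two counters in the questions above easier to compare: rows on the Bitmap Index Scan counts matching index entries, while Heap Blocks: exact counts distinct heap pages for which the bitmap holds per-tuple information. The Python sketch below is a simplified model of such a page-keyed bitmap, not PostgreSQL's actual data structure; the function name build_tid_bitmap and the tuple ids are hypothetical.

```python
from collections import defaultdict


def build_tid_bitmap(tids):
    """Group tuple ids (heap block, line pointer) by heap block."""
    pages = defaultdict(set)
    for block, offset in tids:
        pages[block].add(offset)
    return pages


# Hypothetical index entries matching status IN (70, 73).
matching_tids = [(7, 1), (7, 4), (2, 3), (9, 2), (2, 8), (7, 9)]

bitmap = build_tid_bitmap(matching_tids)

index_entries = len(matching_tids)  # analogous to rows=51540 on the index scan
exact_blocks = len(bitmap)          # analogous to Heap Blocks: exact=28755
visit_order = sorted(bitmap)        # each page is visited once, in physical order

print(index_entries, exact_blocks, visit_order)
```

In this model the bitmap is keyed by page, so its size follows how the matching entries are spread across the table rather than how many rows the query finally returns.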

Just in case, here is the full plan:

"Group  (cost=51297.15..52767.65 rows=19998 width=1261) (actual time=42.555..42.555 rows=0 loops=1)"
"  Group Key: ro.order_id, r.id"
"  ->  Gather Merge  (cost=51297.15..52708.83 rows=11764 width=1261) (actual time=42.554..45.459 rows=0 loops=1)"
"        Workers Planned: 1"
"        Workers Launched: 1"
"        ->  Group  (cost=50297.14..50385.37 rows=11764 width=1261) (actual time=38.850..38.850 rows=0 loops=2)"
"              Group Key: ro.order_id, r.id"
"              ->  Sort  (cost=50297.14..50326.55 rows=11764 width=1261) (actual time=38.850..38.850 rows=0 loops=2)"
"                    Sort Key: ro.order_id, r.id"
"                    Sort Method: quicksort  Memory: 25kB"
"                    Worker 0:  Sort Method: quicksort  Memory: 25kB"
"                    ->  Nested Loop  (cost=31.84..45709.27 rows=11764 width=1261) (actual time=38.819..38.819 rows=0 loops=2)"
"                          ->  Nested Loop Left Join  (cost=27.21..5194.50 rows=5882 width=1325) (actual time=38.819..38.819 rows=0 loops=2)"
"                                ->  Nested Loop Left Join  (cost=27.20..5076.49 rows=59 width=1293) (actual time=38.818..38.818 rows=0 loops=2)"
"                                      ->  Nested Loop  (cost=27.20..5074.49 rows=1 width=1261) (actual time=38.818..38.818 rows=0 loops=2)"
"                                            ->  Hash Join  (cost=26.93..5073.59 rows=1 width=1257) (actual time=38.817..38.818 rows=0 loops=2)"
"                                                  Hash Cond: (c2.real_physical_courier_1c_id = c1.real_physical_courier_1c_id)"
"                                                  ->  Nested Loop  (cost=24.28..5068.22 rows=1038 width=1267) (actual time=11.960..38.732 rows=98 loops=2)"
"                                                        ->  Parallel Bitmap Heap Scan on restaurant_order ro  (cost=23.87..2357.58 rows=1244 width=1257) (actual time=11.931..38.163 rows=98 loops=2)"
"                                                              Recheck Cond: (status = ANY ('{70,73}'::integer[]))"
"                                                              Heap Blocks: exact=28755"
"                                                              ->  Bitmap Index Scan on ro__status  (cost=0.00..23.34 rows=2115 width=0) (actual time=9.168..9.168 rows=51540 loops=1)"
"                                                                    Index Cond: (status = ANY ('{70,73}'::integer[]))"
"                                                        ->  Index Scan using courier_pkey on courier c2  (cost=0.41..2.18 rows=1 width=26) (actual time=0.005..0.005 rows=1 loops=195)"
"                                                              Index Cond: (id = ro.courier_id)"
"                                                  ->  Hash  (cost=2.63..2.63 rows=1 width=10) (actual time=0.039..0.039 rows=1 loops=2)"
"                                                        Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                                        ->  Index Scan using courier_pkey on courier c1  (cost=0.41..2.63 rows=1 width=10) (actual time=0.034..0.034 rows=1 loops=2)"
"                                                              Index Cond: (id = '7b35cdab-b423-472a-bde1-d6699f6cefd3'::uuid)"
"                                            ->  Index Only Scan using restaurant_pkey on restaurant r  (cost=0.27..0.89 rows=1 width=4) (never executed)"
"                                                  Index Cond: (id = ro.restaurant_id)"
"                                                  Heap Fetches: 0"
"                                      ->  Function Scan on jsonb_array_elements jae  (cost=0.00..1.00 rows=100 width=32) (never executed)"
"                                ->  Function Scan on jsonb_array_elements ji  (cost=0.01..1.00 rows=100 width=32) (never executed)"
"                          ->  Bitmap Heap Scan on catalogue c  (cost=4.63..6.87 rows=2 width=4) (never executed)"
"                                Recheck Cond: ((id = ((jae.value -> 'id'::text))::integer) OR (id = ((ji.value -> 'id'::text))::integer))"
"                                ->  BitmapOr  (cost=4.63..4.63 rows=2 width=0) (never executed)"
"                                      ->  Bitmap Index Scan on catalogue_pkey  (cost=0.00..0.97 rows=1 width=0) (never executed)"
"                                            Index Cond: (id = ((jae.value -> 'id'::text))::integer)"
"                                      ->  Bitmap Index Scan on catalogue_pkey  (cost=0.00..0.97 rows=1 width=0) (never executed)"
"                                            Index Cond: (id = ((ji.value -> 'id'::text))::integer)"
"Planning Time: 1.113 ms"
"Execution Time: 45.588 ms"

mj_

Asked: 2021-11-06 12:56:25 +0800 CST

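How is Postgres actually getting some columns? Execution plan question

I have a table with a bunch of columns, among them facility_id and po_date. I am writing a complex query, and I have an index on those two columns. You can see from the planner output below that Postgres uses that index as the index condition when joining to my facility table.

My question is really about the columns listed on the Filter: line below. exclude, verified, and deleted are not in the index referenced at the top, so how is Postgres actually getting the data in those columns? I would have expected an explicit reference to a full table scan, but I don't see one anywhere. Can a full table scan hide silently inside a Filter?

The table has 50M rows, and I'm using Postgres 10. Also, this part of the query is in a CTE.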

->  Index Scan using clustered_table_facility_id_po_date_idx on clustered_table cp  (cost=0.56..1401.08 rows=31929 width=37) (actual time=0.021..5.331 rows=9572 loops=3640)
      Index Cond: (facility_id = af.id)
      Filter: ((NOT exclude) AND (verified IS NULL) AND (deleted IS NULL))
      Rows Removed by Filter: 2819
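For orientation, a plain Index Scan (as opposed to an Index Only Scan) follows each matching index entry to its row in the table, so every column of that row is available when the Filter is evaluated; rows that fail it are what Rows Removed by Filter counts. The Python sketch below is a toy model of that flow; the dictionaries standing in for the heap and the index, and the sample rows, are hypothetical.

```python
# Toy heap: tuple id -> full row.
heap = {
    0: {"facility_id": 1, "po_date": "2021-01-01", "exclude": False, "verified": None, "deleted": None},
    1: {"facility_id": 1, "po_date": "2021-01-02", "exclude": True, "verified": None, "deleted": None},
    2: {"facility_id": 2, "po_date": "2021-01-01", "exclude": False, "verified": None, "deleted": None},
    3: {"facility_id": 1, "po_date": "2021-01-03", "exclude": False, "verified": None, "deleted": "2021-02-01"},
}

# Toy index on facility_id: key -> tuple ids (po_date ordering is omitted here).
index = {}
for tid, row in heap.items():
    index.setdefault(row["facility_id"], []).append(tid)


def index_scan(facility_id):
    """Return rows passing the filter and the count removed by it."""
    kept, removed = [], 0
    for tid in index.get(facility_id, []):  # Index Cond: only matching entries
        row = heap[tid]                     # heap fetch: whole row now available
        if (not row["exclude"]) and row["verified"] is None and row["deleted"] is None:
            kept.append(row)                # Filter passed
        else:
            removed += 1                    # counted as Rows Removed by Filter
    return kept, removed


kept, removed = index_scan(1)
print(len(kept), removed)
```

Only the rows reachable through the index condition are fetched in this model; the row for facility_id 2 is never touched, which is the difference from a full table scan.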

Tuukka Mustonen

Asked: 2021-10-13 00:39:29 +0800 CST

Query planner shows the same row count for MCV and non-MCV values with a partial expression index on a JSONB field

I have the following table in PostgreSQL 13.3:

CREATE TABLE node (
  id serial NOT NULL,
  meta jsonb NOT NULL,
  ...
);

And an index:

CREATE INDEX node_meta_group_owner_uuid ON node USING BTREE ((meta ->> 'group_owner_uuid'));

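(UPDATE: this index was actually partial, with the clause ... WHERE meta ->> 'group_owner_uuid' IS NOT NULL. That is what led to the resolution, as described in the comments and the answer.)

The table has ~20M rows. I have run VACUUM ANALYZE.

Row count: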

db=> SELECT * FROM pg_class WHERE relname LIKE 'node_meta_group_owner_uuid';

-[ RECORD 1 ]-------+---------------------------
oid                 | 19449180
relname             | node_meta_group_owner_uuid
...
relpages            | 29815
reltuples           | 2.0835164e+07
...

Index statistics:

db=> SELECT * FROM pg_stats WHERE tablename = 'node_meta_group_owner_uuid';

schemaname             | public
tablename              | node_meta_group_owner_uuid
attname                | expr
inherited              | f
null_frac              | 0
avg_width              | 40
n_distinct             | 812466
most_common_vals       | {48d11628-bfe9-4512-97e0-b308b7b5ac76,6a6b937f-c17c-49cb-a55a-e5346fe4ecfe,949b6f2c-2aae-42e0-a237-58cac017c6a0,f1792b9d-78a1-4811-a2e6-61532b689d07,...}
most_common_freqs      | {0.00024385618,0.00020321348,0.00013547565,0.00012192809,...}
histogram_bounds       | {00000c34-0cfa-443c-bbfd-75a7df972dde,028ca2c2-6bea-4fdd-a19c-c8b1976e96be,0518044d-41bf-40da-9bc6-763b0883d65b,07b677e7-747e-438e-a6fb-2c4af7c1c435,...}
correlation            | -0.00047400856
most_common_elems      | 
most_common_elem_freqs | 
elem_count_histogram   |

Now, given an entry (in WHERE) that does not exist in the MCV list:

db=> EXPLAIN SELECT COUNT(*) FROM node WHERE meta ->> 'group_owner_uuid' = 'a';
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=405494.13..405494.14 rows=1 width=8)
   ->  Index Scan using node_meta_group_owner_uuid on node t1  (cost=0.56..405233.93 rows=104081 width=0)
         Index Cond: ((meta ->> 'group_owner_uuid'::text) = 'a'::text)
(3 rows)

The index scan is estimated to find 104081 rows.

With an entry that does exist in the MCV list:

db=> EXPLAIN SELECT COUNT(*) FROM node WHERE meta ->> 'group_owner_uuid' = '48d11628-bfe9-4512-97e0-b308b7b5ac76';
                                                         QUERY PLAN                                                          
-----------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=405494.13..405494.14 rows=1 width=8)
   ->  Index Scan using node_meta_group_owner_uuid on node t1  (cost=0.56..405233.93 rows=104081 width=0)
         Index Cond: ((meta ->> 'group_owner_uuid'::text) = '48d11628-bfe9-4512-97e0-b308b7b5ac76'::text)
(3 rows)

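The row estimate is the same. I would have expected the estimate for this value to be different (0.00024385618 * 2.0816244e+07 = ~5076, to be precise), since there is clearly an entry for it in the MCV list.

Why does querying with an MCV value not return a different row estimate? Is this related to JSONB?

I do understand that PostgreSQL does not collect statistics for JSONB columns, but expression indexes do.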
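The expected figure above can be checked with a few lines of arithmetic. The sketch also compares the 104081 shown in both plans with 0.5% of the same row count; 0.005 is PostgreSQL's default equality selectivity (DEFAULT_EQ_SEL), which the planner uses when it finds no usable statistics for an expression. That the planner fell back to the default here is an inference from the numbers lining up, not something the plans state. The constant names are hypothetical.

```python
TABLE_RELTUPLES = 2.0816244e+07  # row count used in the question's arithmetic
MCV_FREQ = 0.00024385618         # first entry of most_common_freqs
DEFAULT_EQ_SEL = 0.005           # PostgreSQL's fallback equality selectivity

# Estimate if the MCV frequency were applied to the predicate.
mcv_estimate = round(MCV_FREQ * TABLE_RELTUPLES)

# Estimate if no statistics were matched and the default applied.
default_estimate = round(DEFAULT_EQ_SEL * TABLE_RELTUPLES)

print(mcv_estimate)      # 5076
print(default_estimate)  # 104081
```

The second result equals the rows=104081 printed for both the MCV and the non-MCV value, which fits the note above that the index was partial while the queries shown did not include its IS NOT NULL clause.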
