
Question

Matthew

Asked: 2016-12-14 16:44:33 +0800 CST

Is MERGE with OUTPUT better practice than a conditional INSERT and SELECT?


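We often encounter the "if not exists, insert" situation. Dan Guzman's blog has an excellent investigation of how to make this process thread-safe.

I have a basic table that simply maps a string to an integer taken from a SEQUENCE. In a stored procedure I need to either get the integer key for the value if it exists, or INSERT it and then get the resulting value. There is a uniqueness constraint on the dbo.NameLookup.ItemName column, so data integrity is not at risk, but I don't want to encounter the exceptions.

It is not an IDENTITY, so I can't use SCOPE_IDENTITY, and the value could be NULL in certain cases.

In my situation I only have to deal with INSERT safety on the table, so I am trying to decide whether it is better practice to use MERGE like this: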

SET NOCOUNT, XACT_ABORT ON;

DECLARE @vValueId INT 
DECLARE @inserted AS TABLE (Id INT NOT NULL)

MERGE 
    dbo.NameLookup WITH (HOLDLOCK) AS f 
USING 
    (SELECT @vName AS val WHERE @vName IS NOT NULL AND LEN(@vName) > 0) AS new_item
        ON f.ItemName= new_item.val
WHEN MATCHED THEN
    UPDATE SET @vValueId = f.Id
WHEN NOT MATCHED BY TARGET THEN
    INSERT
      (ItemName)
    VALUES
      (@vName)
OUTPUT inserted.Id AS Id INTO @inserted;
SELECT @vValueId = s.Id FROM @inserted AS s

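I could do this without MERGE, using just a conditional INSERT followed by a SELECT. I think this second approach is clearer to the reader, but I'm not convinced it is "better" practice: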

SET NOCOUNT, XACT_ABORT ON;

INSERT INTO 
    dbo.NameLookup (ItemName)
SELECT
    @vName
WHERE
    NOT EXISTS (SELECT * FROM dbo.NameLookup AS t WHERE @vName IS NOT NULL AND LEN(@vName) > 0 AND t.ItemName = @vName)

DECLARE @vValueId int;
SELECT @vValueId = i.Id FROM dbo.NameLookup AS i WHERE i.ItemName = @vName

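Or perhaps there is another, better way that I haven't considered.

I did search and reference other questions. This one: https://stackoverflow.com/questions/5288283/sql-server-insert-if-not-exists-best-practice is the most appropriate I could find, but it doesn't seem very applicable to my use case. Other questions use the IF NOT EXISTS() THEN approach, which I don't think is acceptable.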

2 Answers

Solomon Rutzky · Answer 1 · 2016-12-16T22:32:09+08:00

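Because you are using a Sequence, you can use the same NEXT VALUE FOR function (which you already have in a Default Constraint on the Id Primary Key field) to generate a new Id value ahead of time. Generating the value first means that you don't need to worry about not having SCOPE_IDENTITY, which in turn means that you need neither the OUTPUT clause nor an additional SELECT to get the new value; you will have the value before you do the INSERT, and you don't even need to mess with SET IDENTITY_INSERT ON / OFF :-)

So that takes care of part of the overall situation. The other part is handling the concurrency issue of two processes, at the exact same time, not finding an existing row for the exact same string, and both proceeding with the INSERT. The concern is about avoiding the Unique Constraint violation that would occur.

One way to handle these types of concurrency issues is to force this particular operation to be single-threaded. The way to do that is by using application locks (which work across sessions). While effective, they can be a bit heavy-handed for a situation like this, where the frequency of collisions is probably fairly low.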
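
A minimal sketch of the application-lock option, assuming the dbo.NameLookup table and dbo.MagicNumber sequence from the test setup in this answer. The resource name N'NameLookup_GetOrInsert' and the 5-second timeout are arbitrary values chosen for illustration; nothing in the question specifies them. sp_getapplock returns 0 or 1 when the lock is granted and a negative number otherwise.

```sql
DECLARE @SomeName NVARCHAR(50) = N'test1',
        @ID INT,
        @LockResult INT;

BEGIN TRAN;

-- Serialize all callers on one named resource (the name is hypothetical).
EXEC @LockResult = sys.sp_getapplock
       @Resource    = N'NameLookup_GetOrInsert',
       @LockMode    = 'Exclusive',
       @LockOwner   = 'Transaction',
       @LockTimeout = 5000; -- milliseconds

IF (@LockResult < 0)
BEGIN
  ROLLBACK TRAN;
  RAISERROR(N'Could not acquire the application lock (result %d).', 16, 1, @LockResult);
  RETURN;
END;

-- Only one session at a time gets past this point.
SELECT @ID = nl.[Id]
FROM   dbo.NameLookup nl
WHERE  nl.[ItemName] = @SomeName;

IF (@ID IS NULL)
BEGIN
  SET @ID = NEXT VALUE FOR dbo.MagicNumber;

  INSERT INTO dbo.NameLookup ([Id], [ItemName])
  VALUES (@ID, @SomeName);
END;

COMMIT TRAN; -- a transaction-owned application lock is released here

SELECT @ID AS [Id];
```

Every caller waits on the same lock, even when the names being looked up are unrelated, which is the heavy-handedness mentioned above.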

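The other way to deal with the collisions is to accept that they will sometimes occur and handle them rather than try to avoid them. Using the TRY...CATCH construct, you can effectively trap a specific error (in this case: "unique constraint violation", Msg 2601) and re-execute the SELECT to get the Id value, since we know that it now exists due to being in the CATCH block with that particular error. (Msg 2601 is raised for a unique index, as in the setup below; a UNIQUE or PRIMARY KEY constraint raises Msg 2627 instead.) Other errors can be handled in the typical RAISERROR / RETURN or THROW manner.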

Test Setup: Sequence, Table, and Unique Index

USE [tempdb];

CREATE SEQUENCE dbo.MagicNumber
  AS INT
  START WITH 1
  INCREMENT BY 1;

CREATE TABLE dbo.NameLookup
(
  [Id] INT NOT NULL
         CONSTRAINT [PK_NameLookup] PRIMARY KEY CLUSTERED
        CONSTRAINT [DF_NameLookup_Id] DEFAULT (NEXT VALUE FOR dbo.MagicNumber),
  [ItemName] NVARCHAR(50) NOT NULL         
);

CREATE UNIQUE NONCLUSTERED INDEX [UIX_NameLookup_ItemName]
  ON dbo.NameLookup ([ItemName]);
GO

Test Setup: Stored Procedure

CREATE PROCEDURE dbo.GetOrInsertName
(
  @SomeName NVARCHAR(50),
  @ID INT OUTPUT,
  @TestRaceCondition BIT = 0
)
AS
SET NOCOUNT ON;

BEGIN TRY
  SELECT @ID = nl.[Id]
  FROM   dbo.NameLookup nl
  WHERE  nl.[ItemName] = @SomeName
  AND    @TestRaceCondition = 0;

  IF (@ID IS NULL)
  BEGIN
    SET @ID = NEXT VALUE FOR dbo.MagicNumber;

    INSERT INTO dbo.NameLookup ([Id], [ItemName])
    VALUES (@ID, @SomeName);
  END;
END TRY
BEGIN CATCH
  IF (ERROR_NUMBER() = 2601) -- "Cannot insert duplicate key row in object"
  BEGIN
    SELECT @ID = nl.[Id]
    FROM   dbo.NameLookup nl
    WHERE  nl.[ItemName] = @SomeName;
  END;
  ELSE
  BEGIN
    ;THROW; -- SQL Server 2012 or newer
    /*
    DECLARE @ErrorNumber INT = ERROR_NUMBER(),
            @ErrorMessage NVARCHAR(4000) = ERROR_MESSAGE();

    RAISERROR(N'Msg %d: %s', 16, 1, @ErrorNumber, @ErrorMessage);
    RETURN;
    */
  END;

END CATCH;
GO

The Test

DECLARE @ItemID INT;
EXEC dbo.GetOrInsertName
  @SomeName = N'test1',
  @ID = @ItemID OUTPUT;
SELECT @ItemID AS [ItemID];
GO

DECLARE @ItemID INT;
EXEC dbo.GetOrInsertName
  @SomeName = N'test1',
  @ID = @ItemID OUTPUT,
  @TestRaceCondition = 1;
SELECT @ItemID AS [ItemID];
GO

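Question from the O.P.

Why is this better than the MERGE? Won't I get the same functionality without the TRY by using the WHERE NOT EXISTS clause?

MERGE has various "issues" (several references are linked in @SqlZim's answer, so there is no need to duplicate that information here). Also, there is no additional locking in this approach (less contention), so it should be better for concurrency. In this approach a Unique Constraint violation never surfaces to the caller (it is caught and handled), all without any HOLDLOCK, etc. It is pretty much guaranteed to work.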

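The reasoning behind this approach is:

If you have enough executions of this procedure that you need to worry about collisions, then you don't want to:
1. take any more steps than are necessary
2. hold locks on any resources for longer than necessary
Since collisions can only happen upon new entries (new entries submitted at the exact same time), the frequency of falling into the CATCH block in the first place will be pretty low. It makes more sense to optimize the code that will run 99% of the time instead of the code that will run 1% of the time (unless there is no cost to optimizing both, but that is not the case here).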

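Comment from @SqlZim's answer (emphasis added)

I personally prefer to try and tailor a solution to avoid doing that when possible. In this case, I don't feel that using the locks from serializable is a heavy handed approach, and I would be confident it would handle high concurrency well.

I would agree with this first sentence if it were amended to state "and when prudent". Just because something is technically possible doesn't mean that the situation (i.e. the intended use case) would benefit from it.

The issue I see with this approach is that it locks more than what is being suggested. It is important to re-read the quoted documentation on "serializable", specifically the following (emphasis added):

Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.

Now, here is the comment in the example code: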

SELECT [Id]
FROM   dbo.NameLookup WITH (SERIALIZABLE) /* hold that key range for @vName */

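The operative word there is "range". The lock being taken is not just on the value in @vName, but more accurately on a range starting at the location where this new value should go (i.e. between the existing key values on either side of where the new value fits), and not on the value itself. This means that other processes will be blocked from inserting new values, depending on the value(s) currently being looked up. If the lookup is being done at the top of the range, then inserting anything that could occupy that same position will be blocked. For example, if values "a", "b", and "d" exist, and one process is doing the SELECT on "f", then it will not be possible to insert values "g" or even "e" (since either of those would come immediately after "d"). But inserting a value of "c" will be possible, since it wouldn't be placed in the "reserved" range.

The following example should illustrate this behavior:

(In query tab (i.e. Session) #1)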

INSERT INTO dbo.NameLookup ([ItemName]) VALUES (N'test5');

BEGIN TRAN;

SELECT [Id]
FROM   dbo.NameLookup WITH (SERIALIZABLE) /* hold that key range for @vName */
WHERE  ItemName = N'test8';

--ROLLBACK;

(In query tab (i.e. Session) #2)

EXEC dbo.NameLookup_getset_byName @vName = N'test4';
-- works just fine

EXEC dbo.NameLookup_getset_byName @vName = N'test9';
-- hangs until you either hit "cancel" in this query tab,
-- OR issue a COMMIT or ROLLBACK in query tab #1

EXEC dbo.NameLookup_getset_byName @vName = N'test7';
-- hangs until you either hit "cancel" in this query tab,
-- OR issue a COMMIT or ROLLBACK in query tab #1

EXEC dbo.NameLookup_getset_byName @vName = N's';
-- works just fine

EXEC dbo.NameLookup_getset_byName @vName = N'u';
-- hangs until you either hit "cancel" in this query tab,
-- OR issue a COMMIT or ROLLBACK in query tab #1

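Likewise, if value "C" exists, and value "A" is being selected (and hence locked), then you can insert a value of "D", but not a value of "B":

(In query tab (i.e. Session) #1)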

INSERT INTO dbo.NameLookup ([ItemName]) VALUES (N'testC');

BEGIN TRAN

SELECT [Id]
FROM   dbo.NameLookup WITH (SERIALIZABLE) /* hold that key range for @vName */
WHERE  ItemName = N'testA';

--ROLLBACK;

(In query tab (i.e. Session) #2)

EXEC dbo.NameLookup_getset_byName @vName = N'testD';
-- works just fine

EXEC dbo.NameLookup_getset_byName @vName = N'testB';
-- hangs until you either hit "cancel" in this query tab,
-- OR issue a COMMIT or ROLLBACK in query tab #1
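
To see what is actually being held, the lock manager can be queried while the transaction in query tab #1 is still open. This is a rough sketch, assuming the tempdb test setup above and VIEW SERVER STATE permission. The expectation is a KEY lock in a range mode (such as RangeS-S) held by session #1 and, while a call in tab #2 is hanging, a waiting request from that session.

```sql
-- Run in any query tab while the transaction in query tab #1 is still open.
SELECT tl.request_session_id,
       tl.resource_type,
       tl.resource_description,
       tl.request_mode,    -- range modes start with "Range", e.g. RangeS-S
       tl.request_status   -- GRANT for the holder, WAIT for a blocked session
FROM   sys.dm_tran_locks tl
WHERE  tl.resource_database_id = DB_ID(N'tempdb')
AND    tl.resource_type = N'KEY'
ORDER BY tl.request_session_id;
```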

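In fairness, in my suggested approach, when there is an exception, there will be 4 entries in the Transaction Log that would not occur in this "serializable transaction" approach. BUT, as I said above, if the exception happens 1% (or even 5%) of the time, that has far less impact than the far more likely case of the initial SELECT temporarily blocking INSERT operations.

Another, albeit minor, issue with this "serializable transaction + OUTPUT clause" approach is that the OUTPUT clause (in its present usage) sends the data back as a result set. A result set requires more overhead (probably on both sides: in SQL Server to manage the internal cursor, and in the app layer to manage the DataReader object) than a simple OUTPUT parameter. Given that we are only dealing with a single scalar value, and that the assumption is a high frequency of executions, that extra overhead of the result set probably adds up.

While the OUTPUT clause could be used in such a way as to return an OUTPUT parameter, that would require additional steps to create a temporary table or table variable, and then to select the value out of that temp table / table variable into the OUTPUT parameter.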
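
The extra steps look roughly like this. It is only a sketch of the OUTPUT ... INTO mechanics, assuming the dbo.NameLookup table from the test setup (with its sequence default on Id); the @NewIds table variable is a name introduced here for illustration, and the local @vValueId stands in for a procedure's OUTPUT parameter. The INSERT is unconditional, so it fails if the name already exists.

```sql
DECLARE @vName NVARCHAR(50) = N'Invader',
        @vValueId INT; -- stands in for the procedure's OUTPUT parameter

-- Extra step 1: a table variable to receive the OUTPUT rows.
DECLARE @NewIds TABLE ([Id] INT NOT NULL);

INSERT INTO dbo.NameLookup ([ItemName])
OUTPUT inserted.[Id] INTO @NewIds ([Id])
VALUES (@vName);

-- Extra step 2: copy the single captured value into the scalar.
SELECT @vValueId = ni.[Id]
FROM   @NewIds ni;

SELECT @vValueId AS [Id];
```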

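Further Clarification: Response to @SqlZim's response (updated answer) to my response to @SqlZim's response (in the original answer) to my statement regarding concurrency and performance ;-)

Sorry if this part is a wee bit long, but at this point we are just down to the nuances of the two approaches.

I believe the way the information is presented could lead to false assumptions about the amount of locking one could expect to encounter when using serializable in the scenario as presented in the original question.

Yes, I will admit that I am biased, though to be fair:

It is impossible for a human to not be biased, at least to some small degree, and I do try to keep it to a minimum.
The example given was simplistic, but that was for illustrative purposes, to convey the behavior without over-complicating it. Implying excessive frequency was not intended, though I do understand that I also didn't explicitly state otherwise, and it could be read as implying a larger problem than actually exists. I will try to clarify that below.
I did also include an example of locking a range between two existing keys (the second set of "Query tab 1" and "Query tab 2" blocks).
I did find (and volunteer) the "hidden cost" of my approach, that being the four extra Tran Log entries each time the INSERT fails due to a Unique Constraint violation. I have not seen that mentioned in any of the other answers / posts.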

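Regarding @gbn's "JFDI" approach, Michael J. Swart's "Ugly Pragmatism For The Win" post, Aaron Bertrand's comment on Michael's post (regarding his tests showing which scenarios have decreased performance), and your comment on your "adaptation of Michael J. Swart's adaptation of @gbn's Try Catch JFDI procedure" stating:

If you are inserting new values more often than selecting existing values, this may be more performant than @srutzky's version. Otherwise I would prefer @srutzky's version over this one.

With respect to that gbn / Michael / Aaron discussion related to the "JFDI" approach, it would be incorrect to equate my suggestion to gbn's "JFDI" approach. Due to the nature of the "Get or Insert" operation, there is an explicit need to do the SELECT to get the ID value for existing records. This SELECT acts as the IF EXISTS check, which makes this approach closer to the "CheckTryCatch" variation of Aaron's tests. Michael's re-written code (and your final adaptation of Michael's adaptation) also includes a WHERE NOT EXISTS to do that same check first. Hence, my suggestion (along with Michael's final code and your adaptation of his final code) won't actually hit the CATCH block all that often. It can only happen when two sessions, given the same non-existent ItemName, both run the INSERT...SELECT at the exact same moment, such that both sessions receive a "true" for the WHERE NOT EXISTS and thus both attempt the INSERT. That very specific scenario happens much less often than either selecting an existing ItemName or inserting a new ItemName when no other process is attempting to do so at the exact same moment.

With all of the above in mind: why do I prefer my approach?

First, let's look at what locking takes place in the "serializable" approach. As mentioned above, the "range" that gets locked depends on the existing key values on either side of where the new key value would fit. The beginning or ending of the range could also be the beginning or ending of the index, respectively, if there is no existing key value in that direction. Assume we have the following index and keys (^ represents the beginning of the index while $ represents the end of it):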

Range #:    |--- 1 ---|--- 2 ---|--- 3 ---|--- 4 ---|
Key Value:  ^         C         F         J         $

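If session 55 attempts to insert a key value of:

A, then range #1 (from ^ to C) is locked: session 56 cannot insert a value of B, even if unique and valid (yet). But session 56 can insert values of D, G, and M.
D, then range #2 (from C to F) is locked: session 56 cannot insert a value of E (yet). But session 56 can insert values of A, G, and M.
M, then range #4 (from J to $) is locked: session 56 cannot insert a value of X (yet). But session 56 can insert values of A, D, and G.

As more key values are added, the ranges between key values become narrower, which reduces the probability / frequency of multiple values being inserted at the same time fighting over the same range. Admittedly, this is not a major problem, and fortunately it appears to be a problem that actually decreases over time.

The issue with my approach was described above: it only happens when two sessions attempt to insert the same key value at the same time. In this respect it comes down to which has the higher probability of happening: two different, yet close, key values being attempted at the same time, or the same key value being attempted at the same time? I suppose the answer lies in the structure of the app doing the inserts, but generally speaking I would assume it to be more likely that two different values that just happen to share the same range are being inserted. But the only way to really know would be to test both on the O.P.'s system.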

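Next, let's consider two scenarios and how each approach handles them:

All requests being for unique key values:

In this case, the CATCH block in my suggestion is never entered, hence no "issue" (i.e. the 4 tran log entries and the time it takes to write them). But, in the "serializable" approach, even with all inserts being unique, there will always be some potential for blocking other inserts in the same range (albeit not for very long).

High frequency of requests for the same key value at the same time:

In this case (a very low degree of uniqueness in terms of incoming requests for non-existent key values) the CATCH block in my suggestion will be entered regularly. The effect of this is that each failed insert will need to auto-rollback and write the 4 entries to the Transaction Log, which is a slight performance hit each time. But the overall operation should never fail (at least not due to this).

BUT, in the earlier version of the "serializable" approach, the operation would deadlock. Why? Because the serializable behavior only prevents INSERT operations in the range that has been read and hence locked; it doesn't prevent SELECT operations on that range. Two sessions could therefore both read (and lock) the same range and then each block the other's INSERT. (An updlock hint was added to the "updated" approach to address this, and it no longer gets deadlocks.)

The serializable approach, in this case, would seem to have no additional overhead, and might perform slightly better than what I am suggesting.

As with many / most discussions regarding performance, because so many factors can affect the outcome, the only way to really get a sense of how something will perform is to try it out in the target environment where it will run. At that point it won't be a matter of opinion :).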

SqlZim · Answer 2 · 2016-12-17T11:10:50+08:00

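Updated answer

Response to @srutzky

Another, albeit minor, issue with this "serializable transaction + OUTPUT clause" approach is that the OUTPUT clause (in its present usage) sends the data back as a result set. A result set requires more overhead (probably on both sides: in SQL Server to manage the internal cursor, and in the app layer to manage the DataReader object) than a simple OUTPUT parameter. Given that we are only dealing with a single scalar value, and that the assumption is a high frequency of executions, that extra overhead of the result set probably adds up.

I agree, and for those same reasons I do use output parameters when prudent. It was my mistake not to use an output parameter in my initial answer; I was being lazy.

Here is a revised procedure using an output parameter, additional optimizations, and the next value for approach that @srutzky explains in his answer: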

create procedure dbo.NameLookup_getset_byName (@vName nvarchar(50), @vValueId int output) as
begin
  set nocount on;
  set xact_abort on;
  set @vValueId = null;
  if nullif(@vName,'') is null                                 
    return;                                        /* if @vName is empty, return early */
  select  @vValueId = Id                                              /* go get the Id */
    from  dbo.NameLookup
    where ItemName = @vName;
  if @vValueId is not null                                 /* if we got the id, return */
    return;
  begin try;                                  /* if it is not there, then get the lock */
    begin tran;
      select  @vValueId = Id
        from  dbo.NameLookup with (updlock, serializable) /* hold key range for @vName */
        where ItemName = @vName;
      if @@rowcount = 0                    /* if we still do not have an Id for @vName */
      begin;                                         /* get a new Id and insert @vName */
        set @vValueId = next value for dbo.IdSequence;      /* get next sequence value */
        insert into dbo.NameLookup (ItemName, Id)
          values (@vName, @vValueId);
      end;
    commit tran;
  end try
  begin catch;
    if @@trancount > 0 
      rollback transaction; /* undo the open transaction, if any */
    throw;                  /* always re-raise so the error is not swallowed */
  end catch;
end;

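Update note: including updlock with the select will grab the proper locks in this scenario. Thanks to @srutzky, who pointed out that using only serializable on the select could cause deadlocks.

Note: this might not be the case, but if it is possible that the procedure will be called with a value for @vValueId, include set @vValueId = null; after set xact_abort on;, otherwise it can be removed.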
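
For completeness, a call to the revised procedure might look like the sketch below. It assumes the procedure above has been created and that the dbo.IdSequence sequence it references exists (the test setup in the other answer names its sequence dbo.MagicNumber, so one of the two would need adjusting). The variable name @Id and the value N'Invader' are just for illustration.

```sql
declare @Id int;

exec dbo.NameLookup_getset_byName
    @vName    = N'Invader'
  , @vValueId = @Id output;

select @Id as Id;   /* later calls with N'Invader' return the same Id */
```

The caller reads a single scalar from the output parameter, so there is no result set to consume.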

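Concerning @srutzky's examples of key range locking behavior:

@srutzky only uses one value in his table, and locks the "next"/"infinity" key for his tests to illustrate key range locking. While his tests illustrate what happens in those situations, I believe the way the information is presented could lead to false assumptions about the amount of locking one could expect to encounter when using serializable in the scenario as presented in the original question.

Even though I perceive a bias (perhaps falsely) in the way he presents his explanation and examples of key range locking, they are still correct.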

After more research, I found a particularly pertinent blog article from 2011 by Michael J. Swart: Mythbusting: Concurrent Update/Insert Solutions. In it, he tests multiple methods for accuracy and concurrency. Method 4: Increased Isolation + Fine Tuning Locks is based on Sam Saffron's post Insert or Update Pattern For SQL Server, and is the only method in the original test to meet his expectations (joined later by merge with (holdlock)).

In February of 2016, Michael J. Swart posted Ugly Pragmatism For The Win. In that post, he covers some additional tuning he made to his Saffron upsert procedures to reduce locking (which I included in the procedure above).

After making those changes, Michael wasn't happy that his procedure was starting to look more complicated and consulted with a colleague named Chris. Chris read all of the original Mythbusting post, read all the comments, and asked about @gbn's TRY CATCH JFDI pattern. This pattern is similar to @srutzky's answer, and is the solution that Michael ended up using in that instance.

Michael J Swart:

Yesterday I had my mind changed about the best way to do concurrency. I describe several methods in Mythbusting: Concurrent Update/Insert Solutions. My preferred method is to increase the isolation level and fine tune locks.

At least that was my preference. I recently changed my approach to use a method that gbn suggested in the comments. He describes his method as the “TRY CATCH JFDI pattern”. Normally I avoid solutions like that. There’s a rule of thumb that says developers should not rely on catching errors or exceptions for control flow. But I broke that rule of thumb yesterday.

By the way, I love the gbn’s description for the pattern “JFDI”. It reminds me of Shia Labeouf’s motivational video.

In my opinion, both solutions are viable. While I still prefer to increase the isolation level and fine tune locks, @srutzky's answer is also valid and may or may not be more performant in your specific situation.

Perhaps in the future I too will arrive at the same conclusion that Michael J. Swart did, but I'm just not there yet.

It isn't my preference, but here is what my adaptation of Michael J. Swart's adaptation of @gbn's Try Catch JFDI procedure would look like:

create procedure dbo.NameLookup_JFDI (
    @vName nvarchar(50)
  , @vValueId int output
  ) as
begin
  set nocount on;
  set xact_abort on;
  set @vValueId = null;
  if nullif(@vName,'') is null                                 
    return;                     /* if @vName is empty, return early */
  begin try                                                 /* JFDI */
    insert into dbo.NameLookup (ItemName)
      select @vName
      where not exists (
        select 1
          from dbo.NameLookup
          where ItemName = @vName);
  end try
  begin catch        /* ignore duplicate key errors, throw the rest */
    if error_number() not in (2601, 2627) throw;
  end catch
  select  @vValueId = Id                              /* get the Id */
    from  dbo.NameLookup
    where ItemName = @vName;
end;

If you are inserting new values more often than selecting existing values, this may be more performant than @srutzky's version. Otherwise I would prefer @srutzky's version over this one.

Aaron Bertrand's comments on Michael J. Swart's post link to relevant testing he has done, and led to this exchange. Excerpt from the comment section on Ugly Pragmatism For The Win:

Sometimes, though, JFDI leads to worse performance overall, depending on what % of calls fail. Raising exceptions has substantial overhead. I showed this in a couple of posts:

http://sqlperformance.com/2012/08/t-sql-queries/error-handling

https://www.mssqltips.com/sqlservertip/2632/checking-for-potential-constraint-violations-before-entering-sql-server-try-and-catch-logic/

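Comment by Aaron Bertrand, February 11, 2016 @ 11:49 am

And the reply:

You're right Aaron, and we did test it.

It turns out that in our case, the percent of calls that failed was 0 (when rounded to the nearest percent).

I think you illustrate the point that, as much as possible, you should evaluate things on a case-by-case basis rather than follow rules of thumb.

It's also why we added the not-strictly-necessary WHERE NOT EXISTS clause.

Comment by Michael J. Swart, February 11, 2016 @ 11:57 am

New links: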

Original answer

I still prefer the Sam Saffron upsert approach vs using merge, especially when dealing with a single row.

I would adapt that upsert method to this situation like this:

declare @vName nvarchar(50) = 'Invader';
declare @vValueId int       = null;

if nullif(@vName,'') is not null /* this gets your where condition taken care of before we start doing anything */
begin                            /* begin/end so the whole transaction is skipped for an empty @vName */
  begin tran;
    select @vValueId = Id
      from dbo.NameLookup with (serializable) 
      where ItemName = @vName;
    if @@rowcount > 0 
      begin;
        select @vValueId as id;
      end;
    else
      begin;
        insert into dbo.NameLookup (ItemName)
          output inserted.id
            values (@vName);
      end;
  commit tran;
end;

I would be consistent with your naming, and as serializable is the same as holdlock, pick one and be consistent in its use. I tend to use serializable because it is the same name used as when specifying set transaction isolation level serializable.

By using serializable or holdlock, a range lock is taken based on the value of @vName, which makes any other operations wait if they are selecting or inserting values into dbo.NameLookup that include the value in the where clause.

For the range lock to work properly, there needs to be an index on the ItemName column; this applies when using merge as well.

Here is what the procedure would look like mostly following Erland Sommarskog's whitepapers for error handling, using throw. If throw isn't how you are raising your errors, change it to be consistent with the rest of your procedures:

create procedure dbo.NameLookup_getset_byName (@vName nvarchar(50) ) as
begin
  set nocount on;
  set xact_abort on;
  declare @vValueId int;
  if nullif(@vName,'') is null /* if @vName is null or empty, select Id as null */
    begin
      select Id = cast(null as int);
    end 
    else                       /* else go get the Id */
    begin try;
      begin tran;
        select @vValueId = Id
          from dbo.NameLookup with (serializable) /* hold key range for @vName */
          where ItemName = @vName;
        if @@rowcount > 0      /* if we have an Id for @vName select @vValueId */
          begin;
            select @vValueId as Id; 
          end;
          else                     /* else insert @vName and output the new Id */
          begin;
            insert into dbo.NameLookup (ItemName)
              output inserted.Id
                values (@vName);
            end;
      commit tran;
    end try
    begin catch;
      if @@trancount > 0 
        rollback transaction; /* undo the open transaction, if any */
      throw;                  /* always re-raise so the error is not swallowed */
    end catch;
  end;
go

To summarize what is going on in the procedure above: set nocount on; set xact_abort on; like you always do, then if our input variable is null or empty, select id = cast(null as int) as the result. If it isn't null or empty, then get the Id for our variable while holding that spot in case it isn't there. If the Id is there, send it out. If it isn't there, insert it and send out that new Id.

Meanwhile, other calls to this procedure trying to find the Id for the same value will wait until the first transaction is done and then select & return it. Other calls to this procedure or other statements looking for other values will continue on because this one isn't in the way.

While I agree with @srutzky that you can handle collisions and swallow the exceptions for this sort of issue, I personally prefer to try and tailor a solution to avoid doing that when possible. In this case, I don't feel that using the locks from serializable is a heavy handed approach, and I would be confident it would handle high concurrency well.

Quote from sql server documentation on the table hints serializable / holdlock:

SERIALIZABLE

Is equivalent to HOLDLOCK. Makes shared locks more restrictive by holding them until a transaction is completed, instead of releasing the shared lock as soon as the required table or data page is no longer needed, whether the transaction has been completed or not. The scan is performed with the same semantics as a transaction running at the SERIALIZABLE isolation level. For more information about isolation levels, see SET TRANSACTION ISOLATION LEVEL (Transact-SQL).

Quote from sql server documentation on transaction isolation level serializable

SERIALIZABLE Specifies the following:

Statements cannot read data that has been modified but not yet committed by other transactions.

No other transactions can modify data that has been read by the current transaction until the current transaction completes.

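Other transactions cannot insert new rows with key values that would fall in the range of keys read by any statements in the current transaction until the current transaction completes.

Links related to the solution above:

MERGE has a spotty history, and it seems to take more poking around to make sure that the code is behaving how you want it to under all of that syntax. Relevant merge articles: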

One last link, Kendra Little did a rough comparison of merge vs insert with left join, with the caveat where she says "I didn’t do thorough load testing on this", but it is still a good read.
