SQL Server index rebuid online and blocking scenario

A couple of months ago, I experienced a problem about index rebuild online operation on SQL Server. In short, the operation was supposed to be online and to never block concurrent queries. But in fact, it was not the case (or to be more precise, it was partially the case) and to make the scenario more complex, we experienced different behaviors regarding the context. Let’s start the story with the initial context: in my company, we usually go through continuous deployment including SQL modification scripts and because we usually rely on daily pipeline, we must ensure related SQL operations are not too disruptive to avoid impacting the user experience.

Sometimes, we must introduce new indexes to deployment scripts and according to how disruptive the script can be, a discussion between Devs and Ops is initiated, and it results either to manage manually by the Ops team or to deploy it automatically through the automatic deployment pipeline by Devs.

Non-disruptive operations can be achieved in many ways and ONLINE capabilities of SQL Server may be part of the solution and this is what I suggested with one of our scripts. Let’s illustrate this context with the following example. I created a table named dbo.t1 with a bunch of rows:

USE [test];

SET NOCOUNT ON;

DROP TABLE IF EXISTS dbo.t1;
GO

CREATE TABLE dbo.t1 (
    id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    col1 VARCHAR(50) NULL
);
GO

INSERT INTO dbo.t1 (col1) VALUES (REPLICATE('T', 50));
GO …
EXEC sp_spaceused 'dbo.t1'
--name  rows    reserved    data    index_size  unused
--t1    5226496                 1058000 KB  696872 KB   342888 KB   18240 KB

Go ahead and let’ set the context with a pattern of scripts deployment we went through during this specific deployment. Let’s precise this script is over simplified, but I keep the script voluntary simple to focus only on the most important part. You will notice the script includes two steps with operations on the same table including updating / fixing values in col2 first and then rebuilding index on col1.

/* Code before */

-- Update some values in the col1 colum
UPDATE [dbo].[t1]
SET col1 = REPLICATE('B', 50)

-- Then create an index on col1 column
CREATE INDEX [col1]
ON [dbo].[t1] (col1) WITH (ONLINE = ON);
GO

At the initial stage, the creation of index was by default (OFFLINE). Having discussed this point with the DEV team, we decided to create the index ONLINE in this context. The choice between OFFLINE / ONLINE operation is often not trivial and should be evaluated carefully but to keep simple, let’s say it was the right way to go in our context. Generally speaking, online operations are slower, but the tradeoff was acceptable in order to minimize blocking issues during this deployment. At least, this is what I thought …

In my demo, without any concurrent workload against the dbo.t1 table, creating the index offline took 6s compared to the online method with 12s. So, an expected result here …

Let’s run this another query in another session:

SELECT id, col1
FROM dbo.t1
WHERE id BETWEEN 1 AND 2

In a normal situation, this query should be blocked in a short time corresponding to the duration of the update operation. But once the update is done, blocking situation should disappear even during the index rebuild operation that is performed ONLINE.

But now let’s add Flyway to the context. Flyway is an open source tool we are using for automatic deployment of SQL objects. The deployment script was executed from it in ACC environment and we noticed longer blocked concurrent accesses this time. This goes against what we would ideally like. Digging through this issue with the DEV team, we also noticed the following message when running the deployment script:

Warning: Online index operation on table ‘dbo.t1 will proceed but concurrent access to the table may be limited due to residual lock on the table from a previous operation in the same transaction.

This is something I didn’t noticed from SQL Server Management Studio when I tested the same deployment script. So, what happened here?

Referring to the Flyway documentation, it is mentioned that Flyway always wraps the execution of an entire migration within a single transaction by default and it was exactly the root cause of the issue.

Let’s try with some experimentations:

Test 1: Update + rebuilding index online in implicit transaction mode (one transaction per query).

-- Update some values in the col1 colum
UPDATE [dbo].[t1]
SET col1 = REPLICATE('B', 50)

-- Then create an index on col1 column
CREATE INDEX [col1]
ON [dbo].[t1] (col1) WITH (ONLINE = ON);
GO
-- In another session
SELECT id, col1
FROM dbo.t1
WHERE id BETWEEN 1 AND 2

Test 2: Update + rebuilding index online within one single explicit transaction

BEGIN TRAN;

-- Update some values in the col1 colum
UPDATE [dbo].[t1]
SET col1 = REPLICATE('B', 50)

-- Then create an index on col1 column
CREATE INDEX [col1]
ON [dbo].[t1] (col1) WITH (ONLINE = ON);
GO
COMMIT TRAN;
-- In another session
SELECT id, col1
FROM dbo.t1
WHERE id BETWEEN 1 AND 2

After running these two scripts, we can notice the blocking duration of SELECT query is longer in test2 as shown in the picture below:

166 - 1 - blocked process

In the test 1, the duration of the blocking session corresponds to that for updating operation (first step of the script). However, in the test 2, we must include the time for creating the index but let’s precise the index is not the blocking operation at all, but it increases the residual locking put by the previous update operation. In short, this is exactly what the warning message is telling us. I think you can imagine easily which impact such situation may implies if the index creation takes a long time. You may get exactly the opposite of what you really expected.

Obviously, this is not a recommended situation and creating an index should be run in very narrow and constrained transaction.But from my experience, things are never always obvious and regarding your context, you should keep an eye of how transactions are managed especially when it comes automatic deployment stuff that could be quickly out of the scope of the DBA / Ops team. Strong collaboration with DEV team is recommended to anticipate this kind of issue.

See you !!

Laisser un commentaire