<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David Barbarin &#187; performance</title>
	<atom:link href="https://blog.developpez.com/mikedavem/ptag/performance/feed" rel="self" type="application/rss+xml" />
	<link>https://blog.developpez.com/mikedavem</link>
	<description>MVP DataPlatform - MCM SQL Server</description>
	<lastBuildDate>Thu, 09 Sep 2021 21:19:50 +0000</lastBuildDate>
	<language>fr-FR</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.42</generator>
	<item>
		<title>Azure monitor as observability platform for Azure SQL Databases and more</title>
		<link>https://blog.developpez.com/mikedavem/p13205/sql-azure/azure-monitor-as-observability-platform-for-azure-sql-databases</link>
		<comments>https://blog.developpez.com/mikedavem/p13205/sql-azure/azure-monitor-as-observability-platform-for-azure-sql-databases#comments</comments>
		<pubDate>Mon, 08 Feb 2021 16:57:26 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[DevOps]]></category>
		<category><![CDATA[SQL Azure]]></category>
		<category><![CDATA[Azure Monitor]]></category>
		<category><![CDATA[Azure SQL Analytics]]></category>
		<category><![CDATA[Azure SQL Database]]></category>
		<category><![CDATA[devops]]></category>
		<category><![CDATA[Log Analytics]]></category>
		<category><![CDATA[observability]]></category>
		<category><![CDATA[performance]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1762</guid>
		<description><![CDATA[In a previous blog post, I wrote about reasons we moved our monitoring of on-prem SQL Server instances on Prometheus and Grafana. But what about Cloud and database services? We have different options and obviously in my company we thought &#8230; <a href="https://blog.developpez.com/mikedavem/p13205/sql-azure/azure-monitor-as-observability-platform-for-azure-sql-databases">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In a previous <a href="https://blog.developpez.com/mikedavem/p13203/sql-server-2014/why-we-moved-sql-server-monitoring-on-prometheus-and-grafana" rel="noopener" target="_blank">blog post</a>, I wrote about reasons we moved our monitoring of on-prem SQL Server instances on Prometheus and Grafana. But what about Cloud and database services? </p>
<p><span id="more-1762"></span></p>
<p>We have different options and obviously in my company we first thought of moving our Azure SQL Database workload telemetry to our on-prem central monitoring infrastructure as well. But the main blocker is the serverless compute tier: the Telegraf server agent would keep a connection open, which could prevent the database from auto-pausing, or at least it would make monitoring more complex because it would assume a predictable workload at all times. </p>
<p>The second option was to rely on Azure Monitor, which is a common platform combining several logging, monitoring and dashboard solutions across a wide set of Azure resources. It is a scalable, fully managed platform that provides a powerful query language and native features like alerts when logs or metrics match specific conditions. Another important point is that there is no vendor lock-in with this solution, as we can always fall back to our self-hosted Prometheus and Grafana instances if the compute tier no longer fits or if Azure Monitor is no longer an option! </p>
<p>Firstly, to achieve good observability with Azure SQL Database, we need to put both diagnostic telemetry and SQL Server audit events in a common Log Analytics workspace. A quick illustration below: </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/02/173-0-Azure-SQL-DB-Monitor-architecture.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/02/173-0-Azure-SQL-DB-Monitor-architecture-1024x387.jpg" alt="173 - 0 - Azure SQL DB Monitor architecture" width="584" height="221" class="alignnone size-large wp-image-1763" /></a></p>
<p>Diagnostic settings are configured per database and include basic metrics (CPU, IO, memory, etc.) as well as different SQL Server internal metrics such as deadlocks, blocked processes, and Query Store information about query execution statistics and waits. For more details please refer to the Microsoft <a href="https://docs.microsoft.com/en-us/azure/azure-sql/database/metrics-diagnostic-telemetry-logging-streaming-export-configure?tabs=azure-portal" rel="noopener" target="_blank">BOL</a>.</p>
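<p>As an illustration, once the diagnostic telemetry is flowing, a quick KQL query against the Log Analytics workspace can surface deadlock events per hour. This is only a sketch: the category name and the server placeholder below are assumptions to adapt to your own environment:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">// Count deadlock events per hour for one logical server (names are placeholders)<br />
AzureDiagnostics<br />
| where Category == 'Deadlocks' and LogicalServerName_s == 'xxxx'<br />
| summarize count() by bin(TimeGenerated, 1h)<br />
| render timechart</div></div>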
<p>Azure SQL Database auditing can be configured at either the server level or the database level. In our context, we defined a template of events at the server level which is then applied to all databases within the logical server. By default, 3 events are automatically audited:<br />
&#8211;	BATCH_COMPLETED_GROUP<br />
&#8211;	SUCCESSFUL_DATABASE_AUTHENTICATION_GROUP<br />
&#8211;	FAILED_DATABASE_AUTHENTICATION_GROUP</p>
<p>The first one in the list is debatable depending on the environment because of its impact, but in our context it is acceptable because we are dealing with a data warehouse workload. However, we added other ones to meet our security requirements:<br />
&#8211;	PERMISSION_CHANGE_GROUP<br />
&#8211;	DATABASE_PRINCIPAL_CHANGE_GROUP<br />
&#8211;	DATABASE_ROLE_MEMBER_CHANGE_GROUP<br />
&#8211;	USER_CHANGE_PASSWORD_GROUP</p>
<p>Note that if you look at Log Analytics as a target for SQL audits, you will notice it is still a feature in preview, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/02/173-4-Audit-target.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/02/173-4-Audit-target.jpg" alt="173 - 4 - Audit target" width="484" height="153" class="alignnone size-full wp-image-1765" /></a></p>
<p>To be clear, we usually don’t consider using Azure preview features in production, especially when they remain in this state for a long time, but in this specific context we were interested in the observability capabilities of the platform. On the one hand, we get very useful performance insights through SQL Analytics dashboards (again in preview) and on the other hand we can easily query logs and traces through Log Analytics for correlation with other metrics. Obviously, we hope Microsoft moves a step further and provides this feature in GA in the near future. </p>
<p>Let’s talk briefly about SQL Analytics first. It is an advanced and free cloud monitoring solution for Azure SQL Database performance, and it relies mainly on your Azure diagnostic metrics and Azure Monitor views to present data in a structured way through performance dashboards.</p>
<p>Here is an example of the built-in dashboards we are using to track activity and high CPU / IO bound queries against our data warehouse.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/02/173-1-SQL-Analytics-general-dashboard-e1612797920282.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/02/173-1-SQL-Analytics-general-dashboard-1024x410.jpg" alt="173 - 1 - SQL Analytics general dashboard" width="584" height="234" class="alignnone size-large wp-image-1768" /></a></p>
<p>You can use drill-down capabilities to different contextual dashboards to get insights into resource-intensive queries. For example, we identified some LOG IO intensive queries against a clustered columnstore index and, after refactoring an UPDATE statement into DELETE + INSERT, we drastically reduced LOG IO waits.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/02/173-2-SQL-Analytics-IO-e1612797960660.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/02/173-2-SQL-Analytics-IO-1024x316.jpg" alt="173 - 2 - SQL Analytics IO" width="584" height="180" class="alignnone size-large wp-image-1767" /></a></p>
<p>In addition, Azure Monitor helped us in another scenario, where we tried to figure out recent workload patterns and to know if the current compute tier still fits them. As said previously, we are relying on the serverless compute tier to handle the data warehouse-oriented workload with both auto-scaling and auto-pausing capabilities. At first glance, we might expect a typical nightly workload as illustrated in the Microsoft <a href="https://docs.microsoft.com/en-us/azure/azure-sql/database/serverless-tier-overview#:~:text=Serverless%20is%20a%20compute%20tier,of%20compute%20used%20per%20second." rel="noopener" target="_blank">BOL</a>, with cost optimized for this workload:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/02/173-6-Serverless-pattern.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/02/173-6-Serverless-pattern.jpg" alt="173 - 6 - Serverless pattern" width="516" height="316" class="alignnone size-full wp-image-1769" /></a></p>
<p><em>Images from Microsoft BOL</em></p>
<p>It could have been true when the activity started on Azure, but the game has changed with new projects coming in over time. Starting with the general performance dashboard, the workload seems to follow the right pattern for the serverless compute tier, but we noticed billing kept going during unexpected timeframes, as shown below. Note that I deliberately show only a sample of two days, but this pattern is a good representation of the general workload in our context. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/02/173-3-General-performance-dashboard.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/02/173-3-General-performance-dashboard-1024x556.jpg" alt="173 - 3 - General performance dashboard" width="584" height="317" class="alignnone size-large wp-image-1771" /></a></p>
<p>Indeed, the workload should be mostly nightly-oriented with sporadic activity during the day, but a quick correlation with other basic metrics like CPU or memory percentage usage confirmed persistent activity all day. We have CPU spikes and probably small batches that keep a minimum of memory in use at other moments. </p>
<p>As per the <a href="https://docs.microsoft.com/en-us/azure/azure-sql/database/serverless-tier-overview#:~:text=Serverless%20is%20a%20compute%20tier,of%20compute%20used%20per%20second." rel="noopener" target="_blank">Microsoft documentation</a>, the minimum auto-pausing delay value is 1h and requires an inactive database (number of sessions = 0 and CPU = 0 for user workload) during this timeframe. Basic metrics didn’t provide any further insight about the connections, applications or users that could generate such &laquo;&nbsp;noisy&nbsp;&raquo; activity, so we had to go another way by looking at the SQL audit logs stored in Azure Monitor Logs. Data can be read through KQL, which stands for Kusto Query Language (and not Kibana Query Language <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":-)" class="wp-smiley" /> ). It’s the language used to query the Azure log databases (Azure Monitor Logs, Azure Monitor Application Insights and others) and it is pretty similar to SQL in its constructs. </p>
<p>Here is the first query I used to correlate the number of events that could prevent auto-pausing from kicking in for the database concerned, including RPC COMPLETED, BATCH COMPLETED, DATABASE AUTHENTICATION SUCCEEDED and DATABASE AUTHENTICATION FAILED:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">AzureDiagnostics<br />
| where Category == 'SQLSecurityAuditEvents' and (action_name_s in ('RPC COMPLETED','BATCH COMPLETED') or action_name_s contains &quot;DATABASE AUTHENTICATION&quot;) and LogicalServerName_s == 'xxxx' and database_name_s == 'xxxx'<br />
| summarize count() by bin(event_time_t, 1h),action_name_s<br />
| render columnchart</div></div>
<p>Results are aggregated and bucketized per hour on the generated event time with the bin() function. Finally, for a quick and easy read, I chose a simple, unformatted column chart render. Here is the outcome:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/02/173-7-Audit-per-hour-per-event-e1612798279257.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/02/173-7-Audit-per-hour-per-event-1024x459.jpg" alt="173 - 7 - Audit per hour per event" width="584" height="262" class="alignnone size-large wp-image-1772" /></a></p>
<p>As you probably noticed, daily activity is pretty small compared to the nightly one and seems to consist of SQL batches and remote procedure calls. Even from this unclear picture, we can confirm the daily workload is enough to keep billing going, because there is no one-hour timeframe without any activity. </p>
<p>Let’s write another KQL query to draw a clearer picture of which applications ran during the daily timeframe 07:00 – 20:00:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">let start=datetime(&quot;2021-01-26&quot;);<br />
let end=datetime(&quot;2021-01-29&quot;);<br />
let dailystart=7;<br />
let dailyend=20;<br />
let timegrain=1d;<br />
AzureDiagnostics<br />
| project action_name_s, event_time_t, application_name_s, server_principal_name_s, Category, LogicalServerName_s, database_name_s<br />
| where Category == 'SQLSecurityAuditEvents' and (action_name_s in ('RPC COMPLETED','BATCH COMPLETED') or action_name_s contains &quot;DATABASE AUTHENTICATION&quot;)<br />
| where LogicalServerName_s == 'xxxx' and database_name_s == 'xxxx' <br />
| where event_time_t &gt; start and event_time_t &lt; end<br />
| where datetime_part(&quot;Hour&quot;, event_time_t) between (dailystart .. dailyend)<br />
| summarize count() by bin(event_time_t, 1h), application_name_s<br />
| render columnchart with (xtitle = &#039;Date&#039;, ytitle = &#039;Nb events&#039;, title = &#039;Prod SQL Workload pattern&#039;)</div></div>
<p>And here the new outcome:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/02/173-8-Audit-per-hour-per-application.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/02/173-8-Audit-per-hour-per-application-1024x380.jpg" alt="173 - 8 - Audit per hour per application" width="584" height="217" class="alignnone size-large wp-image-1774" /></a></p>
<p>The new chart reveals some activity from SQL Server Management Studio, but most of it concerns applications using the .NET SQL data provider. For better clarity, we needed more information about the applications and, in my context, I managed to address the point by narrowing the search scope to the service principal name that issued the related audit event. This results in a new outcome, pretty similar to the previous one:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/02/173-9-Audit-per-hour-per-sp.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/02/173-9-Audit-per-hour-per-sp-1024x362.jpg" alt="173 - 9 - Audit per hour per sp" width="584" height="206" class="alignnone size-large wp-image-1775" /></a></p>
<p>Good job so far. For the sake of clarity, the service principal obfuscated above is used by our Reporting Services infrastructure and its reports to get data from this data warehouse. By going this way to investigate daily activity at different moments on the Azure SQL database concerned, we came to the conclusion that using the serverless compute tier no longer made sense and that we will likely need to move to another compute tier.</p>
<p><strong>Additional thoughts</strong></p>
<p>Azure Monitor is definitely a must-have if you are running resources on Azure and don’t own a platform for observability (metrics, logs and traces). Otherwise, it can even be beneficial for freeing up your on-prem monitoring infrastructure resources if scalability is a concern. Furthermore, there is no vendor lock-in, and you can decide to stream Azure Monitor data elsewhere, at the cost of additional network transfer fees depending on the target scenario. For example, Azure Monitor can be used directly as a data source with Grafana, Azure SQL telemetry can be collected with the Telegraf agent, and audit logs can be recorded in another logging system like Kibana. In this blog post, we have only scratched the surface of Azure Monitor's capabilities but, as demonstrated above, performing deep correlation analysis across different sources in very few steps is a strong point of this platform.</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Building a more robust and efficient statistic maintenance with large tables</title>
		<link>https://blog.developpez.com/mikedavem/p13201/sql-server-vnext/building-a-more-robust-and-efficient-statistic-maintenance-with-large-tables</link>
		<comments>https://blog.developpez.com/mikedavem/p13201/sql-server-vnext/building-a-more-robust-and-efficient-statistic-maintenance-with-large-tables#comments</comments>
		<pubDate>Mon, 26 Oct 2020 21:05:34 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[maintenance]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[rebuild index]]></category>
		<category><![CDATA[update statistic]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1689</guid>
		<description><![CDATA[In the past, I have gone different ways to improve update statistic maintenance in different shops according to their context, requirements and constraints, as well as the SQL Server version used at the time. All are important inputs for creating &#8230; <a href="https://blog.developpez.com/mikedavem/p13201/sql-server-vnext/building-a-more-robust-and-efficient-statistic-maintenance-with-large-tables">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In the past, I have gone different ways to improve update statistic maintenance in different shops according to their context, requirements and constraints, as well as the SQL Server version used at the time. All are important inputs for creating a good maintenance strategy, which can be as simple as executing sp_updatestats or as specialized as scripts focusing on some tables.  </p>
<p><span id="more-1689"></span></p>
<p>One of my latest experiences on this topic was probably one of the best, although we went a circuitous way to deal with a long update statistic maintenance task on a large database. We used a mix of statistic analysis and improvements provided by SQL Server 2014 SP1 CU6, including parallel update statistic capabilities. I wrote a <a href="https://blog.dbi-services.com/experiencing-updating-statistics-on-a-big-table-by-unusual-ways/" rel="noopener" target="_blank">blog post</a> if you are interested in learning more about this experience.</p>
<p>I’m now working for a new company, meaning a different context &#8230; At the time of this write-up, we are running SQL Server 2017 CU21 and database sizes are in a different order of magnitude (more than 100GB compressed) compared to my previous experience. However, switching from the default sampling method to FULLSCAN for some large tables drastically increased the update statistic task beyond the allowed maintenance window (00:00AM to 03:00AM) without any optimization. </p>
<p><strong>Why change the update statistic sampling method? </strong></p>
<p>Let’s start from the beginning: why do we need to change the default statistic sampling? This topic has already been covered in detail on the internet and, to make the story short, good statistics are part of the recipe for efficient execution plans and queries. The default sampling size used by both the auto-update mechanism and the UPDATE STATISTICS command without any specification comes from a <a href="https://docs.microsoft.com/en-us/archive/blogs/srgolla/sql-server-statistics-explained" rel="noopener" target="_blank">non-linear algorithm</a> and may not produce good histograms for large tables. Indeed, the sampling size decreases as the table gets bigger, leading to a rough picture of the values in the table which may affect cardinality estimation in execution plans &#8230; exactly the side effects we experienced with a couple of our queries and wanted to minimize in the future. Therefore, we decided to improve cardinality estimation by switching to the FULLSCAN method, only for some big tables, to produce better histograms. But this method also comes at the cost of a direct impact on consumed resources and execution time, because the engine needs to read more data to build a better picture of data distribution, sometimes with higher <a href="https://docs.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql?redirectedfrom=MSDN&amp;view=sql-server-ver15" rel="noopener" target="_blank">tempdb usage</a>. Our first attempt on the ACC environment increased the update statistic maintenance task from initially 5min with the default sampling size to 3.5 hours with the FULLSCAN method for the large tables only &#8230; obviously an unsatisfactory solution because we were outside the allowed maintenance window. </p>
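<p>As a reminder, the switch itself is a one-liner per table (the table name below is a placeholder):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Default sampling: sample size comes from the non-linear algorithm<br />
UPDATE STATISTICS dbo.BigTable;<br />
-- Full scan: reads every row to build the histogram (better estimates, higher cost)<br />
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN;</div></div>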
<p><strong>Context matters</strong></p>
<p>But first let’s set the context a little more: the term “large” can be relative according to the environment. In my context, it means tables with more than 100M rows and less than 100GB in size for the biggest ones, and 10M rows and 10GB in size for the smaller ones. Note that for partitioned tables the total size includes the archive partition’s compression. </p>
<p>Another gritty detail: the databases concerned are part of availability groups and maxdop for the primary replica was set to 1. There is a long story behind this value, with side effects encountered in the past when switching to <strong>maxdop &gt; 1 and cost threshold for parallelism = 50</strong>. At certain times of the year, the workload increases a lot and we faced memory allocation issues for some parallel queries (parallel queries usually require more memory). This is something we need to investigate further, but we switched back to maxdop = 1 for now and I would say so far so good &#8230;</p>
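<p>For reference, these two server-level settings are managed with sp_configure (the values shown are simply the ones discussed above, not a recommendation):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">EXEC sp_configure 'show advanced options', 1;<br />
RECONFIGURE;<br />
-- Current server-wide setting in our context<br />
EXEC sp_configure 'max degree of parallelism', 1;<br />
-- Value we experimented with before switching back<br />
EXEC sp_configure 'cost threshold for parallelism', 50;<br />
RECONFIGURE;</div></div>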
<p>Because our index structures are not heavily fragmented between two rebuild operations, we’re not in favor of frequent index rebuilds. Even if such an operation can be done online or resumed with SQL Server 2017 EE, it remains a very resource-intensive operation, including log block replication on the underlying Always On infrastructure. In addition, there is a strong commitment to minimizing resource overhead during the maintenance window because of the concurrent business workload in the same timeframe.  </p>
<p><strong>Options available to speed-up update statistic task</strong></p>
<p> <strong>Using MAXDOP / PERSIST_SAMPLE_PERCENT with UPDATE STATISTICS command</strong></p>
<p><a href="https://support.microsoft.com/en-us/help/4041809/kb4041809-update-adds-support-for-maxdop-for-create-statistics-and-upd" rel="noopener" target="_blank">KB4041809</a> describes new support for the MAXDOP option in the CREATE STATISTICS and UPDATE STATISTICS statements in Microsoft SQL Server 2014, 2016 and 2017. This is especially helpful to override MAXDOP settings defined at the server or database-scope level. As a reminder, the maxdop value is forced to 1 in our context on availability group primary replicas. </p>
<p>For partitioned tables we don’t go with this setting because update statistics is done at the partition level (see next section). The tables concerned own 2 partitions, respectively CURRENT and ARCHIVE. We keep the former small in size and with a relatively low number of rows (only the last 2 weeks of data). Therefore, there is no real benefit in using MAXDOP to force update statistics to run in parallel in this case.</p>
<p>But non-partitioned large tables (&gt;= 10GB) are good candidates. According to the following picture, we noticed an execution time reduction of 57% by increasing the maxdop value to 4 for some large tables with these specifications:<br />
&#8211;	~= 10GB<br />
&#8211;	~ 11M rows<br />
&#8211;	112 columns<br />
&#8211;	71 statistics</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/168-11-maxdop-nonpartitioned-tables.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/168-11-maxdop-nonpartitioned-tables.jpg" alt="168 - 11 - maxdop - nonpartitioned tables" width="481" height="289" class="alignnone size-full wp-image-1691" /></a></p>
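<p>The corresponding command is straightforward (the table name is a placeholder):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Override the instance-level maxdop (forced to 1 on our primary replicas)<br />
-- for this statistics update only (support added by KB4041809)<br />
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN, MAXDOP = 4;</div></div>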
<p>Another feature we went through is described in <a href="https://support.microsoft.com/en-us/help/4039284/kb4039284-enhancement-new-keyword-is-added-to-create-and-update-statis" rel="noopener" target="_blank">KB4039284</a> and has been available since SQL Server 2016. In our context, the maintenance of statistics relies on a custom stored procedure (not Ola's maintenance scripts yet); we have configured the default sampling rate method for all statistics and wanted to make an exception only for the targeted large tables. In the past, we had to use the <a href="https://docs.microsoft.com/en-us/sql/t-sql/statements/update-statistics-transact-sql?view=sql-server-ver15" rel="noopener" target="_blank">NO_RECOMPUTE</a> option to exclude statistics from automatic updates. The new PERSIST_SAMPLE_PERCENT option tells SQL Server to lock the sampling rate for future update operations, and we are using it for non-partitioned large tables. </p>
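<p>A minimal sketch of what this looks like (placeholder names again):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Full scan now, and keep the 100 percent sampling rate for subsequent<br />
-- automatic or unspecified manual updates of this table's statistics<br />
UPDATE STATISTICS dbo.BigTable WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;</div></div>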
<p> <strong>Incremental statistics</strong></p>
<p>SQL Server 2017 provides interesting options to reduce maintenance overhead. Surprisingly, some large tables were already partitioned but no incremental statistics were configured. Incremental statistics are especially useful for tables where only a few partitions change at a time and are a great feature to improve the efficiency of statistic maintenance, because operations are done at the partition level since SQL Server 2014. I wrote another <a href="https://blog.dbi-services.com/sql-server-2014-new-incremental-statistics/" rel="noopener" target="_blank">blog post</a> about them a couple of years ago, and here was a great opportunity to apply theoretical concepts to a practical use case. Because we had already implemented partition-level maintenance for indexes, it made sense to apply the same method to statistics, to minimize the overhead of the FULLSCAN method and to benefit from the statistic update threshold at the partition level. As said in the previous section, partitioned tables own 2 partitions, CURRENT (last 2 weeks) and ARCHIVE, and the goal was to only update statistics on the CURRENT partition on a daily basis. However, note that although statistic objects are managed at the partition level, the SQL Server optimizer is not able to use them directly (no change from SQL Server 2014 to SQL Server 2019 as far as I know) and refers instead to the global statistic object.</p>
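<p>In practice, this boils down to creating the statistic as incremental and then refreshing only the partition that changed. The column name and partition number below are hypothetical:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- The statistic must be created (or rebuilt) as incremental first<br />
CREATE STATISTICS XXXX_OID ON dbo.[BIG TABLE] (OID)<br />
WITH FULLSCAN, INCREMENTAL = ON;<br />
-- Daily maintenance: refresh only the CURRENT partition (here partition 1)<br />
UPDATE STATISTICS dbo.[BIG TABLE] (XXXX_OID)<br />
WITH RESAMPLE ON PARTITIONS (1);</div></div>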
<p>Let’s demonstrate with the following example:</p>
<p>Let&rsquo;s consider BIG TABLE with 2 partitions for CURRENT (last 2 weeks) and ARCHIVE values as shown below:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">SELECT <br />
&nbsp; &nbsp; s.object_id,<br />
&nbsp; &nbsp; s.name AS stat_name,<br />
&nbsp; &nbsp; sp.rows,<br />
&nbsp; &nbsp; sp.rows_sampled,<br />
&nbsp; &nbsp; sp.node_id,<br />
&nbsp; &nbsp; sp.left_boundary,<br />
&nbsp; &nbsp; sp.right_boundary,<br />
&nbsp; &nbsp; sp.partition_number<br />
FROM sys.stats AS s<br />
CROSS APPLY sys.dm_db_stats_properties_internal(s.object_id, s.stats_id) AS sp<br />
WHERE s.object_id = OBJECT_ID('[dbo].[BIG TABLE]')<br />
AND s.name = 'XXXX_OID'</div></div>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/168-2-Stats-Partition-e1603745274341.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/168-2-Stats-Partition-e1603745274341.jpg" alt="168 - 2 - Stats Partition" width="800" height="113" class="alignnone size-full wp-image-1692" /></a></p>
<p>The statistic object is incremental, and we get an internal picture of the per-partition statistics and the global one. You need to enable trace flag 2309 and add the node id reference to the DBCC SHOW_STATISTICS command as well. Let’s dig into the ARCHIVE partition to find a specific value within a histogram step:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">DBCC TRACEON ( 2309 );<br />
GO<br />
DBCC SHOW_STATISTICS('[dbo].[BIG TABLE]', 'XXX_OID', 7) WITH HISTOGRAM;</div></div>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/168-3-histogram-partition-1.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/168-3-histogram-partition-1.jpg" alt="168 - 3 - histogram partition 1" width="825" height="157" class="alignnone size-full wp-image-1693" /></a></p>
<p>Then, I used the value 9246258 in the WHERE clause of the following query:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">SELECT *<br />
FROM dbo.[BIG TABLE]<br />
WHERE XXXX_OID = 9246258</div></div>
<p>It gives an estimated cardinality of 37.689 rows, as shown below &#8230;</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/168-4-query.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/168-4-query.jpg" alt="168 - 4 - query" width="614" height="186" class="alignnone size-full wp-image-1694" /></a></p>
<p>&#8230; The cardinality estimation is 37.689 rows while we should expect a value of 12 rows here, referring to the statistic histogram above. Let’s now have a look at the global statistic (nodeid = 1):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">DBCC SHOW_STATISTICS('[dbo].[BIG TABLE]', 'XXX_OID', 1) WITH HISTOGRAM;</div></div>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/168-5-histogram-partition-global.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/168-5-histogram-partition-global.jpg" alt="168 - 5 - histogram partition global" width="822" height="139" class="alignnone size-full wp-image-1695" /></a></p>
<p>In fact, the query optimizer estimates rows by using the AVG_RANGE_ROWS value between 9189129 and 9473685 in the global statistic. Well, it is likely not as perfect as we may expect. Incremental statistics do help in reducing the time taken to gather stats for sure, but they may not be enough to represent the entire data distribution in the table: we are still limited to 200 steps in the global statistic object. Pragmatically, I think we may mitigate this point by saying things could be worse if we had to either use the default sampling algorithm or decrease the sample size of the update statistic operation. </p>
<p>Let’s illustrate with the BIG TABLE. To keep things simple, I have voluntarily chosen a (real) statistic where data is evenly distributed. Here are some pictures of the real data distribution:</p>
<p>The first one is a simple view of the MIN and MAX boundaries as well as the average number of occurrences (let’s say duplicate records, for a better understanding) per distinct value:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/168-6-nb_occurences_per_value.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/168-6-nb_occurences_per_value.jpg" alt="168 - 6 - nb_occurences_per_value" width="457" height="104" class="alignnone size-full wp-image-1696" /></a></p>
<p>Referring to the picture above, we may notice there is no high variation in the number of occurrences per distinct value represented by the leading XXX_OID column in the related index. The picture below shows another representation of the data distribution, where each histogram bucket contains the number of distinct values per number of occurrences. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/168-10-histogram_per_nb-occurences.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/168-10-histogram_per_nb-occurences.jpg" alt="168 - 10 - histogram_per_nb occurences" width="481" height="289" class="alignnone size-full wp-image-1697" /></a></p>
<p>For example, roughly 2.3% of the distinct values in the BIG TABLE have 29 duplicate records. The same applies for values 28, 31 and so on … In short, this histogram confirms a certain degree of homogeneity in the data distribution, and the avg_occurences value is not so far from the truth.</p>
<p>Let’s use the default sample value for UPDATE STATISTICS. A very low sample of rows is taken into account, leading to very approximate statistics, as shown below:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">SELECT <br />
&nbsp; &nbsp; rows,<br />
&nbsp; &nbsp; rows_sampled,<br />
&nbsp; &nbsp; CAST(rows_sampled * 100. / rows AS DECIMAL(5,2)) AS [sample_%],<br />
&nbsp; &nbsp; steps<br />
FROM sys.dm_db_stats_properties(OBJECT_ID('[dbo].[BIG TABLE]'), 1)</div></div>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/168-7-default_sample_value.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/168-7-default_sample_value.jpg" alt="168 - 7 - default_sample_value" width="421" height="56" class="alignnone size-full wp-image-1699" /></a></p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">SELECT *<br />
FROM sys.dm_db_stats_histogram(OBJECT_ID('[dbo].[BIG TABLE]'), 1)</div></div>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/168-8-default_sample_histogram-e1603745718861.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/168-8-default_sample_histogram-e1603745718861.jpg" alt="168 - 8 - default_sample_histogram" width="800" height="218" class="alignnone size-full wp-image-1700" /></a></p>
<p>Focusing on the average_range_rows column values, we may notice the estimation is not representative of the real distribution in the BIG TABLE. </p>
<p>After running the UPDATE STATISTICS command with the FULLSCAN method, the story has changed, and the estimation is now closer to reality:</p>
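<p>For reference, here is a sketch of such an update (the statistic name is illustrative). On recent builds (SQL Server 2016 SP1 CU4 and later), the sample rate can also be persisted so that subsequent automatic updates reuse it:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">UPDATE STATISTICS [dbo].[BIG TABLE] ([stat_XXX_OID]) WITH FULLSCAN;<br />
<br />
-- Optionally persist the sample rate for subsequent (auto) updates<br />
UPDATE STATISTICS [dbo].[BIG TABLE] ([stat_XXX_OID])<br />
WITH FULLSCAN, PERSIST_SAMPLE_PERCENT = ON;</div></div>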
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/168-9-fullscan_histogram-e1603745769635.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/168-9-fullscan_histogram-e1603745769635.jpg" alt="168 - 9 - fullscan_histogram" width="800" height="255" class="alignnone size-full wp-image-1701" /></a></p>
<p>As a side note, one additional benefit of using the FULLSCAN method is getting a representative statistic histogram in fewer steps. This is well explained in the SQL Tiger team&rsquo;s <a href="https://docs.microsoft.com/en-us/archive/blogs/sql_server_team/perfect-statistics-histogram-in-just-few-steps" rel="noopener" target="_blank">blog post</a>, and we noticed this specific behavior with some statistic histograms where the frequency is low … mainly statistics related to primary keys and unique indexes.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/168-1-statistic-histogram-before-after-e1603745894373.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/168-1-statistic-histogram-before-after-e1603745894373.jpg" alt="168 - 1 - statistic histogram before after" width="800" height="196" class="alignnone size-full wp-image-1702" /></a></p>
<p><strong>How beneficial were incremental statistics? </strong></p>
<p>The picture below refers to one of our biggest partitioned tables, with the following characteristics:<br />
&#8211;	~ 410M rows<br />
&#8211;	~ 63GB in size (including compressed partition size)<br />
&#8211;	67 columns<br />
&#8211;	30 statistics </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/168-12-maxdop-partitioned-tables.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/168-12-maxdop-partitioned-tables.jpg" alt="168 - 12 - maxdop - partitioned tables" width="738" height="289" class="alignnone size-full wp-image-1703" /></a></p>
<p>As noticed in the picture above, overriding the maxdop setting at the database-scoped level resulted in an interesting drop in execution time when the FULLSCAN method is used (from 03h30 to 17s in the best case).<br />
Similarly, combining the efforts done for both non-partitioned and partitioned large tables reduced the execution time of the update statistics task from ~ 03h30 to 15min – 30min in production, which is a better fit with our requirements. </p>
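<p>For illustration, the parallelism override can be sketched as follows (the MAXDOP value is illustrative and should be sized to your hardware):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Database-scoped override picked up by UPDATE STATISTICS<br />
ALTER DATABASE SCOPED CONFIGURATION SET MAXDOP = 8;<br />
<br />
-- Or per statement, starting with SQL Server 2017 CU3 / 2016 SP2<br />
UPDATE STATISTICS [dbo].[BIG TABLE] WITH FULLSCAN, MAXDOP = 8;</div></div>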
<p>Going through a more sophisticated process to update statistics may seem more complicated, but it is strongly required in some specific scenarios. Fortunately, SQL Server provides different features to help optimize this process. I’m looking forward to seeing the features that will be shipped with the next versions of SQL Server.</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Curious case of locking scenario with SQL Server audits</title>
		<link>https://blog.developpez.com/mikedavem/p13200/sql-server-vnext/curious-case-of-locking-scenario-including-sql-server-audits</link>
		<comments>https://blog.developpez.com/mikedavem/p13200/sql-server-vnext/curious-case-of-locking-scenario-including-sql-server-audits#comments</comments>
		<pubDate>Mon, 05 Oct 2020 19:25:47 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[blocking]]></category>
		<category><![CDATA[dbatools]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[SQL Server audit]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1673</guid>
		<description><![CDATA[In high mission-critical environments, ensuring high level of availability is a prerequisite and usually IT department addresses required SLAs (the famous 9’s) with high available architecture solutions. As stated by Wikipedia: availability measurement is subject to some degree of interpretation. &#8230; <a href="https://blog.developpez.com/mikedavem/p13200/sql-server-vnext/curious-case-of-locking-scenario-including-sql-server-audits">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In high mission-critical environments, ensuring a high level of availability is a prerequisite, and the IT department usually addresses the required SLAs (the famous 9’s) with highly available architecture solutions. As stated by <a href="https://en.wikipedia.org/wiki/High_availability" rel="noopener" target="_blank">Wikipedia</a>: <strong><em>availability measurement is subject to some degree of interpretation</em></strong>. Thus, IT departments generally focus on the uptime metric, whereas for other departments availability is often related to application response time or tied to slowness / unresponsiveness complaints. The latter is about application throughput, and database locks may contribute to reducing it. This is something we are constantly monitoring in addition to uptime in my company. </p>
<p><span id="more-1673"></span></p>
<p>A couple of weeks ago, we suddenly began to experience some unexpected blocking issues involving a specific query pattern and the SQL Server audit feature. This is all the more important as this scenario started from one specific database and created a long tree of blocked processes, with a blocked SQL Server audit operation first, which then propagated to all databases on the SQL Server instance. A very bad scenario we definitely want to avoid … Here is a sample of the blocking process tree:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-1-blocking-scenarios-e1601924652500.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-1-blocking-scenarios-e1601924652500.jpg" alt="167 - 1 - blocking scenarios" width="800" height="56" class="alignnone size-full wp-image-1674" /></a></p>
<p>First, let’s set the context:</p>
<p>We have been using SQL Server audit for different purposes since SQL Server 2014, and we are actually running SQL Server 2017 CU21 at the moment of this write-up. The obvious purpose is security regulatory compliance with login events. We also rely on SQL Server audits to extend the observability of our monitoring system (based on Prometheus and Grafana). Configuration changes are audited with specific events, and we link the concerned events with annotations in our SQL Server Grafana dashboards. Thus, we are able to quickly correlate events with behavior changes that may occur on the database side. The high-level view of the audit infrastructure is as follows:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-0-audit-architecture-e1601924728531.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-0-audit-architecture-e1601924728531.jpg" alt="167 - 0 - audit architecture" width="800" height="417" class="alignnone size-full wp-image-1675" /></a></p>
<p>As shown in the picture above, a PowerShell script carries out stopping and restarting the audit target, and then we use the archived audit file to import the related data into a dedicated database.<br />
Let’s be clear: we had used this process without any issue for a couple of years, and we were surprised to experience such behavior at this moment. Surprising enough for me to write a blog post <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /> &#8230; Digging further for the root cause, we pointed to a specific pattern that seemed to be the culprit of our specific issue:</p>
<p><strong><br />
1.	Open transaction<br />
2.	Foreach row in a file execute an UPSERT statement<br />
3.	Commit transaction<br />
</strong></p>
<p>This is a <a href="https://www.red-gate.com/simple-talk/sql/t-sql-programming/rbar-row-by-agonizing-row/" rel="noopener" target="_blank">RBAR pattern</a> and it may become slow according to the number of rows it has to deal with. In addition, the logic is encapsulated within a single transaction, leading to locks accumulating for the whole transaction duration. Thinking about it, we hadn’t faced this specific locking issue with other queries so far because they are executed within short transactions by design. </p>
<p>This point is important because enabling SQL Server audits also implies extra metadata locks. We decided to mimic this behavior on a TEST environment in order to figure out what happened exactly.</p>
<p>Here are the scripts we used for that purpose:</p>
<p><strong>TSQL script:</strong></p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Create audit<br />
USE [master]<br />
GO<br />
<br />
CREATE SERVER AUDIT [Audit-Target-Login]<br />
TO FILE <br />
( &nbsp; FILEPATH = N'/var/opt/mssql/log/'<br />
&nbsp; &nbsp; ,MAXSIZE = 0 MB<br />
&nbsp; &nbsp; ,MAX_ROLLOVER_FILES = 2147483647<br />
&nbsp; &nbsp; ,RESERVE_DISK_SPACE = OFF<br />
)<br />
WITH<br />
( &nbsp; QUEUE_DELAY = 1000<br />
&nbsp; &nbsp; ,ON_FAILURE = CONTINUE<br />
)<br />
WHERE (<br />
&nbsp; &nbsp; [server_principal_name] like '%\%' <br />
&nbsp; &nbsp; AND NOT [server_principal_name] like '%\svc%' <br />
&nbsp; &nbsp; AND NOT [server_principal_name] like 'NT SERVICE\%' <br />
&nbsp; &nbsp; AND NOT [server_principal_name] like 'NT AUTHORITY\%' <br />
&nbsp; &nbsp; AND NOT [server_principal_name] like '%XDCP%'<br />
);<br />
<br />
ALTER SERVER AUDIT [Audit-Target-Login] WITH (STATE = ON);<br />
GO<br />
<br />
CREATE SERVER AUDIT SPECIFICATION [Server-Audit-Target-Login]<br />
FOR SERVER AUDIT [Audit-Target-Login]<br />
ADD (FAILED_DATABASE_AUTHENTICATION_GROUP),<br />
ADD (SUCCESSFUL_DATABASE_AUTHENTICATION_GROUP),<br />
ADD (FAILED_LOGIN_GROUP),<br />
ADD (SUCCESSFUL_LOGIN_GROUP),<br />
ADD (LOGOUT_GROUP)<br />
WITH (STATE = ON)<br />
GO<br />
<br />
USE [DBA] <br />
GO <br />
<br />
-- Tables to simulate the scenario<br />
CREATE TABLE dbo.T ( <br />
&nbsp; &nbsp; id INT, <br />
&nbsp; &nbsp; col1 VARCHAR(50) <br />
);<br />
<br />
CREATE TABLE dbo.T2 ( <br />
&nbsp; &nbsp; id INT, <br />
&nbsp; &nbsp; col1 VARCHAR(50) <br />
); <br />
<br />
INSERT INTO dbo.T VALUES (1, REPLICATE('T',20));<br />
INSERT INTO dbo.T2 VALUES (1, REPLICATE('T',20));</div></div>
<p><strong>PowerShell scripts:</strong></p>
<p>Session 1: Simulating SQL pattern</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"># Scenario simulation &nbsp;<br />
$server ='127.0.0.1' <br />
$Database ='DBA' <br />
<br />
$Connection =New-Object System.Data.SQLClient.SQLConnection <br />
$Connection.ConnectionString = &quot;Server=$server;Initial Catalog=$Database;Integrated Security=false;User ID=sa;Password=P@SSw0rd1;Application Name=TESTLOCK&quot; <br />
$Connection.Open() <br />
<br />
$Command = New-Object System.Data.SQLClient.SQLCommand <br />
$Command.Connection = $Connection <br />
$Command.CommandTimeout = 500<br />
<br />
$sql = <br />
&quot; <br />
MERGE T AS T <br />
USING T2 AS S ON T.id = S.id <br />
WHEN MATCHED THEN UPDATE SET T.col1 = 'TT' <br />
WHEN NOT MATCHED THEN INSERT (col1) VALUES ('TT'); <br />
<br />
WAITFOR DELAY '00:00:03' &nbsp;<br />
&quot; &nbsp;<br />
<br />
#Begin Transaction <br />
$command.Transaction = $connection.BeginTransaction() <br />
<br />
# Simulate for each row in the file =&gt; Execute merge statement<br />
while(1 -eq 1){<br />
<br />
&nbsp; &nbsp; $Command.CommandText =$sql <br />
&nbsp; &nbsp; $Result =$Command.ExecuteNonQuery() <br />
<br />
}<br />
&nbsp; &nbsp; &nbsp;<br />
$command.Transaction.Commit() <br />
$Connection.Close()</div></div>
<p>Session 2: Simulating stopping / starting SQL Server audit for archiving purpose</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$creds = New-Object System.Management.Automation.PSCredential -ArgumentList ($user, $password)<br />
<br />
$Query = &quot;<br />
&nbsp; &nbsp; USE master;<br />
&nbsp; &nbsp; ALTER SERVER AUDIT [Audit-Target-Login]<br />
&nbsp; &nbsp; WITH ( STATE = OFF );<br />
<br />
&nbsp; &nbsp; ALTER SERVER AUDIT [Audit-Target-Login]<br />
&nbsp; &nbsp; WITH ( STATE = ON );<br />
&quot;<br />
<br />
Invoke-DbaQuery `<br />
&nbsp; &nbsp; -SqlInstance $server `<br />
&nbsp; &nbsp; -Database $Database `<br />
&nbsp; &nbsp; -SqlCredential $creds `<br />
&nbsp; &nbsp; -Query $Query</div></div>
<p>First, we wanted to get a comprehensive picture of the locks acquired during the execution of this specific SQL pattern, with an extended event session and the lock_acquired event, as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">CREATE EVENT SESSION [locks] <br />
ON SERVER <br />
ADD EVENT sqlserver.lock_acquired<br />
(<br />
&nbsp; &nbsp; ACTION(sqlserver.client_app_name,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;sqlserver.session_id,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;sqlserver.transaction_id)<br />
&nbsp; &nbsp; WHERE ([sqlserver].[client_app_name]=N'TESTLOCK'))<br />
ADD TARGET package0.histogram<br />
(<br />
&nbsp; &nbsp; SET filtering_event_name=N'sqlserver.lock_acquired',<br />
&nbsp; &nbsp; source=N'resource_type',source_type=(0)<br />
)<br />
WITH <br />
(<br />
&nbsp; &nbsp; MAX_MEMORY=4096 KB,<br />
&nbsp; &nbsp; EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,<br />
&nbsp; &nbsp; MAX_DISPATCH_LATENCY=30 SECONDS,<br />
&nbsp; &nbsp; MAX_EVENT_SIZE=0 KB,<br />
&nbsp; &nbsp; MEMORY_PARTITION_MODE=NONE,<br />
&nbsp; &nbsp; TRACK_CAUSALITY=OFF,<br />
&nbsp; &nbsp; STARTUP_STATE=OFF<br />
)<br />
GO</div></div>
<p>Here is the output we got after running the first PowerShell session:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-2-xe-lock-output.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-2-xe-lock-output.jpg" alt="167 - 2 - xe lock output" width="327" height="158" class="alignnone size-full wp-image-1676" /></a></p>
<p>We confirm METADATA locks in addition to the usual locks acquired on the concerned structures. We correlated this output with sp_WhoIsActive (and @get_locks = 1) after running the second PowerShell session. Note that you will likely have to run the 2nd query several times to reproduce the initial issue.  </p>
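<p>For reference, the call we used is simply:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- @get_locks = 1 adds an XML column detailing locks held by each session<br />
EXEC sp_WhoIsActive @get_locks = 1;</div></div>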
<p>Here is a picture of the locks respectively acquired by session 1 and in waiting state for session 2:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-3-sp_WhoIsActiveGetLocks-e1601925071999.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-3-sp_WhoIsActiveGetLocks-e1601925071999.jpg" alt="167 - 3 - sp_WhoIsActiveGetLocks" width="800" height="344" class="alignnone size-full wp-image-1677" /></a></p>
<p>&#8230;</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-4-sp_WhoIsActiveGetLocks2-e1601925104990.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-4-sp_WhoIsActiveGetLocks2-e1601925104990.jpg" alt="167 - 4 - sp_WhoIsActiveGetLocks2" width="800" height="122" class="alignnone size-full wp-image-1678" /></a></p>
<p>We may clearly identify the metadata lock acquired on the SQL Server audit itself (METADATA.AUDIT_ACTIONS with Sch-S) and the second query, with the ALTER SERVER AUDIT … WITH (STATE = OFF) statement, waiting on the same resource (Sch-M). Unfortunately, my Google-fu didn’t provide any relevant information on this topic except the documentation related to the <a href="https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-tran-locks-transact-sql?view=sql-server-ver15" rel="noopener" target="_blank">sys.dm_tran_locks</a> DMV. My guess is that writing events to audits requires a stable underlying infrastructure, and SQL Server needs to protect the concerned components (with Sch-S) against concurrent modifications (Sch-M). Anyway, it is easy to figure out that subsequent queries could be blocked (their Sch-S requests being incompatible with the pending Sch-M on the audit resource) while the previous ones are running.  </p>
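<p>To spot this situation on a live instance, the granted Sch-S and waiting Sch-M locks on the audit metadata can be listed directly from the DMV. A sketch, filtering on the resource subtype we observed in our case:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">SELECT request_session_id,<br />
&nbsp; &nbsp; &nbsp; &nbsp;resource_type,<br />
&nbsp; &nbsp; &nbsp; &nbsp;resource_subtype,<br />
&nbsp; &nbsp; &nbsp; &nbsp;request_mode,<br />
&nbsp; &nbsp; &nbsp; &nbsp;request_status<br />
FROM sys.dm_tran_locks<br />
WHERE resource_type = 'METADATA'<br />
&nbsp; &nbsp; AND resource_subtype = 'AUDIT_ACTIONS';</div></div>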
<p>The query pattern exposed previously (unlike short transactions) is a good catalyst for such a blocking scenario due to the accumulation and duration of locks within one single transaction. This is confirmed by the XE output:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-5-lock_sch_s_same_transaction-e1601925276612.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-5-lock_sch_s_same_transaction-e1601925276612.jpg" alt="167 - 5 - lock_sch_s_same_transaction" width="800" height="543" class="alignnone size-full wp-image-1681" /></a></p>
<p>We managed to get a reproducible scenario with the TSQL and PowerShell scripts. In addition, I also ran queries from other databases to confirm it may compromise the responsiveness of the entire workload on the same instance (respectively the DBA3 and DBA4 databases in my test). </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-6-lock_tree-e1601925310889.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-6-lock_tree-e1601925310889.jpg" alt="167 - 6 - lock_tree" width="800" height="78" class="alignnone size-full wp-image-1682" /></a></p>
<p><strong>How did we fix this issue?</strong></p>
<p>Even if it is only one part of the solution, I’m a strong believer that this pattern remains a performance killer and that using a set-based approach may drastically reduce the number and duration of locks, and implicitly the chances of making this blocking scenario happen again. Note that it is not only about the MERGE statement, because I managed to reproduce the same issue with INSERT and UPDATE statements as well.</p>
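<p>As an illustration, the RBAR loop could be replaced by bulk-loading the file into a staging table first (e.g. with BULK INSERT) and then applying all rows in one set-based statement. A sketch, where the #staging table is hypothetical:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- All rows applied at once, so locks are held for one short transaction only<br />
MERGE dbo.T AS T<br />
USING #staging AS S<br />
&nbsp; &nbsp; ON T.id = S.id<br />
WHEN MATCHED THEN UPDATE SET T.col1 = S.col1<br />
WHEN NOT MATCHED THEN INSERT (id, col1) VALUES (S.id, S.col1);</div></div>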
<p>Then, this scenario really made us think about a long-term solution, because we cannot guarantee this pattern will not be used by other teams in the future. Looking further at the PowerShell script which carries out the steps of archiving the audit file and inserting data into the audit database, we finally added a QueryTimeout parameter value of 5s to the concerned Invoke-DbaQuery command as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">...<br />
<br />
$query = &quot;<br />
&nbsp; &nbsp; USE [master];<br />
<br />
&nbsp; &nbsp; IF EXISTS (SELECT 1<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; FROM &nbsp;sys.dm_server_audit_status<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; WHERE [name] = '$InstanceAuditPrefix-$AuditName')<br />
&nbsp; &nbsp; BEGIN<br />
&nbsp; &nbsp; &nbsp; &nbsp; ALTER SERVER AUDIT [$InstanceAuditPrefix-$AuditName]<br />
&nbsp; &nbsp; &nbsp; &nbsp; WITH (STATE = OFF);<br />
&nbsp; &nbsp; END<br />
<br />
&nbsp; &nbsp; ALTER SERVER AUDIT [$InstanceAuditPrefix-$AuditName]<br />
&nbsp; &nbsp; WITH (STATE = ON);<br />
&quot;<br />
<br />
Invoke-DbaQuery `<br />
&nbsp; &nbsp; -SqlInstance $Instance `<br />
&nbsp; &nbsp; -SqlCredential $SqlCredential `<br />
&nbsp; &nbsp; -Database master `<br />
&nbsp; &nbsp; -Query $query `<br />
&nbsp; &nbsp; -EnableException `<br />
&nbsp; &nbsp; -QueryTimeout 5 <br />
<br />
...</div></div>
<p>Therefore, because we want to prioritize the business workload over the SQL Server audit operation, if such a situation occurs again, stopping the SQL Server audit will time out after 5s, which was relevant in our context. The next iteration of the PowerShell script is able to restart at the last stage executed previously. </p>
<p>Hope this blog post helps.</p>
<p>See you!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SQL Server index rebuild online and blocking scenario</title>
		<link>https://blog.developpez.com/mikedavem/p13199/sql-server-2012/sql-server-index-rebuid-online-and-blocking-scenario</link>
		<comments>https://blog.developpez.com/mikedavem/p13199/sql-server-2012/sql-server-index-rebuid-online-and-blocking-scenario#comments</comments>
		<pubDate>Sun, 30 Aug 2020 21:18:28 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Server 2012]]></category>
		<category><![CDATA[SQL Server 2014]]></category>
		<category><![CDATA[SQL Server 2016]]></category>
		<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[SQL Server 2019]]></category>
		<category><![CDATA[blocking]]></category>
		<category><![CDATA[online operation]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1664</guid>
		<description><![CDATA[A couple of months ago, I experienced a problem about index rebuild online operation on SQL Server. In short, the operation was supposed to be online and to never block concurrent queries. But in fact, it was not the case &#8230; <a href="https://blog.developpez.com/mikedavem/p13199/sql-server-2012/sql-server-index-rebuid-online-and-blocking-scenario">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>A couple of months ago, I experienced a problem with an online index operation on SQL Server. In short, the operation was supposed to be online and to never block concurrent queries. But in fact, it was not the case (or to be more precise, it was only partially the case) and, to make the scenario more complex, we experienced different behaviors depending on the context. Let’s start the story with the initial context: in my company, we usually go through continuous deployment including SQL modification scripts, and because we usually rely on a daily pipeline, we must ensure the related SQL operations are not too disruptive, to avoid impacting the user experience.</p>
<p><span id="more-1664"></span></p>
<p>Sometimes, we must introduce new indexes in deployment scripts and, according to how disruptive the script can be, a discussion between Devs and Ops is initiated; it results in either manual deployment managed by the Ops team or automatic deployment through the deployment pipeline by Devs. </p>
<p>Non-disruptive operations can be achieved in many ways, and the ONLINE capabilities of SQL Server may be part of the solution; this is what I suggested for one of our scripts. Let’s illustrate this context with the following example. I created a table named dbo.t1 with a bunch of rows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">USE [test];<br />
<br />
SET NOCOUNT ON;<br />
<br />
DROP TABLE IF EXISTS dbo.t1;<br />
GO<br />
<br />
CREATE TABLE dbo.t1 (<br />
&nbsp; &nbsp; id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,<br />
&nbsp; &nbsp; col1 VARCHAR(50) NULL<br />
);<br />
GO<br />
<br />
INSERT INTO dbo.t1 (col1) VALUES (REPLICATE('T', 50));<br />
GO …<br />
EXEC sp_spaceused 'dbo.t1'<br />
--name&nbsp; rows&nbsp; &nbsp; reserved&nbsp; &nbsp; data&nbsp; &nbsp; index_size&nbsp; unused<br />
--t1&nbsp; &nbsp; 5226496 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1058000 KB&nbsp; 696872 KB &nbsp; 342888 KB &nbsp; 18240 KB</div></div>
<p>Go ahead and let’s set the context with a pattern of script deployment we went through during this specific deployment. Let’s precise this script is over-simplified, but I keep it voluntarily simple to focus only on the most important part. You will notice the script includes two steps with operations on the same table: updating / fixing values in col1 first and then creating an index on col1.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/* Code before */<br />
<br />
-- Update some values in the col1 column<br />
UPDATE [dbo].[t1]<br />
SET col1 = REPLICATE('B', 50)<br />
<br />
-- Then create an index on col1 column<br />
CREATE INDEX [col1]<br />
ON [dbo].[t1] (col1) WITH (ONLINE = ON);<br />
GO</div></div>
<p>At the initial stage, the index creation used the default (OFFLINE) mode. Having discussed this point with the DEV team, we decided to create the index ONLINE in this context. The choice between an OFFLINE and an ONLINE operation is often not trivial and should be evaluated carefully, but to keep it simple, let’s say it was the right way to go in our context. Generally speaking, online operations are slower, but the tradeoff was acceptable in order to minimize blocking issues during this deployment. At least, this is what I thought …</p>
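<p>As a side note, since SQL Server 2014 the lock behavior of online index rebuild operations can also be tuned with the WAIT_AT_LOW_PRIORITY option, so the DDL yields instead of blocking the business workload. A sketch (the values are illustrative; this is not the option we used here):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">ALTER INDEX [col1] ON [dbo].[t1]<br />
REBUILD WITH (ONLINE = ON (WAIT_AT_LOW_PRIORITY<br />
&nbsp; &nbsp; (MAX_DURATION = 1 MINUTES, ABORT_AFTER_WAIT = SELF)));</div></div>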
<p>In my demo, without any concurrent workload against the dbo.t1 table, creating the index offline took 6s compared to 12s with the online method. So, an expected result here …</p>
<p>Let’s run another query in another session:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">SELECT id, col1<br />
FROM dbo.t1<br />
WHERE id BETWEEN 1 AND 2</div></div>
<p>In a normal situation, this query should be blocked for a short time, corresponding to the duration of the update operation. But once the update is done, the blocking situation should disappear, even while the index operation is being performed ONLINE. </p>
<p>But now let’s add <a href="https://flywaydb.org/" rel="noopener" target="_blank">Flyway</a> to the context. Flyway is an open source tool we are using for automatic deployment of SQL objects. The deployment script was executed from it in the ACC environment and we noticed longer blocking of concurrent accesses this time. This goes against what we would ideally like. Digging through this issue with the DEV team, we also noticed the following message when running the deployment script:</p>
<p><em>Warning: Online index operation on table &lsquo;dbo.t1&rsquo; will proceed but concurrent access to the table may be limited due to residual lock on the table from a previous operation in the same transaction.<br />
</em></p>
<p>This is something I didn’t notice from SQL Server Management Studio when I tested the same deployment script. So, what happened here?</p>
<p>Referring to the <a href="https://flywaydb.org/documentation/migrations#transactions" rel="noopener" target="_blank">Flyway documentation</a>, it is mentioned that Flyway always wraps the execution of an entire migration within a single transaction by default, and this was exactly the root cause of the issue.</p>
<p>Let’s try with some experimentations: </p>
<p><strong>Test 1</strong>: Update + creating the index online in autocommit mode (one transaction per statement).</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Update some values in the col1 column<br />
UPDATE [dbo].[t1]<br />
SET col1 = REPLICATE('B', 50)<br />
<br />
-- Then create an index on col1 column<br />
CREATE INDEX [col1]<br />
ON [dbo].[t1] (col1) WITH (ONLINE = ON);<br />
GO<br />
-- In another session<br />
SELECT id, col1<br />
FROM dbo.t1<br />
WHERE id BETWEEN 1 AND 2</div></div>
<p><strong>Test 2</strong>: Update + creating the index online within one single explicit transaction</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">BEGIN TRAN;<br />
<br />
-- Update some values in the col1 column<br />
UPDATE [dbo].[t1]<br />
SET col1 = REPLICATE('B', 50)<br />
<br />
-- Then create an index on col1 column<br />
CREATE INDEX [col1]<br />
ON [dbo].[t1] (col1) WITH (ONLINE = ON);<br />
GO<br />
COMMIT TRAN;<br />
-- In another session<br />
SELECT id, col1<br />
FROM dbo.t1<br />
WHERE id BETWEEN 1 AND 2</div></div>
<p>After running these two scripts, we can notice that the blocking duration of the SELECT query is longer in test 2, as shown in the picture below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/166-1-blocked-process.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/166-1-blocked-process.jpg" alt="166 - 1 - blocked process" width="890" height="358" class="alignnone size-full wp-image-1665" /></a></p>
<p>In test 1, the blocking duration corresponds to that of the update operation (the first step of the script). In test 2, however, we must add the time needed to create the index; to be precise, the index creation is not the blocking operation at all, but it extends the residual lock taken by the previous update operation. In short, this is exactly what the warning message is telling us. You can easily imagine the impact such a situation may have if the index creation takes a long time: you may get exactly the opposite of what you really expected. </p>
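<p>While the second session is blocked, the residual lock left by the UPDATE can be observed directly. A minimal sketch, run from a third session while the explicit transaction of test 2 is still open (table name as in the tests above):</p>

```sql
-- Show object-level locks held on dbo.t1: the IX lock taken by the UPDATE
-- remains until COMMIT, even while the ONLINE index build is running
SELECT tl.request_session_id,
       tl.resource_type,
       tl.request_mode,
       tl.request_status
FROM sys.dm_tran_locks AS tl
WHERE tl.resource_database_id = DB_ID()
  AND tl.resource_type = 'OBJECT'
  AND tl.resource_associated_entity_id = OBJECT_ID(N'dbo.t1');
```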
<p>Obviously, this is not a recommended situation, and creating an index should be run in a very narrow and constrained transaction. But from my experience, things are never that obvious and, depending on your context, you should keep an eye on how transactions are managed, especially when it comes to automatic deployment tooling that can quickly fall out of the scope of the DBA / Ops team. Strong collaboration with the DEV team is recommended to anticipate this kind of issue.</p>
<p>See you !!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Universal usage of NVARCHAR type and performance impact</title>
		<link>https://blog.developpez.com/mikedavem/p13195/sql-server-vnext/universal-usage-of-nvarchar-type-and-performance-impact</link>
		<comments>https://blog.developpez.com/mikedavem/p13195/sql-server-vnext/universal-usage-of-nvarchar-type-and-performance-impact#comments</comments>
		<pubDate>Wed, 27 May 2020 17:06:24 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[convert_implicit]]></category>
		<category><![CDATA[nvarchar]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Query Store]]></category>
		<category><![CDATA[sqlserver]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1604</guid>
		<description><![CDATA[A couple of weeks ago, I read an article from Brent Ozar about using NVARCHAR as a universal parameter. It was a good reminder and from my experience, I confirm this habit has never been a good idea. Although it depends &#8230; <a href="https://blog.developpez.com/mikedavem/p13195/sql-server-vnext/universal-usage-of-nvarchar-type-and-performance-impact">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>A couple of weeks ago, I read an <a href="https://www.brentozar.com/archive/2020/04/can-you-use-nvarchar-as-a-universal-parameter-almost/" rel="noopener" target="_blank">article</a> from Brent Ozar about using NVARCHAR as a universal parameter. It was a good reminder and, from my experience, I confirm this habit has never been a good idea. Although it depends on the context, chances are you will find an exception that proves the rule. </p>
<p><span id="more-1604"></span></p>
<p>A couple of days ago, I fell into a situation that perfectly illustrated this issue and, in this blog post, I decided to share my experience and demonstrate what the impact may be in a real production scenario.<br />
So, let’s start with the culprit. I voluntarily masked some contextual information, but the principle is here. The query is pretty simple:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">DECLARE @P0 DATETIME <br />
DECLARE @P1 INT<br />
DECLARE @P2 NVARCHAR(4000) <br />
DECLARE @P3 DATETIME <br />
DECLARE @P4 NVARCHAR(4000)<br />
<br />
UPDATE TABLE SET DATE = @P0<br />
WHERE ID = @P1<br />
&nbsp;AND IDENTIFIER = @P2<br />
&nbsp;AND P_DATE &gt;= @P3<br />
&nbsp;AND W_O_ID = (<br />
&nbsp; &nbsp;SELECT TOP 1 ID FROM TABLE2<br />
&nbsp; &nbsp;WHERE Identifier = @P4<br />
&nbsp; &nbsp;ORDER BY ID DESC)</div></div>
<p>And the corresponding execution plan: </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/05/162-1-excution_plan_with_implicit_conversion-e1590596511773.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/05/162-1-excution_plan_with_implicit_conversion-e1590596511773.jpg" alt="162 - 1 - excution_plan_with_implicit_conversion" width="1000" height="380" class="alignnone size-full wp-image-1605" /></a></p>
<p>The most interesting part concerns the TABLE2 table. As you may notice, the @P4 input parameter type is NVARCHAR, and we clearly get a CONVERT_IMPLICIT in the concerned Predicate section above. The CONVERT_IMPLICIT function is required because of <a href="https://docs.microsoft.com/en-us/sql/t-sql/data-types/data-type-precedence-transact-sql?view=sql-server-ver15" rel="noopener" target="_blank">data type precedence</a>. It results in a costly operator that scans all the data from TABLE2. As you probably know, CONVERT_IMPLICIT on the column makes the condition non-sargable, whereas a seek is normally what we could expect here, referring to the value distribution in the statistics histogram and the underlying index on the Identifier column.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">EXEC sp_helpindex 'TABLE2';</div></div>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/05/162-8-index-config.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/05/162-8-index-config.jpg" alt="162 - 8 - index config" width="1035" height="135" class="alignnone size-full wp-image-1606" /></a></p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">DBCC SHOW_STATISTICS ('TABLE2', 'IX___IDENTIFIER')<br />
WITH HISTOGRAM;</div></div>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/05/162-10-histogram-stats.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/05/162-10-histogram-stats.jpg" alt="162 - 10 - histogram stats" width="877" height="435" class="alignnone size-full wp-image-1620" /></a></p>
<p>Another important point to keep in mind is that scanning all the data from the TABLE2 table comes at a certain cost (&gt; 1GB), even if the data resides in memory.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">EXEC sp_spaceused 'TABLE2'</div></div>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/05/162-9-index-space-used.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/05/162-9-index-space-used.jpg" alt="162 - 9 - index space used" width="725" height="62" class="alignnone size-full wp-image-1607" /></a></p>
<p>The execution plan warning confirms the potential overhead of retrieving a few rows from the TABLE2 table:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/05/162-2-excution_plan_with_implicit_conversion-arning.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/05/162-2-excution_plan_with_implicit_conversion-arning.jpg" alt="162 - 2 - excution_plan_with_implicit_conversion arning" width="1160" height="108" class="alignnone size-full wp-image-1608" /></a></p>
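<p>The behaviour is easy to reproduce in isolation. A minimal sketch with hypothetical object names, to compare the two predicate shapes in the actual execution plan:</p>

```sql
-- A VARCHAR column indexed for lookups
CREATE TABLE dbo.demo (
    id INT IDENTITY PRIMARY KEY,
    Identifier VARCHAR(50) NOT NULL
);
CREATE INDEX IX_demo_Identifier ON dbo.demo (Identifier);
GO

DECLARE @p NVARCHAR(4000) = N'ABC123';

-- NVARCHAR outranks VARCHAR in data type precedence, so the column side is
-- converted: CONVERT_IMPLICIT -> non-sargable predicate -> index scan
SELECT id FROM dbo.demo WHERE Identifier = @p;

-- Moving the conversion to the parameter keeps the column untouched:
-- sargable predicate -> index seek
SELECT id FROM dbo.demo WHERE Identifier = CAST(@p AS VARCHAR(50));
```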
<p>To set the context a little bit more, the concerned application queries are mainly based on JDBC prepared statements, which implies using NVARCHAR(4000) for string parameters regardless of the column type in the database (VARCHAR / NVARCHAR). This is at least what I noticed during my investigations. </p>
<p>So, what? Well, in our DEV environment the impact was imperceptible. We had interesting discussions with the DEV team on this topic, and we basically need to improve awareness and visibility in this area (another discussion and probably another blog post) … </p>
<p>But chances are your PROD environment will tell you a different story when it comes to a bigger workload and concurrent query executions. In my context, from an infrastructure standpoint, the symptom was an abnormal increase in CPU consumption a couple of days ago. Usually, CPU consumption was roughly 20% to 30%; in fact, the issue had been around for a longer period, but we didn’t catch it due to the &laquo;&nbsp;normal&nbsp;&raquo; CPU footprint on this server. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/05/162-3-SQL-Processor-dashboard.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/05/162-3-SQL-Processor-dashboard.jpg" alt="162 - 3 - SQL Processor dashboard" width="668" height="631" class="alignnone size-full wp-image-1612" /></a></p>
<p>So, what happened here? We&rsquo;re using SQL Server 2017 with Query Store enabled on the concerned database. This feature came to the rescue and brought attention to the first clue: a query plan regression that led to increased IO consumption in the second case (and implicitly additional CPU resource consumption as well).</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/05/162-4-QS-regression-plan-e1590597560224.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/05/162-4-QS-regression-plan-e1590597560224.jpg" alt="162 - 4 - QS regression plan" width="1000" height="575" class="alignnone size-full wp-image-1613" /></a></p>
<p>You have probably noticed that both execution plans use an index scan on the right, but the more expensive one (at the bottom) uses a different index strategy. Instead of using the primary key and clustered index (PK_xxx), the second execution plan uses the non-clustered index IX_xxx_Identifier on the Identifier column, with the same CONVERT_IMPLICIT issue. </p>
<p>According to the Query Store statistics, the number of executions per business day is roughly 25000, with ~ 8.5h of CPU time consumed during this period (18.05.2020 – 26.05.2020), which is a very different order of magnitude compared to what we may have in the DEV environment <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></p>
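<p>Numbers like these can be gathered directly from the Query Store catalog views. A sketch of the kind of aggregation used (standard SQL Server 2017 views; the time-interval filter is omitted here for brevity):</p>

```sql
-- Top queries by total CPU time over the collected runtime intervals
-- (avg_cpu_time is reported in microseconds)
SELECT TOP (10)
       q.query_id,
       SUM(rs.count_executions) AS total_executions,
       SUM(rs.avg_cpu_time * rs.count_executions) / 1000000.0 AS total_cpu_seconds
FROM sys.query_store_query AS q
JOIN sys.query_store_plan AS p
    ON p.query_id = q.query_id
JOIN sys.query_store_runtime_stats AS rs
    ON rs.plan_id = p.plan_id
GROUP BY q.query_id
ORDER BY total_cpu_seconds DESC;
```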
<p>At this stage, I would say investigating why the plan regression occurred doesn’t really matter, because in both cases the most expensive operator is an index scan where, again, we expect an index seek. Getting rid of the implicit conversion by using the VARCHAR type, to make the conditional clause sargable, was the better option for us. Thus, the execution plan would be:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/05/162-7-Execution-plan-with-seek-e1590597830429.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/05/162-7-Execution-plan-with-seek-e1590597830429.jpg" alt="162 - 7 - Execution plan with seek" width="1000" height="158" class="alignnone size-full wp-image-1615" /></a></p>
<p>The first workaround in mind was to force the better plan in the Query Store (automatic tuning with FORCE_LAST_GOOD_PLAN = ON is disabled), but having discussed this point with the DEV team, we managed to deploy a fix very quickly, which drastically reduced the CPU consumption on this SQL Server instance, as shown below. The picture is self-explanatory: </p>
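<p>For reference, forcing a plan from the Query Store is a one-liner. A sketch with hypothetical ids (retrieve the real ones from sys.query_store_plan first):</p>

```sql
-- Pin the plan that uses the clustered index (ids are hypothetical)
EXEC sp_query_store_force_plan @query_id = 42, @plan_id = 114;

-- Once a proper fix is deployed, remove the forcing
EXEC sp_query_store_unforce_plan @query_id = 42, @plan_id = 114;
```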
<p><a href="http://blog.developpez.com/mikedavem/files/2020/05/162-6-SQL-Processor-dashboard-after-optimization-e1590597880869.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/05/162-6-SQL-Processor-dashboard-after-optimization-e1590597880869.jpg" alt="162 - 6 - SQL Processor dashboard after optimization" width="1000" height="460" class="alignnone size-full wp-image-1616" /></a></p>
<p>The fix consisted of adding a CAST / CONVERT function on the right side of the equality (the parameter, not the column) to avoid side effects on the JDBC driver. Therefore, we get another version of the query, and a different query hash as well. The updated query is pretty similar to the following one:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">DECLARE @P0 DATETIME <br />
DECLARE @P1 INT<br />
DECLARE @P2 NVARCHAR(4000) <br />
DECLARE @P3 DATETIME <br />
DECLARE @P4 NVARCHAR(4000)<br />
<br />
UPDATE TABLE SET DATE = @P0<br />
WHERE ID = @P1<br />
&nbsp;AND IDENTIFIER = CAST(@P2 AS varchar(50))<br />
&nbsp;AND P_DATE &gt;= @P3<br />
&nbsp;AND W_O_ID = (<br />
&nbsp; &nbsp;SELECT TOP 1 ID FROM TABLE2<br />
&nbsp; &nbsp;WHERE Identifier = CAST(@P4 AS varchar(50))<br />
&nbsp; &nbsp;ORDER BY ID DESC)</div></div>
<p>Sometime later, we gathered Query Store statistics for both the former and the new query to confirm the performance improvement, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/05/162-5-QS-stats-after-optimization.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/05/162-5-QS-stats-after-optimization.jpg" alt="162 - 5 - QS stats after optimization" width="923" height="98" class="alignnone size-full wp-image-1617" /></a></p>
<p>Finally, changing the data type enabled the use of an index seek operator, drastically reducing the SQL Server CPU consumption and logical read operations. </p>
<p>QED!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SQL Server on Linux and new FUA support for XFS filesystem</title>
		<link>https://blog.developpez.com/mikedavem/p13193/sql-server-vnext/sql-server-on-linux-and-new-fua-support-for-xfs-filesystem</link>
		<comments>https://blog.developpez.com/mikedavem/p13193/sql-server-vnext/sql-server-on-linux-and-new-fua-support-for-xfs-filesystem#comments</comments>
		<pubDate>Mon, 13 Apr 2020 17:34:32 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[SQL Server 2019]]></category>
		<category><![CDATA[blktrace]]></category>
		<category><![CDATA[FUA]]></category>
		<category><![CDATA[iostats]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[xfs]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1568</guid>
		<description><![CDATA[I wrote a (dbi services) blog post concerning Linux and SQL Server IO behavior changes before and after SQL Server 2017 CU6. Now, I was looking forward to seeing some new improvements with Force Unit Access (FUA) that was implemented with &#8230; <a href="https://blog.developpez.com/mikedavem/p13193/sql-server-vnext/sql-server-on-linux-and-new-fua-support-for-xfs-filesystem">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I wrote a (dbi services) <a href="https://blog.dbi-services.com/sql-server-on-linux-io-internal-thoughts/" rel="noopener" target="_blank">blog post</a> concerning Linux and SQL Server IO behavior changes before and after SQL Server 2017 CU6. Now, I was looking forward to seeing some new improvements with Force Unit Access (FUA), implemented through the Linux XFS enhancements since kernel 4.18.</p>
<p><span id="more-1568"></span></p>
<p>As a reminder, SQL Server 2017 CU6 added a way to guarantee data durability by using the &laquo;&nbsp;forced flush&nbsp;&raquo; mechanism explained <a href="https://support.microsoft.com/en-us/help/4131496/enable-forced-flush-mechanism-in-sql-server-2017-on-linux" rel="noopener" target="_blank">here</a>. To cut a long story short, SQL Server has strict storage requirements such as write ordering and FUA, and things work differently on Linux than on Windows to achieve durability. What is FUA and why is it important for SQL Server? From <a href="https://en.wikipedia.org/wiki/Disk_buffer#Force_Unit_Access_(FUA)" rel="noopener" target="_blank">Wikipedia</a>: Force Unit Access (aka FUA) is an I/O write command option that forces written data all the way to stable storage. FUA appeared in the SCSI command set but, good news, it was later adopted by other standards over time. SQL Server relies on it to meet its WAL and ACID requirements. </p>
<p>In the Linux world, before kernel 4.18, FUA was handled and optimized only for filesystem journaling. Data writes, however, always went through the multi-step flush process, which could introduce SQL Server IO slowness (issue the write to the block device for the data + issue a block device flush to ensure durability with O_DSYNC). </p>
<p>In the Windows world, installing and using a SQL Server instance assumes you are compliant with the Microsoft storage requirements, and therefore the first RTM version shipped on Linux came only with O_DIRECT, assuming you had already ensured that SQL Server IOs could be written directly to non-volatile storage through the kernel, drivers and hardware before acknowledgement. The forced flush mechanism &#8211; based on fdatasync() &#8211; was then introduced to address scenarios with no safe DIRECT_IO capabilities. </p>
<p>But referring to the Bob Dorr <a href="https://bobsql.com/sql-server-on-linux-forced-unit-access-fua-internals/" rel="noopener" target="_blank">article</a>, Linux kernel 4.18 comes with XFS enhancements to handle FUA for data writes, which is obviously of benefit to SQL Server. FUA support is intended to improve write requests by shortening their path, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-1-IO-worklow-e1586796506268.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-1-IO-worklow-e1586796506268.jpg" alt="160 - 1 - IO worklow" width="1000" height="539" class="alignnone size-full wp-image-1569" /></a></p>
<p><em>Picture of the existing IO workflow, from Bob Dorr&rsquo;s article</em></p>
<p>This is an interesting improvement for write-intensive workloads, and it seems to be confirmed by the tests performed by Microsoft and Bob Dorr in his article. </p>
<p>Let the experiment begin with my lab environment, based on CentOS 7 on Hyper-V with an upgraded kernel version: 5.6.3-1.el7.elrepo.x86_64.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$uname -r<br />
5.6.3-1.el7.elrepo.x86_64<br />
<br />
$cat /etc/os-release | grep VERSION<br />
VERSION=&quot;7 (Core)&quot;<br />
VERSION_ID=&quot;7&quot;<br />
CENTOS_MANTISBT_PROJECT_VERSION=&quot;7&quot;<br />
REDHAT_SUPPORT_PRODUCT_VERSION=&quot;7&quot;</div></div>
<p>Let me point out that my tests are purely experimental; instead of upgrading the kernel to a newer version, you may directly rely on RHEL 8 based distros, which ship with kernel 4.18, for example.</p>
<p>My lab environment includes 2 separate SSD disks to host the DATA + TLOG database files as follows:</p>
<p>I:\ drive : SQL Data volume (sdb – XFS filesystem)<br />
T:\ drive : SQL TLog volume (sda – XFS filesystem)</p>
<p>The general performance is not so bad <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-6-diskmark-tests-storage-env-e1586796679451.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-6-diskmark-tests-storage-env-e1586796679451.jpg" alt="160 - 6 - diskmark tests storage env" width="1000" height="362" class="alignnone size-full wp-image-1571" /></a></p>
<p>Initially I dedicated just one disk to both SQL DATA and TLOG, but I quickly noticed some IO waits (iostat output below) that made me unconfident in my test results:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-3-iostats-before-optimization.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-3-iostats-before-optimization.jpg" alt="160 - 3 - iostats before optimization" width="975" height="447" class="alignnone size-full wp-image-1572" /></a></p>
<p>Spreading the IO over physically separate volumes helped reduce these phenomena drastically afterwards:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-4-iostats-after-optimization.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-4-iostats-after-optimization.jpg" alt="160 - 4 - iostats after optimization" width="984" height="531" class="alignnone size-full wp-image-1573" /></a> </p>
<p>First, I enabled FUA capabilities on Hyper-V side as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Set-VMHardDiskDrive -VMName CENTOS7 -ControllerType SCSI -OverrideCacheAttributes WriteCacheAndFUAEnabled<br />
<br />
Get-VMHardDiskDrive -VMName CENTOS7 | `<br />
&nbsp; &nbsp; ft VMName, ControllerType, &nbsp;ControllerLocation, Path, WriteHardeningMethod -AutoSize</div></div>
<p>Then I checked whether FUA was enabled and supported from an OS perspective, for both the sda (TLOG) and sdb (SQL DATA) disks:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ lsblk -f<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;FSTYPE &nbsp; &nbsp; &nbsp;LABEL UUID &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; MOUNTPOINT<br />
sdb<br />
└─sdb1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;xfs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 06910f69-27a3-4711-9093-f8bf80d15d72 &nbsp; /sqldata<br />
sr0<br />
sda<br />
├─sda2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;xfs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; f5a9bded-130f-4642-bd6f-9f27563a4e16 &nbsp; /boot<br />
├─sda3 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LVM2_member &nbsp; &nbsp; &nbsp; QsbKEt-28yT-lpfZ-VCbj-v5W5-vnVr-2l7nih<br />
│ ├─centos-swap swap &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;7eebbb32-cef5-42e9-87c3-7df1a0b79f11 &nbsp; [SWAP]<br />
│ └─centos-root xfs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 90f6eb2f-dd39-4bef-a7da-67aa75d1843d &nbsp; /<br />
└─sda1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;vfat &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;7529-979E &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/boot/efi<br />
<br />
$ dmesg | grep sda<br />
[ &nbsp; &nbsp;1.665478] sd 0:0:0:0: [sda] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)<br />
[ &nbsp; &nbsp;1.665479] sd 0:0:0:0: [sda] 4096-byte physical blocks<br />
[ &nbsp; &nbsp;1.665774] sd 0:0:0:0: [sda] Write Protect is off<br />
[ &nbsp; &nbsp;1.665775] sd 0:0:0:0: [sda] Mode Sense: 0f 00 10 00<br />
[ &nbsp; &nbsp;1.670321] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA<br />
[ &nbsp; &nbsp;1.683833] &nbsp;sda: sda1 sda2 sda3<br />
[ &nbsp; &nbsp;1.708938] sd 0:0:0:0: [sda] Attached SCSI disk<br />
[ &nbsp; &nbsp;5.607914] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)</div></div>
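<p>Since kernel 4.18, the block layer also exposes the FUA capability per device through sysfs. A quick way to list it, assuming the <code>queue/fua</code> attribute is present on your kernel (a value of 1 means the device accepts FUA writes):</p>

```shell
# Print the FUA support flag for every block device exposing the attribute
fua_status() {
    for f in /sys/block/*/queue/fua; do
        [ -r "$f" ] || continue
        printf '%s: %s\n' "$(echo "$f" | cut -d/ -f4)" "$(cat "$f")"
    done
}
fua_status
```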
<p>Finally, according to the documentation, I configured <strong>trace flag 3979</strong> and the <strong>control.alternatewritethrough=0</strong> parameter for my SQL Server instance:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ /opt/mssql/bin/mssql-conf traceflag 3979 on<br />
<br />
$ /opt/mssql/bin/mssql-conf set control.alternatewritethrough 0<br />
<br />
$ systemctl restart mssql-server</div></div>
<p>The first test I performed was pretty similar to the ones in my previous (dbi services) <a href="https://blog.dbi-services.com/sql-server-on-linux-io-internal-thoughts/">blog post</a>.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">CREATE TABLE dummy_test (<br />
&nbsp; &nbsp; id INT IDENTITY,<br />
&nbsp; &nbsp; col1 VARCHAR(2000) DEFAULT REPLICATE('T', 2000)<br />
);<br />
<br />
INSERT INTO dummy_test DEFAULT VALUES;<br />
GO 67</div></div>
<p>For the sake of curiosity, I looked at the corresponding strace output:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ cat sql_strace_fua.txt<br />
% time &nbsp; &nbsp; seconds &nbsp;usecs/call &nbsp; &nbsp; calls &nbsp; &nbsp;errors syscall<br />
------ ----------- ----------- --------- --------- ----------------<br />
&nbsp;78.13 &nbsp;360.618066 &nbsp; &nbsp; &nbsp; 61739 &nbsp; &nbsp; &nbsp;5841 &nbsp; &nbsp; &nbsp;2219 futex<br />
&nbsp; 6.88 &nbsp; 31.731833 &nbsp; &nbsp; 1511040 &nbsp; &nbsp; &nbsp; &nbsp;21 &nbsp; &nbsp; &nbsp; &nbsp;15 restart_syscall<br />
&nbsp; 3.81 &nbsp; 17.592176 &nbsp; &nbsp; &nbsp;130312 &nbsp; &nbsp; &nbsp; 135 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; io_getevents<br />
&nbsp; 2.95 &nbsp; 13.607314 &nbsp; &nbsp; &nbsp; 98604 &nbsp; &nbsp; &nbsp; 138 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; epoll_wait<br />
&nbsp; 2.88 &nbsp; 13.313667 &nbsp; &nbsp; &nbsp;633984 &nbsp; &nbsp; &nbsp; &nbsp;21 &nbsp; &nbsp; &nbsp; &nbsp;21 rt_sigtimedwait<br />
&nbsp; 2.60 &nbsp; 11.997925 &nbsp; &nbsp; 1333103 &nbsp; &nbsp; &nbsp; &nbsp; 9 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; nanosleep<br />
&nbsp; 1.79 &nbsp; &nbsp;8.279781 &nbsp; &nbsp; &nbsp; &nbsp; 242 &nbsp; &nbsp; 34256 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; gettid<br />
&nbsp; 0.84 &nbsp; &nbsp;3.876021 &nbsp; &nbsp; &nbsp; &nbsp; 226 &nbsp; &nbsp; 17124 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; getcpu<br />
&nbsp; 0.03 &nbsp; &nbsp;0.138836 &nbsp; &nbsp; &nbsp; &nbsp; 347 &nbsp; &nbsp; &nbsp; 400 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sched_yield<br />
&nbsp; 0.01 &nbsp; &nbsp;0.062348 &nbsp; &nbsp; &nbsp; &nbsp; 254 &nbsp; &nbsp; &nbsp; 245 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; getrusage<br />
&nbsp; 0.01 &nbsp; &nbsp;0.056065 &nbsp; &nbsp; &nbsp; &nbsp; 406 &nbsp; &nbsp; &nbsp; 138 &nbsp; &nbsp; &nbsp; &nbsp;69 readv<br />
&nbsp; 0.01 &nbsp; &nbsp;0.038107 &nbsp; &nbsp; &nbsp; &nbsp; 343 &nbsp; &nbsp; &nbsp; 111 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; read<br />
&nbsp; 0.01 &nbsp; &nbsp;0.037883 &nbsp; &nbsp; &nbsp; &nbsp; 743 &nbsp; &nbsp; &nbsp; &nbsp;51 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; mmap<br />
&nbsp; 0.01 &nbsp; &nbsp;0.037498 &nbsp; &nbsp; &nbsp; &nbsp; 180 &nbsp; &nbsp; &nbsp; 208 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; epoll_ctl<br />
&nbsp; 0.01 &nbsp; &nbsp;0.035654 &nbsp; &nbsp; &nbsp; &nbsp; 517 &nbsp; &nbsp; &nbsp; &nbsp;69 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; writev<br />
&nbsp; 0.01 &nbsp; &nbsp;0.025542 &nbsp; &nbsp; &nbsp; &nbsp; 370 &nbsp; &nbsp; &nbsp; &nbsp;69 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; io_submit<br />
&nbsp; 0.00 &nbsp; &nbsp;0.019760 &nbsp; &nbsp; &nbsp; &nbsp; 282 &nbsp; &nbsp; &nbsp; &nbsp;70 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; write<br />
&nbsp; 0.00 &nbsp; &nbsp;0.019555 &nbsp; &nbsp; &nbsp; &nbsp; 477 &nbsp; &nbsp; &nbsp; &nbsp;41 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; open<br />
&nbsp; 0.00 &nbsp; &nbsp;0.016285 &nbsp; &nbsp; &nbsp; &nbsp;1629 &nbsp; &nbsp; &nbsp; &nbsp;10 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; rt_sigaction<br />
&nbsp; 0.00 &nbsp; &nbsp;0.012359 &nbsp; &nbsp; &nbsp; &nbsp; 301 &nbsp; &nbsp; &nbsp; &nbsp;41 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; close<br />
&nbsp; 0.00 &nbsp; &nbsp;0.010069 &nbsp; &nbsp; &nbsp; &nbsp; 205 &nbsp; &nbsp; &nbsp; &nbsp;49 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; munmap<br />
&nbsp; 0.00 &nbsp; &nbsp;0.006977 &nbsp; &nbsp; &nbsp; &nbsp; 303 &nbsp; &nbsp; &nbsp; &nbsp;23 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; rt_sigprocmask<br />
&nbsp; 0.00 &nbsp; &nbsp;0.006256 &nbsp; &nbsp; &nbsp; &nbsp; 153 &nbsp; &nbsp; &nbsp; &nbsp;41 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; fstat<br />
&nbsp; 0.00 &nbsp; &nbsp;0.004646 &nbsp; &nbsp; &nbsp; &nbsp; 465 &nbsp; &nbsp; &nbsp; &nbsp;10 &nbsp; &nbsp; &nbsp; &nbsp;10 stat<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000860 &nbsp; &nbsp; &nbsp; &nbsp; 215 &nbsp; &nbsp; &nbsp; &nbsp; 4 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; madvise<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000321 &nbsp; &nbsp; &nbsp; &nbsp; 161 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sched_setaffinity<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000295 &nbsp; &nbsp; &nbsp; &nbsp; 148 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; set_robust_list<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000281 &nbsp; &nbsp; &nbsp; &nbsp; 141 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; clone<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000236 &nbsp; &nbsp; &nbsp; &nbsp; 118 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sigaltstack<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000093 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;47 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; arch_prctl<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000046 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;23 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sched_getaffinity<br />
------ ----------- ----------- --------- --------- ----------------<br />
100.00 &nbsp;461.546755 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 59137 &nbsp; &nbsp; &nbsp;2334 total</div></div>
<p>… And as I expected, with FUA enabled there are no fsync() / fdatasync() calls anymore; writing to stable storage is achieved directly by FUA commands. iomap_dio_rw() now determines whether REQ_FUA can be used and whether issuing generic_write_sync() is still necessary. To dig further into the IO layer, we need to rely on another tool, blktrace (mentioned in Bob Dorr&rsquo;s article as well).</p>
<p>In my case I got two different pictures of blktrace output between the forced flush mechanism (the default) and FUA-oriented IO:</p>
<p>-&gt; With forced flush</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">34.694734500 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17164 &nbsp;A &nbsp;WS &nbsp; &nbsp; &nbsp; 2048 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694735000 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17165 &nbsp;Q &nbsp;WS &nbsp; &nbsp; &nbsp; 2048 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694737000 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17166 &nbsp;X &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694738100 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17167 &nbsp;G &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694739800 &nbsp; &nbsp; &nbsp;14225 18426216 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17169 &nbsp;G &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694740900 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17171 &nbsp;D &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694747200 &nbsp; &nbsp; &nbsp;14225 18426216 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17174 &nbsp;D &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.713665000 &nbsp; &nbsp; &nbsp;14225 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8,16 &nbsp; 0 &nbsp; &nbsp;17175 &nbsp;Q FWS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.713668100 &nbsp; &nbsp; &nbsp;14225 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8,16 &nbsp; 0 &nbsp; &nbsp;17176 &nbsp;G FWS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr</div></div>
<p>WS (Write Synchronous) is performed but SQL Server still needs to go through the multi-step flush process with the additional FWS (PREFLUSH|WRITE|SYNC).</p>
<p>-&gt; FUA</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">0.000000000 &nbsp; &nbsp; &nbsp;16305 55106536 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp;A WFS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
0.000000400 &nbsp; &nbsp; &nbsp;16305 57615336 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;2 &nbsp;A WFS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
0.000001100 &nbsp; &nbsp; &nbsp;16305 57615336 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;3 &nbsp;Q WFS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
0.000005200 &nbsp; &nbsp; &nbsp;16305 57615336 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;4 &nbsp;G WFS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
0.001377800 &nbsp; &nbsp; &nbsp;16305 55106544 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;6 &nbsp;A WFS &nbsp; &nbsp; &nbsp; &nbsp; 16 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr</div></div>
<p>FWS has disappeared, leaving only WFS commands, which are basically <strong>REQ_WRITE combined with the REQ_FUA flag</strong>.</p>
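<p>As a recap, switching between the two behaviors on SQL Server on Linux relies on the settings discussed in Bob Dorr&rsquo;s article. The sketch below shows the combination as I understand it; treat the trace flag and mssql-conf settings as assumptions to double-check against your CU level:</p>

```sql
-- Assumed configuration (per Bob Dorr's FUA internals article) -- verify on your build:
--   $ sudo /opt/mssql/bin/mssql-conf set control.writethrough 1
--   $ sudo /opt/mssql/bin/mssql-conf set control.alternatewritethrough 0
--   $ sudo systemctl restart mssql-server
DBCC TRACEON (3979, -1);  -- trace flag reported to enable the FUA-based IO path
```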
<p>I spent some time reading other interesting discussions in addition to Bob Dorr&rsquo;s wonderful article. Here is an interesting <a href="https://lkml.org/lkml/2019/12/3/316" rel="noopener" target="_blank">pointer</a> to a discussion about REQ_FUA, for instance.</p>
<p><strong>But what about performance gain? </strong></p>
<p>I had two simple scenarios to play with in order to bring out FUA&rsquo;s helpfulness: hardening the dirty pages of the buffer pool to disk during the checkpoint process, and hardening the log buffer to disk during the commit phase. When the forced flush method is used, each component relies on an additional FlushFileBuffers() call to achieve durability. This can be easily tracked from an XE session including the <strong>flush_file_buffers</strong> and <strong>make_writes_durable</strong> events.</p>
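<p>For completeness, here is a minimal sketch of such an XE session. The session name is arbitrary, and the lightweight event_counter target simply counts occurrences of the two events mentioned above:</p>

```sql
-- Minimal XE session counting the durability-related events (session name is arbitrary)
CREATE EVENT SESSION [fua_durability_trace] ON SERVER
ADD EVENT sqlserver.flush_file_buffers,
ADD EVENT sqlserver.make_writes_durable
ADD TARGET package0.event_counter        -- just counts events, near-zero overhead
WITH (MAX_DISPATCH_LATENCY = 5 SECONDS);
GO
ALTER EVENT SESSION [fua_durability_trace] ON SERVER STATE = START;
```

<p>The counters can then be read back from sys.dm_xe_session_targets while the workload runs.</p>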
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-1-1-flushfilebuffers-worklflow.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-1-1-flushfilebuffers-worklflow.jpg" alt="160 - 1 - 1 - flushfilebuffers worklflow" width="839" height="505" class="alignnone size-full wp-image-1575" /></a></p>
<p><strong>First scenario (10K inserts within a transaction and checkpoint)</strong></p>
<p>In this scenario my intention was to stress the checkpoint process with a bunch of buffers and dirty pages to flush to disk when it kicks in.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">USE dummy;<br />
<br />
SET NOCOUNT ON;<br />
-- Disable checkpoint to control when it will kick in<br />
DBCC TRACEON(3505);<br />
-- Check traceflag<br />
DBCC TRACESTATUS;<br />
<br />
DECLARE @i INT = 0;<br />
DECLARE @iteration INT = 0;<br />
DECLARE @start_upd DATETIME;<br />
DECLARE @start_chkpt DATETIME;<br />
DECLARE @end_upd DATETIME;<br />
DECLARE @end_chkpt DATETIME;<br />
<br />
TRUNCATE TABLE dummy_test;<br />
<br />
WHILE @iteration &amp;lt; 251<br />
BEGIN<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; SET @start_upd = GETDATE();<br />
<br />
&nbsp; &nbsp; BEGIN TRAN;<br />
<br />
&nbsp; &nbsp; WHILE @i &amp;lt;= 10000<br />
&nbsp; &nbsp; BEGIN<br />
&nbsp; &nbsp; &nbsp; &nbsp; INSERT INTO dummy_test DEFAULT VALUES;<br />
&nbsp; &nbsp; &nbsp; &nbsp; SET @i += 1;<br />
&nbsp; &nbsp; END<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; COMMIT TRAN;<br />
<br />
&nbsp; &nbsp; SET @end_upd = GETDATE();<br />
<br />
&nbsp; &nbsp; SET @i = 0;<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; SET @start_chkpt = GETDATE();<br />
&nbsp; &nbsp; CHECKPOINT;<br />
&nbsp; &nbsp; SET @end_chkpt = GETDATE();<br />
&nbsp; &nbsp; PRINT &amp;#039;INS: &amp;#039; + CAST(DATEDIFF(ms, @start_upd, @end_upd) AS VARCHAR(50)) + &amp;#039; - CHKPT: &amp;#039; + CAST(DATEDIFF(ms, @start_chkpt, @end_chkpt) AS VARCHAR(50));<br />
<br />
&nbsp; &nbsp; SET @iteration += 1;<br />
END</div></div>
<p>The result is as follows:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-5-test-perfs-250_10K_chkpt.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-5-test-perfs-250_10K_chkpt.jpg" alt="160 - 5 - test perfs 250_10K_chkpt" width="974" height="298" class="alignnone size-full wp-image-1576" /></a></p>
<p>In my case, I noticed ~17% improvement for the checkpoint process and ~7% for the insert transaction, including the commit phase that flushes data to the TLog. In parallel, the aggregated output of the extended event session confirms that FUA avoids a lot of the additional operations needed to persist data on disk, as illustrated by the flush_file_buffers and make_writes_durable events.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-6-xe-flush-file-buffers-e1586798220100.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-6-xe-flush-file-buffers-e1586798220100.jpg" alt="160 - 6 - xe flush file buffers" width="1000" height="178" class="alignnone size-full wp-image-1577" /></a></p>
<p><strong>Second scenario (100 transactions of 1 insert each, and checkpoint)</strong></p>
<p>In this scenario, I wanted to stress the log writer by forcing a lot of small transactions to commit. I updated the TSQL code as shown below:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">USE dummy;<br />
<br />
SET NOCOUNT ON;<br />
-- Disable checkpoint to control when it will kick in<br />
DBCC TRACEON(3505);<br />
-- Check traceflag<br />
DBCC TRACESTATUS;<br />
<br />
DECLARE @i INT = 0;<br />
DECLARE @iteration INT = 0;<br />
DECLARE @start_upd DATETIME;<br />
DECLARE @start_chkpt DATETIME;<br />
DECLARE @end_upd DATETIME;<br />
DECLARE @end_chkpt DATETIME;<br />
<br />
TRUNCATE TABLE dummy_test;<br />
<br />
WHILE @iteration &amp;lt; 251<br />
BEGIN<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; SET @start_upd = GETDATE();<br />
<br />
&nbsp; &nbsp; WHILE @i &amp;lt;= 100<br />
&nbsp; &nbsp; BEGIN<br />
&nbsp; &nbsp; &nbsp; &nbsp; INSERT INTO dummy_test DEFAULT VALUES;<br />
&nbsp; &nbsp; &nbsp; &nbsp; SET @i += 1;<br />
&nbsp; &nbsp; END<br />
<br />
&nbsp; &nbsp; SET @end_upd = GETDATE();<br />
<br />
&nbsp; &nbsp; SET @i = 0;<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; SET @start_chkpt = GETDATE();<br />
&nbsp; &nbsp; CHECKPOINT;<br />
&nbsp; &nbsp; SET @end_chkpt = GETDATE();<br />
&nbsp; &nbsp; PRINT &amp;#039;INS: &amp;#039; + CAST(DATEDIFF(ms, @start_upd, @end_upd) AS VARCHAR(50)) + &amp;#039; - CHKPT: &amp;#039; + CAST(DATEDIFF(ms, @start_chkpt, @end_chkpt) AS VARCHAR(50));<br />
<br />
&nbsp; &nbsp; SET @iteration += 1;<br />
END</div></div>
<p>The new picture is the following:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-7-test-perfs-250_100_1K_chkpt.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-7-test-perfs-250_100_1K_chkpt.jpg" alt="160 - 7 - test perfs 250_100_1K_chkpt" width="974" height="298" class="alignnone size-full wp-image-1580" /></a></p>
<p>This time the improvement is definitely more impressive, with a decrease of ~80% in execution time for the INSERT + COMMIT part and ~77% for the checkpoint phase!</p>
<p>Looking at the extended event session confirms the shortened IO path has something to do with it <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-7-xe-flush-file-buffers-2-e1586798367112.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-7-xe-flush-file-buffers-2-e1586798367112.jpg" alt="160 - 7 - xe flush file buffers 2" width="1000" height="170" class="alignnone size-full wp-image-1578" /></a></p>
<p>Well, shortening the IO path and relying directly on native FUA instructions was definitely a good idea, both for performance and for meeting WAL and ACID requirements. Anyway, I&rsquo;m glad to see Microsoft contributing improvements to the Linux kernel!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Mitigating Scalar UDF&#8217;s procedural code performance with SQL 2019 and  Scalar UDF Inlining capabilities</title>
		<link>https://blog.developpez.com/mikedavem/p13189/performance/mitigating-scalar-udf-procedural-code-performance-with-sql-2019-udf-inline-capabilites</link>
		<comments>https://blog.developpez.com/mikedavem/p13189/performance/mitigating-scalar-udf-procedural-code-performance-with-sql-2019-udf-inline-capabilites#comments</comments>
		<pubDate>Thu, 05 Mar 2020 15:10:02 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server 2019]]></category>
		<category><![CDATA[Imperative]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[Procedural code]]></category>
		<category><![CDATA[Programing]]></category>
		<category><![CDATA[Scalar UDF Inlining]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1516</guid>
		<description><![CDATA[A couple of days ago, I read the write-up of my former colleague @FranckPachot about refactoring procedural code to SQL. This is recurrent subject in the database world and I was interested in transposing this article to SQL Server because &#8230; <a href="https://blog.developpez.com/mikedavem/p13189/performance/mitigating-scalar-udf-procedural-code-performance-with-sql-2019-udf-inline-capabilites">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>A couple of days ago, I read the write-up of my former colleague <a href="https://twitter.com/FranckPachot" rel="noopener" target="_blank">@FranckPachot</a> about <a href="https://blog.dbi-services.com/refactoring-procedural-to-sql-an-example-with-mysql-sakila/" rel="noopener" target="_blank">refactoring procedural code to SQL</a>. This is a recurrent subject in the database world and I was interested in transposing this article to SQL Server, because it was about refactoring a Scalar-Valued function into a SQL view. The latter is a great alternative when it comes to performance, but something new shipped with SQL Server 2019 that could address (or at least mitigate) this recurrent scenario. </p>
<p><span id="more-1516"></span></p>
<p>First of all, Scalar-Valued functions (from the User Defined Function category) are interesting objects for code modularity, factoring and reusability. No surprise to see them widely used by DEVs. But they are not always suited to performance considerations, especially when it comes to the “impedance mismatch” problem. This term refers to the problems that occur due to differences between the database model and the programming language model. On one side, the database world, with a SQL language that is declarative and with queries that are set- or multiset-oriented. On the other side, the programming world, with imperative languages that need to access each tuple individually for processing.</p>
<p>To cut the story short, Scalar UDFs provide programming benefits for DEVs, but when performance matters we discourage using them for the aforementioned reasons. Before continuing, let’s note that all the scripts and demos in the next sections are based on the <a href="https://github.com/jOOQ/jOOQ/tree/master/jOOQ-examples/Sakila" rel="noopener" target="_blank">sakila-db</a> project on GitHub. Franck Pachot used the MySQL version and fortunately there is a sample for SQL Server as well. Furthermore, the MySQL function used as the initial example by Franck may be translated to SQL Server as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Scalar function<br />
CREATE OR ALTER FUNCTION inventory_in_stock (@p_inventory_id INT) <br />
RETURNS BIT<br />
BEGIN<br />
&nbsp; &nbsp; DECLARE @v_rentals INT;<br />
&nbsp; &nbsp; DECLARE @v_out &nbsp; &nbsp; INT;<br />
&nbsp; &nbsp; DECLARE @verif &nbsp; &nbsp; BIT;<br />
&nbsp; &nbsp; <br />
<br />
&nbsp; &nbsp; --AN ITEM IS IN-STOCK IF THERE ARE EITHER NO ROWS IN THE rental TABLE<br />
&nbsp; &nbsp; --FOR THE ITEM OR ALL ROWS HAVE return_date POPULATED<br />
<br />
&nbsp; &nbsp; SET @v_rentals = (SELECT COUNT(*) FROM rental WHERE inventory_id = @p_inventory_id);<br />
<br />
&nbsp; &nbsp; IF @v_rentals = 0 <br />
&nbsp; &nbsp; BEGIN<br />
&nbsp; &nbsp; &nbsp; &nbsp; SET @verif = 1<br />
&nbsp; &nbsp; END<br />
&nbsp; &nbsp; ELSE<br />
&nbsp; &nbsp; BEGIN<br />
&nbsp; &nbsp; &nbsp; &nbsp; SET @v_out = (SELECT COUNT(rental_id) <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; FROM inventory <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; LEFT JOIN rental ON inventory.inventory_id = rental.inventory_id<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; WHERE inventory.inventory_id = @p_inventory_id<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; AND rental.return_date IS NULL)<br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; IF @v_out &amp;gt; 0 <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; SET @verif = 0;<br />
&nbsp; &nbsp; &nbsp; &nbsp; ELSE<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; SET @verif = 1;<br />
&nbsp; &nbsp; END;<br />
<br />
&nbsp; &nbsp; RETURN @verif;<br />
END <br />
GO</div></div>
<p>In his write-up, Franck provided a natural alternative to this UDF based on a SQL view; here is a similar solution applied to SQL Server:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">CREATE OR ALTER VIEW v_inventory_stock_status <br />
AS<br />
<br />
SELECT <br />
&nbsp; &nbsp; i.inventory_id,<br />
&nbsp; &nbsp; CASE <br />
&nbsp; &nbsp; &nbsp; &nbsp; WHEN NOT EXISTS (SELECT 1 FROM dbo.rental AS r WHERE r.inventory_id = &nbsp;i.inventory_id AND r.return_date IS NULL) THEN 1<br />
&nbsp; &nbsp; &nbsp; &nbsp; ELSE 0<br />
&nbsp; &nbsp; END AS inventory_in_stock<br />
FROM dbo.inventory AS i<br />
GO</div></div>
<p>Then, similar to what Franck did, we can join this view with the inventory table to get the expected outcome:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">select count(v.inventory_id),inventory_in_stock<br />
from inventory AS i<br />
left join v_inventory_stock_status AS v ON i.inventory_id = v.inventory_id<br />
group by v.inventory_in_stock;<br />
go</div></div>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-1-Query-OutPut.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-1-Query-OutPut.jpg" alt="156 - 1 - Query OutPut" width="356" height="82" class="alignnone size-full wp-image-1518" /></a></p>
<p>There is another alternative that could be used here, based on a CTE rather than a TSQL view, as follows. The performance is similar in both cases and it is up to each DEV to decide which solution fits their needs:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">;with cte<br />
as<br />
(<br />
&nbsp; &nbsp; SELECT <br />
&nbsp; &nbsp; &nbsp; &nbsp; i.inventory_id,<br />
&nbsp; &nbsp; &nbsp; &nbsp; CASE <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; WHEN NOT EXISTS (SELECT 1 FROM dbo.rental AS r WHERE r.inventory_id = &nbsp;i.inventory_id AND r.return_date IS NULL) THEN 1<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ELSE 0<br />
&nbsp; &nbsp; &nbsp; &nbsp; END AS inventory_in_stock<br />
&nbsp; &nbsp; FROM dbo.inventory AS i<br />
)<br />
select count(v.inventory_id),inventory_in_stock<br />
from inventory AS i<br />
left join cte AS v ON i.inventory_id = v.inventory_id<br />
group by v.inventory_in_stock;<br />
go</div></div>
<p>I compared then the performance between the UDF based version and the TSQL view:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- udf<br />
select count(*),dbo.inventory_in_stock(inventory_id) <br />
from inventory <br />
group by dbo.inventory_in_stock(inventory_id)<br />
GO<br />
-- view<br />
select count(v.inventory_id),inventory_in_stock<br />
from inventory AS i<br />
left join v_inventory_stock_status AS v ON i.inventory_id = v.inventory_id<br />
group by v.inventory_in_stock;<br />
go</div></div>
<p>The outcome below (CPU, Reads, Writes, Duration) is as expected. The SQL view is the winner by far. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-2-UDF-vs-View-performance-e1583417465245.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-2-UDF-vs-View-performance-e1583417465245.jpg" alt="156 - 2 - UDF vs View performance" width="1000" height="166" class="alignnone size-full wp-image-1519" /></a></p>
<p>Similar to Franck’s finding, the performance gain comes at the cost of rewriting the code for DEVs in this scenario. But SQL Server 2019 provides another interesting way to keep using the UDF abstraction without compromising on performance: the <a href="https://docs.microsoft.com/en-us/sql/relational-databases/user-defined-functions/scalar-udf-inlining?view=sql-server-ver15" rel="noopener" target="_blank">Scalar T-SQL UDF Inlining</a> feature, and I was curious to see how much improvement we get with such capabilities for this scenario. </p>
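<p>As a quick sanity check before testing, SQL Server 2019 exposes whether it considers a given scalar UDF eligible for inlining through the is_inlineable column of sys.sql_modules:</p>

```sql
-- Check whether the scalar UDF qualifies for inlining (SQL Server 2019, compat level 150)
SELECT OBJECT_NAME(m.object_id) AS udf_name,
       m.is_inlineable              -- 1 = the optimizer may inline this UDF
FROM sys.sql_modules AS m
WHERE m.object_id = OBJECT_ID(N'dbo.inventory_in_stock');
```

<p>Inlining can also be controlled per function with the WITH INLINE = ON / OFF clause of CREATE OR ALTER FUNCTION, in addition to the database scoped configuration.</p>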
<p>The first time I executed the following UDF-based TSQL script on SQL Server 2019 RTM (be sure to be in 150 compatibility mode), I ran into an internal query processor error for the second query:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- SQL 2017-<br />
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = OFF;<br />
GO<br />
SELECT dbo.inventory_in_stock(10)<br />
GO<br />
-- SQL 2019+<br />
ALTER DATABASE SCOPED CONFIGURATION SET TSQL_SCALAR_UDF_INLINING = ON;<br />
GO<br />
SELECT dbo.inventory_in_stock(10)</div></div>
<blockquote><p>Msg 8624, Level 16, State 17, Line 14<br />
Internal Query Processor Error: The query processor could not produce a query plan. For more information, contact Customer Support Services.</p></blockquote>
<p>To be honest, this was not a surprise because I was already aware of the issue from reading the blog post of <a href="https://twitter.com/sqL_handLe" rel="noopener" target="_blank">@sqL_handle</a> a couple of weeks ago. Updating to CU2 fixed my issue, and the second attempt revealed some interesting outcomes.<br />
The query plan of the first query (&lt;= SQL 2017) is what we usually expect from executing a TSQL scalar function. From an execution perspective, this black box is materialized in the form of the Compute Scalar operator, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-3-UDF-2017-query-plan.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-3-UDF-2017-query-plan.jpg" alt="156 - 3 - UDF 2017 query plan" width="610" height="203" class="alignnone size-full wp-image-1521" /></a></p>
<p>But the story changes with the Scalar UDF Inlining capability. This is illustrated by the pictures below, which are samples of a larger execution plan:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-3-UDF-2019-query-plan-e1583417692798.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-3-UDF-2019-query-plan-e1583417692798.jpg" alt="156 - 3 - UDF 2019 query plan" width="1000" height="375" class="alignnone size-full wp-image-1522" /></a></p>
<p>&#8230;</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-3-UDF-2019-2-query-plan-e1583420250648.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-3-UDF-2019-2-query-plan-e1583420250648.jpg" alt="156 - 3 - UDF 2019 2 query plan" width="1000" height="329" class="alignnone size-full wp-image-1524" /></a></p>
<p>The query optimizer has inferred relational operations from my (imperative) scalar UDF, based on the <a href="https://www.microsoft.com/en-us/research/project/froid/" rel="noopener" target="_blank">Froid framework</a>, and provides several benefits including compiler optimizations and parallelism (initially not possible with UDFs).</p>
<p>Let’s perform the same benchmark test that I ran between the UDF-based and the TSQL view based queries. In fact, I had to propose a slight variation of the query in the hope of kicking in the Scalar UDF Inlining capability:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- First UDF query <br />
select count(*),dbo.inventory_in_stock(inventory_id) <br />
from inventory <br />
group by dbo.inventory_in_stock(inventory_id)<br />
GO<br />
<br />
-- Variation of the first query<br />
;with cte<br />
as<br />
(<br />
&nbsp; &nbsp; select inventory_id,dbo.inventory_in_stock(inventory_id) as inventory_in_stock<br />
&nbsp; &nbsp; from inventory <br />
)<br />
select &nbsp;count(*), inventory_in_stock<br />
from cte<br />
group by inventory_in_stock<br />
GO</div></div>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-4-UDF-2019-benchmark-query-plan-e1583420352841.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-4-UDF-2019-benchmark-query-plan-e1583420352841.jpg" alt="156 - 4 - UDF 2019 benchmark query plan" width="1000" height="357" class="alignnone size-full wp-image-1525" /></a></p>
<p>From a performance perspective, it is worth noting that the improvement is not necessarily on the read operations but more on the CPU and duration times.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-5-UDF-vs-UDF-inline-performance-e1583420400334.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-5-UDF-vs-UDF-inline-performance-e1583420400334.jpg" alt="156 - 5 - UDF vs UDF inline performance" width="1000" height="110" class="alignnone size-full wp-image-1527" /></a></p>
<p>But let’s push the tests further by increasing the amount of data. As a reminder, the performance of the test is tied to the number of UDF executions and, implicitly, to the number of records in the Inventory table. </p>
<p>So, let’s add a bunch of records to the Inventory table …</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">INSERT inventory (film_id, store_id, last_update)<br />
SELECT <br />
&nbsp; &nbsp; film_id,<br />
&nbsp; &nbsp; store_id,<br />
&nbsp; &nbsp; GETDATE()<br />
FROM inventory;</div></div>
<p>&#8230; and let&rsquo;s execute this script to get respectively a total of 146592 and 2345472 rows for each test. Here are the corresponding performance outcomes:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-6-UDF-vs-UDF-inline-performance-add-more-rows-e1583420498722.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-6-UDF-vs-UDF-inline-performance-add-more-rows-e1583420498722.jpg" alt="156 - 6 - UDF vs UDF inline performance - add more rows" width="1000" height="225" class="alignnone size-full wp-image-1528" /></a></p>
<p>I noticed that the more rows there are in the inventory table, the better the performance we get for each corresponding test:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-7-UDF-vs-UDF-inline-performance-chart-cpu.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-7-UDF-vs-UDF-inline-performance-chart-cpu.jpg" alt="156 - 7 - UDF vs UDF inline performance - chart cpu" width="876" height="384" class="alignnone size-full wp-image-1529" /></a></p>
<p>&#8230;</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-8-UDF-vs-UDF-inline-performance-chart-duration.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-8-UDF-vs-UDF-inline-performance-chart-duration.jpg" alt="156 - 8 - UDF vs UDF inline performance - chart duration" width="868" height="384" class="alignnone size-full wp-image-1530" /></a></p>
<p>Well, an interesting outcome without rewriting any code, isn’t it? An 80% decrease on average in query duration and 61% in CPU time. For the sake of curiosity, let’s take a look at the different query plans:</p>
<p><strong>Scalar UDF Inlining not enabled</strong></p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-10-UDF-more-rows-execution-plan-e1583420738157.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-10-UDF-more-rows-execution-plan-e1583420738157.jpg" alt="156 - 10 - UDF - more rows execution plan" width="1000" height="155" class="alignnone size-full wp-image-1531" /></a></p>
<p>Again, the real cost is hidden by the UDF black box behind the Compute Scalar operator, but we can easily guess that every row processed by the Compute Scalar operator invokes the dbo.inventory_in_stock() function. </p>
<p><strong>Scalar UDF Inlining enabled</strong></p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-11-UDF-inlining-more-rows-execution-plan-e1583420789863.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-11-UDF-inlining-more-rows-execution-plan-e1583420789863.jpg" alt="156 - 11 - UDF inlining - more rows execution plan" width="1000" height="255" class="alignnone size-full wp-image-1532" /></a></p>
<p>Without going into the details of the execution plan, something that draws attention is that query optimizer tricks kicked in, including parallelism. All the optimization work done by the query processor helps improve the overall performance of the query.</p>
<p>So, last point: does Scalar UDF Inlining scale better than the SQL view? </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/03/156-9-UDF-inline-vs-view-performance-e1583420871437.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/03/156-9-UDF-inline-vs-view-performance-e1583420871437.jpg" alt="156 - 9 - UDF inline vs view performance" width="1200" height="74" class="alignnone size-full wp-image-1533" /></a></p>
<p>This last output seems to confirm that the SQL view remains the winner among the alternatives in this specific scenario; you will have to choose the best solution, and likely the acceptable tradeoff, that fits your context.</p>
<p>See you!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SQL DB Azure, performance scaling thoughts</title>
		<link>https://blog.developpez.com/mikedavem/p13188/sql-azure/sql-db-azure-performance-scaling-thoughts</link>
		<comments>https://blog.developpez.com/mikedavem/p13188/sql-azure/sql-db-azure-performance-scaling-thoughts#comments</comments>
		<pubDate>Thu, 20 Feb 2020 21:09:54 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Azure]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[SQL Azure DB]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1490</guid>
		<description><![CDATA[Let’s continue with Azure stories and performance scaling &#8230; A couple of weeks ago, we studied opportunities to replace existing clustered indexes (CI) with columnstore indexes (CCI) for some facts. To cut the story short and to focus on the &#8230; <a href="https://blog.developpez.com/mikedavem/p13188/sql-azure/sql-db-azure-performance-scaling-thoughts">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Let’s continue with Azure stories and performance scaling &#8230;</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/02/155-0-banner-e1582232926354.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/02/155-0-banner-e1582232926354.jpg" alt="155 - 0 - banner" width="500" height="288" class="alignnone size-full wp-image-1507" /></a></p>
<p>A couple of weeks ago, we studied opportunities to replace existing clustered indexes (CI) with columnstore indexes (CCI) for some fact tables. To cut the story short and to focus on the right topic of this write-up, we prepared a creation script for specific CCIs based on a variation of <a href="http://www.nikoport.com/2014/04/16/clustered-columnstore-indexes-part-29-data-loading-for-better-segment-elimination/" rel="noopener" target="_blank">Niko&rsquo;s technique</a> (no MAXDOP = 1, meaning we enable parallelism) in order to get better segment alignment. </p>
<p><span id="more-1490"></span></p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Recreation of clustered index<br />
CREATE CLUSTERED INDEX [PK_FACT_IDX] <br />
ON dbo.FactTable (KeyColumn)<br />
WITH (DROP_EXISTING = ON, DATA_COMPRESSION = PAGE);<br />
<br />
-- Creation of the CCI<br />
CREATE CLUSTERED COLUMNSTORE INDEX [PK_FACT_IDX] <br />
ON dbo.FactTable <br />
WITH (DROP_EXISTING = ON);<br />
<br />
-- Recreation of [1 ... n] nonclustered indexes<br />
CREATE INDEX [IDX_xxx … n]<br />
ON dbo.FactTable (column)<br />
WITH (DROP_EXISTING = ON, DATA_COMPRESSION = PAGE);</div></div>
<p>Before deploying those indexes in our SQL DB Azure environment, we staged a first scenario in an on-premises instance, and the creation of all indexes took ~1h. It is worth noting that our tests are based on the same database with the same data in all cases. But guess what, the story was different in Azure <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" />: I got feedback from another team, responsible for deploying the indexes in Azure, that the creation script ran much longer (~4h).<br />
I definitely enjoyed this story because it gave us a deeper understanding of the DB Azure performance topic.</p>
<p><strong>=&gt; Does moving to the cloud mean slower performance? </strong></p>
<p>Before drawing conclusions too quickly, a good habit is to compare specifications between environments: it&rsquo;s not about comparing apples and oranges. Well, let&rsquo;s set my own context. On one side, the on-premises virtual SQL Server environment specification includes 8 vCPUs (Intel(R) Xeon(R) CPU E5-2670 v3 @ 2.30GHz), 64 GB of RAM and a high-performance storage array with micro-latency devices dedicated to our IO-intensive workloads. From the vendor specifications, we may expect very interesting IO performance, with a general throughput greater than 100 KIOPs (random) or 1GB/s (sequential). On the other side, the SQL DB Azure is based on the General Purpose: Serverless Gen5, 8 vCores service pricing tier. We use the vCore purchasing model and, referring to the <a href="https://docs.microsoft.com/bs-latn-ba/Azure/sql-database/sql-database-vcore-resource-limits-single-databases" rel="noopener" target="_blank">Microsoft documentation</a>, hardware generation 5 includes a compute specification based on Intel E5-2673 v4 (Broadwell) 2.3-GHz and Intel SP-8160 (Skylake) processors. Added to this, the service pricing tier comes with remote SSD-based storage, with IO latency around 5-7ms and 2560 IOPs max. Given the elasticity of the infrastructure, we could scale up to 16 vCores, 48GB of RAM and 5120 IOPs for data. Obviously, latency remains the same in this case.</p>
<p>As an illustration, the creation of all indexes (CI + CCI + NCIs) performed in our on-premises environment gave the following storage performance figures: ~700MB/s and 13K IOPs as maximum values, aggregating DATA + LOG activity on the D: drive. Rebuilding indexes is also a highly CPU-consuming operation, and we noticed CPU saturation at different steps of the operation.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/02/155-1-on-premises-storage-performance-e1582231532623.gif"><img src="http://blog.developpez.com/mikedavem/files/2020/02/155-1-on-premises-storage-performance-e1582231532623.gif" alt="155 - 1 - on-premises-storage-performance" width="900" height="448" class="alignnone size-full wp-image-1491" /></a></p>
<p>&#8230;</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/02/155-2-on-premises-cpu-performance-e1582231566692.gif"><img src="http://blog.developpez.com/mikedavem/files/2020/02/155-2-on-premises-cpu-performance-e1582231566692.gif" alt="155 - 2 - on-premises-cpu-performance" width="800" height="398" class="alignnone size-full wp-image-1492" /></a></p>
<p>As an aside, we may notice that the creation of the CCI is a less resource-intensive operation, and we see the same pattern in Azure below. Talking of which, let&rsquo;s compare with our SQL Azure DB. There are different ways to get performance metrics, including the portal, which enables monitoring performance through an easy-to-use interface, or per-database DMVs like sys.dm_db_resource_stats. It is worth noting that in SQL Azure DB metrics are expressed as a percentage of the service tier limit, so you need to adjust your analysis to the tier you&rsquo;re using. First, we observed the same resource utilization pattern for all steps of the creation script, but within a different timeline: duration increased to 4h (as mentioned by the other team). There is a clear picture of reaching the limit of the configured service tier, especially for Log IO (green line), even though we had already switched from the GP_S_Gen5_8 to the GP_S_Gen5_16 service tier.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/02/155-3-Az-CCI_Gen5_16_General_Purpose_CI_CCI_compressed_page-e1582231670221.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/02/155-3-Az-CCI_Gen5_16_General_Purpose_CI_CCI_compressed_page-e1582231670221.jpg" alt="155 - 3 - Az - CCI_Gen5_16_General_Purpose_CI_CCI_compressed_page" width="1200" height="278" class="alignnone size-full wp-image-1494" /></a></p>
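<p>As a minimal sketch, the DMV mentioned above can be queried directly from the user database; the columns below come from the documented view and return one row roughly every 15 seconds for the last hour:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Resource usage as a percentage of the service tier limits<br />
SELECT end_time,<br />
&nbsp;&nbsp;&nbsp;avg_cpu_percent,<br />
&nbsp;&nbsp;&nbsp;avg_data_io_percent,<br />
&nbsp;&nbsp;&nbsp;avg_log_write_percent<br />
FROM sys.dm_db_resource_stats<br />
ORDER BY end_time DESC;</div></div>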
<p>In addition, Wait stats gave interesting insights as well:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/02/155-5-wait_stats_CCI_index_Gen5_8_16_GP_CI_CCI_compressed_page_-e1582231763875.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/02/155-5-wait_stats_CCI_index_Gen5_8_16_GP_CI_CCI_compressed_page_-e1582231763875.jpg" alt="155 - 5 - wait_stats_CCI_index_Gen5_8_16_GP_CI_CCI_compressed_page_" width="1200" height="226" class="alignnone size-full wp-image-1496" /></a></p>
<p>Excluding the traditional PAGEIOLATCH_xx waits, the LOG_RATE_GOVERNOR wait type appeared in the top waits, confirming that we bumped into the limits imposed on transaction log I/O by our service tier.</p>
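<p>For reference, these wait statistics come from the database-scoped DMV; a minimal sketch to list the top waits (further filtering of benign wait types is left to taste):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Top waits for the current Azure SQL database since the last reset<br />
SELECT TOP (10) wait_type, wait_time_ms, waiting_tasks_count<br />
FROM sys.dm_db_wait_stats<br />
ORDER BY wait_time_ms DESC;</div></div>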
<p><strong>=&gt; Scaling vs Upgrading the Service for better performance?  </strong></p>
<p>With SQL DB Azure PaaS, we may benefit from an elastic architecture. Firstly, scaling the number of vCores is a factor of improvement, and there is a direct relationship with storage (IOPs), memory, or the disk space allocated to tempdb, for instance. But the order of magnitude varies with the service tier as shown below:</p>
<p>For General Purpose ServerLess Generation 5 service tier &#8211; Resources per Core</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/02/155-6-Gen5_8_16_GP_service_tier_perf_.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/02/155-6-Gen5_8_16_GP_service_tier_perf_.jpg" alt="155 - 6 - Gen5_8_16_GP_service_tier_perf_" width="1002" height="175" class="alignnone size-full wp-image-1499" /></a></p>
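<p>Switching between the service objectives listed above can be scripted as well; a sketch, assuming a database named [MyDWDatabase] (hypothetical name):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Scale the serverless General Purpose tier from 8 to 16 vCores<br />
ALTER DATABASE [MyDWDatabase]<br />
MODIFY (SERVICE_OBJECTIVE = 'GP_S_Gen5_16');</div></div>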
<p>Something relevant here: even if performance increases with the number of vCores provisioned, the Log IO saturation observed in our Azure test (especially during the first step of the CI creation) results from the max log rate limitation, which doesn&rsquo;t scale in the same way. This is especially relevant here because, as said previously, index creation can be a resource-intensive operation with a huge impact on the transaction log.</p>
<p><strong>What would be a solution to speed up this operation? </strong></p>
<p>A first viable solution in our context would be to switch to the SIMPLE recovery model, which fits perfectly with our scenario: we could get minimally-logged capabilities and a lower impact on the transaction log, and it is suitable for DW environments. Unfortunately, at the time of this write-up, this is not supported, and I suggest you vote on <a href="https://feedback.azure.com/forums/217321-sql-database/suggestions/36400585-allow-recovery-model-to-be-changed-to-simple-in-az" rel="noopener" target="_blank">feedback Azure</a> if you are interested.<br />
From an infrastructure standpoint, improving the max log rate throughput is only possible by upgrading to a higher service tier (at the cost of higher fees, obviously). For the sake of curiosity, I did a try with the <strong>BC_Gen5_16</strong> service tier specifications:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/02/155-6-Gen5_8_16_BC_service_tier_perf_.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/02/155-6-Gen5_8_16_BC_service_tier_perf_.jpg" alt="155 - 6 - Gen5_8_16_BC_service_tier_perf_" width="1002" height="175" class="alignnone size-full wp-image-1500" /></a></p>
<p>Even if this new service tier seems to be a better fit (as suggested by the relative percentage of resource usage) &#8230;</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/02/155-4-CCI_index_Gen5_16_Business_Critical_CI_CCI_compressed_page_-e1582232230338.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/02/155-4-CCI_index_Gen5_16_Business_Critical_CI_CCI_compressed_page_-e1582232230338.jpg" alt="155 - 4 - CCI_index_Gen5_16_Business_Critical_CI_CCI_compressed_page_" width="1200" height="203" class="alignnone size-full wp-image-1501" /></a></p>
<p>… there are important notes here:</p>
<p>1) Business Critical Tier is not available for Serverless architecture</p>
<p>2) Moving to a different service tier is not instantaneous and may require several hours depending on the database size (~3h for a total database size of ~500GB in my case). Well, this is not a viable option even if we get better performance. Indeed, we have to add the time to upgrade to a higher service tier (3h) to the time to run the creation script (3h, a 25% performance gain compared to the previous GP_S_Gen5_16 service tier). We may obviously upgrade again to reach performance closer to our on-premises environment, but is it worth fighting here only for an index creation script? </p>
<p>Concerning our scenario (Data Warehouse), it is generally easy to schedule an off-peak time frame that doesn&rsquo;t overlap with the processing-oriented workload, but that may not be the case for everyone!  </p>
<p>See you!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Experimenting with updating statistics on a big table through unusual ways</title>
		<link>https://blog.developpez.com/mikedavem/p13167/sql-server-2014/experimentation-dune-mise-a-jour-de-statistiques-sur-une-grosse-table-par-des-voies-detournees</link>
		<comments>https://blog.developpez.com/mikedavem/p13167/sql-server-2014/experimentation-dune-mise-a-jour-de-statistiques-sur-une-grosse-table-par-des-voies-detournees#comments</comments>
		<pubDate>Thu, 25 Jan 2018 06:52:08 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Server 2014]]></category>
		<category><![CDATA[SQL Server 2016]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[sqlserver]]></category>
		<category><![CDATA[statistiques]]></category>
		<category><![CDATA[TF7471]]></category>
		<category><![CDATA[update statistic]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1375</guid>
		<description><![CDATA[This is my first blog post of 2018, and my first in quite a while. Indeed, last year I put all my energy into refreshing my Linux knowledge in line with Microsoft&#8217;s new Open Source strategy. But at the same time, I &#8230; <a href="https://blog.developpez.com/mikedavem/p13167/sql-server-2014/experimentation-dune-mise-a-jour-de-statistiques-sur-une-grosse-table-par-des-voies-detournees">Read more <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>This is my first blog post of 2018, and my first in quite a while. Indeed, last year I put all my energy into refreshing my Linux knowledge in line with Microsoft&rsquo;s new Open Source strategy. But at the same time, I carried out a number of interesting tasks for some customers, and here is one to start this new year. In this post, I would like to highlight a particular (in my opinion) approach to optimizing a statistics update on a big table.</p>
<p>&gt; <a href="https://blog.dbi-services.com/experiencing-updating-statistics-on-a-big-table-by-unusual-ways/" rel="noopener" target="_blank">Lire la suite</a> (en anglais)</p>
<p>David Barbarin<br />
MVP &amp; MCM SQL Server</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AlwaysOn availability groups and a statistics issue on secondary replicas</title>
		<link>https://blog.developpez.com/mikedavem/p13130/sql-server-2012/groupes-de-disponibilites-alwayson-and-probleme-de-statistique-sur-les-secondaires</link>
		<comments>https://blog.developpez.com/mikedavem/p13130/sql-server-2012/groupes-de-disponibilites-alwayson-and-probleme-de-statistique-sur-les-secondaires#comments</comments>
		<pubDate>Sun, 15 Jan 2017 15:30:34 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Server 2012]]></category>
		<category><![CDATA[SQL Server 2014]]></category>
		<category><![CDATA[SQL Server 2016]]></category>
		<category><![CDATA[availability groups]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[réplica en lecture seule]]></category>
		<category><![CDATA[reporting]]></category>
		<category><![CDATA[secondary replicas]]></category>
		<category><![CDATA[statistiques]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1289</guid>
		<description><![CDATA[I would like to share with you an interesting statistics issue you may encounter with read-only replicas in an availability group infrastructure. For those who use them for reporting purposes, keep reading &#8230; <a href="https://blog.developpez.com/mikedavem/p13130/sql-server-2012/groupes-de-disponibilites-alwayson-and-probleme-de-statistique-sur-les-secondaires">Read more <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I would like to share with you an interesting statistics issue you may encounter with read-only replicas in an availability group infrastructure. For those who use them for reporting purposes, keep reading this post: it concerns a statistics update behavior on those replicas that can lead to cardinality estimation issues, with potentially serious consequences for the performance of your queries.</p>
<p>&gt; <a href="http://blog.dbi-services.com/sql-server-alwayson-availability-groups-and-statistic-issues-on-secondaries/" target="_blank">Lire la suite</a> (en anglais)</p>
<p>David Barbarin<br />
MVP &amp; MCM SQL Server</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
