<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David Barbarin &#187; SQL Server</title>
	<atom:link href="https://blog.developpez.com/mikedavem/ptag/sql-server/feed" rel="self" type="application/rss+xml" />
	<link>https://blog.developpez.com/mikedavem</link>
	<description>MVP DataPlatform - MCM SQL Server</description>
	<lastBuildDate>Thu, 09 Sep 2021 21:19:50 +0000</lastBuildDate>
	<language>fr-FR</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.42</generator>
	<item>
		<title>Graphing SQL Server wait stats on Prometheus and Grafana</title>
		<link>https://blog.developpez.com/mikedavem/p13209/devops/graphing-sql-server-wait-stats-on-prometheus-and-grafana</link>
		<comments>https://blog.developpez.com/mikedavem/p13209/devops/graphing-sql-server-wait-stats-on-prometheus-and-grafana#comments</comments>
		<pubDate>Thu, 09 Sep 2021 21:19:22 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[DevOps]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[grafana]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[observability]]></category>
		<category><![CDATA[prometheus]]></category>
		<category><![CDATA[prompQL]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[telegraf]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1816</guid>
		<description><![CDATA[Wait stats are essential performance metrics for diagnosing SQL Server Performance problems. Related metrics can be monitored from different DMVs including sys.dm_os_wait_stats and sys.dm_db_wait_stats (Azure). As you probably know, there are 2 categories of DMVs in SQL Server: Point in &#8230; <a href="https://blog.developpez.com/mikedavem/p13209/devops/graphing-sql-server-wait-stats-on-prometheus-and-grafana">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Wait stats are essential performance metrics for diagnosing SQL Server Performance problems. Related metrics can be monitored from different DMVs including sys.dm_os_wait_stats and sys.dm_db_wait_stats (Azure).</p>
<p>As you probably know, there are two categories of DMVs in SQL Server: point-in-time versus cumulative, and the DMVs mentioned above are in the second category. Data in these DMVs is cumulative, incremented every time wait events occur, and values reset only when SQL Server restarts or when you intentionally run the DBCC SQLPERF command. Baselining these metrics requires taking snapshots to compare day-to-day activity, or simply to track trends over a given timeline. Paul Randal kindly provided a T-SQL script for trend analysis over a specified time range in this <a href="https://www.sqlskills.com/blogs/paul/capturing-wait-statistics-period-time/" rel="noopener" target="_blank">blog post</a>. The interesting part of this script is its focus on the most relevant wait types and their corresponding statistics. This is basically the kind of script I used for many years when performing SQL Server audits at customer sites, but today, working as a database administrator for a company, I can rely on our observability stack (Telegraf, Prometheus and Grafana) to do the job.</p>
<p><span id="more-1816"></span></p>
<p>In a previous <a href="https://blog.developpez.com/mikedavem/p13203/sql-server-2014/why-we-moved-sql-server-monitoring-on-prometheus-and-grafana" rel="noopener" target="_blank">write-up</a>, I explained the choice of such a platform for SQL Server. Transposing the logic of Paul&rsquo;s script to Prometheus and Grafana was not trivial, but the result was worth it. It is an interesting topic that I want to share with Ops engineers and DBAs who want to baseline SQL Server telemetry on a Prometheus and Grafana observability platform.  </p>
<p>So, let&rsquo;s start with the metrics provided by the Telegraf collector agent and then scraped by the Prometheus job:<br />
&#8211;	sqlserver_waitstats_wait_time_ms<br />
&#8211;	sqlserver_waitstats_waiting_tasks_count<br />
&#8211;	sqlserver_waitstats_resource_wait_time_ms<br />
&#8211;	sqlserver_waitstats_signal_wait_time_ms</p>
<p>In the context of this blog post we will focus only on the first two metrics of the list above, but the same logic applies to the others. </p>
<p>As a reminder, we want to graph the most relevant wait types and their average value within a time range specified in a Grafana dashboard. This is in fact a two-step process: </p>
<p>1) Identify the most relevant wait types by computing their ratio against the total amount of wait time within the specified time range.<br />
2) Graph in Grafana these most relevant wait types with their corresponding average value for every Prometheus step in the time range.</p>
<p>To address the first point, we need to rely on the special Prometheus <a href="https://prometheus.io/docs/prometheus/latest/querying/functions/#rate" rel="noopener" target="_blank">rate()</a> function and the <a href="https://prometheus.io/docs/prometheus/latest/querying/operators/" rel="noopener" target="_blank">group_left</a> modifier. </p>
<p>As per the Prometheus documentation, rate() gives the per-second average rate of change over the specified range interval, using the metric points at its boundaries. That is exactly what we need to compute the total average of wait time (in ms) per wait type in a specified time range. rate() takes a range vector as input. Let&rsquo;s illustrate what a range vector is with the following example. For the sake of simplicity, I filtered the sqlserver_waitstats_wait_time_ms metric down to one specific SQL Server instance and wait type (PAGEIOLATCH_EX). A range vector is expressed with a range interval at the end of the query, as you can see below:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">sqlserver_waitstats_wait_time_ms{sql_instance=&quot;$Instance&quot;,wait_type=&quot;PAGEIOLATCH_EX&quot;}[1m]</div></div>
<p>The result is a set of data points within the specified range interval, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-range-vector.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-range-vector.png" alt="blog 177 - range vector" width="238" height="256" class="alignnone size-full wp-image-1818" /></a></p>
<p>For each data point we get the value and the corresponding timestamp in epoch format. You can convert this epoch format to a user-friendly one by using <strong>date -r</strong>, for example. Another important point here: the sqlserver_waitstats_wait_time_ms metric is a counter in the Prometheus world because its value keeps increasing over time, as you can see above (from top to bottom). The same concept exists in SQL Server with the cumulative DMV category explained at the beginning. This is why we need the rate() function to draw the right representation of the rate of increase over time between data points. We got 12 data points with an interval of 5s between each value, because in my context we defined a Prometheus scrape interval of 5s for SQL Server =&gt; 60s/5s = 12 data points and 11 steps. The next question is how rate() calculates the per-second rate of change between data points. Referring to my previous example, I can get the rate value with the following PromQL query:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">rate(sqlserver_waitstats_wait_time_ms{sql_instance=&quot;$Instance&quot;,wait_type=&quot;PAGEIOLATCH_EX&quot;}[1m])</div></div>
<p>&#8230; and the corresponding value:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-rate-value.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-rate-value.png" alt="blog 177 - rate value" width="211" height="67" class="alignnone size-full wp-image-1820" /></a></p>
<p>To understand this value, let&rsquo;s go back to a school math reminder: <a href="https://en.wikipedia.org/wiki/Slope" rel="noopener" target="_blank">slope calculation</a>. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/Tangent_function_animation.gif"><img src="http://blog.developpez.com/mikedavem/files/2021/09/Tangent_function_animation.gif" alt="Tangent_function_animation" width="300" height="285" class="alignnone size-full wp-image-1823" /></a></p>
<p><em>Image from Wikipedia</em></p>
<p>The basic idea of the slope is to find the rate of change of one variable compared to another. The smaller the distance between two data points, the more precise the approximation of the slope. And this is exactly what happens with Prometheus when you zoom in or out by changing the range interval. A good resolution is also determined by the Prometheus scrape interval, especially when your metrics are very volatile. This is something to keep in mind with Prometheus: we are working with approximations by design. So let&rsquo;s do some math with a slope calculation on the above range vector:</p>
<p>Slope = DV/DT = (332628-332582)/(@1631125796.971 &#8211; @1631125746.962) =~ 0.83</p>
<p>Excellent! This is how rate() works, and the beauty of this function is that the slope calculation is done automatically for all the steps within the range interval.</p>
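<p>As a rough Python sketch, the per-second rate between the boundary points of a range vector is just a slope. The sample values below are hypothetical, and real rate() additionally extrapolates to the window boundaries, so its output can differ slightly from this naive calculation:</p>

```python
# Naive sketch of how rate() derives a per-second rate from a range
# vector: the slope between the first and last (timestamp, value)
# samples. Data is hypothetical; Prometheus additionally extrapolates
# to the window boundaries.

def simple_rate(samples):
    """samples: list of (unix_timestamp, counter_value), oldest first."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# 12 points scraped every 5s, the counter growing by 10 ms of wait per scrape
samples = [(1631125746 + 5 * i, 332000 + 10 * i) for i in range(12)]
print(simple_rate(samples))  # 10 ms of wait per 5s step -> 2.0 ms/s
```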
<p>But let’s go back to the initial requirement. We need to calculate per wait type the average value of wait time between the first and last point in the specified range vector. We can now step further by using Prometheus aggregation operator as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">sum by (wait_type) (rate(sqlserver_waitstats_wait_time_ms{sql_instance=&quot;$Instance&quot;}[1m]))</div></div>
<p>Please note we could have written this another way, without the sum by aggregator, but sum by naturally excludes all unwanted labels from the result metric, which will be particularly helpful for the next part. Here is a sample of the output:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-aggregation-by-waittype.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-aggregation-by-waittype-1024x145.png" alt="blog 177 - aggregation by waittype" width="584" height="83" class="alignnone size-large wp-image-1826" /></a></p>
<p>Then we can compute the ratio (or percentage) per label (wait type). A first, naïve attempt could be as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">sum by (wait_type) (rate(sqlserver_waitstats_wait_time_ms{sql_instance=&quot;$Instance&quot;}[1m]))/ sum(rate(sqlserver_waitstats_wait_time_ms{sql_instance='$Instance'}[1m]))</div></div>
<p>But we get an empty query result. Bad joke, right? Let&rsquo;s understand why. </p>
<p>The first part of the query gives the total amount of wait time per wait type. I put a sample of the results here for simplicity:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-aggregation-by-waittype1.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-aggregation-by-waittype1-1024x145.png" alt="blog 177 - aggregation by waittype" width="584" height="83" class="alignnone size-large wp-image-1828" /></a></p>
<p>This results in a new set of metrics with only one label, wait_type. The second part gives the total amount of wait time for all wait types, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-total-waits.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-total-waits.png" alt="blog 177 - total waits" width="479" height="39" class="alignnone size-full wp-image-1829" /></a></p>
<p>With a SQL statement, we instinctively join columns that have matching values in the tables concerned, often through primary or foreign keys. In the Prometheus world, vector matching works the same way, using all labels as the starting point. Samples are then selected or dropped from the result vector based on the &laquo;&nbsp;ignoring&nbsp;&raquo; and &laquo;&nbsp;on&nbsp;&raquo; keywords. In my case, there are no matching labels, so we must tell Prometheus to ignore the remaining label (wait_type) on the first part of the query:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">sum by (wait_type) (rate(sqlserver_waitstats_wait_time_ms{sql_instance=&quot;$Instance&quot;}[1m]))/ ignoring(wait_type) sum(rate(sqlserver_waitstats_wait_time_ms{sql_instance='$Instance'}[1m]))</div></div>
<p>But another error message &#8230;</p>
<p><strong>Error executing query: multiple matches for labels: many-to-one matching must be explicit (group_left/group_right)</strong></p>
<p>In many-to-one or one-to-many vector matching with Prometheus, samples are selected using the group_left or group_right keywords. In other words, with this final query we are telling Prometheus to perform a cross join before dividing the values:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">sum by (wait_type) (rate(sqlserver_waitstats_wait_time_ms{sql_instance=&quot;$Instance&quot;}[1m]))/ ignoring(wait_type) group_left sum(rate(sqlserver_waitstats_wait_time_ms{sql_instance='$Instance'}[1m]))</div></div>
<p>Here we go!</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-ratio-per-label.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-ratio-per-label-1024x149.png" alt="blog 177 - ratio per label" width="584" height="85" class="alignnone size-large wp-image-1830" /></a></p>
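<p>The label arithmetic can be mirrored with a small Python sketch (the rate values below are hypothetical): each per-wait-type sum is divided by the single grand total, which is exactly the division that ignoring(wait_type) group_left lets Prometheus perform:</p>

```python
# Sketch of the ratio computation: per-wait-type rate divided by the
# total rate across all wait types. The rate values are hypothetical.
rates = {
    "PAGEIOLATCH_EX": 0.83,
    "CXPACKET": 2.49,
    "SOS_SCHEDULER_YIELD": 4.98,
}

total = sum(rates.values())          # one scalar, like the right-hand query
ratios = {wt: r / total for wt, r in rates.items()}

for wt, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{wt}: {ratio:.0%}")
```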
<p>We finally managed to calculate the ratio per wait type for a specified range interval. The last step is to select the most relevant wait types by first excluding the irrelevant ones. Most of the excluded wait types come from the exclusion list provided by Paul Randal&rsquo;s script. We also decided to focus only on the top 5 wait types with a ratio &gt; 10%, but it is up to you to change these values:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">topk(5, sum by (wait_type) (rate(sqlserver_waitstats_wait_time_ms{sql_instance='$Instance',measurement_db_type=&quot;SQLServer&quot;,wait_type!~'(BROKER_EVENTHANDLER|BROKER_RECEIVE_WAITFOR|BROKER_TASK_STOP|BROKER_TO_FLUSH|BROKER_TRANSMITTER|CHECKPOINT_QUEUE|CHKPT|CLR_AUTO_EVENT|CLR_MANUAL_EVENT|CLR_SEMAPHORE|DBMIRROR_DBM_EVENT|DBMIRROR_EVENTS_QUEUE|DBMIRROR_WORKER_QUEUE|DBMIRRORING_CMD|DIRTY_PAGE_POLL|DISPATCHER_QUEUE_SEMAPHORE|EXECSYNC|FSAGENT|FT_IFTS_SCHEDULER_IDLE_WAIT|FT_IFTSHC_MUTEX|KSOURCE_WAKEUP|LAZYWRITER_SLEEP|LOGMGR_QUEUE|MEMORY_ALLOCATION_EXT|ONDEMAND_TASK_QUEUE|PARALLEL_REDO_DRAIN_WORKER|PARALLEL_REDO_LOG_CACHE|PARALLEL_REDO_TRAN_LIST|PARALLEL_REDO_WORKER_SYNC|PARALLEL_REDO_WORKER_WAIT_WORK|PREEMPTIVE_OS_FLUSHFILEBUFFERS|PREEMPTIVE_XE_GETTARGETSTATE|PWAIT_ALL_COMPONENTS_INITIALIZED|PWAIT_DIRECTLOGCONSUMER_GETNEXT|QDS_PERSIST_TASK_MAIN_LOOP_SLEEP|QDS_ASYNC_QUEUE|QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP|QDS_SHUTDOWN_QUEUE|REDO_THREAD_PENDING_WORK|REQUEST_FOR_DEADLOCK_SEARCH|RESOURCE_QUEUE|SERVER_IDLE_CHECK|SLEEP_BPOOL_FLUSH|SLEEP_DBSTARTUP|SLEEP_DCOMSTARTUP|SLEEP_MASTERDBREADY|SLEEP_MASTERMDREADY|SLEEP_MASTERUPGRADED|SLEEP_MSDBSTARTUP|SLEEP_SYSTEMTASK|SLEEP_TASK|SLEEP_TEMPDBSTARTUP|SNI_HTTP_ACCEPT|SOS_WORK_DISPATCHER|SP_SERVER_DIAGNOSTICS_SLEEP|SQLTRACE_BUFFER_FLUSH|SQLTRACE_INCREMENTAL_FLUSH_SLEEP|SQLTRACE_WAIT_ENTRIES|VDI_CLIENT_OTHER|WAIT_FOR_RESULTS|WAITFOR|WAITFOR_TASKSHUTDOWN|WAIT_XTP_RECOVERY|WAIT_XTP_HOST_WAIT|WAIT_XTP_OFFLINE_CKPT_NEW_LOG|WAIT_XTP_CKPT_CLOSE|XE_DISPATCHER_JOIN|XE_DISPATCHER_WAIT|XE_TIMER_EVENT|MEMORY_ALLOCATION_EXT|ONDEMAND_TASK_QUEUE|PREEMPTIVE_HADR_LEASE_MECHANISM|PREEMPTIVE_SP_SERVER_DIAGNOSTICS|PREEMPTIVE_ODBCOPS|PREEMPTIVE_OS_LIBRARYOPS|PREEMPTIVE_OS_COMOPS|PREEMPTIVE_OS_CRYPTOPS|PREEMPTIVE_OS_PIPEOPS|PREEMPTIVE_OS_AUTHENTICATIONOPS|PREEMPTIVE_OS_GENERICOPS|PREEMPTIVE_OS_VERIFYTRUST|PREEMPTIVE_OS_FILEOPS|PREEMPTIVE_OS_DEVICEOPS|PREEMPTIVE_OS_QUERYREGISTRY|PREEMPTIVE_OS_WRITEFILE|PREEMPTIVE_XE_CALLBACKEXECUTE|PREEMPTIVE_XE_DISPATCHER|PREEMPTIVE_XE_GETTARGETSTATE|PREEMPTIVE_XE_SESSIONCOMMIT|PREEMPTIVE_XE_TARGETINIT|PREEMPTIVE_XE_TARGETFINALIZE|PREEMPTIVE_XHTTP|PWAIT_EXTENSIBILITY_CLEANUP_TASK|PREEMPTIVE_OS_DISCONNECTNAMEDPIPE|PREEMPTIVE_OS_DELETESECURITYCONTEXT|PREEMPTIVE_OS_CRYPTACQUIRECONTEXT|PREEMPTIVE_HTTP_REQUEST|RESOURCE_GOVERNOR_IDLE|HADR_FABRIC_CALLBACK|PVS_PREALLOCATE)'}[1m])) / ignoring(wait_type) group_left sum(rate(sqlserver_waitstats_wait_time_ms{sql_instance='$Instance',measurement_db_type=&quot;SQLServer&quot;,wait_type!~'(BROKER_EVENTHANDLER|BROKER_RECEIVE_WAITFOR|BROKER_TASK_STOP|BROKER_TO_FLUSH|BROKER_TRANSMITTER|CHECKPOINT_QUEUE|CHKPT|CLR_AUTO_EVENT|CLR_MANUAL_EVENT|CLR_SEMAPHORE|DBMIRROR_DBM_EVENT|DBMIRROR_EVENTS_QUEUE|DBMIRROR_WORKER_QUEUE|DBMIRRORING_CMD|DIRTY_PAGE_POLL|DISPATCHER_QUEUE_SEMAPHORE|EXECSYNC|FSAGENT|FT_IFTS_SCHEDULER_IDLE_WAIT|FT_IFTSHC_MUTEX|KSOURCE_WAKEUP|LAZYWRITER_SLEEP|LOGMGR_QUEUE|MEMORY_ALLOCATION_EXT|ONDEMAND_TASK_QUEUE|PARALLEL_REDO_DRAIN_WORKER|PARALLEL_REDO_LOG_CACHE|PARALLEL_REDO_TRAN_LIST|PARALLEL_REDO_WORKER_SYNC|PARALLEL_REDO_WORKER_WAIT_WORK|PREEMPTIVE_OS_FLUSHFILEBUFFERS|PREEMPTIVE_XE_GETTARGETSTATE|PWAIT_ALL_COMPONENTS_INITIALIZED|PWAIT_DIRECTLOGCONSUMER_GETNEXT|QDS_PERSIST_TASK_MAIN_LOOP_SLEEP|QDS_ASYNC_QUEUE|QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP|QDS_SHUTDOWN_QUEUE|REDO_THREAD_PENDING_WORK|REQUEST_FOR_DEADLOCK_SEARCH|RESOURCE_QUEUE|SERVER_IDLE_CHECK|SLEEP_BPOOL_FLUSH|SLEEP_DBSTARTUP|SLEEP_DCOMSTARTUP|SLEEP_MASTERDBREADY|SLEEP_MASTERMDREADY|SLEEP_MASTERUPGRADED|SLEEP_MSDBSTARTUP|SLEEP_SYSTEMTASK|SLEEP_TASK|SLEEP_TEMPDBSTARTUP|SNI_HTTP_ACCEPT|SOS_WORK_DISPATCHER|SP_SERVER_DIAGNOSTICS_SLEEP|SQLTRACE_BUFFER_FLUSH|SQLTRACE_INCREMENTAL_FLUSH_SLEEP|SQLTRACE_WAIT_ENTRIES|VDI_CLIENT_OTHER|WAIT_FOR_RESULTS|WAITFOR|WAITFOR_TASKSHUTDOWN|WAIT_XTP_RECOVERY|WAIT_XTP_HOST_WAIT|WAIT_XTP_OFFLINE_CKPT_NEW_LOG|WAIT_XTP_CKPT_CLOSE|XE_DISPATCHER_JOIN|XE_DISPATCHER_WAIT|XE_TIMER_EVENT|MEMORY_ALLOCATION_EXT|ONDEMAND_TASK_QUEUE|PREEMPTIVE_HADR_LEASE_MECHANISM|PREEMPTIVE_SP_SERVER_DIAGNOSTICS|PREEMPTIVE_ODBCOPS|PREEMPTIVE_OS_LIBRARYOPS|PREEMPTIVE_OS_COMOPS|PREEMPTIVE_OS_CRYPTOPS|PREEMPTIVE_OS_PIPEOPS|PREEMPTIVE_OS_AUTHENTICATIONOPS|PREEMPTIVE_OS_GENERICOPS|PREEMPTIVE_OS_VERIFYTRUST|PREEMPTIVE_OS_FILEOPS|PREEMPTIVE_OS_DEVICEOPS|PREEMPTIVE_OS_QUERYREGISTRY|PREEMPTIVE_OS_WRITEFILE|PREEMPTIVE_XE_CALLBACKEXECUTE|PREEMPTIVE_XE_DISPATCHER|PREEMPTIVE_XE_GETTARGETSTATE|PREEMPTIVE_XE_SESSIONCOMMIT|PREEMPTIVE_XE_TARGETINIT|PREEMPTIVE_XE_TARGETFINALIZE|PREEMPTIVE_XHTTP|PWAIT_EXTENSIBILITY_CLEANUP_TASK|PREEMPTIVE_OS_DISCONNECTNAMEDPIPE|PREEMPTIVE_OS_DELETESECURITYCONTEXT|PREEMPTIVE_OS_CRYPTACQUIRECONTEXT|PREEMPTIVE_HTTP_REQUEST|RESOURCE_GOVERNOR_IDLE|HADR_FABRIC_CALLBACK|PVS_PREALLOCATE)'}[1m]))) &gt;= 0.1</div></div>
<p>I got 3 relevant wait types with their corresponding ratio in the specified time range.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-ratio-per-label-top-5.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-ratio-per-label-top-5-1024x67.png" alt="blog 177 - ratio per label top 5" width="584" height="38" class="alignnone size-large wp-image-1832" /></a></p>
<p>Pretty cool stuff, but we must now go through the second requirement: graphing the average value of the identified wait types within a specified time range in a Grafana dashboard. The first step consists in including the above Prometheus query as a variable in the Grafana dashboard. Here is how I set up my Top5Waits variable in Grafana:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-granafa-top5waits.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-granafa-top5waits-1024x501.png" alt="blog 177 - granafa top5waits" width="584" height="286" class="alignnone size-large wp-image-1833" /></a></p>
<p>Some interesting points here: variable dependency kicks in, as my $Top5Waits variable depends hierarchically on another $Instance variable in my dashboard (populated by another Prometheus query). You have probably noticed the use of [${__range_s}s] to determine the range interval, but depending on your context, the Grafana $__interval variable may be a good fit as well. </p>
<p>In turn, $Top5Waits can be used from another query, this time directly in a Grafana dashboard panel, to show the average value of the most relevant wait types as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-grafana-avg-wait-stats.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-grafana-avg-wait-stats-1024x400.png" alt="blog 177 - grafana avg wait stats" width="584" height="228" class="alignnone size-large wp-image-1834" /></a></p>
<p>Calculating the wait type average is not a hard task by itself. In fact, we can apply the same method as previously, by matching the sqlserver_waitstats_wait_time_ms and sqlserver_waitstats_waiting_tasks_count metrics and dividing their corresponding values to obtain the average wait time (in ms) for each step within the time range (remember how the rate() function works). Both metrics share the same set of labels, so we don&rsquo;t need the &laquo;&nbsp;on&nbsp;&raquo; or &laquo;&nbsp;ignoring&nbsp;&raquo; keywords in this case. But we must introduce the $Top5Waits variable in the label filter of the first metric as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">rate(sqlserver_waitstats_wait_time_ms{sql_instance='$Instance',wait_type=~&quot;$Top5Waits&quot;,measurement_db_type=&quot;SQLServer&quot;}[$__rate_interval])/rate(sqlserver_waitstats_waiting_tasks_count{sql_instance='$Instance',wait_type=~&quot;$Top5Waits&quot;,measurement_db_type=&quot;SQLServer&quot;}[$__rate_interval])</div></div>
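<p>In Python terms, with hypothetical per-second rates, this division pairs each wait type&rsquo;s two rates and yields an average wait per task in milliseconds:</p>

```python
# Sketch of the average-wait computation: for each wait type, divide
# the rate of accumulated wait time (ms/s) by the rate of waiting
# tasks (tasks/s) to get the average wait per task in ms.
# All values are hypothetical.
wait_time_ms_rate = {"PAGEIOLATCH_EX": 40.0, "WRITELOG": 9.0}
waiting_tasks_rate = {"PAGEIOLATCH_EX": 8.0, "WRITELOG": 3.0}

avg_wait_ms = {
    wt: wait_time_ms_rate[wt] / waiting_tasks_rate[wt]
    for wt in wait_time_ms_rate
}
print(avg_wait_ms)  # {'PAGEIOLATCH_EX': 5.0, 'WRITELOG': 3.0}
```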
<p>We finally managed to get an interesting dynamic measurement of SQL Server wait stats telemetry. Hope this blog post helps!<br />
Let me know your feedback if you are using SQL Server wait stats in Prometheus and Grafana in a different way!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating dynamic Grafana dashboard for SQL Server</title>
		<link>https://blog.developpez.com/mikedavem/p13207/sql-server-2008-r2/creating-dynamic-grafana-dashboard-for-sql-server</link>
		<comments>https://blog.developpez.com/mikedavem/p13207/sql-server-2008-r2/creating-dynamic-grafana-dashboard-for-sql-server#comments</comments>
		<pubDate>Sun, 11 Apr 2021 19:52:09 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[DevOps]]></category>
		<category><![CDATA[SQL Server 2008 R2]]></category>
		<category><![CDATA[SQL Server 2014]]></category>
		<category><![CDATA[SQL Server 2016]]></category>
		<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[SQL Server 2019]]></category>
		<category><![CDATA[AlwaysOn;groupes de disponibilité;availability groups]]></category>
		<category><![CDATA[grafana]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[observability]]></category>
		<category><![CDATA[prometheus]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1784</guid>
		<description><![CDATA[A couple of months ago I wrote about “Why we moved SQL Server monitoring to Prometheus and Grafana”. I talked about the creation of two dashboards. The first one is blackbox monitoring-oriented and aims to spot in (near) real-time resource &#8230; <a href="https://blog.developpez.com/mikedavem/p13207/sql-server-2008-r2/creating-dynamic-grafana-dashboard-for-sql-server">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>A couple of months ago I wrote about “<a href="https://blog.developpez.com/mikedavem/p13203/sql-server-2014/why-we-moved-sql-server-monitoring-on-prometheus-and-grafana" rel="noopener" target="_blank">Why we moved SQL Server monitoring to Prometheus and Grafana</a>”. I talked about the creation of two dashboards. The first one is blackbox monitoring-oriented and aims to spot in (near) real-time resource pressure / saturation issues with self-explained gauges, numbers and colors indicating healthy (green) or unhealthy resources (orange / red). We also include availability group synchronization health metric in the dashboard. We will focus on it in this write-up.</p>
<p><span id="more-1784"></span></p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-1-mssql-dashboard.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-1-mssql-dashboard-1024x158.jpg" alt="174 - 1 - mssql dashboard" width="584" height="90" class="alignnone size-large wp-image-1785" /></a></p>
<p>As a reminder, this Grafana dashboard gets its information from the Prometheus server and metrics related to MSSQL environments. For the sake of clarity, in this dashboard an environment is one availability group with a set of 2 AG replicas (A and B) in synchronous replication mode. In other words, the <strong>ENV1</strong> value corresponds to the availability group name and to the SQL instance names that are members of the AG: <strong>SERVERA\ENV1</strong> (first replica) and <strong>SERVERB\ENV1</strong> (second replica). </p>
<p>In the picture above, you can notice 2 sections. One is for availability group health monitoring, and the second includes a set of black-box metrics related to saturation and latency (CPU, RAM, network, AG replication delay, SQL buffer pool, blocked processes &#8230;). Good job for a single environment, but what if I want to bring more availability groups and SQL instances into the game?</p>
<p>The first and easiest (or naïve) way we went through when we started writing this dashboard was to copy / paste all the panels created for one environment, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-2-mssql-dashboard-static.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-2-mssql-dashboard-static-1024x242.jpg" alt="174 - 2 - mssql dashboard static" width="584" height="138" class="alignnone size-large wp-image-1786" /></a></p>
<p>After creating a new row (which can be thought of as a section in the present context) at the bottom, all panels were copied from ENV1 to the fresh new ENV2 section. A new row is created by converting a new panel into a row, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-3-convert-panel-to-row.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-3-convert-panel-to-row-1024x199.jpg" alt="174 - 3 - convert panel to row" width="584" height="113" class="alignnone size-large wp-image-1787" /></a></p>
<p>Then I needed to manually modify ALL the new metrics for the new environment. Let&rsquo;s illustrate the point with the Batch Requests/sec metric as an example. The corresponding Prometheus query for the first replica (A) is as follows (the initial query has been simplified for the purpose of this blog post):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">irate(sqlserver_performance{sql_instance='SERVERA:ENV1',counter=&quot;Batch Requests/sec&quot;}[$__range])</div></div>
<p>The same query exists for the secondary replica (B) but with a different label value:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">irate(sqlserver_performance{sql_instance='SERVERB:ENV1',counter=&quot;Batch Requests/sec&quot;}[$__range])</div></div>
<p>SERVERA:ENV1 and SERVERB:ENV1 are static values that correspond to the name of each SQL Server instance, respectively SERVERA\ENV1 and SERVERB\ENV1. As you probably already guessed, and according to our naming convention, for the new environment and related panels we simply replaced the initial ENV1 values with ENV2. But having more environments, or providing filtering capabilities to focus only on specific environments, makes this process tedious, and we need to bring something dynamic into the game &#8230; Good news: Grafana provides such capabilities with dynamic creation of rows and panels. </p>
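<p>To see why the static approach does not scale, here is a tiny Python sketch (using the hypothetical naming convention from this post) that generates the Batch Requests/sec query for every replica / environment pair, which is essentially the substitution Grafana variables perform for us:</p>

```python
from string import Template

# Hypothetical template mirroring the post's naming convention:
# one query per (replica, environment) pair. $$__range escapes to
# the literal Grafana $__range placeholder.
QUERY = Template(
    "irate(sqlserver_performance{sql_instance='SERVER$replica:$env',"
    'counter="Batch Requests/sec"}[$$__range])'
)

queries = [
    QUERY.substitute(replica=r, env=e)
    for e in ("ENV1", "ENV2")   # every new environment doubles the edits
    for r in ("A", "B")
]
for q in queries:
    print(q)
```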
<p><strong>Generating dynamic panels in the same section (row)</strong></p>
<p>Referring to the dashboard, the first section concerns the availability group health metric. When adding a new environment, meaning a new availability group, we want a new dedicated panel created automatically in the same section (AG health).<br />
Firstly, we need to add a multi-value variable to the dashboard. Depending on your context, values can be static or dynamically populated by another query; it is up to you to choose the right option.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-4-grafana_variable.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-4-grafana_variable.jpg" alt="174 - 4 - grafana_variable" width="968" height="505" class="alignnone size-full wp-image-1789" /></a></p>
<p>Once created, a drop-down list appears at the upper left of the dashboard, and we can now select multiple environments or filter down to specific ones.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-5-grafana_variable.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-5-grafana_variable.jpg" alt="174 - 5 - grafana_variable" width="202" height="346" class="alignnone size-full wp-image-1790" /></a></p>
<p>Then we need to make the panel in the AG Health section dynamic as follows:<br />
&#8211; Change the title value with the corresponding variable (optional)<br />
&#8211; Configure the repeat options with the variable (mandatory). You can also define the max number of panels per row</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-6-panel-variabilisation.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-6-panel-variabilisation.jpg" alt="174 - 6 - panel variabilisation" width="279" height="414" class="alignnone size-full wp-image-1792" /></a></p>
<p>According to this setup, we can display a maximum of 4 panels (or availability groups) per row. The 5th will be created and placed on a new line in the same section, as shown below:<br />
<a href="http://blog.developpez.com/mikedavem/files/2021/04/174-7-panel-same-section.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-7-panel-same-section-1024x125.jpg" alt="174 - 7 - panel same section" width="584" height="71" class="alignnone size-large wp-image-1793" /></a></p>
<p>Finally, we must replace the static label values defined in the query with their variable counterpart. For the availability group we are using the <strong>sqlserver_hadr_replica_states_replica_synchronization_health</strong> metric as follows (again, I voluntarily show only a sample of the entire query for simplicity):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">… sqlserver_hadr_replica_states_replica_synchronization_health{sql_instance=~'SERVER[A|B]:$ENV',measurement_db_type=&quot;SQLServer&quot;}) …</div></div>
<p>Note the regular expression used to get information from the SQL Server instances, either the primary (A) or the secondary (B). The most interesting part concerns the environment, which is now dynamic thanks to the $ENV variable.</p>
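<p>Put together, a simplified, self-contained version of such a query looks like this (same metric and label names as above; note that the explicit alternation (A|B) is the stricter equivalent of the character class [A|B], which also matches a literal |):</p>

```promql
sum(sqlserver_hadr_replica_states_replica_synchronization_health{
  sql_instance=~"SERVER(A|B):$ENV",
  measurement_db_type="SQLServer"
})
```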
<p><strong>Generating dynamic sections (rows)</strong></p>
<p>As said previously, sections are in fact rows in the Grafana dashboard, and rows can contain panels. If we add a new environment, we also want to see a new section (and its panels) related to it. Configuring dynamic rows is pretty similar to panels: we only need to set the “Repeat for” option of the row with the environment variable, as follows (the title remains optional):</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-8-row.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-8-row-1024x173.jpg" alt="174 - 8 - row" width="584" height="99" class="alignnone size-large wp-image-1794" /></a></p>
<p>As with the AG Health panel, we also need to replace the static label values in ALL panels with the new environment variable. Thus, referring to the previous Batch Requests / sec example, the updated Prometheus queries are as follows (respectively for the primary and secondary replicas):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">irate(sqlserver_performance{sql_instance='SERVERA:$ENV',counter=&quot;Batch Requests/sec&quot;}[$__range])</div></div>
<p>&#8230;</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">irate(sqlserver_performance{sql_instance='SERVERB:$ENV',counter=&quot;Batch Requests/sec&quot;}[$__range])</div></div>
<p>The dashboard is now ready, and all the dynamic behavior kicks in when a new SQL Server instance is added to the list of monitored items. Here is an example of the outcome in our context:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-0-final-dashboard.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-0-final-dashboard-1024x404.jpg" alt="174 - 0 - final dashboard" width="584" height="230" class="alignnone size-large wp-image-1795" /></a></p>
<p>Happy monitoring!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SQL Server index rebuild online and blocking scenario</title>
		<link>https://blog.developpez.com/mikedavem/p13199/sql-server-2012/sql-server-index-rebuid-online-and-blocking-scenario</link>
		<comments>https://blog.developpez.com/mikedavem/p13199/sql-server-2012/sql-server-index-rebuid-online-and-blocking-scenario#comments</comments>
		<pubDate>Sun, 30 Aug 2020 21:18:28 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Server 2012]]></category>
		<category><![CDATA[SQL Server 2014]]></category>
		<category><![CDATA[SQL Server 2016]]></category>
		<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[SQL Server 2019]]></category>
		<category><![CDATA[blocking]]></category>
		<category><![CDATA[online operation]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1664</guid>
		<description><![CDATA[A couple of months ago, I experienced a problem with an online index rebuild operation on SQL Server. In short, the operation was supposed to be online and never block concurrent queries. But in fact, it was not the case &#8230; <a href="https://blog.developpez.com/mikedavem/p13199/sql-server-2012/sql-server-index-rebuid-online-and-blocking-scenario">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>A couple of months ago, I experienced a problem with an online index rebuild operation on SQL Server. In short, the operation was supposed to be online and never block concurrent queries. But in fact, this was not the case (or, to be more precise, it was only partially the case) and, to make the scenario more complex, we experienced different behaviors depending on the context. Let’s start the story with the initial context: in my company, we usually go through continuous deployment, including SQL modification scripts, and because we rely on a daily pipeline, we must ensure the related SQL operations are not too disruptive, to avoid impacting the user experience.</p>
<p><span id="more-1664"></span></p>
<p>Sometimes, we must introduce new indexes in deployment scripts and, according to how disruptive the script can be, a discussion between Devs and Ops is initiated. It results either in the script being managed manually by the Ops team or being deployed automatically through the deployment pipeline by the Devs. </p>
<p>Non-disruptive operations can be achieved in many ways, and the ONLINE capabilities of SQL Server may be part of the solution; this is what I suggested for one of our scripts. Let’s illustrate this context with the following example. I created a table named dbo.t1 with a bunch of rows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">USE [test];<br />
<br />
SET NOCOUNT ON;<br />
<br />
DROP TABLE IF EXISTS dbo.t1;<br />
GO<br />
<br />
CREATE TABLE dbo.t1 (<br />
&nbsp; &nbsp; id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,<br />
&nbsp; &nbsp; col1 VARCHAR(50) NULL<br />
);<br />
GO<br />
<br />
INSERT INTO dbo.t1 (col1) VALUES (REPLICATE('T', 50));<br />
GO …<br />
EXEC sp_spaceused 'dbo.t1'<br />
--name&nbsp; rows&nbsp; &nbsp; reserved&nbsp; &nbsp; data&nbsp; &nbsp; index_size&nbsp; unused<br />
--t1&nbsp; &nbsp; 5226496 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1058000 KB&nbsp; 696872 KB &nbsp; 342888 KB &nbsp; 18240 KB</div></div>
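<p>The population step is elided above. A minimal way to reproduce a comparable volume is a batched insert repeated with a GO count (a sketch; the row count here is illustrative, not the exact figure from the output above):</p>

```sql
-- "GO <count>" repeats the preceding batch <count> times; it is a
-- client-side construct (SSMS / sqlcmd), not part of T-SQL itself.
INSERT INTO dbo.t1 (col1) VALUES (REPLICATE('T', 50));
GO 5000000
```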
<p>Now let’s set the context with the pattern of deployment scripts we went through during this specific deployment. Let’s be clear that this script is oversimplified: I voluntarily keep it simple to focus only on the most important part. You will notice the script includes two steps operating on the same table: first updating / fixing values in the col1 column, then creating an index on col1.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/* Code before */<br />
<br />
-- Update some values in the col1 column<br />
UPDATE [dbo].[t1]<br />
SET col1 = REPLICATE('B', 50)<br />
<br />
-- Then create an index on col1 column<br />
CREATE INDEX [col1]<br />
ON [dbo].[t1] (col1) WITH (ONLINE = ON);<br />
GO</div></div>
<p>At the initial stage, the index was created with the default option (OFFLINE). Having discussed this point with the DEV team, we decided to create the index ONLINE in this context. The choice between OFFLINE and ONLINE operations is often not trivial and should be evaluated carefully but, to keep it simple, let’s say it was the right way to go in our context. Generally speaking, online operations are slower, but the tradeoff was acceptable in order to minimize blocking issues during this deployment. At least, this is what I thought …</p>
<p>In my demo, without any concurrent workload against the dbo.t1 table, creating the index offline took 6s, compared to 12s with the online method. So, an expected result here …</p>
<p>Now let’s run this query in another session:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">SELECT id, col1<br />
FROM dbo.t1<br />
WHERE id BETWEEN 1 AND 2</div></div>
<p>In a normal situation, this query should only be blocked for a short time, corresponding to the duration of the update operation. Once the update is done, the blocking should disappear, even while the index operation is performed ONLINE. </p>
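<p>To observe this from a third session, a simple check against sys.dm_exec_requests does the job (a minimal sketch; production monitoring would typically join additional DMVs):</p>

```sql
-- Sessions currently blocked, who blocks them, and on which wait type
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,        -- milliseconds
       r.command
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0;
```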
<p>But now let’s add <a href="https://flywaydb.org/" rel="noopener" target="_blank">Flyway</a> to the context. Flyway is an open source tool we use for automatic deployment of SQL objects. The deployment script was executed through it in the ACC environment, and this time we noticed concurrent accesses were blocked for longer, which goes against what we would ideally like. Digging into this issue with the DEV team, we also noticed the following message when running the deployment script:</p>
<p><em>Warning: Online index operation on table &lsquo;dbo.t1&rsquo; will proceed but concurrent access to the table may be limited due to residual lock on the table from a previous operation in the same transaction.<br />
</em></p>
<p>This is something I didn’t notice from SQL Server Management Studio when I tested the same deployment script. So, what happened here?</p>
<p>Referring to the <a href="https://flywaydb.org/documentation/migrations#transactions" rel="noopener" target="_blank">Flyway documentation</a>, it is mentioned that Flyway always wraps the execution of an entire migration within a single transaction by default, and this was exactly the root cause of the issue.</p>
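<p>As a side note, depending on your Flyway version, this default can be overridden per migration with a script config file placed next to the migration script (a sketch with a hypothetical file name; check the documentation of your Flyway release before relying on it):</p>

```properties
# V42__update_and_index.sql.conf  (hypothetical migration name)
# Do not wrap this specific migration in a transaction, so the
# ONLINE index creation does not inherit the UPDATE's residual locks.
executeInTransaction=false
```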
<p>Let’s verify with some experiments: </p>
<p><strong>Test 1</strong>: Update + creating the index online in autocommit mode (one transaction per statement).</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Update some values in the col1 colum<br />
UPDATE [dbo].[t1]<br />
SET col1 = REPLICATE('B', 50)<br />
<br />
-- Then create an index on col1 column<br />
CREATE INDEX [col1]<br />
ON [dbo].[t1] (col1) WITH (ONLINE = ON);<br />
GO<br />
-- In another session<br />
SELECT id, col1<br />
FROM dbo.t1<br />
WHERE id BETWEEN 1 AND 2</div></div>
<p><strong>Test 2</strong>: Update + creating the index online within a single explicit transaction</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">BEGIN TRAN;<br />
<br />
-- Update some values in the col1 column<br />
UPDATE [dbo].[t1]<br />
SET col1 = REPLICATE('B', 50)<br />
<br />
-- Then create an index on col1 column<br />
CREATE INDEX [col1]<br />
ON [dbo].[t1] (col1) WITH (ONLINE = ON);<br />
GO<br />
COMMIT TRAN;<br />
-- In another session<br />
SELECT id, col1<br />
FROM dbo.t1<br />
WHERE id BETWEEN 1 AND 2</div></div>
<p>After running these two scripts, we can notice that the blocking duration of the SELECT query is longer in test 2, as shown in the picture below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/166-1-blocked-process.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/166-1-blocked-process.jpg" alt="166 - 1 - blocked process" width="890" height="358" class="alignnone size-full wp-image-1665" /></a></p>
<p>In test 1, the duration of the blocking corresponds to that of the update operation (the first step of the script). In test 2, however, we must also include the time needed to create the index. Let’s be precise: the index creation is not the blocking operation at all, but it extends the residual locking held by the previous update operation. In short, this is exactly what the warning message is telling us. You can easily imagine the impact such a situation may have if the index creation takes a long time: you may get exactly the opposite of what you expected. </p>
<p>Obviously, this is not a recommended situation, and index creation should run in a very narrow and constrained transaction. But from my experience, things are never that obvious and, depending on your context, you should keep an eye on how transactions are managed, especially when it comes to automatic deployment tooling that can quickly fall out of the scope of the DBA / Ops team. Strong collaboration with the DEV team is recommended to anticipate this kind of issue.</p>
<p>See you !!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Monitoring Azure SQL Databases with Azure Monitor and Automation</title>
		<link>https://blog.developpez.com/mikedavem/p13198/sql-server-2012/monitoring-azure-sql-databases-with-azure-monitor-and-automation</link>
		<comments>https://blog.developpez.com/mikedavem/p13198/sql-server-2012/monitoring-azure-sql-databases-with-azure-monitor-and-automation#comments</comments>
		<pubDate>Sun, 23 Aug 2020 15:32:07 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Azure]]></category>
		<category><![CDATA[SQL Server 2012]]></category>
		<category><![CDATA[Azure]]></category>
		<category><![CDATA[Azure Alerts]]></category>
		<category><![CDATA[Azure Monitor]]></category>
		<category><![CDATA[Azure SQL Database]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1653</guid>
		<description><![CDATA[Supervising Cloud Infrastructure is an important aspect of Cloud administration and Azure SQL Databases are no exception. This is something we are continuously improving at my company. On-prem, DBAs often rely on well-established products but with Cloud-based architectures, often implemented &#8230; <a href="https://blog.developpez.com/mikedavem/p13198/sql-server-2012/monitoring-azure-sql-databases-with-azure-monitor-and-automation">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Supervising Cloud Infrastructure is an important aspect of Cloud administration and Azure SQL Databases are no exception. This is something we are continuously improving at my company. </p>
<p>On-prem, DBAs often rely on well-established products but, with Cloud-based architectures often implemented through DevOps projects and developers, monitoring should be redefined to include some new topics such as:</p>
<p><span id="more-1653"></span></p>
<p>1)	Cloud service usage and fees observability<br />
2)	Detection of metrics and events that could affect the bottom line<br />
3)	Implementation of a single platform to report all data coming from different sources<br />
4)	Triggering of rules when the workload rises above or drops below certain levels, or when an event is relevant enough (it does not meet the configuration standard and implies unwanted extra billing, or it compromises the company security rules)<br />
5)	Monitoring of the user experience</p>
<p>A key benefit often discussed about Cloud computing, and mainly driven by DevOps, is how it enables agility. One meaning of the term agility is tied to the rapid provisioning of compute resources (in seconds or minutes), and this shortened provisioning path enables work to start quickly. You may be tempted to grant some provisioning permissions to DEV teams and, in my opinion, this is not a bad thing, but it may come with some drawbacks if not kept under control by the Ops team, including in the database area. For example, I have in mind some real cases including architecture configuration drift, security breaches created by unwanted item changes, or idle orphan resources for which you keep being charged. All of these scenarios may lead either to security issues or to extra billing, and I believe it is important to get clear visibility of such events. </p>
<p>In my company, the Azure built-in capabilities around the Azure Monitor architecture are our first target (at least in a first stage) and seem to address the aforementioned topics. To set the context, we already relied on the Azure Monitor infrastructure for different things, including Query Performance Insight, SQL Audit analysis through Log Analytics, and Azure alerts for some performance metrics. Therefore, the obvious way to go further was to add activity log events to the story. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/165-1-Azure-Monitor.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/165-1-Azure-Monitor.jpg" alt="165 - 1 - Azure Monitor" width="843" height="474" class="alignnone size-full wp-image-1655" /></a></p>
<p>In this blog post, let’s focus on items 2) and 4). I would like to share some experiments and thoughts about them. As a reminder, items 2) and 4) are about catching relevant events to help identify configuration and security drifts and performing actions accordingly. In addition, as with many event-based architectures, additional events may appear or evolve over time, and we started thinking about the concept with the following basic diagram …</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/165-2-Workflow-chart-e1598182358607.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/165-2-Workflow-chart-e1598182358607.jpg" alt="165 - 2 - Workflow chart" width="800" height="533" class="alignnone size-full wp-image-1657" /></a></p>
<p>&#8230; that led to the creation of the two following workflows:<br />
&#8211;	Workflow 1: to get notified immediately of critical events that may compromise security or quickly lead to significant extra billing<br />
&#8211;	Workflow 2: to get a report of other misconfigured items (including critical ones) on a scheduled basis, when quick responsiveness from the Ops team is not required.</p>
<p>Concerning the first workflow, using <a href="https://docs.microsoft.com/en-us/azure/azure-monitor/platform/activity-log-alerts" rel="noopener" target="_blank">alerts on activity logs</a>, action groups and webhooks as input of an Azure Automation runbook appeared to be a good solution. On the other side, the second workflow only requires running an Azure Automation runbook on a scheduled basis. In fact, this is the same runbook but with different input parameters according to the targeted environment (e.g. PROD / ACC / INT). In addition, the runbook should be able to identify unmanaged events and notify the Ops team, who will decide either to skip them or to integrate them into the runbook processing.</p>
<p>Azure alerts can be divided into different categories, including metric alerts, log alerts and activity log alerts. The last one drew our attention because it allows getting notified of operations on specific resources, either by email or by generating a JSON payload reusable from an Azure Automation runbook. Focusing on the latter, we came up with what we thought was a reasonable solution. </p>
<p>Here is the high-level picture of the architecture we have implemented:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/165-3-Architecture-e1598182462929.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/165-3-Architecture-e1598182462929.jpg" alt="165 - 3 - Architecture" width="800" height="347" class="alignnone size-full wp-image-1659" /></a></p>
<p>1-	During the creation of an Azure SQL server or database, corresponding alerts are added in the Administrative category with a specific scope. Note that the concerned operations must be registered with Azure Resource Manager in order to be usable from the Activity Log; fortunately, in this case they are all included in the <a href="https://docs.microsoft.com/en-us/azure/role-based-access-control/resource-provider-operations" rel="noopener" target="_blank">Microsoft.Sql</a> resource provider.<br />
2-	When an event occurs on the targeted environment, an alert is triggered, as well as the concerned runbook.<br />
3-	The execution of the same runbook, but with different input parameters, is scheduled on a weekly basis to produce a general configuration report of our Azure SQL environments.<br />
4-	Depending on the event, the Ops team gets notified and acts (either updating the misconfigured item, deleting the unauthorized item, updating the runbook code in the Git repo to handle the new event, and so on …)</p>
<p>The skeleton of the Azure automation runbook is pretty similar to the following one:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">[OutputType(&quot;PSAzureOperationResponse&quot;)]<br />
param<br />
(<br />
&nbsp; &nbsp; [Parameter (Mandatory=$false)]<br />
&nbsp; &nbsp; [object] $WebhookData<br />
&nbsp; &nbsp; ,<br />
&nbsp; &nbsp; [parameter(Mandatory=$False)]<br />
&nbsp; &nbsp; [ValidateSet(&quot;PROD&quot;,&quot;ACC&quot;,&quot;INT&quot;)]<br />
&nbsp; &nbsp; [String]$EnvTarget<br />
&nbsp; &nbsp; ,<br />
&nbsp; &nbsp; [parameter(Mandatory=$False)]<br />
&nbsp; &nbsp; [Boolean]$DebugMode = $False<br />
)<br />
<br />
<br />
<br />
<br />
If ($WebhookData)<br />
{<br />
<br />
&nbsp; &nbsp; # Logic to allow for testing in test pane<br />
&nbsp; &nbsp; If (-Not $WebhookData.RequestBody){<br />
&nbsp; &nbsp; &nbsp; &nbsp; $WebhookData = (ConvertFrom-Json -InputObject $WebhookData)<br />
&nbsp; &nbsp; }<br />
<br />
&nbsp; &nbsp; $WebhookBody = (ConvertFrom-Json -InputObject $WebhookData.RequestBody)<br />
<br />
&nbsp; &nbsp; $schemaId = $WebhookBody.schemaId<br />
<br />
&nbsp; &nbsp; If ($schemaId -eq &quot;azureMonitorCommonAlertSchema&quot;) {<br />
&nbsp; &nbsp; &nbsp; &nbsp; # This is the common Metric Alert schema (released March 2019)<br />
&nbsp; &nbsp; &nbsp; &nbsp; $Essentials = [object] ($WebhookBody.data).essentials<br />
&nbsp; &nbsp; &nbsp; &nbsp; # Get the first target only as this script doesn't handle multiple<br />
&nbsp; &nbsp; &nbsp; &nbsp; $status = $Essentials.monitorCondition<br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; # Focus only on succeeded or Fired Events<br />
&nbsp; &nbsp; &nbsp; &nbsp; If ($status -eq &quot;Succeeded&quot; -Or $Status -eq &quot;Fired&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Extract info from webook <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $alertTargetIdArray = (($Essentials.alertTargetIds)[0]).Split(&quot;/&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $SubId = ($alertTargetIdArray)[2]<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $ResourceGroupName = ($alertTargetIdArray)[4]<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $ResourceType = ($alertTargetIdArray)[6] + &quot;/&quot; + ($alertTargetIdArray)[7]<br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Determine code path depending on the resourceType<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if ($ResourceType -eq &quot;microsoft.sql/servers&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # DEBUG<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;This is a SQL Server Resource.&quot;<br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $firedDate = $Essentials.firedDateTime<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $AlertContext = [object] ($WebhookBody.data).alertContext<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $channel = $AlertContext.channels<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $EventSource = $AlertContext.eventSource<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Level = $AlertContext.level<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Operation = $AlertContext.operationName<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Properties = [object] ($WebhookBody.data).alertContext.properties<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $EventName = $Properties.eventName<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $EventStatus = $Properties.status<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Description = $Properties.description_scrubbed<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Caller = $Properties.caller<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $IPAddress = $Properties.ipAddress<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $ResourceName = ($alertTargetIdArray)[8]<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $DatabaseName = ($alertTargetIdArray)[10]<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Operation_detail = $Operation.Split('/')<br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Check firewall rules<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; If ($EventName -eq 'OverwriteFirewallRules'){<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Firewall Overwrite is detected ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle firewall update event<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Update DB =&gt; no need to be monitored in real time<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Elseif ($EventName -eq 'UpdateDatabase') {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle Database config update event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Create DB<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Elseif ($EventName -eq 'CreateDatabase' -Or `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Operation -eq 'Microsoft.Sql/servers/databases/write'){<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Azure Database creation has been detected ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle Database creation event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Delete DB<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Elseif ($EventName -eq 'DeleteDatabase' -Or `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Operation -eq 'Microsoft.Sql/servers/databases/delete') {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Azure Database has been deleted ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle Database deletion event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Elseif ($Operation -eq 'Microsoft.Sql/servers/databases/transparentDataEncryption/write') {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Azure Database Encryption update has been detected ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle Database encryption update event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Elseif ($Operation -eq 'Microsoft.Sql/servers/databases/auditingSettings/write') {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Azure Database Audit update has been detected ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle Database audit update event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Elseif ($Operation -eq 'Microsoft.Sql/servers/databases/securityAlertPolicies/write' -or $Operation -eq 'Microsoft.Sql/servers/databases/vulnerabilityAssessments/write') {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Azure ADS update has been detected ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle ADS update event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ElseIf ($Operation -eq 'Microsoft.Sql/servers/databases/backupShortTermRetentionPolicies/write'){<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Azure Retention Backup has been modified ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle Database retention backup update event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # ... other ones <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Else {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Event not managed yet &nbsp; &nbsp;&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # ResourceType not supported<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Error &quot;$ResourceType is not a supported resource type for this runbook.&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; Else {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # The alert status was not 'Succeeded' or 'Fired' so no action taken<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Verbose (&quot;No action taken. Alert status: &quot; + $status) -Verbose<br />
&nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; }<br />
&nbsp; &nbsp; Else{<br />
&nbsp; &nbsp; &nbsp; &nbsp;# SchemaId doesn't correspond to azureMonitorCommonAlertSchema =&gt; skip<br />
&nbsp; &nbsp; &nbsp; &nbsp;Write-Host &quot;Skip ...&quot; <br />
&nbsp; &nbsp; }<br />
}<br />
Else {<br />
&nbsp; &nbsp; Write-Output &quot;No Webhook detected ... switch to normal mode ...&quot;<br />
<br />
&nbsp; &nbsp; If ([String]::IsNullOrEmpty($EnvTarget)){<br />
&nbsp; &nbsp; &nbsp; &nbsp; Write-Error '$EnvTarget is mandatory in normal mode'<br />
&nbsp; &nbsp; }<br />
<br />
&nbsp; &nbsp; #########################################################<br />
&nbsp; &nbsp; # Code for a complete check of Azure SQL DB environment #<br />
&nbsp; &nbsp; #########################################################<br />
}</div></div>
<p>Some comments about the PowerShell script:</p>
<p>1)	Input parameters should include either the Webhook data or specific parameter values for a complete Azure SQL DB check.<br />
2)	The first section should include your own functions to respond to the different events. In our context, we currently drew inspiration from <a href="https://github.com/sqlcollaborative/dbachecks" rel="noopener" target="_blank">DBAChecks</a> to develop a derived model, but why not use DBAChecks directly in the near future?<br />
3)	When an event is triggered, a JSON payload is generated and provides insight into it. The point here is that you must navigate through different properties depending on the operation type (cf. <a href="https://docs.microsoft.com/en-us/azure/azure-monitor/platform/activity-log-schema" rel="noopener" target="_blank">BOL</a>).<br />
4)	The growing number of events to manage could become an issue and make the runbook fat, especially if we keep both the core functions and the event processing in it. To mitigate this, we are thinking of moving the functions into Azure Automation modules (next step).</p>
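<p>As an illustration of point 3, here is a minimal, hedged sketch of how the runbook navigates the common alert schema payload. The JSON below is a trimmed, hypothetical activity-log alert body (not a real alert), but the property paths follow the azureMonitorCommonAlertSchema layout used by the dispatch code above:</p>

```powershell
# Hedged sketch: digging into the common alert schema payload.
# The JSON below is a simplified, hypothetical activity-log alert body.
$sample = @'
{
  "schemaId": "azureMonitorCommonAlertSchema",
  "data": {
    "essentials": { "monitorCondition": "Fired" },
    "alertContext": {
      "operationName": "Microsoft.Sql/servers/databases/backupShortTermRetentionPolicies/write"
    }
  }
}
'@

$body      = $sample | ConvertFrom-Json
$schemaId  = $body.schemaId                            # must be 'azureMonitorCommonAlertSchema'
$status    = $body.data.essentials.monitorCondition    # e.g. 'Fired' / 'Resolved'
$Operation = $body.data.alertContext.operationName     # drives the If/ElseIf dispatch
```
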
<p><strong>Bottom line</strong></p>
<p>Thanks to Azure built-in capabilities, we improved our visibility of the events that occur on our Azure SQL environment (both expected and unexpected) and we&rsquo;re now able to act accordingly. But I should tell you that going this way is not a free lunch: we reached a reasonable solution only after some programming and testing effort. If you can invest the time, it is probably the kind of solution worth adding to your study.</p>
<p>See you</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AAD user creation on behalf AAD Service Principal with Azure SQL DB</title>
		<link>https://blog.developpez.com/mikedavem/p13197/sql-azure/aad-user-creation-on-behalf-aad-service-principal-with-azure-sql-db</link>
		<comments>https://blog.developpez.com/mikedavem/p13197/sql-azure/aad-user-creation-on-behalf-aad-service-principal-with-azure-sql-db#comments</comments>
		<pubDate>Sun, 02 Aug 2020 22:28:06 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[DevOps]]></category>
		<category><![CDATA[PowerShell]]></category>
		<category><![CDATA[SQL Azure]]></category>
		<category><![CDATA[Authentication]]></category>
		<category><![CDATA[Azure Automation]]></category>
		<category><![CDATA[Azure SQL Database]]></category>
		<category><![CDATA[Azure SQL DB]]></category>
		<category><![CDATA[Powershell]]></category>
		<category><![CDATA[Runbook]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Service Principal]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[System managed identity]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1643</guid>
		<description><![CDATA[An interesting improvement was announced by the SQL AAD team on Monday 27th July 2020 and concerns the support for Azure AD user creation on behalf of Azure AD Applications for Azure SQL as mentioned to this Microsoft blog post. &#8230; <a href="https://blog.developpez.com/mikedavem/p13197/sql-azure/aad-user-creation-on-behalf-aad-service-principal-with-azure-sql-db">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>An interesting improvement was announced by the SQL AAD team on Monday 27th July 2020 and concerns the support for Azure AD user creation on behalf of Azure AD Applications for Azure SQL as mentioned to this <a href="https://techcommunity.microsoft.com/t5/azure-sql-database/support-for-azure-ad-user-creation-on-behalf-of-azure-ad/ba-p/1491121" rel="noopener" target="_blank">Microsoft blog post</a>. </p>
<p><span id="more-1643"></span></p>
<p>In my company, this is something we had been looking for, for a while, with our database refresh process in Azure. Before talking about this new feature, let me share a brief history of the different approaches we went through for this DB refresh process over time. First, let&rsquo;s clarify that a DB refresh usually includes at least two steps: restoring a backup / copying the database – you have both options with Azure SQL Database – and realigning the security context with specific users for your target environment (ACC / INT …).  But the latter is not as trivial as you may expect if you opted to use either a SQL login / user or a service principal to carry out this operation in your process. Indeed, in both cases creating an Azure AD user or group is not supported, and if you try you will face this error message:</p>
<blockquote><p>‘’ is not a valid login or you do not have permission. </p></blockquote>
<p>All the work done so far (either Azure Automation runbooks or on-prem PowerShell modules) and described below follows the same process:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/164-1-DB-Refresh-process-e1596406580306.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/164-1-DB-Refresh-process-e1596406580306.jpg" alt="164 - 1 - DB Refresh process" width="800" height="566" class="alignnone size-full wp-image-1645" /></a></p>
<p>First, we used Invoke-Sqlcmd in an Azure Automation runbook with a T-SQL query to create a copy of a source database on the target server. T-SQL is mandatory in this case, as <a href="https://docs.microsoft.com/en-us/azure/azure-sql/database/database-copy?tabs=azure-powershell" rel="noopener" target="_blank">documented</a> in the Microsoft BOL, because the PROD and ACC or INT servers are not in the same subscription. Here is a simplified sample of code:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">...<br />
$CopyDBCMD = @{<br />
&nbsp; &nbsp; 'Database' = 'master'<br />
&nbsp; &nbsp; 'ServerInstance' = $TargetServerName<br />
&nbsp; &nbsp; 'Username' = $SQLUser<br />
&nbsp; &nbsp; 'Password' = $SQLPWD<br />
&nbsp; &nbsp; 'Query' = 'CREATE DATABASE '+ '[' + $DatabaseName + '] ' + 'AS COPY OF ' + '[' + $SourceServerName + '].[' + $DatabaseName + ']'<br />
} <br />
<br />
Invoke-Sqlcmd @CopyDBCMD <br />
...</div></div>
<p>But as you likely know, Invoke-Sqlcmd doesn&rsquo;t support AAD authentication, and because SQL login authentication was the only option here, it left us with an annoying issue in the security configuration step for AAD users or groups, as you may imagine. </p>
<p>Then, because we base authentication mainly on a trust architecture and our security rules require using it, including for apps with managed identities or service principals, we also wanted to introduce this concept in our database refresh process. Fortunately, service principals have been supported with <a href="https://techcommunity.microsoft.com/t5/azure-sql-database/token-based-authentication-support-for-azure-sql-db-using-azure/ba-p/386091" rel="noopener" target="_blank">Azure SQL DB since v12</a>, using an access token for authentication through ADALSQL. The corresponding DLL is required on your server or, if you use it from Azure Automation like us, the ADAL.PS module; but be aware this module is now deprecated and I strongly advise you to invest in moving to MSAL. Here is a sample we used:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">...<br />
$response = Get-ADALToken `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -ClientId $clientId `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -ClientSecret $clientSecret `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -Resource $resourceUri `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -Authority $authorityUri `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -TenantId $tenantName<br />
<br />
...<br />
<br />
$connectionString = &quot;Server=tcp:$SqlInstanceFQDN,1433;Initial Catalog=master;Persist Security Info=False;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;&quot;<br />
# Create the connection object<br />
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)<br />
# Set AAD generated token to SQL connection token<br />
$connection.AccessToken = $response.AccessToken<br />
<br />
Try {<br />
&nbsp; &nbsp; $connection.Open()<br />
&nbsp; &nbsp; ...<br />
} <br />
...</div></div>
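<p>Since ADAL.PS is deprecated, the same token acquisition could be sketched with the MSAL.PS module instead. This is an assumption on our side (not what we ran at the time); Get-MsalToken and the .default scope convention come from MSAL.PS:</p>

```powershell
# Hedged sketch with MSAL.PS (Install-Module MSAL.PS) replacing Get-ADALToken.
# The resource URI becomes a scope with the '/.default' suffix in MSAL.
$msalParams = @{
    ClientId     = $clientId
    ClientSecret = ($clientSecret | ConvertTo-SecureString -AsPlainText -Force)
    TenantId     = $tenantName
    Scopes       = 'https://database.windows.net/.default'
}
$response = Get-MsalToken @msalParams

# The rest of the flow is unchanged: attach the token to the SQL connection
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
$connection.AccessToken = $response.AccessToken
```
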
<p>But again, even if the copy or restore steps were well managed, we were still stuck with the security reconfiguration, because creating AAD users or groups with a service principal was still not supported &#8230;</p>
<p>In the meantime, we found an interesting temporary solution based on the <a href="https://dbatools.io/" rel="noopener" target="_blank">dbatools framework</a> and the <a href="https://docs.dbatools.io/#Invoke-DbaQuery" rel="noopener" target="_blank">Invoke-DbaQuery command</a>, which supports AAD authentication (login + password). As we could not rely on a service principal in this case, using a dedicated AAD account was an acceptable tradeoff to manage all the database refresh steps. But going this way comes with some disadvantages, because running Invoke-DbaQuery in a fully Azure Automation mode is not possible due to the missing adalsql.dll. A workaround would be to use a hybrid worker, but we didn&rsquo;t want to add complexity to our current architecture only for this special case. Instead, we decided to move the logic of the Azure Automation runbook into our on-prem PowerShell framework, which already includes the DB refresh logic for on-prem SQL Server instances. </p>
<p>Here is a simplified sample of the code we are using:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">...<br />
Try {<br />
&nbsp; &nbsp; # Connect to get access to Key Vault info<br />
&nbsp; &nbsp; Connect-AzAccount | Out-Null<br />
<br />
&nbsp; &nbsp; [String]$user = (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-SQLBCKUSER&quot;).SecretValueText<br />
&nbsp; &nbsp; [System.Security.SecureString]$pwd = &nbsp;ConvertTo-SecureString (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-SQLBCKPWD&quot;).SecretValueText -AsPlainText -Force<br />
&nbsp; &nbsp; [String]$SourceServerName = (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-NAME&quot;).SecretValueText<br />
&nbsp; &nbsp; [String]$TargetServerName = (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-TARGETNAME&quot;).SecretValueText + '.database.windows.net'<br />
<br />
&nbsp; &nbsp; # DB Restore will be performed in the context of dedicated AAD account <br />
&nbsp; &nbsp; $pscredential = New-Object -TypeName System.Management.Automation.PSCredential($user, $pwd)<br />
<br />
&nbsp; &nbsp; Write-Host &quot;Restoring DB:$DatabaseName from Source Server: $SourceServerName to Target Server: $TargetServerName&quot;<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; $Query = &quot;CREATE DATABASE [$DatabaseName] AS COPY OF [$SourceServerName].[$DatabaseName]&quot;<br />
&nbsp; &nbsp; Invoke-DbaQuery `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -SqlInstance $TargetServerName `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -Database master `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -SqlCredential $pscredential `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -Query $Query `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -EnableException <br />
<br />
&nbsp; &nbsp; # Wait for DB online and ready ... <br />
&nbsp; &nbsp; # Code should be implemented for this check <br />
<br />
<br />
&nbsp; &nbsp; Write-Output &quot;Applying security configuration to DB: $DatabaseName on Server:$TargetServerName&quot;<br />
<br />
&nbsp; &nbsp; $Query = &quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; DROP USER [az_sql_ro];CREATE USER [az_sql_ro] FROM EXTERNAL PROVIDER;<br />
&nbsp; &nbsp; &quot;<br />
&nbsp; &nbsp; Invoke-DbaQuery `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -SqlInstance $TargetServerName `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -Database $DatabaseName `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -SqlCredential $pscredential `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -Query $Query `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -EnableException<br />
<br />
}<br />
Catch {<br />
&nbsp; &nbsp; Write-Host &quot;Error encountered: $($_.Exception.Message)&quot;<br />
} <br />
...</div></div>
<p>Referring to the PowerShell code above, in the second step we create the AAD group [az_sql_ro] as a database user on behalf of the dedicated AAD account with the FROM EXTERNAL PROVIDER clause. </p>
<p>Finally, with the latest news published by the SQL AAD team, we will likely consider switching back to a service principal instead of the dedicated AAD account. <a href="https://techcommunity.microsoft.com/t5/azure-sql-database/support-for-azure-ad-user-creation-on-behalf-of-azure-ad/ba-p/1491121" rel="noopener" target="_blank">This Microsoft blog post</a> explains in detail how it works and what you have to set up to make it work correctly. I don&rsquo;t want to duplicate what is already explained, so I will just apply the new feature to my context. </p>
<p>Referring to the above blog post, you first need to set up a server identity for your Azure SQL server as below:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Set-AzSqlServer `<br />
&nbsp; &nbsp; -ResourceGroupName sandox-rg `<br />
&nbsp; &nbsp; -ServerName a-s-sql02 `<br />
&nbsp; &nbsp; -AssignIdentity<br />
<br />
# Check server identity<br />
Get-AzSqlServer `<br />
&nbsp; &nbsp; -ResourceGroupName sandox-rg `<br />
&nbsp; &nbsp; -ServerName a-s-sql02 | `<br />
&nbsp; &nbsp; Select-Object ServerName, Identity</div></div>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">ServerName Identity &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br />
---------- -------- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br />
a-s-sql02 &nbsp;Microsoft.Azure.Management.Sql.Models.ResourceIdentity</div></div>
<p>Let&rsquo;s have a look at the server identity</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"># Get identity details<br />
$identity = Get-AzSqlServer `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -ResourceGroupName sandox-rg `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -ServerName a-s-sql02<br />
<br />
$identity.identity</div></div>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">PrincipalId &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Type &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; TenantId &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br />
----------- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;---- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -------- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br />
7f0d16f7-b172-4c97-94d3-34f0f7ed93cf SystemAssigned 2fcd19a7-ab24-4aef-802b-6851ef5d1ed5</div></div>
<p>In fact, assigning a server identity means creating a system-assigned managed identity in the Azure AD tenant that&rsquo;s trusted by the subscription of the instance. To keep things simple, let&rsquo;s say that a system-assigned managed identity in Azure is similar to a Managed Service Account or Group Managed Service Account on-prem: those identities are managed by the system itself. Then you need to grant this identity the Azure AD &laquo;&nbsp;Directory Readers&nbsp;&raquo; permission so that AAD users or groups can be created on its behalf. A PowerShell script for this is provided by Microsoft <a href="https://docs.microsoft.com/en-us/azure/azure-sql/database/authentication-aad-service-principal-tutorial" rel="noopener" target="_blank">here</a>. Below is a sample of code I applied in my context for testing:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">...<br />
Try {<br />
&nbsp; &nbsp; $DatabaseName = &quot;test-DBA&quot; &nbsp; <br />
&nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; # Connect to get access to Key Vault info<br />
&nbsp; &nbsp; Connect-AzAccount | Out-Null<br />
<br />
&nbsp; &nbsp; [String]$user = (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-SQLBCKAPPID&quot;).SecretValueText<br />
&nbsp; &nbsp; [System.Security.SecureString]$pwd = &nbsp;ConvertTo-SecureString (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-SQLBCKAPPSECRET&quot;).SecretValueText -AsPlainText -Force<br />
&nbsp; &nbsp; [String]$SourceServerName = (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-NAME&quot;).SecretValueText<br />
&nbsp; &nbsp; [String]$TargetServerName = (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-TARGETNAME&quot;).SecretValueText + '.database.windows.net'<br />
<br />
&nbsp; &nbsp; # DB Restore will be performed in the context of dedicated AAD account <br />
&nbsp; &nbsp; $pscredential = New-Object -TypeName System.Management.Automation.PSCredential($user, $pwd)<br />
<br />
&nbsp; &nbsp; $adalPath &nbsp;= &quot;${env:ProgramFiles}\WindowsPowerShell\Modules\Az.Profile.7.0\PreloadAssemblies&quot;<br />
&nbsp; &nbsp; # To install the latest AzureRM.profile version, execute: Install-Module -Name AzureRM.profile<br />
&nbsp; &nbsp; $adal &nbsp; &nbsp; &nbsp;= &quot;$adalPath\Microsoft.IdentityModel.Clients.ActiveDirectory.dll&quot;<br />
&nbsp; &nbsp; $adalforms = &quot;$adalPath\Microsoft.IdentityModel.Clients.ActiveDirectory.WindowsForms.dll&quot;<br />
&nbsp; &nbsp; [System.Reflection.Assembly]::LoadFrom($adal) | Out-Null<br />
&nbsp; &nbsp; $resourceAppIdURI = 'https://database.windows.net/'<br />
<br />
&nbsp; &nbsp; # Set Authority to Azure AD Tenant<br />
&nbsp; &nbsp; $authority = 'https://login.windows.net/' + $tenantId<br />
<br />
&nbsp; &nbsp; $ClientCred = [Microsoft.IdentityModel.Clients.ActiveDirectory.ClientCredential]::new($clientId, $clientSecret)<br />
&nbsp; &nbsp; $authContext = [Microsoft.IdentityModel.Clients.ActiveDirectory.AuthenticationContext]::new($authority)<br />
&nbsp; &nbsp; $authResult = $authContext.AcquireTokenAsync($resourceAppIdURI,$ClientCred)<br />
&nbsp; &nbsp; $Tok = $authResult.Result.CreateAuthorizationHeader()<br />
&nbsp; &nbsp; $Tok=$Tok.Replace(&quot;Bearer &quot;,&quot;&quot;)<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; Write-host &quot;Token generated is ...&quot;<br />
&nbsp; &nbsp; $Tok<br />
&nbsp; &nbsp; Write-host &nbsp;&quot;&quot;<br />
<br />
&nbsp; &nbsp; Write-Host &quot;Create SQL connectionstring&quot;<br />
&nbsp; &nbsp; $conn = New-Object System.Data.SqlClient.SQLConnection <br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; $conn.ConnectionString = &quot;Data Source=$TargetServerName;Initial Catalog=master;Connect Timeout=30&quot;<br />
&nbsp; &nbsp; $conn.AccessToken = $Tok<br />
<br />
&nbsp; &nbsp; Write-host &quot;Connect to database and execute SQL script&quot;<br />
&nbsp; &nbsp; $conn.Open() <br />
<br />
&nbsp; &nbsp; Write-Host &quot;Check connected user ...&quot;<br />
&nbsp; &nbsp; $Query = &quot;SELECT USER_NAME() AS [user_name];&quot;<br />
&nbsp; &nbsp; $command = New-Object -TypeName System.Data.SqlClient.SqlCommand($Query, $conn)<br />
&nbsp; &nbsp; $Command.ExecuteScalar()<br />
&nbsp; &nbsp; $conn.Close()<br />
<br />
&nbsp; &nbsp; Write-Host &quot;Restoring DB:$DatabaseName from Source Server: $SourceServerName to Target Server: $TargetServerName&quot;<br />
<br />
&nbsp; &nbsp; $conn.ConnectionString = &quot;Data Source=$TargetServerName;Initial Catalog=master;Connect Timeout=30&quot;<br />
&nbsp; &nbsp; $conn.AccessToken = $Tok<br />
&nbsp; &nbsp; $conn.Open()<br />
&nbsp; &nbsp; $Query = &quot;DROP DATABASE IF EXISTS [$DatabaseName]; CREATE DATABASE [$DatabaseName] AS COPY OF [$SourceServerName].[$DatabaseName]&quot;<br />
&nbsp; &nbsp; $command = New-Object -TypeName System.Data.SqlClient.SqlCommand($Query, $conn)<br />
&nbsp; &nbsp; $command.CommandTimeout = 1200<br />
&nbsp; &nbsp; $command.ExecuteNonQuery()<br />
&nbsp; &nbsp; $conn.Close()<br />
<br />
&nbsp; &nbsp; # Wait for DB online and ready ... <br />
&nbsp; &nbsp; # Code should be implemented for this check <br />
<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; Write-Output &quot;Applying security configuration to DB: $DatabaseName on Server:$TargetServerName&quot;<br />
<br />
&nbsp; &nbsp; $conn.ConnectionString = &quot;Data Source=$TargetServerName;Initial Catalog=$DatabaseName;Connect Timeout=30&quot;<br />
&nbsp; &nbsp; $conn.AccessToken = $Tok<br />
&nbsp; &nbsp; $conn.Open() <br />
&nbsp; &nbsp; $Query = 'CREATE USER [az_sql_ro] FROM EXTERNAL PROVIDER;'<br />
&nbsp; &nbsp; $command = New-Object -TypeName System.Data.SqlClient.SqlCommand($Query, $conn) &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; $command.ExecuteNonQuery()<br />
&nbsp; &nbsp; $conn.Close()<br />
<br />
}<br />
Catch {<br />
&nbsp; &nbsp; Write-Output &quot;Error encountered: $($_.Exception.Message)&quot;<br />
} <br />
...</div></div>
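<p>The &laquo;&nbsp;Directory Readers&nbsp;&raquo; assignment mentioned earlier could be sketched with the AzureAD module as below. This is an assumption on my side (Microsoft&rsquo;s linked script remains the reference), and &lsquo;a-s-sql02&rsquo; is the server name from my example:</p>

```powershell
# Hedged sketch (AzureAD module): grant the server identity the Directory Readers role.
Connect-AzureAD

# The server identity shows up as a service principal named after the server
$identity = Get-AzureADServicePrincipal -Filter "DisplayName eq 'a-s-sql02'"

# Note: the role object only exists once activated in the tenant
$role = Get-AzureADDirectoryRole | Where-Object DisplayName -eq 'Directory Readers'

Add-AzureADDirectoryRoleMember -ObjectId $role.ObjectId -RefObjectId $identity.ObjectId
```
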
<p>Using a service principal required a few changes in my case. I now get the credentials of the service principal (client id and secret) from Azure Key Vault instead of those of the dedicated AAD account used in the previous example. I also changed the way I connect to SQL Server, relying on ADALSQL to get the access token instead of using dbatools commands. Indeed, as far as I know, dbatools doesn&rsquo;t support this authentication method (yet?). </p>
<p>The authentication process becomes as follows:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/164-3-new-auth-process-e1596407082747.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/164-3-new-auth-process-e1596407082747.jpg" alt="164 - 3 - new auth process" width="800" height="610" class="alignnone size-full wp-image-1647" /></a></p>
<p>My first test looks conclusive:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/164-4-test-with-SP-e1596407153885.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/164-4-test-with-SP-e1596407153885.jpg" alt="164 - 4 - test with SP" width="800" height="301" class="alignnone size-full wp-image-1648" /></a></p>
<p>This improvement looks promising and may cover broader scenarios than the one I described in this blog post. The feature is in preview at the time of this write-up and I hope to see it reach GA soon, along with potential support in my preferred PowerShell framework, dbatools <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></p>
<p>See you!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SQL Server on Linux and new FUA support for XFS filesystem</title>
		<link>https://blog.developpez.com/mikedavem/p13193/sql-server-vnext/sql-server-on-linux-and-new-fua-support-for-xfs-filesystem</link>
		<comments>https://blog.developpez.com/mikedavem/p13193/sql-server-vnext/sql-server-on-linux-and-new-fua-support-for-xfs-filesystem#comments</comments>
		<pubDate>Mon, 13 Apr 2020 17:34:32 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[SQL Server 2019]]></category>
		<category><![CDATA[blktrace]]></category>
		<category><![CDATA[FUA]]></category>
		<category><![CDATA[iostats]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[xfs]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1568</guid>
		<description><![CDATA[I wrote a (dbi services) blog post concerning Linux and SQL Server IO behavior changes before and after SQL Server 2017 CU6. Now, I was looking forward seeing some new improvements with Force Unit Access (FUA) that was implemented with &#8230; <a href="https://blog.developpez.com/mikedavem/p13193/sql-server-vnext/sql-server-on-linux-and-new-fua-support-for-xfs-filesystem">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I wrote a (dbi services) <a href="https://blog.dbi-services.com/sql-server-on-linux-io-internal-thoughts/" rel="noopener" target="_blank">blog post</a> concerning Linux and SQL Server IO behavior changes before and after SQL Server 2017 CU6. Now, I was looking forward to seeing some new improvements with Force Unit Access (FUA), implemented through the Linux XFS enhancements available since kernel 4.18.</p>
<p><span id="more-1568"></span></p>
<p>As a reminder, SQL Server 2017 CU6 added a way to guarantee data durability by using the &laquo;&nbsp;forced flush&nbsp;&raquo; mechanism explained <a href="https://support.microsoft.com/en-us/help/4131496/enable-forced-flush-mechanism-in-sql-server-2017-on-linux" rel="noopener" target="_blank">here</a>. To cut a long story short, SQL Server has strict storage requirements, such as write ordering and FUA, and things work differently on Linux than on Windows to achieve durability. What is FUA and why is it important for SQL Server? From <a href="https://en.wikipedia.org/wiki/Disk_buffer#Force_Unit_Access_(FUA)" rel="noopener" target="_blank">Wikipedia</a>: Force Unit Access (aka FUA) is an I/O write command option that forces written data all the way to stable storage. FUA appeared in the SCSI command set and, good news, was later adopted by other standards over time. SQL Server relies on it to meet its WAL and ACID requirements. </p>
<p>In the Linux world, before kernel 4.18, FUA was handled and optimized only for filesystem journaling. Data writes always went through a multi-step flush process that could introduce SQL Server IO slowness (issue the write to the block device + issue a block device flush to ensure durability with O_DSYNC). </p>
<p>In the Windows world, installing and using a SQL Server instance assumes you are compliant with the Microsoft storage requirements, and therefore the first RTM version shipped on Linux came only with O_DIRECT, assuming you already ensure that SQL Server IOs can be written directly to non-volatile storage through the kernel, drivers and hardware before the acknowledgement. The forced flush mechanism &#8211; based on fdatasync() &#8211; was then introduced to address scenarios without safe O_DIRECT capabilities. </p>
<p>But referring to the Bob Dorr <a href="https://bobsql.com/sql-server-on-linux-forced-unit-access-fua-internals/" rel="noopener" target="_blank">article</a>, Linux kernel 4.18 comes with XFS enhancements to handle FUA for data storage, which is obviously of benefit to SQL Server. FUA support is intended to improve write performance by shortening the path of write requests, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-1-IO-worklow-e1586796506268.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-1-IO-worklow-e1586796506268.jpg" alt="160 - 1 - IO worklow" width="1000" height="539" class="alignnone size-full wp-image-1569" /></a></p>
<p><em>Picture from existing IO workflow on Bob Dorr&rsquo;s article</em></p>
<p>This is an interesting improvement for write-intensive workloads, and it seems to be confirmed by the tests performed by Microsoft and Bob Dorr in his article. </p>
<p>Let&rsquo;s begin the experiment with my lab environment, based on CentOS 7 on Hyper-V with an upgraded kernel version: 5.6.3-1.el7.elrepo.x86_64.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$uname -r<br />
5.6.3-1.el7.elrepo.x86_64<br />
<br />
$cat /etc/os-release | grep VERSION<br />
VERSION=&quot;7 (Core)&quot;<br />
VERSION_ID=&quot;7&quot;<br />
CENTOS_MANTISBT_PROJECT_VERSION=&quot;7&quot;<br />
REDHAT_SUPPORT_PRODUCT_VERSION=&quot;7&quot;</div></div>
<p>Note that my tests are purely experimental; instead of upgrading the kernel to a newer version, you may directly rely on RHEL 8 based distros, which come with kernel 4.18 for example.</p>
<p>My lab environment includes 2 separate SSD disks to host the DATA + TLOG database files as follows:</p>
<p>I:\ drive : SQL Data volume (sdb – XFS filesystem)<br />
T:\ drive : SQL TLog volume (sda – XFS filesystem)</p>
<p>The general performance is not so bad <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-6-diskmark-tests-storage-env-e1586796679451.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-6-diskmark-tests-storage-env-e1586796679451.jpg" alt="160 - 6 - diskmark tests storage env" width="1000" height="362" class="alignnone size-full wp-image-1571" /></a></p>
<p>Initially I dedicated only one disk to both the SQL DATA and TLOG files, but I quickly noticed some IO waits (iostat output) which made me less confident in my test results:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-3-iostats-before-optimization.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-3-iostats-before-optimization.jpg" alt="160 - 3 - iostats before optimization" width="975" height="447" class="alignnone size-full wp-image-1572" /></a></p>
<p>Spreading IO on physically separate volumes helped to reduce drastically these phenomena afterwards:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-4-iostats-after-optimization.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-4-iostats-after-optimization.jpg" alt="160 - 4 - iostats after optimization" width="984" height="531" class="alignnone size-full wp-image-1573" /></a> </p>
<p>First, I enabled FUA capabilities on Hyper-V side as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Set-VMHardDiskDrive -VMName CENTOS7 -ControllerType SCSI -OverrideCacheAttributes WriteCacheAndFUAEnabled<br />
<br />
Get-VMHardDiskDrive -VMName CENTOS7 | `<br />
&nbsp; &nbsp; ft VMName, ControllerType, &nbsp;ControllerLocation, Path, WriteHardeningMethod -AutoSize</div></div>
<p>Then I checked if FUA is enabled and supported from an OS perspective including sda (TLOG) and sdb (SQL DATA) disks:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ lsblk -f<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;FSTYPE &nbsp; &nbsp; &nbsp;LABEL UUID &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; MOUNTPOINT<br />
sdb<br />
└─sdb1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;xfs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 06910f69-27a3-4711-9093-f8bf80d15d72 &nbsp; /sqldata<br />
sr0<br />
sda<br />
├─sda2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;xfs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; f5a9bded-130f-4642-bd6f-9f27563a4e16 &nbsp; /boot<br />
├─sda3 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LVM2_member &nbsp; &nbsp; &nbsp; QsbKEt-28yT-lpfZ-VCbj-v5W5-vnVr-2l7nih<br />
│ ├─centos-swap swap &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;7eebbb32-cef5-42e9-87c3-7df1a0b79f11 &nbsp; [SWAP]<br />
│ └─centos-root xfs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 90f6eb2f-dd39-4bef-a7da-67aa75d1843d &nbsp; /<br />
└─sda1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;vfat &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;7529-979E &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/boot/efi<br />
<br />
$ dmesg | grep sda<br />
[ &nbsp; &nbsp;1.665478] sd 0:0:0:0: [sda] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)<br />
[ &nbsp; &nbsp;1.665479] sd 0:0:0:0: [sda] 4096-byte physical blocks<br />
[ &nbsp; &nbsp;1.665774] sd 0:0:0:0: [sda] Write Protect is off<br />
[ &nbsp; &nbsp;1.665775] sd 0:0:0:0: [sda] Mode Sense: 0f 00 10 00<br />
[ &nbsp; &nbsp;1.670321] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA<br />
[ &nbsp; &nbsp;1.683833] &nbsp;sda: sda1 sda2 sda3<br />
[ &nbsp; &nbsp;1.708938] sd 0:0:0:0: [sda] Attached SCSI disk<br />
[ &nbsp; &nbsp;5.607914] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)</div></div>
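<p>A quick way to double-check this from scripts is to filter the dmesg text for the FUA capability line. The tiny helper below is only an illustrative sketch (the <code>sd[a-z]</code> device-name pattern is an assumption matching this lab): it reads dmesg-formatted text on stdin and prints the devices that advertise DPO/FUA support.</p>

```shell
# Hypothetical helper: list the block devices whose dmesg lines advertise
# "supports DPO and FUA". Reads dmesg-formatted text on stdin.
fua_devices() {
  grep 'supports DPO and FUA' | sed -n 's/.*\[\(sd[a-z]\)\].*/\1/p' | sort -u
}
```

Piping the dmesg excerpt above through it prints the FUA-capable device names at a glance.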
<p>Finally, according to the documentation, I configured <strong>trace flag 3979</strong> and the <strong>control.alternatewritethrough = 0</strong> parameter at startup for my SQL Server instance.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ /opt/mssql/bin/mssql-conf traceflag 3979 on<br />
<br />
$ /opt/mssql/bin/mssql-conf set control.alternatewritethrough 0<br />
<br />
$ systemctl restart mssql-server</div></div>
<p>The first test I performed was pretty similar to the ones in my previous (dbi services) <a href="https://blog.dbi-services.com/sql-server-on-linux-io-internal-thoughts/">blog post</a>.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">CREATE TABLE dummy_test (<br />
&nbsp; &nbsp; id INT IDENTITY,<br />
&nbsp; &nbsp; col1 VARCHAR(2000) DEFAULT REPLICATE('T', 2000)<br />
);<br />
<br />
INSERT INTO dummy_test DEFAULT VALUES;<br />
GO 67</div></div>
<p>Out of curiosity, I looked at the corresponding strace output:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ cat sql_strace_fua.txt<br />
% time &nbsp; &nbsp; seconds &nbsp;usecs/call &nbsp; &nbsp; calls &nbsp; &nbsp;errors syscall<br />
------ ----------- ----------- --------- --------- ----------------<br />
&nbsp;78.13 &nbsp;360.618066 &nbsp; &nbsp; &nbsp; 61739 &nbsp; &nbsp; &nbsp;5841 &nbsp; &nbsp; &nbsp;2219 futex<br />
&nbsp; 6.88 &nbsp; 31.731833 &nbsp; &nbsp; 1511040 &nbsp; &nbsp; &nbsp; &nbsp;21 &nbsp; &nbsp; &nbsp; &nbsp;15 restart_syscall<br />
&nbsp; 3.81 &nbsp; 17.592176 &nbsp; &nbsp; &nbsp;130312 &nbsp; &nbsp; &nbsp; 135 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; io_getevents<br />
&nbsp; 2.95 &nbsp; 13.607314 &nbsp; &nbsp; &nbsp; 98604 &nbsp; &nbsp; &nbsp; 138 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; epoll_wait<br />
&nbsp; 2.88 &nbsp; 13.313667 &nbsp; &nbsp; &nbsp;633984 &nbsp; &nbsp; &nbsp; &nbsp;21 &nbsp; &nbsp; &nbsp; &nbsp;21 rt_sigtimedwait<br />
&nbsp; 2.60 &nbsp; 11.997925 &nbsp; &nbsp; 1333103 &nbsp; &nbsp; &nbsp; &nbsp; 9 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; nanosleep<br />
&nbsp; 1.79 &nbsp; &nbsp;8.279781 &nbsp; &nbsp; &nbsp; &nbsp; 242 &nbsp; &nbsp; 34256 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; gettid<br />
&nbsp; 0.84 &nbsp; &nbsp;3.876021 &nbsp; &nbsp; &nbsp; &nbsp; 226 &nbsp; &nbsp; 17124 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; getcpu<br />
&nbsp; 0.03 &nbsp; &nbsp;0.138836 &nbsp; &nbsp; &nbsp; &nbsp; 347 &nbsp; &nbsp; &nbsp; 400 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sched_yield<br />
&nbsp; 0.01 &nbsp; &nbsp;0.062348 &nbsp; &nbsp; &nbsp; &nbsp; 254 &nbsp; &nbsp; &nbsp; 245 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; getrusage<br />
&nbsp; 0.01 &nbsp; &nbsp;0.056065 &nbsp; &nbsp; &nbsp; &nbsp; 406 &nbsp; &nbsp; &nbsp; 138 &nbsp; &nbsp; &nbsp; &nbsp;69 readv<br />
&nbsp; 0.01 &nbsp; &nbsp;0.038107 &nbsp; &nbsp; &nbsp; &nbsp; 343 &nbsp; &nbsp; &nbsp; 111 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; read<br />
&nbsp; 0.01 &nbsp; &nbsp;0.037883 &nbsp; &nbsp; &nbsp; &nbsp; 743 &nbsp; &nbsp; &nbsp; &nbsp;51 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; mmap<br />
&nbsp; 0.01 &nbsp; &nbsp;0.037498 &nbsp; &nbsp; &nbsp; &nbsp; 180 &nbsp; &nbsp; &nbsp; 208 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; epoll_ctl<br />
&nbsp; 0.01 &nbsp; &nbsp;0.035654 &nbsp; &nbsp; &nbsp; &nbsp; 517 &nbsp; &nbsp; &nbsp; &nbsp;69 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; writev<br />
&nbsp; 0.01 &nbsp; &nbsp;0.025542 &nbsp; &nbsp; &nbsp; &nbsp; 370 &nbsp; &nbsp; &nbsp; &nbsp;69 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; io_submit<br />
&nbsp; 0.00 &nbsp; &nbsp;0.019760 &nbsp; &nbsp; &nbsp; &nbsp; 282 &nbsp; &nbsp; &nbsp; &nbsp;70 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; write<br />
&nbsp; 0.00 &nbsp; &nbsp;0.019555 &nbsp; &nbsp; &nbsp; &nbsp; 477 &nbsp; &nbsp; &nbsp; &nbsp;41 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; open<br />
&nbsp; 0.00 &nbsp; &nbsp;0.016285 &nbsp; &nbsp; &nbsp; &nbsp;1629 &nbsp; &nbsp; &nbsp; &nbsp;10 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; rt_sigaction<br />
&nbsp; 0.00 &nbsp; &nbsp;0.012359 &nbsp; &nbsp; &nbsp; &nbsp; 301 &nbsp; &nbsp; &nbsp; &nbsp;41 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; close<br />
&nbsp; 0.00 &nbsp; &nbsp;0.010069 &nbsp; &nbsp; &nbsp; &nbsp; 205 &nbsp; &nbsp; &nbsp; &nbsp;49 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; munmap<br />
&nbsp; 0.00 &nbsp; &nbsp;0.006977 &nbsp; &nbsp; &nbsp; &nbsp; 303 &nbsp; &nbsp; &nbsp; &nbsp;23 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; rt_sigprocmask<br />
&nbsp; 0.00 &nbsp; &nbsp;0.006256 &nbsp; &nbsp; &nbsp; &nbsp; 153 &nbsp; &nbsp; &nbsp; &nbsp;41 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; fstat<br />
&nbsp; 0.00 &nbsp; &nbsp;0.004646 &nbsp; &nbsp; &nbsp; &nbsp; 465 &nbsp; &nbsp; &nbsp; &nbsp;10 &nbsp; &nbsp; &nbsp; &nbsp;10 stat<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000860 &nbsp; &nbsp; &nbsp; &nbsp; 215 &nbsp; &nbsp; &nbsp; &nbsp; 4 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; madvise<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000321 &nbsp; &nbsp; &nbsp; &nbsp; 161 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sched_setaffinity<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000295 &nbsp; &nbsp; &nbsp; &nbsp; 148 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; set_robust_list<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000281 &nbsp; &nbsp; &nbsp; &nbsp; 141 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; clone<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000236 &nbsp; &nbsp; &nbsp; &nbsp; 118 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sigaltstack<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000093 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;47 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; arch_prctl<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000046 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;23 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sched_getaffinity<br />
------ ----------- ----------- --------- --------- ----------------<br />
100.00 &nbsp;461.546755 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 59137 &nbsp; &nbsp; &nbsp;2334 total</div></div>
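<p>The same conclusion can be checked mechanically by scanning the strace summary for flush-oriented syscalls. This is only a sketch of that check and assumes the standard <code>strace -c</code> column layout (call count in the 4th field, syscall name in the last one): it prints the total number of fsync / fdatasync / sync_file_range calls, which should be 0 with FUA enabled.</p>

```shell
# Hypothetical check: sum the "calls" column (4th field) of any
# fsync / fdatasync / sync_file_range rows in an `strace -c` summary.
count_flush_calls() {
  awk '$NF ~ /^(fsync|fdatasync|sync_file_range)$/ { n += $4 } END { print n + 0 }'
}
```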
<p>… And as I expected, with FUA enabled there are no fsync() / fdatasync() calls anymore; writing to stable storage is achieved directly by FUA commands. It is now iomap_dio_rw() that determines whether REQ_FUA can be used and whether issuing generic_write_sync() is still necessary. To dig further into the IO layer we need to rely on another tool, blktrace (mentioned in Bob Dorr&rsquo;s article as well).</p>
<p>In my case I got two different pictures of the blktrace output, one for the forced flush mechanism (the default) and one for FUA-oriented IO:</p>
<p>-&gt; With forced flush</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">34.694734500 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17164 &nbsp;A &nbsp;WS &nbsp; &nbsp; &nbsp; 2048 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694735000 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17165 &nbsp;Q &nbsp;WS &nbsp; &nbsp; &nbsp; 2048 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694737000 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17166 &nbsp;X &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694738100 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17167 &nbsp;G &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694739800 &nbsp; &nbsp; &nbsp;14225 18426216 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17169 &nbsp;G &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694740900 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17171 &nbsp;D &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694747200 &nbsp; &nbsp; &nbsp;14225 18426216 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17174 &nbsp;D &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.713665000 &nbsp; &nbsp; &nbsp;14225 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8,16 &nbsp; 0 &nbsp; &nbsp;17175 &nbsp;Q FWS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.713668100 &nbsp; &nbsp; &nbsp;14225 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8,16 &nbsp; 0 &nbsp; &nbsp;17176 &nbsp;G FWS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr</div></div>
<p>WS (Write Synchronous) requests are performed, but SQL Server still needs to go through the multi-step flush process with the additional FWS (PREFLUSH|WRITE|SYNC) requests.</p>
<p>-&gt; FUA</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">0.000000000 &nbsp; &nbsp; &nbsp;16305 55106536 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp;A WFS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
0.000000400 &nbsp; &nbsp; &nbsp;16305 57615336 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;2 &nbsp;A WFS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
0.000001100 &nbsp; &nbsp; &nbsp;16305 57615336 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;3 &nbsp;Q WFS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
0.000005200 &nbsp; &nbsp; &nbsp;16305 57615336 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;4 &nbsp;G WFS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
0.001377800 &nbsp; &nbsp; &nbsp;16305 55106544 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;6 &nbsp;A WFS &nbsp; &nbsp; &nbsp; &nbsp; 16 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr</div></div>
<p>FWS has disappeared, leaving only WFS commands, which are basically <strong>REQ_WRITE requests carrying the REQ_FUA flag</strong>.</p>
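<p>The RWBS column (8th field in the blkparse lines above) can be decoded mechanically: a leading F marks a preflush request, while an F following the operation letter marks a FUA write. Here is a small illustrative sketch of that classification (the field position is an assumption based on the trace layout shown here):</p>

```shell
# Hypothetical decoder for the blkparse RWBS column: a leading "F" = preflush
# (e.g. FWS), an "F" after the operation letter = FUA (e.g. WFS), else plain.
classify_rwbs() {
  awk '{ r = $8
         if (r ~ /^F/)     print r, "preflush"
         else if (r ~ /F/) print r, "fua"
         else              print r, "plain" }'
}
```

Fed with the traces above, it labels the forced-flush run with FWS preflush requests and the FUA run with WFS fua writes.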
<p>I spent some time reading interesting discussions in addition to Bob Dorr&rsquo;s wonderful article. Here is an interesting <a href="https://lkml.org/lkml/2019/12/3/316" rel="noopener" target="_blank">pointer</a> to a discussion about REQ_FUA, for instance.</p>
<p><strong>But what about performance gain? </strong></p>
<p>I had 2 simple scenarios to play with in order to bring out FUA&rsquo;s helpfulness: hardening the dirty pages in the buffer pool through the checkpoint process, and hardening the log buffer to disk during the commit phase. When the forced flush method is used, each component relies on an additional FlushFileBuffers() call to achieve durability. This can be easily tracked from an XE session including the <strong>flush_file_buffers</strong> and <strong>make_writes_durable</strong> events.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-1-1-flushfilebuffers-worklflow.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-1-1-flushfilebuffers-worklflow.jpg" alt="160 - 1 - 1 - flushfilebuffers worklflow" width="839" height="505" class="alignnone size-full wp-image-1575" /></a></p>
<p><strong>First scenario (10K inserts within a transaction and checkpoint)</strong></p>
<p>In this scenario my intention was to stress the checkpoint process with a bunch of buffers and dirty pages to flush to disk when it kicks in.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">USE dummy;<br />
<br />
SET NOCOUNT ON;<br />
-- Disable checkpoint to control when it will kick in<br />
DBCC TRACEON(3505);<br />
-- Check traceflag<br />
DBCC TRACESTATUS;<br />
<br />
DECLARE @i INT = 0;<br />
DECLARE @iteration INT = 0;<br />
DECLARE @start_upd DATETIME;<br />
DECLARE @start_chkpt DATETIME;<br />
DECLARE @end_upd DATETIME;<br />
DECLARE @end_chkpt DATETIME;<br />
<br />
TRUNCATE TABLE dummy_test;<br />
<br />
WHILE @iteration &amp;lt; 251<br />
BEGIN<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; SET @start_upd = GETDATE();<br />
<br />
&nbsp; &nbsp; BEGIN TRAN;<br />
<br />
&nbsp; &nbsp; WHILE @i &amp;lt;= 10000<br />
&nbsp; &nbsp; BEGIN<br />
&nbsp; &nbsp; &nbsp; &nbsp; INSERT INTO dummy_test DEFAULT VALUES;<br />
&nbsp; &nbsp; &nbsp; &nbsp; SET @i += 1;<br />
&nbsp; &nbsp; END<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; COMMIT TRAN;<br />
<br />
&nbsp; &nbsp; SET @end_upd = GETDATE();<br />
<br />
&nbsp; &nbsp; SET @i = 0;<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; SET @start_chkpt = GETDATE();<br />
&nbsp; &nbsp; CHECKPOINT;<br />
&nbsp; &nbsp; SET @end_chkpt = GETDATE();<br />
&nbsp; &nbsp; PRINT &amp;#039;INS: &amp;#039; + CAST(DATEDIFF(ms, @start_upd, @end_upd) AS VARCHAR(50)) + &amp;#039; - CHKPT: &amp;#039; + CAST(DATEDIFF(ms, @start_chkpt, @end_chkpt) AS VARCHAR(50));<br />
<br />
&nbsp; &nbsp; SET @iteration += 1;<br />
END</div></div>
<p>The result is as follows:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-5-test-perfs-250_10K_chkpt.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-5-test-perfs-250_10K_chkpt.jpg" alt="160 - 5 - test perfs 250_10K_chkpt" width="974" height="298" class="alignnone size-full wp-image-1576" /></a></p>
<p>In my case, I noticed ~17% of improvement for the checkpoint process and ~7% for the insert transaction, including the commit phase that flushes data to the TLog. In parallel, the aggregated extended event output confirms that FUA avoids a lot of additional operations to persist data on disk, as illustrated by the flush_file_buffers and make_writes_durable events.</p>
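<p>To compute such percentages, the &quot;INS: x - CHKPT: y&quot; lines printed by the loop can be averaged per run and the two runs compared. The helper below is a minimal sketch assuming exactly that line format:</p>

```shell
# Hypothetical aggregation of the "INS: x - CHKPT: y" lines printed by the
# T-SQL test loop: report the mean INSERT and CHECKPOINT durations in ms.
avg_timings() {
  awk -F'[ :-]+' '/^INS/ { ins += $2; chk += $4; n++ }
                  END { if (n) printf "ins=%.0f chkpt=%.0f\n", ins / n, chk / n }'
}
```

Running it once on the forced-flush output and once on the FUA output gives the two means from which the improvement ratio follows directly.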
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-6-xe-flush-file-buffers-e1586798220100.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-6-xe-flush-file-buffers-e1586798220100.jpg" alt="160 - 6 - xe flush file buffers" width="1000" height="178" class="alignnone size-full wp-image-1577" /></a></p>
<p><strong>Second scenario (100x 1 insert within a transaction and checkpoint)</strong></p>
<p>In this scenario, I wanted to stress the log writer by forcing a lot of small transactions to commit. I updated the T-SQL code as shown below:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">USE dummy;<br />
<br />
SET NOCOUNT ON;<br />
-- Disable checkpoint to control when it will kick in<br />
DBCC TRACEON(3505);<br />
-- Check traceflag<br />
DBCC TRACESTATUS;<br />
<br />
DECLARE @i INT = 0;<br />
DECLARE @iteration INT = 0;<br />
DECLARE @start_upd DATETIME;<br />
DECLARE @start_chkpt DATETIME;<br />
DECLARE @end_upd DATETIME;<br />
DECLARE @end_chkpt DATETIME;<br />
<br />
TRUNCATE TABLE dummy_test;<br />
<br />
WHILE @iteration &amp;lt; 251<br />
BEGIN<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; SET @start_upd = GETDATE();<br />
<br />
&nbsp; &nbsp; WHILE @i &amp;lt;= 100<br />
&nbsp; &nbsp; BEGIN<br />
&nbsp; &nbsp; &nbsp; &nbsp; INSERT INTO dummy_test DEFAULT VALUES;<br />
&nbsp; &nbsp; &nbsp; &nbsp; SET @i += 1;<br />
&nbsp; &nbsp; END<br />
<br />
&nbsp; &nbsp; SET @end_upd = GETDATE();<br />
<br />
&nbsp; &nbsp; SET @i = 0;<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; SET @start_chkpt = GETDATE();<br />
&nbsp; &nbsp; CHECKPOINT;<br />
&nbsp; &nbsp; SET @end_chkpt = GETDATE();<br />
&nbsp; &nbsp; PRINT &amp;#039;INS: &amp;#039; + CAST(DATEDIFF(ms, @start_upd, @end_upd) AS VARCHAR(50)) + &amp;#039; - CHKPT: &amp;#039; + CAST(DATEDIFF(ms, @start_chkpt, @end_chkpt) AS VARCHAR(50));<br />
<br />
&nbsp; &nbsp; SET @iteration += 1;<br />
END</div></div>
<p>The new picture is the following:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-7-test-perfs-250_100_1K_chkpt.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-7-test-perfs-250_100_1K_chkpt.jpg" alt="160 - 7 - test perfs 250_100_1K_chkpt" width="974" height="298" class="alignnone size-full wp-image-1580" /></a></p>
<p>This time the improvement is definitely more impressive, with a decrease of ~80% in the execution time of the INSERT + COMMIT part and ~77% for the checkpoint phase!</p>
<p>Looking at the extended event session confirms the shortened IO path has something to do with it <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-7-xe-flush-file-buffers-2-e1586798367112.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-7-xe-flush-file-buffers-2-e1586798367112.jpg" alt="160 - 7 - xe flush file buffers 2" width="1000" height="170" class="alignnone size-full wp-image-1578" /></a></p>
<p>Well, shortening the IO path by relying directly on native FUA instructions was definitely a good idea, both for performance and for meeting WAL and ACID requirements. Anyway, I&rsquo;m glad to see Microsoft contributing improvements to the Linux kernel!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Introducing SQL Server with Portworx and storage orchestration</title>
		<link>https://blog.developpez.com/mikedavem/p13184/docker/introducing-sql-server-with-portworx-and-storage-orchestration</link>
		<comments>https://blog.developpez.com/mikedavem/p13184/docker/introducing-sql-server-with-portworx-and-storage-orchestration#comments</comments>
		<pubDate>Sun, 15 Dec 2019 22:08:03 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[DevOps]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[K8s]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[Portworx]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Storage orchestration]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1402</guid>
		<description><![CDATA[Stateful applications like databases need special considerations on K8s world. This is because data persistence is important and we need also something at the storage layer communicating with the container orchestrator to take advantage of its scheduling capabilities. For Stateful &#8230; <a href="https://blog.developpez.com/mikedavem/p13184/docker/introducing-sql-server-with-portworx-and-storage-orchestration">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Stateful applications like databases need special consideration in the K8s world. This is because data persistence is important, and we also need something at the storage layer communicating with the container orchestrator to take advantage of its scheduling capabilities. For stateful applications, a StatefulSet may be only part of the solution because it primarily focuses on Pod availability, and we have to rely on the application&rsquo;s own capabilities for data replication. But a StatefulSet doesn&rsquo;t address the underlying storage at all. At the time of this write-up, StatefulSet-based solutions for SQL Server, such as availability groups, are not yet supported in production. </p>
<p><span id="more-1402"></span></p>
<p>So, with stateful applications we may consider other solutions like GlusterFS or NFS as distributed storage spanning all the nodes of the K8s cluster, but they often don&rsquo;t meet the requirements of a database workload running in production, with high throughput and IOPS demands and data migration needs.</p>
<p>Products exist on the market that seem to address these specific requirements, and I was very curious to get a better picture of their capabilities. During my investigation, I went through a very interesting one named Portworx for a potential customer&rsquo;s project. The interesting part of Portworx is its container-native, orchestration-aware storage fabric, which brings storage operation and administration inside K8s. It aggregates the underlying storage and exposes it as a software-defined, programmable block device. </p>
<p>From a high-level perspective, Portworx uses a custom scheduler – <a href="https://portworx.com/stork-storage-orchestration-kubernetes/">STORK</a> (STorage Orchestration Runtime for Kubernetes) – to assist K8s in placing a Pod on the same node where the associated PVC resides. It drastically reduces the complex annotation and labeling work otherwise needed to express such affinity rules. </p>
<p>In this blog post, I will focus only on the high-availability topic, which Portworx addresses by synchronizing volume content between K8s nodes and aggregated disks. To that end, Portworx requires defining the redundancy of the dataset between replicas through a replication factor value. </p>
<p>I cannot expose my customer&rsquo;s architecture here, but let&rsquo;s try to apply the concept to my lab environment. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2019/12/151-0-0-K-Lab-architecture.jpg"><img src="http://blog.developpez.com/mikedavem/files/2019/12/151-0-0-K-Lab-architecture.jpg" alt="151 - 0 - 0 - K Lab architecture" width="675" height="495" class="alignnone size-full wp-image-1414" /></a></p>
<p>As shown above, my lab environment includes 4 K8s nodes, 3 of which act as workers. Each worker node owns local storage based on two SSD disks (one for the SQL Server data files and the other one handling the Portworx metadata activity &#8211; the journal disk). After deploying Portworx on my K8s cluster, here is a big picture of my configuration:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ kubectl get daemonset -n kube-system | egrep &quot;(stork|portworx|px)&quot;<br />
portworx &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3 &nbsp; &nbsp; &nbsp; &nbsp;<br />
portworx-api &nbsp; 3 &nbsp; &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3</div></div>
<p>Portworx is a DaemonSet-based installation. Each Portworx node will discover the available storage and create a container-native block storage device with:<br />
&#8211;	/dev/sdb for my SQL Server data<br />
&#8211;	/dev/sdc for hosting my journal</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ kubectl get pod -n kube-system | egrep &quot;(stork|portworx|px)&quot;<br />
<br />
portworx-555wf &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;18 &nbsp; &nbsp; &nbsp; &nbsp; 2d23h<br />
portworx-api-2pv6s &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2d23h<br />
portworx-api-s8zzr &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2d23h<br />
portworx-api-vnqh2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;4 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2d23h<br />
portworx-pjxl8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;17 &nbsp; &nbsp; &nbsp; &nbsp; 2d23h<br />
portworx-wrcdf &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;389 &nbsp; &nbsp; &nbsp; &nbsp;2d10h<br />
px-lighthouse-55db75b59c-qd2nc &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3/3 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;35h<br />
stork-5d568485bb-ghlt9 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;35h<br />
stork-5d568485bb-h2sqm &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;13 &nbsp; &nbsp; &nbsp; &nbsp; 2d23h<br />
stork-5d568485bb-xxd4b &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2d4h<br />
stork-scheduler-56574cdbb5-7td6v &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;35h<br />
stork-scheduler-56574cdbb5-skw5f &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;4 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2d4h<br />
stork-scheduler-56574cdbb5-v5slj &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;9 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2d23h</div></div>
<p>The above picture shows the different stork pods that may influence scheduling based on the location of the volumes a pod requires. In addition, the PX cluster (part of the Portworx Enterprise Platform) includes all the Portworx pods and provides monitoring and performance insights for each related pod (a SQL Server instance here). </p>
<p>Let’s have a look at the global configuration by using the <strong>pxctl</strong> command (first section):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')<br />
$ kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl status<br />
Status: PX is operational<br />
License: Trial (expires in 28 days)<br />
Node ID: 590d7afd-9d30-4624-8082-5f9cb18ecbfd<br />
&nbsp; &nbsp; &nbsp; &nbsp; IP: 192.168.90.63<br />
&nbsp; &nbsp; &nbsp; &nbsp; Local Storage Pool: 1 pool<br />
&nbsp; &nbsp; &nbsp; &nbsp; POOL &nbsp; &nbsp;IO_PRIORITY &nbsp; &nbsp; RAID_LEVEL &nbsp; &nbsp; &nbsp;USABLE &nbsp;USED &nbsp; &nbsp;STATUS &nbsp;ZONE &nbsp; &nbsp;REGION<br />
&nbsp; &nbsp; &nbsp; &nbsp; 0 &nbsp; &nbsp; &nbsp; HIGH &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;raid0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 20 GiB &nbsp;8.5 GiB Online &nbsp;default default<br />
&nbsp; &nbsp; &nbsp; &nbsp; Local Storage Devices: 1 device<br />
&nbsp; &nbsp; &nbsp; &nbsp; Device &nbsp;Path &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Media Type &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Size &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Last-Scan<br />
&nbsp; &nbsp; &nbsp; &nbsp; 0:1 &nbsp; &nbsp; /dev/sdb &nbsp; &nbsp; &nbsp; &nbsp;STORAGE_MEDIUM_MAGNETIC 20 GiB &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;08 Dec 19 21:59 UTC<br />
&nbsp; &nbsp; &nbsp; &nbsp; total &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 20 GiB<br />
&nbsp; &nbsp; &nbsp; &nbsp; Cache Devices:<br />
&nbsp; &nbsp; &nbsp; &nbsp; No cache devices<br />
&nbsp; &nbsp; &nbsp; &nbsp; Journal Device:<br />
&nbsp; &nbsp; &nbsp; &nbsp; 1 &nbsp; &nbsp; &nbsp; /dev/sdc1 &nbsp; &nbsp; &nbsp; STORAGE_MEDIUM_MAGNETIC<br />
…</div></div>
<p>Portworx has created a pool composed of my 3 replicas / Kubernetes nodes, each with a 20 GiB SSD. I just used a default configuration without specifying any zone or region settings for fault tolerance; this is not my focus at the moment. Following Portworx&rsquo;s performance tuning documentation, I configured a journal device to improve I/O performance by offloading PX metadata writes to a separate device. </p>
<p>Second section:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">…<br />
Nodes: 3 node(s) with storage (3 online)<br />
&nbsp; &nbsp; &nbsp; &nbsp; IP &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ID &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;SchedulerNodeName &nbsp; &nbsp; &nbsp; StorageNode &nbsp; &nbsp; &nbsp;Used &nbsp; &nbsp;Capacity &nbsp; &nbsp; &nbsp; &nbsp;Status &nbsp;StorageStatus &nbsp; Version &nbsp; &nbsp; &nbsp; &nbsp; Kernel &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;OS<br />
&nbsp; &nbsp; &nbsp; &nbsp; 192.168.5.62 &nbsp; &nbsp;b0ac4fa3-29c2-40a8-9033-1d0558ec31fd &nbsp; &nbsp;k8n2.dbi-services.test &nbsp;Yes &nbsp; &nbsp; 3.1 GiB &nbsp;20 GiB &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Online &nbsp;Up &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2.3.0.0-103206b 3.10.0-1062.1.2.el7.x86_64 &nbsp; &nbsp; &nbsp;CentOS Linux 7 (Core)<br />
&nbsp; &nbsp; &nbsp; &nbsp; 192.168.40.61 &nbsp; 9fc5bc45-5602-4926-ab38-c74f0a8a8b2c &nbsp; &nbsp;k8n1.dbi-services.test &nbsp;Yes &nbsp; &nbsp; 8.6 GiB &nbsp;20 GiB &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Online &nbsp;Up &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2.3.0.0-103206b 3.10.0-1062.1.2.el7.x86_64 &nbsp; &nbsp; &nbsp;CentOS Linux 7 (Core)<br />
&nbsp; &nbsp; &nbsp; &nbsp; 192.168.80.63 &nbsp; 590d7afd-9d30-4624-8082-5f9cb18ecbfd &nbsp; &nbsp;k8n3.dbi-services.test &nbsp;Yes &nbsp; &nbsp; 8.5 GiB &nbsp;20 GiB &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Online &nbsp;Up (This node) &nbsp;2.3.0.0-103206b 3.10.0-1062.1.2.el7.x86_64 &nbsp; &nbsp; &nbsp;CentOS Linux 7 (Core)<br />
Global Storage Pool<br />
&nbsp; &nbsp; &nbsp; &nbsp; Total Used &nbsp; &nbsp; &nbsp;: &nbsp;20 GiB<br />
&nbsp; &nbsp; &nbsp; &nbsp; Total Capacity &nbsp;: &nbsp;60 GiB</div></div>
<p>All my nodes are up for a total storage of 60 GiB. Let&rsquo;s deploy a Portworx Storage Class with the following specification:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">kind: StorageClass<br />
apiVersion: storage.k8s.io/v1<br />
metadata:<br />
&nbsp; name: portworx-sc<br />
provisioner: kubernetes.io/portworx-volume<br />
parameters:<br />
&nbsp; repl: &quot;3&quot;<br />
&nbsp; nodes: &quot;b0ac4fa3-29c2-40a8-9033-1d0558ec31fd,9fc5bc45-5602-4926-ab38-c74f0a8a8b2c,590d7afd-9d30-4624-8082-5f9cb18ecbfd&quot;<br />
&nbsp; label: &quot;name=mssqlvol&quot;<br />
&nbsp; fs: &quot;xfs&quot;<br />
&nbsp; io_profile: &quot;db&quot;<br />
&nbsp; priority_io: &quot;high&quot;<br />
&nbsp; journal: &quot;true&quot;<br />
allowVolumeExpansion: true</div></div>
<p>The important parameters are:</p>
<p><strong>repl: &laquo;&nbsp;3&nbsp;&raquo;</strong> =&gt; Number of replicas (K8s nodes) where data will be replicated</p>
<p><strong>nodes: &laquo;&nbsp;b0ac4fa3-29c2-40a8-9033-1d0558ec31fd,9fc5bc45-5602-4926-ab38-c74f0a8a8b2c,590d7afd-9d30-4624-8082-5f9cb18ecbfd&nbsp;&raquo;</strong> =&gt; The list of nodes eligible to host the volume replicas, identified by their node IDs. Each write is synchronously replicated to a quorum set of nodes, whereas read throughput is aggregated: multiple nodes can service one read request in parallel streams.</p>
<p><strong>fs: &laquo;&nbsp;xfs&nbsp;&raquo;</strong> =&gt; I used a Linux FS supported by SQL Server on Linux</p>
<p><strong>io_profile: &laquo;&nbsp;db&nbsp;&raquo;</strong> =&gt; By default, Portworx selects an I/O profile based on the detected access pattern. Here I forced the db profile, which implements a write-back flush coalescing algorithm. </p>
<p><strong>priority_io: &laquo;&nbsp;high&nbsp;&raquo;</strong> =&gt; I deliberately set the IO priority to high for my pool in order to favor maximum throughput and low latency for transactional workloads. I used SSD storage accordingly.</p>
<p><strong>journal: &laquo;&nbsp;true&nbsp;&raquo;</strong> =&gt; The volumes used by this storage class will use the journal dedicated device</p>
<p><strong>allowVolumeExpansion: true</strong> =&gt; an interesting parameter that allows online expansion of the concerned volume(s). As an aside, it is worth noting that volume expansion is pretty new (v1.11+) in the Kubernetes world for the following in-tree volume plugins: AWS-EBS, GCE-PD, Azure Disk, Azure File, Glusterfs, Cinder, Portworx, and Ceph RBD.</p>
<p>Then, let&rsquo;s use Dynamic Provisioning with the following PVC specification:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">kind: PersistentVolumeClaim<br />
apiVersion: v1<br />
metadata:<br />
&nbsp; name: pvcsc001<br />
&nbsp; annotations:<br />
&nbsp; &nbsp; volume.beta.kubernetes.io/storage-class: portworx-sc<br />
spec:<br />
&nbsp; accessModes:<br />
&nbsp; &nbsp; - ReadWriteOnce<br />
&nbsp; resources:<br />
&nbsp; &nbsp; requests:<br />
&nbsp; &nbsp; &nbsp; storage: 20Gi</div></div>
<p>A usual specification for a PVC &#8230; I just claimed 20Gi of storage based on my Portworx storage class. After deploying both the storage class and the PVC, here is the new picture of my configuration:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ kubectl get sc<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; PROVISIONER &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; AGE<br />
portworx-sc &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;kubernetes.io/portworx-volume &nbsp; 3d14h<br />
stork-snapshot-sc &nbsp; &nbsp; &nbsp; &nbsp;stork-snapshot &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3d23h<br />
<br />
$ kubectl get pvc<br />
NAME &nbsp; &nbsp; &nbsp; STATUS &nbsp; VOLUME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; CAPACITY &nbsp; ACCESS MODES &nbsp; STORAGECLASS &nbsp; AGE<br />
pvcsc001 &nbsp; Bound &nbsp; &nbsp;pvc-98d12db5-17ff-11ea-9d3a-00155dc4b604 &nbsp; 20Gi &nbsp; &nbsp; &nbsp; RWO &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;portworx-sc &nbsp; &nbsp;3d13h</div></div>
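<p>Because the storage class sets <strong>allowVolumeExpansion: true</strong>, this claim can later be grown online without recreating anything. A minimal sketch of the edit (e.g. via kubectl edit pvc pvcsc001); the 40Gi target is a hypothetical value of mine:</p>

```yaml
# Hypothetical patch: only the requested size changes,
# everything else in the PVC spec stays as-is
spec:
  resources:
    requests:
      storage: 40Gi
```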
<p>Note that there is also a special storage class implementation for snapshot capabilities; we will talk about it in the next write-up. My PVC <strong>pvcsc001</strong> is ready to be used by my stateful application. Now it&rsquo;s time to deploy a stateful application with my SQL Server pod and the specification below. Note that Portworx volumes are usable by containers running as non-root when the fsGroup parameter is specified in the securityContext section. So, this is a good fit with the non-root execution capabilities shipped with the SQL Server pod <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /> You will also notice there is no special labeling or affinity configuration between my pod and the PVC. I just defined the volume mount and the corresponding PVC, and that&rsquo;s it!</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">apiVersion: apps/v1beta1<br />
kind: Deployment<br />
metadata:<br />
&nbsp; name: mssql-deployment<br />
spec:<br />
&nbsp; replicas: 1<br />
&nbsp; template:<br />
&nbsp; &nbsp; metadata:<br />
&nbsp; &nbsp; &nbsp; labels:<br />
&nbsp; &nbsp; &nbsp; &nbsp; app: mssql<br />
&nbsp; &nbsp; spec:<br />
&nbsp; &nbsp; &nbsp; securityContext:<br />
&nbsp; &nbsp; &nbsp; &nbsp; runAsUser: 10001<br />
&nbsp; &nbsp; &nbsp; &nbsp; runAsGroup: 10001<br />
&nbsp; &nbsp; &nbsp; &nbsp; fsGroup: 10001<br />
&nbsp; &nbsp; &nbsp; terminationGracePeriodSeconds: 10<br />
&nbsp; &nbsp; &nbsp; containers:<br />
&nbsp; &nbsp; &nbsp; - name: mssql<br />
&nbsp; &nbsp; &nbsp; &nbsp; image: mcr.microsoft.com/mssql/server:2019-GA-ubuntu-16.04<br />
&nbsp; &nbsp; &nbsp; &nbsp; ports:<br />
&nbsp; &nbsp; &nbsp; &nbsp; - containerPort: 1433<br />
&nbsp; &nbsp; &nbsp; &nbsp; env:<br />
&nbsp; &nbsp; &nbsp; &nbsp; - name: MSSQL_PID<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; value: &quot;Developer&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; - name: ACCEPT_EULA<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; value: &quot;Y&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; - name: MSSQL_SA_PASSWORD<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; valueFrom:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; secretKeyRef:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; name: sql-secrets<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; key: sapassword<br />
&nbsp; &nbsp; &nbsp; &nbsp; volumeMounts:<br />
&nbsp; &nbsp; &nbsp; &nbsp; - name: mssqldb<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; mountPath: /var/opt/mssql<br />
&nbsp; &nbsp; &nbsp; &nbsp; resources:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; limits:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; cpu: &quot;3500m&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; requests:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; cpu: &quot;2000m&quot;<br />
&nbsp; &nbsp; &nbsp; volumes:<br />
&nbsp; &nbsp; &nbsp; - name: mssqldb<br />
&nbsp; &nbsp; &nbsp; &nbsp; persistentVolumeClaim:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; claimName: pvcsc001<br />
<br />
---<br />
apiVersion: v1<br />
kind: Service<br />
metadata:<br />
&nbsp; name: mssql-deployment<br />
spec:<br />
&nbsp; selector:<br />
&nbsp; &nbsp; app: mssql<br />
&nbsp; ports:<br />
&nbsp; &nbsp; - protocol: TCP<br />
&nbsp; &nbsp; &nbsp; port: 1470<br />
&nbsp; &nbsp; &nbsp; targetPort: 1433<br />
&nbsp; type: LoadBalancer</div></div>
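<p>The deployment above reads <strong>MSSQL_SA_PASSWORD</strong> from a secret named <strong>sql-secrets</strong>, whose creation is not shown here. A minimal sketch of such a secret; the password value is a placeholder of mine, not the one actually used:</p>

```yaml
# Hypothetical manifest: the deployment expects a secret "sql-secrets"
# holding the SA password under the key "sapassword"
apiVersion: v1
kind: Secret
metadata:
  name: sql-secrets
type: Opaque
stringData:
  sapassword: "Placeholder!Passw0rd"  # replace with a strong SA password
```

<p>Equivalently, the secret could be created imperatively with kubectl create secret generic sql-secrets --from-literal=sapassword='&#8230;'</p>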
<p>Let&rsquo;s take a look at the deployment status:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ kubectl get deployment,pod,svc<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; READY &nbsp; UP-TO-DATE &nbsp; AVAILABLE &nbsp; AGE<br />
deployment.extensions/mssql-deployment &nbsp; 1/1 &nbsp; &nbsp; 1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 3d7h<br />
<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; READY &nbsp; STATUS &nbsp; &nbsp;RESTARTS &nbsp; AGE<br />
pod/mssql-deployment-67fdd4759-vtzmz &nbsp; 1/1 &nbsp; &nbsp; Running &nbsp; 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;45m<br />
<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; TYPE &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; CLUSTER-IP &nbsp; &nbsp; &nbsp;EXTERNAL-IP &nbsp; &nbsp; PORT(S) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;AGE<br />
service/kubernetes &nbsp; &nbsp; &nbsp; &nbsp; ClusterIP &nbsp; &nbsp; &nbsp;10.96.0.1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 443/TCP &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;4d<br />
service/mssql-deployment &nbsp; LoadBalancer &nbsp; 10.98.246.160 &nbsp; 192.168.40.61 &nbsp; 1470:32374/TCP &nbsp; 3d7h</div></div>
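<p>With the LoadBalancer service in place, the instance is reachable from outside the cluster on port 1470 (mapped to 1433 inside the pod). A quick connectivity check could look like the sketch below; the external IP comes from the service output above, and the SA password variable is an assumption about your environment:</p>

```shell
# Hypothetical check: connect through the LoadBalancer service and print the server name
sqlcmd -S 192.168.40.61,1470 -U sa -P "$SA_PASSWORD" -Q "SELECT @@SERVERNAME"
```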
<p>We&rsquo;re now ready to test the HA capabilities of Portworx! Let&rsquo;s see how STORK influences scheduling to place my SQL Server pod on the same node where my PVC resides. The <strong>pxctl</strong> command provides different options to get information about the PX cluster and volumes, as well as configuration and management capabilities. Here is a picture of my volumes:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl volume list<br />
ID &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;SIZE &nbsp; &nbsp;HA &nbsp; &nbsp; &nbsp;SHARED &nbsp;ENCRYPTED &nbsp; &nbsp; &nbsp; &nbsp;IO_PRIORITY &nbsp; &nbsp; STATUS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;SNAP-ENABLED<br />
675137742462835449 &nbsp; &nbsp; &nbsp;pvc-98d12db5-17ff-11ea-9d3a-00155dc4b604 &nbsp; &nbsp; &nbsp; &nbsp;20 GiB &nbsp;2 &nbsp; &nbsp; &nbsp; no &nbsp; &nbsp; &nbsp;no &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; HIGH &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;up - attached on 192.168.40.61 &nbsp;no<br />
$ kubectl get pod -o wide<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; READY &nbsp; STATUS &nbsp; &nbsp;RESTARTS &nbsp; AGE &nbsp; IP &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NODE &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; NOMINATED NODE &nbsp; READINESS GATES<br />
mssql-deployment-67fdd4759-vtzmz &nbsp; 1/1 &nbsp; &nbsp; Running &nbsp; 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;48m &nbsp; 172.16.160.54 &nbsp; k8n1.dbi-services.test</div></div>
<p>My SQL Server pod and my Portworx storage sit together on the k8n1.dbi-services.test node. The PX web console is also available and provides the same kind of information as the pxctl command does. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2019/12/151-1-PX-web-console-volume.jpg"><img src="http://blog.developpez.com/mikedavem/files/2019/12/151-1-PX-web-console-volume.jpg" alt="151 - 1 - PX web console volume" width="1795" height="913" class="alignnone size-full wp-image-1408" /></a></p>
<p>Let&rsquo;s now simulate a failure of the k8n1.dbi-services.test node. In this scenario both my PVC and my SQL Server pod move to the next available node &#8211; k8n2 (192.168.20.62). This is where STORK comes into play to keep my pod co-located with my PVC. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2019/12/151-2-PX-web-console-volume-after-failover.jpg"><img src="http://blog.developpez.com/mikedavem/files/2019/12/151-2-PX-web-console-volume-after-failover.jpg" alt="151 - 2 - PX web console volume after failover" width="1835" height="941" class="alignnone size-full wp-image-1410" /></a></p>
<p>&#8230;</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ kubectl get pod -o wide<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; READY &nbsp; STATUS &nbsp; &nbsp;RESTARTS &nbsp; AGE &nbsp; IP &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; NODE &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; NOMINATED NODE &nbsp; READINESS GATES<br />
mssql-deployment-67fdd4759-rbxcb &nbsp; 1/1 &nbsp; &nbsp; Running &nbsp; 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;31m &nbsp; 172.16.197.157 &nbsp; k8n2.dbi-services.test</div></div>
<p>Another important point: my SQL Server data survived the pod restart and remained available through my SQL Server instance, as expected! This was a short introduction to Portworx capabilities, and I will continue to share more about it in the near future!</p>
<p>See you !</p>
<p>David Barbarin</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
<title>SQL Server on Docker Swarm</title>
		<link>https://blog.developpez.com/mikedavem/p13172/docker/sql-server-sur-docker-swarm</link>
		<comments>https://blog.developpez.com/mikedavem/p13172/docker/sql-server-sur-docker-swarm#comments</comments>
		<pubDate>Mon, 12 Feb 2018 17:51:48 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[Docker]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Swarm]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1387</guid>
<description><![CDATA[SQL Server 2017 is available on multiple platforms: Windows, Linux and Docker. The latter provides containerization features with a fast setup and no specific prerequisites before running SQL Server databases, which are probably the &#8230; <a href="https://blog.developpez.com/mikedavem/p13172/docker/sql-server-sur-docker-swarm">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>SQL Server 2017 is available on multiple platforms: Windows, Linux and Docker. The latter provides containerization features with a fast setup and no specific prerequisites before running SQL Server databases, which are probably the key to its success with developers.</p>
<p>&gt; <a href="https://blog.dbi-services.com/introducing-sql-server-on-docker-swarm-orchestrator/" rel="noopener" target="_blank">Lire la suite</a> (en anglais)</p>
<p>David Barbarin<br />
MVP &amp; MCM SQL Server</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
<title>Next edition of the French-speaking 24 HOP 2017</title>
		<link>https://blog.developpez.com/mikedavem/p13162/evenements/prochaine-edition-des-24-hop-2017-francophone</link>
		<comments>https://blog.developpez.com/mikedavem/p13162/evenements/prochaine-edition-des-24-hop-2017-francophone#comments</comments>
		<pubDate>Tue, 02 Jan 2018 17:36:07 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[Evénements]]></category>
		<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[24HOP]]></category>
		<category><![CDATA[AlwaysOn;groupes de disponibilité;availability groups]]></category>
		<category><![CDATA[haute disponibilité]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[SQLPass]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1353</guid>
<description><![CDATA[The next French-speaking edition of the 24 Hours of PASS 2017 will take place on June 29-30. As a reminder, the format is simple: 24 free webinars spread over 2 days, from 07:00 to 18:00 GMT, in French. The only &#8230; <a href="https://blog.developpez.com/mikedavem/p13162/evenements/prochaine-edition-des-24-hop-2017-francophone">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>The next French-speaking edition of the 24 Hours of PASS 2017 will take place on June 29-30.</p>
<p>As a reminder, the format is simple: 24 free webinars spread over 2 days, from 07:00 to 18:00 GMT, in French. The only requirement: <a href="http://www.pass.org/24hours/2017/french/About.aspx" rel="noopener" target="_blank">register</a> for the sessions you plan to attend. This will also let you retrieve the video recording if you want to watch it again later.</p>
<p>This year there will again be something for everyone: monitoring, performance, Azure, BI, Big Data and machine learning, data modeling, high availability, open source, and the new features of the upcoming version of SQL Server!</p>
<p>For my part, I will have the privilege of presenting a <a href="http://www.pass.org/24hours/2017/french/Sessions/Details.aspx?sid=64426" rel="noopener" target="_blank">session</a> on the new high availability options with SQL Server in a mixed world (Windows and Linux) and a “full Linux” world.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2018/01/24HOP-Website-Banner-French-e1496143231714.jpg"><img src="http://blog.developpez.com/mikedavem/files/2018/01/24HOP-Website-Banner-French-e1496143231714.jpg" alt="24HOP-Website-Banner-French-e1496143231714" width="800" height="217" class="alignnone size-full wp-image-1354" /></a></p>
<p>Looking forward to seeing you there!</p>
<p>David Barbarin<br />
MVP &amp; MCM SQL Server</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
<title>SQL Server 2017 &#8211; Linux and log shipping scenarios</title>
		<link>https://blog.developpez.com/mikedavem/p13161/sql-server-vnext/sql-server-2017-linux-et-scenarios-log-shipping</link>
		<comments>https://blog.developpez.com/mikedavem/p13161/sql-server-vnext/sql-server-2017-linux-et-scenarios-log-shipping#comments</comments>
		<pubDate>Tue, 02 Jan 2018 17:30:24 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[haute disponibilité]]></category>
		<category><![CDATA[high availability]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[logshipping]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1349</guid>
<description><![CDATA[In this post, we will cover the log shipping feature available since SQL Server CTP 2.0. Let&#8217;s first point out that it is an OS-agnostic HA feature and that it is possible to build your own custom solution even on Linux (via cron jobs, for &#8230; <a href="https://blog.developpez.com/mikedavem/p13161/sql-server-vnext/sql-server-2017-linux-et-scenarios-log-shipping">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In this post, we will cover the log shipping feature available since SQL Server CTP 2.0. Let&rsquo;s first point out that it is an OS-agnostic HA feature and that it is possible to build your own custom solution even on Linux (via cron jobs, for example). But the point here is to talk about the out-of-the-box solution now available on Linux &#8230;</p>
<p>&gt; <a href="https://blog.dbi-services.com/sql-server-2017-on-linux-and-some-log-shipping-scenarios/" rel="noopener" target="_blank">Lire la suite</a> (en anglais)</p>
<p>David Barbarin<br />
MVP &amp; MCM SQL Server</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
