<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David Barbarin &#187; SQL Server</title>
	<atom:link href="https://blog.developpez.com/mikedavem/ptag/sql-server/feed" rel="self" type="application/rss+xml" />
	<link>https://blog.developpez.com/mikedavem</link>
	<description>MVP DataPlatform - MCM SQL Server</description>
	<lastBuildDate>Thu, 09 Sep 2021 21:19:50 +0000</lastBuildDate>
	<language>fr-FR</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.42</generator>
	<item>
		<title>Graphing SQL Server wait stats on Prometheus and Grafana</title>
		<link>https://blog.developpez.com/mikedavem/p13209/devops/graphing-sql-server-wait-stats-on-prometheus-and-grafana</link>
		<comments>https://blog.developpez.com/mikedavem/p13209/devops/graphing-sql-server-wait-stats-on-prometheus-and-grafana#comments</comments>
		<pubDate>Thu, 09 Sep 2021 21:19:22 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[DevOps]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[grafana]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[observability]]></category>
		<category><![CDATA[prometheus]]></category>
		<category><![CDATA[prompQL]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[telegraf]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1816</guid>
		<description><![CDATA[Wait stats are essential performance metrics for diagnosing SQL Server Performance problems. Related metrics can be monitored from different DMVs including sys.dm_os_wait_stats and sys.dm_db_wait_stats (Azure). As you probably know, there are 2 categories of DMVs in SQL Server: Point in &#8230; <a href="https://blog.developpez.com/mikedavem/p13209/devops/graphing-sql-server-wait-stats-on-prometheus-and-grafana">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Wait stats are essential performance metrics for diagnosing SQL Server Performance problems. Related metrics can be monitored from different DMVs including sys.dm_os_wait_stats and sys.dm_db_wait_stats (Azure).</p>
<p>As you probably know, there are two categories of DMVs in SQL Server: point-in-time versus cumulative, and the DMVs mentioned above are in the second category. Data in these DMVs is cumulative, incremented every time wait events occur, and values reset only when SQL Server restarts or when you intentionally run the DBCC SQLPERF command. Baselining these metrics requires taking snapshots to compare day-to-day activity, or simply to track trends over a given timeline. Paul Randal kindly provided a T-SQL script for trend analysis over a specified time range in this <a href="https://www.sqlskills.com/blogs/paul/capturing-wait-statistics-period-time/" rel="noopener" target="_blank">blog post</a>. The interesting part of this script is its focus on the most relevant wait types and their corresponding statistics. This is basically the kind of script I used for many years when performing SQL Server audits at customer sites, but today, working as a database administrator for a company, I can rely on our observability stack (Telegraf, Prometheus and Grafana) to do the job.</p>
<p><span id="more-1816"></span></p>
<p>In a previous <a href="https://blog.developpez.com/mikedavem/p13203/sql-server-2014/why-we-moved-sql-server-monitoring-on-prometheus-and-grafana" rel="noopener" target="_blank">write-up</a>, I explained the choice of such a platform for SQL Server. Transposing the logic of Paul&rsquo;s script to Prometheus and Grafana was not trivial, but the result was worth it. It is an interesting topic that I want to share with Ops engineers and DBAs who want to baseline SQL Server telemetry on a Prometheus and Grafana observability platform.  </p>
<p>So, let&rsquo;s start with the metrics provided by the Telegraf collector agent and then scraped by the Prometheus job:<br />
&#8211;	sqlserver_waitstats_wait_time_ms<br />
&#8211;	sqlserver_waitstats_waiting_tasks_count<br />
&#8211;	sqlserver_waitstats_resource_wait_time_ms<br />
&#8211;	sqlserver_waitstats_signal_wait_time_ms</p>
<p>In the context of this blog post we will focus only on the first two metrics of the list above, but the same logic applies to the others. </p>
<p>As a reminder, we want to graph the most relevant wait types and their average value within a time range specified in a Grafana dashboard. This is in fact a two-step process: </p>
<p>1) Identify the most relevant wait types by computing their ratio against the total amount of wait time within the specified time range.<br />
2) Graph in Grafana these most relevant wait types with their corresponding average value for every Prometheus step in the time range.</p>
<p>To address the first point, we need to rely on the special Prometheus <a href="https://prometheus.io/docs/prometheus/latest/querying/functions/#rate" rel="noopener" target="_blank">rate()</a> function and the <a href="https://prometheus.io/docs/prometheus/latest/querying/operators/" rel="noopener" target="_blank">group_left</a> modifier. </p>
<p>As per the Prometheus documentation, rate() gives the per-second average rate of change over the specified range interval, using the metric points at its boundaries. That is exactly what we need to compute the total average of wait time (in ms) per wait type in a specified time range. rate() takes a range vector as input. Let&rsquo;s illustrate what a range vector is with the following example. For the sake of simplicity, I filtered the sqlserver_waitstats_wait_time_ms metric down to one specific SQL Server instance and wait type (PAGEIOLATCH_EX). A range vector is expressed with a range interval at the end of the query, as you can see below:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">sqlserver_waitstats_wait_time_ms{sql_instance=&quot;$Instance&quot;,wait_type=&quot;PAGEIOLATCH_EX&quot;}[1m]</div></div>
<p>The result is a set of data points within the specified range interval, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-range-vector.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-range-vector.png" alt="blog 177 - range vector" width="238" height="256" class="alignnone size-full wp-image-1818" /></a></p>
<p>For each data point we get the value and the corresponding timestamp in epoch format. You can convert this epoch format to a user-friendly one by using <strong>date -r</strong>, for example. Another important point here: the sqlserver_waitstats_wait_time_ms metric is a counter in the Prometheus world because its value keeps increasing over time, as you can see above (from top to bottom). The same concept exists in SQL Server with the cumulative DMV category explained at the beginning. This is why we need the rate() function to draw the right representation of the rate of increase over time between data points. We got 12 data points with an interval of 5s between each value, because in my context we defined a Prometheus scrape interval of 5s for SQL Server =&gt; 60s/5s = 12 data points and 11 steps. The next question is how rate() calculates the per-second rate of change between data points. Referring to my previous example, I can get the rate value with the following PromQL query:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">rate(sqlserver_waitstats_wait_time_ms{sql_instance=&quot;$Instance&quot;,wait_type=&quot;PAGEIOLATCH_EX&quot;}[1m])</div></div>
<p>&#8230; and the corresponding value:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-rate-value.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-rate-value.png" alt="blog 177 - rate value" width="211" height="67" class="alignnone size-full wp-image-1820" /></a></p>
<p>To understand this value, let&rsquo;s go back to a school math reminder: <a href="https://en.wikipedia.org/wiki/Slope" rel="noopener" target="_blank">slope calculation</a>. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/Tangent_function_animation.gif"><img src="http://blog.developpez.com/mikedavem/files/2021/09/Tangent_function_animation.gif" alt="Tangent_function_animation" width="300" height="285" class="alignnone size-full wp-image-1823" /></a></p>
<p><em>Image from Wikipedia</em></p>
<p>The basic idea of the slope is to find the rate of change of one variable compared to another. The smaller the distance between two data points, the more precise the approximation of the slope. And this is exactly what happens with Prometheus when you zoom in or out by changing the range interval. A good resolution is also determined by the Prometheus scrape interval, especially when your metrics are very volatile. This is something to keep in mind with Prometheus: we are working with approximations by design. So let&rsquo;s do some math with a slope calculation on the above range vector:</p>
<p>Slope = DV/DT = (332628-332582)/(@1631125796.971 &#8211; @1631125746.962) =~ 0.83</p>
<p>Excellent! This is how rate() works, and the beauty of this function is that the slope calculation is done automatically for all the steps within the range interval.</p>
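<p>As a rough Python sketch, the per-second rate between the boundary points of a range vector is just a slope. The sample values below are hypothetical, and real rate() additionally extrapolates to the window boundaries, so its output can differ slightly from this naive calculation:</p>

```python
# Naive sketch of how rate() derives a per-second rate from a range
# vector: the slope between the first and last (timestamp, value)
# samples. Data is hypothetical; Prometheus additionally extrapolates
# to the window boundaries.

def simple_rate(samples):
    """samples: list of (unix_timestamp, counter_value), oldest first."""
    (t0, v0), (t1, v1) = samples[0], samples[-1]
    return (v1 - v0) / (t1 - t0)

# 12 points scraped every 5s, the counter growing by 10 ms of wait per scrape
samples = [(1631125746 + 5 * i, 332000 + 10 * i) for i in range(12)]
print(simple_rate(samples))  # 10 ms of wait per 5s step -> 2.0 ms/s
```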
<p>But let’s go back to the initial requirement. We need to calculate per wait type the average value of wait time between the first and last point in the specified range vector. We can now step further by using Prometheus aggregation operator as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">sum by (wait_type) (rate(sqlserver_waitstats_wait_time_ms{sql_instance=&quot;$Instance&quot;}[1m]))</div></div>
<p>Please note we could have written this another way, without the sum by aggregator, but sum by naturally excludes all unwanted labels from the result metric, which will be particularly helpful for the next part. Here is a sample of the output:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-aggregation-by-waittype.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-aggregation-by-waittype-1024x145.png" alt="blog 177 - aggregation by waittype" width="584" height="83" class="alignnone size-large wp-image-1826" /></a></p>
<p>Then we can compute the ratio (or percentage) per label (wait type). A first, naïve attempt could be as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">sum by (wait_type) (rate(sqlserver_waitstats_wait_time_ms{sql_instance=&quot;$Instance&quot;}[1m]))/ sum(rate(sqlserver_waitstats_wait_time_ms{sql_instance='$Instance'}[1m]))</div></div>
<p>But we get an empty query result. Bad joke, right? Let&rsquo;s understand why. </p>
<p>The first part of the query gives the total amount of wait time per wait type. I put a sample of the results here for simplicity:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-aggregation-by-waittype1.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-aggregation-by-waittype1-1024x145.png" alt="blog 177 - aggregation by waittype" width="584" height="83" class="alignnone size-large wp-image-1828" /></a></p>
<p>This results in a new set of metrics with only one label, wait_type. The second part gives the total amount of wait time for all wait types, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-total-waits.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-total-waits.png" alt="blog 177 - total waits" width="479" height="39" class="alignnone size-full wp-image-1829" /></a></p>
<p>With a SQL statement, we instinctively join columns that have matching values in the tables concerned, often through primary or foreign keys. In the Prometheus world, vector matching works the same way, using all labels as the starting point. Samples are then selected or dropped from the result vector based on the &laquo;&nbsp;ignoring&nbsp;&raquo; and &laquo;&nbsp;on&nbsp;&raquo; keywords. In my case, there are no matching labels, so we must tell Prometheus to ignore the remaining label (wait_type) on the first part of the query:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">sum by (wait_type) (rate(sqlserver_waitstats_wait_time_ms{sql_instance=&quot;$Instance&quot;}[1m]))/ ignoring(wait_type) sum(rate(sqlserver_waitstats_wait_time_ms{sql_instance='$Instance'}[1m]))</div></div>
<p>But another error message &#8230;</p>
<p><strong>Error executing query: multiple matches for labels: many-to-one matching must be explicit (group_left/group_right)</strong></p>
<p>In many-to-one or one-to-many vector matching with Prometheus, samples are selected using the group_left or group_right keywords. In other words, with this final query we are telling Prometheus to perform a cross join before dividing the values:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">sum by (wait_type) (rate(sqlserver_waitstats_wait_time_ms{sql_instance=&quot;$Instance&quot;}[1m]))/ ignoring(wait_type) group_left sum(rate(sqlserver_waitstats_wait_time_ms{sql_instance='$Instance'}[1m]))</div></div>
<p>Here we go!</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-ratio-per-label.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-ratio-per-label-1024x149.png" alt="blog 177 - ratio per label" width="584" height="85" class="alignnone size-large wp-image-1830" /></a></p>
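<p>The label arithmetic can be mirrored with a small Python sketch (the rate values below are hypothetical): each per-wait-type sum is divided by the single grand total, which is exactly the division that ignoring(wait_type) group_left lets Prometheus perform:</p>

```python
# Sketch of the ratio computation: per-wait-type rate divided by the
# total rate across all wait types. The rate values are hypothetical.
rates = {
    "PAGEIOLATCH_EX": 0.83,
    "CXPACKET": 2.49,
    "SOS_SCHEDULER_YIELD": 4.98,
}

total = sum(rates.values())          # one scalar, like the right-hand query
ratios = {wt: r / total for wt, r in rates.items()}

for wt, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
    print(f"{wt}: {ratio:.0%}")
```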
<p>We finally managed to calculate the ratio per wait type for a specified range interval. The last step is to select the most relevant wait types by first excluding the irrelevant ones. Most of the excluded wait types come from the exclusion list provided by Paul Randal&rsquo;s script. We also decided to focus only on the top 5 wait types with a ratio &gt; 10%, but it is up to you to change these values:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">topk(5, sum by (wait_type) (rate(sqlserver_waitstats_wait_time_ms{sql_instance='$Instance',measurement_db_type=&quot;SQLServer&quot;,wait_type!~'(BROKER_EVENTHANDLER|BROKER_RECEIVE_WAITFOR|BROKER_TASK_STOP|BROKER_TO_FLUSH|BROKER_TRANSMITTER|CHECKPOINT_QUEUE|CHKPT|CLR_AUTO_EVENT|CLR_MANUAL_EVENT|CLR_SEMAPHORE|DBMIRROR_DBM_EVENT|DBMIRROR_EVENTS_QUEUE|DBMIRROR_WORKER_QUEUE|DBMIRRORING_CMD|DIRTY_PAGE_POLL|DISPATCHER_QUEUE_SEMAPHORE|EXECSYNC|FSAGENT|FT_IFTS_SCHEDULER_IDLE_WAIT|FT_IFTSHC_MUTEX|KSOURCE_WAKEUP|LAZYWRITER_SLEEP|LOGMGR_QUEUE|MEMORY_ALLOCATION_EXT|ONDEMAND_TASK_QUEUE|PARALLEL_REDO_DRAIN_WORKER|PARALLEL_REDO_LOG_CACHE|PARALLEL_REDO_TRAN_LIST|PARALLEL_REDO_WORKER_SYNC|PARALLEL_REDO_WORKER_WAIT_WORK|PREEMPTIVE_OS_FLUSHFILEBUFFERS|PREEMPTIVE_XE_GETTARGETSTATE|PWAIT_ALL_COMPONENTS_INITIALIZED|PWAIT_DIRECTLOGCONSUMER_GETNEXT|QDS_PERSIST_TASK_MAIN_LOOP_SLEEP|QDS_ASYNC_QUEUE|QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP|QDS_SHUTDOWN_QUEUE|REDO_THREAD_PENDING_WORK|REQUEST_FOR_DEADLOCK_SEARCH|RESOURCE_QUEUE|SERVER_IDLE_CHECK|SLEEP_BPOOL_FLUSH|SLEEP_DBSTARTUP|SLEEP_DCOMSTARTUP|SLEEP_MASTERDBREADY|SLEEP_MASTERMDREADY|SLEEP_MASTERUPGRADED|SLEEP_MSDBSTARTUP|SLEEP_SYSTEMTASK|SLEEP_TASK|SLEEP_TEMPDBSTARTUP|SNI_HTTP_ACCEPT|SOS_WORK_DISPATCHER|SP_SERVER_DIAGNOSTICS_SLEEP|SQLTRACE_BUFFER_FLUSH|SQLTRACE_INCREMENTAL_FLUSH_SLEEP|SQLTRACE_WAIT_ENTRIES|VDI_CLIENT_OTHER|WAIT_FOR_RESULTS|WAITFOR|WAITFOR_TASKSHUTDOWN|WAIT_XTP_RECOVERY|WAIT_XTP_HOST_WAIT|WAIT_XTP_OFFLINE_CKPT_NEW_LOG|WAIT_XTP_CKPT_CLOSE|XE_DISPATCHER_JOIN|XE_DISPATCHER_WAIT|XE_TIMER_EVENT|MEMORY_ALLOCATION_EXT|ONDEMAND_TASK_QUEUE|PREEMPTIVE_HADR_LEASE_MECHANISM|PREEMPTIVE_SP_SERVER_DIAGNOSTICS|PREEMPTIVE_ODBCOPS|PREEMPTIVE_OS_LIBRARYOPS|PREEMPTIVE_OS_COMOPS|PREEMPTIVE_OS_CRYPTOPS|PREEMPTIVE_OS_PIPEOPS|PREEMPTIVE_OS_AUTHENTICATIONOPS|PREEMPTIVE_OS_GENERICOPS|PREEMPTIVE_OS_VERIFYTRUST|PREEMPTIVE_OS_FILEOPS|PREEMPTIVE_OS_DEVICEOPS|PREEMPTIVE_OS_QUERYREGISTRY|PREEMPTIVE_OS_WRITEFILE|PREEMPTIVE_XE_CALLBACKEXECUTE|PREEMPTIVE_XE_DISPATCHER|PREEMPTIVE_XE_GETTARGETSTATE|PREEMPTIVE_XE_SESSIONCOMMIT|PREEMPTIVE_XE_TARGETINIT|PREEMPTIVE_XE_TARGETFINALIZE|PREEMPTIVE_XHTTP|PWAIT_EXTENSIBILITY_CLEANUP_TASK|PREEMPTIVE_OS_DISCONNECTNAMEDPIPE|PREEMPTIVE_OS_DELETESECURITYCONTEXT|PREEMPTIVE_OS_CRYPTACQUIRECONTEXT|PREEMPTIVE_HTTP_REQUEST|RESOURCE_GOVERNOR_IDLE|HADR_FABRIC_CALLBACK|PVS_PREALLOCATE)'}[1m])) / ignoring(wait_type) group_left sum(rate(sqlserver_waitstats_wait_time_ms{sql_instance='$Instance',measurement_db_type=&quot;SQLServer&quot;,wait_type!~'(BROKER_EVENTHANDLER|BROKER_RECEIVE_WAITFOR|BROKER_TASK_STOP|BROKER_TO_FLUSH|BROKER_TRANSMITTER|CHECKPOINT_QUEUE|CHKPT|CLR_AUTO_EVENT|CLR_MANUAL_EVENT|CLR_SEMAPHORE|DBMIRROR_DBM_EVENT|DBMIRROR_EVENTS_QUEUE|DBMIRROR_WORKER_QUEUE|DBMIRRORING_CMD|DIRTY_PAGE_POLL|DISPATCHER_QUEUE_SEMAPHORE|EXECSYNC|FSAGENT|FT_IFTS_SCHEDULER_IDLE_WAIT|FT_IFTSHC_MUTEX|KSOURCE_WAKEUP|LAZYWRITER_SLEEP|LOGMGR_QUEUE|MEMORY_ALLOCATION_EXT|ONDEMAND_TASK_QUEUE|PARALLEL_REDO_DRAIN_WORKER|PARALLEL_REDO_LOG_CACHE|PARALLEL_REDO_TRAN_LIST|PARALLEL_REDO_WORKER_SYNC|PARALLEL_REDO_WORKER_WAIT_WORK|PREEMPTIVE_OS_FLUSHFILEBUFFERS|PREEMPTIVE_XE_GETTARGETSTATE|PWAIT_ALL_COMPONENTS_INITIALIZED|PWAIT_DIRECTLOGCONSUMER_GETNEXT|QDS_PERSIST_TASK_MAIN_LOOP_SLEEP|QDS_ASYNC_QUEUE|QDS_CLEANUP_STALE_QUERIES_TASK_MAIN_LOOP_SLEEP|QDS_SHUTDOWN_QUEUE|REDO_THREAD_PENDING_WORK|REQUEST_FOR_DEADLOCK_SEARCH|RESOURCE_QUEUE|SERVER_IDLE_CHECK|SLEEP_BPOOL_FLUSH|SLEEP_DBSTARTUP|SLEEP_DCOMSTARTUP|SLEEP_MASTERDBREADY|SLEEP_MASTERMDREADY|SLEEP_MASTERUPGRADED|SLEEP_MSDBSTARTUP|SLEEP_SYSTEMTASK|SLEEP_TASK|SLEEP_TEMPDBSTARTUP|SNI_HTTP_ACCEPT|SOS_WORK_DISPATCHER|SP_SERVER_DIAGNOSTICS_SLEEP|SQLTRACE_BUFFER_FLUSH|SQLTRACE_INCREMENTAL_FLUSH_SLEEP|SQLTRACE_WAIT_ENTRIES|VDI_CLIENT_OTHER|WAIT_FOR_RESULTS|WAITFOR|WAITFOR_TASKSHUTDOWN|WAIT_XTP_RECOVERY|WAIT_XTP_HOST_WAIT|WAIT_XTP_OFFLINE_CKPT_NEW_LOG|WAIT_XTP_CKPT_CLOSE|XE_DISPATCHER_JOIN|XE_DISPATCHER_WAIT|XE_TIMER_EVENT|MEMORY_ALLOCATION_EXT|ONDEMAND_TASK_QUEUE|PREEMPTIVE_HADR_LEASE_MECHANISM|PREEMPTIVE_SP_SERVER_DIAGNOSTICS|PREEMPTIVE_ODBCOPS|PREEMPTIVE_OS_LIBRARYOPS|PREEMPTIVE_OS_COMOPS|PREEMPTIVE_OS_CRYPTOPS|PREEMPTIVE_OS_PIPEOPS|PREEMPTIVE_OS_AUTHENTICATIONOPS|PREEMPTIVE_OS_GENERICOPS|PREEMPTIVE_OS_VERIFYTRUST|PREEMPTIVE_OS_FILEOPS|PREEMPTIVE_OS_DEVICEOPS|PREEMPTIVE_OS_QUERYREGISTRY|PREEMPTIVE_OS_WRITEFILE|PREEMPTIVE_XE_CALLBACKEXECUTE|PREEMPTIVE_XE_DISPATCHER|PREEMPTIVE_XE_GETTARGETSTATE|PREEMPTIVE_XE_SESSIONCOMMIT|PREEMPTIVE_XE_TARGETINIT|PREEMPTIVE_XE_TARGETFINALIZE|PREEMPTIVE_XHTTP|PWAIT_EXTENSIBILITY_CLEANUP_TASK|PREEMPTIVE_OS_DISCONNECTNAMEDPIPE|PREEMPTIVE_OS_DELETESECURITYCONTEXT|PREEMPTIVE_OS_CRYPTACQUIRECONTEXT|PREEMPTIVE_HTTP_REQUEST|RESOURCE_GOVERNOR_IDLE|HADR_FABRIC_CALLBACK|PVS_PREALLOCATE)'}[1m]))) &gt;= 0.1</div></div>
<p>I got 3 relevant wait types with their corresponding ratio in the specified time range.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-ratio-per-label-top-5.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-ratio-per-label-top-5-1024x67.png" alt="blog 177 - ratio per label top 5" width="584" height="38" class="alignnone size-large wp-image-1832" /></a></p>
<p>Pretty cool stuff, but we must now go through the second requirement: graphing the average value of the identified wait types within a specified time range in a Grafana dashboard. The first step consists in including the above Prometheus query as a variable in the Grafana dashboard. Here is how I set up my Top5Waits variable in Grafana:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-granafa-top5waits.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-granafa-top5waits-1024x501.png" alt="blog 177 - granafa top5waits" width="584" height="286" class="alignnone size-large wp-image-1833" /></a></p>
<p>Some interesting points here: variable dependency kicks in, as my $Top5Waits variable depends hierarchically on another $Instance variable in my dashboard (populated by another Prometheus query). You have probably noticed the use of [${__range_s}s] to determine the range interval, but depending on your context, the Grafana $__interval variable may be a good fit as well. </p>
<p>In turn, $Top5Waits can be used from another query, this time directly in a Grafana dashboard panel, to show the average value of the most relevant wait types as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-grafana-avg-wait-stats.png"><img src="http://blog.developpez.com/mikedavem/files/2021/09/blog-177-grafana-avg-wait-stats-1024x400.png" alt="blog 177 - grafana avg wait stats" width="584" height="228" class="alignnone size-large wp-image-1834" /></a></p>
<p>Calculating the wait type average is not a hard task by itself. In fact, we can apply the same method as previously, by matching the sqlserver_waitstats_wait_time_ms and sqlserver_waitstats_waiting_tasks_count metrics and dividing their corresponding values to obtain the average wait time (in ms) for each step within the time range (remember how the rate() function works). Both metrics share the same set of labels, so we don&rsquo;t need the &laquo;&nbsp;on&nbsp;&raquo; or &laquo;&nbsp;ignoring&nbsp;&raquo; keywords in this case. But we must introduce the $Top5Waits variable in the label filter of the first metric as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">rate(sqlserver_waitstats_wait_time_ms{sql_instance='$Instance',wait_type=~&quot;$Top5Waits&quot;,measurement_db_type=&quot;SQLServer&quot;}[$__rate_interval])/rate(sqlserver_waitstats_waiting_tasks_count{sql_instance='$Instance',wait_type=~&quot;$Top5Waits&quot;,measurement_db_type=&quot;SQLServer&quot;}[$__rate_interval])</div></div>
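<p>In Python terms, with hypothetical per-second rates, this division pairs each wait type&rsquo;s two rates and yields an average wait per task in milliseconds:</p>

```python
# Sketch of the average-wait computation: for each wait type, divide
# the rate of accumulated wait time (ms/s) by the rate of waiting
# tasks (tasks/s) to get the average wait per task in ms.
# All values are hypothetical.
wait_time_ms_rate = {"PAGEIOLATCH_EX": 40.0, "WRITELOG": 9.0}
waiting_tasks_rate = {"PAGEIOLATCH_EX": 8.0, "WRITELOG": 3.0}

avg_wait_ms = {
    wt: wait_time_ms_rate[wt] / waiting_tasks_rate[wt]
    for wt in wait_time_ms_rate
}
print(avg_wait_ms)  # {'PAGEIOLATCH_EX': 5.0, 'WRITELOG': 3.0}
```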
<p>We finally managed to get an interesting dynamic measurement of SQL Server wait stats telemetry. Hope this blog post helps!<br />
Let me know your feedback if you are using SQL Server wait stats in Prometheus and Grafana in a different way!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Creating dynamic Grafana dashboard for SQL Server</title>
		<link>https://blog.developpez.com/mikedavem/p13207/sql-server-2008-r2/creating-dynamic-grafana-dashboard-for-sql-server</link>
		<comments>https://blog.developpez.com/mikedavem/p13207/sql-server-2008-r2/creating-dynamic-grafana-dashboard-for-sql-server#comments</comments>
		<pubDate>Sun, 11 Apr 2021 19:52:09 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[DevOps]]></category>
		<category><![CDATA[SQL Server 2008 R2]]></category>
		<category><![CDATA[SQL Server 2014]]></category>
		<category><![CDATA[SQL Server 2016]]></category>
		<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[SQL Server 2019]]></category>
		<category><![CDATA[AlwaysOn;groupes de disponibilité;availability groups]]></category>
		<category><![CDATA[grafana]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[observability]]></category>
		<category><![CDATA[prometheus]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1784</guid>
		<description><![CDATA[A couple of months ago I wrote about “Why we moved SQL Server monitoring to Prometheus and Grafana”. I talked about the creation of two dashboards. The first one is blackbox monitoring-oriented and aims to spot in (near) real-time resource &#8230; <a href="https://blog.developpez.com/mikedavem/p13207/sql-server-2008-r2/creating-dynamic-grafana-dashboard-for-sql-server">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>A couple of months ago I wrote about “<a href="https://blog.developpez.com/mikedavem/p13203/sql-server-2014/why-we-moved-sql-server-monitoring-on-prometheus-and-grafana" rel="noopener" target="_blank">Why we moved SQL Server monitoring to Prometheus and Grafana</a>”. I talked about the creation of two dashboards. The first one is blackbox monitoring-oriented and aims to spot in (near) real-time resource pressure / saturation issues with self-explained gauges, numbers and colors indicating healthy (green) or unhealthy resources (orange / red). We also include availability group synchronization health metric in the dashboard. We will focus on it in this write-up.</p>
<p><span id="more-1784"></span></p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-1-mssql-dashboard.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-1-mssql-dashboard-1024x158.jpg" alt="174 - 1 - mssql dashboard" width="584" height="90" class="alignnone size-large wp-image-1785" /></a></p>
<p>As a reminder, this Grafana dashboard gets its information from the Prometheus server and metrics related to MSSQL environments. For the sake of clarity, in this dashboard an environment is one availability group with a set of 2 AG replicas (A and B) in synchronous replication mode. In other words, the <strong>ENV1</strong> value corresponds to the availability group name and to the SQL instance names that are members of the AG: <strong>SERVERA\ENV1</strong> (first replica) and <strong>SERVERB\ENV1</strong> (second replica). </p>
<p>In the picture above, you can notice 2 sections. One is for availability group health monitoring, and the second includes a set of black-box metrics related to saturation and latency (CPU, RAM, network, AG replication delay, SQL buffer pool, blocked processes &#8230;). Good job for a single environment, but what if I want to bring more availability groups and SQL instances into the game?</p>
<p>The first and easiest (or naïve) way we went through when we started writing this dashboard was to copy / paste all the panels created for one environment, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-2-mssql-dashboard-static.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-2-mssql-dashboard-static-1024x242.jpg" alt="174 - 2 - mssql dashboard static" width="584" height="138" class="alignnone size-large wp-image-1786" /></a></p>
<p>After creating a new row (which can be thought of as a section in the present context) at the bottom, all panels were copied from ENV1 to the fresh new ENV2 section. A new row is created by converting a new panel into a row, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-3-convert-panel-to-row.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-3-convert-panel-to-row-1024x199.jpg" alt="174 - 3 - convert panel to row" width="584" height="113" class="alignnone size-large wp-image-1787" /></a></p>
<p>Then I needed to manually modify ALL the new metrics for the new environment. Let&rsquo;s illustrate the point with the Batch Requests/sec metric as an example. The corresponding Prometheus query for the first replica (A) is as follows (the initial query has been simplified for the purpose of this blog post):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">irate(sqlserver_performance{sql_instance='SERVERA:ENV1',counter=&quot;Batch Requests/sec&quot;}[$__range])</div></div>
<p>The same query exists for the secondary replica (B) but with a different label value:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">irate(sqlserver_performance{sql_instance='SERVERB:ENV1',counter=&quot;Batch Requests/sec&quot;}[$__range])</div></div>
<p>SERVERA:ENV1 and SERVERB:ENV1 are static values that correspond to the name of each SQL Server instance, respectively SERVERA\ENV1 and SERVERB\ENV1. As you probably already guessed, and according to our naming convention, for the new environment and related panels we simply replaced the initial ENV1 values with ENV2. But having more environments, or providing filtering capabilities to focus only on specific environments, makes this process tedious, and we need to bring something dynamic into the game &#8230; Good news: Grafana provides such capabilities with dynamic creation of rows and panels. </p>
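<p>To see why the static approach does not scale, here is a tiny Python sketch (using the hypothetical naming convention from this post) that generates the Batch Requests/sec query for every replica / environment pair, which is essentially the substitution Grafana variables perform for us:</p>

```python
from string import Template

# Hypothetical template mirroring the post's naming convention:
# one query per (replica, environment) pair. $$__range escapes to
# the literal Grafana $__range placeholder.
QUERY = Template(
    "irate(sqlserver_performance{sql_instance='SERVER$replica:$env',"
    'counter="Batch Requests/sec"}[$$__range])'
)

queries = [
    QUERY.substitute(replica=r, env=e)
    for e in ("ENV1", "ENV2")   # every new environment doubles the edits
    for r in ("A", "B")
]
for q in queries:
    print(q)
```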
<p><strong>Generating dynamic panels in the same section (row)</strong></p>
<p>Referring to the dashboard, the first section concerns the availability group health metric. When adding a new environment, meaning a new availability group, we want a new dedicated panel created automatically in the same section (AG health).<br />
Firstly, we need to add a multi-value variable to the dashboard. Depending on your context, values can be static or dynamically populated by another query; it is up to you to choose the right option.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-4-grafana_variable.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-4-grafana_variable.jpg" alt="174 - 4 - grafana_variable" width="968" height="505" class="alignnone size-full wp-image-1789" /></a></p>
<p>Once created, a drop-down list appears at the upper left of the dashboard, and we can now select multiple environments or filter down to specific ones.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-5-grafana_variable.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-5-grafana_variable.jpg" alt="174 - 5 - grafana_variable" width="202" height="346" class="alignnone size-full wp-image-1790" /></a></p>
<p>Then we need to make the panel in the AG Health section dynamic as follows:<br />
&#8211; Change the title value with the corresponding variable (optional)<br />
&#8211; Configure the repeat options with the variable (mandatory). You can also define the max number of panels per row</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-6-panel-variabilisation.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-6-panel-variabilisation.jpg" alt="174 - 6 - panel variabilisation" width="279" height="414" class="alignnone size-full wp-image-1792" /></a></p>
<p>According to this setup, we can display a maximum of 4 panels (or availability groups) per row. The 5th will be created and placed on a new line in the same section, as shown below:<br />
<a href="http://blog.developpez.com/mikedavem/files/2021/04/174-7-panel-same-section.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-7-panel-same-section-1024x125.jpg" alt="174 - 7 - panel same section" width="584" height="71" class="alignnone size-large wp-image-1793" /></a></p>
<p>Finally, we must replace the static label values defined in the query with their variable counterpart. For the availability group we are using the <strong>sqlserver_hadr_replica_states_replica_synchronization_health</strong> metric as follows (again, I voluntarily show only a sample of the entire query for simplicity):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">… sqlserver_hadr_replica_states_replica_synchronization_health{sql_instance=~'SERVER[A|B]:$ENV',measurement_db_type=&quot;SQLServer&quot;}) …</div></div>
<p>Note the regular expression used to get information from the SQL Server instances, either the primary (A) or the secondary (B). The most interesting part concerns the environment, which is now dynamic thanks to the $ENV variable.</p>
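<p>Put together, a simplified, self-contained version of such a query looks like this (same metric and label names as above; note that the explicit alternation (A|B) is the stricter equivalent of the character class [A|B], which also matches a literal |):</p>

```promql
sum(sqlserver_hadr_replica_states_replica_synchronization_health{
  sql_instance=~"SERVER(A|B):$ENV",
  measurement_db_type="SQLServer"
})
```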
<p><strong>Generating dynamic sections (rows)</strong></p>
<p>As said previously, sections are in fact rows in the Grafana dashboard, and rows can contain panels. If we add a new environment, we also want to see a new section (and its panels) related to it. Configuring dynamic rows is pretty similar to panels: we only need to set the “Repeat for” option of the row with the environment variable, as follows (the title remains optional):</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-8-row.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-8-row-1024x173.jpg" alt="174 - 8 - row" width="584" height="99" class="alignnone size-large wp-image-1794" /></a></p>
<p>As with the AG Health panel, we also need to replace the static label values in ALL panels with the new environment variable. Thus, referring to the previous Batch Requests / sec example, the updated Prometheus queries are as follows (respectively for the primary and secondary replicas):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">irate(sqlserver_performance{sql_instance='SERVERA:$ENV',counter=&quot;Batch Requests/sec&quot;}[$__range])</div></div>
<p>&#8230;</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">irate(sqlserver_performance{sql_instance='SERVERB:$ENV',counter=&quot;Batch Requests/sec&quot;}[$__range])</div></div>
<p>The dashboard is now ready, and all the dynamic behavior kicks in when a new SQL Server instance is added to the list of monitored items. Here is an example of the outcome in our context:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2021/04/174-0-final-dashboard.jpg"><img src="http://blog.developpez.com/mikedavem/files/2021/04/174-0-final-dashboard-1024x404.jpg" alt="174 - 0 - final dashboard" width="584" height="230" class="alignnone size-large wp-image-1795" /></a></p>
<p>Happy monitoring!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>SQL Server index rebuild online and blocking scenario</title>
		<link>https://blog.developpez.com/mikedavem/p13199/sql-server-2012/sql-server-index-rebuid-online-and-blocking-scenario</link>
		<comments>https://blog.developpez.com/mikedavem/p13199/sql-server-2012/sql-server-index-rebuid-online-and-blocking-scenario#comments</comments>
		<pubDate>Sun, 30 Aug 2020 21:18:28 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Server 2012]]></category>
		<category><![CDATA[SQL Server 2014]]></category>
		<category><![CDATA[SQL Server 2016]]></category>
		<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[SQL Server 2019]]></category>
		<category><![CDATA[blocking]]></category>
		<category><![CDATA[online operation]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[SQL]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1664</guid>
		<description><![CDATA[A couple of months ago, I experienced a problem with an online index rebuild operation on SQL Server. In short, the operation was supposed to be online and never block concurrent queries. But in fact, it was not the case &#8230; <a href="https://blog.developpez.com/mikedavem/p13199/sql-server-2012/sql-server-index-rebuid-online-and-blocking-scenario">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>A couple of months ago, I experienced a problem with an online index rebuild operation on SQL Server. In short, the operation was supposed to be online and never block concurrent queries. But in fact, this was not the case (or, to be more precise, it was only partially the case) and, to make the scenario more complex, we experienced different behaviors depending on the context. Let’s start the story with the initial context: in my company, we usually go through continuous deployment, including SQL modification scripts, and because we rely on a daily pipeline, we must ensure the related SQL operations are not too disruptive, to avoid impacting the user experience.</p>
<p><span id="more-1664"></span></p>
<p>Sometimes, we must introduce new indexes in deployment scripts and, according to how disruptive the script can be, a discussion between Devs and Ops is initiated. It results either in the script being managed manually by the Ops team or being deployed automatically through the deployment pipeline by the Devs. </p>
<p>Non-disruptive operations can be achieved in many ways, and the ONLINE capabilities of SQL Server may be part of the solution; this is what I suggested for one of our scripts. Let’s illustrate this context with the following example. I created a table named dbo.t1 with a bunch of rows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">USE [test];<br />
<br />
SET NOCOUNT ON;<br />
<br />
DROP TABLE IF EXISTS dbo.t1;<br />
GO<br />
<br />
CREATE TABLE dbo.t1 (<br />
&nbsp; &nbsp; id INT IDENTITY(1,1) NOT NULL PRIMARY KEY,<br />
&nbsp; &nbsp; col1 VARCHAR(50) NULL<br />
);<br />
GO<br />
<br />
INSERT INTO dbo.t1 (col1) VALUES (REPLICATE('T', 50));<br />
GO …<br />
EXEC sp_spaceused 'dbo.t1'<br />
--name&nbsp; rows&nbsp; &nbsp; reserved&nbsp; &nbsp; data&nbsp; &nbsp; index_size&nbsp; unused<br />
--t1&nbsp; &nbsp; 5226496 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 1058000 KB&nbsp; 696872 KB &nbsp; 342888 KB &nbsp; 18240 KB</div></div>
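<p>The population step is elided above. A minimal way to reproduce a comparable volume is a batched insert repeated with a GO count (a sketch; the row count here is illustrative, not the exact figure from the output above):</p>

```sql
-- "GO <count>" repeats the preceding batch <count> times; it is a
-- client-side construct (SSMS / sqlcmd), not part of T-SQL itself.
INSERT INTO dbo.t1 (col1) VALUES (REPLICATE('T', 50));
GO 5000000
```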
<p>Now let’s set the context with the pattern of deployment scripts we went through during this specific deployment. Let’s be clear that this script is oversimplified: I voluntarily keep it simple to focus only on the most important part. You will notice the script includes two steps operating on the same table: first updating / fixing values in the col1 column, then creating an index on col1.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">/* Code before */<br />
<br />
-- Update some values in the col1 column<br />
UPDATE [dbo].[t1]<br />
SET col1 = REPLICATE('B', 50)<br />
<br />
-- Then create an index on col1 column<br />
CREATE INDEX [col1]<br />
ON [dbo].[t1] (col1) WITH (ONLINE = ON);<br />
GO</div></div>
<p>At the initial stage, the index was created with the default option (OFFLINE). Having discussed this point with the DEV team, we decided to create the index ONLINE in this context. The choice between OFFLINE and ONLINE operations is often not trivial and should be evaluated carefully but, to keep it simple, let’s say it was the right way to go in our context. Generally speaking, online operations are slower, but the tradeoff was acceptable in order to minimize blocking issues during this deployment. At least, this is what I thought …</p>
<p>In my demo, without any concurrent workload against the dbo.t1 table, creating the index offline took 6s, compared to 12s with the online method. So, an expected result here …</p>
<p>Now let’s run this query in another session:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">SELECT id, col1<br />
FROM dbo.t1<br />
WHERE id BETWEEN 1 AND 2</div></div>
<p>In a normal situation, this query should only be blocked for a short time, corresponding to the duration of the update operation. Once the update is done, the blocking should disappear, even while the index operation is performed ONLINE. </p>
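<p>To observe this from a third session, a simple check against sys.dm_exec_requests does the job (a minimal sketch; production monitoring would typically join additional DMVs):</p>

```sql
-- Sessions currently blocked, who blocks them, and on which wait type
SELECT r.session_id,
       r.blocking_session_id,
       r.wait_type,
       r.wait_time,        -- milliseconds
       r.command
FROM sys.dm_exec_requests AS r
WHERE r.blocking_session_id <> 0;
```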
<p>But now let’s add <a href="https://flywaydb.org/" rel="noopener" target="_blank">Flyway</a> to the context. Flyway is an open source tool we use for automatic deployment of SQL objects. The deployment script was executed through it in the ACC environment, and this time we noticed concurrent accesses were blocked for longer, which goes against what we would ideally like. Digging into this issue with the DEV team, we also noticed the following message when running the deployment script:</p>
<p><em>Warning: Online index operation on table &lsquo;dbo.t1&rsquo; will proceed but concurrent access to the table may be limited due to residual lock on the table from a previous operation in the same transaction.<br />
</em></p>
<p>This is something I didn’t notice from SQL Server Management Studio when I tested the same deployment script. So, what happened here?</p>
<p>Referring to the <a href="https://flywaydb.org/documentation/migrations#transactions" rel="noopener" target="_blank">Flyway documentation</a>, it is mentioned that Flyway always wraps the execution of an entire migration within a single transaction by default, and this was exactly the root cause of the issue.</p>
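<p>As a side note, depending on your Flyway version, this default can be overridden per migration with a script config file placed next to the migration script (a sketch with a hypothetical file name; check the documentation of your Flyway release before relying on it):</p>

```properties
# V42__update_and_index.sql.conf  (hypothetical migration name)
# Do not wrap this specific migration in a transaction, so the
# ONLINE index creation does not inherit the UPDATE's residual locks.
executeInTransaction=false
```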
<p>Let’s verify with some experiments: </p>
<p><strong>Test 1</strong>: Update + creating the index online in autocommit mode (one transaction per statement).</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Update some values in the col1 colum<br />
UPDATE [dbo].[t1]<br />
SET col1 = REPLICATE('B', 50)<br />
<br />
-- Then create an index on col1 column<br />
CREATE INDEX [col1]<br />
ON [dbo].[t1] (col1) WITH (ONLINE = ON);<br />
GO<br />
-- In another session<br />
SELECT id, col1<br />
FROM dbo.t1<br />
WHERE id BETWEEN 1 AND 2</div></div>
<p><strong>Test 2</strong>: Update + creating the index online within a single explicit transaction</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">BEGIN TRAN;<br />
<br />
-- Update some values in the col1 column<br />
UPDATE [dbo].[t1]<br />
SET col1 = REPLICATE('B', 50)<br />
<br />
-- Then create an index on col1 column<br />
CREATE INDEX [col1]<br />
ON [dbo].[t1] (col1) WITH (ONLINE = ON);<br />
GO<br />
COMMIT TRAN;<br />
-- In another session<br />
SELECT id, col1<br />
FROM dbo.t1<br />
WHERE id BETWEEN 1 AND 2</div></div>
<p>After running these two scripts, we can notice that the blocking duration of the SELECT query is longer in test 2, as shown in the picture below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/166-1-blocked-process.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/166-1-blocked-process.jpg" alt="166 - 1 - blocked process" width="890" height="358" class="alignnone size-full wp-image-1665" /></a></p>
<p>In test 1, the duration of the blocking corresponds to that of the update operation (the first step of the script). In test 2, however, we must also include the time needed to create the index. Let’s be precise: the index creation is not the blocking operation at all, but it extends the residual locking held by the previous update operation. In short, this is exactly what the warning message is telling us. You can easily imagine the impact such a situation may have if the index creation takes a long time: you may get exactly the opposite of what you expected. </p>
<p>Obviously, this is not a recommended situation, and index creation should run in a very narrow and constrained transaction. But from my experience, things are never that obvious and, depending on your context, you should keep an eye on how transactions are managed, especially when it comes to automatic deployment tooling that can quickly fall out of the scope of the DBA / Ops team. Strong collaboration with the DEV team is recommended to anticipate this kind of issue.</p>
<p>See you !!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Monitoring Azure SQL Databases with Azure Monitor and Automation</title>
		<link>https://blog.developpez.com/mikedavem/p13198/sql-server-2012/monitoring-azure-sql-databases-with-azure-monitor-and-automation</link>
		<comments>https://blog.developpez.com/mikedavem/p13198/sql-server-2012/monitoring-azure-sql-databases-with-azure-monitor-and-automation#comments</comments>
		<pubDate>Sun, 23 Aug 2020 15:32:07 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Azure]]></category>
		<category><![CDATA[SQL Server 2012]]></category>
		<category><![CDATA[Azure]]></category>
		<category><![CDATA[Azure Alerts]]></category>
		<category><![CDATA[Azure Monitor]]></category>
		<category><![CDATA[Azure SQL Database]]></category>
		<category><![CDATA[Cloud]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1653</guid>
		<description><![CDATA[Supervising Cloud Infrastructure is an important aspect of Cloud administration and Azure SQL Databases are no exception. This is something we are continuously improving at my company. On-prem, DBAs often rely on well-established products but with Cloud-based architectures, often implemented &#8230; <a href="https://blog.developpez.com/mikedavem/p13198/sql-server-2012/monitoring-azure-sql-databases-with-azure-monitor-and-automation">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Supervising Cloud Infrastructure is an important aspect of Cloud administration and Azure SQL Databases are no exception. This is something we are continuously improving at my company. </p>
<p>On-prem, DBAs often rely on well-established products but, with Cloud-based architectures often implemented through DevOps projects and developers, monitoring should be redefined to include some new topics such as:</p>
<p><span id="more-1653"></span></p>
<p>1)	Cloud service usage and fees observability<br />
2)	Detection of metrics and events that could affect the bottom line<br />
3)	Implementation of a single platform to report all data coming from different sources<br />
4)	Triggering of rules when the workload rises above or drops below certain levels, or when an event is relevant enough (it does not meet the configuration standard and implies unwanted extra billing, or it compromises the company security rules)<br />
5)	Monitoring of the user experience</p>
<p>A key benefit often discussed about Cloud computing, and mainly driven by DevOps, is how it enables agility. One meaning of the term agility is tied to the rapid provisioning of compute resources (in seconds or minutes), and this shortened provisioning path enables work to start quickly. You may be tempted to grant some provisioning permissions to DEV teams and, in my opinion, this is not a bad thing, but it may come with some drawbacks if not kept under control by the Ops team, including in the database area. For example, I have in mind some real cases including architecture configuration drift, security breaches created by unwanted item changes, or idle orphan resources for which you keep being charged. All of these scenarios may lead either to security issues or to extra billing, and I believe it is important to get clear visibility of such events. </p>
<p>In my company, the Azure built-in capabilities around the Azure Monitor architecture are our first target (at least in a first stage) and seem to address the aforementioned topics. To set the context, we already relied on the Azure Monitor infrastructure for different things, including Query Performance Insight, SQL Audit analysis through Log Analytics, and Azure alerts for some performance metrics. Therefore, the obvious way to go further was to add activity log events to the story. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/165-1-Azure-Monitor.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/165-1-Azure-Monitor.jpg" alt="165 - 1 - Azure Monitor" width="843" height="474" class="alignnone size-full wp-image-1655" /></a></p>
<p>In this blog post, let’s focus on items 2) and 4). I would like to share some experiments and thoughts about them. As a reminder, items 2) and 4) are about catching relevant events to help identify configuration and security drifts and performing actions accordingly. In addition, as with many event-based architectures, additional events may appear or evolve over time, and we started thinking about the concept with the following basic diagram …</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/165-2-Workflow-chart-e1598182358607.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/165-2-Workflow-chart-e1598182358607.jpg" alt="165 - 2 - Workflow chart" width="800" height="533" class="alignnone size-full wp-image-1657" /></a></p>
<p>&#8230; that led to the creation of the two following workflows:<br />
&#8211;	Workflow 1: to get notified immediately of critical events that may compromise security or quickly lead to significant extra billing<br />
&#8211;	Workflow 2: to get a report of other misconfigured items (including critical ones) on a scheduled basis, when quick responsiveness from the Ops team is not required.</p>
<p>Concerning the first workflow, using <a href="https://docs.microsoft.com/en-us/azure/azure-monitor/platform/activity-log-alerts" rel="noopener" target="_blank">alerts on activity logs</a>, action groups and webhooks as input of an Azure Automation runbook appeared to be a good solution. On the other side, the second workflow only requires running an Azure Automation runbook on a scheduled basis. In fact, this is the same runbook but with different input parameters according to the targeted environment (e.g. PROD / ACC / INT). In addition, the runbook should be able to identify unmanaged events and notify the Ops team, who will decide either to skip them or to integrate them into the runbook processing.</p>
<p>Azure alerts can be divided into different categories, including metric alerts, log alerts and activity log alerts. The last one drew our attention because it allows getting notified of operations on specific resources, either by email or by generating a JSON payload reusable from an Azure Automation runbook. Focusing on the latter, we came up with what we thought was a reasonable solution. </p>
<p>Here is the high-level picture of the architecture we have implemented:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/165-3-Architecture-e1598182462929.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/165-3-Architecture-e1598182462929.jpg" alt="165 - 3 - Architecture" width="800" height="347" class="alignnone size-full wp-image-1659" /></a></p>
<p>1-	During the creation of an Azure SQL server or database, corresponding alerts are added in the Administrative category with a specific scope. Note that the concerned operations must be registered with Azure Resource Manager in order to be usable from the Activity Log; fortunately, in this case they are all included in the <a href="https://docs.microsoft.com/en-us/azure/role-based-access-control/resource-provider-operations" rel="noopener" target="_blank">Microsoft.Sql</a> resource provider.<br />
2-	When an event occurs on the targeted environment, an alert is triggered, as well as the concerned runbook.<br />
3-	The execution of the same runbook, but with different input parameters, is scheduled on a weekly basis to produce a general configuration report of our Azure SQL environments.<br />
4-	Depending on the event, the Ops team gets notified and acts (either updating the misconfigured item, deleting the unauthorized item, updating the runbook code in the Git repo to handle the new event, and so on …)</p>
<p>The skeleton of the Azure automation runbook is pretty similar to the following one:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">[OutputType(&quot;PSAzureOperationResponse&quot;)]<br />
param<br />
(<br />
&nbsp; &nbsp; [Parameter (Mandatory=$false)]<br />
&nbsp; &nbsp; [object] $WebhookData<br />
&nbsp; &nbsp; ,<br />
&nbsp; &nbsp; [parameter(Mandatory=$False)]<br />
&nbsp; &nbsp; [ValidateSet(&quot;PROD&quot;,&quot;ACC&quot;,&quot;INT&quot;)]<br />
&nbsp; &nbsp; [String]$EnvTarget<br />
&nbsp; &nbsp; ,<br />
&nbsp; &nbsp; [parameter(Mandatory=$False)]<br />
&nbsp; &nbsp; [Boolean]$DebugMode = $False<br />
)<br />
<br />
<br />
<br />
<br />
If ($WebhookData)<br />
{<br />
<br />
&nbsp; &nbsp; # Logic to allow for testing in test pane<br />
&nbsp; &nbsp; If (-Not $WebhookData.RequestBody){<br />
&nbsp; &nbsp; &nbsp; &nbsp; $WebhookData = (ConvertFrom-Json -InputObject $WebhookData)<br />
&nbsp; &nbsp; }<br />
<br />
&nbsp; &nbsp; $WebhookBody = (ConvertFrom-Json -InputObject $WebhookData.RequestBody)<br />
<br />
&nbsp; &nbsp; $schemaId = $WebhookBody.schemaId<br />
<br />
&nbsp; &nbsp; If ($schemaId -eq &quot;azureMonitorCommonAlertSchema&quot;) {<br />
&nbsp; &nbsp; &nbsp; &nbsp; # This is the common Metric Alert schema (released March 2019)<br />
&nbsp; &nbsp; &nbsp; &nbsp; $Essentials = [object] ($WebhookBody.data).essentials<br />
&nbsp; &nbsp; &nbsp; &nbsp; # Get the first target only as this script doesn't handle multiple<br />
&nbsp; &nbsp; &nbsp; &nbsp; $status = $Essentials.monitorCondition<br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; # Focus only on succeeded or Fired Events<br />
&nbsp; &nbsp; &nbsp; &nbsp; If ($status -eq &quot;Succeeded&quot; -Or $Status -eq &quot;Fired&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Extract info from webook <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $alertTargetIdArray = (($Essentials.alertTargetIds)[0]).Split(&quot;/&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $SubId = ($alertTargetIdArray)[2]<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $ResourceGroupName = ($alertTargetIdArray)[4]<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $ResourceType = ($alertTargetIdArray)[6] + &quot;/&quot; + ($alertTargetIdArray)[7]<br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Determine code path depending on the resourceType<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; if ($ResourceType -eq &quot;microsoft.sql/servers&quot;)<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # DEBUG<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;This is a SQL Server Resource.&quot;<br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $firedDate = $Essentials.firedDateTime<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $AlertContext = [object] ($WebhookBody.data).alertContext<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $channel = $AlertContext.channels<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $EventSource = $AlertContext.eventSource<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Level = $AlertContext.level<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Operation = $AlertContext.operationName<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Properties = [object] ($WebhookBody.data).alertContext.properties<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $EventName = $Properties.eventName<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $EventStatus = $Properties.status<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Description = $Properties.description_scrubbed<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Caller = $Properties.caller<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $IPAddress = $Properties.ipAddress<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $ResourceName = ($alertTargetIdArray)[8]<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $DatabaseName = ($alertTargetIdArray)[10]<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Operation_detail = $Operation.Split('/')<br />
<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Check firewall rules<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; If ($EventName -eq 'OverwriteFirewallRules'){<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Firewall Overwrite is detected ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle firewall update event<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Update DB =&gt; no need to be monitored in real time<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Elseif ($EventName -eq 'UpdateDatabase') {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle Database config update event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Create DB<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Elseif ($EventName -eq 'CreateDatabase' -Or `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Operation -eq 'Microsoft.Sql/servers/databases/write'){<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Azure Database creation has been detected ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle Database creation event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Delete DB<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Elseif ($EventName -eq 'DeleteDatabase' -Or `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; $Operation -eq 'Microsoft.Sql/servers/databases/delete') {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Azure Database has been deleted ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle Database deletion event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Elseif ($Operation -eq 'Microsoft.Sql/servers/databases/transparentDataEncryption/write') {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Azure Database Encryption update has been detected ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle Database encryption update event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Elseif ($Operation -eq 'Microsoft.Sql/servers/databases/auditingSettings/write') {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Azure Database Audit update has been detected ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle Database audit update event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Elseif ($Operation -eq 'Microsoft.Sql/servers/databases/securityAlertPolicies/write' -or $Operation -eq 'Microsoft.Sql/servers/databases/vulnerabilityAssessments/write') {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Azure ADS update has been detected ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle ADS update event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ElseIf ($Operation -eq 'Microsoft.Sql/servers/databases/backupShortTermRetentionPolicies/write'){<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Azure Retention Backup has been modified ...&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # Code to handle Database retention backup update event or skip <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # ... other ones <br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Else {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Output &quot;Event not managed yet &nbsp; &nbsp;&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; else {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # ResourceType not supported<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Error &quot;$ResourceType is not a supported resource type for this runbook.&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; &nbsp; &nbsp; Else {<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; # The alert status was not 'Succeeded' or 'Fired' so no action taken<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Write-Verbose (&quot;No action taken. Alert status: &quot; + $status) -Verbose<br />
&nbsp; &nbsp; &nbsp; &nbsp; }<br />
&nbsp; &nbsp; }<br />
&nbsp; &nbsp; Else{<br />
&nbsp; &nbsp; &nbsp; &nbsp;# SchemaId doesn't correspond to azureMonitorCommonAlertSchema =&gt; skip<br />
&nbsp; &nbsp; &nbsp; &nbsp;Write-Host &quot;Skip ...&quot; <br />
&nbsp; &nbsp; }<br />
}<br />
Else {<br />
&nbsp; &nbsp; Write-Output &quot;No Webhook detected ... switch to normal mode ...&quot;<br />
<br />
&nbsp; &nbsp; If ([String]::IsNullOrEmpty($EnvTarget)){<br />
&nbsp; &nbsp; &nbsp; &nbsp; Write-Error '$EnvTarget is mandatory in normal mode'<br />
&nbsp; &nbsp; }<br />
<br />
&nbsp; &nbsp; #########################################################<br />
&nbsp; &nbsp; # Code for a complete check of Azure SQL DB environment #<br />
&nbsp; &nbsp; #########################################################<br />
}</div></div>
<p>Some comments about the PowerShell script:</p>
<p>1)	Input parameters should include either the Webhook data or specific parameter values for a complete Azure SQL DB check.<br />
2)	The first section should include your own functions to respond to the different events. In our context, we currently drew inspiration from <a href="https://github.com/sqlcollaborative/dbachecks" rel="noopener" target="_blank">DBAChecks</a> to develop a derived model, but why not use DBAChecks directly in the near future?<br />
3)	When an event is triggered, a JSON payload is generated and provides insight into it. The point here is that you must navigate through different properties depending on the operation type (cf. <a href="https://docs.microsoft.com/en-us/azure/azure-monitor/platform/activity-log-schema" rel="noopener" target="_blank">BOL</a>).<br />
4)	The growing number of events to manage could become an issue and make the runbook fat, especially if we keep both the core functions and the event processing in it. To mitigate this, we are thinking of moving the functions into Azure Automation modules (next step).</p>
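<p>As an illustration of point 3, here is a minimal, hedged sketch of how the runbook navigates the common alert schema payload. The JSON below is a trimmed, hypothetical activity-log alert body (not a real alert), but the property paths follow the azureMonitorCommonAlertSchema layout used by the dispatch code above:</p>

```powershell
# Hedged sketch: digging into the common alert schema payload.
# The JSON below is a simplified, hypothetical activity-log alert body.
$sample = @'
{
  "schemaId": "azureMonitorCommonAlertSchema",
  "data": {
    "essentials": { "monitorCondition": "Fired" },
    "alertContext": {
      "operationName": "Microsoft.Sql/servers/databases/backupShortTermRetentionPolicies/write"
    }
  }
}
'@

$body      = $sample | ConvertFrom-Json
$schemaId  = $body.schemaId                            # must be 'azureMonitorCommonAlertSchema'
$status    = $body.data.essentials.monitorCondition    # e.g. 'Fired' / 'Resolved'
$Operation = $body.data.alertContext.operationName     # drives the If/ElseIf dispatch
```
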
<p><strong>Bottom line</strong></p>
<p>Thanks to Azure built-in capabilities, we improved our visibility of the events that occur on our Azure SQL environment (both expected and unexpected) and we&rsquo;re now able to act accordingly. But I should tell you that going this way is not a free lunch: we reached a reasonable solution only after some programming and testing effort. If you can invest the time, it is probably the kind of solution worth adding to your study.</p>
<p>See you</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>AAD user creation on behalf AAD Service Principal with Azure SQL DB</title>
		<link>https://blog.developpez.com/mikedavem/p13197/sql-azure/aad-user-creation-on-behalf-aad-service-principal-with-azure-sql-db</link>
		<comments>https://blog.developpez.com/mikedavem/p13197/sql-azure/aad-user-creation-on-behalf-aad-service-principal-with-azure-sql-db#comments</comments>
		<pubDate>Sun, 02 Aug 2020 22:28:06 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[DevOps]]></category>
		<category><![CDATA[PowerShell]]></category>
		<category><![CDATA[SQL Azure]]></category>
		<category><![CDATA[Authentication]]></category>
		<category><![CDATA[Azure Automation]]></category>
		<category><![CDATA[Azure SQL Database]]></category>
		<category><![CDATA[Azure SQL DB]]></category>
		<category><![CDATA[Powershell]]></category>
		<category><![CDATA[Runbook]]></category>
		<category><![CDATA[Security]]></category>
		<category><![CDATA[Service Principal]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[System managed identity]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1643</guid>
		<description><![CDATA[An interesting improvement was announced by the SQL AAD team on Monday 27th July 2020 and concerns the support for Azure AD user creation on behalf of Azure AD Applications for Azure SQL as mentioned to this Microsoft blog post. &#8230; <a href="https://blog.developpez.com/mikedavem/p13197/sql-azure/aad-user-creation-on-behalf-aad-service-principal-with-azure-sql-db">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>An interesting improvement was announced by the SQL AAD team on Monday 27th July 2020 and concerns the support for Azure AD user creation on behalf of Azure AD Applications for Azure SQL as mentioned to this <a href="https://techcommunity.microsoft.com/t5/azure-sql-database/support-for-azure-ad-user-creation-on-behalf-of-azure-ad/ba-p/1491121" rel="noopener" target="_blank">Microsoft blog post</a>. </p>
<p><span id="more-1643"></span></p>
<p>In my company, this is something we had been looking for, for a while, with our database refresh process in Azure. Before talking about this new feature, let me share a brief history of the different approaches we went through for this DB refresh process over time. First, let&rsquo;s clarify that a DB refresh usually includes at least two steps: restoring a backup / copying the database – you have both options with Azure SQL Database – and realigning the security context with specific users for your target environment (ACC / INT …).  But the latter is not as trivial as you may expect if you opted to use either a SQL login / user or a service principal to carry out this operation in your process. Indeed, in both cases creating an Azure AD user or group is not supported, and if you try you will face this error message:</p>
<blockquote><p>‘’ is not a valid login or you do not have permission. </p></blockquote>
<p>All the work done so far (either Azure Automation runbooks or on-prem PowerShell modules) and described below follows the same process:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/164-1-DB-Refresh-process-e1596406580306.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/164-1-DB-Refresh-process-e1596406580306.jpg" alt="164 - 1 - DB Refresh process" width="800" height="566" class="alignnone size-full wp-image-1645" /></a></p>
<p>First, we used Invoke-Sqlcmd in an Azure Automation runbook with a T-SQL query to create a copy of a source database on the target server. T-SQL is mandatory in this case, as <a href="https://docs.microsoft.com/en-us/azure/azure-sql/database/database-copy?tabs=azure-powershell" rel="noopener" target="_blank">documented</a> in the Microsoft BOL, because the PROD and ACC or INT servers are not in the same subscription. Here is a simplified sample of code:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">...<br />
$CopyDBCMD = @{<br />
&nbsp; &nbsp; 'Database' = 'master'<br />
&nbsp; &nbsp; 'ServerInstance' = $TargetServerName<br />
&nbsp; &nbsp; 'Username' = $SQLUser<br />
&nbsp; &nbsp; 'Password' = $SQLPWD<br />
&nbsp; &nbsp; 'Query' = 'CREATE DATABASE '+ '[' + $DatabaseName + '] ' + 'AS COPY OF ' + '[' + $SourceServerName + '].[' + $DatabaseName + ']'<br />
} <br />
<br />
Invoke-Sqlcmd @CopyDBCMD <br />
...</div></div>
<p>But as you likely know, Invoke-Sqlcmd doesn&rsquo;t support AAD authentication, and because SQL login authentication was the only option here, it left us with an annoying issue in the security configuration step for AAD users or groups, as you may imagine. </p>
<p>Then, because we base authentication mainly on a trust architecture and our security rules require using it, including for apps with managed identities or service principals, we also wanted to introduce this concept in our database refresh process. Fortunately, service principals have been supported with <a href="https://techcommunity.microsoft.com/t5/azure-sql-database/token-based-authentication-support-for-azure-sql-db-using-azure/ba-p/386091" rel="noopener" target="_blank">Azure SQL DB since v12</a>, using an access token for authentication through ADALSQL. The corresponding DLL is required on your server or, if you use it from Azure Automation like us, the ADAL.PS module; but be aware this module is now deprecated and I strongly advise you to invest in moving to MSAL. Here is a sample we used:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">...<br />
$response = Get-ADALToken `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -ClientId $clientId `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -ClientSecret $clientSecret `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -Resource $resourceUri `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -Authority $authorityUri `<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -TenantId $tenantName<br />
<br />
...<br />
<br />
$connectionString = &quot;Server=tcp:$SqlInstanceFQDN,1433;Initial Catalog=master;Persist Security Info=False;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;&quot;<br />
# Create the connection object<br />
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)<br />
# Set AAD generated token to SQL connection token<br />
$connection.AccessToken = $response.AccessToken<br />
<br />
Try {<br />
&nbsp; &nbsp; $connection.Open()<br />
&nbsp; &nbsp; ...<br />
} <br />
...</div></div>
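<p>Since ADAL.PS is deprecated, the same token acquisition could be sketched with the MSAL.PS module instead. This is an assumption on our side (not what we ran at the time); Get-MsalToken and the .default scope convention come from MSAL.PS:</p>

```powershell
# Hedged sketch with MSAL.PS (Install-Module MSAL.PS) replacing Get-ADALToken.
# The resource URI becomes a scope with the '/.default' suffix in MSAL.
$msalParams = @{
    ClientId     = $clientId
    ClientSecret = ($clientSecret | ConvertTo-SecureString -AsPlainText -Force)
    TenantId     = $tenantName
    Scopes       = 'https://database.windows.net/.default'
}
$response = Get-MsalToken @msalParams

# The rest of the flow is unchanged: attach the token to the SQL connection
$connection = New-Object System.Data.SqlClient.SqlConnection($connectionString)
$connection.AccessToken = $response.AccessToken
```
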
<p>But again, even if the copy or restore steps were well managed, we were still stuck with the security reconfiguration, because creating AAD users or groups with a service principal was still not supported &#8230;</p>
<p>In the meantime, we found an interesting temporary solution based on the <a href="https://dbatools.io/" rel="noopener" target="_blank">dbatools framework</a> and the <a href="https://docs.dbatools.io/#Invoke-DbaQuery" rel="noopener" target="_blank">Invoke-DbaQuery command</a>, which supports AAD authentication (login + password). As we could not rely on a service principal in this case, using a dedicated AAD account was an acceptable tradeoff to manage all the database refresh steps. But going this way comes with some disadvantages, because running Invoke-DbaQuery in a fully Azure Automation mode is not possible due to the missing adalsql.dll. A workaround would be to use a hybrid worker, but we didn&rsquo;t want to add complexity to our current architecture only for this special case. Instead, we decided to move the logic of the Azure Automation runbook into our on-prem PowerShell framework, which already includes the DB refresh logic for on-prem SQL Server instances. </p>
<p>Here is a simplified sample of the code we are using:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">...<br />
Try {<br />
&nbsp; &nbsp; # Connect to get access to Key Vault info<br />
&nbsp; &nbsp; Connect-AzAccount | Out-Null<br />
<br />
&nbsp; &nbsp; [String]$user = (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-SQLBCKUSER&quot;).SecretValueText<br />
&nbsp; &nbsp; [System.Security.SecureString]$pwd = &nbsp;ConvertTo-SecureString (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-SQLBCKPWD&quot;).SecretValueText -AsPlainText -Force<br />
&nbsp; &nbsp; [String]$SourceServerName = (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-NAME&quot;).SecretValueText<br />
&nbsp; &nbsp; [String]$TargetServerName = (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-TARGETNAME&quot;).SecretValueText + '.database.windows.net'<br />
<br />
&nbsp; &nbsp; # DB Restore will be performed in the context of dedicated AAD account <br />
&nbsp; &nbsp; $pscredential = New-Object -TypeName System.Management.Automation.PSCredential($user, $pwd)<br />
<br />
&nbsp; &nbsp; Write-Host &quot;Restoring DB:$DatabaseName from Source Server: $SourceServerName to Target Server: $TargetServerName&quot;<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; $Query = &quot;CREATE DATABASE [$DatabaseName] AS COPY OF [$SourceServerName].[$DatabaseName]&quot;<br />
&nbsp; &nbsp; Invoke-DbaQuery `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -SqlInstance $TargetServerName `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -Database master `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -SqlCredential $pscredential `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -Query $Query `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -EnableException <br />
<br />
&nbsp; &nbsp; # Wait for DB online and ready ... <br />
&nbsp; &nbsp; # Code should be implemented for this check <br />
<br />
<br />
&nbsp; &nbsp; Write-Output &quot;Applying security configuration to DB: $DatabaseName on Server:$TargetServerName&quot;<br />
<br />
&nbsp; &nbsp; $Query = &quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; DROP USER [az_sql_ro];CREATE USER [az_sql_ro] FROM EXTERNAL PROVIDER;<br />
&nbsp; &nbsp; &quot;<br />
&nbsp; &nbsp; Invoke-DbaQuery `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -SqlInstance $TargetServerName `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -Database $DatabaseName `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -SqlCredential $pscredential `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -Query $Query `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -EnableException<br />
<br />
}<br />
Catch {<br />
&nbsp; &nbsp; Write-Host &quot;Error encountered: $($_.Exception.Message)&quot;<br />
} <br />
...</div></div>
<p>Referring to the PowerShell code above, in the second step we create the AAD group [az_sql_ro] as a database user on behalf of the dedicated AAD account with the FROM EXTERNAL PROVIDER clause. </p>
<p>Finally, with the latest news published by the SQL AAD team, we will likely consider switching back to a service principal instead of the dedicated AAD account. <a href="https://techcommunity.microsoft.com/t5/azure-sql-database/support-for-azure-ad-user-creation-on-behalf-of-azure-ad/ba-p/1491121" rel="noopener" target="_blank">This Microsoft blog post</a> explains in detail how it works and what you have to set up to make it work correctly. I don&rsquo;t want to duplicate what is already explained, so I will just apply the new feature to my context. </p>
<p>Referring to the above blog post, you first need to set up a server identity for your Azure SQL server as below:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Set-AzSqlServer `<br />
&nbsp; &nbsp; -ResourceGroupName sandox-rg `<br />
&nbsp; &nbsp; -ServerName a-s-sql02 `<br />
&nbsp; &nbsp; -AssignIdentity<br />
<br />
# Check server identity<br />
Get-AzSqlServer `<br />
&nbsp; &nbsp; -ResourceGroupName sandox-rg `<br />
&nbsp; &nbsp; -ServerName a-s-sql02 | `<br />
&nbsp; &nbsp; Select-Object ServerName, Identity</div></div>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">ServerName Identity &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br />
---------- -------- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br />
a-s-sql02 &nbsp;Microsoft.Azure.Management.Sql.Models.ResourceIdentity</div></div>
<p>Let&rsquo;s have a look at the server identity</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"># Get identity details<br />
$identity = Get-AzSqlServer `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -ResourceGroupName sandox-rg `<br />
&nbsp; &nbsp; &nbsp; &nbsp; -ServerName a-s-sql02<br />
<br />
$identity.identity</div></div>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">PrincipalId &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Type &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; TenantId &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br />
----------- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;---- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; -------- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br />
7f0d16f7-b172-4c97-94d3-34f0f7ed93cf SystemAssigned 2fcd19a7-ab24-4aef-802b-6851ef5d1ed5</div></div>
<p>In fact, assigning a server identity means creating a system-assigned managed identity in the Azure AD tenant that&rsquo;s trusted by the subscription of the instance. To keep things simple, let&rsquo;s say that a system-assigned managed identity in Azure is similar to a Managed Service Account or Group Managed Service Account on-prem: those identities are managed by the system itself. Then you need to grant this identity the Azure AD &laquo;&nbsp;Directory Readers&nbsp;&raquo; permission so that AAD users or groups can be created on its behalf. A PowerShell script for this is provided by Microsoft <a href="https://docs.microsoft.com/en-us/azure/azure-sql/database/authentication-aad-service-principal-tutorial" rel="noopener" target="_blank">here</a>. Below is a sample of code I applied in my context for testing:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">...<br />
Try {<br />
&nbsp; &nbsp; $DatabaseName = &quot;test-DBA&quot; &nbsp; <br />
&nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; # Connect to get access to Key Vault info<br />
&nbsp; &nbsp; Connect-AzAccount | Out-Null<br />
<br />
&nbsp; &nbsp; [String]$user = (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-SQLBCKAPPID&quot;).SecretValueText<br />
&nbsp; &nbsp; [System.Security.SecureString]$pwd = &nbsp;ConvertTo-SecureString (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-SQLBCKAPPSECRET&quot;).SecretValueText -AsPlainText -Force<br />
&nbsp; &nbsp; [String]$SourceServerName = (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-NAME&quot;).SecretValueText<br />
&nbsp; &nbsp; [String]$TargetServerName = (Get-AzKeyVaultSecret -VaultName $KeyvaultName -Name &quot;AZSQL-TARGETNAME&quot;).SecretValueText + '.database.windows.net'<br />
<br />
&nbsp; &nbsp; # DB Restore will be performed in the context of dedicated AAD account <br />
&nbsp; &nbsp; $pscredential = New-Object -TypeName System.Management.Automation.PSCredential($user, $pwd)<br />
<br />
&nbsp; &nbsp; $adalPath &nbsp;= &quot;${env:ProgramFiles}\WindowsPowerShell\Modules\Az.Profile.7.0\PreloadAssemblies&quot;<br />
&nbsp; &nbsp; # To install the latest AzureRM.profile version, execute: Install-Module -Name AzureRM.profile<br />
&nbsp; &nbsp; $adal &nbsp; &nbsp; &nbsp;= &quot;$adalPath\Microsoft.IdentityModel.Clients.ActiveDirectory.dll&quot;<br />
&nbsp; &nbsp; $adalforms = &quot;$adalPath\Microsoft.IdentityModel.Clients.ActiveDirectory.WindowsForms.dll&quot;<br />
&nbsp; &nbsp; [System.Reflection.Assembly]::LoadFrom($adal) | Out-Null<br />
&nbsp; &nbsp; $resourceAppIdURI = 'https://database.windows.net/'<br />
<br />
&nbsp; &nbsp; # Set Authority to Azure AD Tenant<br />
&nbsp; &nbsp; $authority = 'https://login.windows.net/' + $tenantId<br />
<br />
&nbsp; &nbsp; $ClientCred = [Microsoft.IdentityModel.Clients.ActiveDirectory.ClientCredential]::new($clientId, $clientSecret)<br />
&nbsp; &nbsp; $authContext = [Microsoft.IdentityModel.Clients.ActiveDirectory.AuthenticationContext]::new($authority)<br />
&nbsp; &nbsp; $authResult = $authContext.AcquireTokenAsync($resourceAppIdURI,$ClientCred)<br />
&nbsp; &nbsp; $Tok = $authResult.Result.CreateAuthorizationHeader()<br />
&nbsp; &nbsp; $Tok=$Tok.Replace(&quot;Bearer &quot;,&quot;&quot;)<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; Write-host &quot;Token generated is ...&quot;<br />
&nbsp; &nbsp; $Tok<br />
&nbsp; &nbsp; Write-host &nbsp;&quot;&quot;<br />
<br />
&nbsp; &nbsp; Write-Host &quot;Create SQL connectionstring&quot;<br />
&nbsp; &nbsp; $conn = New-Object System.Data.SqlClient.SQLConnection <br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; $conn.ConnectionString = &quot;Data Source=$TargetServerName;Initial Catalog=master;Connect Timeout=30&quot;<br />
&nbsp; &nbsp; $conn.AccessToken = $Tok<br />
<br />
&nbsp; &nbsp; Write-host &quot;Connect to database and execute SQL script&quot;<br />
&nbsp; &nbsp; $conn.Open() <br />
<br />
&nbsp; &nbsp; Write-Host &quot;Check connected user ...&quot;<br />
&nbsp; &nbsp; $Query = &quot;SELECT USER_NAME() AS [user_name];&quot;<br />
&nbsp; &nbsp; $command = New-Object -TypeName System.Data.SqlClient.SqlCommand($Query, $conn)<br />
&nbsp; &nbsp; $Command.ExecuteScalar()<br />
&nbsp; &nbsp; $conn.Close()<br />
<br />
&nbsp; &nbsp; Write-Host &quot;Restoring DB:$DatabaseName from Source Server: $SourceServerName to Target Server: $TargetServerName&quot;<br />
<br />
&nbsp; &nbsp; $conn.ConnectionString = &quot;Data Source=$TargetServerName;Initial Catalog=master;Connect Timeout=30&quot;<br />
&nbsp; &nbsp; $conn.AccessToken = $Tok<br />
&nbsp; &nbsp; $conn.Open()<br />
&nbsp; &nbsp; $Query = &quot;DROP DATABASE IF EXISTS [$DatabaseName]; CREATE DATABASE [$DatabaseName] AS COPY OF [$SourceServerName].[$DatabaseName]&quot;<br />
&nbsp; &nbsp; $command = New-Object -TypeName System.Data.SqlClient.SqlCommand($Query, $conn)<br />
&nbsp; &nbsp; $command.CommandTimeout = 1200<br />
&nbsp; &nbsp; $command.ExecuteNonQuery()<br />
&nbsp; &nbsp; $conn.Close()<br />
<br />
&nbsp; &nbsp; # Wait for DB online and ready ... <br />
&nbsp; &nbsp; # Code should be implemented for this check <br />
<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; Write-Output &quot;Applying security configuration to DB: $DatabaseName on Server:$TargetServerName&quot;<br />
<br />
&nbsp; &nbsp; $conn.ConnectionString = &quot;Data Source=$TargetServerName;Initial Catalog=$DatabaseName;Connect Timeout=30&quot;<br />
&nbsp; &nbsp; $conn.AccessToken = $Tok<br />
&nbsp; &nbsp; $conn.Open() <br />
&nbsp; &nbsp; $Query = 'CREATE USER [az_sql_ro] FROM EXTERNAL PROVIDER;'<br />
&nbsp; &nbsp; $command = New-Object -TypeName System.Data.SqlClient.SqlCommand($Query, $conn) &nbsp; &nbsp; &nbsp; <br />
&nbsp; &nbsp; $command.ExecuteNonQuery()<br />
&nbsp; &nbsp; $conn.Close()<br />
<br />
}<br />
Catch {<br />
&nbsp; &nbsp; Write-Output &quot;Error encountered: $($_.Exception.Message)&quot;<br />
} <br />
...</div></div>
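<p>The &laquo;&nbsp;Directory Readers&nbsp;&raquo; assignment mentioned earlier could be sketched with the AzureAD module as below. This is an assumption on my side (Microsoft&rsquo;s linked script remains the reference), and &lsquo;a-s-sql02&rsquo; is the server name from my example:</p>

```powershell
# Hedged sketch (AzureAD module): grant the server identity the Directory Readers role.
Connect-AzureAD

# The server identity shows up as a service principal named after the server
$identity = Get-AzureADServicePrincipal -Filter "DisplayName eq 'a-s-sql02'"

# Note: the role object only exists once activated in the tenant
$role = Get-AzureADDirectoryRole | Where-Object DisplayName -eq 'Directory Readers'

Add-AzureADDirectoryRoleMember -ObjectId $role.ObjectId -RefObjectId $identity.ObjectId
```
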
<p>Using a service principal required a few changes in my case. I now get the credentials of the service principal (client id and secret) from Azure Key Vault instead of those of the dedicated AAD account used in the previous example. I also changed the way I connect to SQL Server, relying on ADALSQL to get the access token instead of using dbatools commands. Indeed, as far as I know, dbatools doesn&rsquo;t support this authentication method (yet?). </p>
<p>The authentication process becomes as follows:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/164-3-new-auth-process-e1596407082747.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/164-3-new-auth-process-e1596407082747.jpg" alt="164 - 3 - new auth process" width="800" height="610" class="alignnone size-full wp-image-1647" /></a></p>
<p>My first test looks conclusive:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/08/164-4-test-with-SP-e1596407153885.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/08/164-4-test-with-SP-e1596407153885.jpg" alt="164 - 4 - test with SP" width="800" height="301" class="alignnone size-full wp-image-1648" /></a></p>
<p>This improvement looks promising and may cover broader scenarios than the one I described in this blog post. The feature is in preview at the time of this write-up and I hope to see it reach GA soon, along with potential support in my preferred PowerShell framework, dbatools <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></p>
<p>See you!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>SQL Server on Linux and new FUA support for XFS filesystem</title>
		<link>https://blog.developpez.com/mikedavem/p13193/sql-server-vnext/sql-server-on-linux-and-new-fua-support-for-xfs-filesystem</link>
		<comments>https://blog.developpez.com/mikedavem/p13193/sql-server-vnext/sql-server-on-linux-and-new-fua-support-for-xfs-filesystem#comments</comments>
		<pubDate>Mon, 13 Apr 2020 17:34:32 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[Performance]]></category>
		<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[SQL Server 2019]]></category>
		<category><![CDATA[blktrace]]></category>
		<category><![CDATA[FUA]]></category>
		<category><![CDATA[iostats]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[xfs]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1568</guid>
		<description><![CDATA[I wrote a (dbi services) blog post concerning Linux and SQL Server IO behavior changes before and after SQL Server 2017 CU6. Now, I was looking forward seeing some new improvements with Force Unit Access (FUA) that was implemented with &#8230; <a href="https://blog.developpez.com/mikedavem/p13193/sql-server-vnext/sql-server-on-linux-and-new-fua-support-for-xfs-filesystem">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>I wrote a (dbi services) <a href="https://blog.dbi-services.com/sql-server-on-linux-io-internal-thoughts/" rel="noopener" target="_blank">blog post</a> concerning Linux and SQL Server IO behavior changes before and after SQL Server 2017 CU6. Now, I was looking forward to seeing some new improvements with Force Unit Access (FUA), implemented through the Linux XFS enhancements available since kernel 4.18.</p>
<p><span id="more-1568"></span></p>
<p>As a reminder, SQL Server 2017 CU6 added a way to guarantee data durability by using the &laquo;&nbsp;forced flush&nbsp;&raquo; mechanism explained <a href="https://support.microsoft.com/en-us/help/4131496/enable-forced-flush-mechanism-in-sql-server-2017-on-linux" rel="noopener" target="_blank">here</a>. To cut a long story short, SQL Server has strict storage requirements, such as write ordering and FUA, and things work differently on Linux than on Windows to achieve durability. What is FUA and why is it important for SQL Server? From <a href="https://en.wikipedia.org/wiki/Disk_buffer#Force_Unit_Access_(FUA)" rel="noopener" target="_blank">Wikipedia</a>: Force Unit Access (aka FUA) is an I/O write command option that forces written data all the way to stable storage. FUA appeared in the SCSI command set and, good news, was later adopted by other standards over time. SQL Server relies on it to meet its WAL and ACID requirements. </p>
<p>In the Linux world, before kernel 4.18, FUA was handled and optimized only for filesystem journaling. Data writes always went through a multi-step flush process that could introduce SQL Server IO slowness (issue the write to the block device + issue a block device flush to ensure durability with O_DSYNC). </p>
<p>In the Windows world, installing and using a SQL Server instance assumes you are compliant with the Microsoft storage requirements, and therefore the first RTM version shipped on Linux came only with O_DIRECT, assuming you already ensure that SQL Server IOs can be written directly to non-volatile storage through the kernel, drivers and hardware before the acknowledgement. The forced flush mechanism &#8211; based on fdatasync() &#8211; was then introduced to address scenarios without safe O_DIRECT capabilities. </p>
<p>But referring to the Bob Dorr <a href="https://bobsql.com/sql-server-on-linux-forced-unit-access-fua-internals/" rel="noopener" target="_blank">article</a>, Linux kernel 4.18 comes with XFS enhancements to handle FUA for data storage, which is obviously of benefit to SQL Server. FUA support is intended to improve write performance by shortening the path of write requests, as shown below:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-1-IO-worklow-e1586796506268.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-1-IO-worklow-e1586796506268.jpg" alt="160 - 1 - IO worklow" width="1000" height="539" class="alignnone size-full wp-image-1569" /></a></p>
<p><em>Picture from existing IO workflow on Bob Dorr&rsquo;s article</em></p>
<p>This is an interesting improvement for write-intensive workloads, and it seems to be confirmed by the tests performed by Microsoft and Bob Dorr in his article. </p>
<p>Let&rsquo;s begin the experiment with my lab environment, based on CentOS 7 on Hyper-V with an upgraded kernel version: 5.6.3-1.el7.elrepo.x86_64.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$uname -r<br />
5.6.3-1.el7.elrepo.x86_64<br />
<br />
$cat /etc/os-release | grep VERSION<br />
VERSION=&quot;7 (Core)&quot;<br />
VERSION_ID=&quot;7&quot;<br />
CENTOS_MANTISBT_PROJECT_VERSION=&quot;7&quot;<br />
REDHAT_SUPPORT_PRODUCT_VERSION=&quot;7&quot;</div></div>
<p>Note that my tests are purely experimental; instead of upgrading the kernel to a newer version, you may directly rely on RHEL 8 based distros, which come with kernel 4.18 for example.</p>
<p>My lab environment includes 2 separate SSD disks to host the DATA + TLOG database files as follows:</p>
<p>I:\ drive : SQL Data volume (sdb – XFS filesystem)<br />
T:\ drive : SQL TLog volume (sda – XFS filesystem)</p>
<p>The general performance is not so bad <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-6-diskmark-tests-storage-env-e1586796679451.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-6-diskmark-tests-storage-env-e1586796679451.jpg" alt="160 - 6 - diskmark tests storage env" width="1000" height="362" class="alignnone size-full wp-image-1571" /></a></p>
<p>Initially I dedicated only one disk to both the SQL DATA and TLOG files, but I quickly noticed some IO waits (iostat output) which made me less confident in my test results:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-3-iostats-before-optimization.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-3-iostats-before-optimization.jpg" alt="160 - 3 - iostats before optimization" width="975" height="447" class="alignnone size-full wp-image-1572" /></a></p>
<p>Spreading IO on physically separate volumes helped to reduce drastically these phenomena afterwards:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-4-iostats-after-optimization.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-4-iostats-after-optimization.jpg" alt="160 - 4 - iostats after optimization" width="984" height="531" class="alignnone size-full wp-image-1573" /></a> </p>
<p>First, I enabled FUA capabilities on Hyper-V side as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Set-VMHardDiskDrive -VMName CENTOS7 -ControllerType SCSI -OverrideCacheAttributes WriteCacheAndFUAEnabled<br />
<br />
Get-VMHardDiskDrive -VMName CENTOS7 | `<br />
&nbsp; &nbsp; ft VMName, ControllerType, &nbsp;ControllerLocation, Path, WriteHardeningMethod -AutoSize</div></div>
<p>Then I checked if FUA is enabled and supported from an OS perspective including sda (TLOG) and sdb (SQL DATA) disks:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ lsblk -f<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;FSTYPE &nbsp; &nbsp; &nbsp;LABEL UUID &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; MOUNTPOINT<br />
sdb<br />
└─sdb1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;xfs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 06910f69-27a3-4711-9093-f8bf80d15d72 &nbsp; /sqldata<br />
sr0<br />
sda<br />
├─sda2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;xfs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; f5a9bded-130f-4642-bd6f-9f27563a4e16 &nbsp; /boot<br />
├─sda3 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;LVM2_member &nbsp; &nbsp; &nbsp; QsbKEt-28yT-lpfZ-VCbj-v5W5-vnVr-2l7nih<br />
│ ├─centos-swap swap &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;7eebbb32-cef5-42e9-87c3-7df1a0b79f11 &nbsp; [SWAP]<br />
│ └─centos-root xfs &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 90f6eb2f-dd39-4bef-a7da-67aa75d1843d &nbsp; /<br />
└─sda1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;vfat &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;7529-979E &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;/boot/efi<br />
<br />
$ dmesg | grep sda<br />
[ &nbsp; &nbsp;1.665478] sd 0:0:0:0: [sda] 83886080 512-byte logical blocks: (42.9 GB/40.0 GiB)<br />
[ &nbsp; &nbsp;1.665479] sd 0:0:0:0: [sda] 4096-byte physical blocks<br />
[ &nbsp; &nbsp;1.665774] sd 0:0:0:0: [sda] Write Protect is off<br />
[ &nbsp; &nbsp;1.665775] sd 0:0:0:0: [sda] Mode Sense: 0f 00 10 00<br />
[ &nbsp; &nbsp;1.670321] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA<br />
[ &nbsp; &nbsp;1.683833] &nbsp;sda: sda1 sda2 sda3<br />
[ &nbsp; &nbsp;1.708938] sd 0:0:0:0: [sda] Attached SCSI disk<br />
[ &nbsp; &nbsp;5.607914] EXT4-fs (sda2): mounted filesystem with ordered data mode. Opts: (null)</div></div>
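<p>A quick way to double-check this from scripts is to filter the dmesg text for the FUA capability line. The tiny helper below is only an illustrative sketch (the <code>sd[a-z]</code> device-name pattern is an assumption matching this lab): it reads dmesg-formatted text on stdin and prints the devices that advertise DPO/FUA support.</p>

```shell
# Hypothetical helper: list the block devices whose dmesg lines advertise
# "supports DPO and FUA". Reads dmesg-formatted text on stdin.
fua_devices() {
  grep 'supports DPO and FUA' | sed -n 's/.*\[\(sd[a-z]\)\].*/\1/p' | sort -u
}
```

Piping the dmesg excerpt above through it prints the FUA-capable device names at a glance.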
<p>Finally, according to the documentation, I configured <strong>trace flag 3979</strong> and the <strong>control.alternatewritethrough = 0</strong> parameter at startup for my SQL Server instance.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ /opt/mssql/bin/mssql-conf traceflag 3979 on<br />
<br />
$ /opt/mssql/bin/mssql-conf set control.alternatewritethrough 0<br />
<br />
$ systemctl restart mssql-server</div></div>
<p>The first test I performed was pretty similar to the ones in my previous (dbi services) <a href="https://blog.dbi-services.com/sql-server-on-linux-io-internal-thoughts/">blog post</a>.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">CREATE TABLE dummy_test (<br />
&nbsp; &nbsp; id INT IDENTITY,<br />
&nbsp; &nbsp; col1 VARCHAR(2000) DEFAULT REPLICATE('T', 2000)<br />
);<br />
<br />
INSERT INTO dummy_test DEFAULT VALUES;<br />
GO 67</div></div>
<p>Out of curiosity, I looked at the corresponding strace output:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ cat sql_strace_fua.txt<br />
% time &nbsp; &nbsp; seconds &nbsp;usecs/call &nbsp; &nbsp; calls &nbsp; &nbsp;errors syscall<br />
------ ----------- ----------- --------- --------- ----------------<br />
&nbsp;78.13 &nbsp;360.618066 &nbsp; &nbsp; &nbsp; 61739 &nbsp; &nbsp; &nbsp;5841 &nbsp; &nbsp; &nbsp;2219 futex<br />
&nbsp; 6.88 &nbsp; 31.731833 &nbsp; &nbsp; 1511040 &nbsp; &nbsp; &nbsp; &nbsp;21 &nbsp; &nbsp; &nbsp; &nbsp;15 restart_syscall<br />
&nbsp; 3.81 &nbsp; 17.592176 &nbsp; &nbsp; &nbsp;130312 &nbsp; &nbsp; &nbsp; 135 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; io_getevents<br />
&nbsp; 2.95 &nbsp; 13.607314 &nbsp; &nbsp; &nbsp; 98604 &nbsp; &nbsp; &nbsp; 138 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; epoll_wait<br />
&nbsp; 2.88 &nbsp; 13.313667 &nbsp; &nbsp; &nbsp;633984 &nbsp; &nbsp; &nbsp; &nbsp;21 &nbsp; &nbsp; &nbsp; &nbsp;21 rt_sigtimedwait<br />
&nbsp; 2.60 &nbsp; 11.997925 &nbsp; &nbsp; 1333103 &nbsp; &nbsp; &nbsp; &nbsp; 9 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; nanosleep<br />
&nbsp; 1.79 &nbsp; &nbsp;8.279781 &nbsp; &nbsp; &nbsp; &nbsp; 242 &nbsp; &nbsp; 34256 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; gettid<br />
&nbsp; 0.84 &nbsp; &nbsp;3.876021 &nbsp; &nbsp; &nbsp; &nbsp; 226 &nbsp; &nbsp; 17124 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; getcpu<br />
&nbsp; 0.03 &nbsp; &nbsp;0.138836 &nbsp; &nbsp; &nbsp; &nbsp; 347 &nbsp; &nbsp; &nbsp; 400 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sched_yield<br />
&nbsp; 0.01 &nbsp; &nbsp;0.062348 &nbsp; &nbsp; &nbsp; &nbsp; 254 &nbsp; &nbsp; &nbsp; 245 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; getrusage<br />
&nbsp; 0.01 &nbsp; &nbsp;0.056065 &nbsp; &nbsp; &nbsp; &nbsp; 406 &nbsp; &nbsp; &nbsp; 138 &nbsp; &nbsp; &nbsp; &nbsp;69 readv<br />
&nbsp; 0.01 &nbsp; &nbsp;0.038107 &nbsp; &nbsp; &nbsp; &nbsp; 343 &nbsp; &nbsp; &nbsp; 111 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; read<br />
&nbsp; 0.01 &nbsp; &nbsp;0.037883 &nbsp; &nbsp; &nbsp; &nbsp; 743 &nbsp; &nbsp; &nbsp; &nbsp;51 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; mmap<br />
&nbsp; 0.01 &nbsp; &nbsp;0.037498 &nbsp; &nbsp; &nbsp; &nbsp; 180 &nbsp; &nbsp; &nbsp; 208 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; epoll_ctl<br />
&nbsp; 0.01 &nbsp; &nbsp;0.035654 &nbsp; &nbsp; &nbsp; &nbsp; 517 &nbsp; &nbsp; &nbsp; &nbsp;69 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; writev<br />
&nbsp; 0.01 &nbsp; &nbsp;0.025542 &nbsp; &nbsp; &nbsp; &nbsp; 370 &nbsp; &nbsp; &nbsp; &nbsp;69 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; io_submit<br />
&nbsp; 0.00 &nbsp; &nbsp;0.019760 &nbsp; &nbsp; &nbsp; &nbsp; 282 &nbsp; &nbsp; &nbsp; &nbsp;70 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; write<br />
&nbsp; 0.00 &nbsp; &nbsp;0.019555 &nbsp; &nbsp; &nbsp; &nbsp; 477 &nbsp; &nbsp; &nbsp; &nbsp;41 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; open<br />
&nbsp; 0.00 &nbsp; &nbsp;0.016285 &nbsp; &nbsp; &nbsp; &nbsp;1629 &nbsp; &nbsp; &nbsp; &nbsp;10 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; rt_sigaction<br />
&nbsp; 0.00 &nbsp; &nbsp;0.012359 &nbsp; &nbsp; &nbsp; &nbsp; 301 &nbsp; &nbsp; &nbsp; &nbsp;41 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; close<br />
&nbsp; 0.00 &nbsp; &nbsp;0.010069 &nbsp; &nbsp; &nbsp; &nbsp; 205 &nbsp; &nbsp; &nbsp; &nbsp;49 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; munmap<br />
&nbsp; 0.00 &nbsp; &nbsp;0.006977 &nbsp; &nbsp; &nbsp; &nbsp; 303 &nbsp; &nbsp; &nbsp; &nbsp;23 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; rt_sigprocmask<br />
&nbsp; 0.00 &nbsp; &nbsp;0.006256 &nbsp; &nbsp; &nbsp; &nbsp; 153 &nbsp; &nbsp; &nbsp; &nbsp;41 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; fstat<br />
&nbsp; 0.00 &nbsp; &nbsp;0.004646 &nbsp; &nbsp; &nbsp; &nbsp; 465 &nbsp; &nbsp; &nbsp; &nbsp;10 &nbsp; &nbsp; &nbsp; &nbsp;10 stat<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000860 &nbsp; &nbsp; &nbsp; &nbsp; 215 &nbsp; &nbsp; &nbsp; &nbsp; 4 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; madvise<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000321 &nbsp; &nbsp; &nbsp; &nbsp; 161 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sched_setaffinity<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000295 &nbsp; &nbsp; &nbsp; &nbsp; 148 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; set_robust_list<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000281 &nbsp; &nbsp; &nbsp; &nbsp; 141 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; clone<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000236 &nbsp; &nbsp; &nbsp; &nbsp; 118 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sigaltstack<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000093 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;47 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; arch_prctl<br />
&nbsp; 0.00 &nbsp; &nbsp;0.000046 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;23 &nbsp; &nbsp; &nbsp; &nbsp; 2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sched_getaffinity<br />
------ ----------- ----------- --------- --------- ----------------<br />
100.00 &nbsp;461.546755 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 59137 &nbsp; &nbsp; &nbsp;2334 total</div></div>
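<p>The same conclusion can be checked mechanically by scanning the strace summary for flush-oriented syscalls. This is only a sketch of that check and assumes the standard <code>strace -c</code> column layout (call count in the 4th field, syscall name in the last one): it prints the total number of fsync / fdatasync / sync_file_range calls, which should be 0 with FUA enabled.</p>

```shell
# Hypothetical check: sum the "calls" column (4th field) of any
# fsync / fdatasync / sync_file_range rows in an `strace -c` summary.
count_flush_calls() {
  awk '$NF ~ /^(fsync|fdatasync|sync_file_range)$/ { n += $4 } END { print n + 0 }'
}
```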
<p>… And as I expected, with FUA enabled there are no fsync() / fdatasync() calls anymore; writing to stable storage is achieved directly by FUA commands. It is now iomap_dio_rw() that determines whether REQ_FUA can be used and whether issuing generic_write_sync() is still necessary. To dig further into the IO layer we need to rely on another tool, blktrace (mentioned in Bob Dorr&rsquo;s article as well).</p>
<p>In my case I got two different pictures of the blktrace output, one for the forced flush mechanism (the default) and one for FUA-oriented IO:</p>
<p>-&gt; With forced flush</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">34.694734500 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17164 &nbsp;A &nbsp;WS &nbsp; &nbsp; &nbsp; 2048 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694735000 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17165 &nbsp;Q &nbsp;WS &nbsp; &nbsp; &nbsp; 2048 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694737000 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17166 &nbsp;X &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694738100 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17167 &nbsp;G &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694739800 &nbsp; &nbsp; &nbsp;14225 18426216 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17169 &nbsp;G &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694740900 &nbsp; &nbsp; &nbsp;14225 18425192 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17171 &nbsp;D &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.694747200 &nbsp; &nbsp; &nbsp;14225 18426216 &nbsp; &nbsp; 8,16 &nbsp; 0 &nbsp; &nbsp;17174 &nbsp;D &nbsp;WS &nbsp; &nbsp; &nbsp; 1024 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.713665000 &nbsp; &nbsp; &nbsp;14225 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8,16 &nbsp; 0 &nbsp; &nbsp;17175 &nbsp;Q FWS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
34.713668100 &nbsp; &nbsp; &nbsp;14225 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8,16 &nbsp; 0 &nbsp; &nbsp;17176 &nbsp;G FWS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr</div></div>
<p>WS (Write Synchronous) requests are performed, but SQL Server still needs to go through the multi-step flush process with the additional FWS (PREFLUSH|WRITE|SYNC) requests.</p>
<p>-&gt; FUA</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">0.000000000 &nbsp; &nbsp; &nbsp;16305 55106536 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp;A WFS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
0.000000400 &nbsp; &nbsp; &nbsp;16305 57615336 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;2 &nbsp;A WFS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
0.000001100 &nbsp; &nbsp; &nbsp;16305 57615336 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;3 &nbsp;Q WFS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
0.000005200 &nbsp; &nbsp; &nbsp;16305 57615336 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;4 &nbsp;G WFS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr<br />
0.001377800 &nbsp; &nbsp; &nbsp;16305 55106544 &nbsp; &nbsp; 8,0 &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp;6 &nbsp;A WFS &nbsp; &nbsp; &nbsp; &nbsp; 16 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; sqlservr</div></div>
<p>FWS has disappeared, leaving only WFS commands, which are basically <strong>REQ_WRITE requests carrying the REQ_FUA flag</strong>.</p>
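<p>The RWBS column (8th field in the blkparse lines above) can be decoded mechanically: a leading F marks a preflush request, while an F following the operation letter marks a FUA write. Here is a small illustrative sketch of that classification (the field position is an assumption based on the trace layout shown here):</p>

```shell
# Hypothetical decoder for the blkparse RWBS column: a leading "F" = preflush
# (e.g. FWS), an "F" after the operation letter = FUA (e.g. WFS), else plain.
classify_rwbs() {
  awk '{ r = $8
         if (r ~ /^F/)     print r, "preflush"
         else if (r ~ /F/) print r, "fua"
         else              print r, "plain" }'
}
```

Fed with the traces above, it labels the forced-flush run with FWS preflush requests and the FUA run with WFS fua writes.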
<p>I spent some time reading interesting discussions in addition to Bob Dorr&rsquo;s wonderful article. Here is an interesting <a href="https://lkml.org/lkml/2019/12/3/316" rel="noopener" target="_blank">pointer</a> to a discussion about REQ_FUA, for instance.</p>
<p><strong>But what about performance gain? </strong></p>
<p>I had 2 simple scenarios to play with in order to bring out FUA&rsquo;s helpfulness: hardening the dirty pages in the buffer pool through the checkpoint process, and hardening the log buffer to disk during the commit phase. When the forced flush method is used, each component relies on an additional FlushFileBuffers() call to achieve durability. This can be easily tracked from an XE session including the <strong>flush_file_buffers</strong> and <strong>make_writes_durable</strong> events.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-1-1-flushfilebuffers-worklflow.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-1-1-flushfilebuffers-worklflow.jpg" alt="160 - 1 - 1 - flushfilebuffers worklflow" width="839" height="505" class="alignnone size-full wp-image-1575" /></a></p>
<p><strong>First scenario (10K inserts within a transaction and checkpoint)</strong></p>
<p>In this scenario my intention was to stress the checkpoint process with a bunch of buffers and dirty pages to flush to disk when it kicks in.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">USE dummy;<br />
<br />
SET NOCOUNT ON;<br />
-- Disable checkpoint to control when it will kick in<br />
DBCC TRACEON(3505);<br />
-- Check traceflag<br />
DBCC TRACESTATUS;<br />
<br />
DECLARE @i INT = 0;<br />
DECLARE @iteration INT = 0;<br />
DECLARE @start_upd DATETIME;<br />
DECLARE @start_chkpt DATETIME;<br />
DECLARE @end_upd DATETIME;<br />
DECLARE @end_chkpt DATETIME;<br />
<br />
TRUNCATE TABLE dummy_test;<br />
<br />
WHILE @iteration &amp;lt; 251<br />
BEGIN<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; SET @start_upd = GETDATE();<br />
<br />
&nbsp; &nbsp; BEGIN TRAN;<br />
<br />
&nbsp; &nbsp; WHILE @i &amp;lt;= 10000<br />
&nbsp; &nbsp; BEGIN<br />
&nbsp; &nbsp; &nbsp; &nbsp; INSERT INTO dummy_test DEFAULT VALUES;<br />
&nbsp; &nbsp; &nbsp; &nbsp; SET @i += 1;<br />
&nbsp; &nbsp; END<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; COMMIT TRAN;<br />
<br />
&nbsp; &nbsp; SET @end_upd = GETDATE();<br />
<br />
&nbsp; &nbsp; SET @i = 0;<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; SET @start_chkpt = GETDATE();<br />
&nbsp; &nbsp; CHECKPOINT;<br />
&nbsp; &nbsp; SET @end_chkpt = GETDATE();<br />
&nbsp; &nbsp; PRINT &amp;#039;INS: &amp;#039; + CAST(DATEDIFF(ms, @start_upd, @end_upd) AS VARCHAR(50)) + &amp;#039; - CHKPT: &amp;#039; + CAST(DATEDIFF(ms, @start_chkpt, @end_chkpt) AS VARCHAR(50));<br />
<br />
&nbsp; &nbsp; SET @iteration += 1;<br />
END</div></div>
<p>The result is as follows:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-5-test-perfs-250_10K_chkpt.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-5-test-perfs-250_10K_chkpt.jpg" alt="160 - 5 - test perfs 250_10K_chkpt" width="974" height="298" class="alignnone size-full wp-image-1576" /></a></p>
<p>In my case, I noticed ~17% of improvement for the checkpoint process and ~7% for the insert transaction, including the commit phase that flushes data to the TLog. In parallel, the aggregated extended event output confirms that FUA avoids a lot of additional operations to persist data on disk, as illustrated by the flush_file_buffers and make_writes_durable events.</p>
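<p>To compute such percentages, the &quot;INS: x - CHKPT: y&quot; lines printed by the loop can be averaged per run and the two runs compared. The helper below is a minimal sketch assuming exactly that line format:</p>

```shell
# Hypothetical aggregation of the "INS: x - CHKPT: y" lines printed by the
# T-SQL test loop: report the mean INSERT and CHECKPOINT durations in ms.
avg_timings() {
  awk -F'[ :-]+' '/^INS/ { ins += $2; chk += $4; n++ }
                  END { if (n) printf "ins=%.0f chkpt=%.0f\n", ins / n, chk / n }'
}
```

Running it once on the forced-flush output and once on the FUA output gives the two means from which the improvement ratio follows directly.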
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-6-xe-flush-file-buffers-e1586798220100.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-6-xe-flush-file-buffers-e1586798220100.jpg" alt="160 - 6 - xe flush file buffers" width="1000" height="178" class="alignnone size-full wp-image-1577" /></a></p>
<p><strong>Second scenario (100x 1 insert within a transaction and checkpoint)</strong></p>
<p>In this scenario, I wanted to stress the log writer by forcing a lot of small transactions to commit. I updated the T-SQL code as shown below:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">USE dummy;<br />
<br />
SET NOCOUNT ON;<br />
-- Disable checkpoint to control when it will kick in<br />
DBCC TRACEON(3505);<br />
-- Check traceflag<br />
DBCC TRACESTATUS;<br />
<br />
DECLARE @i INT = 0;<br />
DECLARE @iteration INT = 0;<br />
DECLARE @start_upd DATETIME;<br />
DECLARE @start_chkpt DATETIME;<br />
DECLARE @end_upd DATETIME;<br />
DECLARE @end_chkpt DATETIME;<br />
<br />
TRUNCATE TABLE dummy_test;<br />
<br />
WHILE @iteration &amp;lt; 251<br />
BEGIN<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; SET @start_upd = GETDATE();<br />
<br />
&nbsp; &nbsp; WHILE @i &amp;lt;= 100<br />
&nbsp; &nbsp; BEGIN<br />
&nbsp; &nbsp; &nbsp; &nbsp; INSERT INTO dummy_test DEFAULT VALUES;<br />
&nbsp; &nbsp; &nbsp; &nbsp; SET @i += 1;<br />
&nbsp; &nbsp; END<br />
<br />
&nbsp; &nbsp; SET @end_upd = GETDATE();<br />
<br />
&nbsp; &nbsp; SET @i = 0;<br />
&nbsp; &nbsp; <br />
&nbsp; &nbsp; SET @start_chkpt = GETDATE();<br />
&nbsp; &nbsp; CHECKPOINT;<br />
&nbsp; &nbsp; SET @end_chkpt = GETDATE();<br />
&nbsp; &nbsp; PRINT &amp;#039;INS: &amp;#039; + CAST(DATEDIFF(ms, @start_upd, @end_upd) AS VARCHAR(50)) + &amp;#039; - CHKPT: &amp;#039; + CAST(DATEDIFF(ms, @start_chkpt, @end_chkpt) AS VARCHAR(50));<br />
<br />
&nbsp; &nbsp; SET @iteration += 1;<br />
END</div></div>
<p>The new picture is the following:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-7-test-perfs-250_100_1K_chkpt.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-7-test-perfs-250_100_1K_chkpt.jpg" alt="160 - 7 - test perfs 250_100_1K_chkpt" width="974" height="298" class="alignnone size-full wp-image-1580" /></a></p>
<p>This time the improvement is definitely more impressive, with a decrease of ~80% in the execution time of the INSERT + COMMIT part and ~77% for the checkpoint phase!</p>
<p>Looking at the extended event session confirms the shortened IO path has something to do with it <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /></p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/160-7-xe-flush-file-buffers-2-e1586798367112.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/160-7-xe-flush-file-buffers-2-e1586798367112.jpg" alt="160 - 7 - xe flush file buffers 2" width="1000" height="170" class="alignnone size-full wp-image-1578" /></a></p>
<p>Well, shortening the IO path by relying directly on native FUA instructions was definitely a good idea, both for performance and for meeting WAL and ACID requirements. Anyway, I&rsquo;m glad to see Microsoft contributing improvements to the Linux kernel!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Introducing SQL Server with Portworx and storage orchestration</title>
		<link>https://blog.developpez.com/mikedavem/p13184/docker/introducing-sql-server-with-portworx-and-storage-orchestration</link>
		<comments>https://blog.developpez.com/mikedavem/p13184/docker/introducing-sql-server-with-portworx-and-storage-orchestration#comments</comments>
		<pubDate>Sun, 15 Dec 2019 22:08:03 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[DevOps]]></category>
		<category><![CDATA[Docker]]></category>
		<category><![CDATA[K8s]]></category>
		<category><![CDATA[Kubernetes]]></category>
		<category><![CDATA[Portworx]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Storage orchestration]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1402</guid>
		<description><![CDATA[Stateful applications like databases need special considerations on K8s world. This is because data persistence is important and we need also something at the storage layer communicating with the container orchestrator to take advantage of its scheduling capabilities. For Stateful &#8230; <a href="https://blog.developpez.com/mikedavem/p13184/docker/introducing-sql-server-with-portworx-and-storage-orchestration">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>Stateful applications like databases need special consideration in the K8s world. This is because data persistence is important, and we also need something at the storage layer communicating with the container orchestrator to take advantage of its scheduling capabilities. For stateful applications, a StatefulSet may be only part of the solution because it primarily focuses on Pod availability, and we have to rely on the application&rsquo;s own capabilities for data replication. But a StatefulSet doesn&rsquo;t address the underlying storage at all. At the time of this write-up, StatefulSet-based solutions for SQL Server, such as availability groups, are not yet supported in production. </p>
<p><span id="more-1402"></span></p>
<p>So, with stateful applications we may consider other solutions like GlusterFS or NFS as distributed storage spanning all the nodes of the K8s cluster, but they often don&rsquo;t meet the requirements of a database workload running in production, with high throughput and IOPS demands and data migration needs.</p>
<p>Products exist on the market that seem to address these specific requirements, and I was very curious to get a better picture of their capabilities. During my investigation, I went through a very interesting one named Portworx for a potential customer&rsquo;s project. The interesting part of Portworx is its container-native, orchestration-aware storage fabric, which brings storage operation and administration inside K8s. It aggregates the underlying storage and exposes it as a software-defined, programmable block device. </p>
<p>From a high-level perspective, Portworx uses a custom scheduler – <a href="https://portworx.com/stork-storage-orchestration-kubernetes/">STORK</a> (STorage Orchestration Runtime for Kubernetes) – to assist K8s in placing a Pod on the same node where the associated PVC resides. It drastically reduces the complex annotation and labeling work otherwise needed to express such affinity rules. </p>
<p>In this blog post, I will focus only on the high-availability topic, which Portworx addresses by synchronizing volume content between K8s nodes and aggregated disks. To that end, Portworx requires defining the redundancy of the dataset between replicas through a replication factor value. </p>
<p>I cannot expose my customer&rsquo;s architecture here, but let&rsquo;s try to apply the concept to my lab environment. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2019/12/151-0-0-K-Lab-architecture.jpg"><img src="http://blog.developpez.com/mikedavem/files/2019/12/151-0-0-K-Lab-architecture.jpg" alt="151 - 0 - 0 - K Lab architecture" width="675" height="495" class="alignnone size-full wp-image-1414" /></a></p>
<p>As shown above, my lab environment includes 4 K8s nodes, 3 of which act as workers. Each worker node owns local storage based on two SSD disks (one for the SQL Server data files and the other one handling the Portworx metadata activity &#8211; the journal disk). After deploying Portworx on my K8s cluster, here is a big picture of my configuration:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ kubectl get daemonset -n kube-system | egrep &quot;(stork|portworx|px)&quot;<br />
portworx &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3 &nbsp; &nbsp; &nbsp; &nbsp;<br />
portworx-api &nbsp; 3 &nbsp; &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; 3 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3</div></div>
<p>Portworx is a DaemonSet-based installation. Each Portworx node will discover the available storage and create a container-native block storage device with:<br />
&#8211;	/dev/sdb for my SQL Server data<br />
&#8211;	/dev/sdc for hosting my journal</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ kubectl get pod -n kube-system | egrep &quot;(stork|portworx|px)&quot;<br />
<br />
portworx-555wf &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;18 &nbsp; &nbsp; &nbsp; &nbsp; 2d23h<br />
portworx-api-2pv6s &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2d23h<br />
portworx-api-s8zzr &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2d23h<br />
portworx-api-vnqh2 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;4 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2d23h<br />
portworx-pjxl8 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;17 &nbsp; &nbsp; &nbsp; &nbsp; 2d23h<br />
portworx-wrcdf &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;389 &nbsp; &nbsp; &nbsp; &nbsp;2d10h<br />
px-lighthouse-55db75b59c-qd2nc &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3/3 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;35h<br />
stork-5d568485bb-ghlt9 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;35h<br />
stork-5d568485bb-h2sqm &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;13 &nbsp; &nbsp; &nbsp; &nbsp; 2d23h<br />
stork-5d568485bb-xxd4b &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2d4h<br />
stork-scheduler-56574cdbb5-7td6v &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;35h<br />
stork-scheduler-56574cdbb5-skw5f &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;4 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2d4h<br />
stork-scheduler-56574cdbb5-v5slj &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1/1 &nbsp; &nbsp; Running &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;9 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2d23h</div></div>
<p>The above picture shows the different stork pods that may influence scheduling based on the location of the volumes a pod requires. In addition, the PX cluster (part of the Portworx Enterprise Platform) includes all the Portworx pods and provides monitoring and performance insights for each related pod (a SQL Server instance here). </p>
<p>Let’s have a look at the global configuration by using the <strong>pxctl</strong> command (first section):</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')<br />
$ kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl status<br />
Status: PX is operational<br />
License: Trial (expires in 28 days)<br />
Node ID: 590d7afd-9d30-4624-8082-5f9cb18ecbfd<br />
&nbsp; &nbsp; &nbsp; &nbsp; IP: 192.168.90.63<br />
&nbsp; &nbsp; &nbsp; &nbsp; Local Storage Pool: 1 pool<br />
&nbsp; &nbsp; &nbsp; &nbsp; POOL &nbsp; &nbsp;IO_PRIORITY &nbsp; &nbsp; RAID_LEVEL &nbsp; &nbsp; &nbsp;USABLE &nbsp;USED &nbsp; &nbsp;STATUS &nbsp;ZONE &nbsp; &nbsp;REGION<br />
&nbsp; &nbsp; &nbsp; &nbsp; 0 &nbsp; &nbsp; &nbsp; HIGH &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;raid0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 20 GiB &nbsp;8.5 GiB Online &nbsp;default default<br />
&nbsp; &nbsp; &nbsp; &nbsp; Local Storage Devices: 1 device<br />
&nbsp; &nbsp; &nbsp; &nbsp; Device &nbsp;Path &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Media Type &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Size &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Last-Scan<br />
&nbsp; &nbsp; &nbsp; &nbsp; 0:1 &nbsp; &nbsp; /dev/sdb &nbsp; &nbsp; &nbsp; &nbsp;STORAGE_MEDIUM_MAGNETIC 20 GiB &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;08 Dec 19 21:59 UTC<br />
&nbsp; &nbsp; &nbsp; &nbsp; total &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; - &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 20 GiB<br />
&nbsp; &nbsp; &nbsp; &nbsp; Cache Devices:<br />
&nbsp; &nbsp; &nbsp; &nbsp; No cache devices<br />
&nbsp; &nbsp; &nbsp; &nbsp; Journal Device:<br />
&nbsp; &nbsp; &nbsp; &nbsp; 1 &nbsp; &nbsp; &nbsp; /dev/sdc1 &nbsp; &nbsp; &nbsp; STORAGE_MEDIUM_MAGNETIC<br />
…</div></div>
<p>Portworx has created a pool composed of my 3 replicas / Kubernetes nodes, each with a 20 GiB SSD. I just used a default configuration without specifying any zone or region settings for fault tolerance; this is not my focus at the moment. Following Portworx&rsquo;s performance tuning documentation, I configured a journal device to improve I/O performance by offloading PX metadata writes to a separate device. </p>
<p>Second section:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">…<br />
Nodes: 3 node(s) with storage (3 online)<br />
&nbsp; &nbsp; &nbsp; &nbsp; IP &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;ID &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;SchedulerNodeName &nbsp; &nbsp; &nbsp; StorageNode &nbsp; &nbsp; &nbsp;Used &nbsp; &nbsp;Capacity &nbsp; &nbsp; &nbsp; &nbsp;Status &nbsp;StorageStatus &nbsp; Version &nbsp; &nbsp; &nbsp; &nbsp; Kernel &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;OS<br />
&nbsp; &nbsp; &nbsp; &nbsp; 192.168.5.62 &nbsp; &nbsp;b0ac4fa3-29c2-40a8-9033-1d0558ec31fd &nbsp; &nbsp;k8n2.dbi-services.test &nbsp;Yes &nbsp; &nbsp; 3.1 GiB &nbsp;20 GiB &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Online &nbsp;Up &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2.3.0.0-103206b 3.10.0-1062.1.2.el7.x86_64 &nbsp; &nbsp; &nbsp;CentOS Linux 7 (Core)<br />
&nbsp; &nbsp; &nbsp; &nbsp; 192.168.40.61 &nbsp; 9fc5bc45-5602-4926-ab38-c74f0a8a8b2c &nbsp; &nbsp;k8n1.dbi-services.test &nbsp;Yes &nbsp; &nbsp; 8.6 GiB &nbsp;20 GiB &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Online &nbsp;Up &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;2.3.0.0-103206b 3.10.0-1062.1.2.el7.x86_64 &nbsp; &nbsp; &nbsp;CentOS Linux 7 (Core)<br />
&nbsp; &nbsp; &nbsp; &nbsp; 192.168.80.63 &nbsp; 590d7afd-9d30-4624-8082-5f9cb18ecbfd &nbsp; &nbsp;k8n3.dbi-services.test &nbsp;Yes &nbsp; &nbsp; 8.5 GiB &nbsp;20 GiB &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;Online &nbsp;Up (This node) &nbsp;2.3.0.0-103206b 3.10.0-1062.1.2.el7.x86_64 &nbsp; &nbsp; &nbsp;CentOS Linux 7 (Core)<br />
Global Storage Pool<br />
&nbsp; &nbsp; &nbsp; &nbsp; Total Used &nbsp; &nbsp; &nbsp;: &nbsp;20 GiB<br />
&nbsp; &nbsp; &nbsp; &nbsp; Total Capacity &nbsp;: &nbsp;60 GiB</div></div>
<p>All my nodes are up for a total storage of 60 GiB. Let&rsquo;s deploy a Portworx Storage Class with the following specification:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">kind: StorageClass<br />
apiVersion: storage.k8s.io/v1<br />
metadata:<br />
&nbsp; name: portworx-sc<br />
provisioner: kubernetes.io/portworx-volume<br />
parameters:<br />
&nbsp; repl: &quot;3&quot;<br />
&nbsp; nodes: &quot;b0ac4fa3-29c2-40a8-9033-1d0558ec31fd,9fc5bc45-5602-4926-ab38-c74f0a8a8b2c,590d7afd-9d30-4624-8082-5f9cb18ecbfd&quot;<br />
&nbsp; label: &quot;name=mssqlvol&quot;<br />
&nbsp; fs: &quot;xfs&quot;<br />
&nbsp; io_profile: &quot;db&quot;<br />
&nbsp; priority_io: &quot;high&quot;<br />
&nbsp; journal: &quot;true&quot;<br />
allowVolumeExpansion: true</div></div>
<p>The important parameters are:</p>
<p><strong>repl: &laquo;&nbsp;3&nbsp;&raquo;</strong> =&gt; Number of replicas (K8s nodes) where data will be replicated</p>
<p><strong>nodes: &laquo;&nbsp;b0ac4fa3-29c2-40a8-9033-1d0558ec31fd,9fc5bc45-5602-4926-ab38-c74f0a8a8b2c,590d7afd-9d30-4624-8082-5f9cb18ecbfd&nbsp;&raquo;</strong> =&gt; The list of nodes eligible to host the volume replicas, identified by their node IDs. Each write is synchronously replicated to a quorum set of nodes, whereas read throughput is aggregated: multiple nodes can service one read request in parallel streams.</p>
<p><strong>fs: &laquo;&nbsp;xfs&nbsp;&raquo;</strong> =&gt; I used a Linux FS supported by SQL Server on Linux</p>
<p><strong>io_profile: &laquo;&nbsp;db&nbsp;&raquo;</strong> =&gt; By default, Portworx selects an I/O profile based on the detected access pattern. Here I forced the db profile, which implements a write-back flush coalescing algorithm. </p>
<p><strong>priority_io: &laquo;&nbsp;high&nbsp;&raquo;</strong> =&gt; I deliberately set the IO priority to high for my pool in order to favor maximum throughput and low latency for transactional workloads. I used SSD storage accordingly.</p>
<p><strong>journal: &laquo;&nbsp;true&nbsp;&raquo;</strong> =&gt; The volumes used by this storage class will use the journal dedicated device</p>
<p><strong>allowVolumeExpansion: true</strong> =&gt; an interesting parameter that allows online expansion of the concerned volume(s). As an aside, it is worth noting that volume expansion is pretty new (v1.11+) in the Kubernetes world for the following in-tree volume plugins: AWS-EBS, GCE-PD, Azure Disk, Azure File, Glusterfs, Cinder, Portworx, and Ceph RBD.</p>
<p>Then, let&rsquo;s use Dynamic Provisioning with the following PVC specification:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">kind: PersistentVolumeClaim<br />
apiVersion: v1<br />
metadata:<br />
&nbsp; name: pvcsc001<br />
&nbsp; annotations:<br />
&nbsp; &nbsp; volume.beta.kubernetes.io/storage-class: portworx-sc<br />
spec:<br />
&nbsp; accessModes:<br />
&nbsp; &nbsp; - ReadWriteOnce<br />
&nbsp; resources:<br />
&nbsp; &nbsp; requests:<br />
&nbsp; &nbsp; &nbsp; storage: 20Gi</div></div>
<p>A usual specification for a PVC &#8230; I just claimed 20Gi of storage based on my Portworx storage class. After deploying both the storage class and the PVC, here is the new picture of my configuration:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ kubectl get sc<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; PROVISIONER &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; AGE<br />
portworx-sc &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;kubernetes.io/portworx-volume &nbsp; 3d14h<br />
stork-snapshot-sc &nbsp; &nbsp; &nbsp; &nbsp;stork-snapshot &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;3d23h<br />
<br />
$ kubectl get pvc<br />
NAME &nbsp; &nbsp; &nbsp; STATUS &nbsp; VOLUME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; CAPACITY &nbsp; ACCESS MODES &nbsp; STORAGECLASS &nbsp; AGE<br />
pvcsc001 &nbsp; Bound &nbsp; &nbsp;pvc-98d12db5-17ff-11ea-9d3a-00155dc4b604 &nbsp; 20Gi &nbsp; &nbsp; &nbsp; RWO &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;portworx-sc &nbsp; &nbsp;3d13h</div></div>
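<p>Because the storage class sets <strong>allowVolumeExpansion: true</strong>, this claim can later be grown online without recreating anything. A minimal sketch of the edit (e.g. via kubectl edit pvc pvcsc001); the 40Gi target is a hypothetical value of mine:</p>

```yaml
# Hypothetical patch: only the requested size changes,
# everything else in the PVC spec stays as-is
spec:
  resources:
    requests:
      storage: 40Gi
```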
<p>Note that there is also a special storage class implementation for snapshot capabilities; we will talk about it in the next write-up. My PVC <strong>pvcsc001</strong> is ready to be used by my stateful application. Now it&rsquo;s time to deploy a stateful application with my SQL Server pod and the specification below. Note that Portworx volumes are usable by containers running as non-root when the fsGroup parameter is specified in the securityContext section. So, this is a good fit with the non-root execution capabilities shipped with the SQL Server pod <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /> You will also notice there is no special labeling or affinity configuration between my pod and the PVC. I just defined the volume mount and the corresponding PVC, and that&rsquo;s it!</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">apiVersion: apps/v1beta1<br />
kind: Deployment<br />
metadata:<br />
&nbsp; name: mssql-deployment<br />
spec:<br />
&nbsp; replicas: 1<br />
&nbsp; template:<br />
&nbsp; &nbsp; metadata:<br />
&nbsp; &nbsp; &nbsp; labels:<br />
&nbsp; &nbsp; &nbsp; &nbsp; app: mssql<br />
&nbsp; &nbsp; spec:<br />
&nbsp; &nbsp; &nbsp; securityContext:<br />
&nbsp; &nbsp; &nbsp; &nbsp; runAsUser: 10001<br />
&nbsp; &nbsp; &nbsp; &nbsp; runAsGroup: 10001<br />
&nbsp; &nbsp; &nbsp; &nbsp; fsGroup: 10001<br />
&nbsp; &nbsp; &nbsp; terminationGracePeriodSeconds: 10<br />
&nbsp; &nbsp; &nbsp; containers:<br />
&nbsp; &nbsp; &nbsp; - name: mssql<br />
&nbsp; &nbsp; &nbsp; &nbsp; image: mcr.microsoft.com/mssql/server:2019-GA-ubuntu-16.04<br />
&nbsp; &nbsp; &nbsp; &nbsp; ports:<br />
&nbsp; &nbsp; &nbsp; &nbsp; - containerPort: 1433<br />
&nbsp; &nbsp; &nbsp; &nbsp; env:<br />
&nbsp; &nbsp; &nbsp; &nbsp; - name: MSSQL_PID<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; value: &quot;Developer&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; - name: ACCEPT_EULA<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; value: &quot;Y&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; - name: MSSQL_SA_PASSWORD<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; valueFrom:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; secretKeyRef:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; name: sql-secrets<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; key: sapassword<br />
&nbsp; &nbsp; &nbsp; &nbsp; volumeMounts:<br />
&nbsp; &nbsp; &nbsp; &nbsp; - name: mssqldb<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; mountPath: /var/opt/mssql<br />
&nbsp; &nbsp; &nbsp; &nbsp; resources:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; limits:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; cpu: &quot;3500m&quot;<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; requests:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; cpu: &quot;2000m&quot;<br />
&nbsp; &nbsp; &nbsp; volumes:<br />
&nbsp; &nbsp; &nbsp; - name: mssqldb<br />
&nbsp; &nbsp; &nbsp; &nbsp; persistentVolumeClaim:<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; claimName: pvcsc001<br />
<br />
---<br />
apiVersion: v1<br />
kind: Service<br />
metadata:<br />
&nbsp; name: mssql-deployment<br />
spec:<br />
&nbsp; selector:<br />
&nbsp; &nbsp; app: mssql<br />
&nbsp; ports:<br />
&nbsp; &nbsp; - protocol: TCP<br />
&nbsp; &nbsp; &nbsp; port: 1470<br />
&nbsp; &nbsp; &nbsp; targetPort: 1433<br />
&nbsp; type: LoadBalancer</div></div>
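<p>The deployment above reads <strong>MSSQL_SA_PASSWORD</strong> from a secret named <strong>sql-secrets</strong>, whose creation is not shown here. A minimal sketch of such a secret; the password value is a placeholder of mine, not the one actually used:</p>

```yaml
# Hypothetical manifest: the deployment expects a secret "sql-secrets"
# holding the SA password under the key "sapassword"
apiVersion: v1
kind: Secret
metadata:
  name: sql-secrets
type: Opaque
stringData:
  sapassword: "Placeholder!Passw0rd"  # replace with a strong SA password
```

<p>Equivalently, the secret could be created imperatively with kubectl create secret generic sql-secrets --from-literal=sapassword='&#8230;'</p>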
<p>Let&rsquo;s take a look at the deployment status:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ kubectl get deployment,pod,svc<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; READY &nbsp; UP-TO-DATE &nbsp; AVAILABLE &nbsp; AGE<br />
deployment.extensions/mssql-deployment &nbsp; 1/1 &nbsp; &nbsp; 1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 3d7h<br />
<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; READY &nbsp; STATUS &nbsp; &nbsp;RESTARTS &nbsp; AGE<br />
pod/mssql-deployment-67fdd4759-vtzmz &nbsp; 1/1 &nbsp; &nbsp; Running &nbsp; 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;45m<br />
<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; TYPE &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; CLUSTER-IP &nbsp; &nbsp; &nbsp;EXTERNAL-IP &nbsp; &nbsp; PORT(S) &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;AGE<br />
service/kubernetes &nbsp; &nbsp; &nbsp; &nbsp; ClusterIP &nbsp; &nbsp; &nbsp;10.96.0.1 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; 443/TCP &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;4d<br />
service/mssql-deployment &nbsp; LoadBalancer &nbsp; 10.98.246.160 &nbsp; 192.168.40.61 &nbsp; 1470:32374/TCP &nbsp; 3d7h</div></div>
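<p>With the LoadBalancer service in place, the instance is reachable from outside the cluster on port 1470 (mapped to 1433 inside the pod). A quick connectivity check could look like the sketch below; the external IP comes from the service output above, and the SA password variable is an assumption about your environment:</p>

```shell
# Hypothetical check: connect through the LoadBalancer service and print the server name
sqlcmd -S 192.168.40.61,1470 -U sa -P "$SA_PASSWORD" -Q "SELECT @@SERVERNAME"
```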
<p>We&rsquo;re now ready to test the HA capabilities of Portworx! Let&rsquo;s see how STORK influences scheduling to place my SQL Server pod on the same node where my PVC resides. The <strong>pxctl</strong> command provides different options to get information about the PX cluster and volumes, as well as configuration and management capabilities. Here is a picture of my volumes:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl volume list<br />
ID &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;SIZE &nbsp; &nbsp;HA &nbsp; &nbsp; &nbsp;SHARED &nbsp;ENCRYPTED &nbsp; &nbsp; &nbsp; &nbsp;IO_PRIORITY &nbsp; &nbsp; STATUS &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;SNAP-ENABLED<br />
675137742462835449 &nbsp; &nbsp; &nbsp;pvc-98d12db5-17ff-11ea-9d3a-00155dc4b604 &nbsp; &nbsp; &nbsp; &nbsp;20 GiB &nbsp;2 &nbsp; &nbsp; &nbsp; no &nbsp; &nbsp; &nbsp;no &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; HIGH &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;up - attached on 192.168.40.61 &nbsp;no<br />
$ kubectl get pod -o wide<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; READY &nbsp; STATUS &nbsp; &nbsp;RESTARTS &nbsp; AGE &nbsp; IP &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;NODE &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; NOMINATED NODE &nbsp; READINESS GATES<br />
mssql-deployment-67fdd4759-vtzmz &nbsp; 1/1 &nbsp; &nbsp; Running &nbsp; 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;48m &nbsp; 172.16.160.54 &nbsp; k8n1.dbi-services.test</div></div>
<p>My SQL Server pod and my Portworx storage sit together on the k8n1.dbi-services.test node. The PX web console is also available and provides the same kind of information as the pxctl command does. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2019/12/151-1-PX-web-console-volume.jpg"><img src="http://blog.developpez.com/mikedavem/files/2019/12/151-1-PX-web-console-volume.jpg" alt="151 - 1 - PX web console volume" width="1795" height="913" class="alignnone size-full wp-image-1408" /></a></p>
<p>Let&rsquo;s now simulate a failure of the k8n1.dbi-services.test node. In this scenario both my PVC and my SQL Server pod move to the next available node &#8211; k8n2 (192.168.20.62). This is where STORK comes into play to keep my pod co-located with my PVC. </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2019/12/151-2-PX-web-console-volume-after-failover.jpg"><img src="http://blog.developpez.com/mikedavem/files/2019/12/151-2-PX-web-console-volume-after-failover.jpg" alt="151 - 2 - PX web console volume after failover" width="1835" height="941" class="alignnone size-full wp-image-1410" /></a></p>
<p>&#8230;</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">$ kubectl get pod -o wide<br />
NAME &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; READY &nbsp; STATUS &nbsp; &nbsp;RESTARTS &nbsp; AGE &nbsp; IP &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; NODE &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; NOMINATED NODE &nbsp; READINESS GATES<br />
mssql-deployment-67fdd4759-rbxcb &nbsp; 1/1 &nbsp; &nbsp; Running &nbsp; 0 &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;31m &nbsp; 172.16.197.157 &nbsp; k8n2.dbi-services.test</div></div>
<p>Another important point: my SQL Server data survived the pod restart and remained available through my SQL Server instance, as expected! This was a short introduction to Portworx capabilities, and I will continue to share more about it in the near future!</p>
<p>See you !</p>
<p>David Barbarin</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
<title>SQL Server on Docker Swarm</title>
		<link>https://blog.developpez.com/mikedavem/p13172/docker/sql-server-sur-docker-swarm</link>
		<comments>https://blog.developpez.com/mikedavem/p13172/docker/sql-server-sur-docker-swarm#comments</comments>
		<pubDate>Mon, 12 Feb 2018 17:51:48 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[Docker]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[Swarm]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1387</guid>
<description><![CDATA[SQL Server 2017 is available on multiple platforms: Windows, Linux and Docker. The latter provides containerization features with a fast setup and no specific prerequisites before running SQL Server databases, which are probably the &#8230; <a href="https://blog.developpez.com/mikedavem/p13172/docker/sql-server-sur-docker-swarm">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>SQL Server 2017 is available on multiple platforms: Windows, Linux and Docker. The latter provides containerization features with a fast setup and no specific prerequisites before running SQL Server databases, which are probably the key to its success with developers.</p>
<p>&gt; <a href="https://blog.dbi-services.com/introducing-sql-server-on-docker-swarm-orchestrator/" rel="noopener" target="_blank">Lire la suite</a> (en anglais)</p>
<p>David Barbarin<br />
MVP &amp; MCM SQL Server</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
<title>Next edition of the French-speaking 24 HOP 2017</title>
		<link>https://blog.developpez.com/mikedavem/p13162/evenements/prochaine-edition-des-24-hop-2017-francophone</link>
		<comments>https://blog.developpez.com/mikedavem/p13162/evenements/prochaine-edition-des-24-hop-2017-francophone#comments</comments>
		<pubDate>Tue, 02 Jan 2018 17:36:07 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[Evénements]]></category>
		<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[24HOP]]></category>
		<category><![CDATA[AlwaysOn;groupes de disponibilité;availability groups]]></category>
		<category><![CDATA[haute disponibilité]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[SQL Server]]></category>
		<category><![CDATA[SQLPass]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1353</guid>
<description><![CDATA[The next French-speaking edition of the 24 Hours of PASS 2017 will take place on June 29-30. As a reminder, the format is simple: 24 free webinars spread over 2 days, from 07:00 to 18:00 GMT, in French. The only &#8230; <a href="https://blog.developpez.com/mikedavem/p13162/evenements/prochaine-edition-des-24-hop-2017-francophone">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>The next French-speaking edition of the 24 Hours of PASS 2017 will take place on June 29-30.</p>
<p>As a reminder, the format is simple: 24 free webinars spread over 2 days, from 07:00 to 18:00 GMT, in French. The only requirement: <a href="http://www.pass.org/24hours/2017/french/About.aspx" rel="noopener" target="_blank">register</a> for the sessions you plan to attend. This will also let you retrieve the video recording if you want to watch it again later.</p>
<p>This year there will again be something for everyone: monitoring, performance, Azure, BI, Big Data and machine learning, data modeling, high availability, open source, and the new features of the upcoming version of SQL Server!</p>
<p>For my part, I will have the privilege of presenting a <a href="http://www.pass.org/24hours/2017/french/Sessions/Details.aspx?sid=64426" rel="noopener" target="_blank">session</a> on the new high availability options with SQL Server in a mixed world (Windows and Linux) and a “full Linux” world.</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2018/01/24HOP-Website-Banner-French-e1496143231714.jpg"><img src="http://blog.developpez.com/mikedavem/files/2018/01/24HOP-Website-Banner-French-e1496143231714.jpg" alt="24HOP-Website-Banner-French-e1496143231714" width="800" height="217" class="alignnone size-full wp-image-1354" /></a></p>
<p>Looking forward to seeing you there!</p>
<p>David Barbarin<br />
MVP &amp; MCM SQL Server</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
<title>SQL Server 2017 &#8211; Linux and log shipping scenarios</title>
		<link>https://blog.developpez.com/mikedavem/p13161/sql-server-vnext/sql-server-2017-linux-et-scenarios-log-shipping</link>
		<comments>https://blog.developpez.com/mikedavem/p13161/sql-server-vnext/sql-server-2017-linux-et-scenarios-log-shipping#comments</comments>
		<pubDate>Tue, 02 Jan 2018 17:30:24 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[haute disponibilité]]></category>
		<category><![CDATA[high availability]]></category>
		<category><![CDATA[Linux]]></category>
		<category><![CDATA[logshipping]]></category>
		<category><![CDATA[SQL Server]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1349</guid>
<description><![CDATA[In this post, we will cover the log shipping feature available since SQL Server CTP 2.0. Let&#8217;s first point out that it is an OS-agnostic HA feature and that it is possible to build your own custom solution even on Linux (via cron jobs, for &#8230; <a href="https://blog.developpez.com/mikedavem/p13161/sql-server-vnext/sql-server-2017-linux-et-scenarios-log-shipping">Lire la suite <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In this post, we will cover the log shipping feature available since SQL Server CTP 2.0. Let&rsquo;s first point out that it is an OS-agnostic HA feature and that it is possible to build your own custom solution even on Linux (via cron jobs, for example). But the point here is to talk about the out-of-the-box solution now available on Linux &#8230;</p>
<p>&gt; <a href="https://blog.dbi-services.com/sql-server-2017-on-linux-and-some-log-shipping-scenarios/" rel="noopener" target="_blank">Lire la suite</a> (en anglais)</p>
<p>David Barbarin<br />
MVP &amp; MCM SQL Server</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
