<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>David Barbarin &#187; dbatools</title>
	<atom:link href="https://blog.developpez.com/mikedavem/ptag/dbatools/feed" rel="self" type="application/rss+xml" />
	<link>https://blog.developpez.com/mikedavem</link>
	<description>MVP DataPlatform - MCM SQL Server</description>
	<lastBuildDate>Thu, 09 Sep 2021 21:19:50 +0000</lastBuildDate>
	<language>fr-FR</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.1.42</generator>
	<item>
		<title>Curious case of locking scenario with SQL Server audits</title>
		<link>https://blog.developpez.com/mikedavem/p13200/sql-server-vnext/curious-case-of-locking-scenario-including-sql-server-audits</link>
		<comments>https://blog.developpez.com/mikedavem/p13200/sql-server-vnext/curious-case-of-locking-scenario-including-sql-server-audits#comments</comments>
		<pubDate>Mon, 05 Oct 2020 19:25:47 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[blocking]]></category>
		<category><![CDATA[dbatools]]></category>
		<category><![CDATA[performance]]></category>
		<category><![CDATA[SQL Server audit]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1673</guid>
		<description><![CDATA[In mission-critical environments, ensuring a high level of availability is a prerequisite, and IT departments usually address the required SLAs (the famous 9’s) with highly available architectures. As stated by Wikipedia: availability measurement is subject to some degree of interpretation. &#8230; <a href="https://blog.developpez.com/mikedavem/p13200/sql-server-vnext/curious-case-of-locking-scenario-including-sql-server-audits">Read more <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>In mission-critical environments, ensuring a high level of availability is a prerequisite, and IT departments usually address the required SLAs (the famous 9’s) with highly available architectures. As stated by <a href="https://en.wikipedia.org/wiki/High_availability" rel="noopener" target="_blank">Wikipedia</a>: <strong><em>availability measurement is subject to some degree of interpretation</em></strong>. Thus, IT departments generally focus on the uptime metric, whereas for other departments availability is often tied to application response time or to complaints about slowness or unresponsiveness. The latter is about application throughput, and database locks may contribute to reducing it. This is something we constantly monitor, in addition to uptime, in my company. </p>
<p><span id="more-1673"></span></p>
<p>A couple of weeks ago, we suddenly began to experience unexpected blocking issues involving specific query patterns and the SQL Server audit feature. This is all the more important as the scenario started from one specific database and built a long hierarchy of blocked processes, with the SQL Server audit operation blocked first, before propagating to all databases on the SQL Server instance. A very bad scenario we definitely want to avoid … Here is a sample of the blocking process tree:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-1-blocking-scenarios-e1601924652500.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-1-blocking-scenarios-e1601924652500.jpg" alt="167 - 1 - blocking scenarios" width="800" height="56" class="alignnone size-full wp-image-1674" /></a></p>
<p>First, let’s set the context:</p>
<p>We have been using SQL Server audit for different purposes since SQL Server 2014, and we are running SQL Server 2017 CU21 at the time of this write-up. The obvious purpose is security and regulatory compliance with login events. We also rely on SQL Server audits to extend the observability of our monitoring system (based on Prometheus and Grafana). Configuration changes are audited with specific events, and we link those events to annotations in our SQL Server Grafana dashboards. Thus, we are able to quickly correlate events with behavior changes that may occur on the database side. The high-level view of the audit infrastructure is as follows:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-0-audit-architecture-e1601924728531.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-0-audit-architecture-e1601924728531.jpg" alt="167 - 0 - audit architecture" width="800" height="417" class="alignnone size-full wp-image-1675" /></a></p>
<p>As shown in the picture above, a PowerShell script stops and restarts the audit target, and we then use the archived audit file to import the related data into a dedicated database.<br />
Note that we had been using this process without any issue for a couple of years, so we were surprised to experience such behavior at this point. Surprising enough for me to write a blog post <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /> &#8230; Digging further, we identified a specific pattern that seemed to be the root cause of our issue:</p>
<p><strong><br />
1.	Open transaction<br />
2.	For each row in a file, execute an UPSERT statement<br />
3.	Commit transaction<br />
</strong></p>
<p>This is an <a href="https://www.red-gate.com/simple-talk/sql/t-sql-programming/rbar-row-by-agonizing-row/" rel="noopener" target="_blank">RBAR pattern</a>, and it may become slow depending on the number of rows it has to deal with. In addition, the logic is wrapped in a single transaction, so locks accumulate for the whole duration of the transaction. Thinking about it, we had not faced this locking issue with other queries so far because they run within short transactions by design. </p>
<p>This point is important because enabling SQL Server audits also implies extra metadata locks. We decided to mimic this behavior in a TEST environment in order to figure out exactly what happened.</p>
<p>Here are the scripts we used for that purpose:</p>
<p><strong>TSQL script:</strong></p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">-- Create audit<br />
USE [master]<br />
GO<br />
<br />
CREATE SERVER AUDIT [Audit-Target-Login]<br />
TO FILE <br />
( &nbsp; FILEPATH = N'/var/opt/mssql/log/'<br />
&nbsp; &nbsp; ,MAXSIZE = 0 MB<br />
&nbsp; &nbsp; ,MAX_ROLLOVER_FILES = 2147483647<br />
&nbsp; &nbsp; ,RESERVE_DISK_SPACE = OFF<br />
)<br />
WITH<br />
( &nbsp; QUEUE_DELAY = 1000<br />
&nbsp; &nbsp; ,ON_FAILURE = CONTINUE<br />
)<br />
WHERE (<br />
&nbsp; &nbsp; [server_principal_name] like '%\%' <br />
&nbsp; &nbsp; AND NOT [server_principal_name] like '%\svc%' <br />
&nbsp; &nbsp; AND NOT [server_principal_name] like 'NT SERVICE\%' <br />
&nbsp; &nbsp; AND NOT [server_principal_name] like 'NT AUTHORITY\%' <br />
&nbsp; &nbsp; AND NOT [server_principal_name] like '%XDCP%'<br />
);<br />
<br />
ALTER SERVER AUDIT [Audit-Target-Login] WITH (STATE = ON);<br />
GO<br />
<br />
CREATE SERVER AUDIT SPECIFICATION [Server-Audit-Target-Login]<br />
FOR SERVER AUDIT [Audit-Target-Login]<br />
ADD (FAILED_DATABASE_AUTHENTICATION_GROUP),<br />
ADD (SUCCESSFUL_DATABASE_AUTHENTICATION_GROUP),<br />
ADD (FAILED_LOGIN_GROUP),<br />
ADD (SUCCESSFUL_LOGIN_GROUP),<br />
ADD (LOGOUT_GROUP)<br />
WITH (STATE = ON)<br />
GO<br />
<br />
USE [DBA] <br />
GO <br />
<br />
-- Tables to simulate the scenario<br />
CREATE TABLE dbo.T ( <br />
&nbsp; &nbsp; id INT, <br />
&nbsp; &nbsp; col1 VARCHAR(50) <br />
);<br />
<br />
CREATE TABLE dbo.T2 ( <br />
&nbsp; &nbsp; id INT, <br />
&nbsp; &nbsp; col1 VARCHAR(50) <br />
); <br />
<br />
INSERT INTO dbo.T VALUES (1, REPLICATE('T',20));<br />
INSERT INTO dbo.T2 VALUES (1, REPLICATE('T',20));</div></div>
<p><strong>PowerShell scripts:</strong></p>
<p>Session 1: Simulating SQL pattern</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"># Scenario simulation &nbsp;<br />
$server ='127.0.0.1' <br />
$Database ='DBA' <br />
<br />
$Connection =New-Object System.Data.SQLClient.SQLConnection <br />
$Connection.ConnectionString = &quot;Server=$server;Initial Catalog=$Database;Integrated Security=false;User ID=sa;Password=P@SSw0rd1;Application Name=TESTLOCK&quot; <br />
$Connection.Open() <br />
<br />
$Command = New-Object System.Data.SQLClient.SQLCommand <br />
$Command.Connection = $Connection <br />
$Command.CommandTimeout = 500<br />
<br />
$sql = <br />
&quot; <br />
MERGE T AS T <br />
USING T2 AS S ON T.id = S.id <br />
WHEN MATCHED THEN UPDATE SET T.col1 = 'TT' <br />
WHEN NOT MATCHED THEN INSERT (col1) VALUES ('TT'); <br />
<br />
WAITFOR DELAY '00:00:03' &nbsp;<br />
&quot; &nbsp;<br />
<br />
#Begin Transaction <br />
$command.Transaction = $connection.BeginTransaction() <br />
<br />
# Simulate for each row in the file =&gt; execute the MERGE statement<br />
while(1 -eq 1){<br />
<br />
&nbsp; &nbsp; $Command.CommandText =$sql <br />
&nbsp; &nbsp; $Result =$Command.ExecuteNonQuery() <br />
<br />
}<br />
&nbsp; &nbsp; &nbsp;<br />
$command.Transaction.Commit() <br />
$Connection.Close()</div></div>
<p>Session 2: Simulating stopping / starting SQL Server audit for archiving purpose</p>
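<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap"># Same test instance, database and login as Session 1<br />
$server = '127.0.0.1'<br />
$Database = 'DBA'<br />
$user = 'sa'<br />
$password = ConvertTo-SecureString 'P@SSw0rd1' -AsPlainText -Force<br />
$creds = New-Object System.Management.Automation.PSCredential -ArgumentList ($user, $password)<br />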
<br />
$Query = &quot;<br />
&nbsp; &nbsp; USE master;<br />
&nbsp; &nbsp; ALTER SERVER AUDIT [Audit-Target-Login]<br />
&nbsp; &nbsp; WITH ( STATE = OFF );<br />
<br />
&nbsp; &nbsp; ALTER SERVER AUDIT [Audit-Target-Login]<br />
&nbsp; &nbsp; WITH ( STATE = ON );<br />
&quot;<br />
<br />
Invoke-DbaQuery `<br />
&nbsp; &nbsp; -SqlInstance $server `<br />
&nbsp; &nbsp; -Database $Database `<br />
&nbsp; &nbsp; -SqlCredential $creds `<br />
&nbsp; &nbsp; -Query $Query</div></div>
<p>First, we wanted a comprehensive picture of the locks acquired while this specific SQL pattern runs, using an extended event session with the lock_acquired event as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">CREATE EVENT SESSION [locks] <br />
ON SERVER <br />
ADD EVENT sqlserver.lock_acquired<br />
(<br />
&nbsp; &nbsp; ACTION(sqlserver.client_app_name,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;sqlserver.session_id,<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;sqlserver.transaction_id)<br />
&nbsp; &nbsp; WHERE ([sqlserver].[client_app_name]=N'TESTLOCK'))<br />
ADD TARGET package0.histogram<br />
(<br />
&nbsp; &nbsp; SET filtering_event_name=N'sqlserver.lock_acquired',<br />
&nbsp; &nbsp; source=N'resource_type',source_type=(0)<br />
)<br />
WITH <br />
(<br />
&nbsp; &nbsp; MAX_MEMORY=4096 KB,<br />
&nbsp; &nbsp; EVENT_RETENTION_MODE=ALLOW_SINGLE_EVENT_LOSS,<br />
&nbsp; &nbsp; MAX_DISPATCH_LATENCY=30 SECONDS,<br />
&nbsp; &nbsp; MAX_EVENT_SIZE=0 KB,<br />
&nbsp; &nbsp; MEMORY_PARTITION_MODE=NONE,<br />
&nbsp; &nbsp; TRACK_CAUSALITY=OFF,<br />
&nbsp; &nbsp; STARTUP_STATE=OFF<br />
)<br />
GO</div></div>
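<p>The session above is created with STARTUP_STATE=OFF, so it has to be started before running the test. The snippet below is only a sketch of how the session could be started and its histogram target read back. It assumes the usual HistogramTarget XML layout of the target and the lock_resource_type map name in sys.dm_xe_map_values; it is not necessarily the query used to produce the output shown in this post.</p>

```sql
-- Start the session before launching the first PowerShell session
ALTER EVENT SESSION [locks] ON SERVER STATE = START;
GO

-- Read the histogram target: one slot per resource_type key,
-- translated to a readable name through the XE map values
SELECT  mv.map_value                        AS resource_type,
        n.slot.value('@count', 'bigint')    AS lock_count
FROM    sys.dm_xe_sessions AS s
JOIN    sys.dm_xe_session_targets AS t
            ON t.event_session_address = s.address
CROSS APPLY (SELECT CAST(t.target_data AS XML) AS x) AS d
CROSS APPLY d.x.nodes('/HistogramTarget/Slot') AS n(slot)
LEFT JOIN sys.dm_xe_map_values AS mv
            ON mv.name = N'lock_resource_type'
           AND mv.map_key = n.slot.value('(value)[1]', 'int')
WHERE   s.name = N'locks'
        AND t.target_name = N'histogram'
ORDER BY lock_count DESC;
```

<p>The histogram only keeps counters per resource type, which is enough to see whether METADATA locks show up next to the usual OBJECT, PAGE or RID locks.</p>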
<p>Here is the output we got after running the first PowerShell session:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-2-xe-lock-output.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-2-xe-lock-output.jpg" alt="167 - 2 - xe lock output" width="327" height="158" class="alignnone size-full wp-image-1676" /></a></p>
<p>We confirmed METADATA locks in addition to the usual locks acquired on the structures concerned. We correlated this output with sp_WhoIsActive (with @get_locks = 1) after running the second PowerShell session. Note that you will likely have to run the second query several times to reproduce the initial issue.  </p>
<p>Here is a picture of the locks acquired by session 1 and the locks session 2 is waiting on, respectively:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-3-sp_WhoIsActiveGetLocks-e1601925071999.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-3-sp_WhoIsActiveGetLocks-e1601925071999.jpg" alt="167 - 3 - sp_WhoIsActiveGetLocks" width="800" height="344" class="alignnone size-full wp-image-1677" /></a></p>
<p>&#8230;</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-4-sp_WhoIsActiveGetLocks2-e1601925104990.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-4-sp_WhoIsActiveGetLocks2-e1601925104990.jpg" alt="167 - 4 - sp_WhoIsActiveGetLocks2" width="800" height="122" class="alignnone size-full wp-image-1678" /></a></p>
<p>We can clearly identify the metadata locks acquired on the SQL Server audit itself (METADATA.AUDIT_ACTIONS with Sch-S) and the second query, with its ALTER SERVER AUDIT … WITH (STATE = OFF) statement, waiting on the same resource (Sch-M). Unfortunately, my Google-fu didn’t turn up any relevant information on this topic except the documentation for the <a href="https://docs.microsoft.com/en-us/sql/relational-databases/system-dynamic-management-views/sys-dm-tran-locks-transact-sql?view=sql-server-ver15" rel="noopener" target="_blank">sys.dm_tran_locks</a> DMV. My guess is that writing events to audits requires a stable underlying infrastructure, and SQL Server needs to protect the components concerned (with Sch-S) against concurrent modifications (Sch-M). Anyway, it is easy to see that subsequent queries could be blocked in turn (their Sch-S requests on the audit resource are incompatible with the pending Sch-M request) while the previous ones are still running.  </p>
<p>The query pattern described previously (unlike short transactions) is a good catalyst for such a blocking scenario because of the accumulation and duration of locks within one single transaction. This is confirmed by the XE output:</p>
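<p>For readers without sp_WhoIsActive at hand, the same picture can be approximated directly from sys.dm_tran_locks. The query below is a minimal sketch: the AUDIT_ACTIONS subtype filter is taken from the METADATA.AUDIT_ACTIONS resource shown in the screenshots above, and the exact value may differ on other versions.</p>

```sql
-- Granted and waiting metadata locks on the audit resource.
-- Expected shape during the blocking scenario (assumption based on the
-- screenshots): Sch-S rows with GRANT for the long transaction,
-- one Sch-M row with WAIT for ALTER SERVER AUDIT.
SELECT  tl.request_session_id,
        tl.resource_type,
        tl.resource_subtype,
        tl.resource_description,
        tl.request_mode,
        tl.request_status
FROM    sys.dm_tran_locks AS tl
WHERE   tl.resource_type = N'METADATA'
        AND tl.resource_subtype = N'AUDIT_ACTIONS'
ORDER BY tl.request_status,
         tl.request_session_id;
```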
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-5-lock_sch_s_same_transaction-e1601925276612.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-5-lock_sch_s_same_transaction-e1601925276612.jpg" alt="167 - 5 - lock_sch_s_same_transaction" width="800" height="543" class="alignnone size-full wp-image-1681" /></a></p>
<p>We managed to get a reproducible scenario with the T-SQL and PowerShell scripts. In addition, I also ran queries from other databases to confirm that it may compromise the responsiveness of the entire workload on the same instance (DBA3 and DBA4 databases in my test). </p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/10/167-6-lock_tree-e1601925310889.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/10/167-6-lock_tree-e1601925310889.jpg" alt="167 - 6 - lock_tree" width="800" height="78" class="alignnone size-full wp-image-1682" /></a></p>
<p><strong>How did we fix this issue?</strong></p>
<p>Even if it is only one part of the solution, I strongly believe this pattern remains a performance killer, and using a set-based approach may drastically reduce the number and duration of locks and, implicitly, the chances of this blocking scenario happening again. Note that it is not only about the MERGE statement, because I managed to reproduce the same issue with INSERT and UPDATE statements as well.</p>
<p>Then, this scenario really made us think about a long-term solution, because we cannot guarantee this pattern will not be used by other teams in the future. Looking further at the PowerShell script that archives the audit file and inserts data into the audit database, we finally added a QueryTimeout parameter with a value of 5s to the Invoke-DbaQuery command concerned, as follows:</p>
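<p>To illustrate what a set-based version of the test pattern could look like, here is a minimal sketch against the dbo.T table from the repro script. The dbo.T_staging table and the way the file is loaded into it are assumptions made for the example, not the code of the real application.</p>

```sql
-- Hypothetical staging table that receives the file content in one go
-- (for example with BULK INSERT or bcp), outside of any long transaction
CREATE TABLE dbo.T_staging (
    id   INT,
    col1 VARCHAR(50)
);
GO

-- One short transaction and one statement for all rows,
-- instead of one MERGE per row inside a long-running transaction
BEGIN TRANSACTION;

MERGE dbo.T AS tgt
USING dbo.T_staging AS src
    ON tgt.id = src.id
WHEN MATCHED THEN
    UPDATE SET tgt.col1 = src.col1
WHEN NOT MATCHED BY TARGET THEN
    INSERT (id, col1) VALUES (src.id, src.col1);

COMMIT TRANSACTION;
```

<p>The locks are still taken, but they are held for the duration of a single statement, which leaves far less room for an ALTER SERVER AUDIT command to get stuck behind them.</p>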
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;height:450px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">...<br />
<br />
$query = &quot;<br />
&nbsp; &nbsp; USE [master];<br />
<br />
&nbsp; &nbsp; IF EXISTS (SELECT 1<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; FROM &nbsp;sys.dm_server_audit_status<br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; WHERE [name] = '$InstanceAuditPrefix-$AuditName')<br />
&nbsp; &nbsp; BEGIN<br />
&nbsp; &nbsp; &nbsp; &nbsp; ALTER SERVER AUDIT [$InstanceAuditPrefix-$AuditName]<br />
&nbsp; &nbsp; &nbsp; &nbsp; WITH (STATE = OFF);<br />
&nbsp; &nbsp; END<br />
<br />
&nbsp; &nbsp; ALTER SERVER AUDIT [$InstanceAuditPrefix-$AuditName]<br />
&nbsp; &nbsp; WITH (STATE = ON);<br />
&quot;<br />
<br />
Invoke-DbaQuery `<br />
&nbsp; &nbsp; -SqlInstance $Instance `<br />
&nbsp; &nbsp; -SqlCredential $SqlCredential `<br />
&nbsp; &nbsp; -Database master `<br />
&nbsp; &nbsp; -Query $query `<br />
&nbsp; &nbsp; -EnableException `<br />
&nbsp; &nbsp; -QueryTimeout 5 <br />
<br />
...</div></div>
<p>Therefore, because we want to prioritize the business workload over the SQL Server audit operation, if such a situation occurs again, stopping the SQL Server audit will time out after 5s, which was relevant in our context. The next iteration of the PowerShell script is able to restart from the last stage executed previously. </p>
<p>Hope this blog post helps.</p>
<p>See you!</p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>dbachecks and AlwaysOn availability group checks</title>
		<link>https://blog.developpez.com/mikedavem/p13194/sql-server-2012/dbachecks-and-alwayson-availability-group-checks</link>
		<comments>https://blog.developpez.com/mikedavem/p13194/sql-server-2012/dbachecks-and-alwayson-availability-group-checks#comments</comments>
		<pubDate>Mon, 20 Apr 2020 19:57:31 +0000</pubDate>
		<dc:creator><![CDATA[mikedavem]]></dc:creator>
				<category><![CDATA[DevOps]]></category>
		<category><![CDATA[SQL Server 2012]]></category>
		<category><![CDATA[SQL Server 2014]]></category>
		<category><![CDATA[SQL Server 2016]]></category>
		<category><![CDATA[SQL Server 2017]]></category>
		<category><![CDATA[SQL Server 2019]]></category>
		<category><![CDATA[automation]]></category>
		<category><![CDATA[dbachecks]]></category>
		<category><![CDATA[dbatools]]></category>
		<category><![CDATA[monitoring]]></category>
		<category><![CDATA[open source]]></category>
		<category><![CDATA[Powershell]]></category>
		<category><![CDATA[sqlserver]]></category>

		<guid isPermaLink="false">http://blog.developpez.com/mikedavem/?p=1591</guid>
		<description><![CDATA[When I started my DBA position in my new company, I was looking for a tool able to periodically check the SQL Server database environments, for several reasons. First, as a DBA, one of my main concerns is about &#8230; <a href="https://blog.developpez.com/mikedavem/p13194/sql-server-2012/dbachecks-and-alwayson-availability-group-checks">Read more <span class="meta-nav">&#8594;</span></a>]]></description>
				<content:encoded><![CDATA[<p>When I started my DBA position in my new company, I was looking for a tool able to periodically check the SQL Server database environments, for several reasons. First, as a DBA, one of my main concerns is keeping the different mssql environments well configured against an initial standard. It is also worth noting that I’m not the only person to interact with databases: anyone in my team who is a member of the sysadmin server role can change any server-level configuration setting at any moment. In this case, chances are that environments will drift from our initial standard over time, so my team and I need to stay confident by periodically checking the current mssql environment configurations, being alerted when configuration drift exists and, obviously, fixing it as fast as possible.  </p>
<p><span id="more-1591"></span></p>
<p>A while ago, I relied on the SQL Server Policy Based Management feature (PBM) to carry out this task at one of my former customers, and I have to say it did the job, but with some limitations. Indeed, PBM is an instance-scoped feature and doesn’t allow checking configuration settings outside the SQL Server instance, for example. During my investigation, the <a href="https://dbachecks.readthedocs.io/en/latest/" rel="noopener" target="_blank">dbachecks</a> framework drew my attention for several reasons:</p>
<p>&#8211;	It allows checking different settings at different scopes, including Operating System and SQL Server instance items<br />
&#8211;	It is an open source project and keeps evolving with SQL / PowerShell community contributions.<br />
&#8211;	It is extensible, and we may add custom checks to the list of predefined checks shipped with the targeted version.<br />
&#8211;	It is based on PowerShell and the Pester framework, and fits well with the existing automation and GitOps processes in my company</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/161-0-dbachecks-process.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/161-0-dbachecks-process.jpg" alt="161 - 0 - dbachecks process" width="1003" height="395" class="alignnone size-full wp-image-1592" /></a></p>
<p>The first dbachecks version we deployed in production a couple of months ago was 1.2.24, and unfortunately it didn’t include reliable tests for availability groups. It was the starting point of my first contributions to open source projects, and I felt proud and honored when I saw my 2 PRs validated for the dbachecks tool, including the Test Disk Allocation Unit and Availability Group checks:</p>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/161-1-release-note.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/161-1-release-note.jpg" alt="161 - 1 - release note" width="798" height="348" class="alignnone size-full wp-image-1593" /></a></p>
<p>Obviously, this is just a humble contribution and, to be clear, I didn’t write the existing tests for AGs, but I spent some time applying fixes for better detection of all AG environments, including their replicas, in both simple and complex topologies (several replicas on the same server and non-default ports, for example). </p>
<p>So, here is the current list of AG checks in version 1.2.29 at the time of this write-up:</p>
<p>&#8211;	Cluster node should be up<br />
&#8211;	AG resource + IP Address in the cluster should be online<br />
&#8211;	Cluster private and public network should be up<br />
&#8211;	HADR should be enabled on each AG replica<br />
&#8211;	AG Listener + AG replicas should be pingable and reachable from client connections<br />
&#8211;	AG replica should be in the correct domain name<br />
&#8211;	AG replica port number should be equal to the port specified in your standard<br />
&#8211;	AG availability mode should not be in an unknown state and should be in synchronized or synchronizing state depending on the replication type<br />
&#8211;	Each highly available database (member of an AG) should be in synchronized / synchronizing state, ready for failover, joined to the AG and not in suspended state<br />
&#8211;	Each AG replica should have an extended event session called AlwaysOn_health which is in running state and configured in auto start mode</p>
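<p>To give an idea of the kind of state the database-level and extended event checks look at, here is a rough T-SQL sketch based on the standard HADR DMVs. It is only an illustration of the information involved, run on a replica, and not the code dbachecks actually executes.</p>

```sql
-- Database-level state: synchronization, suspension, failover readiness, join state
SELECT  ag.name                     AS ag_name,
        ar.replica_server_name,
        drcs.database_name,
        drs.synchronization_state_desc,
        drs.is_suspended,
        drcs.is_failover_ready,
        drcs.is_database_joined
FROM    sys.dm_hadr_database_replica_states AS drs
JOIN    sys.availability_replicas AS ar
            ON ar.replica_id = drs.replica_id
JOIN    sys.availability_groups AS ag
            ON ag.group_id = drs.group_id
JOIN    sys.dm_hadr_database_replica_cluster_states AS drcs
            ON drcs.replica_id = drs.replica_id
           AND drcs.group_database_id = drs.group_database_id;

-- AlwaysOn_health session: configured for auto start and currently running
SELECT  ses.name,
        ses.startup_state,
        CASE WHEN xs.name IS NULL THEN 0 ELSE 1 END AS is_running
FROM    sys.server_event_sessions AS ses
LEFT JOIN sys.dm_xe_sessions AS xs
            ON xs.name = ses.name
WHERE   ses.name = N'AlwaysOn_health';
```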
<p>Mandatory parameters are <strong>app.cluster</strong> and <strong>domain.name</strong>.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Get-DbcCheck -Tag HADR | ft Group, Type, AllTags, Config -AutoSize<br />
<br />
Group Type &nbsp; &nbsp; &nbsp; &nbsp;AllTags &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Config<br />
----- ---- &nbsp; &nbsp; &nbsp; &nbsp;------- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ------<br />
HADR &nbsp;ClusterNode ClusterHealth, HADR app.sqlinstance app.cluster skip.hadr.listener.pingcheck domain.name policy...</div></div>
<p>The starting point of the HADR checks is the Windows Failover Cluster component, and other tests are then performed hierarchically on each subcomponent, including availability groups, AG replicas and AG databases. </p>
<p>Then you may change the behavior of the HADR check process according to your context by using the following parameters:</p>
<p>&#8211;	skip.hadr.listener.pingcheck =&gt; Skip the ping check of the HADR listener<br />
&#8211;	skip.hadr.listener.tcpport   =&gt; Skip the standard TCP port check for AG listeners<br />
&#8211;	skip.hadr.replica.tcpport    =&gt; Skip the standard TCP port check for AG replicas</p>
<p>For instance, in my context, I configured the <strong>skip.hadr.replica.tcpport</strong> parameter to skip checks on replica ports because we own different environments that include several replicas on the same server, which listen on non-default ports.</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Get-DbcConfig skip.hadr.*<br />
Name &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; Value Description<br />
---- &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; ----- -----------<br />
skip.hadr.listener.pingcheck False Skip the HADR listener ping test (especially useful for Azure and AWS)<br />
skip.hadr.listener.tcpport &nbsp; False Skip the HADR AG Listener TCP port number (If port number is not standard acro...<br />
skip.hadr.replica.tcpport &nbsp; &nbsp; True Skip the HADR Replica TCP port number (If port number is not standard across t...</div></div>
<p>The HADR checks can be run simply by using the HADR tag as follows:</p>
<div class="codecolorer-container text default" style="overflow:auto;white-space:nowrap;border:1px solid #9F9F9F;width:650px;"><div class="text codecolorer" style="padding:5px;font:normal 12px/1.4em Monaco, Lucida Console, monospace;white-space:nowrap">Invoke-DbcCheck -Tag HADR &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<br />
Pester v4.10.1 &nbsp;Executing all tests in 'C:\Program Files\WindowsPowerShell\Modules\dbachecks\1.2.29\checks\HADR.Tests.ps1' with Tags HADR &nbsp; <br />
...</div></div>
<p><a href="http://blog.developpez.com/mikedavem/files/2020/04/161-2-hadr-checks-e1587413256656.jpg"><img src="http://blog.developpez.com/mikedavem/files/2020/04/161-2-hadr-checks-e1587413256656.jpg" alt="161 - 2 - hadr checks" width="1000" height="426" class="alignnone size-full wp-image-1599" /></a>                               </p>
<p>Well, this is a good start, but I think almost all of the checks are state-oriented and some configuration checks are missing. I’m willing to add some of them in the near future, and feel free to add your own contributions as well <img src="https://blog.developpez.com/mikedavem/wp-includes/images/smilies/icon_smile.gif" alt=":)" class="wp-smiley" /> </p>
<p>Stay tuned! </p>
]]></content:encoded>
			<wfw:commentRss></wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
