dbachecks and AlwaysOn availability group checks

When I started my DBA position at my new company, I was looking for a tool that could periodically check our SQL Server database environments, for several reasons. First, as a DBA, one of my main concerns is keeping the different mssql environments well configured against an initial standard. It is also worth noting that I’m not the only person who interacts with the databases: anyone on my team who is a member of the sysadmin server role can change any server-level configuration setting at any moment. Chances are, then, that environments will drift from our initial standard over time, so my team and I need to stay confident by periodically checking the current mssql configurations, being alerted when configuration drift exists and, obviously, fixing it as fast as possible.

A while ago, I relied on the SQL Server Policy-Based Management (PBM) feature to carry out this task at one of my former customers, and I have to say it did the job, but with some limitations. Indeed, PBM is an instance-scoped feature and doesn’t allow checking configuration settings outside the SQL Server instance, for example. During my investigation, the dbachecks framework drew my attention for several reasons:

– It can check different settings at different scopes, including operating system and SQL Server instance items.
– It is an open source project and keeps evolving with SQL / PowerShell community contributions.
– It is extensible, and we can add custom checks to the list of predefined checks shipped with the targeted version.
– It is based on PowerShell and the Pester framework, and fits well with the existing automation and GitOps processes in my company.

[Figure: dbachecks process]

The first dbachecks version we deployed in production a couple of months ago was 1.2.24 and, unfortunately, it didn’t include reliable tests for availability groups. That was the starting point of my first contributions to open source projects, and I felt proud and honored when I saw my 2 PRs for dbachecks validated, covering the Test Disk Allocation Unit and Availability Group checks:

[Figure: release note]

Obviously, this is just a humble contribution and, to be clear, I didn’t write the existing tests for AGs, but I spent some time applying fixes for better detection of all AG environments, including their replicas, in both simple and complex topologies (several replicas on the same server and non-default ports, for example).

So, here is the current list of AG checks in version 1.2.29 at the time of this write-up:

– Cluster node should be up
– AG resource + IP Address in the cluster should be online
– Cluster private and public network should be up
– HADR should be enabled on each AG replica
– AG Listener + AG replicas should be pingable and reachable from client connections
– AG replica should be in the correct domain name
– AG replica port number should be equal to the port specified in your standard
– AG availability mode should not be in unknown state and should be in synchronized or synchronizing state, depending on the replication type
– Each highly available database (member of an AG) should be in synchronized / synchronizing state, ready for failover, joined to the AG and not in suspended state
– Each AG replica should have an extended event session called AlwaysOn_health which is in running state and configured in auto start mode

Mandatory parameters are app.cluster and domain.name.

Get-DbcCheck -Tag HADR | ft Group, Type, AllTags, Config -AutoSize

Group Type        AllTags             Config
----- ----        -------             ------
HADR  ClusterNode ClusterHealth, HADR app.sqlinstance app.cluster skip.hadr.listener.pingcheck domain.name policy...
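
As a minimal sketch, the two mandatory settings mentioned above can be supplied with Set-DbcConfig before running the checks. The cluster name SQLCLUSTER01 and the domain corp.local are placeholder values assumed for illustration only; substitute your own.

```powershell
# Hypothetical values: replace with your own cluster name and AD domain.
Set-DbcConfig -Name app.cluster -Value 'SQLCLUSTER01'
Set-DbcConfig -Name domain.name -Value 'corp.local'

# Confirm the values the HADR checks will pick up.
Get-DbcConfig -Name app.cluster
Get-DbcConfig -Name domain.name
```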

The starting point of the HADR checks is the Windows Failover Cluster component; from there, other tests are performed hierarchically on each subcomponent, including availability groups, AG replicas and AG databases.

You can then change the behavior of the HADR check process to suit your context by using the following parameters:

– skip.hadr.listener.pingcheck => Skip the ping check of the AG listener
– skip.hadr.listener.tcpport => Skip the check of the standard TCP port for AG listeners
– skip.hadr.replica.tcpport => Skip the check of the standard TCP port for AG replicas

For instance, in my context, I set the skip.hadr.replica.tcpport parameter to skip checks on replica ports because we have several environments that include multiple replicas on the same server, listening on non-default ports.

Get-DbcConfig skip.hadr.*
Name                         Value Description
----                         ----- -----------
skip.hadr.listener.pingcheck False Skip the HADR listener ping test (especially useful for Azure and AWS)
skip.hadr.listener.tcpport   False Skip the HADR AG Listener TCP port number (If port number is not standard acro...
skip.hadr.replica.tcpport     True Skip the HADR Replica TCP port number (If port number is not standard across t...
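
For reference, a setting like the one shown above can be applied with Set-DbcConfig. This is only a sketch of how such a value might be set, not necessarily the exact command used in my environment:

```powershell
# Skip the replica TCP port check (useful when replicas listen on non-default ports).
Set-DbcConfig -Name skip.hadr.replica.tcpport -Value $true

# Review all HADR skip switches after the change.
Get-DbcConfig -Name skip.hadr.*
```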

The HADR checks can be run simply by using the HADR tag as follows:

Invoke-DbcCheck -Tag HADR                                                                    
Pester v4.10.1  Executing all tests in 'C:\Program Files\WindowsPowerShell\Modules\dbachecks\1.2.29\checks\HADR.Tests.ps1' with Tags HADR  

[Figure: hadr checks]
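
When the checks run on a schedule rather than in an interactive console, it can help to capture the results instead of only reading the output. The sketch below assumes the Pester v4 result object returned by -PassThru (with its PassedCount and FailedCount properties) and limits console output to failures; what you do with the counters afterwards (alerting, reporting) is up to your own tooling.

```powershell
# Run the HADR checks, show only failures, and keep the Pester result object.
$results = Invoke-DbcCheck -Tag HADR -Show Fails -PassThru

# Summarize the run; a scheduled job could alert when FailedCount is greater than 0.
"{0} passed, {1} failed" -f $results.PassedCount, $results.FailedCount
```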

Well, this is a good start, but I think most of the checks are state-oriented and some configuration checks are missing. I’m already planning to add some of them in the near future, and feel free to add your own contributions as well :)

Stay tuned!
