Graphing SQL Server wait stats on Prometheus and Grafana

Mis en avant

Wait stats are essential performance metrics for diagnosing SQL Server Performance problems. Related metrics can be monitored from different DMVs including sys.dm_os_wait_stats and sys.dm_db_wait_stats (Azure).

As you probably know, there are 2 categories of DMVs in SQL Server: Point in time versus cumulative and DMVs mentioned previously are in the second category. It means data in these DMVs are accumulative and incremented every time wait events occur. Values reset only when SQL Server restarts or when you intentionally run DBCC SQLPERF command. Baselining these metric values require taking snapshots to compare day-to-day activity or maybe simply trends for a given timeline. Paul Randal kindly provided a TSQL script for trend analysis in a specified time range in this blog post. The interesting part of this script is the focus of most relevant wait types and corresponding statistics. This is basically the kind of scripts I used for many years when I performed SQL Server audits at customer shops but today working as database administrator for a company, I can rely on our observability stack that includes Telegraf / Prometheus and Grafana to do the job.

Lire la suite

Creating dynamic Grafana dashboard for SQL Server

Mis en avant

A couple of months ago I wrote about “Why we moved SQL Server monitoring to Prometheus and Grafana”. I talked about the creation of two dashboards. The first one is blackbox monitoring-oriented and aims to spot in (near) real-time resource pressure / saturation issues with self-explained gauges, numbers and colors indicating healthy (green) or unhealthy resources (orange / red). We also include availability group synchronization health metric in the dashboard. We will focus on it in this write-up.

Lire la suite

Why we moved SQL Server monitoring on Prometheus and Grafana

Mis en avant

During this year, I spent a part of my job on understanding the processes and concepts around monitoring in my company. The DevOps mindset mainly drove the idea to move our SQL Server monitoring to the existing Prometheus and Grafana infrastructure. Obviously, there were some technical decisions behind the scene, but the most important part of this write-up is dedicated to explaining other and likely most important reasons of this decision.

Lire la suite