Graphing SQL Server wait stats on Prometheus and Grafana

Mis en avant

Wait stats are essential performance metrics for diagnosing SQL Server Performance problems. Related metrics can be monitored from different DMVs including sys.dm_os_wait_stats and sys.dm_db_wait_stats (Azure).

As you probably know, there are 2 categories of DMVs in SQL Server: Point in time versus cumulative and DMVs mentioned previously are in the second category. It means data in these DMVs are accumulative and incremented every time wait events occur. Values reset only when SQL Server restarts or when you intentionally run DBCC SQLPERF command. Baselining these metric values require taking snapshots to compare day-to-day activity or maybe simply trends for a given timeline. Paul Randal kindly provided a TSQL script for trend analysis in a specified time range in this blog post. The interesting part of this script is the focus of most relevant wait types and corresponding statistics. This is basically the kind of scripts I used for many years when I performed SQL Server audits at customer shops but today working as database administrator for a company, I can rely on our observability stack that includes Telegraf / Prometheus and Grafana to do the job.

Lire la suite

FinOps with Azure Cost management and Azure Log Analytics

Mis en avant

In a previous blog post, I surfaced Azure monitor capabilities for extending observability of Azure SQL databases. We managed to correlate different metrics and SQL logs to identify new execution patterns against our Azure SQL DB, and we finally go through a new compute tier model that fits better with our new context. In this blog post, I would like to share some new experiences about combining Azure cost analysis and Azure log analytics to spot “abnormal” trend and to fix it.

Lire la suite

Creating dynamic Grafana dashboard for SQL Server

Mis en avant

A couple of months ago I wrote about “Why we moved SQL Server monitoring to Prometheus and Grafana”. I talked about the creation of two dashboards. The first one is blackbox monitoring-oriented and aims to spot in (near) real-time resource pressure / saturation issues with self-explained gauges, numbers and colors indicating healthy (green) or unhealthy resources (orange / red). We also include availability group synchronization health metric in the dashboard. We will focus on it in this write-up.

Lire la suite

Extending SQL Server monitoring with Raspberry PI and Lametric

Mis en avant

First blog of this new year 2021 and I will start with a fancy and How-To Geek topic

In my last blog post, I discussed about monitoring and how it should help to address quickly a situation that is going degrading. Alerts are probably the first way to raise your attention and, in my case, they are often in the form of emails in a dedicated folder. That remains a good thing, at least if you’re not focusing too long in other daily tasks or projects. In work office, I know I would probably better focus on new alerts but as I said previously, telework changed definitely the game.

Lire la suite

Why we moved SQL Server monitoring on Prometheus and Grafana

Mis en avant

During this year, I spent a part of my job on understanding the processes and concepts around monitoring in my company. The DevOps mindset mainly drove the idea to move our SQL Server monitoring to the existing Prometheus and Grafana infrastructure. Obviously, there were some technical decisions behind the scene, but the most important part of this write-up is dedicated to explaining other and likely most important reasons of this decision.

Lire la suite

AAD user creation on behalf AAD Service Principal with Azure SQL DB

Mis en avant

An interesting improvement was announced by the SQL AAD team on Monday 27th July 2020 and concerns the support for Azure AD user creation on behalf of Azure AD Applications for Azure SQL as mentioned to this Microsoft blog post.

Lire la suite

dbachecks and AlwaysOn availability group checks

Mis en avant

When I started my DBA position in my new company, I was looking for a tool that was able to check periodically the SQL Server database environments for several reasons. First, as DBA one of my main concern is about maintaining and keeping the different mssql environments well-configured against an initial standard. It is also worth noting I’m not the only person to interact with databases and anyone in my team, which is member of sysadmin server role as well, is able to change any server-level configuration settings at any moment. In this case, chances are that having environments shifting from our initial standard over the time and my team and I need to keep confident by checking periodically the current mssql environment configurations, be alerting if configuration drifts exist and obviously fix it as faster as possible.

Lire la suite

Database maintenance thoughts with Azure SQL databases

Mis en avant

As DBA, your priority is to ensure your data are consistent, safely backed up and you get steady performance of your database. In on-prem environments, these tasks are generally performed through scheduled jobs including backups, check integrity and index / statistics maintenance tasks.

But moving databases to the cloud in Azure (and others) tells a different story. Indeed, even if the same concern and tasks remain, some of them are under the responsibility of the Cloud provider and some other ones not. If you’re working with Azure SQL databases – like me – some questions raise very quickly on this topic and it was my motivation to write this write-up. I would like to share with you some new experiences by digging into the different maintenance items. If you have a different story to tell, please feel free to comment and to share your own experience!

Lire la suite

Collaborative way and tooling to debug SQL Server blocked processes scenarios

Mis en avant

A quick blog post to show how helpful an extended event and few other tools can be to help fixing orphan transactions in a real use case scenario. I often gave training with customers about SQL Server performance and tools, but I noticed how difficult it can be to explain the importance of a tool if you only explain theory without any illustration with a real customer case.
Well, let’s start my own story that began a couple of days ago with an SQL alert to indicate a blocking scenario issue. Looking at our SQL dashboard (below), we were able to confirm quickly we are running into an annoying issue and it would be getting worse over if we do nothing.

Lire la suite