Graphing SQL Server wait stats on Prometheus and Grafana

Mis en avant

Wait stats are essential performance metrics for diagnosing SQL Server Performance problems. Related metrics can be monitored from different DMVs including sys.dm_os_wait_stats and sys.dm_db_wait_stats (Azure).

As you probably know, there are 2 categories of DMVs in SQL Server: Point in time versus cumulative and DMVs mentioned previously are in the second category. It means data in these DMVs are accumulative and incremented every time wait events occur. Values reset only when SQL Server restarts or when you intentionally run DBCC SQLPERF command. Baselining these metric values require taking snapshots to compare day-to-day activity or maybe simply trends for a given timeline. Paul Randal kindly provided a TSQL script for trend analysis in a specified time range in this blog post. The interesting part of this script is the focus of most relevant wait types and corresponding statistics. This is basically the kind of scripts I used for many years when I performed SQL Server audits at customer shops but today working as database administrator for a company, I can rely on our observability stack that includes Telegraf / Prometheus and Grafana to do the job.

Lire la suite

Why we moved SQL Server monitoring on Prometheus and Grafana

Mis en avant

During this year, I spent a part of my job on understanding the processes and concepts around monitoring in my company. The DevOps mindset mainly drove the idea to move our SQL Server monitoring to the existing Prometheus and Grafana infrastructure. Obviously, there were some technical decisions behind the scene, but the most important part of this write-up is dedicated to explaining other and likely most important reasons of this decision.

Lire la suite

Interesting use case of using dummy columnstore indexes and temp tables

Mis en avant

Columnstore indexes are a very nice feature and well-suited for analytics queries. Using them for our datawarehouse helped to accelerate some big ETL processing and to reduce resource footprint such as CPU, IO and memory as well. In addition, SQL Server 2016 takes columnstore index to a new level and allows a fully updateable non-clustered columnstore index on a rowstore table making possible operational and analytics workloads. Non-clustered columnstore index are a different beast to manage with OLTP workload and we got both good and bad experiences on it. In this blog post, let’s talk about good effects and an interesting case where we use them for reducing CPU consumption of a big reporting query.

Lire la suite

Universal usage of NVARCHAR type and performance impact

Mis en avant

A couple of weeks, I read an article from Brent Ozar about using NVARCHAR as a universal parameter. It was a good reminder and from my experience, I confirm this habit has never been a good idea. Although it depends on the context, chances are you will almost find an exception that proves the rule.

Lire la suite

SQL Server on Linux and new FUA support for XFS filesystem

Mis en avant

I wrote a (dbi services) blog post concerning Linux and SQL Server IO behavior changes before and after SQL Server 2017 CU6. Now, I was looking forward seeing some new improvements with Force Unit Access (FUA) that was implemented with Linux XFS enhancements since the Linux Kernel 4.18.

Lire la suite

Mitigating Scalar UDF’s procedural code performance with SQL 2019 and Scalar UDF Inlining capabilities

Mis en avant

A couple of days ago, I read the write-up of my former colleague @FranckPachot about refactoring procedural code to SQL. This is recurrent subject in the database world and I was interested in transposing this article to SQL Server because it was about refactoring a Scalar-Valued function to a SQL view. The latter one is a great alternative when it comes performance but something new was shipped with SQL Server 2019 and could address (or at least could mitigate) this recurrent scenario.

Lire la suite

Collaborative way and tooling to debug SQL Server blocked processes scenarios

Mis en avant

A quick blog post to show how helpful an extended event and few other tools can be to help fixing orphan transactions in a real use case scenario. I often gave training with customers about SQL Server performance and tools, but I noticed how difficult it can be to explain the importance of a tool if you only explain theory without any illustration with a real customer case.
Well, let’s start my own story that began a couple of days ago with an SQL alert to indicate a blocking scenario issue. Looking at our SQL dashboard (below), we were able to confirm quickly we are running into an annoying issue and it would be getting worse over if we do nothing.

Lire la suite