SQL Server service broker and curious duplicate messages case

One of my last write-up to finish the year 2019. A couple of days ago, I ran into an interesting issue concerning a service broker architecture. Let’s introduce quickly the context: this is a start-based architecture with one target and more than 80 initiators as shown below:

152 - 0 - SB architecture

All messages are sent from the initiators to a single central target. Windows authentication for endpoint’s connection and messages are encrypted before to be sent to the remote service by using certificates mapped to database users.

152 - 1 - SB architecture

Installing such infrastructure is not trivial and during this year, we often had to deploy and reinitialize service broker configurations because we added new initiators, or some initiators were replaced by other ones. Therefore, on the DBA side we worked on an automatic way to interact with service broker configuration and it turns out that the natural way for us was to write a PowerShell module to address it.

A couple of days, for a sake of curiosity I implemented a service broker event based XE session and I noticed a bunch of broker_message_undeliverable messages with the reason: The message could not be delivered because it is a duplicate. See below:

152 - 2 - xe result

What does it mean exactly? Well, according to my Google-fu and very-explained blog from Remus Rusanu, it seems that the messages are accepted by the target, but the acks don’t reach back to the sender and therefore the sender is retrying the message again and again. But in my case, the sender ended up managing sending back the acknowledgement.

First step was to identify a misconfigured route between the target and the corresponding initiator. From broker instance value corresponds to the service broker id of the initiator. But surprisingly I didn’t find any misconfigured item. But I remembered the campaign of server rollout that had been going on for a few months where some servers were renamed following a new named convention. Furthermore, in our deployment process, dealing with new or renamed servers is exactly the same and we just initialize a new initiator and install a new route on the target regardless the scenario. But we forget one important point: because all routes are based on server name renaming a server leads to potentially to duplicate routes in this case. I verified my assumption by counting the number of existing routes on the target site:

SELECT COUNT(*) FROM sys.routes

The result was 148 and we expect to get 85 routes in the current architecture confirming we have duplicate items here. Next step consisted in identifying and fixing duplicate routes:

152 - 30 - duplicate routes query

Here a sample of duplicate routes I founded. The renaming server kept its service broker id and I easily identified duplicates by grouping by the broker instance.

152 - 3 - duplicate routes

After fixing the issue, messages related to duplicates disappear from the XE output.
Hope this helps!

David Barbarin

Laisser un commentaire