Error and Transaction Handling in SQL Server
Part Three – Implementation

An SQL text by Erland Sommarskog, SQL Server MVP. Latest revision: 2024-08-04.
Copyright applies to this text. See here for font conventions used in this article.

Introduction

This article is the third and last in a series of articles about error and transaction handling in SQL Server. Part One was a short article which gave you a jumpstart on how to implement error handling without going into the details on why. But details were exactly the topic of Part Two: this article described error- and transaction-related commands and functions in SQL Server. It also covered what actions SQL Server can take in case of an error, and you learnt that there is a quite some variation and inconsistency. In Part Three, we go back to the theme of implementing error handling and equipped with the knowledge from Part Two, this time we not only look at how, but also at why.

The first chapter after this introduction is a philosophical discussion on error handling, where I start to talk in general terms and then narrow in on SQL Server. In the next chapter I present SqlEventLog, a versatile error-logging and error-handling facility which you can use in your own code. This chapter is a tad long, since not only do I present the interface, but I also discuss the implementation, since you may want to adapt it to for you environment. In chapter four, I cover some common patterns in error handling. In chapter five, I take this one step further with a case study of an application scenario in order to show some useful techniques. The case study expands into Appendix 3 where I use the same scenario to discuss error and transaction handling with Service Broker.

Chapter six covers error handling in triggers, and what considerations you should (not) make in a trigger. Chapters seven and eight covers error handling with administrative command like BACKUP/RESTORE and DDL respectively and I introduce the CMD Catch Handler, an extension of SqlEventLog that attempts to cover the particular problems with error handling for these commands. The last chapter gives a very brief discussion of error handling in client code.

This Part is a couple of pages longer than Part Two, but, I hope, less confusing. The length is to a great extent due to the generous code samples, which I hope you will find useful. Please beware that I assume in this article you have read the first two parts, so if you arrived here directly from the web, I recommend you to read Parts One and Two first. Beside these three main parts, there are three appendixes that cover special areas in SQL Server: linked servers, the CLR and Service Broker.

Table of Contents

Introduction

Index of All Error-Handling Articles

Strategies and Principles

A General Philosophy on Error Handling

An Ideal Handling of Transactions?

Throw Your Ideals Away – This is SQL Server

Raising Your Own Errors

SqlEventLog – A Versatile Solution for Error Handling

Installing SqlEventLog

The slog schema and the slog.sqleventlog Table

The Procedure slog.sqleventlog_sp

The Procedure slog.catchhandler_sp

Localisation

Internals of SqlEventLog

Doing the Loopback

Doing the Loopback with a Linked Server

SqlEventLog on Azure SQL Database

Permissions

Conclusion

Common Patterns for Error Handling

The Standard Routine

Handling Anticipated Errors

Raising Your Own Errors

Retry Logic

Error Handling with Dynamic SQL

Case Study: the Simple Order Database

Overview

The Basic Tables

The Interface Procedures

The Validation Procedures

The Imported XML File

The Import-File Tables

The Procedure AddOrderImportFile

The Procedure ProcessImportFiles

The Procedure ProcessOneImportFile

Loops vs. Set-Based

Testing It Out

Service Broker

Error and Transaction Handling in Triggers

Always in a Transaction

XACT_ABORT On by Default

Ending Transactions in Triggers

Triggers and Error 266

xact_state()

Pattern for Error Handling in Triggers

Communicating Errors from Triggers

Can I Prevent the Trigger from Rolling Back the Transaction?

BACKUP/RESTORE and Other Administrative Commands

The Presumptions

DBCC CHECKDB

Alternatives for Error Handling

Introducing the CMD Catch Handler

Using Extended Events

Other Approaches

Error Handling with DDL

What Are the Challenges?

The DDL_Demo Database

Using the Table Designer

Trying the Script from the Table Designer

Using SQLCMD

Running Change Scripts in SSMS

Single-Batch Scripts

The Final Solution

Error Handling in Client Code

Overview of Error-Handling Mechanisms

What You Should Do in Case of an Error

Query Timeouts

ADO .Net and SqlClient

Other APIs

The Virtue of Getting All Result Sets

Conclusion, Feedback and Acknowledgements

Revision History

Index of All Error-Handling Articles

Here follows a list of all articles in this series:

Part One – Jumpstart Error Handling.

Part Two – Commands and Mechanisms.

Part Three – Implementation. (This article)

Appendix 1 – Linked Servers. (Extends Part Two.)

Appendix 2 – CLR. (Extends both Parts Two and Three.)

Appendix 3 – Service Broker. (Extends Part Three.)

Strategies and Principles

In Part Two we looked at how SQL Server acts in case of an error, and we found that there are a multitude of possibilities. Now is the time to piece all this together to try to find a general strategy for how to handle errors. This chapter is more of a philosophical nature, but it is important, because without a clear principle, we cannot implement error handling in a good way.

A General Philosophy on Error Handling

No matter which environment you program in, there are some general principles for error handling that you need to have a good understanding of, and I will cover these briefly.

The first point is the distinction between anticipated and unanticipated errors. An anticipated error is when you perform an action and you know very well that it can go wrong. For instance, you want to delete a row in a table which is referenced by umpteen foreign-key constraints. Rather than checking all these tables for references, you decide to attempt to delete, and if the deletion fails with error 547, your plan B is to set the column isactive to 0. A more general notion of anticipated errors is when you import data from an untrusted source which can have all sorts of flaws. You can proactively check for errors in advance, but you can also try to process the rows one by one well-knowingly that this may fail, and in this case send the row to an error table and continue with the next.

An unanticipated error is just that: an error that occurred in a piece of code where you did not expect an error to occur. The error could be due to a bug in your code, incorrect data from a caller you had reason to trust, or it could also be a concurrency issue like a deadlock. And you should not overlook the possibility that someone three years from now makes a change to a table that breaks your code. For this reason, you need to have a general preparedness for that things could go wrong. Once you have a framework for unanticipated errors, you should be able to extend it to handle anticipated errors in the situations where you need to do this.

An important question is: Why do we have error handling at all? If we naïvely assume that our code can never fail and we ignore an unanticipated error, we could go on and update a table incorrectly, return incorrect data to the user, or send an incorrect signal to a system. These errors could in their turn lure people to make wrong decisions or have some other grave consequence.

Error handling should include diagnostics. The error message should be as clear as possible, so that we can understand the error and correct it. Of course, in many cases, the error message comes from SQL Server, and we can't do much better than this. But we should not do any worse.

It cannot be stressed enough that your error-handling code must be tested and tried. No matter whether you handle anticipated or unanticipated errors, it's pointless to write complex code to handle specific errors, unless you actually perform tests where you provoke these errors to occur. Unfortunately, error-handling code often becomes a white spot on the test map, despite that failing error-handling code can be very costly. If there is an error in the system, and you get the original error message, you may be able to quickly deduce the cause and perform corrective actions. But if all you have is an error message from the error handling itself, all you know is "something went wrong", and the time to resolve the issue may manifold. For one thing, it may take you considerable time to understand that the message you see is due to bad error handling and unrelated to the root cause.

The point about testing leads us to the next one: simplicity. Your code for handling unanticipated should be short and simple. The simpler the error-handling code is, the easier it will be to test, and it is the less likely that it will fail. You may call a stored procedure like error_handler_sp or the more sophisticated SqlEventLog, that I will present later in this article. And that is alright, since that code is in one single place for which you easily can write a set of unit tests. But the error-handling code you have in your stored procedure should be short and non-intrusive. That is, your code for handling unanticipated errors should not dwarf the main meat of your stored procedure, and TRY-CATCH permits us to do that. When you work with anticipated errors, the scenario may be different. If you work with importing data from an untrusted source, your code may be peppered with error checks and error handling, but hopefully in this case, you also have the time to test that your code actually works. And for all you care, the main meat of such a procedure may be exactly all that error handling.

Not only should your error-handling code be short and simple, it should also be robust, solid and consistent. That is, there is little use in writing error-handling code that only works for some situations.

Our next point is finding safe ground. That is, when an unanticipated error occurs, you are in a situation where you don't know what is going on, and you need to back out to a place where you again have full control, and where you know that you can continue executing without the error causing further accidents down the road. In many situations this means clean up, log and terminate. In the context of database code, "clean up" typically means to roll back, something I will discuss in more detail in the next section. Logging is not mandatory, but it helps forensic investigations. And terminate means just that – your code stops executing. Termination can be done in different ways. Typically you should reraise the error, so that the caller can make its own decisions. Maybe the caller anticipated your failure and has a plan B. Or it hasn't, and the caller will apply the same measures and so the call stack unwinds, and maybe eventually all execution will terminate – there was simply no safe ground to find. Which in the case of stored procedure called from an application is quite a common case. (And then it is up to the application is to find safe ground. For a GUI application that would be to wait for user input in the form from which the database call was made.)

The final point is communicate. Almost always you should communicate the error one way or another. For unanticipated errors this is normally done by reraising the error. Never must you silently terminate without explaining why. When it comes to anticipated errors, it is a little different. In the example above where you try to delete and in case of a foreign-key error set an isactive column to 0, there is rarely reason to make a noise about this. On the other hand, if you are processing data from an untrusted source, any errors should be raised to the caller, logged into an error table or whatever that fits into the context.

An Ideal Handling of Transactions?

If you program a database application, and you write error-handling code, you need to consider the possibility that there may be a transaction in progress. How to deal with this? Let's first look at this from a general perspective. Ideally, a stored procedure should only meddle with its own work. That is, if an error occurs inside a transaction, the procedure should roll back all work the procedure has performed inside that transaction. Or more exactly, it should roll back to the starting point for the piece of atomic work. The procedure may have performed other units of atomic work before that point, which it sees little reason to roll back.

But if the caller has an outer transaction, the procedure should not roll back that outer transaction, but only communicate the error to the caller, and then it is up to the caller what to do with the error. Maybe roll back, maybe try a plan B. Note that the caller here does not have to be an outer stored procedure, but it could also be a client program. What we are aiming for here is a strictly modular transaction scope.

Can this be achieved in SQL Server? The answer is largely negative. True, we could code your procedures this way (for brevity, I use the syntax introduced in SQL 2008 where variables can be initiated in the DECLARE statement):

CREATE PROCEDURE SinisterFootwear AS
   DECLARE @havetran bit = CASE @@trancount = 0 THEN 0 ELSE 1 END,
           @tranname varchar(32) = replace(convert(char(36), newid()), '-','')
   BEGIN TRY
      IF @havetran = 0
         BEGIN TRANSACTION @tranname
      ELSE
         SAVE TRANSACTION @tranname
      -- Do work nere
      IF @havetran = 0
         COMMIT TRANSACTION
   END TRY
   BEGIN CATCH
      ROLLBACK TRANSACTION @tranname
      EXEC error_handler_sp
   END CATCH

That is, if there already is an active transaction when the procedure commences executing, we don't issue a new BEGIN TRANSACTION, but instead we create a savepoint with SAVE TRANSACTION. @tranname is a GUID (with the dashes stripped, to fit in 32 characters), which we can trust to be unique. At the end of the TRY block, we commit only if we own the transaction ourselves. In the CATCH block we roll back, but only our own work. That is, only if we started the transaction ourselves we roll back it all, else we only roll back to the savepoint.

What's wrong with this model? There are several conflicts with the principles I presented in the previous section.

It's not robust and consistent – recall that many errors doom the transaction, in which case you have no choice but to roll back the entire transaction. To cover this, the CATCH handler would have to read:
```
IF xact_state() > 0
   ROLLBACK TRANSACTION @tranname
ELSE if xact_state() < 0
   ROLLBACK TRANSACTION
```
But then you have an eloquent scheme that only works sometimes. And if you adhere to the best practice and use SET XACT_ABORT ON, the transaction will always be doomed. Recall also that xact_state() is not wholly reliable in SQL 2005 and SQL 2008.
Another problem is that our code may experience a compilation error which aborts the procedure without firing the CATCH handler. The caller will not know our @tranname and cannot clean up after us. It may have its own @tranname, but it may also be entirely ignorant of the fact that we have a transaction.
The code violates the principle of simplicity that I outlined in the previous section. The code is complex, and takes focus from the actual logic in the procedure. If you have a specific situation where you must have this behaviour – and where you anticipate errors that do not doom the transaction – you can certainly implement logic like the above. But as a general pattern that we would use in the major bulk of our stored procedures, of which we may have several hundreds or thousands, this pattern is not acceptable. While you can create a template that developers can use in SSMS; there is still risk that the code gets mutilated as edits are made, not the least when the developers need to deviate from the template to cater for special sitautions.

So while the above is noble, and may even be implementable in Oracle or DB2, this is SQL Server, and we need to find something simpler – and something more simple-minded. SQL Server itself is a horrible place when it comes to error-handling. Please don't get me wrong, I think the above is highly desirable, but to achieve this in SQL Server, there are a couple of things that Microsoft would need to address. Here is a quick outline of the most important points:

An end to doomed transactions (even when XACT_ABORT is ON); transactions should only be doomed where there is a reason to, for instance with deadlocks or resource errors. I have a feedback item for this.
Catch compilation errors in the TRY block where they occur. I have a feedback item for this as well.
Savepoints must work in distributed transactions – or at least there must be a way to detect that you are in a distributed transaction. I submitted a feedback item for the latter.
It would also help with a better syntax where this behaviour is built in, so you don't have to write this logic with checking @@trancount in every procedure. ANSI SQL has a construct BEGIN ATOMIC, which defines a block and if you leave the block in a normal way this commits your transaction. But if you exit with an error you roll back. This is something I have not mustered the energy to submit a feedback item for.

All this said, there is a corner of SQL Server where all points but the third are covered: natively compiled stored procedures which I presented in Part Two. But since their feature set is so limited, it is not much of help for our daily chores where the main bulk of our code will be traditional stored procedures even if we embrace In-Memory OLTP.

Throw Your Ideals Away – This is SQL Server

If the conclusion from the previous section is that we cannot roll back to a savepoint, what do we do in case of an unanticipated error? The answer is: roll back the entire transaction. Something went wrong, what we don't know, but we cannot take responsibility for that data produced by the transaction is consistent. Since we can't fulfil our contract, we must roll back it all.

We have seen that different errors in SQL Server have different outcomes and trying to understand all these possibilities can be quite confusing. But we can summarise the situation in a couple of points. As long as you run with XACT_ABORT OFF and there is no TRY-CATCH, the following applies:

You cannot rely on that an error will abort the execution; execution may continue with the next statement.
You cannot rely on that you will be able to process or log the error; the error may abort the execution.
You cannot rely on that any open transaction will survive the error.
You cannot rely on that the transaction will be rolled back because of the error.
You cannot rely on that you will be able to roll back the transaction yourself in case of an error – execution may be aborted and leave the transaction orphaned.
Even if you don't start a transaction in your procedure yourself, and nor is one active when your procedure starts executing, you cannot rely on that there is no active transaction when your procedure exits – a procedure you call may start a transaction and then run into a scope-aborting compilation error.
If you have an open transaction and call another stored procedure, that procedure may have rolled back the transaction when it returns.

Or with other words, you have very little control.

If you run with XACT_ABORT ON, but you don't use TRY-CATCH, this changes for the points above:

Still applicable, but the number of errors for which this can happen is largely reduced.
Fully applicable.
Fully applicable.
Still applicable, but the number of errors for which this can happen is largely reduced.
Entirely voided. If execution is aborted, the transaction is always rolled back – recall that this also applies to attention signals.
Entirely voided. With XACT_ABORT ON, these errors abort the batch.
Still applicable, but the number of errors for which this can happen is largely reduced.

This is quite an improvement; you have almost a consistent and safe behaviour. But there are a few situations where you have to be careful:

RAISERROR – but since you add this yourself, you can add ROLLBACK and a RETURN statement.
Syntax errors when using dynamic SQL.
Calling other stored procedures: they could communicate errors with RAISERROR or produce error 266 (trancount mismatch) because of improper use of BEGIN/COMMIT TRANSACTION.

Despite these holes, using SET XACT_ABORT ON is good enough as your sole error handling in a simple system where you only have a handful of procedures that don't nest, or for something you assemble fairly quickly for a proof-of-concept. The main concern in the list above is RAISERROR in nested procedures, so one could argue that in SQL 2012 and later, it's better to use ;THROW, since ;THROW always aborts the execution and rolls back the transaction when XACT_ABORT is ON.

What if you use TRY-CATCH, but do not use XACT_ABORT ON?

Voided. Execution is diverted to the CATCH block.
Still applicable, if reduced. But you cannot catch compilation errors in the TRY block.
Still applicable; the transaction may be doomed.
Still applicable, but reduced. Compilation errors are still a problem.
Still applicable, only reduced for ;THROW.
Voided, as you can control this in the CATCH block.
Voided. Execution is diverted to the CATCH block.

While better than the first list, it is not good. Particularly, the risk for orphaned transactions still exists. Thus, the answer is that to get proper error handling in your stored procedure, it should include the command SET XACT_ABORT ON in the beginning and there should be a TRY block that covers all executable statements in the procedure. (Depending on what you do in the procedure, you may also need nested TRY blocks, but we leave that for now.) The CATCH handler should perform three actions:

Roll back any open transaction.
Reraise the error. This is very important. A CATCH handler which just drops the error into the bit bucket is generally evil. (There are situations where you want to do this, but they are not the norm.)
Issue a RETURN statement with a non-zero return value.

The first point is simple to implement, but on SQL 2005 and SQL 2008, the second point is more difficult – if you insist on doing it directly in the CATCH handler. But there is no reason to, you should use a stored procedure like error_handler_sp which accesses error_message() & co and formats the error.

The last point is something I have covered only superficially, so now it's the time to dwell on it a bit. One important thing I did not discuss in the general error-handling philosophy is that you should not rely on that the procedure that calls you have the same error handling as you have. The caller may be a stored procedure that was written in the days of SQL 2000 (or by a programmer whose mind-set is still in that era), so the call to your SP may look like this:

EXEC @err = your_sp @par1, @par2
SELECT @err = coalesce(nullif(@err, 0), @@error)
IF @err <> 0 BEGIN IF @@trancount > 0 ROLLBACK TRANSACTION RETURN @err END

Yes, in the days of SQL 2000 we had to write code like that. After every procedure call. (And after every INSERT, UPDATE and DELETE for that matter.) A different case is when the caller is on a different server; in this case errors may go directly to the caller's client without a possibility for the caller to catch them with TRY-CATCH. (I discuss this in more detail in the appendix on linked servers.) For this reason, the last statement in your CATCH handler should be RETURN with some number. Which number is not terribly important, as long as it isn't 0.

Bringing these three points together, we get this that works in all versions of SQL Server from SQL 2005:

BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC error_handler_sp
   RETURN 55555
END CATCH

On SQL 2012 or later, you can get a shorter error handler with ;THROW:

BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   ; THROW
END CATCH

But don't forget that semicolon! (And that alone is for me enough to prefer the solution with error_handler_sp.) Since ;THROW aborts the batch, there is no need for RETURN.

We have now arrived at a general simple pattern to handle unanticipated errors that protects us against orphaned transactions as well as incomplete transactions. But we do rely on that our caller also have some sort of error handling – if not the caller could continue executing and persist the second half of its transaction. This is the best we can do in SQL Server.

While the pattern is general, it is not applicable to all situations, and later we will explore cases where we need to refine or modify the pattern. But first we will expand our model to look at how we raise our own errors, and not the least how we can log the errors.

Raising Your Own Errors

When designing tables you should add constraints to enforce business rules and not the least to state what you support and what you do not. For instance, say that an order can be either from a customer or another warehouse. In such case, both CustomerID and WarehouseID should be nullable, but there should be a CHECK constraint that requires that exactly one of them is non-NULL. But constraints only take you so far and for more complex rules, you will need to raise an explicit error yourself and in this section I will briefly cover what sort of validations you may add.

Constraint-like validations are often performed in triggers. Here is one example: we have an OrderDetails table and a Products table. Some products cannot be ordered, because they are no longer carried. They still exist in the Products table, since there are old orders with these products. But it would be an error to add a discontinued product to an order, and the INSERT trigger for OrderDetails may include this check:

IF EXISTS (SELECT *
           FROM   inserted i
           JOIN   dbo.Products P ON i.ProductID = P.ProductID
           WHERE  P.IsDiscontinued = 1)
BEGIN
   ROLLBACK TRANSACTION
   RAISERROR('One or more products on the order is discontinued', 16, 1)
   RETURN
END

In many cases the purpose of constraints and trigger checks is primarily to protect the database from the application and not so much from errors in user entry. You expect the application to one way or another stop the user from entering incorrect data, for instance by not offering discontinued products in the first place. For this reason, the error message does not have to be crystal clear to the end user, but rather it can speak in database terms, nor is there any need for localisation. You may argue that this is a duplication of logic, but it is inevitable. The application should attempt to guide the user away from errors by means of drop-downs, radio-buttons etc. But the database must never trust the application to be the sole actor to enforce business rules. If nothing else, someone may bypass the application and perform updates directly through SQL. Occasionally, there are complex validations which are difficult for the application to implement without duplicating the exact check that is in the trigger. That is not very smart, and in this case it is better that the application simply attempts the operation. In such a situation, your trigger must make a better effort to explain the error. To take the example above, you may walk an extra mile and inform the user exactly which product that has been discontinued.

Should checks of this type be in triggers or in stored procedures? When I do database design, I typically put them in triggers. For me, the stored procedures are application code, and in many cases they are written by the same person who implements the client parts. That is, persons that I as a DBA should not trust. I also want to protect the database against ad-hoc updates performed by users or support staff who may not really know what they are doing. This is not to say that you should not also have validations in your stored procedures. For a stored procedure that serves a single input form and makes a plain insert or update, there is not much reason to add validations, as long as there are constraints and triggers in place. On the other hand, in a general-purpose stored procedure called from many places, you would typically have lots of validations.

Not all validations serve to protect the database. They can enforce a call sequence of some sort, or validate that certain parameters cannot be combined. Another example is a stored procedure that includes a long transaction which you decide to break up in several procedures, maybe because there are parts that are performed more than once. These subprocedures are useless to run on their own, and that is why you should add this assertion:

IF @@trancount = 0 OR @@nestlevel = 1
BEGIN
   RAISERROR('This is a subprocedure that must not be called directly', 16, 1)
   RETURN 1
END

Note: Even better would be to validate that the caller is the main procedure, but there is no good way to do this.

And speaking of assertions, not only should you validate because you mistrust others to do things right – you have no reason to trust yourself either. If you find that your logic relies on some assumption, you should add an assertion that raises an error if your assumption is proven wrong. For instance, when computing FIFO values for an inventory, you may have little reason to expect a negative quantity at any time, so you don't plan to handle that. But what if? Thus, you add this check:

IF @qty_in_stock < 0
BEGIN
   RAISERROR('On %s the quantity for product %s was negative, %d.', 16, 1,
             @date, @product, @qty_in_stock)
   RETURN 1
END

If the unthinkable happens you will be waving the white flag and not produce an incorrect result.

In all these examples I have used RAISERROR to raise errors, but starting with SQL 2012, there is also the ;THROW command. Which of them should you use to communicate your errors? The answer might be neither! It is a matter of ambition, but in the system I work with – this is a big enterprise application – we used RAISERROR initially, but we identified three important shortcomings:

We wanted errors to be logged in the database, both the errors we raise ourselves and errors from SQL Server.
We did not want to use ad-hoc texts like above, but rather we wanted some code that we could use for the same error in several places. RAISERROR permits you to use error numbers added with sp_addmessage, but as I discussed in the section on RAISERROR in Part Two, sp_addmessage is not recommendable for a number of reasons.
Since our system runs in several countries, we wanted localisation for user errors.

This lead us to implement our own facility to log and raise errors and in the next chapter I will present SqlEventLog which is drawn from our solution.

SqlEventLog – A Versatile Solution for Error Handling

What I will present here is a general framework for raising errors, capturing errors and logging errors. By using a loopback, the log records remain even if they are logged in a transaction which is later rolled back. There are three main features of SqlEventLog:

The table sqleventlog which holds the logged messages.
The stored procedure sqleventlog_sp that you use in place of RAISERROR or ;THROW to raise errors and at the same time log them to sqleventlog.
The stored procedure catchhandler_sp which you call from your CATCH handlers to log and reraise errors

To this come two more tables to support localisation, a couple of internal stored procedures and a CLR assembly to implement the loopback.

I like to stress that while you could incorporate the solution I present here as-is into your system, I want you to see this as an idea that you can refine. If you find that I am overly ambitious in places, I invite you to simplify. Or if you think that I cut corners too narrowly, please feel free to extend the solution.

I will start with giving you download and installation instructions, in case you want to play with the examples as you read the text. The actual description begins with the presentation of slog schema and the table sqleventlog. I go on with describing the interfaces of sqleventlog_sp and catchhandler_sp. Next, I cover the localisation parts of SqlEventLog. I then switch over to the internals to discuss how the procedures look on the inside and not the least how the loopback is implemented. The chapter closes with a discussion on permissions. If you find that the internals stuff is too much of detail, please feel free to skip to the short Permissions sections at the end of the chapter.

Installing SqlEventLog

All files for SqlEventLog are included in sqleventlog.zip. Download this file and extract the contents to a folder where your SQL Server instance has read access.

If you have Visual Studio installed, open a Visual Studio command prompt. If not, open a regular command prompt and run this command to get a C# compiler in your path:

PATH=C:\Windows\Microsoft.NET\Framework\v2.0.50727;%PATH%

Yes, this is a very old version of the C# compiler. Feel free to use a newer one. But you will need this old one, if you want to load the assembly on SQL 2005 or SQL 2008 that only supports .NET 2.0.

Navigate to the folder where you extracted the files. In the case you plan to install SqlEventLog on an SQL Server instance which is not on your local machine, you need to make the navigation so that the current directory is a UNC path even if you have a drive letter mapped for the network share. This is required for the loopback assembly to load.

The zip file includes three installation files: build_sqleventlog.bat (for SQL 2008 and up), build_sqleventlog_2005.bat (for SQL 2005) and build_sqleventlog_azure.bat (for Azure SQL Database). In this section, I'm only covering the first two. I will return to Azure SQL Database in a separate section. Before you run can the installation file you need review its configuration. Open the file in the text editor of your choice. At the top of the file, there are two lines to set the server and database where you want to install SqlEventLog.

SET SERVER=.
SET DATABASE=tempdb

The dot means "the default instance on the local server". Change as you please. Keep in mind that if you have an instance of Express Edition installed, you should typically enter .\SQLEXPRESS for the server. I recommend that you stick to tempdb, unless you want to test SqlEventLog in your own application.

Next there is a line that defines how to use SQLCMD:

SET SQLCMD=SQLCMD -S %SERVER% -d %DATABASE% -I

To log on to your instance with Windows authentication, keep the line as-is. If you use SQL authentication, you need to add ‑U and ‑P to specify username and password. The ‑I option tells SQLCMD to run with QUOTED_IDENTIFIER ON; you should always run SQLCMD with this option. You will need to have sysadmin permission on the instance, as the script creates an asymmetric key in the master database and a login from that key to be able to create a privileged assembly. (For more details on this and why, see the section Implementing the Loopback later in this chapter.)

The build file includes a command to create a key to sign the loopback assembly:

SN -k keypair.snk

Typically, you only have SN installed if you have Visual Studio (or the .NET Framework SDK) on your machine. If you do have not have SN, just comment out this line. There is a keypair.snk in the zip file, so you will get by anyway.

Now you are all set and can run the .bat file to install. If you want to make changes to the code and test these, just rerun the .bat file. The first file, slog_schema.sql, will drop all objects in the slog schema and also drop the assembly and the key and the login in the master database.

Note: The .zip file includes some files that are not part of the SqlEventLog proper, and they are not installed when you run the .bat file. These files are part of the CMD Catch Handler that I will introduce in chapter seven.

The slog schema and the slog.sqleventlog Table

To keep SqlEventLog as a single group, all objects in SqlEventLog (that can belong to a schema) are in the slog schema. So why is the name of the schema slog and not SqlEventLog? I've come to the conclusion that schema names should be short and non-intrusive. When you use schema names, you tend to use them over and over again. Long schema names would litter the code. In this text, I'm leaving out the schema name in the running text most of the time for the sake of even more brevity.

Here is the definition (for SQL 2008 and later) for the table sqleventlog:

CREATE TABLE slog.sqleventlog (
   logid     bigint         NOT NULL IDENTITY,
   logdate   datetime2(3)   NOT NULL  
      CONSTRAINT default_slog_logdate DEFAULT sysdatetime(),
   msgid     varchar(36)    NULL,
   errno     int            NULL,
   severity  tinyint        NOT NULL,
   logproc   nvarchar(257)  NULL,
   msgtext   nvarchar(2048) NOT NULL,
   errproc   sysname        NULL,
   linenum   int            NULL,
   username  sysname        NOT NULL,
   appname   nvarchar(128)  NULL,
   hostname  nvarchar(128)  NULL,
   CONSTRAINT pk_sqleventlog PRIMARY KEY NONCLUSTERED (logid)
)
go
CREATE CLUSTERED INDEX logdate_ix ON slog.sqleventlog (logdate)

logid is just a surrogate key, since a natural key is not to be found in a table like this one.

logdate is when the message was logged. In the SQL 2005 version of this table the column is datetime, but for later versions I have opted to use datetime2(3) to get millisecond resolution. (datetime only has a resolution of 3.33 ms). Why not datetime2(7) and a resolution of 100 ns? Because the output from sysdatetime() only changes about once a millisecond.

msgid is a mnemonic code for the message. It is always NULL for SQL Server messages. It may have a value for messages logged with sqleventlog_sp. I will return the purpose of this column when I discuss the localisation feature.

errno is the error number raised by SQL Server and logged by catchhandler_sp. It is typically NULL for messages logged directly with sqleventlog_sp.

severity is just that: the severity for the message. For an error raised by SQL Server and logged by catchhandler_sp, it is a value between 11 and 19. (Recall that errors with higher severity level cannot be caught.) For errors logged with sqleventlog_sp, it can be any value between 0 and 255, although there is little reason you would use a value > 16.

logproc is the stored procedure that called sqleventlog_sp or catchhandler_sp to log the message. NULL would indicate that the message was logged from a script – or that SqlEventLog was not called correctly. Exactly what I mean by this will be apparent in the next section. If logproc is in the dbo schema, the schema portion is left out from the name to reduce noise. If the procedure is in another schema, the schema part is present.

msgtext is the actual message logged: the error message from SQL Server or the message you logged with sqleventlog_sp. It is restricted to nvarchar(2048), since this is the general maximum length for error messages in SQL Server.

errproc normally only has a value for error messages from SQL Server. This is the name of the stored procedure where the error was raised as returned by the function error_procedure(). If errproc is different from logproc, this means that the error was caught in an outer procedure.

linenum is also for SQL Server errors and holds the value from error_line().

username holds the name of the user that was running the code that logged the message. This is normally the output from SYSTEM_USER. SYSTEM_USER may change during a session due to impersonation with EXECUTE AS. In this case, the name of the user who actually logged in, taken from original_login(), is shown in parentheses. (For more information about impersonation, original_login() etc see my article Packaging Permissions through Stored Procedures.)

appname is the name of the application running, taken from app_name(). (Beware that the application freely can set any name in the connection string.)

hostname is presumably the name of the machine the user connected from. I say presumably, because as with the application name, the application can set anything it wishes in the connection string.

The Procedure slog.sqleventlog_sp

You use this procedure to log and raise a message of any severity. You can also opt to only log the message. If you call sqleventlog_sp in an open transaction, sqleventlog_sp makes a loopback connection to log the message, so that it is persisted even if the transaction is rolled back.

The procedure has this signature:

CREATE PROCEDURE slog.sqleventlog_sp
                 @procid    int,
                 @msgtext   nvarchar(2048),
                 @severity  tinyint       = 16,
                 @msgid     varchar(36)   = NULL,
                 @raiserror bit           = NULL,
                 @errno     int           = NULL,
                 @errproc   sysname       = NULL,
                 @linenum   int           = NULL,
                 @p1        sql_variant   = NULL,
                 @p2        sql_variant   = NULL,
                 @p3        sql_variant   = NULL,
                 @p4        sql_variant   = NULL,
                 @p5        sql_variant   = NULL,
                 @p6        sql_variant   = NULL,
                 @logid     bigint        = NULL OUTPUT AS

@procid – for this parameter, you should always pass @@procid which holds the object id of the currently executing stored procedure. The one exception to this rule is a more specialised logging procedure that sits on top of sqleventlog_sp and which itself accepts @@procid as a parameter and passes the bucket.

@msgtext is the message to log. The message text can be parameterised with %1 to %6 as the parameter markers. You pass the values for the markers in the parameters @p1 to @p6. @procid and @msgtext are the only mandatory parameters.

@severity is the severity for the message. It defaults to 16. You can pass any value from 0 up to 255 if you are so inclined. sqleventlog_sp will log that severity level, but it will never raise the error with a higher severity level than 16.

@msgid maps to the column msgid in the sqleventlog table, and I will talk more about it in the localisation section.

@raiserror is a tri-state flag. With the default NULL, the message is printed if @severity is ≥ 11, else not. When you pass @raiserror as 0, the message is not printed, only logged. If you pass 1, the message is always printed.

@errno, @errproc and @linenum map to their namesake columns in the sqleventlog table. They are intended to be used by catchhandler_sp, but if you find a use for them when calling sqleventlog_sp, be my guest.

@p1 to @p6 provide the values for %1 to %6 in @msgtext. Since they are of the type sql_variant, you can pass almost any SQL Server type without having to convert the value. Here is a list of the data types you cannot pass:

LOB types, that is MAX, xml etc.
CLR types (this includes the built-in CLR types like hierarchyid).
timestamp.

@logid is an output parameter that returns the id for the logged message, in case you want to retrieve the message later.

Here is an example where we call sqleventlog_sp with a parameterised string and the default settings. We capture @logid so that we can look what is in the table.

CREATE PROCEDURE FilthyHabits AS 
   DECLARE @dec decimal(12, 2), @now datetime, @logid bigint
   SELECT @dec = 12.10, @now = getdate()
   EXEC slog.sqleventlog_sp @@procid, 
                            '%1 % voted for %2. The time is %3',
                            @p1 = @dec, @p2 = NULL, @p3 = @now,
                            @logid = @logid OUTPUT
   SELECT logproc, msgtext FROM slog.sqleventlog WHERE logid = @logid
go
EXEC FilthyHabits

When I ran this, I got this output:

Msg 50000, Level 16, State 1, Procedure sqleventlog_sp, Line 184

12.10 % voted for (null). The time is 2014-06-03 23:01:43.693

logproc msgtext

--------------- -------------------------------------------------------------

FilthyHabits 12.10 % voted for (null). The time is 2014-06-03 23:01:43.693

A couple of observations: The error message is said to have been raised in sqleventlog_sp – there is not much we can do about this. Beside the parameter holders, the message text includes a lone % character. Such a lone % causes problems with RAISERROR and ;THROW, but not with sqleventlog_sp. You can see how NULL values are presented: (null). (RAISERROR does it the same way). You can also see that the message has been logged to the table in the same way as it was printed.

Let's look at one more example. This time we have a procedure is in a different schema (the guest schema, which exists in all databases for legacy reasons), and we specify the @severity to 0, because we don't want to raise an error, only log it.

CREATE PROCEDURE guest.Flambay AS
   DECLARE @d datetime, @i int, @place varchar(20), @logid bigint
   SELECT @d = '19031217', @i = 4, @place = 'Kitty Hawk'
   BEGIN TRANSACTION
   EXEC slog.sqleventlog_sp @@procid, 
                            'On %1 there were %2 flights conducted at %3.',
                            @severity = 0, @p1 = @d, @p2 = @i, @p3 = @place,
                            @logid = @logid OUTPUT
   ROLLBACK TRANSACTION
   SELECT logproc, msgtext FROM slog.sqleventlog WHERE logid = @logid
go
EXEC guest.Flambay

This is the output:

logproc msgtext

--------------- -------------------------------------------------------------

guest.Flambay On 1903-12-17 there were 4 flights conducted at Kitty Hawk.

Note that the logproc column includes the schema name, since the procedure is not in the dbo schema. Also, pay attention to that sqleventlog_sp drops the time portion from @p1; datetime values where the time portion is midnight are very common. Observe the transaction: the procedure starts a transaction which is rolled back. Nevertheless the message is persisted in sqleventlog thanks to the loopback.

The Procedure slog.catchhandler_sp

This procedure serves the same purpose as error_handler_sp which I introduced in Part One. That is, you call it from a CATCH handler and it interrogates the error_xxx() functions for information about the error and reraises the error, augmenting it so that you can see where the error occurred originally. But unlike error_handler_sp, it also logs the error to the sqleventlog table. There are also some parameters to control its exact behaviour.

Let's first look at a plain-vanilla example:

CREATE PROCEDURE AybeSea AS
   SET NOCOUNT ON
   BEGIN TRY
      DECLARE @WPLJ TABLE (a varchar(3) NOT NULL)
      INSERT @WPLJ (a) VALUES('Too long')
   END TRY
   BEGIN CATCH
      EXEC slog.catchhandler_sp @@procid
   END CATCH
go
EXEC AybeSea
go
SELECT TOP 1 logproc, severity, errproc, linenum, msgtext
FROM   slog.sqleventlog
ORDER  BY logid DESC

As you can see, just like sqleventlog_sp, we pass @@procid as the first parameter. The output is:

Msg 50000, Level 16, State 14, Procedure catchhandler_sp, Line 125

{8152} Procedure AybeSea, Line 5

String or binary data would be truncated.

logproc severity errproc linenum msgtext

-------- -------- -------- ------- -----------------------------------------

AybeSea 16 AybeSea 5 String or binary data would be truncated.

The first line of the error message is the standard one produced by RAISERROR. Error number, procedure name and line number will always be the same, while the severity and the state are the original ones. (Except in the rare case when the severity of the message is 19; in this case the error is reraised with level 18, since level 19 requires WITH LOG.) On line two, you find the original error number in braces, procedure name (if there is any) and line number. Line three is the actual error message. In the example output we also see some of the information logged to sqleventlog.

Note: this particular error message was improved in SQL 2019, but I'm keeping the old version for brevity.

Let's look at what happens if we add an outer procedure which traps the reraised error:

CREATE PROCEDURE HolidayInBerlin AS
   BEGIN TRY
      EXEC AybeSea
   END TRY
   BEGIN CATCH
      EXEC slog.catchhandler_sp @@procid
   END CATCH
go
EXEC HolidayInBerlin
go
SELECT TOP 1 logproc, severity, errproc, linenum, msgtext
FROM   slog.sqleventlog
ORDER  BY logid DESC

The output is exactly the same as above. That is, catchhandler_sp sees that this is a message it has been reraised once already, and does not augment it further. Nor does it log the message a second time to avoid littering in the log table. (Although, you could argue that logging a reraised error could serve as a stack trace.)

What happens if the error caught was logged and raised with sqleventlog_sp? This example illustrates:

CREATE PROCEDURE MudShark AS
   SET NOCOUNT ON
   BEGIN TRY
      EXEC slog.sqleventlog_sp @@procid, 'The Edgewater Inn, Seattle, Washington'
   END TRY
   BEGIN CATCH
      EXEC slog.catchhandler_sp @@procid
   END CATCH
go
EXEC MudShark
go
SELECT TOP 2 logproc, severity, errproc, linenum, msgtext
FROM   slog.sqleventlog
ORDER  BY logid DESC

The output is:

Msg 50000, Level 16, State 1, Procedure catchhandler_sp, Line 125

The Edgewater Inn, Seattle, Washington

logproc severity errproc linenum msgtext

--------- -------- -------- ------- -----------------------------------------

MudShark 16 NULL NULL The Edgewater Inn, Seattle, Washington

AybeSea 16 AybeSea 5 String or binary data would be truncated.

catchhandler_sp sees that the message was raised by sqleventlog_sp, and realises that there is little point in augmenting the message or logging it a second time.

After these examples, let's look at the full parameter list:

CREATE PROCEDURE slog.catchhandler_sp
                 @procid        int = NULL,
                 @reraise       bit = 1,
                 @errno         int = NULL OUTPUT,
                 @errmsg        nvarchar(2048) = NULL OUTPUT,
                 @errmsg_aug    nvarchar(2048) = NULL OUTPUT AS

@procid. You should always pass @@procid. It is nullable, but this is only because I want the procedure to be as forgivable as possible. Error-handling code should not fail just because you were programming half-asleep and left out a parameter.

@reraise controls whether catchhandler_sp should reraise the message or only log it. The default is to reraise, but you can pass @reraise as 0 to tell catchhandler_sp to keep quiet. You would do this is if you catch an error in a loop and want to continue looping. This is something we will see an example of in chapter five. When you set @reraise to 0, you often use one of the output parameters to retrieve information about the error message.

@errno returns the original error number. That is, for a reraised error, this is the number in braces in the error message, and not 50000 which is what error_number() returns in case of a reraise.

@errmsg and @errmsg_aug – these output parameters both return the error message. @errmsg returns the original error message as-is, without the addition of original the error number, procedure name and line number. @errmsg_aug includes this augmented information as far it is available.

One important thing to be aware of is that if you call catchhandler_sp in a transaction, the procedure will always roll back the transaction. This aligns with the philosophy presented in the previous chapter: always roll back the transaction in case of an error. Despite this, your CATCH handler should still include the line:

IF @@trancount > 0 ROLLBACK TRANSACTION

If you rely on catchhandler_sp rolling back the transaction, error 266 will be raised when catchhandler_sp exits, unless the reraised error is caught anew further up the stack. Thus, to avoid this noise message, roll back yourself.

You may not like that catchhandler_sp unconditionally rolls back an open transaction, but beside the philosophical aspect, this is also driven by necessity. To wit, there is a funny restriction, which I discuss a little further in the CLR appendix, that prohibits you from calling CLR procedure in a doomed transaction, and the loopback is a CLR procedure. Thus, to be able to perform its task, catchhandler_sp needs to roll back the transaction.

Localisation

As I mentioned, SqlEventLog is drawn from a solution we have in a system I have worked with for many years. Many of our messages are of the type "internal error" and for these it is perfectly OK to have them in English only. But some messages are validations directed to end users or they are messages from background tasks that end up in monitoring functions. There are installations of the system in Sweden, Finland, Norway and a few more countries. Users generally expect the application to talk to them in their native tongue, so we need to be able to produce the same message in different languages.

Since I had this solution and this is something dear to me, I decided to include a localisation feature in SqlEventLog. At the same time, I did not want to take up too much space with it. Therefore, I decided to cut corners to determine the preferred language of the user. In our system we have a users table which includes a languageid column. When the user logs in, the application updates the column from the current regional settings in Windows so that Windows is the sole master of the language for the application as well as the database.

I did not want to add a users table to SqlEventLog, since such a table is likely to be application-specific. Instead I decided to rely on the language settings inside SQL Server. However, this is something you should never do for a real-world solution. There are at least two major issues with the language settings in SQL Server:

SQL Server supports only 33 languages, mainly European languages, with a few Asian languages thrown in. Major languages like Hindi, Vietnamese and Ukrainian are missing.
The language setting is a cache key, meaning that users with different language settings will get different cache entries. This leads to cache bloat, and maybe more importantly, users with different languages may get different execution plans due to parameter sniffing, which can lead to confusing performance problems. (For a detailed discussion on this topic, please see my article Slow in the Application, Fast in SSMS?)

Thus, if you plan to embrace SqlEventLog in your system, you should modify it so that it picks up the language setting from an application table. (If you don't have any need for localisation, you could keep the code as-is, although I would recommend you to rip out the parts you don't need.)

All this said, let's now look at what SqlEventLog offers for localisation. The basis is the table slog.usermessages which looks like this:

CREATE TABLE slog.usermessages (
   msgid    varchar(36)    NOT NULL,
   lcid     smallint       NOT NULL,
   msgtext  nvarchar(1960) NOT NULL,
   CONSTRAINT pk_usermessages PRIMARY KEY (msgid, lcid)
)

msgid is the id for the message, which a mnemonic code rather than a numeric value to make the source code easier to understand and to reduce the risk for mix-ups which easily could happen with numeric ids. This is the same msgid that appears in the sqleventlog table. The parameter @msgid to sqleventlog_sp could be a value in this table, more about this later.

lcid is the language id. LCID ("Locale ID") is a Windows concept, and the LCID I use here are the same as in Windows. Yes, above I said that I use the language setting in SQL Server, but I still use Windows LCIDs, and this part you should keep. The only question is whether you should use the full LCID, or only the lower part. To wit, an LCID in Windows consists of two bytes. The lower byte identifies the language, for instance 9 is English and 29 is Swedish. The upper byte identifies the variety. E.g., 1033 (0x0409) is English (United States) and 2057 (0x0809) is English (United Kingdom). In our system, we use only the lower byte, but if you think that it is important that right-ponders see colour and left-ponders see color, you will need to use the full LCID. In this article I have opted to use the latter, but that is mainly because it made my code simpler.

msgtext is the message to present to the user for the language in question. The message can have parameter markers %1 to %6 just like the @msgtext parameter to sqleventlog_sp.

When you pass a value for the @msgid parameter to sqleventlog_sp, the procedure retrieves the current language for the session (which is returned by the system function @@langid) and translates it to an LCID is sys.syslanguages. (And again, in real life, the LCID would be read from a users table.) Since there may not be translations for all languages for a message id, the procedure also looks up the default language for the server from sys.configurations to use this as a fallback. Ah, yes, the fallback, this is worth dwelling on for a few moments. In our system, the fallback language is installation-dependent and defined in our system-parameter table. You may want to do something like that. Or simply hardcode to always use English, Russian or whatever depending on your market. You can also have multiple fallbacks. Say that you install a system in Barcelona, and you don't (yet) have all Catalan translations in place, so for Catalan users, you first fall back to Spanish and then to English which is your own base language.

sqleventlog_sp looks up the message text in usermessages, and this message, with parameters expanded, is presented to the user. Note that sqleventlog_sp always stores what is in @msgtext in the sqleventlog table; @msgid only matters for the message raised by sqleventlog_sp. If @msgid yields no hit in usermessages for neither language, sqleventlog_sp uses @msgtext as the error message. That is, sqleventlog_sp never returns a navel-gazing error message saying that the message id was not found, but @msgtext is the ultimate fallback. sqleventlog_sp always stores what you pass in @msgid in the msgid column in the sqleventlog table, even if @msgid is absent from usermessages. (That is, there is no foreign-key constraint to usermessages.) This permits you to use msgid for aggregation or other analysis of the log, even if you don't use the usermessages table.

Here are a couple of examples, using the two sample messages included in usermessages_tbl.sql. Of these the first message has five translations, including English, Italian and French, whereas the second message exists in English and Italian only. The last message uses a message id which is not among the sample messages:

SET LANGUAGE us_english
EXEC slog.sqleventlog_sp @@procid, 
     'This is my right hand in English', @msgid = 'RightHand'

SET LANGUAGE Italian
EXEC slog.sqleventlog_sp @@procid, 
     'This is my right hand in Italian', @msgid = 'RightHand'

SET LANGUAGE Français
EXEC slog.sqleventlog_sp @@procid, 
     'This is my right hand in French', @msgid = 'RightHand'

SET LANGUAGE us_english
EXEC slog.sqleventlog_sp @@procid, 
     'Customer "%1" not found (English)', @msgid = 'NoCust', 
     @p1 = 'Ricky Lancelotti'

SET LANGUAGE Italiano
EXEC slog.sqleventlog_sp @@procid, 
     'Customer "%1" not found (Italian)', @msgid = 'NoCust', 
     @p1 = 'Ricky Lancelotti'

SET LANGUAGE French
EXEC slog.sqleventlog_sp @@procid, 
     'Customer "%1" not found (French)', @msgid = 'NoCust', 
     @p1 = 'Ricky Lancelotti'

EXEC slog.sqleventlog_sp @@procid, 
     'This is our extra undefined message', @msgid = 'Extra'

SELECT TOP (7) logid, msgid, msgtext
FROM   slog.sqleventlog
ORDER  BY logid DESC

The output is:

Changed language setting to us_english.

Msg 50000, Level 16, State 1, Procedure sqleventlog_sp, Line 183

This is my right hand

L'impostazione della lingua è stata sostituita con Italiano.

Msg 50000, Level 16, State 1, Procedure sqleventlog_sp, Line 183

Questo è la mia mano destra

Le paramètre de langue est passé à Français.

Msg 50000, Level 16, State 1, Procedure sqleventlog_sp, Line 183

Ceci est ma main droite

Changed language setting to us_english.

Msg 50000, Level 16, State 1, Procedure sqleventlog_sp, Line 183

Customer "Ricky Lancelotti" not found.

L'impostazione della lingua è stata sostituita con Italiano.

Msg 50000, Level 16, State 1, Procedure sqleventlog_sp, Line 183

Cliente "Ricky Lancelotti" non c'è.

Le paramètre de langue est passé à Français.

Msg 50000, Level 16, State 1, Procedure sqleventlog_sp, Line 183

Customer "Ricky Lancelotti" not found.

Msg 50000, Level 16, State 1, Procedure sqleventlog_sp, Line 183

This is our extra undefined message

logid msgid msgtext

-------- ----------- ----------------------------------------

7 Extra This is our extra undefined message

6 NoCust Customer "Ricky Lancelotti" not found (French)

5 NoCust Customer "Ricky Lancelotti" not found (Italian)

4 NoCust Customer "Ricky Lancelotti" not found (English)

3 RightHand This is my right hand in French

2 RightHand This is my right hand in Italian

1 RightHand This is my right hand in English

In these examples, @msgtext was close to the English text for the message id. But this does not always have to be the case. You may have a generic message id for which the text in usermessages reads Internal error. Call the support desk in various languages, while the real error message is passed in @msgtext so that developers and support staff can see what happened, while the user is saved from what may be incomprehensible mumbo-jumbo to them.

Note also the last two messages in the example. The message NoCust is not available in French, which is why the English version is displayed. There is no message Extra at all in usermessages, but the id is stored in the msgid column nevertheless.

The localisation part of SqlEventLog includes one more table. Say that you have a function where users can monitor the table sqleventlog to see errors in the system. The same thing may apply here: users want to see a message in their own language, not whatever the developer used in the call to sqleventlog_sp. But obviously, they want to see the message with %1 etc replaced by the actual parameter values. To this end, SqlEventLog stores the parameter values in this table:

CREATE TABLE slog.sqleventlogparameters(
   logid    bigint        NOT NULL,
   paramno  tinyint       NOT NULL,
   value    nvarchar(400) NOT NULL,
   CONSTRAINT pk_sqleventlogparameters PRIMARY KEY (logid, paramno),
   CONSTRAINT fk_sqleventlogparameters
      FOREIGN KEY (logid) REFERENCES slog.sqleventlog(logid)
)

logid is of course a reference to the message in sqleventlog. paramno is the parameter, corresponding to %1 to %6 and value is the parameter value. Since @p1 to @p6 are sql_variant, it is surprising to see the values are stored as nvarchar(400). This is due to bugs in SQL Server which I will come back to when I describe the loopback. It's not a big deal, but it would have been preferable to use sql_variant in sqleventlogparameters, since this would make things simpler for a monitoring function that wants to display parameter values that are dates according to the user's regional settings.

If you look at the INSERT statements in usermessages_tbl.sql, you quickly realise that maintaining such a file to make sure that all translations are correct and no translations are missing is a daunting task. And indeed, you should not maintain such a file manually but generate it from a manageable source. In our system we have an application where we can enter these messages (this app is also used for localising of UI strings). That may be over the top for you, but at very least you should use something like Excel to maintain the messages and generate the INSERT statements from that source.

Internals of SqlEventLog

So far we have looked at the functionality in SqlEventLog. I will now make a tour of what's on the inside, working from top to bottom, starting with catchhandler_sp. I'm not showing the full code here, but you find the complete listing in catchhandler_sp.sql. (This link, as well as all other links to code in this article, opens in a separate window/tab.)

The first thing to note is this line:

IF @@trancount > 0 ROLLBACK TRANSACTION

That is, if there is a active transaction, catchhandler_sp rolls back that transaction.

Next, the procedure retrieves data from the error_xxx() functions followed by this piece of code:

IF isnull(@errproc, '') NOT IN (N'catchhandler_sp', N'sqleventlog_sp',
                                N'slog.catchhandler_sp', N'slog.sqleventlog_sp') OR
   @linenum = 0
BEGIN
   EXEC slog.sqleventlog_sp @procid, @msg, @raiserror = 0, @msgid = NULL,
                            @severity = @errsev, @errno = @errno,
                            @errproc = @errproc, @linenum = @linenum

That is, this is where catchhandler_sp logs the message to sqleventlog, unless the error was raised by catchhandler_sp itself (that is, it was a reraise) or by sqleventlog_sp (in which case it has been logged already). But there is an exception to the exception: if the error occurred on line 0, this is indicates an error that occurred when a stored procedure was called or exited. For instance, this would happen if you call sqleventlog_sp and leave out any of the mandatory parameters. Since this is an error of its own, it should be logged in full.

(The reason for the check both with and without schema name is that depending on SQL Server version error_procedure() may or not include the schema. See the section error_message() & co in Part Two.)

Note that when catchhandler_sp calls sqleventlog_sp, it passes @raiserror as 0, because catchhandler_sp raises the message itself, since it augments the message. You can also see that @msgid is explicitly passed as NULL. But if you prefer, you could change this and use a generic message id like SqlError.

The rest of this IF branch is the code that augments @msg with the procedure name and the line number, which I don't show here. A little more interesting is the code for the ELSE to the IF above where I extract the value for @errno as an output parameter:

ELSE IF @msg LIKE '{[0-9]%}%' + @crlf + '%' AND
        charindex('}', @msg) BETWEEN 3 AND 11
BEGIN
   SELECT @temperrno = substring(@msg, 2, charindex('}', @msg) - 2)
   IF @temperrno NOT LIKE '%[^0-9]%'
      SELECT @errno = convert(int, @temperrno)

The idea is that if a message starts with a brace followed by a digit and there is then a closing brace and later a line break, the odds are very good that this is a reraised error message from which I want to extract the error number. I have further safety precautions to avoid a conversion error. There is also code to extract the error message, that is, the text after the line break.

Note: the above does not use try_convert, as I wanted the code to run on SQL 2005/2008. But for SQL 2012 and later the ultimate safety net would be:

SELECT @errno = isnull(try_convert(int, @temperrno), @errno)

Next part in the code is a CATCH handler, which I hope will never fire. But in case it does, I attempt to retain the original error message as well as the error that caused the CATCH handler to fire:

SELECT @newerr = error_message(), @reraise = 1

SELECT @msg = CASE WHEN @newerr LIKE 'slog.sqleventlog_sp%'
                   THEN @newerr
                   ELSE 'slog.catchhandler_sp failed with ' + @newerr +
                        @crlf + 'Original message: ' + @msg
              END

There is a similar CATCH handler in sqleventlog_sp which explains why I check for that procedure name. Note here that I force @reraise to 1, because this error should absolutely be displayed as it is unanticipated.

In Part One, I said that you should never have any code after the final END CATCH in your procedure and I emphasised never. But SqlEventLog is the exception that proves the rule. After all, it is quite pointless to reraise the error inside the TRY block... So here is the code after END CATCH:

IF @reraise = 1
BEGIN
   IF @errsev > 18 SELECT @errsev = 18
   RAISERROR('%s', @errsev, @state, @msg)
END

The severity is adjusted, since severity 19 requires WITH LOG that plain users do not have permission to use. To make sure that per-cent characters in the error message do not cause any problem, I use the technique where the message argument to RAISERROR is a single parameter marker, %s, and I pass the actual message as a parameter to that marker.

Here I used RAISERROR. But for SQL 2012 and later, you may ask: why not ;THROW? It would certainly have been nice to be able to use the parameterless version of ;THROW here, as in that case there would have been no need to augment the error message, and in the case of multiple error messages, no message had been lost. Alas, ;THROW without parameters is syntactically only permitted between BEGIN CATCH and END CATCH. So the choice is between RAISERROR and ;THROW with parameters.

The chief reason that I chose RAISERROR for the implementation in this article is that I wanted to have the same code for all versions of SQL Server as far as possible. Additionally, there are two other miniscule advantages with RAISERROR over ;THROW:

With ;THROW, I would have to augment the message with original severity level as well, since ;THROW always uses severity level 16.
With ;THROW, I would have to find a different way to handle per-cent characters in the message. (Which is not too difficult to handle with the replace() function.)

On the other hand, you could argue that there is a big advantage with using ;THROW, since it always aborts the batch, and since catchhandler_sp always rolls back any transaction, there is no risk for orphaned transactions. Furthermore, since the batch is aborted, there will never be any noise messages about trancount mismatch. That is, effectively your CATCH handlers could be reduced to read:

BEGIN CATCH
   EXEC slog.catchhandler_sp @@procid
END CATCH

Caveat: I have not actually tested how well this works!

Let's now turn to sqleventlog_sp, of which the full listing is in sqleventlog_sp.sql. The first task is to expand the parameter holders in @msgtext with the values in @p1 to @p6. There is a fallback if the caller has accidently passed a NULL value:

IF @msgtext IS NOT NULL
BEGIN
   EXEC slog.expand_parameter_sp @msgtext OUTPUT, 1, @p1, @str1 OUTPUT
   EXEC slog.expand_parameter_sp @msgtext OUTPUT, 2, @p2, @str2 OUTPUT
   EXEC slog.expand_parameter_sp @msgtext OUTPUT, 3, @p3, @str3 OUTPUT
   EXEC slog.expand_parameter_sp @msgtext OUTPUT, 4, @p4, @str4 OUTPUT
   EXEC slog.expand_parameter_sp @msgtext OUTPUT, 5, @p5, @str5 OUTPUT
   EXEC slog.expand_parameter_sp @msgtext OUTPUT, 6, @p6, @str6 OUTPUT
END 
ELSE
   SELECT @msgtext = 'NO MESSAGE PROVIDED!'

The procedure expand_parameter_sp is mainly an exercise in using the convert() function in T‑SQL and I don't show the code here, but you can find it in the file expand_parameter_sp.sql. As this procedure makes use of features added in SQL 2008, there is a separate version for SQL 2005 in expand_parameter_sp_2005.sql.

After this step, I retrieve the values for the appname, hostname and username columns in sqleventlog, which is not particularly exciting. Instead we jump to the code where the bucket is passed to the next procedure:

IF @@trancount = 0
BEGIN
   EXEC slog.log_insert_sp @logid OUTPUT, @msgid, @errno, @severity,
                           @procid, @msgtext, @errproc, @linenum,
                           @username, @appname, @hostname,
                           @str1, @str2, @str3, @str4, @str5, @str6
END

log_insert_sp is the procedure that actually inserts data into sqleventlog and sqleventlogparameters. If there is no transaction in progress, sqleventlog_sp calls log_insert_sp directly, else it uses the loopback mechanism which I will discuss in the next section. I will return to log_insert_sp in a moment, but let's first complete our journey through sqleventlog_sp.

The next step is to handle the @msgid parameter, which is only needed if we are to raise any error at all. Whence this code:

IF @raiserror = 1 OR (@raiserror IS NULL AND @severity >= 11)
BEGIN
   DECLARE @userlang smallint,
           @syslang  smallint,
           @usermsg  nvarchar(2048)

   SELECT @raiserror = 1

   IF @msgid IS NOT NULL
   BEGIN

I'm not showing the code to determine @userlang and @syslang; as I discussed above, I do this in a makeshift way to keep the demo down in size. Instead we jump to the lookup in usermessages, so that you can see how I have implemented the fallback to get the message in the system language, if there is no message in the user's own language:

SELECT TOP 1 @usermsg = msgtext
FROM   usermessages
WHERE  msgid = @msgid
  AND  lcid IN (@userlang, @syslang)
ORDER  BY CASE lcid WHEN @userlang THEN 1 WHEN @syslang THEN 2 END

Once I have retrieved a message for the user, I go on a new round of parameter expansion with expand_parameter_sp and the TRY block closes with the mother of all fallbacks:

IF @usermsg IS NULL
   SELECT @usermsg = @msgtext

That is, if @msgid did not map to any message at all, use the contents of the @msgtext parameter.

The procedure then closes in the same way as catchhandler_sp. There is a CATCH block for the unthinkable and the RAISERROR statement comes after the CATCH block. In sqleventlog_sp, the severity is capped at 16, since 17 and up is for SQL Server resource errors only. sqleventlog_sp uses RAISERROR WITH NOWAIT, in case you want to use sqleventlog_sp for diagnostic messages for something long-running. This is not very common, so you can argue that this should be a parameter. Then again, it is not likely to cause any problems to always have WITH NOWAIT.

This takes us to log_insert_sp, for which you find the full listing in log_insert_sp.sql. This is a very plain procedure that performs the insertion into the two tables:

INSERT slog.sqleventlog(msgid, errno, severity, logproc,
                        msgtext, errproc, linenum,
                        username, appname, hostname)
   VALUES(@msgid, @errno, coalesce(@severity, 16), @logproc,
          coalesce(@msgtext, 'NO MESSAGE PROVIDED'), @errproc, @linenum,
          coalesce(@username, SYSTEM_USER), @appname, @hostname)

SELECT @logid = scope_identity()

INSERT slog.sqleventlogparameters(logid, paramno, value)
   SELECT @logid, i, p
   FROM   (SELECT 1, @p1 UNION ALL SELECT 2, @p2 UNION ALL
           SELECT 3, @p3 UNION ALL SELECT 4, @p4 UNION ALL
           SELECT 5, @p5 UNION ALL SELECT 6, @p6) AS V(i, p)
   WHERE  p IS NOT NULL

The procedure takes precautions to never insert a NULL value in non-nullable columns. The procedure nevertheless has a CATCH handler, but hopefully it will never be executed. Just before there END TRY, there is a mysterious IF statement with a SELECT of @logid. We will come back to that when we look at the loopback for Azure SQL Database.

Now, if you have paid very close attention, you may wonder what that variable @logproc comes from. Yes, I skipped the first statement in log_insert_sp:

DECLARE @logproc nvarchar(257)
EXEC slog.translate_procid_sp @logprocid, @logproc OUTPUT

The procedure translate_procid_sp looks up @logprocid (which corresponds to the @procid parameter to sqleventlog_sp) and returns the procedure name. The tricky part with this procedure is not so much the code itself:

SELECT @procname = CASE WHEN s.schema_id > 1
                        THEN s.name + '.'
                        ELSE ''
                   END + o.name
FROM   sys.objects o
JOIN   sys.schemas s ON o.schema_id = s.schema_id
WHERE  o.object_id = @procid

No, what we need to keep in mind is that plain users will be calling SqlEventLog, and they may not have permission to see all metadata. They do have access to sys.objects and sys.schemas as such, but there is no guarantee that they can read all rows in these catalog views. For this reason translate_procid_sp.sql also includes code to sign the procedure with a certificate, create a user from that certificate and grant that user VIEW DEFINITION on database level. This ensures that the procedure is always able to read the data. For a detailed description on using certificate signing to attach permissions to stored procedures, see the article Packaging Permissions through Stored Procedures on my website.

Doing the Loopback

One of the key features of SqlEventLog is that if you log a message inside a transaction, the message is persisted even if the transaction is rolled back. This can be useful if you want to log debug messages during a long-running transaction. It also helps to diagnose a problem if an error was logged before the transaction was rolled back.

The ideal solution for this would be autonomous transactions, something which is available in competing products, but not in SQL Server. (There is a long-standing enhancement request from Anthony Bloesch calling for this feature.) However, you can emulate an autonomous transaction by making a new connection back to SQL Server (i.e. a loopback), and this way you can enter data outside your transaction. In this section I will describe how to implement the loopback through a CLR stored procedure. I will also discuss an alternate solution that uses a linked server and an alternative one for Azure SQL Database. I will start with the CLR solution which in my opinion is the best choice.

Note: it's a challenge to write about using the CLR, i.e. .NET, in SQL Server in an article which mainly is about T‑SQL. Some readers are well-acquainted with SQLCLR, whereas others hardly know about it at all. This is not the place to give a complete introduction to write .NET code in SQL Server, but I will try to describe as much as needed to keep the novice readers afloat. However, I have no choice but to assume that you have some knowledge of C#.

In sqleventlog_sp, there is this code which is executed if @@trancount is > 0:

DECLARE @dbname sysname
SELECT @dbname = db_name()

EXEC slog.loopback_sp @@servername, @dbname,
                      @logid OUTPUT, @msgid, @errno, @severity,
                      @procid, @msgtext, @errproc, @linenum,
                      @username, @appname, @hostname,
                      @str1, @str2, @str3, @str4, @str5, @str6

The parameter list is the same as for log_insert_sp, except that there are two more parameters in the beginning, @@servername and @dbname. loopback_sp is implemented in C#, and you find the source code in the file slog_loopback.cs. The top of the file looks like this:

using System;
using System.Data;
using System.Data.SqlClient;
using System.Data.SqlTypes;

[assembly: System.Security.AllowPartiallyTrustedCallers]

public partial class SqlEventLog {
   [System.Security.SecuritySafeCritical]
   [Microsoft.SqlServer.Server.SqlProcedure]
   public static void slog_loopback(
          String     server,
          String     dbname,
      out SqlInt64   logid,
          SqlString  msgid,
          SqlInt32   errno,
          ...
          SqlString  p6)
   {

First there is the regular set of using clauses and a class declaration that you can expect in a C# file. There is an assembly attribute and two method attributes. The method attribute

 [Microsoft.SqlServer.Server.SqlProcedure]

states that this method is a stored procedure in SQL Server. This is followed by the declaration of the method itself. For the first two parameters server and dbname, I use the standard C# type. For all other parameters I use the special Sqlxxx classes from the System.Data.SqlTypes namespace that are able to handle NULL values.

You can see here that p1 to p6 are SqlString. So why aren't they anything that maps to sql_variant? There is no specific class in for sql_variant in SqlTypes, but according to Books Online you can use System.Object for sql_variant. However, when testing on SQL 2008, I found that passing money and smallmoney values produced a TDS error while binary values were always passed as NULL. When testing on SQL 2012, I found that the issue with money and smallmoney had been addressed, but binary values were still passed as NULL. I filed a bug for the problem with binary values. I have not investigated what the situation is in SQL 2014 and later versions, but the bug was closed as External.

The first piece of code is where I set up the connection, and here are some important things to note:

SqlConnectionStringBuilder cstring = new SqlConnectionStringBuilder();
cstring.DataSource = server;
cstring.InitialCatalog = dbname;
cstring.IntegratedSecurity = true;
cstring.Enlist = false;
using (SqlConnection cnn = new SqlConnection(cstring.ConnectionString))

First thing to observe is that I use the class SqlConnectionStringBuilder rather than just forming the string by concatenation. This is a cleaner way to build a connection string, and it's essential to use this class when things like server and database are parameters. While very unlikely, they could include a semicolon in the name which would cause problems.

You may ask why server and dbname are parameters? Doesn't the C# stored procedure know in which database it is executing? And, by the way, the server could be left out, doesn't it default to "(local)"? Yes, if you leave out the server, the connection will be made to the local default instance, but you may be running in a named instance... And while the C# code could retrieve the name of the server and the database by running SELECT @@servername, db_name() on the context connection, it seemed simpler to have sqleventlog_sp to retrieve those values before calling loopback_sp.

I use integrated security for the loopback, and the net effect of this is that the loopback connection is always performed by the service account for SQL Server.

This line is extremely critical:

cstring.Enlist = false;

Without it, SqlClient will extend the current transaction to the new connection, thinking that we want a distributed transaction – the exact opposite of what we want.

The rest of the C# code sets up the call and defines all the parameters to log_insert_sp and this nothing I take up space with here.

The C# code has to be added to SQL Server, and the way to do that is to create an assembly. An assembly in SQL Server is always created with a certain permission set. The default is SAFE which you use for assemblies that perform all operations in .NET and do not access resources outside SQL Server. This may sound as a fit for the loopback procedure, but to make the loopback connection, we have to go outside SQL Server if only to come back. For this reason, the loopback assembly requires the permission set EXTERNAL_ACCESS and creating an assembly with this permission calls for some extra steps. The first is that when you compile the C# file you need to sign the assembly with a key, which you do like this:

SN -k keypair.snk
CSC /target:library /keyfile:keypair.snk slog_loopback.cs

SN presents itself as .NET Framework Strong Name Utility, and SN -? reveals that it can do quite a few things, but the only option of interest to us is ‑k which creates a pair of a public and a private key. By including the /keyfile option when compiling the DLL, you sign the assembly with the key. (In .NET parlance this creates a strong name, a term which I find more confusing than helpful.)

All modern Windows installations ship with CSC, the C# compiler (to be found in C:\Windows\Microsoft.NET\Framework\v2.0.50727), but to get access to SN you need to install Visual Studio or the .NET Framework SDK, the latter is a free download.

Note: Rather than compiling on the command line, you can build and deploy the assembly from Visual Studio. However, I am not position to describe how to do this, because Visual Studio goes over my head.

Next step is to create the assembly, and the code for this is in the file slog_loopback_assem.sql. This file consists of two parts. The first part operates in the master database and the purpose is to load the key in keypair.snk as an asymmetric key in SQL Server:

USE master
go
DECLARE @sql      nvarchar(MAX),
        @password char(40)
SELECT @password = convert(char(36), newid()) + 'Ab4?'

SELECT @sql = 'CREATE ASYMMETRIC KEY slog_loopback FROM FILE = ' +
              '''$(CD)\keypair.snk'' ' +
              'ENCRYPTION BY PASSWORD = ''' + @password + ''''

PRINT @sql
EXEC(@sql)

When you create a key in SQL Server it must be protected by something, another key or a password. We use a password, but in this case there is no reason to track the password – if you were to replace the assembly, you would generate a new key and drop the old one. For this reason I create the password as a GUID and add some extra characters from different groups to ensure that I pass the complexity test for passwords in Windows. Because CREATE ASYMMETRIC KEY does not accept a variable for the password, I need to use dynamic SQL. The notation $(CD) looks sort of odd here, but this is an SQLCMD variable for the current directory and it is set in build_sqleventlog.bat.

Once the key has been created, I create a login from the key and grant this key the permissions EXTERNAL ACCESS ASSEMBLY and UNSAFE ASSEMBLY.

CREATE LOGIN slog_loopback$asymkey FROM ASYMMETRIC KEY slog_loopback
GRANT EXTERNAL ACCESS ASSEMBLY TO slog_loopback$asymkey
GRANT UNSAFE ASSEMBLY TO slog_loopback$asymkey

You may be concerned by a login created on your server, but don't worry. It is not a login that can log in; it is only a container where to put the permissions, as keys cannot be granted permissions directly. The use of a dollar character in the login name is just a private convention I have embraced; there is no secret meaning with it.

In the second part of the file I return to the target database and load the assembly itself:

USE $(SQLCMDDBNAME)
go
CREATE ASSEMBLY slog_loopback FROM '$(CD)\slog_loopback.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS

Because the assembly is signed with the key in master, the assembly is granted the access level. If you would try to create the assembly without the key, creation would fail with this error:

Msg 10327, Level 14, State 1, Line 1

CREATE ASSEMBLY for assembly 'slog_loopback' failed because assembly 'slog_loopback' is not trusted. The assembly is trusted when either of the following is true: the assembly is signed with a certificate or an asymmetric key that has a corresponding login with UNSAFE ASSEMBLY permission, or the assembly is trusted using sp_add_trusted_assembly.

On SQL 2016 and earlier the text is different, but the gist is more or less the same. When the server configuration option clr strict security is on, and this is the default starting with SQL 2017, all assemblies are considered unsafe, and thus the permission UNSAFE ASSMEBLY is needed. With CLR strict security turned off, the permission EXTERNAL ACCESS ASSEMBLY is suffiicient, as that the permission set of the assembly is EXTERNAL ACCESS. To keep things simple, I have included both permissions in the script file.

To complete the creation of loopback_sp, there is one more file, loopback_sp.sql, of which the essential parts look like this.

CREATE PROCEDURE slog.loopback_sp
                 @server    sysname,
                 @dbname    sysname,
                 @logid     bigint OUTPUT,
                 ...
                 @p6        nvarchar(400)
AS EXTERNAL NAME slog_loopback.SqlEventLog.slog_loopback

That is, a parameter list that matches the C# method, and instead of the body there is an EXTERNAL NAME clause which consists of a three-part name: assembly, class and method.

Finally, to be able to run user-created CLR code, the server must be configured to permit this:

EXEC sp_configure 'clr enabled', 1
RECONFIGURE

The default setting for this setting is 0, but there rarely any reason to disallow execution of CLR code.

Before we leave the theme of the CLR, I need to mention that there is one restriction with the CLR loopback: If the SQL code that calls the loopback itself has been invoked from a CLR stored procedure in a SAFE assembly, the loopback will fail. I discuss this issue further and suggest a workaround in the CLR Appendix.

Doing the Loopback with a Linked Server

In sqleventlog_sp.sql there is this code that is commented out:

DECLARE @spname nvarchar(200) = 'LOOPBACK.' + quotename(@dbname) +
                                '.slog.log_insert_sp'
EXEC @spname @logid OUTPUT, @msgid, @errno, @severity,
             @procid, @msgtext, @errproc, @linenum,
             @username, @appname, @hostname,
             @str1, @str2, @str3, @str4, @str5, @str6

This is the second way to do the loopback: use a linked server, and here I have hardcoded the name to be LOOPBACK. Since I need to make sure that I come back to the same database, I form the procedure name dynamically and make use of that EXEC accepts a variable for the procedure name.

You find definition for the linked server LOOPBACK in the file loopback_svr.sql of which the full contents is:

IF EXISTS (SELECT * FROM sys.servers WHERE name = 'LOOPBACK')
   EXEC sp_dropserver 'LOOPBACK'
EXEC sp_addlinkedserver 'LOOPBACK', '', 'SQLNCLI', @@servername
EXEC sp_serveroption  'LOOPBACK', 'remote proc transaction promotion', 'false'
EXEC sp_serveroption  'LOOPBACK', 'rpc out', 'true

That is, I first set up the linked server itself. SQL Server will map SQLNCLI to the matching version of SQL Server Native Client. Next I set two server options. rpc out is needed to permit the procedure call. The very critical option is remote proc transaction promotion. This option has the same effect as cstring.Enlist = false in the C# procedure: it prevents the new connection from becoming part of the active transaction.

At this point, the reader probably says to himself Gosh! That was so much easier than all mumbo-jumbo with C#, keys, permissions etc! I will use a linked server instead! However, I advise you to hang on, and you may find that the linked server is not that simple after all. The first thing to note is that if you are on SQL 2005, you don't have much of a choice. The option remote proc transaction promotion was added in SQL 2008, so on SQL 2005 you must use the CLR loopback.

Next point worth considering is that with the CLR solution, the loopback connection is performed by the system account for SQL Server which is sysadmin. With the linked server, the loopback is performed in the context of the current user, which must have permission to run log_insert_sp.

A big concern is if you engage in impersonation somewhere, that is, you use EXECUTE AS, either as a statement or as clause in a stored procedure. In this case, the loopback over the linked server no longer works. Consider this batch where I create login and a user which I for the purpose of the demo grant permission on sqleventlog_sp. Then I go on to impersonate that user.

CREATE LOGIN YoMama WITH PASSWORD = 'Managua'
CREATE USER YoMama
GRANT EXECUTE ON slog.sqleventlog_sp TO YoMama
go
EXECUTE AS LOGIN = 'YoMama'
BEGIN TRANSACTION
EXEC slog.sqleventlog_sp @@procid, 'This is a loopback message'
ROLLBACK TRANSACTION
REVERT
go
DROP USER YoMama
DROP LOGIN YoMama

The output from this is:

Msg 50000, Level 16, State 1, Procedure sqleventlog_sp, Line 184

slog.sqleventlog_sp failed with Linked servers cannot be used under impersonation without a mapping for the impersonated login.

Original message: This is a loopback message

Msg 266, Level 16, State 2, Procedure sqleventlog_sp, Line 0

Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 1, current count = 0.

Msg 3903, Level 16, State 1, Line 7

The ROLLBACK TRANSACTION request has no corresponding BEGIN TRANSACTION.

You may find this to be an edge case, but SqlEventLog is intended to be a general solution to be used in many applications. Sooner or later you will have code that uses impersonation – for instance, you may start using Service Broker where EXECUTE AS is mandatory for activation procedures. As the error message suggests, you can work around the issue by setting up a login mapping, but 1) then you need to know what logins that are being impersonated beforehand, 2) login mappings requires that you enable SQL Server authentication for the server.

The CLR solution is more difficult to set up, but once it is in place it will continue working in all situations. (Well, almost. You cannot use it in doomed transactions, where the linked server may work. I have actually not tested the latter.)

SqlEventLog on Azure SQL Database

If you are on Azure SQL Database, you cannot use neither the CLR nor a linked server for the loopback, since none of these features are available. Daniel Verzellesi faced this problem, and he was kind to share his solution with me.

Note: This section only applies to Azure SQL Database, and not to Azure SQL Managed Instance. On Managed Instance, you should be able to use the CLR solution. I have however not tested, since I don't have access to any Managed Instance.

To load this solution, you would use the file build_sqleventlog_azure.bat. There are a multitude of ways to authenticate to Azure SQL. I have geared the file for SQL authentication, and the configuration part looks like this:

SET SERVER=xxxxx.database.windows.net
SET DATABASE=
SET USER=
SET PASSWORD=

If you prefer to use Azure Active Directory or some other authentication option, just change the BAT file accordingly.

Note: when you run build_sqleventlog_azure.log, you wil get an error message when loading slog_schema.sql that the USE statement is not supported. This error is benign, and you can ignore this error. (The script wants to move to master to perform some cleanup related to the assembly, but these actions do not need to be performed on Azure SQL Database anyway.)

This BAT file does not load the files related to the assembly, but instead it loads loopback_azure.sql which includes the Azure-specific loopback. This file consists of two parts, a configuration part and a stored procedure.

The configuration part creates three things:

A database user LoopbackUser with a random password. This user is granted EXECUTE permission on log_insert_sp, but no other permissions.
A database scoped credential LoopbackCredential that maps to LoopbackUser.
An external data source Loopback that uses the credential.

When you re-run the script, these three items are dropped and re-created. The script uses dynamic SQL, and I am not showing that code, but here is an example of code that is generates:

CREATE USER LoopbackUser WITH PASSWORD = 'AE498EB5-7DC8-48FD-A66F-16023A43D8AFAa1+'
GRANT EXECUTE ON slog.log_insert_sp TO LoopbackUser

CREATE DATABASE SCOPED CREDENTIAL LoopbackCredential
      WITH IDENTITY = 'LoopbackUser', 
      SECRET = 'AE498EB5-7DC8-48FD-A66F-16023A43D8AFAa1+'

CREATE EXTERNAL DATA SOURCE Loopback WITH (
   TYPE = RDBMS,
   LOCATION = 'NightSchool.database.windows.net',
   DATABASE_NAME = 'StEtienne',
   CREDENTIAL = LoopbackCredential)

Note: There is no way to access the FQDN for an Azure SQL Database from T‑SQL. The script assumes that the domain is database.windows.net, which is the most common domain. If your database is in a different domain, you will need to edit the file accordingly.

The second part of the script includes a T‑SQL implementation of slog.loopback_sp which uses sp_execute_remote to run the loopback. The first parameter is the name of the external data source. Remaining parameters are akin to sp_executesql. That is, the second parameter is @stmt, the statement to execute and the third is @params, with the parameter list for @stmt. Remaining values are the parameter values for the parameters in @params.

There is however, an important difference to sp_executesql here: sp_execute_remote does not support output parameters, which causes a small problem, since log_insert_sp returns @logid as an output parameter. However, it is possible to capture result sets from sp_execute_remote with INSERT-EXEC. For this reason, log_insert_sp includes this passage:

IF convert(int, serverproperty('EngineEdition')) = 5 AND USER = 'LoopbackUser'
   SELECT @logid

The call to serverproperty checks that we are in Azure SQL Database, and the explicit check on the user name is to verify that we are in the loopback. Then, and only then, should we produce this result set. (Thus, if you decide to call the user something else, you will need to change log_insert_sp accordingly.)

Permissions

As long as you use the CLR loopback, the only permission that must be granted for SqlEventLog is EXTERNAL ACCESS ASSEMBLY, which is granted to the asymmetric-key login, as discussed above in the section Doing the Loopback.

The assumption here is that you make dbo the owner of the slog schema, and that dbo owns all other stored procedures in the database too so that ownership chaining applies. However, if your database has multiple procedure owners, you will need to grant EXECUTE permissions on the public interface procedures slog.sqleventlog_sp and slog.catchhandler_sp.

If you use the linked-server loopback rather than the CLR loopback, you will need to grant users EXECUTE permission on slog.log_insert_sp because ownership-chaining does not apply over the loopback connection.

I feel obliged to add one caveat, although it may not be called for. When you use the CLR loopback, you make the loopback connection as the service account for SQL Server. And when the SQL Server connects to itself, it is always sysadmin, even if the service account has not been added as a login in sys.server_principals at all. Thus, there is no need to grant permissions on log_insert_sp. What I don't know is exactly the mechanism through which the service account becomes sysadmin, and whether it can be removed from that role. But it goes without saying that if this happens, you will need to add the service account as a user in the database and grant it EXECUTE permission on log_insert_sp. Or change the connect string to use a dedicated user for the task.

Conclusion

We have now implemented an advanced solution for logging and raising errors. Does it live up to all requirements we can ask from a robust error-handling solution as I outlined in the preceding chapter? While I have presented is better than trying to use SAVE TRANSACTION, the answer has to be negative. The main issue is that SQL Server does not check procedure calls on compile time, so the error-handling code may fail if you confuse the parameters or misspell the name of an optional parameter. Or for that matter, you forget to specify the slog schema in the call. (Although, this does at least produce an informational message about the missing procedure.) I have a tip though: if you use SQL Server Data Tools to build and maintain your schema, it will give you warnings for incorrect procedure calls. (As will my own AbaPerls.)

Furthermore, while your typical CATCH handler will have no more than three lines of code, a programmer that is dozy enough can still mess them up. And we should not forget that there is the eternal problem with compilation errors that you cannot catch in the local scope.

All and all, it is not a perfect solution, but it is a fairly good solution to be SQL Server.

In the next chapters we will look at a how to use SqlEventLog.

Common Patterns for Error Handling

In this chapter we will look at some general patterns for error handling. In the following chapter we will engage in a longer example where we will employ these patterns as well as add a few more.

The Standard Routine

As part of the jumpstart error handling in Part One I presented a standard model for error handling that you should use in the vast majority of cases. In Part One, I did not go too much into why you should follow the model, but as you have progressed through Part Two and this far of Part Three the rationale should be clear by now. However, repetition is the mother of learning, so let's do a recap now that I have introduced SqlEventLog.

This is the general pattern that you should use in your stored procedures:

CREATE PROCEDURE your_sp @par1 ... AS
SET XACT_ABORT, NOCOUNT ON 
BEGIN TRY
   -- your code goes here
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
   RETURN 1
END CATCH

In the CATCH block you should always roll back any open transaction. Keep in mind that this standard routine is for unanticipated errors. That is, when you come here, you typically have no idea of what is going on. Therefore you should always roll back, so that you don't commit incomplete, inconsistent or incorrect data. If you forget the rollback, catchhandler_sp will take care of that for you. (But this may result in an extra error message because of the trancount mismatch.)

Next is a call to a stored procedure that reraises and potentially also logs the error. Obviously, I think you should use catchhandler_sp, but if you don't care about logging the errors, you can use error_handler_sp or something you write yourself. The important thing is that you not should start meddling with all the error_xxx() functions in every CATCH handler, because that would be code-littering.

If you are on SQL 2012 or later, and you don't care about logging the errors you can also use ;THROW:

BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   ;THROW
END CATCH

Don't forget the semicolon! Since ;THROW aborts the batch, you can skip the RETURN statement when you use ;THROW, since it will never be executed. But if you use catchhandler_sp or error_handler_sp, you need the RETURN statement for two reasons:

The procedure may be called from an old-style stored procedure, which relies on @@error and return codes rather than TRY-CATCH.
The procedure may be called from a remote server, in which case a reraise will not trigger a CATCH handler in the calling server. (See Appendix 1 for details.)

Don't make the false assumption that a particular stored procedure will not be called in any of these ways. Code reuse is a virtue, and you don't know how your procedure will be used two years from now.

Let's now take a closer look at the procedure header:

CREATE PROCEDURE your_sp @par1 ... AS
SET XACT_ABORT, NOCOUNT ON 
BEGIN TRY

That is, the only line between the AS of CREATE PROCEDURE and BEGIN TRY should be

SET XACT_ABORT, NOCOUNT ON

Why XACT_ABORT should be ON is something I have hammered throughout this series of articles: to get a more consistent behaviour when errors occur and to prevent that compilation errors or attention signals leave you with an orphaned transaction. NOCOUNT has nothing to do with error handling, but having NOCOUNT ON is best practice for performance reasons. These two options should be ON as a norm. That is, every stored procedure and trigger should include commands to set these options. (User-defined functions? No, SET is not permitted in user-defined functions.)

But why put the command in that place, as the only command before BEGIN TRY, and as a single SET command and not one SET command per option, which is the customary? The idea is that the code between BEGIN TRY and END TRY should include the main business logic and this logic should not be occluded by the noise of statements that appear in every procedure. That is, by making it a single SET statement, and putting it before BEGIN TRY, the statement is there without distracting the eye of someone who is reading the code to grasp the business logic. But if you are conducting a code review focused on error handling, you know exactly where to look, and if the SET line is missing, you can flag this in your review.

Occasionally, you may have the need to produce the row counts suppressed by SET NOCOUNT ON. So what do you do in this case? Modify the header? No! You add an explicit SET NOCOUNT OFF in the main body of the procedure, because that NOCOUNT OFF is part of your business logic. The same applies, if you need to fiddle with XACT_ABORT, and we will look into this more in the next section.

As I said, the part between BEGIN TRY and END TRY should include all your main business logic. This includes variable declarations, creations of temp tables etc. You may think of these as static declarations, but CREATE TABLE is always an executable statement that could fail at run-time. And from SQL 2008, DECLARE accept variables initialisations, which makes DECLARE an executable statement that can fail as well. And most of all, that BEGIN TRY is also a piece of noise, so make it as invisible as possible and put it at the very beginning – not in the middle of your business logic. (To make it clear: I'm talking about the BEGIN TRY that wraps the entire business logic. There are situations where you need an inner TRY-CATCH for a specific piece of your code, but that inner block is part of your business logic.)

What I have covered in this section is the general pattern. As we move on and look at more extensive examples, we will see deviations from this pattern, even to the extent that we encounter procedures without TRY-CATCH at all. As we will see, there are good reasons for these deviations, but they are not very common. The main bulk of your procedures should follow the pattern presented here.

Handling Anticipated Errors

As I discussed in the section A General Philosophy on Error Handling, errors can be unanticipated or anticipated. The main focus for error handling is the unanticipated errors, because they are the ones that can cause our code do bad things if they are not handled properly. An anticipated error is when you perform an action fully knowingly that it can result in error, and you have a plan B if this happens.

In the section of general philosophy I gave one example: you write a stored procedure of which the purpose is to delete a row if there are no references to it. If there is one or more reference to the row, the procedure should instead set the column isactive to 0. If there are only two-three referencing tables, you could check each of them. But if there are thirty referencing foreign keys and still counting with new releases of the system, this becomes untenable. An alternative approach is to attempt to delete the row and if this results in an error fall back to updating the flag.

Here is an example on how to do that. You may find that the code is a little more elaborate than you expected:

CREATE PROCEDURE delete_customer @custid int AS
SET XACT_ABORT, NOCOUNT ON
BEGIN TRY
   BEGIN TRY
      SET XACT_ABORT OFF
      DELETE Customers WHERE CustomerID = @custid
      SET XACT_ABORT ON
   END TRY
   BEGIN CATCH
      SET XACT_ABORT ON
      IF error_number() = 547 AND  -- 547 = Constraint violation
         parsename(error_procedure(), 1) = 'delete_customer'
      BEGIN
         UPDATE Customers
         SET    isactive = 0
         WHERE  custid = @custid
      END
      ELSE
      BEGIN 
         IF @@trancount > 0 ROLLBACK TRANSACTION
         EXEC slog.catchhandler_sp @@procid
      END
   END CATCH
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
   RETURN 1
END CATCH

There is a nested TRY-CATCH block here. You may think that you could put the fallback in the outer CATCH block, but that would not be right. The outer CATCH block is the standard handler for unanticipated errors, but the inner CATCH block with the fallback is part of your main business logic, and thus it should appear within the outermost TRY block. It is not only a matter of aesthetics. Who says that the UPDATE statement cannot fail? Maybe someone adds a trigger to prevent deactivation of a customer with an outstanding order. That would be an unanticipated error on your part.

Next thing to observe is how we handle XACT_ABORT ON. We don't change the header, that line reads as always:

SET XACT_ABORT, NOCOUNT ON

To be able to have a fallback for the DELETE command, we need to lower our guard and set XACT_ABORT OFF so that the constraint violation does not doom any transaction. (There is no transaction in the procedure, but the caller may have started one.) Thankfully, a constraint violation is a statement-terminating error that does not doom the transaction when XACT_ABORT is OFF. The window where XACT_ABORT is OFF should be as short as possible (to avoid orphaned transactions due to attention signals), which is why there are two SET XACT_ABORT ON. One at the end of the TRY block, and one first in the CATCH block. (If you have read the chapter on In-Memory OLTP in Part Two, you may recognise the pattern from the procedure DontLetYourMeatLoaf.)

Turning to the inner CATCH block, there is a check to verify that the error is indeed a constraint violation and nothing else. There is also a check that the error was raised in this very procedure. 547 is a general message for foreign-key and CHECK constraint violations. There could be a trigger on the table, and the error could come from that trigger, for instance a constraint violation when inserting into an auditing table. For this reason, I have added the check on the procedure name. Since error_procedure() can return a three-part name on SQL 2017 and larter, depending on how the procedure was called, I use parsename to extract only the name part.

If the error number is not 547 or it was raised elsewhere, the error is unanticipated and we should roll back the transaction and reraise the error to get to the outer CATCH block.

In delete_customer there is only a single action, so the nested TRY blocks and the SET XACT_ABORT commands may seem a bit too much. But imagine that this operation is part of a longer procedure:

CREATE PROCEDURE delete_customer_and_more @custid int AS
SET XACT_ABORT, NOCOUNT ON
BEGIN TRY
   -- Some initial code here.

   BEGIN TRANSACTION
   
   -- Some initial updates in the transaction
   
   -- Update or deactivate customer
   BEGIN TRY
      SET XACT_ABORT OFF
      DELETE Customers WHERE CustomerID = @custid
      SET XACT_ABORT ON
   END TRY
   BEGIN CATCH
      SET XACT_ABORT ON
      IF error_number() = 547 AND  -- 547 = Constraint violation
         parsename(error_procedure(), 1) = 'delete_customer_and_more'
      BEGIN
         UPDATE Customers
         SET    isactive = 0
         WHERE  custid = @custid
      END
      ELSE
      BEGIN 
         IF @@trancount > 0 ROLLBACK TRANSACTION
         EXEC slog.catchhandler_sp @@procid
      END
   END CATCH

   -- More updating code here

   COMMIT TRANSACTION
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
   RETURN 1
END CATCH

Now, the nested TRY block makes perfect sense. And it no longer looks funny with SET XACT_ABORT OFF in the middle of procedure. There is little reason that the rest of the code should be running with this setting.

Raising Your Own Errors

SqlEventLog offers sqleventlog_sp to raise and log errors, and thus your typical validation code would go:

IF NOT EXISTS (SELECT * FROM Customers WHERE CustomerID = @custid)  
   EXEC slog.sqleventlog_sp @@procid, 'No such customer %1', @p1 = @custid

However, you could argue if all these are true:

You don't care about the @msgid parameter.
You have no desire to continue executing and will roll back the transaction.
You have a CATCH handler which calls catchhandler_sp.

You might as well use RAISERROR instead:

IF NOT EXISTS (SELECT * FROM Customers WHERE CustomerID = @custid)  
   RAISERROR ('No such customer %d', 16, 1, @custid)

This has the small advantage that the error message raised by catchhandler_sp will include the actual procedure name and line number. If you use sqleventlog_sp, you will need to look up the procedure name in the sqleventlog table and you will have to search the procedure code for the message to find where it was raised.

In the validation examples later in this article, I will consistently use sqleventlog_sp, but I wanted to make you aware of this choice.

Retry Logic

For most errors there is little reason to try again. If it failed with a constraint violation the first time, there is all reason to believe that it will fail the next time too. But there is a small class of errors that are due to interaction with other processes where a new attempt has good chances to succeed. The by far most common of these errors is error 1205: Transaction (Process ID %d) was deadlocked on %.*ls resources with another process and has been chosen as the deadlock victim. Rerun the transaction. This occurs when two (or more) processes want to acquire locks on resources that are locked by the other process so that none of them can continue. SQL Server constantly looks for this condition, and when it finds a deadlock, it chooses one of them as the deadlock victim and injects a batch-and-transaction aborting error into that process.

Implementing retry logic for deadlocks may seem like a simple thing – just check if error_number() returns 1205 and in that case, start over from the beginning. And for a plain one-statement stored procedure it can be that simple. However, as soon as your stored procedure has a couple of statements, there can be quite a few things you need to think of, as the up-coming example will demonstrate. For the example we use this table:

CREATE TABLE FreakOut (id           int              NOT NULL PRIMARY KEY,
                       guid         uniqueidentifier NOT NULL UNIQUE,
                       Minnesota    varchar(40) NULL,
                       Kansas       varchar(40) NULL,
                       WashingtonDC varchar(40) NULL)

The procedure HelpImARock inserts data from an XML document into the FreakOut table. The procedure returns two output parameters: the lowest id used and the number of rows inserted. Given the logic of the procedure, there is a guarantee that the ids are in a contiguous range. Here is how this procedure looks like in its original shape:

CREATE PROCEDURE HelpImARock @data     xml, 
                             @firstid  int OUTPUT, 
                             @noofrows int OUTPUT AS
SET XACT_ABORT, NOCOUNT ON
BEGIN TRY
   CREATE TABLE #temp (MN  varchar(50) NOT NULL,
                       KS  varchar(50) NOT NULL,
                       DC  varchar(60) NOT NULL)

   INSERT #temp (MN, KS, DC)
      SELECT T.c.value('@Minnesota',    'varchar(50)'),
             T.c.value('@Kansas',       'varchar(50)'),
             T.c.value('@WashingtonDC', 'varchar(50)')
      FROM   @data.nodes('/FreakOut') AS T(c)

   DECLARE @guids TABLE (guid uniqueidentifier NOT NULL UNIQUE)

   BEGIN TRANSACTION

   SELECT @firstid = isnull(MAX(id), 0) + 1
   FROM   FreakOut WITH (HOLDLOCK)

   WAITFOR DELAY '00:00:10'

   INSERT FreakOut(id, guid, Minnesota, Kansas, WashingtonDC)
      OUTPUT inserted.guid INTO @guids
      SELECT @firstid + row_number() OVER(ORDER BY MN) - 1, newid(),
             MN, KS, DC
      FROM   #temp
   SELECT @noofrows = @@rowcount

   COMMIT TRANSACTION

   SELECT FO.*
   FROM   FreakOut FO
   WHERE  EXISTS (SELECT * FROM @guids g WHERE FO.guid = g.guid)
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
   RETURN 1
END CATCH

The reader may find a few things a little odd here, but the procedure is not intended to be a showcase of best practice for this type of insert operations. It is a procedure that is designed to produce deadlocks. If two processes run this procedure in parallel, both will acquired a shared range lock on the FreakOut table to prevent a new MAX value to be added. (HOLDLOCK is the same as SERIALIZABLE) whereupon they will block each other when they try to do just that in the INSERT statement. The sole purpose of the WAITFOR is to make it simple to reproduce the deadlock by using query windows in SSMS.

Note: a proper implementation of the procedure would use the UPDLOCK hint instead to prevent deadlocks. Only one process at a time can have an UPDLOCK on a resource, so with hint the process that comes second will be blocked already on the SELECT.

To test the procedure, run this in one query window:

DECLARE @firstid int, @noofrows int
EXEC HelpImARock '<FreakOut Minnesota="Could you imagine?" 
                            Kansas="It can''t happen here"
                            WashingtonDC="You''re safe baby"/>
                  <FreakOut Minnesota="Twin cities" 
                            Kansas="Topeka"
                            WashingtonDC="Potomac"/>',
                 @firstid OUTPUT, @noofrows OUTPUT
SELECT @firstid, @noofrows

And in another window run:

DECLARE @firstid int, @noofrows int
EXEC HelpImARock '<FreakOut Minnesota="MinneMinneMinneMinneSota" 
                            Kansas="KansasKansasTotoTotoToto"
                            WashingtonDC="AC-DC AC-DC AC-DC"/>',
                 @firstid OUTPUT, @noofrows OUTPUT
SELECT @firstid, @noofrows

One of the windows will succeed, whereas the other window will produce this error message:

Msg 50000, Level 13, State 48, Procedure catchhandler_sp, Line 125

{1205} Procedure HelpImARock, Line 25

Transaction (Process ID 53) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Here is a second version of the procedure where I have added retry logic. The first thing to note is that the procedure has grown from 49 lines to 71 lines.

ALTER PROCEDURE HelpImARock @data     xml, 
                            @firstid  int OUTPUT, 
                            @noofrows int OUTPUT AS
SET XACT_ABORT, NOCOUNT ON
BEGIN TRY
   DECLARE @trancount_save int = @@trancount

   CREATE TABLE #temp (MN  varchar(50) NOT NULL,
                       KS  varchar(50) NOT NULL,
                       DC  varchar(60) NOT NULL)

   INSERT #temp (MN, KS, DC)
      SELECT T.c.value('@Minnesota',    'varchar(50)'),
             T.c.value('@Kansas',       'varchar(50)'),
             T.c.value('@WashingtonDC', 'varchar(50)')
      FROM   @data.nodes('/FreakOut') AS T(c)

   DECLARE @guids TABLE (guid uniqueidentifier NOT NULL UNIQUE)

   DECLARE @trycnt int = 1
   WHILE @trycnt > 0
   BEGIN TRY
      SELECT @firstid = NULL, @noofrows = NULL
      DELETE @guids

      BEGIN TRANSACTION

      SELECT @firstid = isnull(MAX(id), 0) + 1
      FROM   FreakOut WITH (HOLDLOCK)

      IF @trycnt = 1 WAITFOR DELAY '00:00:10'

      INSERT FreakOut(id, guid, Minnesota, Kansas, WashingtonDC)
         OUTPUT inserted.guid INTO @guids
         SELECT @firstid + row_number() OVER(ORDER BY MN) - 1, newid(),
                MN, KS, DC
         FROM   #temp
      SELECT @noofrows = @@rowcount

      SELECT FO.*
      FROM   FreakOut FO
      WHERE  EXISTS (SELECT * FROM @guids g WHERE FO.guid = g.guid)

      COMMIT TRANSACTION

      SELECT @trycnt = 0
   END TRY
   BEGIN CATCH
      IF @@trancount > 0 ROLLBACK TRANSACTION
      IF @trancount_save = 0 AND
         (error_number() = 1205 OR substring(error_message(), 1, 6) = '{1205}')
      BEGIN
         EXEC slog.sqleventlog_sp @@procid, 
                                  'Deadlock occurred in attempt %1, retrying',
                                  @severity = 1, @raiserror = 1, @p1 = @trycnt
         SELECT @trycnt = @trycnt + 1
      END
      ELSE
         EXEC slog.catchhandler_sp @@procid
   END CATCH
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
   RETURN 1
END CATCH

Directly after BEGIN TRY we see this line:

DECLARE @trancount_save int = @@trancount

I save the current value of @@trancount in the beginning of the procedure. This is very important: if your procedure is called within a transaction in progress, you must never attempt a deadlock retry. The deadlock dooms the transaction and you must roll back the outer work as well. If you would redo your work only, that would be very wrong. I repeat that: very wrong.

This statement is followed by the original CREATE TABLE statement and the population of the temp table which is outside the retry scope. This is also an important point. This work cannot be redone. Or more exactly: the work cannot be redone with less than we add code to skip CREATE TABLE if the temp table exists and make sure it is empty before we reinsert. There is obviously little reason to do this here. The gist is that when you implement retry logic, you need to identify initial work that should not be reattempted. There is also a declaration of a table variable, which is seemingly innocuous in this context. The declaration is static, so if it would happen to be part of the retry scope, that would not pose a problem. ...until someone changes the table variable to be a temp table for performance reasons. Thus, handle table variables in the same way as creation of temp tables, and declare them outside the retry scope.

Next we arrive to a WHILE loop of which the body is a TRY-CATCH block:

   DECLARE @trycnt int = 1
   WHILE @trycnt > 0
   BEGIN TRY
      -- Loop init
      BEGIN TRANSACTION
      ...
      COMMIT TRANSACTION
      SELECT @trycnt = 0
   END TRY
   BEGIN CATCH
      ...
   END CATCH

It is the TRY block that defines the actual retry scope, that is, the code which is retried in case of a deadlock. Before we dive into the TRY block, let's first analyse the CATCH block. The first action is, as always, to roll back the transaction. Next, I check if the error was a deadlock, error number 1205. Since the error may have occurred in an inner stored procedure or trigger and been reraised by SqlEventLog, I also check if 1205 appears in the message text in the special format that catchhandler_sp uses. And most importantly, I check @tranccount_save, so that I don't retry only half a transaction.

If there is a deadlock, I log an informational message (@severity = 1) about the deadlock to the sqleventlog table and increment @trycnt. If the error is not a deadlock, or a retry is not permissible because of an outer transaction, the error is reraised and execution is transferred to the outer CATCH block.

Note: The call to sqleventlog_sp includes the parameter @raiserror = 1. Please be aware of that this is only for the sake of the demo! You should not use this parameter setting in production code.

Let's now look at the TRY block. What we have between BEGIN and COMMIT TRANSACTION is the logic in the original procedure with two changes. For the sake of the demo, I have added an IF statement for the WAITFOR, so that it's not executed on retries. The second change is an important one: I have moved COMMIT TRANSACTION to come after the SELECT statement. The SELECT statement could fail with a deadlock, but it would be very wrong to redo a committed transaction. Alternatively, I could have moved the SELECT statement outside of the retry scope, that is, after the WHILE loop.

There are two statements at the beginning of the loop, before BEGIN TRANSACTION:

SELECT @firstid = NULL, @noofrows = NULL
DELETE @guids

That is, I reset all local variables that are assigned in the retry scope. In this particular procedure this may not be needed for the variables @firstid and @noofrows, but I wanted to point to the principle. If you have a local variable which is incremented in the retry scope, the variable would be doubly incremented in case of a retry. It's a little more apparent with the @guids table. Say that you get a deadlock in the final SELECT statement. The transaction is rolled back – but table variables are unaffected by rollbacks, so you must clear the table variable yourself, or else the old GUIDs will stick in the table.

Now, with the improved procedure in place, run the two test cases again. Both windows will now return the two result sets from HelpImARock, but one of them will first first have this message in its output:

Deadlock occurred in attempt 1, retrying

Msg 50000, Level 1, State 1

Change the windows with the test batches, so that they are wrapped in a transaction:

BEGIN TRANSACTION 
DECLARE @firstid int, @noofrows int
EXEC HelpImARock '<FreakOut Minnesota="MinneMinneMinneMinneSota" 
                            Kansas="KansasKansasTotoTotoToto"
                            WashingtonDC="AC-DC AC-DC AC-DC"/>',
                 @firstid OUTPUT, @noofrows OUTPUT
SELECT @firstid, @noofrows
COMMIT TRANSACTION

The deadlocks are back in town, since there is no retry when there is an outer transaction.

As you see, implementing retry logic is not trivial, and there are several things to watch out for. I tried to put in as much as I could think of in this example, but it is not likely to be exhaustive. For instance, if there is a cursor in your transaction, you need to add precautions so that your cursor is restarted, and do not continue where it was when the deadlock occurred.

Another aggravating factor is that the retry logic adds litter to the code and hides the main business logic of the procedure.

But the biggest challenge is yet to be mentioned: to wit, testing. Untested retry code can lead to bugs that are very difficult to understand and to reproduce, since the system only behaves incorrectly when there is a deadlock in a certain place. I designed this example so that I would get deadlocks for the sake of the demo. But normally, you don't design for deadlocks; you (should) design so that you don't get deadlocks. So how do you test that your retry logic actually works? For a small retry scope maybe you can temporarily tweak the code to do something bad. But if your retry scope is 20 statements, can you really simulate a deadlock in all of them? In practice, you will have to rely on code review. Make sure that all statements in the retry scope are re-runnable. Make sure that the retry scope is a single transaction. Make sure that all variables, table variables and temp tables that are modified in the retry scope are re-initiated at the beginning of the loop. Make sure that there is nothing after COMMIT TRANSACTION that can deadlock. Scrutinise the use of cursors etc.

There are alternatives to littering an existing procedure with complex retry logic. One is to write a wrapper procedure which has the retry logic, and of which the retry scope is a single EXEC statement that calls the procedure that performs the real work. Since the procedure starts fresh, this evades many of the risks I have discussed. But you must still be careful that the original procedure has no data-access statements that can deadlock after updates have been committed. And of course, you must make sure that you don't retry if you are called within an active transaction.

If you want to have a general retry mechanism, the obvious place is your data-access layer in the client. I am not going into details how you would implement this, but if you are skilled developer, the biggest risk is maybe that it is too simple to do – just build a general wrapper for your data access methods. You probably understand by now that you will need to have a check that you don't have an open transaction before the call to SQL Server. But the difficult part is that you don't really know what all these stored procedures do. Let's say that there is a stored procedure that accepts data in an XML document and runs one transaction per second-level element in the XML document. (As for why you would write such a stored procedure, we will look at such an example in the next chapter.) When processing the fifth element, the procedure is chosen as a deadlock victim. If the procedure is carefully written, it may be able to determine that the first four elements have already been processed. But should a data-access layer blindly assume this?

If you have a means to discern read-only procedures from updating stored procedures, then you could have a general retry for the read-only operations. But you would be wary to have generic retry logic for unknown updating procedures, unless you can trust that very strict guidelines are in force for implementing the stored procedures. Else it is better to slip the deadlock error to the surface and let the user decide.

Leaving the deadlocks, there are a few more errors for which retry logic is conceivable. One is 3960, Snapshot isolation transaction aborted due to update conflict. You cannot use snapshot isolation to access table '%.*ls' directly or indirectly in database '%.*ls' to update, delete, or insert the row that has been modified or deleted by another transaction. Retry the transaction or change the isolation level for the update/delete statement. This error can occur if you perform an UPDATE or DELETE operation inside a transaction with the SNAPSHOT isolation level on row that was modified after the transaction started. The precautions here are similar to those for deadlock.

Another error that opens for retry logic is error 601: Could not continue scan with NOLOCK due to data movement. This error can occur when you use READ UNCOMMITTED, a.k.a. NOLOCK when another process moves a page while you are scanning. One would at least like to hope that this error can only occur in read-only procedures, so automatic retries makes more sense here.

Note: READ UNCOMMITTED is a highly abused isolation level which makes your queries prone to spuriously produce incorrect results. Consider setting the database in READ_COMMITTED_SNAPSHOT and scrap the NOLOCK hints that litter your queries.

As I discussed in Part Two, when you use In-Memory OLTP in SQL 2014, you can encounter are a few errors that calls for retry logic. See the topic Guidelines for Retry Logic for Transactions on Memory-Optimized Tables in Books Online for details. I have not worked enough with In-Memory OLTP to be able to make any recommendations myself for these errors.

Error Handling with Dynamic SQL

There is nothing radical to say here, but I figured that I should add this section in case someone would ask. There is one thing to keep in mind with dynamic SQL: syntax errors do not abort the batch and roll back the transaction even when XACT_ABORT is ON. Thus, when you work with dynamic SQL, you cannot use low-budget error handling with SET XACT_ABORT ON only, but you need to have your call to EXEC() or sp_executesql within TRY-CATCH.

As for the dynamic SQL, if you are really ambitious you could wrap your dynamic SQL code in the standard TRY-CATCH dance:

SELECT @sql = 
   'BEGIN TRY 
   ' + @sql + '
   END TRY
   BEGIN CATCH
      IF  @@trancount > 0 ROLLBACK TRANSACTION
      EXEC slog.catchhandler_sp @@procid
   END CATCH'

But I don't think I have ever done this myself. As long as there is TRY-CATCH in the surrounding procedure, that is enough.

However, I have a variation in my beta_lockinfo, which I figured is worth showing, because it is an example of handling anticipated errors. The scenario is that I have a table #objects which holds database ids, object ids and other ids that I want to translate to names. I loop over all the databases and run a UPDATE of #objects against the system catalogue in every database and this UPDATE statement is in the variable @stmt. Before I execute it, I add some error handling:

   SELECT @stmt = ' BEGIN TRY
                       SET LOCK_TIMEOUT 5
                  ' + @stmt +
                  ' END TRY
                    BEGIN CATCH
                       UPDATE #objects
                       SET    object_name = ''Error getting object name: '' +
                                            error_message()
                       WHERE  database_id = @dbidstr
                         AND  object_name IS NULL
                    END CATCH
                  '

Despite that I run with READ UNCOMMITTED, there is a risk that I get blocked (there is no guarantee that isolation levels are respected on system tables), which is why I add a SET LOCK_TIMEOUT, and in place of the object name I write the error message. In this particular context, I rather continue the procedure to display the information I have gathered than producing an error message.

Case Study: the Simple Order Database

We will now delve into a longer example which is intended to be a showcase for error handling in general in regular application code. You will recognise some of the ideas from what I have discussed so far, but I will also present some new techniques.

Overview

The scene is a very simple order system where we import XML files with orders from some external system that we have no reason to trust, so we need to make a ton of validations and I will spend quite some space on how to perform these validations, using different techniques – and levels of ambition. At the same time, the handling of unanticipated errors is important as ever.

An important business rule for this system is that if there are errors with one order, we must still import the correct orders in the file into our database. An order in itself must be fully correct, so if there is an error with a detail row on the order, we reject the order entirely. The file may contain modifications and deletions of orders sent earlier, and this is OK, as long as processing of the order has not yet commenced. When the file has been processed, there is a process that generates a response file which includes the outcome of the processing. That is, which orders that were entered/modified successfully, and which were rejected. This process is not included in the demo as such; there is only the code to generate the data this process needs.

The full script for the example is assembled in a single file which exists in two versions. There is SimpleOrderDatabase.sql which runs on SQL 2012 and later, and SimpleOrderDatabase-2008.sql which runs on all versions of SQL Server, but which is intended for SQL 2005/2008. (On SQL 2005 the script produces errors for two filtered indexes. These indexes are not crucial for the demo and the script will work fine without them. Comment them out if you like.) The difference between the SQL 2012 and the SQL 2008 versions is that the former makes use of a feature added in SQL 2012, try_convert(), in two stored procedures.

A short summary of the script is as follows:

Creates a couple of tables and loads them with test data.
Creates a number of stored procedures.
Runs a query over the initial data.
Creates two XML documents with test data and passes the documents to a stored procedure to store the documents.
Runs another procedure which processes the XML documents.
Runs a set of queries to show the result of the processing.
Finally the script drops all objects it has created, permitting you to easily change pieces and rerun.

Before you start playing with the script, you need to install SqlEventLog as it is used throughout the script. I recommend that you use tempdb as your playground.

In the text that follows, I present things in an order which I think is best for the presentation, whereas a script is organised to run without warnings about missing procedures etc. Thus, if you glance at the script while you are reading the text, you will have to jump forth and back a bit.

The Basic Tables

The script includes in total six tables. Four of them relate to the order system as such whereas the other two are for the file import. We will first take a quick look at the four tables Customers, Products, Orders and OrderDetails. The customer table is extremely basic:

CREATE TABLE Customers ( 
    CustomerID    int          NOT NULL,
    CustomerName  nvarchar(50) NOT NULL,
    DeregDate     datetime     NULL,
    CONSTRAINT pk_Customers PRIMARY KEY (CustomerID)
)

The only attribute beside the name is DeregDate. When non-NULL, this means that that the customer has closed his account on this day and no new orders may be entered for this customer. The Products table is only slightly more sophisticated:

CREATE TABLE Products (
    ProductID    int           NOT NULL,
    ProductName  nvarchar(50)  NOT NULL,
    CatalogPrice decimal(10,2) NOT NULL,
    MaxDiscount  int           NULL,
    CONSTRAINT pk_Products PRIMARY KEY (ProductID)
)

There is a catalogue price for all products. The external order may have different prices due to discounts. If the price on an order falls below CatalogPrice with more than MaxDiscount per cent, the order will be rejected. E.g., if the catalogue price is 50 and MaxDiscount is 10, the order price may not be below 45. If MaxDiscount is NULL, any price is accepted.

Here is the Orders table:

CREATE TABLE Orders (
    OrderID       int              NOT NULL IDENTITY,
    OrderGuid     uniqueidentifier NULL,
    OrderDate     datetime         NOT NULL,
    CustomerID    int              NOT NULL,
    CampaignCode  varchar(6)       NULL,
    State         char(1)          NOT NULL  -- N(ew), P(rocessing), C(ompleted)
       CONSTRAINT ck_Orders_State CHECK (State IN ('N', 'P', 'C'))
       CONSTRAINT def_Orders_State DEFAULT 'N',
    CONSTRAINT pk_Orders PRIMARY KEY (OrderID),
    CONSTRAINT fk_Orders_Customers FOREIGN KEY (CustomerID)
        REFERENCES Customers(CustomerID)
)
go
CREATE INDEX OrderGuid_ix ON Orders(OrderGuid) WHERE OrderGuid IS NOT NULL

The idea is that orders can enter the database in two ways, directly through a web application or from an external source. It is only the external orders that have an OrderGuid which is an external identifier that must be unique, something the filtered index enforces. (In SQL 2005 you would have to enforce uniqueness by some other means, not shown here.) I have an OrderDate here, but the script makes nothing interesting with it; it is always set to the date when the order is entered in the database. The campaign code is a token second settable attribute, but to keep the example down in size, there is no table that holds currently permitted campaign codes. The State column reflects whether other parts of the order system have started to work with the order.

No order is complete without products, and here is OrderDetails:

CREATE TABLE OrderDetails (
     OrderID   int            NOT NULL,
     ProductID int            NOT NULL,
     Quantity  int            NOT NULL
        CONSTRAINT ck_OrderDetails_Quantity CHECK (Quantity > 0),
     OrderPrice decimal(10,2) NOT NULL,
     CONSTRAINT pk_OrderDetails PRIMARY KEY (OrderID, ProductID),
     CONSTRAINT fk_OrderDetails FOREIGN KEY (OrderID)
        REFERENCES Orders(OrderID) ON DELETE CASCADE, 
     CONSTRAINT fk_OrderDetails_Products FOREIGN KEY (ProductID) 
         REFERENCES Products (ProductID)
)

Here I have made the ProductID part of the primary key. In a real-life system this may not work out, as the same product could be ordered with different campaign codes etc, but here I keep it simple. OrderPrice is the price per unit and the value for the row is Quantity × OrderPrice. You may note that the foreign key on OrderID is set up as cascading. That is, if you delete the order, the details go with it.

The script includes INSERT statements for sample customers, products and orders, that I do not show here. All orders in the sample are external orders with an OrderGuid.

The Interface Procedures

The order system exposes four stored procedures to manipulate external orders. One each to add, modify and delete an external order, and then there is one stored procedure that permits all three types of manipulation of a detail row. These procedures in their turn call three different ValidateXXXX procedures. I will first cover the interface procedures, and then turn to the validation procedures in the next section. In the script, however, the order is reversed to avoid warnings about missing procedures.

This is AddExternalOrder:

CREATE PROCEDURE AddExternalOrder @OrderGuid    uniqueidentifier,
                                  @CustomerID   int,
                                  @CampaignCode varchar(6),
                                  @OrderID      int OUTPUT AS
SET XACT_ABORT, NOCOUNT ON
BEGIN TRY
   IF @@trancount = 0 OR @@nestlevel = 1
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'This procedure must be called from a transaction.'
   END

   IF EXISTS (SELECT * FROM Orders WHERE OrderGuid = @OrderGuid)
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'Order %1 already exists.', @p1 = @OrderGuid
   END

   EXEC ValidateCustomer @@procid, @CustomerID

   INSERT Orders(OrderGuid, CustomerID, CampaignCode,
                 OrderDate)
      VALUES(@OrderGuid, @CustomerID, nullif(@CampaignCode, ''),
             convert(char(8), getdate(), 112))

   SELECT @OrderID = scope_identity()
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
   RETURN 55555
END CATCH

The procedure follows the basic pattern we have established. That is, it turns on XACT_ABORT and NOCOUNT, and the rest of the procedure is wrapped in TRY-CATCH with a standard CATCH block. For this demo, I'm consistently using SqlEventLog to log and reraise all errors.

The first thing to note is that the procedure checks whether it's nested and whether there is an active transaction. This is because the procedure performs less than a transaction: there is a business rule that an order must have at least one product. Thus, the caller is supposed to call another procedure to add a detail row later, and therefore the caller needs to have a transaction. Of course, there is no guarantee that the caller will actually do that, but since there are no commit-time triggers in SQL Server, there is no way to enforce this rule, as long as header and details are added through different procedures.

Next is a check that this is not a duplicate order. Strictly speaking, this is check is not necessary. Since there is a unique index on OrderGuid, a duplicate would yield an error anyway. However, by making an explicit check we can produce a more friendly error message that will be understood on the sender side when they get back the response file.

The customer is validated, more about ValidateCustomer later, and finally the order is inserted and the order ID which is an IDENTITY column is retrieved.

Next in line is ModifyExternalOrder:

CREATE PROCEDURE ModifyExternalOrder @OrderGuid    uniqueidentifier,
                                     @CustomerID   int        = NULL,
                                     @CampaignCode varchar(6) = NULL AS
SET XACT_ABORT, NOCOUNT ON
BEGIN TRY
   EXEC ValidateOrderGuid @@procid, @OrderGuid

   IF @CustomerID IS NOT NULL
      EXEC ValidateCustomer @@procid, @CustomerID

   UPDATE Orders
   SET    CustomerID   = coalesce(@CustomerID, CustomerID),
          CampaignCode = CASE WHEN @CampaignCode IS NOT NULL
                              THEN nullif(@CampaignCode, '')
                              ELSE CampaignCode
                         END
   WHERE  OrderGuid = @OrderGuid
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
   RETURN 55555
END CATCH

There is first a validation of the order GUID, more about that later. There is no check for an active transaction here, as it is perfectly legit to modify only the order header.

The idea is that the caller passes NULL for the customer ID in order to keep the current customer, and the same is true for the campaign code. There is a small problem with the campaign code since it is nullable: it may be desirable to clear an incorrect campaign code. This is handled by letting NULL mean "no change" while a blank string means "delete code".

Here is the short DeleteExternalOrder:

CREATE PROCEDURE DeleteExternalOrder @OrderGuid uniqueidentifier AS
SET XACT_ABORT, NOCOUNT ON
BEGIN TRY
   EXEC ValidateOrderGuid @@procid, @OrderGuid

   DELETE Orders
   WHERE  OrderGuid = @OrderGuid
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
   RETURN 55555
END CATCH

Since there is a cascading foreign key on OrderDetails, a single DELETE is all that is needed in this procedure.

Finally we have ModifyExternalOrderDetail which is used for adding, updating or deleting rows in OrderDetails. (For the sake of legibility, a long message has been split up over two lines; it's not that way in the script.)

CREATE PROCEDURE ModifyExternalOrderDetail @OrderGuid uniqueidentifier,
                                           @ProductID int,
                                           @Quantity  int            = NULL,
                                           @OrderPrice decimal(10,2) = NULL AS
SET XACT_ABORT, NOCOUNT ON
BEGIN TRY
   DECLARE @OrderID int
   EXEC ValidateOrderGuid @@procid, @OrderGuid, @OrderID OUTPUT

   EXEC ValidateProduct @@procid, @ProductID, @OrderPrice

   IF NOT EXISTS (SELECT *
                  FROM   OrderDetails
                  WHERE  OrderID   = @OrderID
                    AND  ProductID = @ProductID)
   BEGIN
      INSERT OrderDetails(OrderID, ProductID, Quantity, OrderPrice)
         VALUES(@OrderID, @ProductID, @Quantity, @OrderPrice)
   END
   ELSE IF @Quantity IS NULL OR @Quantity <> 0
   BEGIN
      UPDATE OrderDetails
      SET    Quantity   = coalesce(@Quantity, Quantity),
             OrderPrice = coalesce(@OrderPrice, OrderPrice)
      WHERE  OrderID   = @OrderID
        AND  ProductID = @ProductID
   END
   ELSE IF (SELECT COUNT(*) FROM OrderDetails WHERE OrderID = @OrderID) > 1
      DELETE OrderDetails WHERE OrderID = @OrderID AND ProductID = @ProductID
   ELSE
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'Attempt to delete the last product on order %1. 
            Delete the entire order, or add another product first.',
           @p1 = @OrderGuid
   END
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
   RETURN 55555
END CATCH

The first thing we do is to validate the order GUID and get back the internal order ID, which we need to maintain OrderDetails. The product is validated, more about ValidateProduct later. Next we check whether the product is on the order. If it's not, we add it. If the product is on the order and the quantity is non-zero, we update the quantity and price, maintaining the current value if the parameter is NULL. There is no special instruction to delete a product from an order, but the sender simply sets the quantity to zero to remove a product. We cannot permit the order to be without products, so there is a check for this.

The Validation Procedures

So far all procedures have had TRY-CATCH with the same standard CATCH handler. The validation procedures use a different pattern. Here is ValidateCustomer:

CREATE PROCEDURE ValidateCustomer @procid     int,
                                  @CustomerID int AS
   DECLARE @CustomerName nvarchar(50),
           @DeregDate    datetime

   SELECT @CustomerName = CustomerName, @DeregDate = DeregDate
   FROM   Customers
   WHERE  CustomerID = @CustomerID

   IF @CustomerName IS NULL
   BEGIN
      EXEC slog.sqleventlog_sp @procid,
           'Illegal customer id = %1.', @p1 = @CustomerID
   END

   IF @DeregDate IS NOT NULL
   BEGIN
      EXEC slog.sqleventlog_sp @procid,
           'Customer %1 (%2) was deregistered on %3.',
           @p1 = @CustomerID, @p2 = @CustomerName, @p3 = @DeregDate
   END

In this procedure, there is no TRY-CATCH, nor any initial SET commands. The thinking is that the validation procedures are never called on their own, but they are always called from another procedure, which is why can we leave it to that procedure to trap the errors we raise. Given the simplicity of this procedure we don't really expect any other run-time errors than those we raise ourselves.

Another new thing is the first parameter is @procid. If you go back to AddExternalOrder and ModifyExternalOrder, you see that they pass @@procid for this parameter. That is, in the sqleventlog table the error messages will be listed as coming from the caller, not ValidateCustomer itself. I preferred to make it this way, as this procedure is just an extraction of some checks that are common to these two procedures. I figured that it may be helpful to see directly whether the error comes from a new or modified order.

These two techniques – no TRY-CATCH and @procid as a parameter – are certainly to be considered as advanced, and when there is the slightest doubt, add a standard CATCH block. The longer the procedure is, and the more checks there are, the higher the risk that for an unanticipated error, i.e., a run-time error in a query.

As for the validations, the first check exists only to provide a more helpful message to the sender than the foreign-key violation which would be produced if the check was not there. The second check exists to enforce the rule that no orders may be placed for a deregistered customer. As a service, we include the customer name in clear text and when the customer was discontinued.

Here is ValidateProduct, which follows the same pattern (with a long message string wrapped here for legibility):

CREATE PROCEDURE ValidateProduct @procid int,
                                 @ProductID int,
                                 @OrderPrice int AS
   DECLARE @ProductName  nvarchar(50),
           @CatalogPrice decimal(10,2),
           @MaxDiscount  int

   SELECT @ProductName = ProductName, @CatalogPrice = CatalogPrice,
          @MaxDiscount = MaxDiscount
   FROM   Products
   WHERE  ProductID = @ProductID

   IF @CatalogPrice IS NULL
   BEGIN
      EXEC slog.sqleventlog_sp @procid,
           'Illegal product id %1.', @p1 = @ProductID
   END

   IF @OrderPrice < @CatalogPrice * (100 - @MaxDiscount) / 100
   BEGIN
      EXEC slog.sqleventlog_sp @procid,
           'The order price is %1 for the product "%2", but the catalog 
            price is %3 with a maximum discount of %4.',
           @p1 = @OrderPrice, @p2 = @ProductName, @p3 = @CatalogPrice,
           @p4 = @MaxDiscount
   END

This procedure follows the same pattern: no TRY-CATCH, and it accepts @@procid as a parameter. In the procedure body, we check that the product exists, and that the order price is not below the permitted discount. You could argue since there is only one procedure that looks at ProductID, i.e. ModifyExternalOrderDetail, there is no reason to have ValidateProduct. Originally, I had a separate procedure to add a product to an order, which I later scrapped. I decided to keep ValidateProduct to be consistent with the customer validation.

The final validation procedure is ValidateOrderGuid:

CREATE PROCEDURE ValidateOrderGuid @procid    int,
                                   @OrderGuid uniqueidentifier,
                                   @OrderID   int = NULL OUTPUT AS
   DECLARE @State char(1)
   SELECT @State = State, @OrderID = OrderID
   FROM   Orders
   WHERE  OrderGuid = @OrderGuid

   IF @State IS NULL
   BEGIN
      EXEC slog.sqleventlog_sp @procid, 'No such order %1.',
                               @p1 = @OrderGuid
   END
   ELSE IF @State <> 'N'
   BEGIN
      EXEC slog.sqleventlog_sp @procid,
           'Order %1 is processed/shipped and cannot be modified.',
           @p1 = @OrderGuid
   END

Again, we see the pattern with no TRY-CATCH and with @procid as a parameter. The procedure validates that the order exists and is modifiable. The procedure can also return the internal order ID if requested.

The Imported XML File

We have now looked at the order-entry procedures. Let's now look at the XML file from which we import the orders, so that we understand the tables and stored procedures for the import process itself better. Here is a sample:

<Orders Guid="9D4CAD24-F47D-4E5D-BAA5-1D60C0215DF9">
    <Order OrderID="CF4EEF4A-8F22-43ED-A2EF-B322649ECFD2" 
           Operation="Add" CustomerID="7" CampaignCode="STEVE">
       <Products>
          <Product ProductID="7" Quantity="12" Price="36" />
          <Product ProductID="8" Quantity="2"  Price="50.17" />
       </Products>
    </Order>
    <Order OrderID="BCAF1E80-D5CA-4BAD-B648-6A2B82CFB9CB" 
           Operation="Modify">
       <Products>
          <Product ProductID="4" Quantity="8"/>
          <Product ProductID="3" Quantity="0"/>
          <Product ProductID="1" Price="54.25"/>
          <Product ProductID="7" Quantity="1"  Price="40" />
       </Products>
    </Order>
    <Order OrderID="530EFABB-5AB4-4A37-90EA-BAB93F281762"
           Operation="Modify" CampaignCode="MUFFIN" />
    <Order OrderID="410D26C6-CA25-4D07-B048-E8516AD2AAAD" 
           Operation="Delete">
    </Order>
</Orders>

The root tag is Orders, and there is one attribute on this level, Guid. This GUID serves to identify the file as such. On the second level, there are one or more Order elements. These elements have a number of attributes of which two are mandatory: OrderID, which is the ID in the external system and this should be a GUID, and Operation which can be any of Add, Modify or Delete. Two more attributes are recognised: CustomerID, which is mandatory when Operation is Add, and CampaignCode. Within the Orders element there may be a Products element which its turn contains a number of Product elements. Products and Product are mandatory when Operation is Add, while they are optional when Operation is Modify. The Products element is ignored when Operation is Delete. As seen in the example the Products element can have three attributes: ProductID, Quantity and Price. All three are mandatory for Add, as well as when new products are added to the order with Modify. When an existing product in an order is modified, only ProductID is mandatory. Quantity = 0 means that the product is to be removed from the order.

To clarify: the semantics of Modify is that the external system only needs to include elements and attributes that are to be changed. There are two Modify entries in the example above. The first one adds, modifies and deletes products from the order, while the second one only adds a campaign code, which is why this entry does not have a Products element.

The Import-File Tables

There are two tables that hold information about the file import. (In the script they come with the other tables, before the validation stored procedures.) The first table tracks the files as such:

CREATE TABLE OrderImportFiles 
   (FileID   int              NOT NULL,
    FileName nvarchar(500)    NOT NULL,
    FileGuid uniqueidentifier NOT NULL,
    Status   char(4)          NOT NULL
       CONSTRAINT ck_OrderImportFiles_Status
          CHECK (Status IN ('NEW', 'WORK', 'DONE', 'SENT'))
       CONSTRAINT def_OrderImportFiles_Status DEFAULT 'NEW',
    Received datetime         NOT NULL
       CONSTRAINT def_OrderImportFiles_Received DEFAULT getdate(),
    Sent     datetime         NULL,
    CONSTRAINT pk_OrderImportFiles PRIMARY KEY (FileID),
    CONSTRAINT u_OrderImportFiles UNIQUE (FileGuid)
)
go
CREATE UNIQUE INDEX one_work_only ON OrderImportFiles(Status) 
    WHERE Status = 'WORK'

The FileID is assigned as the files are added, whereas FileGuid is taken from the Guid attribute of the Orders element in the file and the column is unique; that is, we don't permit the same file to be imported twice. We also track the name of the file for informational purposes, but we have no checks on this column. There is a Status column which reflects where the file is in the processing chain. The state transition is always as listed: First the file is NEW. At some point the file is being processed (i.e. WORK in progress). When all entries in the file have been processed, the status is set to DONE. A separate process, not included in the demo, is responsible for producing a response file, and when that file has been sent, the status is set to SENT. There is a filtered index on the Status column to enforce the rule that no more than one file may be processed at a time. Because a file may include modifications to earlier orders, the files must be processed sequentially in order. The table also includes two datetime columns to track when the file was received and when the response file was sent.

The second import table holds information about the orders in the file:

CREATE TABLE OrderImportFileItems
   (FileID          int           NOT NULL,
    RowNo           int           NOT NULL,
    Operation       varchar(100)  NULL,
    OrderGuid       nvarchar(400) NULL,
    Data            xml           NOT NULL,
    InternalOrderID int           NULL,
    Status          char(1)       NOT NULL 
       CONSTRAINT ck_OrderImportFileItems_Status
          CHECK (Status IN ('N', 'O', 'E')) -- (N)ew, O(K), E(rror)
       CONSTRAINT def_OrderImportFileItems_Status DEFAULT 'N',
    Message  nvarchar(2048)       NULL,
    CONSTRAINT pk_OrderImportFileItems PRIMARY KEY (FileID, RowNo),
    CONSTRAINT fk_OrderImportFileItems FOREIGN KEY (FileID)
       REFERENCES OrderImportFiles (FileID) ON DELETE CASCADE,
    CONSTRAINT fk_OrderImportFileItems_Orders FOREIGN KEY (InternalOrderID)
       REFERENCES Orders(OrderID) ON DELETE SET NULL
)

FileID is a foreign key to OrderImportFiles and RowNo is the second part of the primary key. RowNo is a running number. The entire Order element is saved in the Data column as an XML document to be parsed later. However, we extract the Operation and OrderID attributes already when the file is loaded and store them in separate columns, mainly to make it easier to browse the table for humans. You may note that the OrderGuid column has the "wrong" data type and both Operation and OrderGuid are nullable. I will return to why it is so. InternalOrderID is set to Orders.OrderID when a new order is successfully created in our database, so that the process that produces the return file can include it in the file. The foreign key has ON DELETE SET NULL, since the deletion of an order should not be blocked by this row.

The Status column holds the state of this order. It is N until it is processed, and if it is successfully processed, Status is set to O. If there is a problem with the order, Status is set to E, and an error message is saved in the column Message. This information is helpful for local troubleshooting. The message is also included in the return file to the external system.

The Procedure AddOrderImportFile

We will now look at the procedures that process the order files. They are written from the assumption that the external order system is able to commit every error in the book. There are two approaches we can take: validate like crazy and raise errors or be liberal in what we accept. The latter is preferable only if there is no risk for confusion or misinterpretation.

In this procedure we are fairly liberal, because the purpose of this procedure is to store the file in the tables OrderImportFiles and OrderImportFileItems. We only refuse the file if we cannot do it safely. In the script, you find this procedure directly after ModifyExternalOrderDetail. This is one of the two procedures that are different in the SQL 2012 and SQL 2008 versions of the script. This is the SQL 2012 version (with one error message wrapped for legibility):

CREATE PROCEDURE AddOrderImportFile @Contents xml,
                                    @FileName nvarchar(500) AS
SET XACT_ABORT, NOCOUNT ON
BEGIN TRY
   DECLARE @FileID      int,
           @FileGuidStr nvarchar(400),
           @FileGuid    uniqueidentifier,
           @nodecnt     int

   SELECT @FileGuidStr = T.c.value('@Guid', 'nvarchar(400)'),
          @nodecnt = COUNT(*) OVER()
   FROM   @Contents.nodes('/Orders') AS T(c)

   IF @nodecnt <> 1
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'Malformed or empty XML, there should be a single Orders top node.'
   END

   SELECT @FileGuid = try_convert(uniqueidentifier, @FileGuidStr)
   IF @FileGuid IS NULL
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'The file %1 contains an invalid file GUID "%2", 
            or the GUID is missing altogether',
           @p1 = @FileName, @p2 = @FileGuid
   END

   IF EXISTS (SELECT * FROM OrderImportFiles WHERE FileGuid = @FileGuid)
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'The file with GUID %1 has already been imported.',
           @p1 = @FileGuid
   END

   BEGIN TRANSACTION

   SELECT @FileID = coalesce(MAX(FileID), 0) + 1
   FROM   OrderImportFiles WITH (UPDLOCK)

   INSERT OrderImportFiles(FileID, FileName, FileGuid)
      VALUES(@FileID, @FileName, @FileGuid)

   INSERT OrderImportFileItems (FileID, RowNo,
                                Operation,
                                OrderGuid, Data)
      SELECT @FileID, row_number() OVER(ORDER BY T.c),
             upper(ltrim(T.c.value('@Operation', 'varchar(100)'))),
             T.c.value('@OrderID', 'nvarchar(400)'), T.c.query('.')
      FROM   @Contents.nodes('/Orders/Order') AS T(c)

   COMMIT TRANSACTION
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
   RETURN 55555
END CATCH

The first check is already in the procedure header. If the file is not valid XML, the call will fail.

Next we do two validations: we count the number of top nodes and we attempt to extract the Guid attribute which is supposed to be there. Let's first look at the node count. What if the XML file looks like below, is this a problem or not?

<Orders Guid="...">
   <Order OrderID="..." ...>
   ...
</Orders>
<Orders Guid="...">
   <Order OrderID="..." ...>
   ...
</Orders>

I did test the code with XML like this, and all orders in both Orders elements were processed as expected. So why not be liberal and accept this? The problem is with the Guid attribute. The way our code is written, we would only read one of them, and it's undefined which. So while we would process the orders, the response would be wrong, because all responses would be for the same file GUID. Of course, we could fix our code to handle the situation. However, as it stands, we have agreed with the supplier that they are to send us XML documents and an XML document has a single root node; the above is an XML fragment.

Next is the validation of the Guid attribute. You may note that we extract it as nvarchar(400) and then convert this string to uniqueidentifier. By bouncing the value over a string variable, we can produce a more exact error message if the value is not correct GUID. Why the variable is overly long, is something I will return to; we will encounter more examples of this technique.

The function try_convert() returns NULL if the conversion fails. It was added in SQL 2012, and the code for SQL 2008 does instead have this:

BEGIN TRY
   SELECT @FileGuid = convert(uniqueidentifier, @FileGuidStr)
END TRY
BEGIN CATCH
END CATCH

Not only is it bulkier, there are also complications if there is an active transaction, something I will return to later.

Once the format of the GUID has been validated, I check for duplicates. I debated with myself whether I would permit the supplier to send a corrected file which would be accepted if the original file had not yet been processed, but I decided that this was out of scope for V1. However, rather than letting the UNIQUE constraint hold the axe, I perform an explicit check in order to produce a better error message.

At this point the transaction starts. I first acquire a FileID by rolling my own; since I don't expect more than a handful of calls per day, I see no reason to use IDENTITY. I insert information about the file as such into OrderImportFiles, and then I shred the document into orders. I store each order as an XML document in a row in OrderImportFileItems. I extract the operation and the order ID to their separate columns. Note this:

 upper(ltrim(T.c.value('@Operation', 'varchar(100)'))), 
 T.c.value('@OrderID', 'nvarchar(400)')

Again, I extract the values to string columns far longer than needed for the expected values. Furthermore, the columns are nullable. At this point I don't want any run-time errors due to bad data. I also take the occasion to strip any leading spaces from Operation and I convert it to all uppercase to be liberal.

The procedure does not check that the file actually includes any orders (which could be because the supplier neglected that XML is case-sensitive and put ORDER in the tag). Partly this is due to that this check did not occur to me until I had completed testing the script. But maybe that check is better part of the response process which could include a message File empty.

The procedure closes with a standard CATCH block.

Note: The issue with misspelled XML elements could be handled by using typed XML schema collections. Whether this is a good idea or not is certainly beyond the scope for this article. More importantly, untyped XML requires more validation in code which is what I'm after for the sake of the subject at hand. If you were importing a comma-separated file with BULK INSERT, you would have no choice but to perform the same validations as I do here.

The Procedure ProcessImportFiles

This is the main procedure for processing the order files. It loops over the OrderImportFiles table to handle all files with status WORK or NEW and for each file it calls the subprocedure ProcessOneImportFile. (Note that in the script, ProcessOneImportFile comes before ProcessImportFiles which is the last procedure in the pack.) A later file may have modifications to orders in an earlier file, and thus it is essential that the files are processed in the order they come in. For this reason, we do not permit multiple executions of this procedure.

Here is the procedure in its entirety:

CREATE PROCEDURE ProcessImportFiles AS
SET XACT_ABORT, NOCOUNT ON
BEGIN TRY
   DECLARE @FileID    int,
           @ret       int,
           @LockTaken int

   EXEC @ret = sp_getapplock 'ProcessImportFiles', 'Exclusive', 'Session', 0
   IF @ret < 0
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'At instance of this procedure is already running.'
   END

   SELECT @LockTaken = 1

   WHILE 1 = 1
   BEGIN
      SELECT @FileID = MIN(FileID)
      FROM   OrderImportFiles
      WHERE  Status IN ('WORK', 'NEW')

      IF @FileID IS NULL
         BREAK

      UPDATE OrderImportFiles
      SET    Status = 'WORK'
      WHERE  FileID = @FileID

      EXEC ProcessOneImportFile @FileID

      UPDATE OrderImportFiles
      SET    Status = 'DONE'
      WHERE  FileID = @FileID
   END

   EXEC sp_releaseapplock 'ProcessImportFiles', 'Session'
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   IF @LockTaken = 1
      EXEC sp_releaseapplock 'ProcessImportFiles', 'Session'
   EXEC slog.catchhandler_sp @@procid
   RETURN 55555
END CATCH

The main logic of the procedure is not terribly complicated. We iterate over the files as long as there are files to process. We change the status to WORK, so that a human easily can tell from looking at the table what is going on, and once a file has been entirely processed, we change the status to DONE. There is no transaction here, because if the procedure is interrupted it will continue where it left off.

The interesting part of this procedure is how we prevent multiple executions. While maybe only tangentially related to the scope for this article, I still like to dwell on this a bit, as not all readers may be familiar with application locks, one of these lesser-known great features in SQL Server. An application lock is a lock that you take on a resource that you name yourself; it is not related to something physical like a table. You can take an application lock on transaction level, so that it is automatically released when the transaction is committed or rolled back. But you can also take an application lock on session level, meaning that you hold the lock until you disconnect or explicitly release it. This procedure uses a session lock, which is a perfect fit here. A lock on transaction level would not work for us, because we don't want a transaction here. (Since each order in the file should be processed separately.) Even better for us is that an application lock is tied to the database, so that we can use the same lock resource in two databases without conflict. Or in other words: the application lock prevents two instances of ProcessImportFiles running in the same database. But if we have two order databases in the same server they can simultaneously run ProcessImportFiles without blocking each other.

To get an application lock, we use sp_getapplock which in total takes five parameters of which the first two are mandatory. The first is the name of the lock resource which is just a string that you make up, and here I use name of the stored procedure. The parameter is nvarchar(255), but due to the display of application locks, I recommend that you don't use more than 32 characters. Beware that the name is handled a binary string, and MyLock, mylock and MYLOCK are three different lock resources. The second parameter is the lock mode, that is, whether you want a shared lock, exclusive lock or something else. In this case we want an exclusive lock, obviously. See Books Online for the full list of choices. The third parameter can be Transaction (the default) or Session, as I discussed in the previous paragraph. The fourth parameter is the time to wait for the lock to become available. The default is to wait forever (unless you have tinkered with SET LOCK_TIMEOUT). In this case, we don't want to wait at all and therefore we specify 0; if the procedure is already running we want to exit with an error message. I have left out the fifth parameter to sp_getapplock which I have never found any use for.

There are a couple of possible return values from sp_getapplock; 0 and 1 means that the lock was granted, whereas negative values means that it was not. As you see in the code above, you use sp_releaseapplock to release the application lock.

While application locks are great, there are a few small problems: say that you run the procedure from a query window and stop it with the red button. This sends an attention signal to SQL Server to tell SQL Server to stop executing and in this case the lock will not be released; you will need to remember to do that manually. The same problem can arise if the application that fires off the stored procedure is incorrectly written and employs the default query timeout of 30 seconds. A procedure like this one should not be executed with any timeout at all.

Another possible problem can be seen in the CATCH handler. Or rather cannot be seen: if the lock has been taken, the CATCH handler nicely releases the application lock. But the catch (sic!) is that a local CATCH handler will not fire in case of a scope-aborting error. This is not a major concern in ProcessImportFiles, because this procedure is short and easily tested. (Someone could drop the table OrderImportFiles, but in such case there are bigger problems than the application lock not being released.) But if you write a long complex procedure, you may have code paths with spelling errors that you somehow failed to test until you went live. So for longer procedures I would recommend that you put the handling of the application lock in an outer wrapper to remove this risk.

The Procedure ProcessOneImportFile

We now come to the final procedure in the pack, although in the script you find it before ProcessImportFiles. This is the longest procedure of them all, so I will show it piece by piece. There are also some differences between the SQL 2012 and the SQL 2008 versions.

Overview

The task of this procedure is to process all orders in a file. Some orders may have errors, but we can't permit one bad order to cause the entire file to be dropped; all correct orders should be processed. For this reason, ProcessOneImportFile runs a loop over the orders, processing them one by one. If you are used to set-based operations, you may ask if this is necessary, and the short answer is: yes. I will come back to the question with a longer answer after we have looked at the procedure.

Within an order there is a second loop to handle the products one by one since the procedure ModifyExternalOrderDetail only works with one product at time. Obviously, we could easily replace it with a set-based version, but imagine a more complex system where the corresponding procedure is 3000 lines of code. Rewriting it to be set-based could be prohibitively expensive. And for this article, I wanted an example with nested loops. As we shall see, one benefit is a slight improvement in error diagnostics.

This procedure has a different structure from all other procedures we have looked at. Here is the outline of it:

CREATE PROCEDURE ProcessOneImportFile @FileID int AS
SET XACT_ABORT, NOCOUNT ON

DECLARE ...

IF @@trancount > 0
BEGIN
   EXEC slog.sqleventlog_sp @@procid,
        'This procedure must not be called in a transaction.'
END

WHILE <More orders in this file>
BEGIN TRY
   ...
END TRY
BEGIN CATCH
   ...
END CATCH

First there are declarations of umpteen variables needed for the loops. Then there is a check that the procedure is not called from an active transaction. You may recall that in AddExternalOrder we required an active transaction. Here you see the antithesis. This is all about contract. The normal contract for a stored procedure is that an error causes a rollback of the transaction and the error is reraised to inform the caller that something went wrong. However, ProcessOneImportFile does not obey to that contract. It starts its own transactions to enter orders, and if an error occurs while processing an order, the procedure logs the error and marks the order as failed in OrderImportFileItems. Then it moves on to the next order. It never reraises any errors. Since a caller with an active transaction would not be notified that the rug has been pulled, the only reasonable contract is to outlaw all open transactions.

After this initial check, there is a WHILE loop of which the body is a TRY-CATCH block. That is, if an order is processed successfully, we mark it as OK, and move to the next order. If there is an error, we will wind up in the CATCH block where we set the status to 'E' in OrderImportFileItems. And then we move to the next order. (With one exception as we shall see later.) Because of this there is very little code outside the WHILE loop,

You may note that there is no TRY-CATCH block that encloses the entire procedure. This can be questioned since it contradicts the best practice I have preached in this article. It certainly would not be wrong to add a procedure-wide TRY-CATCH block, but since the CATCH block of the loop never reraises errors, the outer CATCH block would only trap errors before the WHILE loop and run-time errors in the inner CATCH block. For this reason, I skipped the outer CATCH block, but if there was the slightest piece of code for any sort of preparation before the WHILE (beyond the @@trancount check, that is), I would insist on that there should be an outer CATCH block.

Beginning of the WHILE Loop

Let's now look at the main meat in the procedure, that is, the code that is in the TRY block. This is how it starts out:

WHILE 1 = 1
BEGIN TRY
   SELECT @RowNo = NULL, @Operation = NULL, @data = NULL,
          @OrderGuidStr = NULL, @OrderGuid = NULL, @OrderID = NULL,
          @CustomerIDStr = NULL, @CustomerID = NULL,
          @CampaignCode   = NULL, @DetailRows = NULL,
          @ProductIDStr = NULL, @QuantityStr = NULL, @PriceStr = NULL,
          @ProductID = NULL, @Quantity = NULL, @Price = NULL

   DELETE @Products

   SELECT TOP 1 @RowNo = RowNo, @Operation = Operation, 
                @OrderGuidStr = OrderGuid, @data = Data
   FROM   OrderImportFileItems
   WHERE  FileID = @FileID
     AND  Status = 'N'
   ORDER  BY RowNo

   IF @RowNo IS NULL
      RETURN

The first thing that happens in the WHILE loop is that we initiate all variables we use in the loop, including a table variable, that I will introduce later. When you work with loops, it is important to start the loop with setting all variables to a defined value, usually NULL. If you neglect to do this, variables that retain their value from the previous iteration often result in bugs that are difficult to understand and track down.

The next step is to retrieve the data for the next unprocessed order, and I quit if there isn't any. As you can see the loop is of the type "roll-your-own", that is, not a cursor. There is a reason for this, as we will see when we come to the CATCH block.

Validations for the Paranoid

Next follows a number of validations. In this part, there are some differences between the script for SQL 2012 and later on the one hand and the script for earlier versions of SQL Server on the other. Here is the version for SQL 2012 and later:

SELECT @OrderGuid = try_convert(uniqueidentifier, @OrderGuidStr)

IF @OrderGuid IS NULL
BEGIN
   EXEC slog.sqleventlog_sp @@procid,
        'Invalid or missing GUID "%1" for order.', @p1 = @OrderGuidStr
END

SELECT @CustomerIDStr = nullif(T.c.value('@CustomerID', 'nvarchar(400)'), ''),
       @CampaignCode  = ltrim(T.c.value('@CampaignCode', 'nvarchar(100)')),
       @DetailRows    = T.c.query('Products')
FROM   @data.nodes('Order') AS T(c)

SELECT @CustomerID = try_convert(int, @CustomerIDStr)

IF @CustomerID IS NULL AND @CustomerIDStr IS NOT NULL
BEGIN
   EXEC slog.sqleventlog_sp @@procid,
        'Invalid customer id = "%1".', @p1 = @CustomerIDStr
END

IF len(@CampaignCode) > 6
BEGIN
   EXEC slog.sqleventlog_sp @@procid,
        'Incorrectly formed campaign code.', @p1 = @CampaignCode
END

First keep in mind that a call to sqleventlog_sp raises an error, which transfers execution to the CATCH block where the message text is written to the Message column in OrderImportFileItems.

There is a common theme here: paranoia. The first thing to note is that we extract all values as strings from the XML document even if the value is supposed to be an int or a uniqueidentifier. (In case of the OrderGuid, recall that we extracted it already in AddOrderImportFile.) If we were to optimistically say:

SELECT @CustomerID = T.c.value('@CustomerID', 'int')

but the actual value would not be a valid integer, this would result in an error message from SQL Server. We could catch that error message and send it back in the return file, but the sender would not easily understand which value that was malformed. So to get better diagnostics, we get the value into a string and attempt a conversion with try_convert(). If we get NULL back, something was not the way it should be.

For integers, there is a second reason to use a string. Say that the Order tag reads:

<Order OrderID="94D26EB3-48E4-499D-98B2-75FC5C85A29E" 
       Operation="Modify" CustomerID="">

The intention here is presumably to make changes to the products on the order, but leave the customer unchanged. But given this tag, what does this result in?

SELECT @CustomerID = T.c.value('@CustomerID', 'int')

You may expect this to return NULL, but it is not correct. @CustomerID is set to 0, and who knows, maybe there is a customer with this ID. So for integer data we must bounce the values over strings not only for better diagnostics, but also to avoid an incorrect interpretation of the data. By applying the nullif() function we make sure that a blank string is interpreted as NULL.

There is more to it. Not only do we extract the integer and GUID values to strings, but we also extract them to variables that are much longer than needed to fit an integer or a GUID. Or for that matter a six-character campaign code. We must consider the risk that the sender sends an overlong value or something with leading spaces, for instance any of:

<Order OrderID="94D26EB3-48E4-499D-98B2-75FC5C85A29E" 
       Operation="Add" CustomerID="123456789123">
<Order OrderID="94D26EB3-48E4-499D-98B2-75FC5C85A29E" 
       Operation="Add" CustomerID="       123456">

If we were to extract the values to the varchar(10) needed for an integer value, with some bad luck the order could be assigned to a completely different customer. This level of paranoia is mainly required for key values. Say that there had been a free-text field on the order and the corresponding database column had been nvarchar(60). For such a column, it is reasonable to be liberal and extract it as nvarchar(60) and simply truncate the value. However, for key values we should not be liberal, as a truncated or misinterpreted key value will result in an incorrect operation. Although the campaign code is not a key in this demo, I handle it as such, since in a real-world order application it would most likely be.

Finally, you may wonder why I retrieve the values as nvarchar and not varchar? This is not terribly important, but if the supplier sends Chinese characters, we might just as well send them back as such in the error message and not just a bunch of question marks which might be the consequence if we were to use varchar.

Validations for SQL 2008/2005

Here is the corresponding code for SQL 2008/2005:

BEGIN TRY
   SELECT @OrderGuid = convert(uniqueidentifier, @OrderGuidStr)
END TRY
BEGIN CATCH
END CATCH

IF @OrderGuid IS NULL
BEGIN
   EXEC slog.sqleventlog_sp @@procid,
        'Invalid or missing GUID "%1" for order.', @p1 = @OrderGuidStr
END

SELECT @CustomerIDStr = nullif(T.c.value('@CustomerID', 'nvarchar(400)'), ''),
       @CampaignCode  = ltrim(T.c.value('@CampaignCode', 'nvarchar(100)')),
       @DetailRows    = T.c.query('Products')
FROM   @data.nodes('Order') AS T(c)

BEGIN TRY
   SELECT @CustomerID = convert(int, @CustomerIDStr)
END TRY
BEGIN CATCH
   EXEC slog.sqleventlog_sp @@procid,
        'Invalid customer id = "%1".', @p1 = @CustomerIDStr
END CATCH

IF len(@CampaignCode) > 6
BEGIN
   EXEC slog.sqleventlog_sp @@procid,
        'Incorrectly formed campaign code.', @p1 = @CampaignCode
END

Since there is no try_convert() in SQL 2008, we have to resort to TRY-CATCH, which we do in two different ways. For the order GUID, the CATCH block is empty and then we check whether @OrderGuid is NULL after the CATCH block, killing two birds with one stone. That is, we check that the GUID is present and that it is correctly formed. On the other hand, for the customer ID, we log and raise the error message in the CATCH handler directly. This is because the CustomerID tag is not required for the operations Modify and Delete, so NULL could be a valid value here.

This trick with TRY-CATCH is not unproblematic. It is safe here, because we know that we are not in a transaction. If there is a transaction, it is a different matter, as we will see later.

Saving the Order Header

The next passage of the code starts a transaction that stores the values in the Orders table:

BEGIN TRANSACTION

IF @Operation = 'ADD'
BEGIN
   IF @CustomerID IS NULL
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'Customer not proivided for new order.'
   END

   IF NOT EXISTS (SELECT * FROM @DetailRows.nodes('/Products/Product') AS T(c))
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'For a new order, there must be at least one product.'
   END

   EXEC AddExternalOrder @OrderGuid, @CustomerID, @CampaignCode, @OrderID OUTPUT
END
ELSE IF @Operation = 'MODIFY'
   EXEC ModifyExternalOrder @OrderGuid, @CustomerID, @CampaignCode
ELSE IF @Operation = 'DELETE'
   EXEC DeleteExternalOrder @OrderGuid
ELSE
BEGIN
   EXEC slog.sqleventlog_sp @@procid,
        'Invalid operation %1 for order %2.', @p1 = @Operation, @p2 = @OrderGuid
END

We call the procedure that fits the operation, and for Add there are two more validations. The first check is redundant, because a missing customer ID would be trapped since Orders.CustomerID is defined as NOT NULL. However, this check gives a much prettier error message. The check that there are products on the order is in no way redundant – this is the only place where we can enforce this business rule.

If the operation is not one we recognise, this is an error that takes us to the CATCH block for the main WHILE loop.

Handling the Detail Rows

Next follows a loop to handle the products. There are some differences in this loop between the SQL 2008 and SQL 2012 implementations. We will look at the version for SQL 2008/2005 first:

DECLARE DetailsCur CURSOR STATIC LOCAL FOR
   SELECT ProductIDStr = nullif(T.c.value('@ProductID', 'nvarchar(400)'), ''),
          QuantityStr  = nullif(T.c.value('@Quantity',  'nvarchar(400)'), ''),
          PriceStr     = nullif(T.c.value('@Price',  'nvarchar(400)'), '')
   FROM   @DetailRows.nodes('/Products/Product') AS T(c)
   ORDER  BY QuantityStr DESC

OPEN DetailsCur

WHILE 1 = 1
BEGIN
   FETCH DetailsCur INTO @ProductIDStr, @QuantityStr, @PriceStr
   IF @@fetch_status <> 0
      BREAK

   SELECT @ProductID = NULL, @Quantity = NULL, @Price = NULL

   IF @ProductIDStr IS NULL
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'Product ID missing for detail entry.'
   END

   SELECT @ProductID = convert(int, @ProductIDStr),
          @Quantity  = convert(int, @QuantityStr),
          @Price     = convert(decimal(10, 2), @PriceStr)

   IF EXISTS (SELECT * FROM @Products WHERE ProductID = @ProductID)
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'Product %1 appears multiple times for the same order entry.',
           @p1 = @ProductID
   END

   INSERT @Products(ProductID) VALUES(@ProductID)

   IF @Operation = 'ADD' AND @Quantity = 0
      EXEC slog.sqleventlog_sp @@procid,
           'Quantity must be > 0 when adding a new order.'

   EXEC ModifyExternalOrderDetail @OrderGuid, @ProductID,
                                  @Quantity = @Quantity, @OrderPrice = @Price
END

DEALLOCATE DetailsCur

This time I use a cursor, rather than rolling the loop on my own as I do with the outer loop. This is mainly a matter of preference: since I'm working with each row in the data set exactly once, I can use a cursor which is the simplest way to iterate. Note that the cursor is declared as STATIC LOCAL, a setting you always should always use for cursors.

The validation of products has somewhat lower level of ambition than the validation of the header information in the order. As previously, we bounce all values over strings to avoid bad interpretations of empty attribute values. But when we convert them to the correct data type, there is no local TRY-CATCH. Whence, any illegal value will cause SQL Server will raise a conversion error and this conversion error will find its way to the return file. Why this inconsistency?

There are two reasons for this. The first reason is that this would not work:

BEGIN TRY
   SELECT @ProductID = convert(int, @ProductIDStr)
END TRY
BEGIN CATCH
END CATCH

At this point we are in a transaction, and once in the CATCH handler the transaction is doomed (XACT_ABORT is ON, and conversion errors are typically batch-aborting anyway) and when exiting the CATCH handler, SQL Server raises error 3998, Uncommittable transaction is detected at the end of the batch. The transaction is rolled back. So the CATCH block would have to include ROLLBACK TRANSACTION, making the solution even bulkier.

The other reason is that I simply wanted to demonstrate different techniques. After all, the main objective is to keep bad data out, and the unguarded convert meets that criteria. The point with trapping the error is that it permits us to produce a better error message, which may serve to reduce the need for calls to the support desk. But if you find it unlikely enough that the sending party would pass a non-numeric product ID, you may not find it worth the effort to add a targeted error message.

The loop also has a validation that the same product is not entered twice for the same order and there is a check that for a new order the quantity is > 0. Since there are constraints on the OrderDetails table to enforce these rules, these checks may seem redundant, but these checks must actually be here, because of the way ModifyExternalOrderDetail works. (A second instance of the same product would be taken as a modification.)

The reader may ask why this is a loop at all. Since the business requirement is that one bad product entry invalidates the entire order, a set-based implementation for the products certainly is an option. But as we shall see when we look at the CATCH block, the loop buys us somewhat better diagnostics, as we are able to tell which entry that caused the problem. The main reason, however, is that there were a few things I wanted to demo with nested loops.

Handling Details Rows in SQL 2012 and Later

Here is the SQL 2012 version of the loop. The level of ambition is somewhat higher here:

WHILE 1 = 1
BEGIN
   FETCH DetailsCur INTO @ProductIDStr, @QuantityStr, @PriceStr
   IF @@fetch_status <> 0
      BREAK

   SELECT @ProductID = NULL, @Quantity = NULL, @Price = NULL

   SELECT @ProductID = try_convert(int, @ProductIDStr),
          @Quantity  = convert(int, @QuantityStr),
          @Price     = try_convert(decimal(10, 2), @PriceStr)

   IF @ProductID IS NULL
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'Illegal product ID "%1" on detail entry.', @p1 = @ProductIDStr
   END

   IF EXISTS (SELECT * FROM @Products WHERE ProductID = @ProductID)
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'Product %1 appears multiple times for the same order entry.',
           @p1 = @ProductID
   END

   INSERT @Products(ProductID) VALUES(@ProductID)

   IF @Price IS NULL AND @PriceStr IS NOT NULL
   BEGIN
      EXEC slog.sqleventlog_sp @@procid,
           'Illegal format for Price. Value is "%1".', @p1 = @PriceStr
   END

   IF @Operation = 'ADD' AND @Quantity = 0
      EXEC slog.sqleventlog_sp @@procid,
           'Quantity must be > 0 when adding a new order.'

   EXEC ModifyExternalOrderDetail @OrderGuid, @ProductID,
                                  @Quantity = @Quantity, @OrderPrice = @Price
END

Thanks to try_convert(), the cost in terms of code bloat for an explicit check for a malformed product ID is lower, so I have added it in this version. I've also added an explicit check for the format of the price, but not the quantity. This goes back to the discussion above: how likely is it that a certain data element would be incorrect? I decided that there is some risk that the external system may use comma for the decimal delimiter; after all, that is the standard where I live. But that an incorrectly formed number for quantity, what would that be? Roman numerals? But, again, this is all a matter of level of ambition and what you expect.

End of the Loop

Here is the end of the outer loop, and it is the same in both versions of the stored procedure:

   UPDATE OrderImportFileItems
   SET    InternalOrderID = CASE WHEN @Operation = 'ADD' THEN @OrderID END,
          Status = 'O',
          Message = NULL
   WHERE  FileID = @FileID
     AND  RowNo = @RowNo

   COMMIT TRANSACTION

END TRY

We save the generated order ID, set the status to O and clear any error message there might be. Then we commit. END TRY marks the end of the loop.

The CATCH block

The CATCH block is a centrepiece of ProcessOneImportFile. This is where execution is diverted when an error is raised, be that because of an error raised by SQL Server or from one of our own validations. The error may originally have been raised in one of the inner procedures, but it will be propagated to this level:

BEGIN CATCH
   DECLARE @errmsg nvarchar(2048),
           @errno  int

   IF @@trancount > 0 ROLLBACK TRANSACTION

   EXEC slog.catchhandler_sp @@procid, @reraise = 0,
                             @errno = @errno OUTPUT, @errmsg_aug = @errmsg OUTPUT

   IF @errno <> 1205
   BEGIN
      IF @ProductIDStr IS NOT NULL
      BEGIN
         SELECT @errmsg = @errmsg + ' This occurred for product "' +
                                    @ProductIDStr + '".'
      END

      UPDATE OrderImportFileItems
      SET    Status = 'E',
             Message = @errmsg
      WHERE  FileID  = @FileID
        AND  RowNo   = @RowNo
   END

   BEGIN TRY
      DEALLOCATE DetailsCur
   END TRY
   BEGIN CATCH
   END CATCH
END CATCH

First there are declarations of two variables that are used only in the CATCH block. The first executable statement is the usual: roll back any open transaction. This is followed by calling catchhandler_sp, but we deviate from the regular routine by suppressing the error from being reraised by passing @reraise = 0. We also request to get back the error number and the error message augmented with the information that catchhandler_sp tacks on in the beginning.

Next we update the row for the order in OrderImportFileItems. We set the status to E and save the error message into the Message column. @ProductIDStr has a value if the error occurred in the cursor where we process the products. If there are many products on the order, it may be difficult for a human on the other end to understand where the error is, which is why we add this value to the error message. Never mind that in some cases it will be redundant as the error message already includes the product information.

I like to make a comment here, which may seem to be too much of a fine detail, but which in fact is of some importance. As you see, I have an IF statement to determine whether I should add text to @errmsg. It wasn't there originally, but instead I had the code to add the information about the product ID directly in the UPDATE statement:

Message = @errmsg + CASE WHEN @ProductIDStr IS NOT NULL THEN ...

Do you see the potential problem? If the error message is very long, the added information about the product could cause the total string to exceed 2048 characters, the size of the Message column. This would cause the CATCH handler to die with String or binary data would be truncated. I did have logic to prevent this, but it was quite complicated and untested. For the variable assignment, truncation happens silently, so the solution you see here is both simpler and safer. This is something to keep in mind when you compose strings for error messages in log tables: make sure that they don't cause truncation errors. Using a variable is often the simple way out.

So what is the purpose of this line?

IF @errno <> 1205

You may recall that 1205 is the error number for deadlock. In an earlier section, I advised you be cautious with implementing deadlock retries. But this is a situation where retry on deadlocks is a must. It would be highly embarrassing to send back an order to the sender with deadlock as the error message. Thankfully, we are in excellent position to perform the retry. We know that there was no active transaction when the procedure started executing. We already have the WHILE loop required for the deadlock retry set up with variable initialisation and all. On top of that, the code for the deadlock retry is merely three extra lines of code to avoid updating the status in OrderImportFileItems. And here is the reason the loop is not a cursor: this permits a row to be processed a second time. A final remark: I look at @errno and not error_number(); the latter could be 50000 if the deadlock occurred in a nested procedure and was reraised by SqlEventLog.

The last mission in the CATCH handler is to handle the cursor for the inner loop over products. To ensure that this cursor is removed, we deallocate the cursor inside a TRY block with an empty CATCH block to suppress the error that is raised if the cursor does not exist at this point. The reader may object that it is prettier to use the function cursor_status(). I looked at it Books Online, but I was not sure how I should interpret all the possible return values. I decided that I did not want to test all cases, so I went with this brute-force method instead.

Loops vs. Set-Based

There is a distinct drawback with running loops: performance. What if we have peek days with one million orders? Assume further that the order system and the business logic are a lot more complicated than in this simple example. It is highly unlikely that performance would be satisfactory. We all know that set-based operations is the way to go. But the problem with set-based operations is that they are all-or-nothing. If a single order fails to insert, the entire set fails, which is unacceptable given our business rules.

One way to deal with this is to load the data into staging tables, perform validations and add error information to bad rows as we encounter them and at some point write back the error messages to OrderImportFileItems. When our validations are done, we can insert safely all at once for better performance. Or can we? In our simple example we may have full control over all business rules, but in a complex real-world system, life is more difficult. New checks and rules could be added with new releases. And some checks may be so horrendously complicated that it is out of the question of duplicating them in the import procedure. Thus, in the general case, you can never exclude the risk of a set-based operation failing because of a single row. So you need a fallback. And that fallback is a loop.

So you would have both set-based logic and a loop? With duplicated logic and risk for bugs? No, combine them! There is nothing that says that a loop must be one row at a time. In many cases when you need to insert large amount of data, it is often a good idea to process data in reasonably sized batches for performance. But batching is also good for better error handling. You divide the data into batches of some size, considering the likelihood of an error (and what might be good for performance). Say that one order in 20 000 has an error that you are not able to trap by validating the data beforehand. If you divide the set into batches of 1 000 orders, statistically every 20th batch will fail. When a batch fails, you re-divide into smaller batches. Maybe you are brave and take 10 at a time. Or you fall back to a batch size of one directly. But the point is: you use the same loop no matter the batch size is 1, 10 or 1 000. And the logic inside the batch is set-based in itself.

Undoubtedly, a batching strategy like this is more complex than the one-by-one loop in the example with the simple order database, particularly if there is a requirement that items should be handled in strict order. But in many cases it is necessary because of volumes. I did not include batching to this example, as the article is already long and it would require a lot more test data. So as they say: it is left as an exercise to the reader to implement batching for the simple order database.

Testing It Out

The script includes a test section where I load two XML documents with AddOrderImportFile and then I run ProcessImportFiles to process them. In the test section there are queries to show the orders with their details before and after, as well as queries to show the contents of the tables OrderImportFiles and OrderImportFileItems. There is also a dump of the rows added to sqleventlog during the test.

I am not showing any of the testing code, but I leave it to you explore the script. Below is the resulting output of selected columns in the table OrderImportFileItems after the test. For some orders there are two different messages. The upper message is from the SQL 2012 version and the lower message is what you get on SQL 2005 and SQL 2008.

FileID	RowNo	Operation	Status	Message
1	1	ADD	O	NULL
1	2	MODIFY	O	NULL
1	3	DELETE	O	NULL
1	4	ADD	E	Illegal customer id = 17.
1	5	ADD	E	Customer 3 (Ricky Lancelotti) was deregistered on 1980-04-07.
1	6	ADD	E	Illegal product id 11. This occurred for product "11".
1	7	ADD	E	The order price is 10 for the product "Over-Nite Sensation", but the catalog price is 50.00 with a maximum discount of 10. This occurred for product "1".
1	8	ADD	E	Order 77A320C1-FAF1-487D-B91E-7023C0264FB4 already exists.
1	9	MODIFY	O	NULL
1	10	MODIFY	E	No such order 6E20440C-E75F-41BD-A4D4-B7438E9D7BAA.
1	11	MODIFY	E	Order 301384C6-5E19-4AC8-847F-D1619E45709F is processed/shipped and cannot be modified.
1	12	DELETE	E	Order 9104F66E-EBAF-44DA-91D3-13E67DFC0655 is processed/shipped and cannot be modified.
1	13	DELETE	E	No such order DBD10A7B-52F5-44C6-B767-FF019134EB27.
1	14	MODIFY	E	Attempt to delete the last product on order 530EFABB-5AB4-4A37-90EA-BAB93F281762. Delete the entire order, or add another product first. This occurred for product "4".
1	15	ADD	E	Invalid or missing GUID "" for order.
1	16	NULL	E	Invalid or missing GUID "(null)" for order.
1	17	ADD	E	Customer not proivided for new order.
1	18	ADD	E	Invalid customer id = "ALFKI".
1	19	ADD	E	Incorrectly formed campaign code.
1	20	ADD	E	Product 1 appears multiple times for the same order entry. This occurred for product "1".
2	1	ADD	E	For a new order, there must be at least one product.
2	2	ADD	E	For a new order, there must be at least one product.
2	3	NEWORDER	E	Invalid operation NEWORDER for order ED5C0356-269D-46F3-8756-50AA98A2D371.
2	4	ADD	E	Illegal product ID "123456789012" on detail entry. This occurred for product "123456789012".
2	4	ADD	E	{248} Procedure ProcessOneImportFile, Line 153 The conversion of the nvarchar value '123456789012' overflowed an int column. Maximum integer value exceeded. This occurred for product "123456789012".
2	5	ADD	E	Illegal product ID "(null)" on detail entry.
2	5	ADD	E	Product ID missing for detail entry.
2	6	ADD	E	Quantity must be > 0 when adding a new order. This occurred for product "4".
2	7	MODIFY	E	{547} Procedure ModifyExternalOrderDetail, Line 25 The UPDATE statement conflicted with the CHECK constraint "ck_OrderDetails_Quantity". The conflict occurred in database "tempdb", table "dbo.OrderDetails", column 'Quantity'. This occurred for product "4".
2	8	ADD	E	{515} Procedure ModifyExternalOrderDetail, Line 20 Cannot insert the value NULL into column 'OrderPrice', table 'tempdb.dbo.OrderDetails'; column does not allow nulls. INSERT fails. This occurred for product "8".
2	9	MODIFY	O	NULL
2	10	MODIFY	O	NULL
2	11	ADD	E	Illegal format for Price. Value is "55,20". This occurred for product "4".
2	11	ADD	E	{8114} Procedure ProcessOneImportFile, Line 153 Error converting data type nvarchar to numeric. This occurred for product "4".
2	12	ADD	E	{248} Procedure ProcessOneImportFile, Line 143 The conversion of the nvarchar value '9876543210' overflowed an int column. This occurred for product "4".

Most of the messages come from our explicit checks and hopefully they serve to help the people on the other end to understand what is wrong. But then there are messages like this one which will leave them in the dark:

{547} Procedure ModifyExternalOrderDetail, Line 20
The INSERT statement conflicted with the CHECK constraint "ck_OrderDetails_Quantity". The conflict occurred in database "tempdb", table "dbo.OrderDetails", column 'Quantity'. This occurred for product "4".

The message includes error number, procedure and line number, since the CATCH block in ProcessOneImportFile gets the error message from the output parameter @errmsg_aug in catchhandler_sp. We could have opted to use the @errmsg parameter to get the original message without the first line. But while the procedure name and line number may not help the people on the other end, it could help our support staff when they look directly into OrderImportFileItems. (And there is nothing that says that the augmented error information has to be sent back in the return message. The send process could easily strip off everything up to the first line break in the message, if it starts with a number in braces.)

To some extent these SQL messages represent a lower level of ambition where we have said to ourselves they are not making that mistake, are they? As an example, take the violation of the constraint ck_OrderDetails_Quantity (row 7 in file 2), which has its own little story. The problem with this order is that for product 4 the value of the Quantity attribute is -2. Who knows, maybe someone intended to reduce the quantity with 2 for that product, but that is not the way to do it. Should this be explicitly validated? Originally, when I composed the test data, I had this Order element as an Add operation. Then I realised that -2 was a more likely error with a Modify operation, so I changed operation for the order element. To my surprise, the order was now processed successfully. This was because this line in ModifyExternalOrderDetail:

ELSE IF @Quantity IS NULL OR @Quantity <> 0

originally read:

ELSE IF @Quantity IS NULL OR @Quantity > 0

The way that procedure is written, the erroneous -2 caused the product to be deleted from the order, which was not right. Thus, when I thought I was really paranoid when I wrote these stored procedures, I was not paranoid enough. And, yes, there should probably be explicit check of the quantity also for Modify (there is one for Add). But it only goes to show that it can be very difficult to predict what silly – and not so silly – errors users and systems on the other end might commit.

Service Broker

You are now at a fork in your reading. If you work with Service Broker, you may prefer to pause reading Part Three at this point in favour of Appendix 3. To wit, the case study of the simple order database continues in Appendix 3 with a Service Broker implementation where the orders arrive as Service Broker messages. When you implement error handling with Service Broker, you need to take precautions so that queues do not get disabled because of poison messages and this is exactly the focus of the appendix.

If you are not interested in Service Broker, or you have had enough of order processing for now, just proceed to the next chapter where we will talk about triggers.

Error and Transaction Handling in Triggers

I have touched on triggers in a few places in this series of articles, but now it's the time to take a deeper look at this subject. The basic principles that you have learnt for stored procedures apply to triggers as well. However, there are some differences in the presumptions that you need to be aware of and which affects how you should implement your error handling. I have already mentioned some of the things I discuss in this chapter, but I permit myself to repeat to have everything covered in one place.

Always in a Transaction

As we have learnt, when you are in a trigger, you are always in the context of a transaction, even if you never issued BEGIN TRANSACTION. The transaction is defined by the statement that fired the trigger. This snippet illustrates:

IF @@trancount > 0 ROLLBACK TRANSACTION
go
CREATE TABLE ZombyWoof (a int NOT NULL)
go
CREATE TRIGGER ZombyWoof_tri ON ZombyWoof  AFTER INSERT AS
   PRINT 'Trancount is ' + ltrim(str(@@trancount)) + '.'
go
-- BEGIN TRANSACTION
INSERT ZombyWoof (a) VALUES(8)
-- COMMIT TRANSACTION
go
DROP TABLE ZombyWoof

This prints Trancount is 1. If you uncomment the BEGIN/COMMIT TRANSACTION statements, you will find that the output is the same. That is, inside of the trigger you cannot discern whether the statement that fired the trigger is part of a multi-statement transaction or not. Then again, there little reason why you would need to know that.

The fact that a trigger executes inside a transaction is not some funny quirk, au contraire, it relates to the very essence of triggers. A trigger is an extension of the statement that fired the trigger, and in a trigger you can expect to find statements of two kinds:

Data-integrity checks that cannot be implemented with constraints. E.g., it is not permitted to add a discontinued product to a new order. (Some people may suggest you to implement such a check in a constraint with help of a UDF. That's a performance killer. Don't go there.)
Cascading updates that cannot be performed with cascading foreign-key constraints.

The key is that if any of the statements in the trigger fails, the statement that fired the trigger should be considered to have failed as well, because the statement and a trigger is a unit. From this follows that you should not put things in a trigger that are optional. For instance, say that you would like to send a mail when a row is inserted into a table, but if the mail can't be sent, you still want the row to be inserted. That mail-sending endeavour does not belong in a trigger. More about this later.

XACT_ABORT On by Default

One thing to observe is that in a trigger XACT_ABORT is ON by default. Here is an example to show that:

SET XACT_ABORT OFF
go
CREATE TABLE FiftyFifty (id int NOT NULL CONSTRAINT pk PRIMARY KEY, 
                         b  int NULL CONSTRAINT ck CHECK (b > 0))
go
CREATE TRIGGER FiftyFifty_tri ON FiftyFifty AFTER INSERT AS
   UPDATE FiftyFifty
   SET    b = -i.b
   FROM   FiftyFifty FF
   JOIN   inserted i ON FF.id = i.id
   PRINT 'This does not print'
go
BEGIN TRANSACTION
INSERT FiftyFifty(id, b) VALUES(1, -981)
PRINT 'Still running'
INSERT FiftyFifty(id, b) VALUES(1, 981)
PRINT 'Nor does this print'
go
PRINT 'Trancount is ' + ltrim(str(@@trancount)) + '.'
DROP TABLE FiftyFifty

The output is:

Msg 547, Level 16, State 0, Line 2

The INSERT statement conflicted with the CHECK constraint "ck". The conflict occurred in database "tempdb", table "dbo.FiftyFifty", column 'b'.

The statement has been terminated.

Still running

Msg 547, Level 16, State 0, Procedure FiftyFifty_tri, Line 2

The UPDATE statement conflicted with the CHECK constraint "ck". The conflict occurred in database "tempdb", table "dbo.FiftyFifty", column 'b'.

The statement has been terminated.

Trancount is 0.

The same error is raised twice. The first time it is fired directly in the INSERT statement and execution continues, since a constraint violation is a statement-terminating error. The second time, the INSERT as such succeeds, and instead the UPDATE statement in the trigger causes the constraint violation, and this time the batch is aborted and the transaction is rolled back. That happens, because XACT_ABORT is ON in the trigger, even if we did not say so explicitly.

The reason it works this way is precisely the notion that a trigger is an extension to the statement. Originally, up to SQL 2000, the behaviour was that any error in a trigger aborted the batch, and there was no way to prevent it from happening. Starting with SQL 2005, you can use SET XACT_ABORT ON or OFF in triggers as well, but to be compatible with older versions, XACT_ABORT defaults to ON in a trigger. Since we have learnt in this article that having XACT_ABORT ON is best practice, we have little reason to be concerned.

Note: if the database is in compatibility level 80, the old behaviour is retained and SET XACT_ABORT OFF has no effect.

Ending Transactions in Triggers

If the transaction ends in the trigger, this raises an error and aborts the batch. This example illustrates:

SET XACT_ABORT OFF
go
CREATE TABLE DinahMoe (Humm  int   NOT NULL CONSTRAINT pk PRIMARY KEY, 
                       Bill  money NOT NULL)
go
CREATE TRIGGER DinahMoe_tri ON DinahMoe AFTER INSERT AS
   IF EXISTS (SELECT * FROM inserted WHERE Bill <> $40)
   BEGIN
      ROLLBACK TRANSACTION
      RAISERROR('The bet is on a forty-dollar bill, nothing else', 16, 1)
      RETURN
    END
go
INSERT DinahMoe (Humm, Bill) VALUES (1, 20)
PRINT 'This does not print'
go
PRINT 'Trancount is ' + ltrim(str(@@trancount)) + '.'
go
DROP TABLE DinahMoe

The output is:

Msg 50000, Level 16, State 1, Procedure DinahMoe_tri, Line 5

The bet is on a forty-dollar bill, nothing else

Msg 3609, Level 16, State 1, Line 1

The transaction ended in the trigger. The batch has been aborted.

Trancount is 0.

Message 3609 only appears if the trigger terminates normally by reaching the last statement or exits through the RETURN statement. If you replace the RAISERROR statement with ;THROW, in the example above, message 3609 is not displayed, as the batch is aborted immediately. Nor is message 3609 raised, if the error raised by RAISERROR is trapped by a CATCH handler higher up the call stack.

Message 3609 is similar to message 266 (trancount mismatch). Both reflect that @@trancount did not have the same value when the module exited as when the module started running, and you should handle them in the same way: as long as the message appears together with other messages, it is white noise. But if 3609 appears alone, there is a problem in your trigger or in a procedure that it calls.

As for why 3609 aborts the batch, there are historical reasons. Originally, up to SQL 2000, message 3609 was never produced, but an error in a trigger always aborted the batch without any further message.

Triggers and Error 266

Speaking of error 266, the behaviour with this error is different in triggers. To wit, this error is never raised if you are in trigger context. Here is an example:

CREATE TABLE Centerville (a int NOT NULL)
go
CREATE PROCEDURE sp266 AS 
   BEGIN TRANSACTION
go
CREATE TRIGGER tri ON Centerville FOR INSERT AS
   SELECT @@trancount AS bef266
   EXEC sp266
   SELECT @@trancount AS aft266
go
CREATE PROCEDURE ARealNicePlaceToRaiseYourKidsUp AS
   SET NOCOUNT ON
   INSERT Centerville(a) VALUES(9)
   SELECT @@trancount AS trancount_after_insert
go
EXEC ARealNicePlaceToRaiseYourKidsUp
COMMIT TRANSACTION
go
DROP PROCEDURE sp266, ARealNicePlaceToRaiseYourKidsUp
DROP TABLE Centerville

This produces the output:

bef266

-----------

aft266

-----------

trancount_after_insert

----------------------

Msg 266, Level 16, State 2, Procedure ARealNicePlaceToRaiseYourKidsUp, Line 0

Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 0, current count = 1.

Note here that the call to sp266 does not produce any error despite the unbalanced BEGIN TRANSACTION. Nor is any error raised when you exit the trigger. However, the error is finally raised when you exit the procedure which included the INSERT statement, and the trancount mistake is eventually revealed.

Why this inconsistency? Historic reasons? Here is an interesting titbit: In SQL Server 6.5 the call to sp266 would raise error 266. (But a stray BEGIN TRANSACTION directly inside the trigger would not.) So maybe this is a regression bug introduced somewhere along the line? Nah, it may have been intentional. The behaviour is explicitly documented in the topic Rollbacks in Stored Procedures and Triggers in Books Online for SQL 2000. But to make it more confusing, that particular sentence is missing from later version of Books Online. Bug or feature, it is unlikely that Microsoft would change the behaviour when it has worked this way for so many versions.

I should add that this behaviour applies as soon as there is a trigger on the call stack. That is, if the trigger calls a stored procedure that in its turn calls sp266, there is still no error raised.

xact_state()

On SQL 2005 and SQL 2008 you cannot rely on the function xact_state() when you are in a trigger, because as long as there is no user-defined transaction started with BEGIN TRANSACTION, xact_state() returns 0 inside the trigger, despite that there is an active transaction. Here is an example:

CREATE TABLE Drowning (Witch int NOT NULL)
go
CREATE TRIGGER IComeFromNowhere ON Drowning AFTER INSERT AS
   BEGIN TRY
      IF EXISTS (SELECT * FROM inserted WHERE Witch < 0)
         RAISERROR('Don''t be negative!', 16, 1)
   END TRY
   BEGIN CATCH
      PRINT 'xact_state() in trigger is ' + ltrim(str(xact_state())) + '.'
      PRINT '@@trancount in trigger is ' + ltrim(str(@@trancount)) + '.'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC slog.catchhandler_sp @@procid
   END CATCH
go
BEGIN TRANSACTION
INSERT Drowning (Witch) VALUES (-11)
COMMIT TRANSACTION
go
DROP TABLE Drowning

The output is:

xact_state() in trigger is -1.

@@trancount in trigger is 1.

Msg 50000, Level 16, State 1, Procedure catchhandler_sp, Line 125

{50000} Procedure IComeFromNowhere, Line 4

Don't be negative!

Msg 3609, Level 16, State 1, Line 2

The transaction ended in the trigger. The batch has been aborted.

Here xact_state() returns -1, since there is an explicit transaction. And it is -1 and not 1, since XACT_ABORT is ON by default when you are in a trigger.

If you comment out BEGIN TRANSACTION above and you are on SQL 2005 or SQL 2008, the first line in the output changes to:

xact_state() in trigger is 0.

This has been corrected in SQL 2012, so that xact_state() always returns -1 or 1 inside a trigger.

The behaviour is not restricted to the trigger body itself, but it also applies to stored procedures called from the trigger. If you are looking at xact_state() in your CATCH blocks and you are really paranoid, you could use trigger_nestlevel() to see if you are in trigger context. Given that this bug has been fixed in SQL 2012, I would recommend that you only do this places where this causes a real problem for you.

Pattern for Error Handling in Triggers

We have looked at a pattern for error handling in stored procedures. The same pattern can be used for triggers with some adaption. Here is what I would suggest:

CREATE TRIGGER some_tri ON some_table FOR INSERT, UPDATE AS
SET NOCOUNT ON
BEGIN TRY
   IF NOT EXISTS (SELECT * FROM inserted) RETURN
   -- you trigger code goes here.
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
END CATCH

In the preamble I have left out XACT_ABORT, since it's on by default in triggers, but it would not be wrong to include it explicitly. On the other hand, SET NOCOUNT ON is still there and it is even more compulsory in triggers, since row counts from a trigger can really confuse an application that is not prepared for it.

The first line after BEGIN TRY does not really have much to do with error handling, but the purpose is to make a quick exit if the statement did not affect any rows. I have included this line, because many books and articles incorrectly suggest that you should look at @@rowcount, but this is bad for two reasons:

@@rowcount is set after almost any statement, and this includes both SET NOCOUNT ON and BEGIN TRY, so you cannot check @@rowcount inside the TRY block. You could check @@rowcount before SET NOCOUNT ON, but then someone adds
```
DECLARE @i int = 0
```
before the check, and the trigger will now always exit early. In other words, even if you check @@rowcount in the right place, it's an accident waiting to happen.
If the trigger was fired from a MERGE statement, @@rowcount holds the total number of rows affected. Triggers are fired once per action, so if the MERGE statement inserted 19 rows, but updated none, the full trigger body will still execute twice if you rely on @@rowcount.

Thus, rather than looking at @@rowcount, should check whether there are any rows in the virtual table inserted. In a DELETE trigger you should of course look at deleted rather than inserted. If you have a trigger that is common for all three actions, your check needs to be:

IF NOT EXISTS(SELECT * FROM inserted) AND
   NOT EXISTS(SELECT * FROM deleted) 
   RETURN

The CATCH handler looks like it did before, except that there is no RETURN statement, since you cannot specify an explicit return value in a trigger. In the template above, I have used catchhandler_sp, but you could also use the simpler error_handler_sp. On SQL 2012 and later you can use ;THROW – provided that you remember the semicolon!

If you would like to perform an action that can produce an anticipated error and take an alternate action in such case, this is no different from stored procedures. You have an inner TRY-CATCH block where you lower your guard and set XACT_ABORT OFF and perform the alternate action in the CATCH handler, as I discussed in the section Handling Anticipated Errors.

Retry logic for deadlocks and similar errors is completely pointless to in a trigger. You always have an active transaction when the trigger starts executing which has to be rolled back after the deadlock, and you cannot redo the trigger alone.

Communicating Errors from Triggers

Consider this excerpt from a trigger on the table OrderDetails:

IF EXISTS (SELECT *
           FROM  inserted i
           JOIN  Products P ON i.ProductID = P.ProductID
           WHERE P.isdiscontinued = 1)
BEGIN
   EXEC slog.sqleventlog_sp @@procid,
        'At least one product on the order is discontinued.'
END

What's wrong with this? Say that there are 100 products on the order and we are inserting all at once. Which is the discontinued one? No matter if the person who gets this in the face is an end user or a developer testing his code, that person is looking at a tedious job to find the bad product.

If we write a stored procedure that works with scalar input, there is no problem with putting the name of the actual product in the error message. But when you write a trigger, you always need to consider multi-row input. (You may "know" that the application will only permit the user to enter/modify one row at a time, but 1) A multi-add facility for a file import may be added later. 2) For maintenance reasons, someone may perform a multi-row operation directly through SQL. Your trigger must work correctly in all these situations.) Thus, error messages from triggers tend to be less informative. But if you are ambitious you can write the check as:

IF EXISTS (SELECT *
           FROM  inserted i
           JOIN  Products P ON i.ProductID = P.ProductID
           WHERE P.isdiscontinued = 1)
BEGIN
   DECLARE @prodname nvarchar(400)
   SELECT  TOP 1 @prodname = P.name 
   FROM    inserted i
   JOIN    Products P ON i.ProductID = P.ProductID
   WHERE   P.isdiscontinued = 1
   EXEC slog.sqleventlog_sp @@procid,
        'At least one product on the order is discontinued. One example is %1',
        @p1 = @prodname
END

If you are really ambitious you can list all products:

IF EXISTS (SELECT *
           FROM  inserted i
           JOIN  Products P ON i.ProductID = P.ProductID
           WHERE P.isdiscontinued = 1)
BEGIN
   DECLARE @prodnames nvarchar(400)
   SELECT  @prodnames = (SELECT P.name + ' '
                         FROM   inserted i
                         JOIN   Products P ON i.ProductID = P.ProductID
                         WHERE  P.isdiscontinued = 1
                         FOR XML PATH(''), TYPE).value('.', 'nvarchar(MAX)')
   EXEC slog.sqleventlog_sp @@procid,  
        'The following products on the order are discontinued: %1',
        @p1 = @prodnames
END

(If you have never seen FOR XML PATH used like this before, this is a neat trick to build a concatenated string of values in a column. See this blog post from SQL Server MVP Brad Schultz for more details).

Admittedly, if you do this for every check in your trigger, your code gets cluttered, it takes longer time to write, and you may feel obliged to test that you got it right. It has to be balanced against the time that other people lose because of less informative error messages. I will have to admit that it's not often that I write my triggers like this. One reason for this is that the purpose of many trigger checks is to protect the database from bad programmers. For instance, the application should not offer discontinued products to be entered in the first place, and so a user should never see that message. For checks where I suspect that the message will be displayed to end users, I am more likely to add extra information to be helpful.

Can I Prevent the Trigger from Rolling Back the Transaction?

Occasionally, I see people on the forums who want to write a trigger, but they don't want their trigger to cause the statement that fired the trigger to be rolled back even if the trigger itself fails. Is this achievable? In the general case: no.

If your concern is a specific error, and this is a statement-terminating error when XACT_ABORT is OFF, you can do:

BEGIN TRY 
   SET XACT_ABORT OFF
   -- your statement here
   SET XACT_ABORT ON
END TRY
BEGIN CATCH
   SET XACT_ABORT ON
   -- maybe some alternate action here
END CATCH

But if it is a general concern that you have, this is not going to help since a batch-aborting error like a conversion error or deadlock will doom the transaction no matter the setting of XACT_ABORT.

I would argue that if you want to write a trigger that must not cause a rollback, you are most likely looking at the wrong solution. Recall what I said in the beginning of this chapter: triggers are for code that logically is part of the statement that fired the trigger, and the different behaviour we have seen with triggers exists just because of this. So if you want some piece code to execute when data is updated in a table, but you don't want to roll back the update if this code fails, consider something else. Have your trigger to write a message to a Service Broker queue for asynchronous processing. Or the trigger could insert data into a queue table that you poll later. Or poll the table for changes by other means, for instance Change Tracking. Moreover, if you don't want to cause rollbacks because you are meddling with a third-party application, you should read the license agreement. Most likely, the vendor does not support user-written triggers, so the trigger idea is a non-starter anyway.

You may ask, what if I put COMMIT TRANSACTION first in my trigger? Yes, technically you can; SQL Server will not give you a compilation error. But your trigger will die with error 3609, since the transaction ended in the trigger. It does not matter if the transaction was rolled back or committed, it's still an error that the transaction ended in the trigger. And, oh, if the trigger executes in a nested transaction, so that @@trancount > 1, your COMMIT TRANSACTION will not commit anything, only decrease @@trancount. And then there will be an error somewhere later down the line.

Hm, but what if I immediately add a BEGIN TRANSACTION after my COMMIT? Good boy, now there will not be any errors from your trigger – but you have caused chaos in the ecosystem. Maybe there are other triggers on the table that executes later – recall that beyond defining a first and last trigger for a table, the execution order of multiple triggers is undefined. Maybe the statement is part of a multi-statement transaction, which you commit prematurely, and when a statement later in that transaction fails, the first part of the transaction is not rolled back, because of your trigger.

So, yes, SQL Server permits you to use BEGIN and COMMIT TRANSACTION in a trigger, but you should never put them there. The only transaction-control statement that belongs in a trigger is ROLLBACK TRANSACTION. And when I put never in both bold and italics, this is because I mean it: you should never include BEGIN TRANSACTION or COMMIT TRANSACTION in a trigger. Never.

I did say Never, didn't I?

BACKUP/RESTORE and Other Administrative Commands

This far, I have tacitly assumed a scope of regular application code and ignored administrative commands like BACKUP/RESTORE, DBCC as well as DDL. (DDL = Data Definition Language, i.e. CREATE/ALTER TABLE, PROCEDURE, USER, etc.). The pattern we have used for error handling so far, does not work equally well with these types of commands. In this chapter I will look at error handling with administrative commands in general, and in the next chapter I will cover DDL which comes with its own set of complications.

The Presumptions

There are a few things that are different with administrative commands from regular application code, both in how you use them and how they behave.

Looking at the usage first, these commands are typically not involved in equally complicated application logic as DML statements (DML = Data Manipulation Language, i.e. SELECT, INSERT etc.). For instance, it is not very often you have a transaction involved. In fact, some of these commands like BACKUP and RESTORE are not permitted inside transactions. Another pattern is that you are less keen on aborting on the first error. If you are looping over a set of databases to perform some action in every database, you want to continue even if the action fails on one database. Another difference is that your script may consist of multiple batches. This is particularly common with DDL, and we shall look at this situation in the next chapter. The use of dynamic SQL is prevalent, so we should not forget that syntax errors in dynamic SQL do not abort the batch even when XACT_ABORT is ON.

Note: I often see people using dynamic SQL to build BACKUP and RESTORE commands. This is far from often necessary. Unlike many other commands BACKUP and RESTORE accept variables for most of their arguments.

One thing that makes error handling troublesome with these commands is that they may produce multiple error messages. As an example consider this command:

BACKUP DATABASE msdb TO DISK = 'K:\NoSuch\msdb.bak'

The output from this command is:

Msg 3201, Level 16, State 1, Line 1

Cannot open backup device 'K:\NoSuch\msdb.bak'. Operating system error 3(The system cannot find the path specified.).

Msg 3013, Level 16, State 1, Line 1

BACKUP DATABASE is terminating abnormally.

That is, there are two error messages. Look what happens if we apply the standard routine we have learnt in this article:

BEGIN TRY
   BACKUP DATABASE msdb TO DISK = 'K:\NoSuch\msdb.bak'
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
END CATCH

The resulting output on SQL 2008 or later is far from helpful:

Msg 50000, Level 16, State 1, Procedure catchhandler_sp, Line 125

{3013} Line 2

BACKUP DATABASE is terminating abnormally.

We only get the second error message. And if you look in the sqleventlog table, all you see is this message. The first error message, which the real information is, has been lost.

This is a restriction with the error_xxx() functions: if SQL Server raises multiple error messages, they can only return information about one error since they are scalar functions. I have not conducted any deeper investigations of which commands that may produce more than one error message, but beside BACKUP/RESTORE many DDL commands are in this camp as well. (I have a feedback item. Make is possible to retrieve all error messages in a CATCH handler that calls for an improvement in this area.)

Note: On SQL 2005, the output includes both error messages, but this is only because the first message is sent directly to the client. Only the second message is logged to sqleventlog.

DBCC CHECKDB

A very special case is DBCC CHECKDB which can produce very many error messages. As I showed in the section Uncatchable Errors in Part Two, errors from DBCC are swallowed entirely when you use TRY CATCH.

Then again, DBCC CHECKDB opens for some philosophical discussions. If DBCC CHECKDB completed and reported tons of errors, the database is surely corrupt, but did the command fail? After all, it did what it was supposed to do. So you could say that it is correct that the CATCH handler does not fire. To check whether your database is corrupt, you can use DBCC CHECKDB with the options WITH NO_INFOMSGS, TABLERESULTS and save the data into a table. If this produces any rows, your database is corrupt and then you can apply whatever logic you want in this situation.

That said, DBCC CHECKDB can fail for real. If there are damages to crucial allocation pages or system tables, DBCC may have to choice but to give up. And in this case, DBCC may trigger the CATCH handler and not produce a result set even if you use WITH NO_INFOMSGS, TABLERESULTS. Here is an example, using Paul Randall's demo database FatalCorruption1:

BEGIN TRY
   DBCC CHECKDB WITH NO_INFOMSGS, TABLERESULTS
END TRY
BEGIN CATCH
   PRINT 'The actual severity is: ' + ltrim(str(error_severity()))
   EXEC tempdb.slog.catchhandler_sp @@procid
END CATCH

I got this output (on SQL 2005):

The actual severity is: 23

Msg 50000, Level 18, State 51, Procedure catchhandler_sp, Line 125

{211} Line 2

Possible schema corruption. Run DBCC CHECKCATALOG.

This is quite bewildering: the severity of the message is 23 which should cut the connection. And, indeed, if you don't have TRY-CATCH, the connection is terminated for real. (The reraised message from SqlEventLog has severity level 18; this is due to that catchhandler_sp never uses a higher severity level than 18 when reraising messages.) I have not tested how this behaves on higher versions of SQL Server, since attaching FatalCorruption1 on later versions fails because of the corruption.

DBCC CHECKDB can also fail because it is not able to create the database snapshot it works from and this is a real failure, since this does not answer the question whether your database is corrupt or not. I have not investigated how DBCC CHECKDB behaves in this case.

Alternatives for Error Handling

As we have seen, with this type of commands, TRY-CATCH does not work well. What alternatives are there?

After all, use TRY-CATCH. I don't recommend this if you are on SQL 2005 and SQL 2008, because your error diagnostics will be poor. When things go wrong, you will be left in the dark without understanding of what happened. As for TRY-CATCH on SQL 2012 and later, see the discussion below.
SET XACT_ABORT ON without TRY-CATCH. This works if:
1. You want to abort on the first error.
2. Your script is a single batch (if there are multiple batches, execution will continue with the next batch).
3. After invocations of dynamic SQL you also check @@error, since syntax errors in dynamic SQL do not abort the batch, even when XACT_ABORT is on.
But as noted above, in many cases you want to run this type of commands in a loop and continue with the next guy in case of an error. So it is more likely you want XACT_ABORT to be OFF, so that you at least can survive some errors.
Use old-style error handling with @@error. This would be the main alternative, and I cover it in a little more detail below.
Use any of the approaches – which all qualify as kludges – that I present in the last two sections in this chapter.
Give up the idea of writing everything in T‑SQL, but write a controlling program in PowerShell or C#. This is likely to be the best alternative, but this is outside the comfort zone of more than one DBA.

If you are on SQL 2012 or later, your prospects with TRY-CATCH may at first glance look at lot better, because you can use ;THROW:

SET XACT_ABORT OFF
go
BEGIN TRY
   BACKUP DATABASE msdb TO DISK = 'K:\NoSuch\msdb.bak'
END TRY
BEGIN CATCH
   ;THROW
END CATCH
PRINT 'This does not print'

When you use ;THROW to reraise the error, you get all error messages:

Msg 3201, Level 16, State 1, Line 2

Cannot open backup device 'K:\NoSuch\msdb.bak'. Operating system error 3(The system cannot find the path specified.).

Msg 3013, Level 16, State 1, Line 2

BACKUP DATABASE is terminating abnormally.

So far, so good, but to what benefit? As testified by the PRINT statement, execution is aborted. ;THROW always aborts the batch, even if the original error is not batch-aborting and XACT_ABORT is OFF. So in practice, using ;THROW is only an alternative to using SET XACT_ABORT ON. Admittedly a better alternative, since you also trap syntax errors in dynamic SQL. But if you want to continue with the next item on your list, you need to have an outer CATCH handler that traps the reraised error. In that CATCH handler you can log and decline to reraise. But you can only log the single error returned by error_message(), and you will lose the full error information. That is, your situation is just as dire as on SQL 2005 and SQL 2008.

Thus, in practice when you work with many administrative commands, you will have to resort to check @@error. For instance:

BACKUP DATABASE msdb TO DISK = 'K:\NoSuch\msdb.bak'
SELECT @err = @@error IF @err <> 0 GOTO ErrorExit

Or something like this:

BACKUP DATABASE msdb TO DISK = 'K:\NoSuch\msdb.bak'
SELECT @err = @@error

IF @err = 0
BEGIN
   -- Next command goes here
   SELECT @err = @@error
END

Recall that @@error is very volatile and set after each statement (with some odd exceptions). That is, if you splice in a new statement between the one you want to check and where you look at @@error, your error handling will break. To avoid this, always copy @@error to a local variable in a SELECT (or SET) statement that is glued to the statement you want to check. No blank lines in between!

If you nest stored procedures with this type of error handling, you should be careful to return a non-zero number in case of an error. For instance:

ErrorExit:
   RETURN 1

When you call stored procedures, you need to check both the return value and @@error:

EXEC @ret = some_sp
SELECT @err = coalesce(nullif(@ret, 0), @@error) IF @err <> 0 GOTO ErrorExit

The use of both coalesce and nullif may perplex some readers. The main idea is that if @ret is zero, we also look at @@error. But there are also situations where @ret can be NULL, as I discuss in the section on the RETURN statement in Part Two.

In this example, I did not check @@trancount, since BACKUP cannot run inside a transaction. But if you work with transactions with some other commands, your error exit should read:

ErrorExit:
   IF @@trancount > 0 ROLLBACK TRANSACTION
   RETURN 1

Introducing the CMD Catch Handler

I wanted to improve this story, and I figured that the CLR could be used for this. There were two things I wanted to achieve:

Be able to catch and reraise all errors when multiple errors are raised.
Be able to log all errors.

Or put differently: I wanted to be able to use SqlEventLog and catchhandler_sp for commands with multiple error messages without any loss of information.

This resulted in the CMD Catch Handler, which is an add-on to SqlEventLog. The CMD Catch Handler is included in the download file for SqlEventLog. Since it is more experimental in nature and adds one column, cmdtext, to the sqleventlog table, there is a separate install file, build_cmd_catchhandler.bat. You configure this file in the same way as you configure build_sqleventlog.bat. You need to first install SqlEventLog as such with build_sqleventlog.bat before you run build_cmd_catchhandler.bat. The CMD Catch Handler includes an assembly, and like the loopback assembly for SqlEventLog, this assembly requires EXTERNAL_ACCESS permission. The assembly is signed with the same key that is used for the loopback assembly.

Here, I only discuss how to use the CMD Catch Handler. If you want to learn how it works on the inside, I cover this in Appendix 2, which covers error handling with the CLR.

The basic idea is that the CMD Catch Handler accepts a command (which can be a batch of commands or a call to a stored procedure, but I recommend that you stick to single statements), which it runs on the context connection. If the command produces errors, the error messages are sent to the client, almost as if you had run the command directly. The errors are also logged to the sqleventlog table. (The logging is performed on a loopback connection, whence the need for EXTERNAL_ACCESS permission.)

The basic usage pattern of the CMD Catch Handler is like in this example:

EXEC slog.cmd_catchhandler_sp 
     'BACKUP DATABASE msdb TO DISK=''K:\NoSuch\msdb.bak'''

SELECT TOP 2 logid, errno, msgtext
FROM   slog.sqleventlog 
ORDER  BY logid DESC
SELECT TOP 2 logid, errno, cmdtext
FROM   slog.sqleventlog 
ORDER  BY logid DESC

The output (somewhat truncated for space):

Msg 50000, Level 16, State 1, Line 1

{3201} Cannot open backup device 'K:\NoSuch\msdb.bak'. Operating system error 3(The system cannot find the path specified.).

Msg 50000, Level 16, State 1, Line 1

{3013} BACKUP DATABASE is terminating abnormally.

logid errno msgtext

----- ----- -------------------------------------------------

8 3013 BACKUP DATABASE is terminating abnormally.

7 3201 Cannot open backup device 'K:\NoSuch\msdb.bak'. Operating system...

(2 row(s) affected)

logid errno cmdtext

----- ----- ------------------------------------------------

8 3013 BACKUP DATABASE msdb TO DISK='K:\NoSuch\msdb.bak'

7 3201 BACKUP DATABASE msdb TO DISK='K:\NoSuch\msdb.bak'

The error messages are collected and then reraised with RAISERROR (to work around some problems that I discuss in Appendix 2). The original error number is prepended to the message text, just like catchhandler_sp does. However, cmd_catchhandler_sp never adds the original line number or stored procedure, because of limitations in the SQLCLR. As long as you only pass single-line commands, this restriction is not an issue.

The preferred error handling with the CMD Catch Handler is by checking the return code:

DECLARE @ret int
EXEC @ret = slog.cmd_catchhandler_sp 
            'BACKUP DATABASE msdb TO DISK=''K:\NoSuch\msdb.bak'''
IF @ret <> 0 GOTO error_exit

If the command completes successfully, the return value is 0. If there are errors, the return value is the number of the first error encountered.

The procedure takes three optional parameters: @procid, @quotechar and @trycatch. For @procid you would pass @@procid. With catchhandler_sp, you should always pass @@procid, but many times you will call the CMD Catch Handler from a script outside a stored procedure, and in this case it is not meaningful to pass @@procid. But if you call the handler from inside a stored procedure, you do like this.

EXEC @ret = slog.cmd_catchhandler_sp 
            'BACKUP DATABASE msdb TO DISK=''K:\NoSuch\msdb.bak''', @@procid

@quotechar permits you to use an alternate quote character in the SQL string to save you from nested quotes. For instance:

EXEC @ret = slog.cmd_catchhandler_sp 
            'BACKUP DATABASE msdb TO DISK="K:\NoSuch\msdb.bak"', @quotechar='"'

Double quote (") is the obvious choice, but it will not work if the command string includes XML, full-text or other use of double quotes.

The purpose of the @trycatch parameter is to inform the CMD Catch Handler that you are in a TRY block, and that you need an extra error to be raised to trigger the CATCH in T‑SQL. To understand this, let's first look at the default behaviour when @trycatch is 0:

BEGIN TRY
   EXEC slog.cmd_catchhandler_sp 
        'BACKUP DATABASE msdb TO DISK=''K:\NoSuch\msdb.bak'''
   PRINT 'Here we are!'
END TRY
BEGIN CATCH
   PRINT 'We do not come here'
END CATCH

The output is not what you may expect:

Msg 50000, Level 16, State 1, Line 1

{3201} Cannot open backup device 'K:\NoSuch\msdb.bak'. Operating system error 3(The system cannot find the path specified.).

Msg 50000, Level 16, State 1, Line 1

{3013} BACKUP DATABASE is terminating abnormally.

Here we are!

If you pass @trycatch as 1, the CMD Catch Handler also raises the error in a way that triggers the CATCH block, at the price that the error messages are wrapped in a .NET stack dump.

BEGIN TRY
EXEC slog.cmd_catchhandler_sp 
      'BACKUP DATABASE msdb TO DISK=''K:\NoSuch\msdb.bak''', @trycatch = 1
   PRINT 'No entry.'
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC slog.catchhandler_sp @@procid
END CATCH

The output this time is:

Msg 50000, Level 16, State 1, Line 1

{3201} Cannot open backup device 'K:\NoSuch\msdb.bak'. Operating system error 3(The system cannot find the path specified.).

Msg 50000, Level 16, State 1, Line 1

{3013} BACKUP DATABASE is terminating abnormally.

Msg 50000, Level 16, State 1, Procedure catchhandler_sp, Line 125

{6522} Procedure cmd_catchhandler_clr_sp, Line 0

A .NET Framework error occurred during execution of user-defined routine or aggregate "cmd_catchhandler_clr_sp":

System.Exception: Cannot open backup device 'K:\NoSuch\msdb.bak'. Operating system error 3(The system cannot find the path specified.).

System.Data.SqlClient.SqlError: BACKUP DATABASE is terminating abnormally.

System.Exception:

at SqlEventLog.cmd_catchhandler(String cmdtext, SqlInt32 procid, SqlBoolean trycatch, SqlString quotechar, String server, String dbname, SqlString username, SqlString appname, SqlString hostname)

It is not particularly pretty, but the error 6522 could be withheld from human eyes with something like:

IF error_number() <> 6522
   EXEC slog.catchhandler_sp @@procid

But this also means that the error is not reraised and that any outer CATCH will not fire.

I can sense some doubt from the reader about the CMD Catch Handler, and you are right. While it has its merits in that you can log messages to sqleventlog, there are several weak points. First and most of all: while SqlEventLog itself is drawn from something which has been tested and tried in production, the CMD Catch Handler is not. It is only something that I came up with for this article. Arriving at a solution which passed all the simple tests was not trivial; I had to fight a bit to deal with the idiosyncrasies in the interplay between T‑SQL and the SQLCLR. This is also why I recommend to mainly use it with single statements. While you can pass a long batch or a call to a 3000-line stored procedure that spews out error messages to have these logged, I am not sure how well it work out, least of all if you do it inside a transaction.

And there is no denial that the solution is clunky, since everything has to be embedded SQL strings. To add insult to injury, there is no provision to pass parameters, so all parameter values must be inlined into the SQL string.

As I said previously, a better alternative is to run the SQL code from a client-side program and have this program to log the errors. This does not have to be a custom-written program – it could be a program that reads a script file and which sends batches separated by GO to SQL Server just like SSMS does, but with the added value that it logs errors to SqlEventLog.

Using Extended Events

This might be the best approach, at least for things like BACKUP/RESTORE. I would be even less keen on using this solution in application code than the CMD Catch Handler, though, for reasons I will return to at the end of this section.

I originally learnt of this idea from Dave Mason who has published two blog posts about the technique, which you find here:

As it happens, Dave's posts were "missing" for a while, because he had moved them from another site to the current location, and it took a while before he informed me. In that window, Arnold Lieberman alerted me about the missing posts. Arnold was also kind to share his adaption of Dave's solution with me. Thinking that Dave's posts were gone for good, I worked from Arnold's code to develop what I present below. If you compare my code to the code in Dave's posts, you will find that there are notable difference. I don't want to claim that my solution is "better". The important thing is the idea, and then you can skin the cat in more than one ways. You can use my code or Dave's code as-is, and whichever you use, you should not be shy from making adaptions to fit your needs and preferences.

You find the complete script for my implementation in the file multiplemsghandling.sql which includes one user-defined function and three stored procedures. All objects are in the hmem schema. (hmem = Handle Multiple Error Messages). There is no tie-in to SqlEventLog as I figured that it is likely that there are readers who may be interested in this solution, but who have no interest in SqlEventLog. The scripts run on all versions of SQL Server from SQL 2012 and up.

Here is an outline of how you would use it:

--CREATE TABLE #hmem_errors (...

EXEC hmem.start_error_event_session

WHILE ...
BEGIN
   -- Get next row to process.
   BEGIN TRY
      -- Do something here
   END TRY
   BEGIN CATCH
      EXEC hmem.process_error_messages --@print_errors = 0/1/2
   END CATCH
END

--SELECT * FROM #hmem_errors ORDER BY id
go
EXEC hmem.drop_error_event_session
DROP TABLE #hmem_errors

You call hmem.start_error_event_session to set up an extended event session that captures exceptions and which is filtered for the current spid. When you are done, you drop it with hmem.drop_error_event_session. I recommend that you put this call in a separate batch, so that it is executed also in case your execution runs into a batch-aborting error. In the outline I have a loop, because that would be a common use case. You are doing something in a loop that could fail and produce multiple error message. You want all messages printed/logged, and you want to continue with the loop.

In the case of an error, you end up in the CATCH handler inside the WHILE loop and here you call hmem.process_error_messages, which reads error information from the extended events session. Depending on the parameter @print_errors, the procedure can be silent, print the errors as informational messages or raise them as errors. If you have created the temp table #hmem_errors, the procedure writes the error information to that table, so you can digest the errors that way.

That was the outline. Here follows more detailed information. As I mentioned there is a user-defined function. It is very simple:

CREATE FUNCTION hmem.xe_event_name() RETURNS sysname AS
BEGIN
   RETURN concat('error_handling_spid_', @@spid)
END

The purpose is to keep the naming of the XE session to a single place so that you easily can change it. All the stored procedures call this function.

The procedure hmem.start_error_event_session starts with dropping any existing event session for the spid to get a clean slate. It then proceeds with setting up a new event session which captures the event sqlserver.error_reported, filtered for the current spid and for severity ≥ 11, so that informational messages are filtered out. The target is a ring buffer. Of the event-session options, the following are worth calling out:

MAX_DISPATCH_LATENCY = 1 SECONDS. This is a very low latency for extended events, but obviously we don't want to wait for the error messages to become available.
TRACK_CAUSALITY = OFF. This is the default setting. However, in Dave's posts he has this setting to ON, for reasons unknown to me. I could not see any point with causality tracking here, so I turned off this setting.
STARTUP_STATE = OFF. Again, this is the default setting, but I still want to highlight that if left around, the session will not start after a reboot of SQL Server.

hmem.start_error_event_session starts the event session immediately after creation.

There is not much to say about hmem.drop_error_event_session, so let's move over to the centrepiece, hmem.process_error_messages. It starts with checking if the temp table #hmem_errors exists, and if does not, it creates it, since it needs it for its internal use anyway. As mentioned above, you can create this table yourself, and in that case you will get back the error messages in this table. This is how you need to created it:

CREATE TABLE #hmem_errors (id       int            NOT NULL IDENTITY PRIMARY KEY,
                           errno    int            NOT NULL,
                           severity tinyint        NOT NULL,
                           state    int            NOT NULL,
                           errproc  sysname        NULL,
                           linenum  int            NULL,
                           msgtext  nvarchar(2048) NOT NULL,
                           -- errtime datetime2(3) NOT NULL DEFAULT sysdatetime()
)

That is, you must have the columns id, errno, severity, state, errproc, linenum and msgtext, and the id column must have the IDENTITY property. But as the example suggests, you can add additional columns for your own purposes. Obviously, such extra columns must either be nullable or have a default. Or you need to modify hmem.process_error_messages to set them.

If the temp table does not exist and the parameter @print_errors is 0, the procedure sets @print_errors to 1, since else the error messages would be dropped on the floor entirely.

Next, the procedure attempts to get the XEvent data off the ring buffer. Extended events are designed to be asynchronous there can always be delays, even if the session is defined to have a very short latency. Therefore, the procedure loops until there is actually data to retrieve, waiting 200 ms before trying again. If there still is nothing after two seconds, it gives up and falls back on the error_xxx() functions. (Which means that you will only get the last error message, if there are multiple messages.)

Once the loop has terminated, it extracts the error information into the temp table. In the extraction there is a filter on the destination, since the session may capture messages that are only intended for the Windows event log. If you want these messages as well, feel free to remove this filter. The columns errproc and linenum are populated from error_procedure() and error_line(), as this information is not available in the event session. (That is doable, but I did not find it worth the extra complexity. In any case, this information is the same for all errors.)

Next, if @print_errors is 1 or 2, it loops over the new messages in #hmem_errors (that is, ignoring those that were in the table when the procedure started) to print them. If @print_errors is 1, they are printed with the PRINT statement. The procedure prepends the regular error-message headers, so you will see something like this:

Processed 328 pages for database 'incaroads', file 'incaroads' on file 1.

Processed 2 pages for database 'incaroads', file 'incaroads_log' on file 1.

BACKUP DATABASE successfully processed 330 pages in 0.017 seconds (151.654 MB/sec).

Msg 924, Level 14, State 1, Line 33

Database 'lumpygravy' is already open and can only have one user at a time.

Msg 3013, Level 16, State 1, Line 33

BACKUP DATABASE is terminating abnormally.

Processed 464 pages for database 'master', file 'master' on file 1.

Processed 4 pages for database 'master', file 'mastlog' on file 1.

This is from a test where I ran some backups in a loop, and where I had some errors. Since they are PRINT messages, the errors do not stand out against the other verbiage from BACKUP DATABASE. Therefore, you may find it better to set @print_errors to 2, in which case the messages are indeed raised as errors:

Processed 328 pages for database 'incaroads', file 'incaroads' on file 1.

Processed 2 pages for database 'incaroads', file 'incaroads_log' on file 1.

BACKUP DATABASE successfully processed 330 pages in 0.016 seconds (161.132 MB/sec).

Msg 50000, Level 14, State 1, Procedure process_error_messages, Line 118 [Batch Start Line 8]

Msg 924, Level 14, State 1, Line 33

Database 'lumpygravy' is already open and can only have one user at a time.

Msg 50000, Level 16, State 1, Procedure process_error_messages, Line 118 [Batch Start Line 8]

Msg 3013, Level 16, State 1, Line 33

BACKUP DATABASE is terminating abnormally.

Processed 464 pages for database 'master', file 'master' on file 1.

Processed 4 pages for database 'master', file 'mastlog' on file 1.

Now the errors are more visible, but on the other you get an extra noise line from RAISERROR. Keep in mind that if you use this option, you should not have any outer CATCH handler, as it will intercept and catch the first error and thereby terminate hmem.process_error_messages prematurely. Not only does this mean that you only get to see the first message, but it also means that the last action of the procedure is not performed, and that is an important one. To wit, it stops and restarts the event session to clear the ring buffer, so that next time it is called, there are only the new exceptions in the ring buffer.

For the solution to work, permissions to manage event sessions and access the ring buffer must be present. That is, either the user needs to have these permissions, or you need to package this permission within the stored procedures, a technique that I describe in detail in my article Package Permissions in Stored Procedures. For a script that you run as sysadmin, for instance a script to backup or restore databases, the permissions are not much of an issue. But even if certificate signing is a great technique, the permission issue certainly raises a hurdle for use in application code.

And that is not the only hurdle. If you were to put this in application code, what are your feeling of having this pattern with start and stop event sessions all over the place in your code? And from a management perspective, do you really want to have lots of extended event sessions being created and dropped all the time?

Obviously, an alternative is to have one single, unfiltered, event session running all the time, like the one I describe in my article Monitor Exceptions in Your Database. I was considering this possibility when I designed my adaption, but I came out against it for two reasons:

For this purpose, we need a session with a low latency, and that's OK when the session is filtered for a single spid. But for a general session, it can be problematic. There may be a process that spews out thousands of errors in a very short interval due to some misconfiguration or something else. In combination with a low latency setting this can result in extra load on the server.
When you read from a common event session, it is more difficult to identify which are the error messages that led to the current invocation of the CATCH handler, and which are messages you reported in the previous round. This is the problem that hmem.process_error_messages avoids by stopping and restarting the session.

Overall, while this solution is more robust than the CMD Catch Handler, both of them spell K-L-U-D-G-E. With capital letters.

Other Approaches

I am not the only one to have tackled the problem. Here are three other suggestions I've picked up from other people.

Using the SQL Server Errorlog

The idea here is that you are interested in specific messages, typically those produced by BACKUP/RESTORE. You use sp_altermessage to specify that these errors should be written to the SQL Server errorlog, and then you can grab them with xp_readerrorlog. This is how Adrian Buckman approached the problem, and he describes his solution in two blog posts: The dreaded SQL server error 3013! ‘RESTORE LOG is terminating abnormally’ the bane of my life, until recently and Using the SQL Error log to help log meaningful errors for your Automated Restore jobs.

xp_cmdshell

Yet an approach is to use xp_cmdshell and run the command through SQLCMD. This technique was originally presented in a blog post from Maria Zakourdaev, but that blog post does not seem to be available any more. Here is an outline of the idea, as I recall it:

DECLARE @sql nvarchar(MAX),
        @cmd varchar(8000),
        @ret int

SELECT @sql = 'BACKUP DATABASE master TO DISK = ''K:\NosuchDir\master.bak'''
SELECT @cmd = 'SQLCMD -b -S ' + @@servername + ' -Q "' +  @sql + '"'
EXEC @ret = xp_cmdshell @cmd
SELECT @ret

When you run this, you get the output from xp_cmdshell as a result set, and you can use INSERT-EXEC to capture it in a table. Note the use of the option -b with SQLCMD. This option instructs SQLCMD to return an error code to the calling shell, and eventually, this return code ends up in the @ret variable, so you know if the command succeeded of failed.

I can't say that I'm enthusiastic over this approach. For one thing, I prefer xp_cmdshell to be disabled. And it's definitely precarious to use in application code where you need to arrange with permissions for plain users to run xp_cmdshell. However, since BACKUP and RESTORE require extra permissions anyway that may not be that much of an issue in these cases. An advantage with using xp_cmdshell over the CMD Catch Handler is that the behaviour is more consistent.

Note that if you need the command to be inside a transaction that is not going to work out with xp_cmdshell. While this is not an issue with BACKUP and RESTORE (since they cannot be inside a transaction), it could matter with DDL, where you also have this problem, as we will see in the next chapter.

DBCC OUTPUTBUFFER

If you use old-style error handling with @@error, there is still a possibility to get the hold of error message so that you can log it to a table. You can use DBCC OUTPUTBUFFER, but if you look at the output from this command, you will think that this the biggest kludge of them all. However, there are some good news! Paulo Santos has done the hard work for you. His spGET_LastErrorMessage wrestles out the most recent error message from the output of DBCC OUTPUTBUFFER. If there are multiple error message, he only gets the first message, but with BACKUP/RESTORE it is typically the first error message that carries the useful information. If you feel that you want both messages, you can use the information in his blog post to augment the procedure.

Note that DBCC OUTPUTBUFFER requires sysadmin membership. It is not uncommon that BACKUP and RESTORE are executed by sysadmin staff, but if you want a junior DBA who is not sysadmin to run your backup script, you should not use this procedure unless you use certificate signing to add sysadmin to it.

Error Handling with DDL

What Are the Challenges?

As with administrative commands, you often get multiple error messages with DDL statements. Here is one example.

CREATE TABLE Baltimore (a int NOT NULL)
INSERT Baltimore(a) SELECT 1 UNION ALL SELECT 1

We try to add a primary key:

ALTER TABLE Baltimore ADD CONSTRAINT WhatsNewIn PRIMARY KEY (a)

The output is:

Msg 1505, Level 16, State 1, Line 1

The CREATE UNIQUE INDEX statement terminated because a duplicate key was found for the object name 'dbo.Baltimore' and the index name 'WhatsNewIn'. The duplicate key value is (1).

Msg 1750, Level 16, State 0, Line 1

Could not create constraint. See previous errors.

The statement has been terminated.

But if you embed the ALTER TABLE statement in TRY-CATCH, the same things happens as with BACKUP/RESTORE:

BEGIN TRY
   ALTER TABLE Baltimore ADD CONSTRAINT WhatsNewIn PRIMARY KEY (a)
END TRY
BEGIN CATCH
   EXEC slog.catchhandler_sp @@procid
END CATCH

That is, you only see the second (and not very helpful) message:

Msg 50000, Level 16, State 0, Procedure catchhandler_sp, Line 125

{1750} Line 2

Could not create constraint. See previous errors.

If you use ;THROW you will see both messages.

It's not only the two-message errors that make error handling for DDL scripts different from regular application code. The main challenge lies in the fact that DDL scripts typically consist of multiple batches. This is necessary for several reasons. There are statements like CREATE PROCEDURE, CREATE VIEW etc that must be in batches of their own. There are also situations where a batch fails to compile because a later statement depends on the execution of an earlier statement in the same batch. For instance, given this table:

CREATE TABLE HotPlate (Heaven int NOT NULL)

If you attempt any of the batches below

ALTER TABLE HotPlate ADD GreenHotel int
UPDATE HotPlate
SET    GreenHotel = Heaven + 2
go
ALTER TABLE HotPlate ADD GreenHotel int
ALTER TABLE HotPlate ADD CONSTRAINT AtThe CHECK (GreenHotel > Heaven)

They will both fail with:

Msg 207, Level 16, State 1, Line 2

Invalid column name 'GreenHotel'.

As inconsistency is the hallmark of SQL Server, you can sometimes get away with multiple statements in the same batch, because the check on the added/changed item is not until run-time. But since it's difficult to know when it works or not, the common pattern is one DDL statement per batch. The challenge is how to implement error handling under these conditions.

The DDL_Demo Database

To learn what strategies we can apply, we will work with the example of the DDL_Demo database. (Readers on SQL 2005 will need to change the column Audit.AuditTime to datetime with getdate() as default.)

USE tempdb
go
IF EXISTS (SELECT * FROM sys.databases WHERE name = 'DDL_Demo')
   DROP DATABASE DDL_Demo
go
CREATE DATABASE  DDL_Demo
go
USE DDL_Demo
go
CREATE TABLE Audit (
     Auditid   int           IDENTITY,
     AuditTime datetime2(3)  NOT NULL
        CONSTRAINT def_AuditTime DEFAULT sysdatetime(),
     TableName sysname       NOT NULL,
     KeyValue1 nvarchar(100) NOT NULL,
     KeyValue2 nvarchar(100) NULL,
     OtherCols xml           NOT NULL,
     CONSTRAINT pk_Audit PRIMARY KEY (Auditid)
)
go
CREATE TABLE Lookup
    (LookupId   int         NOT NULL,
     LookupName varchar(20) NOT NULL,
     CONSTRAINT pk_lookup PRIMARY KEY (LookupId)
)
INSERT Lookup (LookupId, LookupName)
   SELECT 1, 'Lookup 1'
   UNION ALL
   SELECT 2, 'Lookup 2'
go
CREATE TABLE Main 
    (KeyCol      int          NOT NULL,
     LookupId    int          NOT NULL,
     UselessCol  float        NULL,
     Constrained int          NULL
        CONSTRAINT ck_constrained CHECK (Constrained > 0),
     DataCol     varchar(20) NULL,
     CONSTRAINT pk_Main PRIMARY KEY (KeyCol),
     CONSTRAINT fk_Main_Lookup FOREIGN KEY (LookupId)
        REFERENCES Lookup (LookupId)
)
go
CREATE INDEX LookupId_ix ON Main (LookupId)
go
CREATE TRIGGER Audit_Main ON Main FOR INSERT, UPDATE AS
   INSERT Audit(TableName, KeyValue1, OtherCols)
      SELECT 'Main', ltrim(str(KeyCol)), 
             (SELECT LookupId, UselessCol, Constrained, DataCol
              FOR    XML RAW('Data'))
      FROM   inserted
go
INSERT Main (KeyCol, LookupId, Constrained, UselessCol, DataCol)
  SELECT 1, 1, 5, pi(), replicate('1', 9)
  UNION ALL
  SELECT 2, 1, NULL, log(pi()), replicate('2', 6)
  UNION ALL
  SELECT 3, 2, 98, NULL, replicate('3', 9)
go
CREATE TABLE Sub
  (KeyCol    int         NOT NULL,
   SubKeyCol int         NOT NULL,
   ChildData varchar(23) NOT NULL,
   CONSTRAINT pk_Sub PRIMARY KEY (KeyCol, SubKeyCol),
   CONSTRAINT fk_Sub_Main FOREIGN KEY (KeyCol)
      REFERENCES Main (KeyCol)
)
go

In summary: the database has four tables: Main, Sub, Lookup and Audit. Main has a foreign key to Lookup, and Sub has a foreign key to Main. There is a trigger on Main that logs new and changed rows to Audit.

In this example, we will implement two changes in the table Main: 1) drop the column UselessCol and 2) change the data type of DataCol to int. These are fairly simple changes that can be implemented with ALTER TABLE, but to highlight the challenges with error handling in DDL scripts, we will implement them in a more complicated way

Using the Table Designer

Our staring point is to modify the table Main through one of the most dreadful pieces of software that ships with SQL Server: the Table Designer. Why it's dreadful? ...you will understand quite soon.

If you want do to the GUI part of this exercise, you need to go into Tools->Options->Designers and uncheck Prevent saving changes that require table re-creation, if you have SSMS 2008 or later. Don't forget to set it back when you are done. This initial GUI exercise is not terribly important itself, but serves to create the working material for the rest of the demo.

In Object Explorer, find the database DDL_Demo and the table Main. Right-click and select Design. Delete the column UselessCol and change the data type for the column DataCol to int. As an initial exercise, press Save. You first get an information box that says that the tables Lookup, Main and Sub will be changed, despite that you only changed Main. Never mind, press Yes, and then there is another warning about data loss when you change DataCol, just press Yes. You get this output:

'Lookup' table saved successfully

'Main' table

- Unable to preserve trigger 'Audit_Main'.

Invalid column name 'UselessCol'.

Of course! Since we deleted a column we need to change the trigger. Hm, rather than saving the changes, we should probably generate a script so we can modify the trigger. However, before you move on, run these two queries:

EXEC DDL_Demo..sp_help Main
SELECT * from DDL_Demo.sys.foreign_keys

The first query shows that Main is unchanged, but the other query is worrying: it lists only one foreign key, but there are two in the creation script above. ¿Que?

Close the designer window and re-rerun the creation script. (Make sure that you are not in DDL_Demo in the Object Explorer tree, or else DROP DATABASE may fail.) Open the designer again and perform the same changes: delete UselessCol and change DataCol to int. This time select Generate Change Script from the Table Designer menu. You find the full output from SSMS in DDL_demo1.sql (opens in a separate window like all other scripts in this article). For readers who are reading this on the beach or in some other offline environment, here is a more compact version where I have removed some statements and clauses not relevant to this article:

/* To prevent any potential data loss issues, you should review this script
 in detail before running it outside the context of the database designer.*/
BEGIN TRANSACTION
SET QUOTED_IDENTIFIER ON
SET ARITHABORT ON
SET NUMERIC_ROUNDABORT OFF
SET CONCAT_NULL_YIELDS_NULL ON
SET ANSI_NULLS ON
SET ANSI_PADDING ON
SET ANSI_WARNINGS ON
COMMIT
BEGIN TRANSACTION
GO
ALTER TABLE dbo.Main DROP CONSTRAINT fk_Main_Lookup
GO
COMMIT
BEGIN TRANSACTION
GO
CREATE TABLE dbo.Tmp_Main (KeyCol int NOT NULL,
                           LookupId int NOT NULL,
                           Constrained int NULL,
                           DataCol int NULL)  ON [PRIMARY]
GO
IF EXISTS(SELECT * FROM dbo.Main)
   EXEC('INSERT INTO dbo.Tmp_Main (KeyCol, LookupId, Constrained, DataCol)
     SELECT KeyCol, LookupId, Constrained, CONVERT(int, DataCol) 
     FROM dbo.Main WITH (HOLDLOCK TABLOCKX)')
GO
ALTER TABLE dbo.Sub DROP CONSTRAINT fk_Sub_Main
GO
DROP TABLE dbo.Main
GO
EXECUTE sp_rename N'dbo.Tmp_Main', N'Main', 'OBJECT' 
GO
ALTER TABLE dbo.Main ADD CONSTRAINT pk_Main PRIMARY KEY (KeyCol)
GO
CREATE NONCLUSTERED INDEX LookupId_ix ON dbo.Main (LookupId)
GO
ALTER TABLE dbo.Main ADD CONSTRAINT
        ck_constrained CHECK (([Constrained]>(0)))
GO
ALTER TABLE dbo.Main ADD CONSTRAINT fk_Main_Lookup 
   FOREIGN KEY(LookupId) REFERENCES dbo.Lookup(LookupId)
GO
CREATE TRIGGER Audit_Main ON dbo.Main FOR INSERT, UPDATE AS
   INSERT Audit(TableName, KeyValue1, OtherCols)
      SELECT 'Main', ltrim(str(KeyCol)), 
             (SELECT LookupId, UselessCol, Constrained, DataCol
              FOR    XML RAW('Data'))
      FROM   inserted
GO
COMMIT
BEGIN TRANSACTION
GO
ALTER TABLE dbo.Sub ADD CONSTRAINT fk_Sub_Main 
   FOREIGN KEY(KeyCol) REFERENCES dbo.Main(KeyCol)
GO
COMMIT

The change in Main is implemented by first creating a new table Tmp_Main and then copying over data from Main to the new table. Later Main is dropped whereupon Tmp_Main is renamed to Main. The constraints, the index and the trigger are all recreated. As noted earlier, these changes could be implemented with ALTER TABLE DROP COLUMN and ALTER TABLE ALTER COLUMN. However, the crippled Table Designer only uses ALTER TABLE when the command would work on SQL Server 6.5! Nevertheless, there are many situations where ALTER TABLE is not possible or feasible and where you have to perform this copy-dance, so it is a technique you need to master.

To complete the implementation of our intended DDL change, the first thing we should do is to remove UselessCol from the trigger.

While we are it, we take the occasion to address one of the flaws in the Table Designer. Note that the script has no less than four transactions. This is the reason why we lost one of the foreign keys in our first attempt to save the change; the FK was dropped in the second transaction which was committed before any of the errors occurred. (Since the table is dropped later, there is no reason to drop the FK explicitly, but the Table Designer is, as I said, dreadful.) Thus, delete all but the first BEGIN TRANSACTION and the last COMMIT. This results in DDL_demo2.sql, where I have highlighted the changes from the original script. DDL_demo2.sql is the starting point for the rest of this chapter.

Before we move on, permit me a general discussion on transactions and change scripts. It may seem natural to have a transaction around a change like this one, but it is not always the best choice. Say that you need to change a dimension key in a 1 TB fact table from tinyint to smallint. While this is possible with ALTER TABLE, your transaction log will grow to extremes. An alternative is to take the long way, copying the data from the old version of the table to the new version in batches of, say, 100000 rows at a time without a transaction, so that the log can be truncated by backup or checkpoint while your script is running. In this case, there would not be much error handling in the change script. If things go wrong, you restore a backup – or, if you are on Enterprise Edition, you revert from a database snapshot. (In the latter case, you need to be aware of this ugly bug which resets the log file half a mega-byte.)

For this article, I'm assuming that we want a transaction around our table change to make sure that we don't leave the database in an inconsistent state, and for this we need robust error handling.

Trying the Script from the Table Designer

Copy DDL_demo2.sql to the end of the same window where you have the creation script for the DDL_Demo database. Add these two statements at the end for testing purposes:

SELECT * FROM Main
SELECT * FROM sys.foreign_keys

You can run the entire query window to test the change script. You will find that it runs without error and that the data in Main is still there, as are the foreign keys.

Mission completed? Not really. In the creation script for DDL_Demo, add this INSERT statement to the other INSERT statements:

INSERT Main (KeyCol, LookupId, UselessCol, DataCol)
   VALUES (8, 2, 1/pi(), replicate('4', 12))

Make sure that you have output set to text and re-run the query window. This time there is an error message:

Msg 248, Level 16, State 1, Line 1

The conversion of the varchar value '444444444444' overflowed an int column.

The statement has been terminated.

Caution: Changing any part of an object name could break scripts and stored procedures.

KeyCol LookupId Constrained DataCol

----------- ----------- ----------- -----------

(0 row(s) affected)

name

-------

fk_Main_Lookup

fk_Sub_Main

Not only is there an error: Main is empty. If you look closer, you can tell that this is the new version of Main, since UselessCol is absent. And if you try to find the data in Tmp_Main, you find that this table is no longer there.

Imagine now that the first execution, where everything went OK, was in your test database. The second execution, on the other hand, was in your production database which had bad data you had not foreseen. Because of the lack of error handling in your change script, you have caused data loss, and you will have to restore a backup. Certainly not a situation you want to be in.

Overflow is a statement-terminating error, so the transaction continues on the next statement. All through these articles I have been driving the point of using SET XACT_ABORT ON to make most errors batch-terminating. Sounds like a good bet? Well, add SET XACT_ABORT ON to the existing suite of SET statements and run again. This time we get a whole slew of error messages:

Msg 248, Level 16, State 1, Line 1

The conversion of the varchar value '444444444444' overflowed an int column.

Msg 15248, Level 11, State 1, Procedure sp_rename, Line 357

Either the parameter @objname is ambiguous or the claimed @objtype (OBJECT) is wrong.

Msg 4902, Level 16, State 1, Line 1

Cannot find the object "dbo.Main" because it does not exist or you do not have permissions.

Msg 1088, Level 16, State 12, Line 1

Cannot find the object "dbo.Main" because it does not exist or you do not have permissions.

Msg 4902, Level 16, State 1, Line 1

Cannot find the object "dbo.Main" because it does not exist or you do not have permissions.

Msg 4902, Level 16, State 1, Line 1

Cannot find the object "dbo.Main" because it does not exist or you do not have permissions.

Msg 8197, Level 16, State 4, Procedure Audit_Main, Line 1

The object 'dbo.Main' does not exist or is invalid for this operation.

Msg 1767, Level 16, State 0, Line 1

Foreign key 'fk_Sub_Main' references invalid table 'dbo.Main'.

Msg 1750, Level 16, State 0, Line 1

Could not create constraint. See previous errors.

Msg 3902, Level 16, State 1, Line 1

The COMMIT TRANSACTION request has no corresponding BEGIN TRANSACTION.

Msg 208, Level 16, State 1, Line 1

Invalid object name 'Main'.

As the error messages indicate, the table Main is now gone entirely and there is a pale shade on your face. For a moment, you sense a glimpse of hope, thinking that the Tmp_Main table must still be there, but, no, it's gone too. So, what happened?

This time the error aborted the batch and rolled back the transaction, including the creation of Tmp_Main. But it was only the particular batch that copies the data that was aborted, not the entire script. Keep in mind that all these GO are intercepted by SSMS, and SSMS sends batch by batch to SQL Server. And SSMS does not understand that it should stop sending batches because there was an error. Thus, the transaction was rolled back, but all batches after the failing INSERT batch were executed and committed, including the fatal DROP TABLE dbo.Main.

Note: when the change script is run directly from the Table Designer, the Table Designer is smart enough to stop execution on the first error, but that helps little since you cannot change the script in this case.

Using SQLCMD

Clearly, we need more sophisticated error handling. What are our options? We are here at a fork in the road. If you know for sure that you change script will only be executed through SQLCMD or OSQL (for instance, because there is a setup package that will invoke SQLCMD), your prospects are not too bad. You need to splice in this batch after each batch in the change script:

IF @@error <> 0 OR @@trancount = 0
   RAISERROR('Error occurred or transaction aborted', 16, 127)
GO

The key here is that the State argument to RAISERROR is 127; this causes SQLCMD to exit immediately. You find such a script in DDL_demo3.sql with the changes highlighted. (The repeated change has a lighter highlighting to help you to find the important changes.) I have also added SET XACT_ABORT ON in the beginning of the script.

The purpose of checking @@trancount is obvious: if the transaction has been rolled back, we should not continue. However, this is not sufficient, since syntax errors are not batch-aborting even when XACT_ABORT is ON. And syntax errors in DDL scripts are not unheard of. For this reason there is also check on @@error. (I will have to admit that it makes me a little uneasy, because I am not sure that there is a guarantee that this volatile variable will survive between batches.)

Copy DDL_demo3.sql to a local file on your disk, clear the query window with the creation script for DDL_Demo from the miserable contents of DDL_demo2.sql and recreate the database. Now open a command-line window and run:

SQLCMD -d DDL_Demo -i DDL_demo3.sql

(You may have to add -S to specify server, and if you use SQL authentication you also need to add -U and -P.) The output on my machine is:

Msg 248, Level 16, State 1, Server NATSUMORI\SQLXI, Line 1

The conversion of the varchar value '444444444444' overflowed an int column.

Msg 50000, Level 16, State 127, Server NATSUMORI\SQLXI, Line 2

Error occurred or transaction aborted

If you run SELECT * FROM Main, you see that all data is still there. Thus, here we have a working method, at the price that the script is verbose.

But, here are some good news! It is not too difficult to add all those error-checking batches in an automated fashion. As an exercise, copy DDL_demo2.sql to a query window in SSMS. Press Ctrl-H to open the Find/Replace window. At the bottom check Use and select Regular Expressions. In Find what enter ^GO\n and in Replace with enter GO\nIF @@error <> 0 OR @@trancount = 0\n RAISERROR('Error occurred or transaction aborted', 16, 127)\nGO\n. Save the modified query window to disk and compare to DDL_demo3.sql. You will see that the results are largely the same and if you manually change the indentation of the RAISERROR statement, it gets even better.

And it does not stop there. Say that you want to work more with the logic in your script as such, and you don't want all this error-checking cluttering your view. You can easily get rid of it by reversing the Find/Replace operation. There is however some trickery with the regular expressions, so the actual search expression for Find what should be ^GO\nIF \@\@error \<\> 0 OR \@\@trancount = 0\n *RAISERROR$'Error occurred or transaction aborted', 16, 127$\nGO\n. (The ^ represents beginning of line, and the @, <> and () characters have special meaning in regexps and must be escaped. Note that if you use a different text editor, the exact rules for what you need to escape may be different.) In Replace with enter GO\n.

Running Change Scripts in SSMS

Unfortunately, the scheme in DDL_demo3.sql does not work if you want to be able to run the script from SQL Server Management Studio (or you cannot exclude the possibility that someone will do that) because SSMS does not react on state 127. One option is change the RAISERROR statement to:

RAISERROR('Error occurred or transaction aborted', 20, 127) WITH LOG

By specifying the severity level 20, you abort the connection. However, to raise an error on that level you need to use WITH LOG, and this requires membership in sysadmin or at least the permission ALTER TRACE. If the script is to be executed in a database that lives on a shared server by an application admin, that person may not have that permission. And even if he has, the DBA may have set up alerts for high-severity errors and will not appreciate the joke. No, we need something else.

One possibility can be seen in DDL_demo4.sql. Here I have again used my favourite text editor for some Find/Replace (starting from DDL_demo2.sql) and the script goes like this:

IF @@trancount > 0
ALTER TABLE dbo.Main
	DROP CONSTRAINT fk_Main_Lookup
GO
IF @@error <> 0 AND @@trancount > 0 ROLLBACK TRANSACTION
GO
IF @@trancount > 0
ALTER TABLE dbo.Lookup SET (LOCK_ESCALATION = TABLE)

That is, after each batch, I check @@error and in case of an error I roll back any transaction that is still active. Then I start the next batch with a check on @@trancount, and I only perform the command if the transaction is still alive. There are however some caveats here. One is that if a batch is actually a multi-statement batch, I need to wrap the statements in BEGIN-END, for instance:

IF @@trancount > 0
BEGIN
ALTER TABLE dbo.Main
	DROP CONSTRAINT fk_Main_Lookup
ALTER TABLE dbo.Main
    DROP CONSTRAINT some_other_constraint
END
GO

This can be automatically with some more Find/Replace skills, although as you see the lack of indentation makes the script difficult to read. The next caveat can be demonstrated by pasting in DDL_demo4.sql into the query window with the creation script for the DDL_Demo database and execute the resulting query window. (Feel free to add diagnostic queries, if you like): The output:

Msg 248, Level 16, State 1, Line 1

The conversion of the varchar value '444444444444' overflowed an int column.

Msg 2714, Level 16, State 2, Procedure Audit_Main, Line 1

There is already an object named 'Audit_Main' in the database.

Beside the expected conversion error, there is one more error message, caused by the trigger. Since CREATE TRIGGER must be alone in a batch, I had to manually remove the check on @@trancount for this batch. The error message as such is benign, because all that happens that the deployment of the new version of the trigger fails, which is exactly what we want at this point. (Since the old version of the table is still there.)

However, imagine now that you had implemented the changes with ALTER TABLE DROP COLUMN and ALTER TABLE ALTER COLUMN and the latter statement would have failed with the conversion error. In this case, the correct statement to update the trigger would have been ALTER TRIGGER. And this statement would have succeeded. Say now that the deployment of the changes was cancelled because of the encounter of the bad data. If no one thinks of the trigger, this means that the system will jog along without UselessCol being audited.

You may argue that I have created a corner case: dropping a column is not a very common action. A more likely action is to add a new column, and the trigger would be augmented with this new column. Assuming now that the ALTER TABLE ADD COLUMN was before the failing ALTER TABLE ALTER COLUMN in the change script, the ALTER TRIGGER script would fail because the new column would have been missing after the rollback. However, I only used the trigger as an example to keep the demo down in size. Assume that the package includes a new table as well as new versions of a couple of stored procedures that refers to this new table. The change script fails because of bad data, but thanks to deferred name resolution the ALTER PROCEDURE statements for the new versions of the procedures succeed without errors. Then on Monday morning after the cancelled database upgrade, users get errors all over the place because of missing tables or columns.

Or in another words, we need a clever way to prevent this from happening.

I looked at the change scripts generated by SQL Server Data Tools (SSDT), which is far better product from Microsoft than the Table Designer (and it's free!), and inspired by their technique I composed DDL_demo5.sql, again using my Find/Replace skills with manual additions. In the beginning of the script, I added this statement:

SET CONTEXT_INFO 0x

Then I spliced in a number of batches with this text:

IF @@error <> 0 AND @@trancount > 0 ROLLBACK TRANSACTION
IF @@trancount = 0 BEGIN SET CONTEXT_INFO 0x01 BEGIN TRANSACTION END
GO

And I changed the text of the final batch to read:

IF context_info() IS NULL
   COMMIT
ELSE
   ROLLBACK TRANSACTION

The basic idea is that if an error occurs, we roll back the transaction. But to make sure that successive batches do not cause damage, we start a new transaction. We also set an error flag, and if this error flag is set at the end of the script, we roll back, else we commit. In their scripts, SSDT uses a temp table for the flag; I decided to use context_info instead. The command SET CONTEXT_INFO permits you to set a process-global varbinary(128) value that you can retrieve with the function context_info(). There is a funny quirk that SET CONTEXT_INFO NULL is not permitted, but if you use 0x, context_info() returns NULL.

If you paste in DDL_demo5.sql into the window with the creation script (make sure that you first delete the contents of DDL_demo4.sql) and run the window, you get this output:

Msg 248, Level 16, State 1, Line 1

The conversion of the varchar value '444444444444' overflowed an int column.

Msg 15248, Level 11, State 1, Procedure sp_rename, Line 357

Either the parameter @objname is ambiguous or the claimed @objtype (OBJECT) is wrong.

Msg 4902, Level 16, State 1, Line 1

Cannot find the object "dbo.Main" because it does not exist or you do not have permissions.

Msg 1913, Level 16, State 1, Line 1

The operation failed because an index or statistics with name 'LookupId_ix' already exists on table 'dbo.Main'.

Msg 2714, Level 16, State 5, Line 1

There is already an object named 'ck_constrained' in the database.

Msg 1750, Level 16, State 0, Line 1

Could not create constraint. See previous errors.

Msg 2714, Level 16, State 5, Line 1

There is already an object named 'fk_Main_Lookup' in the database.

Msg 1750, Level 16, State 0, Line 1

Could not create constraint. See previous errors.

Msg 2714, Level 16, State 2, Procedure Audit_Main, Line 1

There is already an object named 'Audit_Main' in the database.

Msg 2714, Level 16, State 5, Line 1

There is already an object named 'fk_Sub_Main' in the database.

Msg 1750, Level 16, State 0, Line 1

Could not create constraint. See previous errors.

That is, a whole slew of error messages. But when you check Main, all columns and data are there, and if you test inserting a row, you find that all columns are audited, including UselessCol.

We get this deluge of error messages because we execute all batches. Clearly, this is quite noisy and a better solution may be to combine the strategies of DDL_demo4.sql and DDL_demo5.sql, but I leave it as an exercise to the reader to explore this option.

Have we arrived at a good solution? Yes and no. Yes, because if we want a T‑SQL-only solution that we can run from SSMS, this is as good as it can get. But the resulting script is complex, and more difficult to revert to the original script than DDL_demo3.sql, the solution for SQLCMD. The true answer is that running multi-batch change scripts from SSMS is not the best of ideas in the first place. Despite this, I have given quite some coverage to this situation, since I suspect that a lot of people out there will use SSMS anyway.

Single-Batch Scripts

If we could make the script into a single batch, we would not have to worry about SSMS continuing with next batch after an error. Is this a viable option? It is possible to have multiple DDL statements in a single batch, if we wrap them in dynamic SQL one way or another so that the statements are not compiled until the previous statements have executed.

The first example script is DDL_demo6.sql which I have created manually from DDL_demo2.sql. I have wrapped all statements in EXEC() with the exception of the call to sp_rename; since this is stored procedure, there are no compilation problems with this statement. The code generated by the Table Designer for some reason wraps the INSERT which copies data from Main to Tmp_Main in EXEC(); I extended that EXEC() to include the IF EXISTS for that batch. I also took the occasion to clean up the formatting generated by the Table Designer a little. Despite this, the script is fairly ugly. Here is an excerpt:

SET XACT_ABORT ON
...
BEGIN TRANSACTION

EXEC ('ALTER TABLE dbo.Main DROP CONSTRAINT fk_Main_Lookup')
IF @@error <> 0 GOTO BitterEnd

EXEC ('ALTER TABLE dbo.Lookup SET (LOCK_ESCALATION = TABLE)')
IF @@error <> 0 GOTO BitterEnd
...
COMMIT
BitterEnd:
IF @@trancount > 0 ROLLBACK TRANSACTION

The script has SET XACT_ABORT ON, but since syntax errors do not abort the batch, we need to check @@error after each EXEC to be able to divert execution to the final ROLLBACK in case of an error. I will have to admit that I don't know whether checking @@error is sufficient for the call to sp_rename.

The script DDL_demo7.sql is considerably cleaner, but it only runs on SQL 2012 and later. DDL_demo7.sql uses TRY-CATCH, and the CATCH handler uses ;THROW, so I am guaranteed to get all error messages if a DDL statement fails. This permits me to skip all those checks on @@error. An excerpt:

SET XACT_ABORT ON
...
BEGIN TRY

   BEGIN TRANSACTION
   EXEC ('ALTER TABLE dbo.Main DROP CONSTRAINT fk_Main_Lookup')

   EXEC ('ALTER TABLE dbo.Lookup SET (LOCK_ESCALATION = TABLE)')
   ...
   COMMIT
END TRY
BEGIN CATCH
   IF @@trancount > 0 ROLLBACK TRANSACTION
   ; THROW
END CATCH

There is a problem that is common to both DDL_demo6.sql and DDL_demo7.sql. If you put them after the end of the creation script for the DDL_Demo database and run the query window, you get this output with both scripts:

Msg 248, Level 16, State 1, Line 2

The conversion of the varchar value '444444444444' overflowed an int column.

If you double-click on the error message, you are not taken to the right place. This does not happen with scripts 4 and 5, because SSMS knows which batch that produced the error and takes you to the right line. In a longer change script, this can cause some headache, because you only know that something went wrong, but not where. To be fair, the same issue exists with the SQLCMD version, DDL_demo3.sql, since SQLCMD reports numbers related to the batch, not to the script you feed it.

The next script, DDL_demo8.sql, addresses the issue with the line numbers. It includes this helper procedure:

CREATE PROCEDURE #exec @sql nvarchar(MAX) AS
  PRINT @sql
  EXEC (@sql)
  IF @@error <> 0
  BEGIN
     IF @@trancount > 0 ROLLBACK TRANSACTION
     ; THROW 50000, 'Some error occurred, script aborted.', 1
--     SELECT 1/0
  END
go

And the main part of the script goes:

BEGIN TRANSACTION
EXEC #exec N'ALTER TABLE dbo.Main DROP CONSTRAINT fk_Main_Lookup'

EXEC #exec N'ALTER TABLE dbo.Lookup SET (LOCK_ESCALATION = TABLE)'
...
COMMIT
go

The procedure #exec prints the command before executing it, so that you know where in the change script where you are. If there is a syntax error (or any other error that does not abort the batch), the procedure rolls back the transaction and produces an error that is known to abort the batch when XACT_ABORT is ON. On SQL 2012, using ; THROW comes naturally as you can produce an explaining message. On earlier versions, you will have to cheat with some other statement, since RAISERROR does not cut it.

The output from DDL_demo8.sql (on SQL 2012 using ;THROW) is:

ALTER TABLE dbo.Main DROP CONSTRAINT fk_Main_Lookup

ALTER TABLE dbo.Lookup SET (LOCK_ESCALATION = TABLE)

CREATE TABLE dbo.Tmp_Main

(

KeyCol int NOT NULL,

LookupId int NOT NULL,

Constrained int NULL,

DataCol int NULL

) ON [PRIMARY]

ALTER TABLE dbo.Tmp_Main SET (LOCK_ESCALATION = TABLE)

IF EXISTS(SELECT * FROM dbo.Main)

INSERT INTO dbo.Tmp_Main (KeyCol, LookupId, Constrained, DataCol)

SELECT KeyCol, LookupId, Constrained, CONVERT(int, DataCol)

FROM dbo.Main WITH (HOLDLOCK TABLOCKX)

Msg 248, Level 16, State 1, Line 2

The conversion of the varchar value '444444444444' overflowed an int column.

The final option we will look at is using the CMD Catch Handler. The advantage with this option is that the error message is logged in the sqleventlog table together with the text for the batch that produced the error. (The command text is in the cmdtext column.) This can be useful if you have a long script consisting of changes that are relatively independent from each other. Rather than aborting on the first error, you can let the script move on to the next section, with each section being its own transaction. Whether this is a good approach in a production database can be disputed, but it is certainly helpful when you develop a change script and test it out the first few times, as you can spot many errors in the same test run.

The script DDL_demo9.sql uses the CMD Catch Handler. You cannot run this script in the same way as the other scripts. You first need to create the DDL_Demo database. Next configure the BAT files build_sqleventlog.bat and build_cmd_catchhandler.bat to run in the DDL_Demo database and then execute them to install SqlEventLog and the CMD Catch Handler in this database. Once this is done, you can run DDL_demo9.sql.

Here is an excerpt of the script:

SET XACT_ABORT OFF
...
DECLARE @ret int
SELECT @ret = 0

BEGIN TRANSACTION
   IF @ret = 0 EXEC @ret = slog.cmd_catchhandler_sp
'ALTER TABLE dbo.Main DROP CONSTRAINT fk_Main_Lookup'

   IF @ret = 0 EXEC @ret = slog.cmd_catchhandler_sp
'ALTER TABLE dbo.Lookup SET (LOCK_ESCALATION = TABLE)'
...
IF @ret = 0
   COMMIT TRANSACTION
ELSE
   ROLLBACK TRANSACTION

The first thing to note is that the script has SET XACT_ABORT OFF. The setting XACT_ABORT ON is not very useful with the CMD Catch Handler, because it only aborts the batch submitted by the CMD Catch Handler and not the entire script. On top of that, if the batch submitted by the CMD Catch Handler attempts to roll back the transaction, this produces a new error because the CLR is not permitted to roll back a transaction started in T‑SQL. (This is something I discuss further in Appendix 2.)

The recommended error checking with the CMD Catch Handler is to use the return value, which makes the script tedious, but I've tried to use some clever formatting to hide the calls to the cmd_catchhandler_sp to give focus to the actual work in the script.

The output from DDL_demo9.sql is:

Msg 50000, Level 16, State 1, Line 1

{248} The conversion of the varchar value '444444444444' overflowed an int column.

The statement has been terminated.

The line number is not very helpful, but as noted the statement that caused the error is available in the column slog.sqleventlog.cmdtext.

The Final Solution

There is a distinct drawback with all these single-script solutions that I have glossed over so far, but which the reader might already have realised: the trigger. Here is how it looks in DDL_demo8.sql:

EXEC #exec N'CREATE TRIGGER Audit_Main ON dbo.Main FOR INSERT, UPDATE AS
   INSERT Audit(TableName, KeyValue1, OtherCols)
      SELECT ''Main'', ltrim(str(KeyCol)),
             (SELECT LookupId, Constrained, DataCol
              FOR    XML RAW(''Data''))
      FROM   inserted'

Or more precisely: it's not a big deal to wrap ALTER TABLE commands into string literals, and the same is true for simple batches to copy data from one table to another. It is a different thing with this trigger where I had to double all single quotes. For this short trigger that may be alright, but it is completely out of the question for a 1000-line stored procedure, least of all if the procedure itself uses dynamic SQL.

And this sort of brings us to the end of the road. Handle database changes in a T‑SQL script with no other aid is simply not a scalable way to run database upgrades. Of all the scripts I have presented here, the only one that has some viability beyond simple demos is DDL_demo3.sql, because the error checking is not too intrusive, and it can be added and removed with Find/Replace without manual editing. It's beyond the scope of this article, but a presumption is that you have the code for stored procedures, triggers etc in separate files and you include these with :r in your SQLCMD script.

There is a final option, which in my opinion is the superior: Run the DDL scripts from a purpose-written client (in C#, PowerShell, VBscript or whatever you prefer). It could be a general tool that reads and executes scripts, maybe as listed in a master file. Or it could be something that you write and compile every time from a template and where you embed the DDL inside it and read the code for stored procedures etc from files.

Such a client can easily trap errors and rollback transactions, and you can keep the DDL scripts clean. The client can also log errors to SqlEventLog if you fancy. Such a client has to parse out the batches in a script file from the GO separators (recall that GO is not a T‑SQL command), but that is dirt simple with one small qualification. Consider:

SELECT 'This is a string literal with a 
GO
in the middle of it'
go
/* And this is a comment with a 
GO
in the middle of it. */

This script executes successfully in SSMS as well as Azure Data Studio. But when I run it from Query Analyzer that shipped with SQL 2000, I get errors, because QA looks for GO without considering that the GO may be inside a string literal or a comment. Which I think is a perfectly acceptable restriction for a home-written tool, so there is no need figure out how to solve this. Just look for GO alone on a line.

A simple script client would abandon execution as soon as there is an error (and thus roll back the transaction) to avoid further damage. If you deploy larger sets of changes, you may want the script tool to continue with the next "package" in which case you need a way to define packages that can commit independently.

No, I don't have such a tool for you. I considered writing something fairly simple as an outline, but this article is already long enough. Furthermore, the whole task of composing change scripts and database upgrades is a complex affair, and there are several aspects I have not touched in this chapter since they go far beyond the topic of error handling.

I have written this section on error handling with DDL with the focus on scripts that are submitted from SSMS, because as I said earlier, this is what many people do in practice, although I don't think it is a good idea. But at least I have made you aware of the pitfalls. And, that should not be forgotten, as long as you take a backup before you start the change script and you can accept to use RESTORE as a means of recovery in case of an error, you don't really have to be worried.

A final word: Stay away from that Table Designer! (There are flaws with the Table Designer that I did not cover in this chapter.)

Error Handling in Client Code

Yes, you should have error-handling of your data-access code in your client programs. Given the large number of environments out there, a complete text on this topic would be an article on its own. Not that I would be in position to write that article, since I mainly write database code, and I am not familiar with all APIs. For this reason, I will only give you a brief overview to point out a few things from the perspective of a database guy.

Overview of Error-Handling Mechanisms

Depending on your environment and API, the way to detect that a call to SQL Server raised an error is different. I have encountered three different mechanisms:

Exception.
Callback.
Return status.

In modern environments like .NET (and I assume Java), any error message from SQL Server generates an exception in your client code. One advantage with this is that if you make no precautions at all, your program will stop and any transaction will be rolled back. That is, there is no risk that you jog along after an incomplete update, just because you were on auto-pilot. More generally, the exception mechanism permits you to write code that focuses on the main logic, because you know that if something goes wrong, you will end up in the exception handler. However, as we shall see, this model has its limitations, at least the way it works in SqlClient.

The callback mechanism means that you register a routine that the API calls in case of an error. The callback routine may decide to crash the program or set some global flag that the main code can check for. Or, for that matter, if the environment permits it, the callback can raise an exception. The advantage with callbacks over pure exceptions is that they give you more flexibility what to do. You may think that this is not a very common mechanism. However, it was the model in DB-Library, the original client API for SQL Server. It is also the model in Win32::SqlServer an API I have developed for Perl. The callback model is also offered by SqlClient, as an alternative to the exception mechanism, and we will look a little closer into this later.

Return status is used in APIs designed for languages like C where there is no exception mechanism. In such environments it is essential to always check the return code, which makes the code a lot more tedious. Examples of such APIs are ODBC and OLE DB.

What You Should Do in Case of an Error

Your gut reaction when a database call produces an error should be 1) Make sure that there is no active transaction. 2) Find safe ground. 3) Communicate the error. That is, your error-handling code should submit this to the database:

IF @@trancount > 0 ROLLBACK TRANSACTION

(If you are using a transaction object in your client API, you should perform the rollback through the API.) Recall, that there are several situations where SQL Server may start a transaction, and in case of an error abort execution without rolling back the transaction:

The ;THROW command and other new errors in SQL 2012, as well as errors raised by In-Memory OLTP in SQL 2014.
Compilation error at run-time in a top-level routine.
Query timeout.

If the database code runs with SET XACT_ABORT ON and has proper error handling, you are protected from orphaned transactions for the reasons above, but this nothing you should assume in your data-access code. And if nothing else, there may be a mismatch of BEGIN/COMMIT TRANSACTION in the SQL code.

The code to roll back should itself be error-handled in the sense that if it fails, it should not cause the application to crash. That is, in an environment with an exception mechanism you should have an empty catch block or similar. How could the rollback fail? Elementary, my dear Watson. The error might have implied that the communication with SQL Server was lost and your channel is dead.

It is also worth pointing out that simply disconnecting is not enough, as in most APIs a disconnection means that the connection goes back to the pool without any communication with SQL Server until the connection is reused. Which means that until then you may be holding locks that block other processes.

Once you have rolled back, you should take your application back to safe ground, as well as communicating the error in one way or another. What is safe ground depends on the type of application. For a simple console program, you may just stop execution. On the other hand, for a GUI application it is completely impermissible to crash the application because a database call failed. (You think that I am pointing out the obvious? I have had some battles with the developers of the system I work with...) For a GUI application, safe ground is likely to be the same state the form was in before the user pressed Save. Overall, all database-access code should be defensive: assume that any database call can go wrong – and most of them will sooner or later.

For some errors like deadlock, you may want to retry immediately. This requires that your safe ground is right there. As I discussed in the section on retry logic you cannot retry if you already were in a transaction. And, if the stored procedure you call performs updates, it is a little iffy, because you don't know if the deadlock happened in some code that was executed after update was committed. A retry could cause a double update. It may be better to refresh the screen and give the user a message that the update might have failed.

And, finally, don't forget to communicate the error, one way or another. If you prefer to give the users a generic message, because you want to protect them from the mumbo-jumbo produced by SQL Server, make sure that you write the error message somewhere you easily can find it when you get the bug report, be that Windows event log, a table like sqleventlog or something else.

Query Timeouts

The only thing that is different with query timeouts from other errors that can occur with a database call is that the query timeout comes from within the API. It should be handled in the same way as above.

Don't forget that the query timeout is something you can control in your API. Exactly in what matter, depends on your API. For instance, in SqlClient in ADO .NET, you use the CommandTimeout property on the SqlCommand object. Many APIs have a default timeout of 30 seconds, which depending on your application could be far too long, or ridiculously short. If you have no special requirements, set the timeout to 0, that is, wait forever. For a slow-running but important report that can be the difference between an irritating performance problem and an emergency situation when the users can not get their data.

ADO .Net and SqlClient

SqlClient offers two models for error handling: exceptions and callbacks. The default is that errors from SQL Server throw an exception, while messages with a severity < 11 invoke the event handler for the InfoMessage event if you have registered one. But you can alter this behaviour by setting the connection property FireInfoMessageEventOnUserErrors. When this property is set, errors up to severity 16 will invoke the InfoMessage event handler. Only errors with a severity ≥ 17 will throw an exception.

As noted above, the exception model is very good for simple error handling, where any error means that you want interrupt the current flow of statements, clean up (i.e. make sure that you have rolled back), and return to safe ground. If there would happen to be any result sets after the error message, you don't care about these.

However, sometimes you may for one reason or another want to have the complete story of what happened in SQL Server. That is, you want all error messages, and you want all result sets. Maybe because your application is just a monitor, maybe you need some really fancy error handling. In this case, the exception model in SqlClient will not fit you well. Once SqlClient has raised an exception, it does not seem to return any result sets; at least I have not been successful in my testing. This is where the callback model is much better. As long as the error has a severity ≤ 16, the error message will be passed to your handler for the InfoMessage event and you can print or log the message just like any other informational message from SQL Server. Errors with a severity of 17 or higher will still cause SqlClient to throw an exception, but since errors with such a high severity typically aborts the batch, this means that the exception is that last piece of information that comes from SQL Server anyway.

Now, you might expect me to show some C# code that demonstrates the differences between the exception model and the callback model, but I'm a little lazy and will use an application you have within reach, to wit SQL Server Management Studio. SSMS is a typical example of a monitoring application; you would not want SSMS to stop to return data just because there was an exception. Thus, SSMS sets the property FireInfoMessageEventOnUserErrors.

Run this in SSMS (in text mode):

RAISERROR ('Ladies and Gentlemen,', 16, 1)
SELECT 200 AS Motels
RAISERROR ('Life on the road', 16, 1)

The output is:

Msg 50000, Level 16, State 1, Line 1

Ladies and Gentlemen,

Motels

-----------

200

(1 row(s) affected)

Msg 50000, Level 16, State 1, Line 3

Life on the road

Which of course is exactly what you expect to see. But change the severity level for the first message (but not the second) to 17, and you get:

(1 row(s) affected)

Msg 50000, Level 16, State 1, Line 3

Life on the road

Msg 50000, Level 17, State 1, Line 1

Ladies and Gentlemen,

The result set has disappeared, and the order of the messages is reversed! It is not SQL Server that is pulling your legs, but this is due to that SSMS gets the messages on different channels. SqlClient sends the level 16 error to the InfoMessage event handler more or less directly, while it waits until the end to throw an exception for the level 17 error and for some reason it withholds the result set in this case. (For a little deeper discussion on this, see the discussion about the implementation of the CMD Catch Handler in Appendix 2; the CMD Catch Handler uses the FireInfoMessageEventOnUserErrors property.) This behaviour is not really good for a query tool, but since errors with level 17 normally abort the batch it is not much of an issue. As long as you don't use level 17 or higher with RAISERROR yourself, that is.

If you try the example above in OSQL (ODBC) or SQLCMD (OLE DB with SQL 2005 and SQL 2008, ODBC in SQL 2012 and later), you get the output in the expected order. These APIs uses the return-code model where the client has to retrieve the messages from a certain structure, and all severity levels are handled in the same way.

A second advantage with the callback model is when you use RAISERROR WITH NOWAIT. With the callback model, the messages can be printed as they come through SQL Server, whereas with the exception model, you need to wait until the end. Again we can look at this with SSMS. Try this, and switch to the Message tab as soon as you have started the batch:

RAISERROR ('One', 16, 1) WITH NOWAIT
WAITFOR DELAY '00:00:02'
RAISERROR ('Two', 16, 1) WITH NOWAIT
WAITFOR DELAY '00:00:02'
RAISERROR ('Three', 16, 1) WITH NOWAIT
WAITFOR DELAY '00:00:02'

You will see that the messages come one by one. Change the severity to 17 and you get all messages at the end.

Finally, a word of caution. If you set FireInfoMessageEventOnUserErrors you should stick to the callback model. Don't get the idea that you should start to throw exceptions in the event handler. I tried it, and I was mighty puzzled. My fellow MVP John Huang used Reflection to find out what the SqlConnection object is doing: it drops the exception thrown by the delegate on the floor, with the exception of fatal errors like StackOverflow or OutOfMemory.

Other APIs

There are plenty of other APIs, but there are several of which I have too little experience to say anything about at all, for instance ODBC, JDBC and PHP. Others like DB-Library and OLE DB are deprecated for access to SQL Server, so it seems pointless to spend text on them. While I'm proud over my own Win32::SqlServer, it's only for Perl programmers, and if you are one, you can read the full story in the manual for the module.

That leaves only one API: ADO "Classic", which in opinion is not very classic at all. ADO has several problems, not the least with error handling. If errors occur in the phase where it interrogates SQL Server for the shape of the result set – which it often does without reason – it may drop that error on the floor. And if this happened inside a transaction, and the transaction was lost you will continue with first half of the transaction rolled back. That's horribly ugly, and I have been bitten more than once by this. For more details on old ADO, see this section in one of my old error-handling articles.

The Virtue of Getting All Result Sets

This section is about a trap that my long-time MVP colleague Dan Guzman made me aware of. Dan has blogged about the issue on his blog, but I also like to cover it here. The examples are in C# and .NET, but I would expect the issue to apply to all client APIs.

We will work with this SQL statement:

SELECT name, log(max_length) FROM sys.columns

If you run this statement in SSMS, you will get this error message:

Msg 3623, Level 16, State 1, Line 1

An invalid floating point operation occurred.

If you flip over to the Results tab, you will see that a number of rows have been returned. The SELECT statement fails on the first LOB column it encounters; for these columns, max_length is -1, which is an illegal input value for the log function. While the statement is nonsensical, it serves to illustrate the issue: You have a SELECT query that fails, after it has started to execute. Beside floating-point errors, errors that can cause a query to fail during execution include overflow, division by zero and conversion errors.

Let's embed this statement in a simple C# program which is an adaption from Dan's blog post. (You may have to change the connection string to fit your environment):

using System.Data.SqlClient;

public static class SqlErrorRepro {
   private static string sqlBatchText = @"
      SELECT name, log(max_length) FROM sys.columns";

   private static string connString =   
        @"Data Source=.;Integrated Security=SSPI;Database=tempdb";

   public static void Main() {
      try {
         using (SqlConnection cn = new SqlConnection(connString))
         using (SqlCommand cmd = new SqlCommand(sqlBatchText, cn)) {
            int rowCount = 0;
            cn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader()) {
               while(reader.Read()) { ++rowCount; };
            }
            System.Console.WriteLine("{0} rows read", rowCount);
         }
      }
      catch (System.Exception ex){
         System.Console.WriteLine("ERROR: " + ex.Message);
      }
   }
}

The output is as expected:

ERROR: An invalid floating point operation occurred.

Now embed the query in TRY-CATCH. Of course, that's sort of a funny thing to do for a single query, but imagine that the query is part of a longer stored procedure where you use TRY-CATCH as matter of best practice.

using System.Data.SqlClient;

public static class SqlErrorRepro {
   private static string sqlBatchText = @"
      BEGIN TRY
         SELECT name, log(max_length) FROM sys.columns
      END TRY
      BEGIN CATCH
          ; THROW
      END CATCH";

   private static string connString =
        @"Data Source=.;Integrated Security=SSPI;Database=tempdb";

   public static void Main() {
      try {
         using (SqlConnection cn = new SqlConnection(connString))
         using (SqlCommand cmd = new SqlCommand(sqlBatchText, cn)) {
            int rowCount = 0;
            cn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader()) {
               while(reader.Read()) { ++rowCount; };
            }
            System.Console.WriteLine("{0} rows read", rowCount);
         }
      }
      catch (System.Exception ex){
         System.Console.WriteLine("ERROR: " + ex.Message);
      }
   }
}

This time the output is:

60 rows read.

That is, there is no sign of the error. Now imagine that this occurs in a real application. Maybe the user immediately understands that something is wrong, because the user expects to see a lot more rows. But there is also a non-negligible risk that the user thinks he is looking at a complete and correct output. And from this output the user makes a business decision. Which maybe would have been completely different, if the user had seen the missing rows in the result set. That is, the business consequences could be quite grave.

To find a solution to this nasty situation, let's first understand why the addition to TRY-CATCH has this effect. When clients and SQL Server talk to each other, they use the TDS protocol. When an error occurs during the execution of a SELECT statement and it is not trapped in SQL Server by TRY-CATCH, the client API sees the stream of rows being interrupted by the error message, and thus the API can activate its error mechanism. But when the error in the SELECT statement is trapped in the CATCH handler, SQL Server sends a packet that signals that the SELECT statement has completed which marks the end of the result set, and this far the client API is happy. The key here is that the client API will not read further on the TDS stream, unless the calling program instructs it to. If the program simply closes the command object, the API just discards what SQL Server has sent without further analysis. On the other hand, if the program asks the API for the next result set, the client API will find the error message and sound the alarm.

Thus, the conclusion is that even if you expect only one result set from your stored procedure, you should make sure that your client code nevertheless retrieves all result sets by adding a call to NextResult as below:

using System.Data.SqlClient;

public static class SqlErrorRepro {
   private static string sqlBatchText = @"
      BEGIN TRY
         SELECT name, log(max_length) FROM sys.columns
      END TRY
      BEGIN CATCH
          ; THROW
      END CATCH";

   private static string connString =
        @"Data Source=.;Integrated Security=SSPI;Database=tempdb";

   public static void Main() {
      try {
         using (SqlConnection cn = new SqlConnection(connString))
         using (SqlCommand cmd = new SqlCommand(sqlBatchText, cn)) {
            int rowCount = 0;
            cn.Open();
            using (SqlDataReader reader = cmd.ExecuteReader()) {
               while(reader.Read()) { ++rowCount; };
               while(reader.NextResult());
            }
            System.Console.WriteLine("{0} rows read", rowCount);
         }
      }
      catch (System.Exception ex){
         System.Console.WriteLine("ERROR: " + ex.Message);
      }
   }
}

The output is again:

ERROR: An invalid floating point operation occurred.

That is, by adding one line of code to you can make sure that you don't miss any error messages from SQL Server. The line sifts through all result sets returned by the procedure, but without actually processing them. And, no, it is not very likely there is a performance issue with this. Not the least if the procedure only produces one result set, which is what you expected anyway.

In this example we worked with .NET and SqlClient, but this is something applies to about all client APIs out there. Admittedly, I have not tested them all, but all APIs I have worked with expose the same paradigm. One API call to start execution of the query batch. One call to fetch rows until there are no more. And then you need to request the next result explicitly. Unless you do that, you will not see the re-raised error from SQL Server. For one example with a different client API, see the chapter Errors While Returning a Result Set in Appendix 1, where I discuss this situation in the context of linked servers. Here the API is OLE DB, and not SqlClient.

Since SqlClient is widely used, it it is worth having a closer look at the various options, and there is one thing I particularly likely to raise a caution for. In the attached file AllResultSets.cs, you find a program that runs the query above in various ways with SqlClient. Beside the two variations with ExecuteReader, there are a few more, which I summarise here:

ExecuteScalar: Does not detect the re-raised error and returns successfully without raising an exception. Since ExecuteScalar is intended to be called when you expect a single value, this behaviour can be considered to be acceptable.
ExecuteNonQuery: Detects the re-raised error and throws an exception. (Exactly because it designed to find all errors of the batch that it executes.)
SqlDataAdapter.Fill(DataTable): This is the ugly one. It knows that it is supposed to fill only one DataTable, so it only reads the first result set and does not detect the re-raised error. Avoid this one, and use any of the two below instead.
SqlDataAdapter.Fill(DataSet): Will detect the re-raised error and throw an exception. (Quite naturally, since you ask for all result sets, when you pass a DataSet.)
DataTable.Load(SqlDataReader): This serves the same purpose as SqlDataAdapter.Fill(DataTable), but it is implemented differently, and it gets through all results sets (or at least past the first result set), and therefore it detects the re-raised error and throws an exception. I am under the impression that this method is less well-known and therefore not widely used. I was unaware of it myself, until I learnt it from Karen Payne as we were helping the same user in a forum thread. A second flip side of this call is that you can evade the need of having an SqlDataAdapter object. See the file AllResultSets.cs for an example of how to use it.

Conclusion, Feedback and Acknowledgements

You have now reached the end of Part Three. If you so wish, there are also three appendixes: one for linked servers, one for the CLR (which includes CLR-specific details on SqlEventLog the CMD Catch Handler), and one about Service Broker. If not, your journey through the amazing and frightening world of error handling in SQL Server is over. Or is it? Well, if you have decided that you make better friends with DB2 after all, it is. But else, no, your journey will continue with your daily chores of writing T‑SQL code. Hopefully, this series of articles have given you some insight on how you should implement your error handling, what you should do and what you should not. Undoubtedly, you will every once in while encounter some new confusing situation with regards to error handling that I have not covered in these articles. Maybe because I was not aware of it, or maybe because I could not just include all crazy things that I have seen. The total length of this series of articles is, well, scary.

But, please, if you run into something bewildering, let me know. And for that matter, if you have a question related to these articles, drop me a line on esquel@sommarskog.se. And it cannot be enough stressed, if you spot an error – and that most emphatically include spelling and grammar errors – don't hesitate to get in touch! On the other hand, if you have a question of a more general nature, for instance how you should solve a particular problem that you are facing, I strongly recommend that you post your question to a public forum, where more people can help you.

I have not written these articles in a vacuum, but all through the years there have been people who have provided valuable input. As it took quite a few years to complete this series, I'm afraid I may not remember each and every one who has helped me one way or another. If you feel excluded, please accept my apologies. Here are some people I like to thank for the input, help and ideas. My fellow SQL Server MVPs: Adam Machanic, Bob Beauchemin, Dan Guzman, Brad Schulz, John Huang, Paul Randall, Ola Hallengren, Ronen Ariely and Alex Kuznetsov. And not to forget the all the other people: wBob, Daniel Verzellesi, Paul Bradshaw, Jorge Inverso, Deepak Goyal, Carlos Fatureto, Dave Mason, Arnold Lieberman, Maria Zakourdaev, Nick Reilingh, Pietro Toniol, Paulo Santos, Robin Otte, Kurt Gunther, Karen Payne, Richard Bradley, Naomi Nosonovsky, Cathy Xu, Alexei Galeev and Reece Goding. And a special thanks to Geovanny Hernandez for providing a translation in Spanish of Part One and Alexey Guzev for the Russian translation.

Below follows a revision history for Part Three.

Revision History

2024-08-04

Dave Mason contacted me and informed me about the new location of his blog posts, and I have updated the beginning of the section Using Extended Events with the URLs to his posts.

2024-05-12

Arnold Lieberman alerted me that Dave Mason's blog posts on how to trap all errors from BACKUP/RESTORE no longer are available. He also shared his adaption of the Dave's solution. From Arnold's work, I made my own adaption, which I present in the new section Using Extended Events in chapter seven.

2022-12-26

Rewrote the section The Virtue of Getting All Result Sets after having learnt about DataTable.Load(SqlDataReader) from Karen Payne. This call serves the same purpose as SqlDataAdapter.Fill(DataTable) but will not miss re-raised exceptions that occurs in a query.

2022-11-26

Much belated, I have updated the section Doing the Loopback with regards to CLR strict security and updated one of the build scripts, slog_loopback_assem.sql, for SqlEventLog accordingly.

2022-04-14

In the section Other Approaches, added links to two blog posts by Adrian Buckman that provide another solution/kludge to the section to the problem with getting both error messages from BACKUP/RESTORE when you use TRY-CATCH.

2022-03-06

A small clarification to the section Query Timeouts on suggestion from Naomi Nosonovsky.

2021-10-01

Updated the examples in the section Handling Anticipated Errors to use parsename on error_procedure(), since error_procedure() now (well, since SQL 2017), can return a three-part name.

2021-08-20

The download for SqlEventLog now includes a loopback solution for Azure SQL Database that Daniel Verzellesi was kind to share with me.

2021-03-27

There was an issue with the CLR loopback in SqlEventLog that Kurt Gunther ran into. It appears there have been some changes to .NET, so that all calls through the loopback resulted in a CAS exception. I have removed the offending lines from slog_loopback.cs. I have also updated the subsection Invoking the Loopback from the CLR in the CLR Appendix to reflect this.

2017-03-26

Added a new section The Virtue of Getting All Result Sets in Chapter 9 on client-side considerations.

2016-11-01

Dave Mason has come up with possibly is the best solution to the problem with the multiple error messages from BACKUP/RESTORE and DDL: use extended events. I describe his solution briefly in the section Other Approaches which includes links to his blog posts.

2015-07-11

Pietro Toniolo pointed out there was a small flaw in the column logproc in the sqleventlog table: the data type needs to be nvarchar(257) to permit for maximum length of both schema and procedure and also fit the dot – I had only nvarchar(256). This also affects the procedures log_insert_sp and translate_procid_sp.
Added a note that on SQL 2005, the first error message from BACKUP appears to bypass the CATCH handler entirely.
Ronen Ariely made me aware of that the trick of using DBCC OUTPUTBUFFER that I describe in one my of old articles for SQL 2000 is applicable if you want to get hold of the first error messages from BACKUP/RESTORE and DDL commands. See section 7.5, Other approaches for more details.

2015-05-03

First version.

Back to my home page.