Error and Transaction Handling in SQL Server

Appendix 1: Linked Servers

An SQL text by Erland Sommarskog, SQL Server MVP. Latest revision: 2015-05-03.
Copyright applies to this text. See here for font conventions used in this article.

Introduction

This is an appendix to the three-part series Error and Transaction Handling in SQL Server. More precisely, this appendix is an extension to the chapter Special Contexts in Part Two and it covers how error and transaction handling in SQL Server works when you access linked servers. When reading Part Two you may have been appalled by the great level of inconsistency in SQL Server, and you may have been confused by all the possible actions that SQL Server can take in case of an error. When you use linked servers, the stakes raise even higher. If you also use distributed transactions, you may find yourself in situations where it is difficult to have any civilised error handling at all.

I assume in this appendix that you have read Part One in whole and the first six chapters of Part Two, and that you have gathered a basic understanding of the material. If you have arrived at this page through a link or a web search, I recommend that you read these two parts first.

Table of Contents

Introduction

Index of All Error-Handling Articles

Some General Notes on Linked Servers

Setting Up a Linked Server

Distributed Transactions

How Errors from Linked Servers are Communicated

Errors Occurring Locally

Errors in the OLE DB Provider

Errors in the Remote Data Source

Calling Remote Procedures

Trying to Catch Remote Errors

Did the Remote Procedure Succeed or Not?

Query Timeout on Linked Servers

Stored Procedure Calls in Distributed Transactions

Accessing Remote Objects in Queries

Linked Servers and Deferred Name Resolution

Errors When Updating Remote Objects

Query Timeouts

Conclusion

Revision History

Index of All Error-Handling Articles

Here follows a list of all articles in this series:

Part One – Jumpstart Error Handling.

Part Two – Commands and Mechanisms.

Part Three – Implementation.

Appendix 1 – Linked Servers. (This article.)

Appendix 2 – CLR. (Extends both Parts Two and Three.)

Appendix 3 – Service Broker. (Extends Part Three.)

All the articles above are for SQL 2005 and later. For those who still are on SQL 2000, there are two older articles:

Error Handling in SQL Server 2000 – a Background.

Implementing Error Handling with Stored Procedures in SQL 2000

Some General Notes on Linked Servers

This chapter gives a general introduction to linked severs and distributed transactions. It also introduces a setup for the examples in this appendix.

When you access a linked server, SQL Server never talks to the remote data source directly. Instead it talks to an OLE DB provider that accesses the data source. Most OLE DB providers are product-specific. For instance, the OLE DB provider for SQL Server is SQL Server Native Client, SQLNCLI. (There is also SQLOLEDB, which is an predecessor to SQLNCLI.) There are OLE DB providers for data sources such as Oracle, DB2, Access/Jet and Active Directory. There is also the generic provider MSDASQL that talks to an ODBC driver, which makes it possible to set up a linked server to a data source for which there is no OLE DB provider.

It seems to me that the most common use of linked servers is to connect to another SQL Server instance, and this is what I will focus on in this appendix. Given the query syntax in SQL where a linked server is just one more dot, you can easily be lured to think that connecting to a remote server is no different than accessing an object in a different database in the local instance. But for various reasons this is not the case. SQL Server takes a fairly agnostic view on the linked server, and generally sees it as an "OLE DB data source". There do not seem to be very many shortcuts if the other instance is also running SQL Server. This results in a somewhat clunky behaviour with restrictions that are difficult to understand if you incorrectly think of the feature as a means to connect two instances running SQL Server. As one example, you cannot invoke a table-valued function on a linked server.

Interesting enough, in the particular field of error handling, there are such shortcuts. To wit, as we shall see in this appendix, SQL Server has a tailored error handling if the provider is SQLNCLI and a more generic behaviour with other providers.

Setting Up a Linked Server

If you want to define a linked server to the SQL Server instance ZAPPA running on the server FRANK, this is as simple as

EXEC sp_addlinkedserver 'FRANK\ZAPPA'

You can now access objects and perform actions on this instance, in four different ways:

  1. Direct query in four-part notation:
    SELECT * FROM [FRANK\ZAPPA].zappadb.dbo.tblDweezil
  2. A pass-through query with OPENQUERY:
    SELECT * FROM OPENQUERY([FRANK\ZAPPA], 'SELECT * FROM zappadb.dbo.tblMoon')
  3. Execute a stored procedure on FRANK\ZAPPA using four-part notation:
    EXECUTE [FRANK\ZAPPA].zappadb.dbo.AhmetSP
  4. Execute some arbitrary code with EXEC() AT:
    EXEC('CREATE TABLE DivaTbl(a int NOT NULL)') AT [FRANK\ZAPPA]

Note: The access to the linked server is like any other login to that server. That is, you need to have a valid login on that server or set up login mapping with sp_addlinkedsrvlogin. For brevity, I'm assuming that your login is valid on both instances.

Just because the remote instance is known as FRANK\ZAPPA, you don't have to use that name in your code. The linked server as such is really only an alias. For instance, this defines a linked server FZ that maps to the physical instance FRANK\ZAPPA:

EXEC sp_addlinkedserver 'FZ', '', 'SQLNCLI', 'FRANK\ZAPPA'

(For a complete discussion of the parameters to sp_addlinkedserver, please see Books Online.) I will use FZ as the name of the linked server in the examples in this article. If you want to run the examples yourself, define FZ to point to an instance in your environment. To be able to invoke stored procedures on the linked server, you need to run one more statement:

EXEC sp_serveroption FZ, 'rpc out', true

If you want to change FZ to point to a difference instance, first drop the current definition with sp_dropserver:

EXEC sp_dropserver 'FZ'

If you only have access to one single SQL Server instance, you can still run the examples by setting up a loopback instance like this:

EXEC sp_addlinkedserver 'FZ', '', 'SQLNCLI', @@servername

If you now run a query like:

SELECT * FZ.master.sys.objects

SQL Server will connect back to itself, thinking that it is talking a to different data source. This will work with most of the examples in the article, but there is one exception, which I will point out when we arrive there. An alternative is to add a second instance to your machine, for instance by downloading and installing an instance of SQL Server Express Edition.

Note: There is always a server definition for the local instance itself, and if your server is MYSERVER you can run things like

SELECT * MYSERVER.master.sys.objects
EXEC MYSERVER.master.sys.sp_who

However, SQL Server will recognise what is going on and shortcut these to local operations, why the quirky issues with linked servers will not exhibit. It is not smart enough, though, to look through a loopback server as set up as above.

Also, in preparation of the exercises, put error_handler_sp in tempdb on both instances as we use it all through the article. If you already have gotten your feet wet with SqlEventLog, I encourage you install it as well in tempdb, and if you like you can use it in place of error_handler_sp in the example scripts.

Distributed Transactions

When you access a linked server and you have a transaction in progress – either an explicit one started with BEGIN TRANSACTION, or an implicit one defined by a single local statement – the transaction is promoted to be a distributed transaction. This can be turned off in SQL 2008 and later versions with sp_serveroption by setting the option remote proc transaction promotion to false.

SQL Server does not implement distributed transactions itself, but it relies on an operating system component, Microsoft Distributed Transaction Coordinator (DTC). To be able to run some of the demos, you will need to know how to control DTC. Typically, DTC is not automatically started on desktop operating systems. To check the status, go the Control Panel, select Administrative Commands and in this folder select Services. Scroll down the list until you come to Distributed Transaction Coordinator. (Note that there may be other services of which the name also start in Distributed...) You can right-click the service to start or stop it from the context menu.

When you run a distributed transaction over two data sources that are on different machines, you need to have DTC running on both. For the two DTC services to talk to each other, they need to be able to log on to each other's operating systems. If you are in a domain and there are no trust issues, this should work out smoothly. On the other hand, if you are in a workgroup, it can be about impossible to get it to work. (It may work if you change the service accounts for DTC to be accounts with the same name and password on both machines.) For the examples in this article, you can avoid this problem if you define FZ to point to another instance on the same machine, as in this case you only need one instance of DTC.

How Errors from Linked Servers are Communicated

When accessing a linked server, an error can occur in any of these three places:

  1. The local SQL Server instance.
  2. The OLE DB provider.
  3. The remote data source.

In this section we will look at how these errors are reported locally and how you can trap them.

Errors Occurring Locally

These are errors that the local server itself can detect. For instance:

SELECT * FROM NOSUCHSERVER.master.sys.objects

results in:

Msg 7202, Level 11, State 2, Line 1

Could not find server 'NOSUCHSERVER' in sys.servers. Verify that the correct server name was specified. If necessary, execute the stored procedure sp_addlinkedserver to add the server to sys.servers.

These errors are just like any other local SQL Server error, and there is not much to say about them.

Errors in the OLE DB Provider

When errors occur in the OLE DB provider, they are typically not raised as errors as such. Instead the raw error messages from the provider are relayed as informational messages, which means that you cannot trap them in SQL Server with TRY-CATCH and the error_xxx functions. However, they are interpreted by SQL Server that raises its own generic error message so that your CATCH handler can fire.

We will look at some examples of OLE DB errors, starting with connectivity errors. For this particular example set up FZ to point to a non-existing machine:

EXEC sp_addlinkedserver FZ, '', 'SQLNCLI', 'NOSUCHMACHINE'
EXEC sp_serveroption FZ, 'rpc out', true

Then run this:

EXEC FZ.tempdb.sys.sp_helpindex 'sys.objects'

This produces an output (after some delay) like the below:

OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "Login timeout expired".

OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".

Msg 53, Level 16, State 1, Line 0

Named Pipes Provider: Could not open a connection to SQL Server [53].

In total there are three messages, whereof two are said to come from the OLE DB provider, and the last one looks like an SQL Server message. Looks like? Well, this situation is a little odd, which we see more clearly if we change FZ to point to a local non-existing instance:

EXEC sp_dropserver FZ
EXEC sp_addlinkedserver FZ, '', 'SQLNCLI', '(local)\NOSUCHINSTANCE'
EXEC sp_serveroption FZ, 'rpc out', true

To show that this error is trappable, we use TRY-CATCH this time:

BEGIN TRY
   EXEC FZ.tempdb.sys.sp_helpindex 'sys.objects'
END TRY
BEGIN CATCH
   EXEC error_handler_sp
END CATCH

The output on SQL 2012 and SQL 2014 is:

OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "Login timeout expired".

OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.".

Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20

*** <dynamic SQL>, Line 0. Errno -1: SQL Server Network Interfaces: Error Locating Server/Instance Specified [xFFFFFFFF].

There is one thing that is odd with the error message: the error number is -1. Error numbers produced by SQL Server itself are supposed to be between 1 and 49999. On SQL 2005 and SQL 2008 the error number is different, but not less odd: 65535. What happens in this particular case is that SQL Server takes one of the error messages from the OLE DB provider and dresses it up an SQL Server message, using the return code from OLE DB as its error number. If you open a command-line window and run

SQLCMD -S NOSUCHMACHINE
SQLCMD -S (local)\NOSUCHINSTANCE

you will see the same messages as above, but in somewhat different order and this time no error looks like an SQL Server error – all three clearly come from the client API.

For the next example with OLE DB, define FZ to point to an SQL Server instance on a different machine. (This example also gives you an occasion to test whether you are able to get this scenario working in your environment). Run this:

CREATE TABLE indexes (name    sysname        NOT NULL,
                      descr   nvarchar(200)  NOT NULL,
                      columns nvarchar(2000) NOT NULL)

BEGIN TRY
   INSERT indexes (name, descr, columns)
      EXEC FZ.tempdb.sys.sp_helpindex 'sys.objects'
END TRY
BEGIN CATCH
   EXEC error_handler_sp
END CATCH
go
DROP TABLE indexes

Recall that since the INSERT statement defines a system transaction, the stored procedure will always execute in the context of that transaction. And if this is a stored procedure on a linked server, the transaction is promoted to a distributed transaction. If it all works out smoothly and the procedure executes successfully, you will see this message:

The object 'sys.objects' does not have any indexes, or you do not have permissions.

I was less lucky. When I first ran the script, I got this message because I had forgotten to start DTC on the other server.

Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20

*** <dynamic SQL>, Line 1. Errno 8501: MSDTC on server 'R2' is unavailable.

There is no OLE DB error here, why we will learn in the next section. Once I had started DTC on the other machine, I got this output:

OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "The partner transaction manager has disabled its support for remote/network transactions.".

 

(0 row(s) affected)

Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20

*** <dynamic SQL>, Line 6. Errno 7391: The operation could not be performed because OLE DB provider "SQLNCLI11" for linked server "FZ" was unable to begin a distributed transaction.

If this happens to you, you have a problem with your DTC setup. The error message may lure you to think it is a configuration issue with DTC, so you may try both this and that. Not that I recommend it. I have tried myself without much success. I believe that real issue in an authentication failure.

There are a few things we can note about the output. The message from SQL Server is a generic one, but this time it is one of its own messages. The OLE DB provider has a more detailed error message. Unfortunately, neither that message is particularly clear, and this is nothing unique – error messages you get with linked servers are often very obscure, and this is far from the worst.

Errors in the Remote Data Source

When the errors occurs in the remote data source, this is presented completely differently depending on whether the remote data source is another SQL Server instance or not. If the linked server is another SQL Server instance, you cannot even tell from the message whether the error occurred locally or remotely. Make sure that you have FZ pointing to an existing SQL Server instance and run this:

EXEC FZ.master.sys.sp_helpdb 'NoSuchDb'
go
INSERT FZ.tempdb.sys.objects(object_id, name) 
   VALUES (1, 'The Grand Wazoo')

The output is:

Msg 15010, Level 16, State 1, Procedure sp_helpdb, Line 43

The database 'NoSuchDb' does not exist. Supply a valid database name. To see available databases, use sys.databases.

Msg 259, Level 16, State 1, Line 1

Ad hoc updates to system catalogs are not allowed.

If you remove FZ from the statements so that they are local to the server, the output is the same. This is possible, because the OLE DB provider for SQL Server has a provider-specific interface which exposes the seven components of an error message from SQL Server. Therefore the local SQL Server can dress up the error from the remote server as if it had occurred locally.

If the remote data source is not known to be an SQL Server instance, the output is radically different. It is possible to test this without getting a license for Oracle, learning Access or installing MySQL. You can set up FZ in this way:

EXEC sp_addlinkedserver FZ, '', 'MSDASQL', 
     @provstr = 'Driver={SQL Server};Trusted_connection=yes;Server=FRANK\ZAPPA'

This sets up a linked server using the generic OLE DB-over-ODBC provider MSDASQL with the old SQL Server ODBC driver. In the example above, replace FRANK\ZAPPA with the name of your remote SQL Server instance.

Note: I only show the possibility to set up a linked server over ODBC to be able show you how error handling works with another provider than SQLNCLI. While everything seems to work when you set up a linked server this way, Microsoft does not support running linked servers against other SQL Server instances over ODBC; they only support linked servers using SQLNCLI.

If you run the script above with this definition of FZ, you get this output instead:

OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver][SQL Server]The database 'NoSuchDb' does not exist. Supply a valid database name. To see available databases, use sys.databases. ".

Msg 7212, Level 17, State 1, Line 1

Could not execute procedure 'sp_helpdb' on remote server 'FZ'.

OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver][SQL Server]Ad hoc updates to system catalogs are not allowed.".

Msg 7343, Level 16, State 2, Line 1

The OLE DB provider "MSDASQL" for linked server "FZ" could not INSERT INTO table "[FZ].[tempdb].[sys].[objects]".

Because MSDASQL does not expose the components from an SQL Server message like SQLNCLI does, SQL Server uses its standard method to relay error messages from linked servers. The actual errors from the remote instance appears as informational messages from the OLE DB provider and they are passed directly to the client with no chance for us to pick them up in a CATCH block with the error_xxx functions. Instead SQL Server produces a generic message which only tells us that the operation failed. As long as you only want to trap the error to avoid further execution and roll back any open transaction, this is good enough. But if you also want to log the error into something like SqlEventLog, this is somewhat meagre. Note also the severity level for the first error: it is 17 which is pretty vile; this is above the range for regular user errors.

Before you move on, restore FZ to be defined with SQLNCLI as a regular linked server for SQL Server. Save your script for using MSDASQL, however, because we will occasionally come back to this setup.

Calling Remote Procedures

In this chapter and the next one, we will run commands both on the local server and the remote server FZ. Here in the text, I will show things piece by piece for clarity. To make it easier to run the examples, I have also prepared scripts with the same contents that I show in the text. These scripts all follow this structure:

  1. Set up the linked server (Either comment this out, or change to your server/instance names). This section also includes a definition using MSDASQL which is commented out.
  2. Create objects on FZ through sp_executesql.
  3. Create local stored procedures.
  4. The actual test run of the local stored procedure and inspection of the result set.
  5. Clean-up section.

In this way you can run an example in one go without having to jump forth and back between query windows connected to different instances. As the examples are variations of each other, the same script is good for several examples, but you will have to make small changes as we move on. The first script to work with is linkedserverdemo1.sql and we will work with this script until I tell you to move to the next.

Trying to Catch Remote Errors

We will first look at how we can catch remote errors. On the remote server FZ we have this table and stored procedure:

CREATE TABLE remotetbl(a int NOT NULL PRIMARY KEY,
                       b int NOT NULL)
go
CREATE PROCEDURE remote_sp @a int, @b char(1) AS
   INSERT remotetbl(a, b) VALUES(@a, @b)
   INSERT remotetbl(a, b) VALUES(@a + @b, 0)'

On the local server we have a stored procedure that calls the procedure on FZ:

CREATE PROCEDURE local_sp AS
   SET NOCOUNT ON
   SET XACT_ABORT OFF
   BEGIN TRY
      EXEC FZ.tempdb.dbo.remote_sp 1, '1'
      EXEC FZ.tempdb.dbo.remote_sp 1, '2'
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_sp'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH

Before we run anything, let's try to understand what will happen when we run local_sp. The first call to remote_sp will insert two rows into remotetbl with the values (1, 1) and (2, 0). The second call will attempt to insert the rows (1, 2) and (3, 0). However, the row with (1, 2) will cause a primary-key violation. Had remote_sp been a stored procedure in the same database, or at least a stored procedure in a database on the same instance, the PK violation would have fired the CATCH block in local_sp, and thus neither row in the second call would have been inserted.

But what happens when we try this on a linked server? The same? Something else? Are you ready to place your bets? OK, let's try it!

EXEC local_sp
go
SELECT a, b FROM FZ.tempdb.dbo.remotetbl

This is the output:

The statement has been terminated.

Msg 2627, Level 14, State 1, Procedure remote_sp, Line 2

Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E22AA2996'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).

a           b

----------- -----------

1           1

2           0

3           0

 

(3 row(s) affected)

We can observe two things: 1) The row with (3, 0) was inserted into to remotetbl. 2) The CATCH handler in local_sp was not invoked. (The text Entering CATCH handler in local_sp was not printed, and the error message is not reformatted by error_handler_sp.) Instead the error message went directly to the client. And on the remote side, execution continued after the PK violation, which only is a statement-terminating error.

So this means that errors raised in remote stored procedures cannot be trapped by TRY-CATCH? Don't jump to conclusion. Change the second call in local_sp to read:

EXEC FZ.tempdb.dbo.remote_sp 1, 'b' -- '2'

This will cause a batch-aborting conversion error when trying to insert the first row. So row (3, 0) should not be inserted this time. This is the output:

Entering CATCH handler in local_sp

Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20

*** [remote_sp], Line 2. Errno 245: Conversion failed when converting the varchar value 'b' to data type int.

a           b

----------- -----------

1           1

2           0

 

(2 row(s) affected)

As expected, the row (3, 0) is not there, but what is more interesting, this time the CATCH handler in local_sp fired. This may seem like one of those willy-nilly things with SQL Server error handling, but for once there is a rational explanation. Before I come to it, let's look at one more thing. Change the second call to remote_sp to once again read:

EXEC FZ.tempdb.dbo.remote_sp 1, '2'

Furthermore, change the setting in local_sp to read SET XACT_ABORT ON and try again. Now the output is:

Entering CATCH handler in local_sp

Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20

*** [remote_sp], Line 2. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E47DBAE45'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).

a           b

----------- -----------

1           1

2           0

 

(2 row(s) affected)

This time the row with (3, 0) was not inserted and the local CATCH handler was invoked. Let consider the fate of (3, 0) first. We changed the XACT_ABORT setting locally, but what says that this would affect the remote procedure? However, that is precisely what is happening. (If you want a proof of this, set up a Profiler trace to see what is going on at FZ.) Maybe there is a pattern? If the error on the remote side is batch-aborting, the local CATCH handler is invoked, but not if the remote error was only statement-terminating? That is indeed a correct observation. Remains to understand why.

There is a logical lapse that is easy to make: you may think that the call to the remote procedure is just like any other call, but it isn't. When the PK violation occurs in remote_sp, the FZ instance first checks if there is a CATCH handler within the procedure. There isn't and then it goes further up the call stack. Except that there is not much of a call stack, since remote_sp is the top procedure. To wit, FZ knows nothing about local_sp or whatever CATCH handler the client may have. You see, for FZ the local server is just another client. So this is the rub: there is not one call stack, but there are two, one each server. This explains why execution continued in remote_sp in the first case: there is no CATCH handler, and the error is statement-terminating.

Let's now move over to the local server. The local server sees that there is an error coming from the linked server. A gut reaction would be to jump to the CATCH handler. But wait, the remote procedure is still executing. So at very least, local_sp has to wait until remote_sp has completed, as it may produce more output, which could be result sets or more error messages. How should the local instance deal with these possibilities? Invoking the CATCH handler would give a false promise that the error in remote_sp was taken care of and that no further disaster happened. Then again, ignoring the CATCH handler makes TRY-CATCH seem to be unreliable.

When I first discovered this behaviour I debated with myself, and eventually I arrived on the opinion that I did not like it. I was about to file a bug on Connect, but I found that Jack J Zou had already beaten me to it. Microsoft responded to his item, saying that the behaviour by design and that it is also documented in Books Online in the topic Handling Errors in Server-to-Server Remote Stored Procedures.

Before we move on, let's look at what happens when we use MSDASQL for the linked server. Change local_sp back to have SET XACT_ABORT OFF. The output is:

OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been terminated.".

OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver][SQL Server]Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E1AD3FDA4'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).".

Entering CATCH handler in local_sp

Msg 50000, Level 17, State 1, Procedure error_handler_sp, Line 20

*** [local_sp], Line 7. Errno 7212: Could not execute procedure 'remote_sp' on remote server 'FZ'.

a           b

----------- -----------

1           1

2           0

3           0

 

(3 row(s) affected)

In this case the CATCH handler was fired. Here, SQL Server does not have that special knowledge that the other instance is an SQL Server instance, and raised an error despite that procedure continued executing on the other side. We can clearly see that the behaviour when the linked server is known to be SQL Server is a special case, and as noted above it is by design.

Change local_sp to have SET XACT_ABORT ON and run again. The output is the same. That is, this time the setting for XACT_ABORT ON is not propagated to the other side; that's another special case that only happens with SQLNCLI. (And that is not very surprising. With the exception of Sybase, a close relative of SQL Server, I would not expect any other database product have the command SET XACT_ABORT or something similar.)

Before you leave this section, restore the definition of FZ to the normal setting with SQLNCLI, as we will continue to use this script.

Did the Remote Procedure Succeed or Not?

It may seem that we have the answer: when we have XACT_ABORT in place, we can rely on TRY-CATCH, but alas not all errors abort the batch, even when XACT_ABORT is in effect. The most noticeable of them is RAISERROR, which the error from the remote procedure very well may have been produced by. Not the least if the remote procedure uses a standard CATCH handler with error_handler_sp or SqlEventLog. The simple remote_sp we had in the previous section is certainly not the norm.

Here is a version of remote_sp where we use a standard CATCH hander with some variations commented out.

CREATE PROCEDURE remote_trycatch_sp @a int, @b char(1) AS
   SET NOCOUNT ON
   BEGIN TRY
      INSERT remotetbl(a, b) VALUES(@a, @b)
      INSERT remotetbl(a, b) VALUES(@a + @b, 0)
   END TRY
   BEGIN CATCH
      --PRINT 'The CATCH handler in remote_sp was entered'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      --; THROW
      EXEC error_handler_sp 
      RETURN 999
   END CATCH

To call this new remote procedure, we have this local procedure which is the same as local_sp, except that it has XACT_ABORT ON and one more diagnostic PRINT:

CREATE PROCEDURE local_sp2 AS
   SET NOCOUNT ON
   SET XACT_ABORT ON
   BEGIN TRY
      EXEC FZ.tempdb.dbo.remote_trycatch_sp 1, '1'
      EXEC FZ.tempdb.dbo.remote_trycatch_sp 1, '2'
      PRINT 'local_sp2 jogs along'
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_sp2'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH

Both remote_trycatch_sp and local_sp2 are included in linkedserverdemo1.sql. All you need to change is the test section to read:

EXEC local_sp2
SELECT a, b FROM FZ.tempdb.dbo.remotetbl

This is the output:

Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20

*** [remote_trycatch_sp], Line 4. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E03BB8E22'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).

local_sp2 jogs along

a           b

----------- -----------

1           1

2           0

 

(2 row(s) affected)

We can't tell from looking at the error message whether it was passed directly from FZ to the client, or if it was reraised anew in local_sp. However, we can tell from the diagnostic message local_sp2 jogs along, that the CATCH handler in local_sp2 was not invoked.

If your remote server is running SQL 2012 or later, you can test the effect of uncommenting the ;THROW command in remote_trycatch_sp:

Entering CATCH handler in local_sp2

Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20

*** [remote_trycatch_sp], Line 4. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198EC5142E37'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).

a           b

----------- -----------

1           1

2           0

 

(2 row(s) affected)

;THROW aborts the batch when there is no CATCH handler higher up on the call stack, why the CATCH handler in local_sp2 is invoked this time. (Please recall that remote_trycatch_sp and local_sp2 are on different call stacks in different processes). In the main parts of this series of articles on error handling, I have favoured the solutions based on RAISERROR over ;THROW, but it seems that it's time to chalk one up for ;THROW here.

But unfortunately, life is not that easy. Uncomment that diagnostic PRINT, so that CATCH handler in remote_trycatch_sp reads:

BEGIN CATCH
   PRINT 'The CATCH handler in remote_sp was entered'
   IF @@trancount > 0 ROLLBACK TRANSACTION
   ; THROW
   -- EXEC error_handler_sp 
   -- RETURN 999
END CATCH

And then we run local_sp2 again:

Msg 2627, Level 14, State 1, Procedure remote_trycatch_sp, Line 4

Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198EE7FBB646'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).

The CATCH handler in remote_sp was entered

local_sp2 jogs along

a           b

----------- -----------

1           1

2           0

 

(2 row(s) affected)

What? As I was cooking up the demos for this section, I added that PRINT statement to make it clearer which CATCH handler that was entered. When I ran the test with ;THROW, my jaw dropped when the CATCH handler in local_sp2 was not invoked. To be honest, I have no idea what is going on, but note that there is another small mystery in the output: The message The CATCH handler in remote_sp was entered comes after the error message, although the messages were produced in reverse order. Something somewhere changes the order and that lures the local SQL Server to think that the message from ;THROW was not batch-aborting, why it cannot invoke the CATCH handler.

Thus, if we want to be really sure to detect whether the remote procedure failed, we cannot rely on TRY-CATCH, but we have to look for something else. The reader may here object that a PRINT message in a CATCH handler is quite much of an edge case, and that is true. However:

If TRY-CATCH cannot be relied on, what is left to us are return codes and @@error. We first run an exploratory test with local_retcode_ataterror_sp so that we get a feeling of what we can expect:

CREATE PROCEDURE local_retcode_ataterror_sp AS
   SET NOCOUNT ON
   SET XACT_ABORT ON
   BEGIN TRY
      DECLARE @ret int
      EXEC @ret = FZ.tempdb.dbo.remote_trycatch_sp 1, '1'
      SELECT @ret AS [@ret], @@error AS [@@error]
      EXEC @ret = FZ.tempdb.dbo.remote_trycatch_sp 1, '2'
      SELECT @ret AS [@ret], @@error AS [@@error]
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_sp2'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH

We try this procedure with different CATCH handlers in remote_trycatch_sp. We can stay with the one we tested last, that is:

BEGIN CATCH
   PRINT 'The CATCH handler in remote_sp was entered'
   IF @@trancount > 0 ROLLBACK TRANSACTION
   ; THROW
   -- EXEC error_handler_sp 
   -- RETURN 999
END CATCH

We get these values for @ret and @@error (I'm ignoring the error message for brevity):

@ret        @@error

----------- -----------

0           0

 

@ret        @@error

----------- -----------

NULL        0

@@error contains nothing of use, but we can see that @ret is explicitly is set to NULL for the failing call. That is, we know that it was 0 prior to the call, so it was not a matter of @ret just retaining its value. (If that sounds funny, recall that this can happen in procedure calls, as I discussed in the section RETURN and the Return Values from Stored Procedures in Part Two.)

Let's now try with the original CATCH handler:

BEGIN CATCH
   -- PRINT 'The CATCH handler in remote_sp was entered'
   IF @@trancount > 0 ROLLBACK TRANSACTION
   -- ; THROW
   EXEC error_handler_sp 
   RETURN 999
END CATCH

We get these values for the second call to remote_trycatch_sp:

@ret        @@error

----------- -----------

999         0

In this case we got the value specified the RETURN statement. Still no value in @@error. What happens if we comment out the RETURN statement, or leave it blank? This time the result is:

@ret        @@error

----------- -----------

-4          0

The value of -4 may seem surprising, but this is something that I discussed in the section RETURN and the Return Values from Stored Procedures. When there is no explicit RETURN value, but an error has been raised during the execution of the stored procedure, the return value is 10 minus the severity level of the error.

To summarise this experiment, we see that:

With this knowledge, we can write a local procedure which is bulletproof – or so we like to hope!

CREATE PROCEDURE local_retcode_sp AS
   SET NOCOUNT ON
   SET XACT_ABORT ON
   BEGIN TRY
      DECLARE @ret int = 112
      EXEC @ret = FZ.tempdb.dbo.remote_trycatch_sp 1, '1'
      IF isnull(@ret, 1) <> 0 
         RAISERROR('remote_trycatch_sp failed. Return status %d', 16, 1, @ret)

      SELECT @ret = 112
      EXEC @ret = FZ.tempdb.dbo.remote_trycatch_sp 1, '2'
      IF isnull(@ret, 1) <> 0 
         RAISERROR('remote_trycatch_sp failed. Return status %d', 16, 1, @ret)
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_local_retcode_sp'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH

That is, we check @ret after each call to the remote procedure, and if the value is anything but zero, we raise an error to take us to the CATCH block. As you see, I also set @ret to a non-zero value before each call. This may be overly ambitious. I have not seen a case when calling a remote store procedure where the return value has been unchanged. Or more precisely, I have not seen such a case when I also have TRY-CATCH. But I certainly did not try everything, so I cannot be sure that the pre-assignment of @ret is unnecessary.

The output (using the version of remote_trycatch_sp we used most recently) is:

Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20

*** [remote_trycatch_sp], Line 4. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E6E8B6712'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).

Entering CATCH handler in local_retcode_sp

Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20

*** [local_retcode_sp], Line 13. Errno 50000: remote_trycatch_sp failed. Return status -4

a           b

----------- -----------

1           1

2           0

Query Timeout on Linked Servers

Here is a really ugly one. For this example, you should comment out the creation and drop of remotetbl from linkedserverdemo1.sql. Instead runs this script in a query window directly connected to your linked server:

CREATE TABLE remotetbl(a int NOT NULL PRIMARY KEY,
                       b int NOT NULL)
go
BEGIN TRANSACTION
SELECT * FROM remotetbl WITH (SERIALIZABLE)

The implication of the serializable isolation level is that if you re-run a query in the same transaction, you are guaranteed to get back the exactly same result. That is, no rows will have been modified, deleted or inserted. In other words, when we run any of the remote procedures in the demo, they will be blocked as long as this transaction is open.

As you may recall from Part Two, many client APIs has a default query timeout of 30 seconds. If no result has been produced within this amount of time, they send an attention signal to SQL Server and communicate an error to the application. When SQL Server accesses a linked server, it also employs a query timeout. However, the default is ten minutes, why the risk you would experience one is less. For this test, we don't want to wait ten minutes, so run this command to set the timeout to five seconds:

EXEC sp_serveroption FZ, 'query timeout', 5

Note: the default value for the server option is 0, but that does not mean "wait forever", but instead SQL Server uses the server-wide configuration parameter remote query timeout (s), which by default is 600, that is, ten minutes.

Let's first test this with local_retcode_ataterror_sp. The output is below.

OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "Query timeout expired".

@ret        @@error

----------- -----------

NULL        0

No error was raised! All we get is an informational message from the OLE DB provider. This is no less than horrifying! This means that if you are not prepared for this, not only will your local stored procedure be unaware of that the call to the remote procedure failed, but furthermore the application will think that everything is nice and cosy. This must be construed as a bug, and I have reported it on Connect. The response so far from Microsoft is non-committal, but the item is at least still open.

Nevertheless, there are some good news. While there is no error message, we can note that the return value is NULL, so that way we can tell that something went wrong, and we can verify this by running local_retcode_sp:

OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "Query timeout expired".

Entering CATCH handler in local_retcode_sp

Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20

*** [local_retcode_sp], Line 8. Errno 50000: remote_trycatch_sp failed. Return status (null)

a           b

----------- -----------

 

(0 row(s) affected)

It seems that this bad behaviour is confined to the case when the OLE DB provider is SQLNCLI, that is, when then the remote server is known to be another SQL Server instance. To wit, if you change FZ to be defined through MSDASQL instead, you get this output:

OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver]Query timeout expired".

Entering CATCH handler in local_retcode_ataterror_sp

Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20

*** [local_retcode_ataterror_sp], Line 6. Errno 7214: Remote procedure time out of 600 seconds exceeded. Remote procedure 'remote_trycatch_sp' is canceled.

In this case, SQL Server produces a generic error message. (Never mind that the number 600 is incorrect; apparently it takes the value from the server-level configuration parameter and ignores the setting for the linked server.)

Commit that open transaction and make sure that you drop remotetbl before you move on.

Stored Procedure Calls in Distributed Transactions

By now, we have changed so much forth and back in linkedserverdemo1.sql, that we will leave that script behind and move on to linkedserverdemo2.sql. This script contains the same tables and procedures as linkedserverdemo1.sql with the addition of BEGIN and COMMIT TRANSACTION in the local stored procedures. Here is the new version of local_sp:

CREATE PROCEDURE local_sp AS
   SET NOCOUNT ON
   SET XACT_ABORT OFF
   BEGIN TRY
      BEGIN TRANSACTION
      EXEC FZ.tempdb.dbo.remote_sp 1, '1'
      EXEC FZ.tempdb.dbo.remote_sp 1, '2'
      COMMIT TRANSACTION
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_sp'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH

remote_sp looks the same as it did before. We try to run this batch:

EXEC local_sp
go
SELECT a, b FROM FZ.tempdb.dbo.remotetbl

My first attempt ended thusly:

Entering CATCH handler in local_sp

Msg 50000, Level 16, State 3, Procedure error_handler_sp, Line 20

*** [local_sp], Line 7. Errno 8501: MSDTC on server 'NATSUMORI\CATORCE' is unavailable.

That is, I had forgotten to start DTC. I put in that message, in case you are equally oblivious. Having started DTC, I got this output:

The statement has been terminated.

Msg 2627, Level 14, State 1, Procedure remote_sp, Line 2

Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198EA3F8EDDC'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).

 

a           b

----------- -----------

1           1

2           0

3           0

 

(3 row(s) affected)

The outcome was the same as without a transaction. That is, the error was not caught in local_sp, but execution continued in remote_sp and the transaction was committed at the end. If you change local_sp to have SET XACT_ABORT ON, the error in remote_sp is caught, the transaction is rolled back and the tables are empty. There is nothing radical about this; but the behaviour is quite expected. As long as XACT_ABORT is OFF in remote_sp, it will continue execution on a statement-terminating error like a PK violation.

Let's now try with local_sp2, which looks like this after adding a user-defined transaction:

CREATE PROCEDURE local_sp2 AS
   SET NOCOUNT ON
   SET XACT_ABORT ON
   BEGIN TRY
      BEGIN TRANSACTION
      EXEC FZ.tempdb.dbo.remote_trycatch_sp 1, '1'
      EXEC FZ.tempdb.dbo.remote_trycatch_sp 1, '2'
      PRINT 'local_sp2 jogs along'
      COMMIT TRANSACTION
   END TRY
   BEGIN CATCH
      PRINT 'Entering CATCH handler in local_sp2'
      IF @@trancount > 0 ROLLBACK TRANSACTION
      EXEC error_handler_sp
   END CATCH

remote_trycatch_sp looks as the same as before, I repeat it for reference:

CREATE PROCEDURE remote_trycatch_sp @a int, @b char(1) AS
   SET NOCOUNT ON
   BEGIN TRY
      INSERT remotetbl(a, b) VALUES(@a, @b)
      INSERT remotetbl(a, b) VALUES(@a + @b, 0)
   END TRY
   BEGIN CATCH
      --PRINT ''The CATCH handler in remote_sp was entered''
      IF @@trancount > 0 ROLLBACK TRANSACTION
      -- ; THROW
      EXEC error_handler_sp
      RETURN 999 
   END CATCH

We run local_sp2 and then look at what's in remotetbl. This is the output you get when the local server runs SQL 2005:

Msg 266, Level 16, State 2, Procedure remote_trycatch_sp, Line 0

Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 1, current count = 0.

Entering CATCH handler in local_sp2

Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20

*** [remote_trycatch_sp], Line 4. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198EE91D9F5C'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).

a           b

----------- -----------

 

(0 row(s) affected)

(If you get a completely different message, comment out BEGIN and COMMIT TRANSACTION and run local_sp2. Once you have done this, restore BEGIN and COMMIT and run again; you should now see an output similar to the above.)

Do you recall what happened when we ran local_sp2 without a transaction? That time the CATCH handler in local_sp2 was not fired, but the SP jogged along. This time, however, the local CATCH handler was fired. Why, I don't know exactly, but the explanation may be first in the output: there is an extra error message due to that remote_trycatch_sp rolled back the transaction which caused a trancount mismatch. This message is sent directly to the client, while the PK error is visible in the CATCH handler. If you run local_sp2 on SQL 2008 or higher, no message is sent to the client directly; all you see is the PK violation which is caught by the CATCH handler.

So this looks quite good then? We get the error message and even our CATCH handler works now, and there is no need to look at return codes? May I remind you of that 1974 smash-hit from Bachman Turner Overdrive, You Ain't Seen Nothing Yet!

Notice that the first line in remote_trycatch_sp says SET NOCOUNT ON. Change that to SET NOCOUNT OFF and try again. This is the output on SQL 2005:

Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20

*** [remote_trycatch_sp], Line 4. Errno 2627: Violation of PRIMARY KEY constraint

PK__remotetb__3BD0198EDA073A2C'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).

Entering CATCH handler in local_sp2

Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20

*** [remote_trycatch_sp], Line 0. Errno 266: Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 1, current count = 0.

a           b

----------- -----------

 

(0 row(s) affected)

The messages are reversed! (It's a little confusing to read, but the message about the PK violation comes from the call to error_handler_sp on the remote server FZ.) That we could possibly live with, but look what happens on SQL 2008:

Entering CATCH handler in local_sp2

Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20

*** [remote_trycatch_sp], Line 0. Errno 266: Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 1, current count = 0.

Msg 0, Level 11, State 0, Line 0

A severe error occurred on the current command. The results, if any, should be discarded.

a           b

----------- -----------

 

(0 row(s) affected)

The error about the PK violation is nowhere to be seen. Instead there is this other message about a severe error, which presumably is due to that SQL Server sends bad TDS. I say so, because when the local server runs SQL 2012 or SQL 2014 this message is not there:

Entering CATCH handler in local_sp2

Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20

*** [remote_trycatch_sp], Line 0. Errno 266: Transaction count after EXECUTE indicates a mismatching number of BEGIN and COMMIT statements. Previous count = 1, current count = 0.

a            b

----------- -----------

 

(0 row(s) affected)

Confusing, eh? I think I can explain the difference between SQL 2005 and the others. On SQL 2008, all error messages from the remote procedure are handled in the same way, but since only one message can be displayed the error_xxx functions, you can only see one of them. You can convince yourself that both are there if you are running local_sp2 on SQL 2012 or later by inserting a ;THROW to the CATCH handler in local_sp2. You will see both error messages, but the order will be different depending on the NOCOUNT setting in remote_trycatch_sp.

But what has NOCOUNT to do with it? I think I have hunch of what is going on, but the explanation goes far beyond the scope of this series of articles, so I let it suffice to say that it has to do with the internals of the TDS protocol and the OLE DB provider.

If we try to escape our confusion and instead look at the practical implications, we can see that there is quite a serious one. There is a fifty-fifty chance that the error message we see is message 266 and that the real error is hidden. This means that our possibilities to understand what went wrong are seriously diminished.

But there is more to it. Run this on the server FZ:

EXEC sp_recompile error_handler_sp

And then run local_sp2 again:

Entering CATCH handler in local_sp2

Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20

*** [remote_trycatch_sp], Line 11. Errno 8525: Distributed transaction completed. Either enlist this session in a new transaction or the NULL transaction.

a           b

----------- -----------

 

(0 row(s) affected)

This time the output is the same on all versions of SQL Server. There are not two messages, because this is a batch-aborting message that occurs in the CATCH block in remote_trycatch_sp before the original message is reraised.

More precisely the error occurs when error_handler_sp is invoked. Why? I don't have the full story, but judging from the error message we are in the grip of DTC. The transaction has been rolled back, but if we try to start some local SQL Server transaction, DTC tells us to start a new distributed transaction or an explicit NULL transaction (which I don't think is possible from T‑SQL, but please tell me if you find out). In this particular case, what triggers the wrath of DTC is the attempt to compile a stored procedure which is not in the plan cache.

As long as we are using error_handler_sp, we could possibly consider this to be a small problem, because the odds are decent that the procedure already has a plan in the cache. However, in Part Three of this series, I introduce SqlEventLog and the stored procedure slog.catchhandler_sp. I have not been able to use this procedure successfully in remote_trycatch_sp, but I get the error above, even if there is a plan for the procedure in the cache. When trying to understand why, I have found that all of these trigger error 8525:

To sum this up, if you are inside a transaction and call a remote stored procedure that uses proper error handling, that is, a CATCH block as suggested in this series of articles, all you may see is this very unhelpful message. To find the original error message, you would need to use Profiler on the remote server, and include the events Error:Exception and Error:User message in the trace. You could also do the corresponding in an X-Event session. Whichever, this requires that you are able to reproduce the problem. If it was something that only happens intermittently, you are left in the dark of what is going on.

This behaviour is utterly bad, and I filed bug on Connect, although I had little hope that it would be fixed; it was more like a protest action. Indeed the bug has been closed as Won't Fix. (They first tried with By Design, but I did not buy that design.) The situation would have been somewhat better if it had been possible to detect that you are in a distributed transaction. You can tell this from the DMV sys.dm_session_transactions, but to access this DMV requires the permission VIEW SERVER STATE, so it is not good to put in application code. I have a Connect item calling for something better, but it has also been closed as Won't Fix.

If you are calling a stored procedure which you have written to be called over a linked server, there is a simple strategy you can apply to avoid this problem: don't use any CATCH block, but rely solely on SET XACT_ABORT ON (which you should have in the remote procedure).

But the problem is that in many cases you want to call an existing stored procedure, which might be calling other procedures in its turn, why you don't have any control over how they do their error handling. I'm afraid that I don't have any advice for you.

What happens if we use a linked server with MSDASQL for the examples in this section? I'm not showing the output, but here is a quick summary. As long as the plan for error_handler_sp is in the cache, we get both messages from remote_trycatch_sp, but we get them as informational messages, so the client must take care of them and not drop them on the floor. The case where error_handler_sp is not the cache is not any different; we get the same error message about a completed distributed transaction.

To conclude this section on distributed transactions on the lighter side, here is what happens when you have SET NOCOUNT ON in remote_trycatch_sp, and you uncomment the diagnostic PRINT and the ;THROW statement in the CATCH handler:

Msg 2627, Level 14, State 1, Procedure remote_trycatch_sp, Line 4

Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E09AB25D0'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).

The CATCH handler in remote_sp was entered

local_sp2 jogs along

Entering CATCH handler in local_sp2

Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20

*** [local_sp2], Line 9. Errno 3930: The current transaction cannot be committed and cannot support operations that write to the log file. Roll back the transaction.

a           b

----------- -----------

 

(0 row(s) affected)

This time, the CATCH block in local_sp2 was not fired directly, but the SP tried to jog along. However when it tried to commit the transaction, it was detected that the transaction was doomed, and there was not any actual harm.

Accessing Remote Objects in Queries

We will now turn to accessing remote objects directly in queries, by four-part notation or through OPENQUERY.

Linked Servers and Deferred Name Resolution

In this section we will look at what happens when we try to access non-existing objects. There is no pre-cooked script for this section.

In Part Two, you learnt that SQL Server has deferred name resolution, which has the consequence that you can get compilation errors at run-time. That is, if you run this:

CREATE PROCEDURE ForCalvin AS
   SELECT * FROM HitchHikers

The procedure is created successfully whereupon it fails at run-time, because the table is missing. With linked servers it is a completely different matter. If you try this:

CREATE PROCEDURE ForCalvin AS
   SELECT * FROM AndHis.Next.Two.HitchHikers
go

The procedure is not created, but you get this error:

Msg 7202, Level 11, State 2, Procedure ForCalvin, Line 2

Could not find server 'AndHis' in sys.servers. Verify that the correct server name was specified. If necessary, execute the stored procedure sp_addlinkedserver to add the server to sys.servers.

What if the server exists, but not the table? Assuming that you still have FZ defined try this:

CREATE PROCEDURE ForCalvin AS
   SELECT AndHis FROM FZ.Next.Two.HitchHikers

Again the procedure is not created, but the error message is a different one:

Msg 7314, Level 16, State 1, Procedure ForCalvin, Line 2

The OLE DB provider "SQLNCLI11" for linked server "FZ" does not contain the table ""Next"."Two"."HitchHikers"". The table either does not exist or the current user does not have permissions on that table.

If you try to use OPENQUERY:

CREATE PROCEDURE ForCalvin AS
   SELECT * FROM OPENQUERY(FZ, 'SELECT AndHis FROM Next.Two.HitchHikers')

The error message is again different, but still no procedure:

OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "Deferred prepare could not be completed.".

Msg 8180, Level 16, State 1, Line 1

Statement(s) could not be prepared.

Msg 208, Level 16, State 1, Line 1

Invalid object name 'Next.Two.HitchHikers'.

I find this ironic. While I don't like deferred name resolution in general, I could see a point in having it for linked servers. If I run a large deploy script on my production server in a tight maintenance window, and the remote server is down because they are applying a service pack, do I want my script to fail? I guess the reason for this inconsistence is that when deferred name resolution was introduced in SQL 7, linked servers were scoped out because of time constraints or some other reason. But it is in moments like this you easily start to think that SQL Server has been designed for maximum confusion.

Say now that you are able to create the procedure, but at run-time the table is missing for whatever reason. What happens in this case? Here is a script that creates a table at the remote server, creates a local procedure, and then drops the table before the procedure is executed:

EXEC ('CREATE TABLE tempdb.dbo.Cleetus(Awreety int NOT NULL)') AT FZ
go
CREATE PROCEDURE Awrighty AS
SET NOCOUNT ON
BEGIN TRY 
   PRINT 'Awrighty starting'
   SELECT Awreety FROM FZ.tempdb.dbo.Cleetus
END TRY
BEGIN CATCH
   PRINT 'Entering the CATCH handler'
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC error_handler_sp
   RETURN 901
END CATCH
go
--EXEC Awrighty
EXEC ('DROP TABLE tempdb.dbo.Cleetus') AT FZ
go
EXEC Awrighty
go
DROP PROCEDURE Awrighty

The output is:

Msg 7314, Level 16, State 1, Procedure Awrighty, Line 5

The OLE DB provider "SQLNCLI11" for linked server "FZ" does not contain the table ""tempdb"."dbo"."Cleetus"". The table either does not exist or the current user does not have permissions on that table.

That is, since the characteristic output of the error_handler_sp is not there, we can tell that the CATCH handler is not invoked, not too different from when a local table is missing. But if you look closer, it's not exactly the same. With a local table, the procedure starts running, and you get the error when you reach the SELECT statement. But since the message Awrighty starting is not printed, we can tell that we never entered the TRY block. No wonder that the CATCH block is not fired! The error occurs already when the optimiser builds a plan for the procedure.

What if the procedure is in the cache when the table is dropped? If this happens with a table in a database on the local server, SQL Server directly invalidates all plans referring to this table. But that cannot happen when a table is dropped on the remote server, so the local instance will run the procedure and not discover until it reaches the statement that the table is gone. This is exactly what happens if we uncomment the first call to Awrighty in the script above:

Awrighty starting

Awreety

-----------

 

Awrighty starting

Entering the CATCH handler

Msg 50000, Level 16, State 1, Procedure error_handler_sp, Line 20

*** [Awrighty], Line 5. Errno 7314: The OLE DB provider "FZ" for linked server "FZ" does not contain the table ""tempdb"."dbo"."Cleetus"". The table either does not exist or the current user does not have permissions on that table.

Indeed, the procedure starts executing, and it tries to run the SELECT statement which fails, et voilà! The error is caught and handled like any other run-time error. (Which, as you may recall, is in contrast to when a table local to the server is missing. In this case, the error cannot be caught in the scope where it occurs, only in outer scopes.) If you peel off the TRY block, add a transaction and run some more tests, you will find that this is a batch- and transaction-aborting error. I'm not including an example to show this, but I leave it as an exercise to the reader.

The remote table being dropped is only one example of what could cause the query to fail. Columns could have been dropped or renamed. Or for that matter the database itself. The server could be down. These situations result in different errors, but as long as the plan is the cache already, the effect is the same: you get a run-time error that you can trap with TRY-CATCH. On the other hand, if the definition of the linked server is dropped, this flushes the plan cache, why you get a scope-aborting compilation error that you cannot trap in the local scope.

Errors When Updating Remote Objects

Here are some examples of what can happen if you attempt to update remote objects through four-part notation. A full script for this section is available in the file linkedserverdemo3.sql.

On the remote server, we have the same old table:

CREATE TABLE remotetbl(a int NOT NULL PRIMARY KEY,
                       b int NOT NULL)

On the local side we have two procedures, one calling the other:

CREATE PROCEDURE EatThatQuestion AS
SET NOCOUNT ON
SET XACT_ABORT OFF
BEGIN TRY
   -- BEGIN TRANSACTION
   INSERT FZ.tempdb.dbo.remotetbl (a, b) VALUES (1, 1)
   INSERT FZ.tempdb.dbo.remotetbl (a, b) VALUES (1, 1)   
   -- COMMIT TRANSACTION
END TRY
BEGIN CATCH
   PRINT 'Entering CATCH handler in EatThatQuestion.'   
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC error_handler_sp
END CATCH
go
CREATE PROCEDURE BlessedRelief AS
BEGIN TRY
   EXEC EatThatQuestion   
END TRY
BEGIN CATCH
   PRINT 'This is the CATCH handler of Blessed Relief'
   IF @@trancount > 0 ROLLBACK TRANSACTION
   EXEC error_handler_sp
END CATCH
go

The test as such is simple:

EXEC BlessedRelief

The output with the above is:

The statement has been terminated.

Entering CATCH handler in EatThatQuestion.

This is the CATCH handler of Blessed Relief

Msg 50000, Level 14, State 1, Procedure error_handler_sp, Line 20

*** <dynamic SQL>, Line 1. Errno 2627: Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E6CD828CA'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).

The output is not too unexpected, but we can note that the procedure name and line number are incorrect. This is not that mysterious; the local server connects to the remote server and it is the remote server that runs the query, presumably through some sort of ad-hoc batch. The local server then only relays what it gets. (Recall that the text about dynamic SQL is added by error_handler_sp when error_procedure() returns NULL.) We can also note that this is different from when we called a remote stored procedure: in that case a PK violation went directly to the client and it was not trappable with TRY-CATCH. But in this case, SQL Server has full control over what it emits to the remote server, and knows that this is a single operation, why invoking the CATCH handler is safe.

Now let's uncomment the BEGIN and COMMIT TRANSACTION statements and see what happens. This is a case where it matters whether you have a loopback server, or whether your linked server is a different instance. If your linked server is a loopback server, you get this message:

Entering CATCH handler in EatThatQuestion.

This is the CATCH handler of Blessed Relief

Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20

*** <dynamic SQL>, Line 1. Errno 3910: Transaction context in use by another session.

Mysterious as it may seem, this falls under the category you are not supposed to do that. If you have set up FZ to point to a different instance, you will instead see this output:

OLE DB provider "SQLNCLI10" for linked server "FZ" returned message "Cannot start more transactions on this session.".

Entering CATCH handler in EatThatQuestion.

This is the CATCH handler of Blessed Relief

Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20

*** [EatThatQuestion], Line 7. Errno 7395: Unable to start a nested transaction for OLE DB provider "SQLNCLI10" for linked server "FZ". A nested transaction was required because the XACT_ABORT option was set to OFF.

Here is a case where you first get a message from the OLE DB provider and another from SQL Server. The message from SQL Server is quite clear: we need to change EatThatQuestion to have SET XACT_ABORT ON. So let's do that. You might see this output:

Msg 0, Level 11, State 0, Line 0

A severe error occurred on the current command. The results, if any, should be discarded.

This message indicates an internal error somewhere, either in SQL Server or the client API. Often when the level is 11, the error has occurred is in the client API, although the bug does have to be there; often it is SQL Server that has emitted bad TDS. If you try SQLCMD or OSQL instead of SSMS, you may get no output at all – despite that we know that EatThatQuestion generates a PK violation.

If you run the script a number of times, you may sometimes see this message instead:

Msg 1206, Level 18, State 118, Procedure EatThatQuestion, Line 10

The Microsoft Distributed Transaction Coordinator (MS DTC) has cancelled the distributed transaction.

There are all sorts of mysteries here. None of the CATCH blocks have been entered, and the message comes directly from SQL Server, without passing through error_handler_sp. The severity level is 18, beyond the normal range for user errors which 11 to 16. If you count lines in EatThatQuestion, what is on line 10? This statement:

BEGIN CATCH

That is, the PK violation is trapped and execution is sent to the CATCH block, but upon entering the CATCH block something really bad happens, and an entirely uncatchable error is produced. (It does not help if you wrap EatThatQuestion in an outer procedure.) And not only that: again we face a situation where the original error is masked by a new error, meaning that we have very little clue about went wrong originally. If the situation is easily reproducible, the workaround is the same as above: run a trace or an X-event session and include the events for exceptions and user messages.

When I discovered this, I reported a bug on Connect. Microsoft closed it with Won't fix but they were kind to offer an explanation, which I reproduce here:

We looked at the interaction of DTC and DML for this particular error & reached conclusion that we cannot fix this behavior. For one, the DML statement has to be atomic in nature so we cannot yield during the DTC operation since that will result in breaking the transaction semantics. So given this specification, when an error happens on the remote server DTC sends abort requests to all participants. This is done asynchronously so it is possible that there is a race condition under which we might be able to execute code in the CATCH block if the DTC abort notification has not been processed yet and in other cases we will not be able to run code in the CATCH block. Note that the abort request comes in the form of an attention signal which is essentially a request abort so it cannot be caught on the server.

One possible workaround is to use a stored procedure to perform the INSERT. This way you could wrap the INSERT in a TRY...CATCH on the remote side if needed and handle errors there. In any case, you need to be aware of the DTC error handling when dealing with DML operations on linked servers.

In their Connect response, Microsoft also referred to the topic Using TRY...CATCH in Transact-SQL in Books Online, which contains this relevant passage in a note:

If a distributed transaction executes within the scope of a TRY block and an error occurs, execution is transferred to the associated CATCH block. The distributed transaction enters an uncommittable state. Execution within the CATCH block may be interrupted by the Microsoft Distributed Transaction Coordinator which manages distributed transactions. When the error occurs, MS DTC asynchronously notifies all servers participating in the distributed transaction, and terminates all tasks involved in the distributed transaction. This notification is sent in the form of an attention, which is not handled by a TRY…CATCH construct, and the batch is ended. When a batch finishes running, the Database Engine rolls back any active uncommittable transactions. If no error message was sent when the transaction entered an uncommittable state, when the batch finishes, an error message will be sent to the client application that indicates an uncommittable transaction was detected and rolled back For more information about distributed transactions, see Distributed Transactions (Database Engine).

The race condition with multiple processes at play, two SQL Server instances and one or two DTC instances, explains why the output is not always the same. The full story about A severe error... is not revealed, but maybe SQL Server has started sending something when the attention signal interrupts why the client gets get an incomplete message.

In their initial response to the bug, Microsoft say that they handle the situation for stored procedure calls correctly, so supposedly we should not see this error when calling remote stored procedures. (But as we have seen, there are other issues.)

Interesting enough, if I run this scenario with FZ set up to use MSDASQL, I consistently get the correct error message:

OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver][SQL Server]The statement has been terminated.".

OLE DB provider "MSDASQL" for linked server "FZ" returned message "[Microsoft][ODBC SQL Server Driver][SQL Server]Violation of PRIMARY KEY constraint 'PK__remotetb__3BD0198E6383C8BA'. Cannot insert duplicate key in object 'dbo.remotetbl'. The duplicate key value is (1).".

Entering CATCH handler in EatThatQuestion.

This is the CATCH handler of Blessed Relief

Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20

*** [EatThatQuestion], Line 8. Errno 7343: The OLE DB provider "MSDASQL" for linked server "FZ" could not INSERT INTO table "[FZ].[tempdb].[dbo].[remotetbl]". The data violated the integrity constraints for one or more columns.

However, since there is a race condition here, this does not really prove anything. It might be working by chance only, and you may see a different result in your environment.

Query Timeouts

For completeness sake, let's look how this works with direct queries. On FZ run in tempdb (after having created remotetbl):

BEGIN TRANSACTION
SELECT * FROM remotetbl WITH (SERIALIZABLE)

Make sure that the BEGIN and COMMIT TRANSACTION commands in EatThatQuestion are commented out and run BlessedRelief. This was the output I got:

OLE DB provider "SQLNCLI11" for linked server "FZ" returned message "Query timeout expired".

Entering CATCH handler in EatThatQuestion.

This is the CATCH handler of Blessed Relief

Msg 50000, Level 16, State 2, Procedure error_handler_sp, Line 20

*** [EatThatQuestion], Line 6. Errno 7343: The OLE DB provider "SQLNCLI11" for linked server "FZ" could not INSERT INTO table "[FZ].[tempdb].[dbo].[remotetbl]".

So in this case, the timeout error is trapped properly.

Conclusion

We have now looked at the peculiarities with error and transaction handling when you use linked servers and we have focused on the most common case: the linked server is another SQL Server instance and we use the SQLNCLI provider. We have also made some tests using MSDASQL + the SQL Server ODBC driver to get a feeling for what you can expect when you access other data sources than SQL Server.

We have seen that in the case with SQLNCLI, SQL Server takes some shortcuts and presents errors on from the remote instance as if they are local messages. This does not always fall out well, and you can face situations where you lose the original error message, not the least if you engage in distributed transactions. And there is the really ugly case when you get a timeout in a remote procedure and this does not raise any error at all.

In contrast, when you use MSDASQL, the behaviour is squared: the error message you can trap in SQL Server is a generic one telling you that some operation failed. The real error messages always go to the client as informational messages. For some cases, like the query timeout, the behaviour with MSDASQL is certainly better than with SQLNCLI. However, please keep in mind that the examples with MSDASQL + SQL Server were for demonstrational purposes only. Microsoft does not support linked servers with MSDASQL for connection to other SQL Server instances; they only support SQLNCLI.

You have seen several examples of confusing behaviour in this appendix. Have you gotten the full story? I cannot make any such promises. This appendix came about when I started to write something a lot briefer intended to go into the chapter Special Contexts in Part Two. However, as I explored various combinations, I found more and more confusing things, and I quickly had more material than I could fit into the main article. And at some point I just had to stop. Thus, I am quite sure that if you use linked servers a lot that you will find more absurdities than you have seen here.

Overall, I am not a friend of linked servers and even less of distributed transactions. They often mean hassle, and then I am not only thinking of error and transaction handling, but more of things in general like connectivity. Not only may the remote server be down, but it can also stop working because changes in the AD which prevents delegation from working. Performance can also be a problem, since compilation of distributed queries is quite a challenge for the optimiser. Distributed transactions also incur several restrictions. You cannot use snapshot isolation, and you cannot use the command SAVE TRANSACTION. Distributed transactions are not supported with mirroring or availability groups. And on top of all, you cannot easily check whether you are in a distributed transaction.

What are the alternatives? Well, if you can put the databases on the same instance do so – that will make your life so much easier. If the databases must be on different servers, look into asynchronous solutions with Service Broker or similar. Or have a client program to talk to both servers. True, many such solutions require more effort up front. But linked servers are one of those things that look simple in the beginning but keep on causing grief as you use them.

We are the end of Appendix 1. If you have questions on the contents, or you have noticed any spelling or grammar errors, you are welcome to mail me on esquel@sommarskog.se. If you have problems related to something you are working with, I recommend that you ask in a public forum for quicker response. For a list of acknowledgements, please see the end of Part Three. Below is a revision history for this appendix.

Revision History

2015-05-03
First version.

Back to my home page.