Why Read-only Table Parameters is Not Enough

An SQL text by Erland Sommarskog, SQL Server MVP. Copyright applies to this text.

Abstract

In this article I discuss a new feature in SQL 2008, table-valued parameters and particularly the restriction that they have to be read-only. I argue that this makes this feature considerably less useful than it could be, and that in order to build scalable applications, it's essential to be able to pass read-write table parameters between stored procedures. Assuming that the read-only restriction is related to the communication with client API, I suggest that it would be acceptable that stored procedures with table parameters that are read-write cannot be called from client APIs. However, I then go on and discuss why read-write table parameters would be interesting also for clients.

The ultimate purpose for this article is to whip up support for an item that I have filed on Microsoft Connect. If you agree with my argumentation I urge you to go to https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=299296 and cast your vote. Originally, my hope was to persuade Microsoft to lift the current restriction before the final release of SQL Server 2008, but unfortunately that did not happen. So the Connect item still stands, and there is all reason to vote to persuade Microsoft to complete the implementation of table variables in the next version of SQL Server. (You need to login through Windows Live to vote. You vote by clicking on the stars in the Vote box.)

Introduction

When I first heard that SQL 2008 would have table parameters, I was very excited, because in the system I work with there are several places where we pass data between stored procedures, using some of the techniques discussed in my article How to Share Data Between Stored Procedures. While those methods work, they are not optimal, and all through the years I have felt that table-valued parameters is what we really need.

But when I learnt that the table parameters in SQL 2008 are read-only, I was equally disappointed. In almost all the cases that I share a temp table or use a process-keyed table (techniques discussed in the aforementioned article), I want to pass data back, or at least use the input table as a working area. Essentially, SQL 2008 is not bringing me any closer to a real solution.

Don't get me wrong, table-valued parameters is still a valuable addition to SQL Server even when read-only when you want to pass structured data from the client. In my article Arrays and Lists in SQL Server, I discuss a whole bunch of methods to do this in SQL 2005 and earlier. Table parameters could make history of all of them. But this article is about passing data between stored procedures.

A Business Case

This week, I have been working with designing a set of interfaces in our system where read-write table-valued parameters would be very very useful to make this is a truly scalable solution. (Since this is for SQL 2005, I will have to use process-keyed tables no matter what.)

First a short background of the business problem: clients of a brokerage firm may place orders to buy fund units. Commonly when you buy funds, you buy for a given amount, for instance €1000. At some point during the day, the brokerage firm makes a cut-off and aggregates all orders for one or more funds, and sends a total order to the fund company. At the same time the order amount is withdrawn from the client accounts and paid to the fund company. A few days later, the fund company confirms the transaction and informs the broker how many units he received and at which price. These units are then deposited into the client accounts.

My system keeps track of the cash holdings and the positions of instruments (funds, stocks, bonds, etc.) The purpose of the interface I have been designing is to permit an external system to handle the logic to aggregate the orders, communicate with the fund company etc. Currently, I know of two such external systems, and there could be more in the future. My interface is intended to be general, and I expect that for about every external system that will communicate with us, we will need some glue code specific to that system. That code would also reside in the database. Thus, the interface is a set of stored procedures that receives and returns data.

For this article, I will focus on the interface get_fund_orders. The external fund system passes me one or more funds to retrieve orders for, and in return it should get:

At the same time, in our system I should perform some updates:

A typical number of funds in a single call would be 100-200. The number of client orders can easily be 100 000 – 200 000 at the end of the month.

Implementing the Business Case

Ten years ago, I would maybe have defined this procedure as simply as this.

CREATE PROCEDURE get_fund_orders @fundid int,
                                 @batchreference int OUTPUT AS
...                          

The orders would have been returned as a result set, and any errors would have been communicated by RAISERROR and a RETURN with a non-zero value.

But I have learnt since long, that this does not scale. Retrieving the orders and performing the updates per fund, is likely to take about 100 times longer for 100 funds, than retrieving all orders and performing all updates at once. OK, maybe not 100 times, but easily 80 times longer.

And while returning the orders as a result set would be perfectly OK, if the fund system really is external and runs in a different machine, it would exclude the possibility that the fund system would be a component of our own system or at least running in a database on the same server. Well, not really since INSERT-EXEC could come to the rescue, but as I argue in my article How to Share Data..., INSERT-EXEC is not very robust.

Had I had access to read-write table parameters, I would define the interface as:

CREATE TYPE fundlist_type AS TABLE (fundid   int NOT NULL PRIMARY KEY,
                                    batchref int NULL,
                                    errmsg   nvarchar(255) NULL)

CREATE TYPE orderdata_type AS TABLE (batchref int           NOT NULL,
                                     orderid  int           NOT NULL,
                                     orderamt decimal(10,2) NOT NULL,
                                     currency char(3)       NOT NULL,
                                     PRIMARY KEY(batchref, orderid))
CREATE PROCEDURE get_fund_orders @funds  fundlist_type  OUTPUT,
                                 @orders orderdata_type OUTPUT AS                         

Clean and simple. And scalable.

I like to stress that it is essential that errmsg is added as a column in fundlist_type. If one fundid is in error, for instance there is no such fund, it's completely unacceptable that the procedure fails for all funds. get_fund_orders must return data for all funds that are in order. Of course, the procedure could still fail for all funds if there is an unanticipated error like a foreign-key violation or a deadlock, but the key is that since this is communication with an external system, some errors are anticipated, and the effects of such errors should be limited. After all, that was how the interface of ten years ago behaved, so why not now?

The Alternatives and their Drawbacks

As I've mentioned, in my article How to Share Data Between Stored Procedures, I look at several methods to pass data between stored procedures, and in this section I will take a look at some of them, and see how they could be used in my business case, and what disadvantages they have. Here I do not describe the methods, so if you are unfamiliar with them, please refer to the article for reference.

Table-valued Functions

Table-valued functions cannot be used at all in this scenario, because I need to perform updates.

Sharing Temp Tables

There are two major problems with this method here:

Process-keyed Tables

The major gain here is that the table definition is one place. But there are other drawbacks:

INSERT-EXEC

INSERT-EXEC permits for a workaround for read-only table parameters, as illustrated in this example:

CREATE TYPE my_table_type AS TABLE(a int NOT NULL PRIMARY KEY,
                                   b int NOT NULL,
                                   c int NULL)
go
CREATE PROCEDURE the_callee @indata my_table_type READONLY AS
DECLARE @copy my_table_type
INSERT @copy SELECT * FROM @indata
UPDATE @copy 
SET    c = a + b
FROM   @copy c
SELECT * FROM @copy
go
go
CREATE PROCEDURE the_caller AS
DECLARE @indata  my_table_type,
        @outdata my_table_type
INSERT @indata (a, b)
    VALUES (5, 7), (9, 2), (12, 23)
INSERT @outdata
   EXEC the_callee @indata

Normally, it's bad practice to use SELECT * and INSERT without explicit column lists, but since we are using a table type we have full control over that we copy all columns.

While this workaround can make table-valued parameters a serious option for passing data between stored procedures, there are still drawbacks:

The latter point is actually more important than it may look at first glance. A couple of years ago, a colleague of mine got the task of rewriting a stored procedure from single-row operation to be multi-row capable. This procedure was some 700 lines long and called several other procedures, of which the most also worked with scalar parameters. When he was done, he had one big monolith of 3000 lines of code. He had to incorporate the sub-procedures into the main procedure, as it was not realistic to define process-keyed tables for all of them. Good coding practice of regular programming languages would of course call for the modularisation we had with the scalar procedure. But you need read-write table-valued parameters for this.

XML

Yes, I could use parameters of the xml data type to pass data in and out of get_fund_orders, but seriously!. If creating and shredding XML to pass data between T-SQL stored procedures does not spell k-l-u-d-g-e, nothing does. In a relational database we work with tables.

If you find that argument too much on the emotional side, consider the overhead for copying to XML and back to a table.

Also, keep in mind that the maximum size for a value of the xml data type is 2 GB, a limit that is not too unrealistic to hit in this context.

Use Cases for Table Parameters

I think you already have a clear picture of how I would like to work with table parameters, but I still like to summarise the different cases:

Actually, while table-valued parameters may look new and exotic to some, there are probably a lot of stored procedures in this world that takes scalar input that really should take table input. There is a lot of code out there that loops over a table to call a stored procedure for one row at a time – when that procedure should accept a set of data for the application to be scalable.

Understanding the Read-only Restriction

Of course, Microsoft did not introduce the restriction that table parameters have to be read-only on a whim, but obviously they ran into some technical problem which time did not permit them to overcome.

What those problems were, I don't know, but I do have a theory: the difficulties lie in passing the data of read-write table parameters back to client APIs. And also defining methods in the client API to retrieve the data.

Working from that theory, my suggestion to make it a viable option to add read-write table parameters in SQL 2008 is to replace the current restriction with a less severe restriction: it is only permitted to call a stored procedure with a read-write table parameter from T-SQL, not directly from a client API.

I believe this restriction is far more acceptable than the current one. If you want to pass a table from a client application to a stored procedure to have some rows updated in it, you can always return the data as regular result set. In difference to communication between stored procedures, this does not incur any overhead, since the data will have to cross the wire one way or another, so whether it is a result set, or a table parameter that is updated should not matter. Note also, that in an example such as my get_fund_orders, handling multiple result sets is trivial in the communication between client and server.

Strictly speaking, a CLR module in SQL Server is not a client, but since CLR modules uses the same API, SqlClient, that real client programs use, it is obvious that my restriction would apply to CLR modules as well. An author of a CLR module that would like to work with read-write table parameters would have to return the data as result sets from the T-SQL stored procedure. This would admittedly mean one round of copying that would not be needed with read-write table parameters. But I think passing data between T-SQL procedures is far more common. Note that SQL 2008 does not support passing table parameters to CLR modules.

It could be questioned whether it is acceptable to have stored procedures that only can be called from T-SQL code, but not from a client API. However, there is a precedent: the cursor data type. Stored procedures with cursor parameters cannot be called from a client API.

Finally a quick note about linked servers. When you access a linked server, your local server is after all a client to the remote server. It would certainly be nice to be able pass read-write parameters when calling a remote procedure. However, currently SQL 2008 does not permit you to pass table parameters to remote procedures at all.

But Read-Write Table Parameters Could be Useful to Clients Too!

As I just argued, with the current paradigm how clients retrieve data from SQL Server, having table parameters as read-only in client-server communication is not a serious restriction. However, when you think of it, that paradigm is ridden with some quite serious problems. To wit, there is no contract that the client can rely on. A stored procedure can return differently structured result sets each time. But there are several high-level client features that relies on the result set being the one and the same. Furthermore, they want to know the structure of the result set(s) at design time. Example of this includes strongly typed data sets, report wizards not to mention how SQL Server itself access linked servers.

What these tools do today, is to run the procedure with SET FMTONLY ON. With FMTONLY on, SQL Server runs in a peculiar mode, where it does not run queries, only return metadata about result sets. FMTONLY is extremely fragile and not at all reliable. As just one example: if a procedure creates a temp table as a work area, the temp table is not created in FMTONLY mode, queries that refer to the temp table fail. There is a whole lot more. Overall, I find astounding that such an unrobust feature, hardly even worth to be called a "hack", exists in a product that aims at being enterprise-class.

What SQL Server needs is a way to define a contract for a stored procedure. I have floated some ideas with MVP colleagues. One was something like:

CREATE PROCEDURE some_sp @par1 ... 
RETURNS <description of result goes here>

And the procedure exists before the result is produced, there is an exception.

Another idea was to use table parameters. I have not analysed all alternatives in details myself, but from the contract perspective, table parameters are very appealing, as the contract definition is already in place. The client would call the procedure, and if the procedure produces a result set, the client wouldn't care. (Or it would raise an exception, because the client is not supposed to return result sets at all.) The data the client wants is already in the table parameter. There are still some issues to iron out, though. The procedure would have to do:

INSERT @returntable (...) SELECT ...

where it today performs a simple SELECT. The implementation would have to be smart and know that @returntable was passed from a client, and directly put the data on the network channel, so that data does not have to be bounced. But to do this, SQL Server would have to know that there will be no further operations on the table variable. Maybe there should be a special syntax for like SELECT ... FOR RETURN and it would be an error if this was executed twice.

As you realise, this discussion goes far beyond the original topic for this article, and we should probably compose a separate Connect item for achieving reliable contracts between server and client.

Suggestions for the Introduction of Read-write Table Parameters

By default, parameters of scalar types are input-only, but not read-only; the called procedure may modify the parameter as it sees fit. A parameter can be made an input/output parameter by specifying the OUTPUT keyword both in the procedure definition and in the call to the procedure. That is, even if a parameter is declared as OUTPUT, a caller can still opt to pass the parameter as input-only.

This pattern is not really suitable for table parameters, as it would more or less require SQL Server to pass table variables by value, that is copy on input and copy back on output.

Therefore, I propose that a table-valued parameter that is not defined as READONLY, is always an input-output parameter. This could be amplified by requiring that the OUTPUT keyword is specified for table parameters when READONLY is not present. Likewise, it could be a requirement that OUTPUT is specified for actual parameters if the formal parameter is declared as OUTPUT. This would thwart the risk that a programmer thinks that his table variable will not be affected when he passes it to a stored procedure that takes a non-read-only table-valued parameter.

Obviously, if a table parameter is read-only, you cannot pass it to an inner procedure as an output parameter.

A special note about functions: I don't think the restriction that table-valued parameters must be read-only should be lifted for functions. It's a basic rule of user-defined functions in SQL Server that they should not have side effects.

Conclusion

I hope this text has convinced the reader that in order to build really scalable applications with SQL Server requires passing data between stored procedures through tables, both for input and output. Therefore the restriction that table-valued parameters must be read-only should be lifted in the next version of SQL Server. And if you agree with this, go to https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=299296 and vote!