My SQL Server Wishlist
Version 2012

An SQL text by Erland Sommarskog, SQL Server MVP. Most recently updated 2013-05-18.

Introduction

As you may imagine, my wishlist for new features in SQL Server exceeds what most kids send to Santa Claus. This article is a collection of suggestions filed on Microsoft Connect that I think Microsoft should implement. This is not to say that this is all that I wish for; to keep a focus, I have constrained this list to development-related items.

The list consists of four sections. There is one for general programming, one for improvements to querying, and there is a section with DDL improvements. But the most important section is the first one, which focuses on an area where I feel that Microsoft rather has gone backwards over the years: facilitating robust system development. Robustness implies that you detect errors as early as possible, and the first gate is compile time. My first and most important wish is for much stronger compile-time checks, and I have compiled these ideas in a separate article, Ideas for SET STRICT_CHECKS ON. (Updated with new ideas in May 2012.) Robustness is also about error handling at run-time, an area in which Microsoft made some improvements in SQL 2005, but it's still a mess. Robustness is also about expressiveness, for instance to have domains with rules attached to them and that cannot be converted by mistake to another domain. Another such example is using identifiers, e.g. enums, instead of constants, so that spelling errors can be detected at compile time. It's also about contracts, for instance being able to guarantee the shape of the result set returned from a stored procedure. All these are wishes you find in the section on Robust System Development.

Many of the Connect items that I list in this article are my own, but I have also given attention to many good suggestions from my fellow MVPs and other users. In case you don't know, anyone can submit bugs and suggestions for SQL Server on Microsoft Connect, all you need is a Windows Live ID. You can also go to Connect to vote on other people's suggestions, and the chief reason that I publish this article is indeed to whip up more votes for good proposals. And, yes, Microsoft does care about these votes.

I first published this wishlist when SQL 2008 was released. With the release of SQL 2012 I have updated the wishlist and removed items that have been fulfilled, as well as a few where Microsoft has closed the Connect item with Won't Fix, and I decided to give it up. I've also added a few new items to emphasize the focus on robustness. If you are missing a wish from the old wishlist, you can view the SQL 2008 version with annotations to see why I have deleted it.

Here is a table of contents with all the wishes:

   Introduction
      Note on Old Connect Items
   Robust System Development
      SET STRICT_CHECKS ON – Find More Errors at Compile Time
      TRY-CATCH Should Always Work!
      Permit Recoverable Error Handling – Don't Doom Transactions Willy-nilly
      Savepoints and Distributed Transactions
      Enums in T‑SQL
      Contracts for Stored Procedures
      Give us Real Domains!
      Immutable Column Values
      Add a Virtual Errors Table
      SET TRANSACTION ISOLATION LEVEL READ COMMITTED LOCK
      Make YYYY-MM-DD Safe
   Other General Programming Improvements
      Inline Scalar UDFs
      Make Table Parameters Read-Write
      Get the Call Stack!
      Autonomous Transactions
      Module-Level Table Expressions
      Improvements for Bulk Load
      Deleting Lots of Data Efficiently
      Get Position within an XML Document
      Really Small Things
   Improvements for Querying
      A Built-in Table of Numbers
      IS [NOT] DISTINCT FROM
      Type the Literal!
      Row Constructors
      QUALIFY for More Concise Queries
      Remove Illogical Restrictions for IF EXISTS()
      Regular Expressions
      Dynamic Pivot
   DDL Improvements
      Server-Side CLR Compiles
      Synonyms for Databases and Linked Servers
      BEFORE triggers
      ALTER TABLE Improvements
   Revision History

Note on Old Connect Items

There appears to be a problem with very old Connect items, those with a number in the 12xxxx range. First I was not able to see them even when I was signed in to Connect, and this was reported by others as well. This problem has been resolved, but when you hover over the vote buttons you get Voting is no longer enabled for this item, and it is indeed not possible to vote for these items.

There are a couple of such old Connect items on the wishlist, and as far as possible I have tried to find alternate items to vote for. However, the old items are often the main entries for Microsoft with the highest number of votes, so it would be desirable if it were possible to vote for them. I have logged a bug with the Connect site itself to find out whether this is a mishap or intentional.

Robust System Development

SET STRICT_CHECKS ON – Find More Errors at Compile Time

Could you imagine programming in C# and not being told until run-time that you left out a mandatory parameter in a procedure call? Or that you accidentally assigned an integer variable a string value and your program crashes when the string is not a number? Or that the compiler does not care if you refer to a non-existent class? The very idea is of course ludicrous. Yet, this is exactly the reality that T‑SQL developers have to work in. This means increased cost of development and testing, because it takes longer to find silly errors. It also means a risk that a critical function in a production system errors out because of a typo that could have been caught within seconds from when it was made.

I see no reason why we T‑SQL programmers should be content with such a primitive environment, one with such a distinct flavour of, um, Fortran. For this reason, a couple of years ago I submitted Connect item 260762, entitled Add optional checks for more robust development, where I suggest a new command SET STRICT_CHECKS ON. When strict checks are in force, you would get compile-time errors for a lot of things that do not blow up until run-time today – or, even worse, produce incorrect results.

To further help Microsoft to get this right, I have a separate article, Ideas for SET STRICT_CHECKS ON, where I discuss what could be included in this command. I published this article originally in 2008, and with the release of the 2012 version of the wishlist, I have updated the strict-checks document with more and improved ideas and deleted some I no longer believe in.

See that article for the full list of ideas, with the reasoning behind each of them.

TRY-CATCH Should Always Work!

If you try this:

CREATE PROCEDURE bad_sp AS
BEGIN TRY 
   PRINT 'This prints'  
   SELECT * FROM doesnotexist
END TRY
BEGIN CATCH
   PRINT 'This does not print'
END CATCH

When you run the procedure you find that the output is:

This prints
Msg 208, Level 16, State 1, Line 3
Invalid object name 'doesnotexist'.

That is, the CATCH block is never entered. The root problem here is deferred name resolution, the demise of which I call for in my strict-checks document. SQL Server does not report the table as missing until it actually tries to execute the statement. But it is still a compilation error, and today a compilation error cannot be caught in the scope where the error occurs. (This is fully documented in Books Online.) You can catch the error in an outer scope, that is:

BEGIN TRY
   EXEC bad_sp
END TRY
BEGIN CATCH
   PRINT 'The error is: ' + error_message()
END CATCH

This will print:

This prints
The error is: Invalid object name 'doesnotexist'.

When TRY-CATCH first appeared I was so happy with the big improvement from SQL 2000 that I was prepared to accept this glitch. But two releases later? No way! This is a serious problem. Deferred name resolution itself is bad, because it defers checking of silly typos until run-time. But the fact that you cannot handle it appropriately adds insult to injury. Who says that you always have that surrounding stored procedure?

I see posts about this just about every week on the forums. It may be people who are trying TRY-CATCH for the first time, and what impression do they get of SQL Server when they find that it does not seem to work? That Oracle after all is not such a bad product?

In October 2009, Carlos Fatureto submitted item 496758, Try-catch should capture the parse errors. Microsoft has closed this item as Won't Fix! To keep this issue alive, I've submitted a new item 739957, TRY-CATCH should always work.

Note: even if STRICT_CHECKS were implemented and deferred name resolution were killed, TRY-CATCH would still need to be fixed, since a table may be dropped after a procedure is created.

Permit Recoverable Error Handling – Don't Doom Transactions Willy-nilly

In many situations when an error occurs in your code, all you want to do is to roll back everything and make sure that you exit as soon as possible. Maybe you want to log the error on the way out, but that's that.

But there are also situations where you for one reason or another want to continue the transaction, either by performing some alternate action, or by rolling back to a savepoint – which is quite different from rolling back the entire transaction. Say that you write a stored procedure that performs some well-defined piece of work on its own, and thus defines a transaction for this work. However, you expect your procedure to be used as part of a bigger plot, and if an error occurs in your code you don't want to pull the rug out from under the caller and roll back its transaction. Rather, you could define a savepoint in the beginning of your procedure, and if there is an error, you roll back to that savepoint.
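
Here is a sketch of that pattern. (The procedure name is made up, and as discussed below, the rollback to the savepoint only works if the error has not doomed the transaction. On SQL 2005/2008 you would use RAISERROR in place of ;THROW.)

CREATE PROCEDURE do_unit_of_work AS
DECLARE @trancount int = @@trancount
BEGIN TRY
   IF @trancount = 0
      BEGIN TRANSACTION
   ELSE
      SAVE TRANSACTION do_unit_of_work
   -- ... the actual work goes here ...
   IF @trancount = 0
      COMMIT TRANSACTION
END TRY
BEGIN CATCH
   IF xact_state() = 1 AND @trancount > 0
      ROLLBACK TRANSACTION do_unit_of_work   -- back to the savepoint only
   ELSE IF xact_state() <> 0
      ROLLBACK TRANSACTION                   -- the whole transaction
   ; THROW
END CATCH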

A specific situation is when you work with Service Broker. In this case you want a transaction that always commits so that the message is taken off the queue. If the normal processing fails, you want to roll back to the point just when you had taken the message off the queue to store the message in an error table.

So can't you do this today? Sure, the command SAVE TRANSACTION permits you to define savepoints in your code, but you may not be able to roll back to them for two reasons:

  1. The error is not caught at all, as I discussed in the previous section.
  2. The error may doom the transaction. That is, the transaction is not committable and must be rolled back entirely; you cannot roll back to a savepoint in this situation.

The fact that a transaction can be doomed is not very strange as such. If there is a deadlock and you are the deadlock victim, you should be a good server citizen and roll back. Likewise, there is little reason to continue if there is a resource error like a full disk. The problem is that there are many errors that doom the transaction although there is no good reason for it. For instance, why would

SELECT convert(date, '20001131')

doom the transaction when

SELECT convert(datetime, '20001131')

does not? Overall, it is almost impossible to predict whether a certain error dooms the transaction or not, because, as the example shows, there is such a big inconsistency. Well, there is one way to get more consistent behaviour: run with XACT_ABORT ON. With this setting almost all (but not all!) errors doom the transaction. Which really is a shame, because running with XACT_ABORT ON has to be considered best practice, since this setting ensures that if there is no TRY-CATCH, any error will abort execution and roll back any open transaction.

All this makes SAVE TRANSACTION and savepoints fairly useless concepts today. Why would you build on functionality that only works sometimes and that you cannot rely on?

This is covered in item 740673, Don't doom transactions unless there is a logical reason for it.

Savepoints and Distributed Transactions

Even if the previous wish is fulfilled, there is still a problem: you cannot use SAVE TRANSACTION in a distributed transaction. Now, this restriction is not entirely without logic. You may be running a distributed transaction against a data source that does not support savepoints. And even if you have two SQL Servers, if one instance issues SAVE TRANSACTION, that may have to be propagated to the other instance.

Then again, say that you have a stored procedure where you define a savepoint in the beginning of the procedure, and then roll back to that savepoint in case of an error. It does not seem inconceivable that this could be permitted even when the procedure is called inside a distributed transaction. However, I don't know how much work it would be for Microsoft to implement. And there are so many other problems with linked servers and distributed transactions that I could not really motivate submitting a Connect request for this.

Still... You may not anticipate that you will be in a distributed transaction, but someone may decide to use your procedure as a building block in a distributed transaction two years from now. So you may say "OK, if there is a distributed transaction, I skip the savepoint". Problem is – there is no way to tell that you are in a distributed transaction! For this reason I once submitted item 466739 – There should be a system function to tell whether the current transaction is a distributed transaction. Microsoft has closed this item as Won't Fix. I must say that this was really a slap in the face. Apparently they prefer us to write our code in a dark room.

Enums in T‑SQL

We probably all have a lot of constants in our code. Maybe we use numbers, in which case we have to know that 1 for order type means "Internet order", 2 means "telephone order" and so on. Or maybe we try to use mnemonic codes. Whatever, there is no safety-net if we mix up the numbers or spell the codes wrongly.
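
To illustrate (the table and the code are made up): a misspelt code is perfectly legal as far as the compiler is concerned, and the query silently returns nothing:

SELECT SUM(amount)
FROM   Orders
WHERE  order_type = 'TELEPONE'   -- misspelt 'TELEPHONE', but no error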

SQL Server MVP Simon Sabin has submitted a suggestion that T-SQL should permit us to define enums to get away from this, a very interesting idea that has attracted almost 200 votes as of this writing. Connect item 254293, Allow literals and read only table data to be represented as enums.

Contracts for Stored Procedures

I happened to find item 525653, Stored procedures should expose detailed contracts, submitted by SQL Server MVP Greg Low. He has a blog post that details his idea. In short, when you create a stored procedure, you should be able to expose which result sets it returns, if any. If the procedure returns something else, this results in an error.

Greg's suggestion is exactly along my thinking that the programming environment should supply methods to make our systems robust. Contracts are an essential part of object-oriented programming in clients and middle tiers. So why would good virtues stop just because we enter a database?

To be fair, Microsoft did do some development along these lines in SQL Server 2012 when they added the clause WITH RESULT SETS. But this is a clause of the EXEC statement. Thus, the onus is on the caller to state what result set it expects the callee to return, which is fairly clunky.
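
That is, today the contract is stated at the call site, as in this example (the procedure and column names are made up):

EXEC get_order_summary @custid = 123
WITH RESULT SETS
   ((orderid   int           NOT NULL,
     orderdate date          NOT NULL,
     amount    decimal(10,2) NOT NULL))

If get_order_summary returns anything with a different shape, the EXEC statement fails. Useful, but it is the callee that should declare what it returns, once, in its own definition.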

Give us Real Domains!

If you go to http://www.postgresql.org/docs/9.1/static/sql-createdomain.html you will find this syntax:

CREATE DOMAIN name [AS] data_type   
   [ DEFAULT expression ]      
   [ constraint [ ... ] ]    

where constraint is:    

[ CONSTRAINT constraint_name ]  
{ NOT NULL | NULL | CHECK (expression) }

This syntax is not proprietary to PostgreSQL, but is drawn from ANSI SQL-92. With this syntax you can for instance say:

CREATE DOMAIN update_user AS nvarchar(128)
DEFAULT SYSTEM_USER
NOT NULL
CHECK (VALUE = SYSTEM_USER)

And then you can use that domain in a table definition:

CREATE TABLE sometable (col1        int   NOT NULL, 
... whodidit update_user,
...)

You can achieve this in SQL Server, using a different syntax:

CREATE TYPE update_user FROM nvarchar(128) NOT NULL
go
CREATE DEFAULT update_user_def AS SYSTEM_USER
go 
EXEC sp_bindefault 'update_user_def', 'update_user'
go
CREATE RULE update_user_rule AS @x = SYSTEM_USER
go 
EXEC sp_bindrule 'update_user_rule', 'update_user'
go

However, all commands but CREATE TYPE are deprecated. And not entirely without reason: you can use sp_bindxxx to bind a rule or a default directly to a table column, for which there is little reason since we have constraints. But for types there is nothing better.

No matter whether you call them types or domains, they can help you in several ways to implement more robust database systems. With a domain you define a set of rules that apply to all columns and variables of that domain. That is, length for string/binary columns, scale and precision for decimal data types, as well as defaults and constraints. For string columns, the collation could also be such a property. Furthermore, a good database system would not permit you to mix domains. If we have one domain for order IDs and one for product IDs, a condition like:

FROM OrderDetails OD JOIN Products P ON P.ProductID = OD.OrderID

would result in an error.

SQL Server supports some of these things I've mentioned but far from all. This is legal:

DECLARE @myvar my_domain

But no rules or defaults are applied for the variable. This is illegal:

SELECT cast(somecol AS my_domain)

For cast and convert you need to use the physical type definition. Furthermore, SQL Server imposes no restrictions on mixing user-defined types. As the alternate name, alias types, suggests, they are just aliases.

Clearly, there is room for improvement. At a minimum, Microsoft should undeprecate the use of sp_bindxxx with user-defined types and then implement support for types with constraints or defaults where this is missing. (E.g., you cannot use types with rules in table variables; a table cannot be the target for MERGE if the table has a column of a type with a rule.) Better is to use the standard-compliant syntax shown above, and even better is to go all the way and implement everything a domain could be.

There are two Connect items in this area. A very old one is item 124645, Add CREATE DOMAIN command from Andrew Mackie. This one is currently closed for voting, and I have submitted a new one Implement real domains, number 739783.

Note: I don't know whether PostgreSQL (or any other product for that matter) implements the domain concept as far-reaching as I suggest in this section.

Immutable Column Values

Think of it: how many columns are there in your database that, once they have been written, must never change again? Or, when the row has passed some status change, it is impermissible to change the data. For instance, in an Orders table you cannot permit the customer id to change, at least not if the order has been shipped and invoiced. How many of you enforce these rules consistently? Let me see a show of hands! Hmm, that was not many. I will have to admit that I don't do it very often myself, only on days when I'm really paranoid, because it requires quite a lot of trigger code.
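
Here is a sketch of what it takes today, for one single column in one single table (the table and column names are made up):

CREATE TRIGGER Orders_immutable ON Orders AFTER UPDATE AS
IF UPDATE(CustomerID)
BEGIN
   IF EXISTS (SELECT *
              FROM   inserted i
              JOIN   deleted d ON d.OrderID = i.OrderID
              WHERE  d.CustomerID <> i.CustomerID
                AND  d.Status IN ('Shipped', 'Invoiced'))
   BEGIN
      ROLLBACK TRANSACTION
      RAISERROR('CustomerID may not change once the order has been shipped.', 16, 1)
   END
END

Multiply that by the number of columns and tables, and the paranoia starts to look expensive.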

But if there was DRI to do it... I've submitted Connect item 346200, Add DRI to enforce immutable column values. The alias BinaryCoder has a similar request which also calls for DDL to enforce columns to have a certain value after an UPDATE, even if the UPDATE does not touch this column, which would be very good for auditing columns. Connect item 355461, Trigger Performance for Simple/Common Scenarios. (Despite the name it suggests new DRI.)

Unfortunately, Microsoft has closed my item as Won't Fix, while the other remains active. The arguments from Microsoft are not particularly persuasive. One is "this can be done in triggers". Sure, there was a time when we did foreign keys and CHECK constraints in triggers – I'm certainly not longing for those days to come back! Another argument is that this is not in the standards. But it could become part of the standards! A lot of what comes into the SQL standards is due to development that happens among the vendors on the RDBMS market. I feel that this is a feature that could make a difference – and why couldn't Microsoft drive the standards for a change?

Add a Virtual Errors Table

You are taught to use set-based statements, because they perform a lot better than a cursor. There is just one problem: if there is an error with a single row, the entire statement is rolled back. Sometimes this is what you want, but far from always. In many cases you want to take the rows that pass all constraints and triggers, and divert the other rows somewhere else.

Dan Holmes came up with this great suggestion: SQL Server should have a virtual table called errors to go together with inserted and deleted. That is, rows that cause errors – conversion errors, string or binary data would be truncated, constraint violations – would be diverted to this table, and provided that they are successfully inserted into that table, the statement as such would pass.

It shall not be denied that there is some work to get this feature right. And while not covered in Dan's suggestion, I would love to have access to this table inside a trigger – where I would insert bad rows. The prime area for this feature would be in ETL scenarios, but I can see a lot of other situations where this feature would be super-useful.
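
To give an idea of how it could look – the exact syntax here is my speculation, and the table names are made up:

INSERT OrderDetails (orderid, productid, qty)
SELECT orderid, productid, qty
FROM   staging
OUTPUT errors.* INTO BadOrderRows    -- hypothetical syntax

Rows from staging that would fail a conversion or violate a constraint would land in BadOrderRows, and the remaining rows would be inserted as normal.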

Dan's Connect item has number 774754 and is entitled New virtual table: errors. It would be analogous to the deleted and inserted tables. Microsoft attempted to close it on one occasion. The result was that the number of votes quadrupled in a matter of days...

SET TRANSACTION ISOLATION LEVEL READ COMMITTED LOCK

This is a request that looks at robustness from a perspective which is very important in database systems, to wit transaction consistency.

SQL 2005 introduced two ways to use a snapshot where readers and writers do not block each other. Setting your database into READ_COMMITTED_SNAPSHOT is a very quick operation for increasing the concurrency of your application. (And it also saves you from having to use the highly unsafe NOLOCK hint.) However, there is a potential problem: code that reads data to check integrity may read stale data. SQL Server MVP Hugo Kornelis discusses this in a series of blog posts, Snapshot isolation: A threat for integrity? Alex Kuznetsov also discusses this on his blog and in his book Defensive Database Programming.

A quick example: assume two processes in the system, one that enters orders and one that deregisters products. There is a trigger on the OrderDetails table to check that ordered products are still active, and there is a trigger on Products that verifies that there are no open orders for deregistered products. At the same time as an order is being entered, a product on that order is deregistered. When the two triggers run, both read from the snapshot and both incorrectly think everything is fine. The end result is a violation of a business rule.

There is a workaround today: you can add the hint READCOMMITTEDLOCK to force reading directly from the data page and not the snapshot. But a hint is not really appropriate, because a hint is for a specific table. When you are in a situation where you don't want to read stale data, it most likely applies across the board, for instance in a trigger. Thus, you need to use the hint for all tables accessed in your stored procedure or trigger, which litters the code. And there is a risk that you miss the hint for some table.
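
For instance, the order-entry trigger in the example above would have to read Products with the hint to be safe (the table and column names are made up):

CREATE TRIGGER OrderDetails_itrig ON OrderDetails AFTER INSERT AS
IF EXISTS (SELECT *
           FROM   inserted i
           JOIN   Products p WITH (READCOMMITTEDLOCK)
             ON   p.ProductID = i.ProductID
           WHERE  p.IsActive = 0)
BEGIN
   ROLLBACK TRANSACTION
   RAISERROR('The order includes one or more deregistered products.', 16, 1)
END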

For this reason I submitted Connect item 125573 where I call for a SET TRANSACTION ISOLATION LEVEL command to force READ COMMITTED to work against the data pages and not the snapshot. (This is one of the old Connect items for which voting is no longer enabled.) Not the least is this important in triggers – triggers should not use snapshots at all.

Make YYYY-MM-DD Safe

Consider this:

set language German
go
select convert(datetime, '1989-06-12'),
       convert(datetime2, '1989-06-12'),
       convert(date, '1989-06-12')

The output is:

1989-12-06 00:00:00.000   1989-06-12 00:00:00.0000000   1989-06-12

SQL Server has always interpreted XXXX-YY-ZZ as Year-Day-Month when the date format setting is DMY (which German implies), a very unnatural interpretation. Microsoft fixed this for the new date/time data types in SQL 2008, but did not have the guts to change it for existing types, although SQL Server MVP Steve Kass suggested this in Connect item 290971, Deprecate the date literal interpretation 'YYYY-DD-MM' except in compatibility modes <= 90.
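
Until that happens, the unseparated format is the one that is safe for all date/time data types, no matter the language and date format settings:

select convert(datetime, '19890612')   -- always 1989-06-12, even in German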

Other General Programming Improvements

Inline Scalar UDFs

If you are a programmer with a background in traditional languages, you probably like to modularise your code, but you can easily overdo it in T-SQL. A common mistake is to put a query in a scalar UDF. No matter how simple your query is, you now have a cursor in disguise with disastrous consequences for performance.

For table-valued functions there are two varieties, inline and multi-statement functions. But there is no such thing as an inline scalar function today. This is a big shortcoming. Not only does this mean that the simplest scalar function incurs a call overhead; since the optimizer is blind to what's inside, it cannot do a proper job either. With inline functions, the optimizer would consider the expanded query and would therefore be able to do a better job.

It is conceivable that Microsoft would add new syntax to permit us to define inline functions. And that is probably a good idea, since that makes it possible for us to state our intent. However, Microsoft should also consider analysing functions using the current syntax, and if the body is no more complex than this:

DECLARE @ret sometype

SELECT @ret = something
FROM ...

RETURN @ret

the function should be stamped as inline. Given the number of such functions out there, this could magically make a lot of slow applications suddenly run much faster!

Connect items in this area are 273443, The Scalar Expression function would speed performance while keeping the benefits of functions, submitted by SQL Server MVP Andrew Novick, 524983, User defined function performance is unacceptable from SQL Server MVP Simon Sabin and 544121, Inline Scalar UDF as possible, especially that has no data access! from Jethro.

Note: there exists a workaround today. Make your function an inline table function and invoke it through CROSS APPLY. But this is fairly clunky. And it's certainly nothing that is adopted by the broad masses – who may not understand the problems with using scalar functions in the first place.
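
To spell the workaround out (the tables and columns are made up):

CREATE FUNCTION dbo.cust_latest_order (@custid int) RETURNS TABLE AS
RETURN (SELECT MAX(orderdate) AS latest_order
        FROM   Orders
        WHERE  custid = @custid)
go
SELECT c.custname, lo.latest_order
FROM   Customers c
CROSS  APPLY dbo.cust_latest_order(c.custid) lo

Since the function is inline, the optimizer expands it into the query, and no per-row function call takes place.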

Make Table Parameters Read-Write

Table-valued parameters were a nice addition in SQL 2008, but since they are read-only they are not half as exciting as they could have been. With read-write table parameters, it would be possible to pass sets of data between stored procedures in a structured way. To do this today, you can employ various tricks that I discuss in my article How to share data between stored procedures.
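
This is what the feature looks like today (the type and procedure are made up); note the mandatory READONLY keyword:

CREATE TYPE order_list AS TABLE (orderid int NOT NULL PRIMARY KEY)
go
CREATE PROCEDURE process_orders @orders order_list READONLY AS
   SELECT o.orderid, o.orderdate
   FROM   Orders o
   JOIN   @orders ol ON ol.orderid = o.orderid
   -- Any INSERT/UPDATE/DELETE against @orders is an error, so the
   -- procedure cannot pass data back to the caller this way.
go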

During the beta of SQL 2008, I submitted a Connect item where I argued that Microsoft should lift this restriction for calls between stored procedures, but retain it for calls from client APIs. My thinking was that the problems that had caused Microsoft to introduce the restriction lay in the APIs and the communication with them. Unfortunately, the request has not yet been honoured. But Connect item 299296, Relax restriction that table parameters must be readonly when SPs call each other, is still active.

I argue further for my position in the article Why Read-only Table Parameters is Not Enough.

Get the Call Stack!

You can get the currently executing module through the @@procid function, and you can pass it as a parameter to a general logging function. But there are many cases where you would like to know the entire call stack. For instance, say that you have a generic error-handling procedure that you call from your CATCH handlers. Today you can log error_procedure() and also @@procid if you take it as a parameter. But if the procedure that failed is called from several places, you have very little to tell you how you ended up there. Another situation is in a trigger where you would like to bypass a check if the trigger was invoked by a certain procedure that is permitted to break the rules.
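
This is about as far as you can get today (the log table is made up):

CREATE PROCEDURE log_error @procid int AS
   INSERT errorlog (failing_proc, logging_proc, errmsg)
   VALUES (error_procedure(), object_name(@procid), error_message())
go
-- In a CATCH handler somewhere:
EXEC log_error @@procid

You learn which procedure called the logging routine, but not how you got there.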

When I originally submitted Connect item 124537, I called it Provide function to retrieve the entire call stack. Today, I think that a DMV might be better. Whatever, the information must be accessible for a plain user, else you cannot put it in application code. (This is one of those old items that it is currently not possible to vote for, and unfortunately I have not seen any similar Connect item you could vote on instead.)

Autonomous Transactions

Today if you nest BEGIN and COMMIT TRANSACTION, inner COMMITs just decrease the transaction count, and only when you come to the outermost COMMIT does the actual commit happen. And most of the time, this is what you want. But there are situations where you want to do things outside the current transaction. A typical example is when you log something to an error table: you want that error message to persist, even if the transaction is rolled back. Another example is given in Connect item 296870, Add support for autonomous transactions, submitted by Anthony Bloesch.
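
That is, today the log row goes the same way as everything else (the log table is made up):

BEGIN TRANSACTION
-- ... some work that turns out to fail ...
INSERT errorlog (logdate, msg)
VALUES (getdate(), 'Validation failed, order rejected')
ROLLBACK TRANSACTION   -- oops, the log row is gone as well

With an autonomous transaction, the INSERT could commit on its own, regardless of what happens to the outer transaction.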

Module-Level Table Expressions

Some time back I found myself writing a stored procedure where I prepared data coming from an incoming source, which included a lot of validations and transformations. I had a bunch of UPDATE statements that went:

UPDATE mytable
SET    some_col = ...
FROM   mytable m
JOIN   ...
WHERE  m.fileid = @fileid
  AND  m.status = 'New'

And it made me a bit nervous. What if I accidentally left out the conditions on fileid or status and updated rows I did not want to touch? It occurred to me that it would have been really nice if I had been able to say:

DECLARE @mynewrows AS
   (SELECT *
    FROM   mytable
    WHERE  fileid = @fileid
      AND  status = 'New')

Then my UPDATE could be written as:

UPDATE @mynewrows
SET    some_col = ...
FROM   @mynewrows
JOIN   ...

Essentially, a Common Table Expression, but on module level. Note that a view would not cut it, due to the reference to the local variable. An inline table function would work, but I would still have to repeat that @fileid again and again. And I would have to litter the system catalogue with something I need in one single stored procedure only.

That inspired me to Connect item 343067, Module-level table expressions.

Improvements for Bulk Load

Bulk load – i.e. BCP, BULK INSERT and OPENROWSET(BULK) – is an area that could benefit from some enhancements.

The first and the last of these items have been closed as Won't Fix, but it's still possible to vote for them. And it is still possible for Microsoft to change their minds. The second item is closed as External (I don't know what that means); this is one of those old items that are currently not available for voting.

Deleting Lots of Data Efficiently

A big pain-point in SQL Server is when you want to delete lots of data. There are two major scenarios. One is when you age out old data (and thus keep some data in the table). Another is when you empty a table entirely; the latter typically happens in ETL operations. For the first scenario there is partition switching, and for the second there is TRUNCATE TABLE. In both cases, any foreign keys you have get in the way.

There are a number of Connect items around this problem suggesting different solutions. It's not that I wish Microsoft to implement all of them, but I list them all here, so that you can cast your vote for the approach you think is the best.

Microsoft's own preference seems to be for item 509341, submitted by SQL Server MVP John Paul Cook, Provide a mechanism for minimally logged deletes that won't bloat the transaction log. This proposal has at this writing attracted 66 up votes, but also 13 down votes – a fairly high number – including mine. There are two things that make me uneasy with this proposal. One is that minimally logged deletes sound like something that would be difficult to implement. The other is that the proposal appears to call for non-recoverable operations. It is very clear to me that Microsoft will never permit non-recoverable operations, because if a non-recoverable operation fails, you have a corrupt database.

My own suggestion in this area is Connect item 312074, Permit TRUNCATE TABLE when referencing tables are empty. This suggestion does not address all scenarios, but it sounds like something that would be fairly simple to implement. However, Microsoft has closed this item with Won't Fix in favour of the proposal from John Cook. I can't say that I agree with the reasoning in their comment. But then again, they know more about the internals than I do. In passing, I note that as of this writing my proposal has garnered 86 up votes and no votes against.

Another idea that floats up from time to time is that it should be possible to specify a batch size for DELETE (and UPDATE), like you can do for BULK INSERT. I found one Connect item along this line, COMMIT INTERVAL for Delete/Update by Jorj51. The response from Microsoft makes it clear that they are against the very idea.
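
The manual workaround is of course to run the DELETE in a loop of reasonably sized batches, along these lines (the table and the retention rule are made up):

DECLARE @cutoff datetime = dateadd(year, -2, getdate())
WHILE 1 = 1
BEGIN
   DELETE TOP (100000) FROM OldOrders WHERE orderdate < @cutoff
   IF @@rowcount = 0 BREAK
END

This keeps each transaction, and thereby the log, at a manageable size, but it is boilerplate that everyone has to reinvent.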

Finally there is an item, Truncate partition of partitioned table from Wim SQL Server, which calls for being able to do TRUNCATE TABLE on a single partition. This suggestion does not really address the problems I discussed above, as Wim seems to be content with restoring foreign keys. Nevertheless the suggestion has merit, since today if you want to truncate a partition, you need to create an empty table to use with ALTER TABLE SWITCH.

Get Position within an XML Document

When you work with an XML document, you often have reason to refer to the position in the document. For instance, consider an incoming document with Order and OrderDetails data in a business-to-business system, where you need to be able to correlate the orders and the details. However, there is not really any good key in the document, at least not one that you feel you can trust. That is, it is perfectly possible that there are two Order elements that are identical on that level. (The details may differ.) If you look at the document, it's still a piece of cake: that's the third order and that's the fourth order and so on. But if you are shredding the document with .nodes(), there is no reliable way to find out which node is the first, the second and so on.

There is a very good Connect item 383888, Fully support position() in xquery, submitted by the signature "phe", which discusses this. The Connect item also suggests some workarounds, but as the author points out, they are inefficient with large XML documents. My thanks to SQL Server MVP Alejandro Mesa who pointed me to this item.
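
One such workaround is to count the preceding siblings with the << node-order operator, along these lines (the element names are made up, and as noted, this does not scale well with document size):

DECLARE @x xml = '<Orders><Order>A</Order><Order>B</Order></Orders>'
SELECT pos = T.c.value('for $s in . return count(../Order[. << $s]) + 1', 'int'),
       ord = T.c.value('.', 'nvarchar(20)')
FROM   @x.nodes('/Orders/Order') AS T(c)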

Really Small Things

Fix the awful error message String or binary data would be truncated so that it reports which value would be truncated. There are at least three Connect items that directly aim at this error message: 125347 (SQL Server MVP Louis Davidson), 339410 (DWalker) and 533212 (GTrz). There is also a general item, Cleanup Error Messages, submitted by Isaac Kunen, a Program Manager in the SQL Server team, which covers this error message and many more.

124627 – Schema not reported in the ERROR_PROCEDURE function. As long as you have all your objects in dbo, this is no biggie, but the day you start using multiple schemas... Submitted by Louis Davidson. This issue is also included in Isaac Kunen's general item above.

320998 – Add datediff_big. A datediff that returns bigint is good when you have sysadmin tasks that show in milliseconds how long processes have been connected, and the result is > 20 days, which is about where an int runs out.

293333 – Add sysdate(). A sysdate() function would make things easier when you want to query a date column for things that happened today.
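
Today you have to write something like this instead (the table is made up):

SELECT ...
FROM   orderlog
WHERE  logdate >= cast(getdate() AS date)
  AND  logdate <  dateadd(day, 1, cast(getdate() AS date))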

Improvements for Querying

A Built-in Table of Numbers

There are quite a few SQL problems where you need a suite of numbers to generate a dimension. For instance: show me sales per day, where days for which there were no sales at all should appear with a 0. To generate the days missing from the Sales table, you need a calendar table – or a table with numbers from 1 to as many days there are in the period. If you have read my article Arrays and Lists in SQL Server, you have seen several methods to unpack a list into a table that employ a table of numbers. You can easily generate such a table, but if you generate too few numbers, your queries will produce an incorrect result when you run out of numbers. Another issue is that the optimizer thinks that this is just another table; it does not understand its special properties, and therefore does not do a very good job with it.
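
For reference, this is the kind of table you have to roll on your own today; this gives the numbers 1 to 10000 (the table name is made up):

WITH digits(d) AS (
   SELECT d
   FROM   (VALUES (0),(1),(2),(3),(4),(5),(6),(7),(8),(9)) AS D(d)
)
SELECT 1000 * i.d + 100 * ii.d + 10 * iii.d + iv.d + 1 AS n
INTO   numbers
FROM   digits i
CROSS  JOIN digits ii
CROSS  JOIN digits iii
CROSS  JOIN digits iv

It works, but it is boilerplate, and the optimizer sees nothing special about the result.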

On the other hand, if there was such a built-in table (which of course would not be a physical table, but a logical concept), you would never run out of numbers, and the optimizer could make shortcuts for better performance. For this reason I have submitted Connect item 258733, Add a built-in table of numbers.

IS [NOT] DISTINCT FROM

Say that you want to compare the same tables in two databases with the same schema and find all rows for which there are differences. You need something like this:

SELECT ...
FROM   db1.tbl a
JOIN   db2.tbl b ON a.keycol = b.keycol
WHERE  (a.col1 <> b.col1 OR
        a.col1 IS NULL AND b.col1 IS NOT NULL OR
        a.col1 IS NOT NULL AND b.col1 IS NULL)
   OR  ... -- The same expression for all other columns.

You need to write it this tediously, because the plain <> condition will not return rows where only one of the columns is NULL; NULL is never equal or unequal to anything, so NULL <> NULL yields UNKNOWN.

SQL:1999 defines the operators IS DISTINCT FROM and IS NOT DISTINCT FROM, which are the same as <> and = respectively, except that they also apply to NULL values. With these operators, the above becomes:

SELECT ...
FROM   db1.tbl a
JOIN   db2.tbl b ON a.keycol = b.keycol
WHERE  a.col1 IS DISTINCT FROM b.col1
   OR  a.col2 IS DISTINCT FROM b.col2
   OR  ... 

This is captured in Connect item 286422, Add language and optimizer support for ISO <distinct predicate>, submitted by SQL Server MVP Steve Kass.

Note that there are some technical problems on Connect with this item. It is said to be closed as a duplicate, but the item listed as the duplicate is itself closed as a duplicate. There is however a comment from Microsoft that this item is still very much on their radar. This comment does not appear in the regular orange colour for Microsoft comments; due to the technical problems, Umachandar had to use the regular Connect interface and add the comment as a regular user.

Type the Literal!

Today there are three base types of literals in T-SQL: numbers, strings and binary strings (those starting with 0x). This means that for data types like date, uniqueidentifier etc, there is no way you can achieve a literal of that type. Add to this that it's more or less a science trying to figure out what is the data type of a numeric literal. You can work around this by using cast() or convert(), but it is quite bulky. SQL Server 2012 added a couple of new functions which are specifically geared for the date/time data types, for instance datefromparts(). And while they are welcome additions, you cannot say that datetimeoffsetfromparts() with its nine arguments is short and slender.

ANSI SQL suggests a path where you can prefix the literal with the type, for instance DATE '2008-09-21', a bit in analogy with how we write Unicode literals (N'...') today. SQL Server MVP Steve Kass has suggested this in Connect item 356470, Support syntax for typed literal values. While a bit of syntactic sugar, it could help to make our code more concise.

Row Constructors

If you want your UPDATE statements to be ANSI-compliant and update several columns from the same source you need to write:

UPDATE tbl
SET    col1 = (SELECT somecol FROM ...),
       col2 = (SELECT someothercol FROM ...),
       ...

Not only is this tedious; since SQL Server is not able to recognize that the subqueries are the same, performance can suffer as well. In true ANSI SQL, this is not an issue, because you can use row constructors:

UPDATE tbl
SET    (col1, col2, ...) = (SELECT somecol, someothercol, ... FROM ...)

The regular workaround in SQL Server is to use the proprietary syntax with the FROM clause, and personally I prefer that syntax in most situations, but sometimes it's more natural to use a subquery.

Another situation where row constructors can be handy is for queries like:

SELECT ... FROM tbl WHERE (col1, col2, col3) IN (SELECT col1, col2, col3 FROM somewhereelse)

In this case, too, there is a good alternative, the EXISTS clause, and it's not even proprietary. Nevertheless, there are situations where you may be more comfortable with the IN syntax.

The most compelling case for row constructors is the SET statement:

SET (@a, @b, @c) = (SELECT a, b, c FROM ...)

The alternative today is to use the SELECT statement, but the behaviour when the SELECT hits anything but exactly one row is not always intuitive. With the addition of row constructors, SET would become the prime alternative for assignment.

There are several Connect items in this area. The main item with the most votes is Add support for ANSI standard row value constructors (299231), submitted by SQL Server MVP Hugo Kornelis. Other items are Support row constructors for conditional expressions (127024), submitted by Ahsukal, 262284 ("myudkin"), 127028 (Chris Hedgate) and 288772 (mimrsh).

Note: a potential problem with row constructors with many columns is that it can be difficult to keep the lists in the two constructors in sync. This problem exists today already with INSERT-SELECT. In my document on strict checks, I discuss ideas how SQL Server could help us from goofs here.

QUALIFY for More Concise Queries

It is very common to use row_number() (or rank() or dense_rank()) to number sets of rows and then filter on the numbering to get the most recent row(s). To do this, you always need to use a CTE or a derived table, and put the row-number filter in the outer query. The reason for this is that an SQL query is evaluated in this order:

FROM-JOIN-ON
WHERE
GROUP BY
HAVING
SELECT
ORDER BY

Since the ranking is defined in the SELECT list, you cannot refer to it in WHERE, only in SELECT and ORDER BY.

While this is not very difficult to write, it still makes the code more tedious. Teradata implements a fourth filtering clause beside ON, WHERE and HAVING, which is evaluated after SELECT. They call it QUALIFY. While QUALIFY does not bring any new query powers, it opens the possibility to make queries more concise, since the need to use nested queries is reduced. Therefore it would be a very welcome addition to SQL Server.
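
For instance, getting the most recent order per customer today (the table is made up):

WITH numbered AS (
   SELECT custid, orderid, orderdate,
          row_number() OVER (PARTITION BY custid ORDER BY orderdate DESC) AS rowno
   FROM   Orders
)
SELECT custid, orderid, orderdate
FROM   numbered
WHERE  rowno = 1

With QUALIFY (Teradata syntax, not valid in SQL Server today), the CTE goes away:

SELECT custid, orderid, orderdate
FROM   Orders
QUALIFY row_number() OVER (PARTITION BY custid ORDER BY orderdate DESC) = 1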

SQL Server MVP Itzik Ben-Gan has submitted Connect item 532474, QUALIFY – Request for a New Filtering Clause. His Connect item also includes a more detailed discussion on QUALIFY as well as examples.

Remove Illogical Restrictions for IF EXISTS()

When you have a query in IF/WHILE EXISTS(), you cannot use CTEs or query hints, restrictions that appear illogical. That is, you cannot say:

IF EXISTS (WITH tab AS (SELECT 1 AS col)
           SELECT * FROM tab)
   PRINT 'okiedokie'
ELSE
   PRINT 'not ok';

or

IF EXISTS (SELECT * FROM tbl
           OPTION (MAXDOP 1))
    PRINT 'yes';
ELSE
    PRINT 'no';

The workaround is certainly simple: run a SELECT that assigns the result to a variable, and then test the variable. But nevertheless, the inconsistency sticks out like a sore thumb. And assume that you are in a stressful situation where you need to address a performance problem with an IF EXISTS query. You realise that the MAXDOP hint is what you need. You would certainly be far less nervous if you could just insert the query hint, rather than having to actually edit the code to move the query, add a variable and all that.

SQL Server MVPs Aaron Bertrand and Hugo Kornelis have submitted Connect items 270480, Query hints not allowed after or inside IF EXISTS(), and 308744, Allow Common Table Expressions in all query expressions.

Regular Expressions

If you have any experience of regular expressions from the Unix world, Perl or the RegEx classes in .Net, you probably don't find LIKE in SQL very powerful. On top of that, it uses wildcard characters that are a bit, well, unusual. SQL:1999 has a more powerful operator SIMILAR TO, but it's still not on par with the real McCoy, and it still uses the odd wildcard characters. Microsoft could implement SIMILAR TO while they are at it, but of prime importance is that we get something that draws on the Unix traditions to bring the full power of regular expressions to SQL Server users.

I'm not including an example of how this would look, because if you already know regular expressions you don't need any examples to be convinced. And if you have never seen one, you would only be frightened. :-) Admittedly, regular expressions have a bit of a learning curve, and they are not always pretty. But they are extremely powerful.

SQL Server MVP Simon Sabin has submitted Connect item 261342, Regex functionality in pattern matching. Alternate items are 261342, Add built-in support for regular expressions in T‑SQL (Steven Twitchell, Microsoft), 361357 Support POSIX-style regular expressions in T‑SQL (yours truly) and RegEx Column Property for Validation (Eidolon).

Dynamic Pivot

This is something I only do very rarely myself, but there are questions about dynamic pivot about every day on the forums, so I don't need to be convinced that there is a business case. What it is all about is that you want to make rows into columns for a report. This can easily be done with this pattern:

SELECT group_item,
       MAX(CASE WHEN month = 1 THEN qty END) AS JanQty,
       MAX(CASE WHEN month = 2 THEN qty END) AS FebQty,
       ...
       MAX(CASE WHEN month = 12 THEN qty END) AS DecQty
FROM   tbl
GROUP  BY group_item

You can also use the PIVOT operator, but I find it more difficult to use without any actual benefit. Whichever method you use there is an important restriction: you need to know in advance which columns – their names and how many there are of them – your query will produce. But in many cases, this is information that is not known until run-time. If there are sales numbers for five products, you want one column for each product. If there are 14 products, you want 14 columns.

One thing is important to observe: this can never be a regular SELECT, because a SELECT statement returns a table, and a table has a known set of columns; this is what makes it possible to reuse a SELECT as the inner part of a bigger query. Thus, SQL Server would need to use a different keyword, for instance TRANSFORM (which is the command for dynamic pivot in Access), and this command would only be able to return data to the client.
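
For reference, in Access the query above could be condensed to something along these lines (Access syntax, roughly; not valid T-SQL):

TRANSFORM Sum(qty)
SELECT group_item
FROM   tbl
GROUP  BY group_item
PIVOT  month

Access figures out the columns at run-time, which is exactly why the result cannot feed into an outer query.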

There is no end of Connect items: someone using the alias Pich submitted Connect item 127071, PIVOT, long ago, and this is the main entry which has attracted the most votes. But since this item is not open for voting, you can instead vote for any of these items: 236237 (Mark Yudin), 296108 (moody31415), 420980 (Wim SQL Server), 635642 (dkangel21).

DDL Improvements

Server-Side CLR Compiles

The ability to create stored procedures and functions in .Net languages was one of the most spectacular additions to SQL 2005. And used correctly, this is also very powerful. One problem, though, is that it's fairly bulky to create a CLR module. You first have to compile your C#/VB code, load it into SQL Server with CREATE ASSEMBLY, and finally run CREATE PROCEDURE/FUNCTION/etc to create the object. Visual Studio can perform these steps for you, and if you create a larger assembly from which you define a whole number of procedures and functions, that may be alright. But if all you want to do is to write a 50-line utility function, using Visual Studio is even bulkier than using the command line. Not the least if you are a DBA sort of person who doesn't have Visual Studio installed at all.

It would be so much more convenient if you could say:

CREATE PROCEDURE my_proc @par1 int,
                         @par2 int WITH
EXTERNAL LANGUAGE C#, PERMISSION_SET = EXTERNAL_ACCESS AS
using System;
using System.Data;
using System.Data.SqlClient;
...

SQL Server MVP Greg Low has submitted the very compelling Connect item 265266, Add server-side compilation ability to SQL CLR about this. There is also item 318327 from BinaryCoder along the same lines with different syntax.

Synonyms for Databases and Linked Servers

Synonyms were added in SQL 2005, and they can be useful. Maybe the most common use for synonyms is when your application is split up over several databases: to get your cross-database queries running, you can set up a synonym for every table, and a setup script to go with that. But it's a bit roundabout; it would be simpler if all you had to do was to set up a synonym for the other database, and then you would be done. SQL Server MVP Aaron Bertrand has submitted Connect item 311079, Expand synonym to other entities (database, linked server).
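
That is, instead of one synonym per table (the names are made up):

CREATE SYNONYM dbo.Customers FOR otherdb.dbo.Customers
CREATE SYNONYM dbo.Orders    FOR otherdb.dbo.Orders
-- ... and so on, for every table ...

one single synonym would do (hypothetical syntax):

CREATE SYNONYM otherdb FOR someserver.otherdb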

BEFORE triggers

SQL Server has AFTER and INSTEAD OF triggers. Other products have BEFORE triggers, which are a more natural thing. INSTEAD OF triggers, mainly introduced to make more views updatable, are a little funky as they require you to redo the command. This can introduce a maintenance problem and sometimes other difficulties as well. With BEFORE triggers there are no such problems.
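
To show what "redo the command" means in practice (the table is made up): the columns must be repeated inside the trigger, so every time the table changes, the trigger must change too:

CREATE TRIGGER Orders_ioins ON Orders INSTEAD OF INSERT AS
BEGIN
   -- ... validation of the rows in inserted goes here ...
   INSERT Orders (OrderID, CustomerID, OrderDate)   -- the redone command
   SELECT OrderID, CustomerID, OrderDate
   FROM   inserted
END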

The advantage with BEFORE triggers is that you can perform validation before the command is executed, so a big bad statement does not need a big bad rollback, because it's stopped before it takes off. This is even more important for DDL triggers, not the least if you are using Policy-Based Management (PBM) and are lured to use the alternative "On Change – Prevent". Today PBM lies through its teeth: it prevents nothing. If you say no index creation during office hours, PBM has no chance to stop this – it will eventually roll back the command, which just makes matters worse.

There are four Connect items here. Item 243986, Add "instead of" DDL triggers, is from SQL Server MVP Adam Machanic and is closed as Postponed. Chuck02323 has submitted item 285655, triggers before insert and row level. This item also calls for per-row triggers like in Oracle, a feature I'm cool to. The item with the most votes this far is 299328, INSTEAD Of Triggers and DMF, from SQL Server MVP Greg Low. (DMF is an older name for PBM.) Both of these items are still active. Since none of these items in my opinion said the right thing, I submitted my own item, 361223, Add BEFORE triggers – not the least for DDL statements, which Microsoft has closed as Won't Fix.

ALTER TABLE Improvements

My favourite in this area is this oldie, item 124781, ALTER TABLE syntax for changing column order, submitted by someone using the alias PNPwi. It's a bit controversial: every time someone asks in the forums how to do this, they are told that they shouldn't and that column order doesn't matter. It's true that today this is a complicated operation, since you have to create a new table, copy data over, drop the old table and remember to recreate indexes, constraints and triggers. And move referencing foreign keys. However, I don't give much for the argument that column order does not matter. True, from a technical standpoint it does not matter, but it matters for humans. When I document a table, I want to describe the columns in the order they appear in the table, so you easily can map to what you see with SELECT *. But I also want to document the columns in a logical order, and not in the order they happened to be added to the table. This is one of those old items that are not open for voting currently. I have submitted a new item, 739788, which you can vote for.

What if you realise that you have a table with the IDENTITY property, and that was a mistake? Or the other way round: you should have made that column an IDENTITY from the beginning? Strangely enough, you can use ALTER TABLE to change this in SQL Server Compact Edition, but in the "big" SQL Server your only option is to drop or reload the table. SQL Server MVP Adam Machanic has submitted Connect item 252226, Allow enabling and disabling of a column's IDENTITY property, to call for a change. You may think that with the introduction of sequences in SQL 2012 this request is no longer equally important, but I say it's the other way round. You may want to move over to using sequences, but you may find that recreating and reloading your big table with IDENTITY is a roadblock.

A small thing, but nevertheless: when you change or drop a column you say ALTER TABLE tbl ALTER/DROP COLUMN, but when you add a column, COLUMN is not even permitted in the syntax! Connect item 124626, Add COLUMN keyword to the ADD COLUMN SYNTAX (as alternate), submitted by SQL Server MVP Louis Davidson.
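
That is (the table and columns are made up):

ALTER TABLE tbl ALTER COLUMN somecol int NOT NULL   -- COLUMN is required here,
ALTER TABLE tbl DROP COLUMN somecol                 -- and here,
ALTER TABLE tbl ADD othercol int NULL               -- but here it is not even permitted.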

Revision History

2013-05-18 – Added a new item Add a Virtual Errors Table.

2012-12-26 – Updated the list with regards to the old bugs with numbers in the 12xxxx range. They are now visible for everyone – but voting is not enabled for them.

2012-05-07 – Released the SQL 2012 version of the wishlist. You can view the older versions with annotations here.

Back to my home page.