Dynamic Search Conditions in T‑SQL

An SQL text by Erland Sommarskog, SQL Server MVP. Most recent update 2014-12-27.

Introduction

It is very common in information systems to have functions where the users are able to search the data by freely selecting among many possible criteria. When you implement such a function with SQL Server there are two challenges: to produce the correct result, and to do so with good performance.

When it comes to the latter, there is a key theme: there is no single execution plan that is good for all possible search criteria. Rather, you want the query plan to be different depending on user input. There are two ways to achieve this. You can write a static SQL query and add the hint OPTION (RECOMPILE), which forces SQL Server to compile the query every time. Or you can use dynamic SQL to build a query string which includes only the search criteria the user specified. We will look at both these approaches in this article. They are both viable, and a good SQL programmer should have both in his toolbox, since both have their strengths and weaknesses.

This article assumes that you are on SQL 2008 or later. A key feature of this article is OPTION (RECOMPILE), a query hint that was introduced already in SQL 2005, but which was not implemented properly until SQL 2008. To be precise, you should be on at least Service Pack 2 of SQL 2008, or Service Pack 1 of SQL 2008 R2, to benefit from this feature. For a full discussion of how this feature has changed back and forth, see the section The History of Forced Recompilation, which also discusses a bug with OPTION (RECOMPILE) fixed in the autumn of 2014.

If you are still on SQL 2005 or SQL 2000, there is an older version of this article where I cover additional techniques that are not equally interesting on SQL 2008 and later, thanks to OPTION (RECOMPILE).

I begin the article by looking at some methods which are good for very simple cases where you only need to handle a small set of choices and where the more general methods would be overkill. The next chapter introduces the task in focus for the rest of the article: the requirement to implement the routine search_orders in the Northgale database, which I also introduce in that chapter. The two main chapters of the article then look at implementing this procedure with static and dynamic SQL respectively.

Table of Contents

Introduction

Alternate Key Lookup

The Case Study: Searching Orders

The Northgale Database

Static SQL with OPTION (RECOMPILE)

Why Static SQL?

The Basic Technique

The Coalesce Trap

Handling Multi-valued Parameters

Choice of Sort Order

Optional Tables

Alternate Tables

When OPTION (RECOMPILE) Hurts You

The History of Forced Recompilation

Dynamic SQL

Why Dynamic SQL?

The Ways to Do Dynamic SQL

Permissions

Implementing search_orders with a Parameterised Query

Compilation and Caching

Special Search Conditions

Using the CLR

An Example with Unparameterised SQL

When Caching Is Not What You Want

Conclusion

Feedback and Acknowledgements

Revision History

Alternate Key Lookup

Using IF statements

Problems with dynamic search conditions come in several flavours. In the general case, there is a search form where the user can select between many search conditions, and this is also the main focus of this article. But sometimes you encounter problems with a small number of conditions that are more or less mutually exclusive. A typical example would be a form where a user can look up a customer by entering one of: 1) The name of the customer. 2) The customer number. 3) The customer's national registration number. (That is, what is called SSN, personnummer etc. depending on where you are.) There are indexes on all three columns.

None of the solutions in the main body of the article are really suitable here. Forcing a recompile every time with OPTION (RECOMPILE) can add too much load to the system, particularly if these lookups are frequent. And dynamic SQL is just too much hassle for a simple problem like this one.

So let us look at more lightweight solutions that fit this problem. A very simple-minded way is to use IF:

IF @custno IS NOT NULL
   SELECT ... FROM customers WHERE custno = @custno
ELSE IF @natregno IS NOT NULL
   SELECT ... FROM customers WHERE natregno = @natregno
ELSE IF @custname IS NOT NULL
   SELECT TOP 200 ...
   FROM   customers
   WHERE  custname LIKE @custname + '%'
   ORDER  BY custname
ELSE
   RAISERROR('No search condition given!', 16, 1)

(The TOP 200 for the search on customer name limits the output in case the user would enter a very short search string, so that we don't return tens of thousands of customers.)

If you need to return data from other tables as well, and you don't want to repeat the join, you could enter all matching customer numbers into a table variable or a temp table, and then do your final join:

-- The table variable must be declared first; adapt the type of custno to your schema.
DECLARE @cust TABLE (custno int NOT NULL PRIMARY KEY)

IF @custno IS NOT NULL
   INSERT @cust (custno) VALUES (@custno)
ELSE IF @natregno IS NOT NULL
   INSERT @cust (custno) SELECT custno FROM customers WHERE natregno = @natregno
ELSE IF @custname IS NOT NULL
   INSERT @cust (custno)
      SELECT TOP (200) custno
      FROM   customers
      WHERE  custname LIKE @custname + '%'
      ORDER  BY custname
ELSE
   RAISERROR('No search condition given!', 16, 1)

SELECT ...
FROM   @cust c
JOIN   customers cst ON cst.custno = c.custno
JOIN   ...

There is however a potential performance problem here. No matter which choice the user makes, we want the optimizer to use the index on the chosen search column. But the way SQL Server builds query plans, this may not always happen. When the procedure is invoked and there is no plan in the cache, SQL Server builds the plan for the entire stored procedure and "sniffs" the current input values for the parameters. Say that the first user to make the search enters a customer number. This means that the branches for national registration number and customer name are optimised for NULL and under unfortunate circumstances this could lead to a plan with a table scan, which is not what you want. (For an in-depth discussion on parameter sniffing, see my article Slow in the Application – Fast in SSMS.)

To prevent this from happening, there are a couple of precautions you can take. One is to push the three SELECT statements down into three subprocedures, but admittedly this is a bit bulky. Another approach is to add explicit index hints, but you should always be restrictive with index hints. For instance, what if someone renames the index? That would cause the query to fail.
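
To give the idea of the subprocedure approach, here is a minimal sketch; the procedure names are my own invention. Since each subprocedure is compiled on its own, each query is sniffed with a parameter value that is always provided:

CREATE PROCEDURE find_customer_byno @custno int AS
   SELECT ... FROM customers WHERE custno = @custno
go
-- And in find_customer itself:
IF @custno IS NOT NULL
   EXEC find_customer_byno @custno
ELSE IF @natregno IS NOT NULL
   EXEC find_customer_bynatregno @natregno
ELSE IF @custname IS NOT NULL
   EXEC find_customer_byname @custname
ELSE
   RAISERROR('No search condition given!', 16, 1)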

Rather, the best option is probably to use the OPTIMIZE FOR hint:

SELECT TOP 200 custno
FROM   customers
WHERE  custname LIKE @custname + '%'
ORDER  BY custname
OPTION (OPTIMIZE FOR (@custname = N'ZZZZZZZ')) 

This hint causes SQL Server to build the query plan for the value you specify. Obviously you should pick a value which is selective enough.

Whatever strategy you choose, you should test on production-size data that you get the plans you expect. Due to the sniffing issue, your test should look something like this:

EXEC find_customer @custno = 123
EXEC find_customer @natregno = '1234567890'
EXEC find_customer @custname = 'ABC'
EXEC sp_recompile find_customer   -- flush the plan for the procedure
EXEC find_customer @natregno = '1234567890'
EXEC find_customer @custno = 123
EXEC find_customer @custname = 'ABC'
EXEC sp_recompile find_customer
EXEC find_customer @custname = 'ABC'
EXEC find_customer @custno = 123
EXEC find_customer @natregno = '1234567890' 

That is, you should test with each of the three parameters being the one that is "sniffed" when the plan is built.

In this particular example, there is one more issue with the @custname parameter that I have ignored so far: the user could add a leading %, in which case a scan would be a better choice. If you need to support searches with a leading %, the best option is to split this into two branches:

IF left(@custname, 1) <> '%'
   -- query as above
ELSE
   -- same query, but with different value in OPTIMIZE FOR.
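
Spelled out, the two branches could look like this; the value N'%' in the second hint is my own choice of a value that should make the optimizer expect a scan:

IF left(@custname, 1) <> '%'
   SELECT TOP 200 custno
   FROM   customers
   WHERE  custname LIKE @custname + '%'
   ORDER  BY custname
   OPTION (OPTIMIZE FOR (@custname = N'ZZZZZZZ'))
ELSE
   SELECT TOP 200 custno
   FROM   customers
   WHERE  custname LIKE @custname + '%'
   ORDER  BY custname
   OPTION (OPTIMIZE FOR (@custname = N'%'))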

Using OR

If you don't like the multiple IF statements, you may be delighted to know that it is in fact perfectly possible to do it all in one query, as long as you can ignore a leading % in @custname:

SELECT TOP 200 ...
FROM   customers
WHERE  (custno = @custno AND @custno IS NOT NULL) OR
       (natregno = @natregno AND @natregno IS NOT NULL) OR
       (custname LIKE @custname + '%' AND @custname IS NOT NULL)
ORDER  BY custname

The WHERE clause here essentially reads:

custno = @custno OR natregno = @natregno OR custname LIKE @custname + '%'

But the added conditions with IS NOT NULL serve a purpose. With them, the chances are good that the optimizer will pick a plan that seeks all three indexes using index concatenation. Moreover, thanks to the IS NOT NULL conditions, SQL Server will add Filter operators with startup expressions, so that at run time only one index is accessed. (I return to startup expressions in the section Optional Tables later in this article.)

This strategy usually works well as long as the search terms are all in the same table and all are indexed, but rarely (if ever) if the search terms are in different tables. In any case, you should never use this strategy blindly, but always verify that you get the plan – and the performance – you intended.
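
For instance, with the find_customer example above, you could run something like this and inspect the plan and the I/O statistics for each call:

SET STATISTICS IO ON
EXEC find_customer @custno = 123
EXEC find_customer @natregno = '1234567890'
EXEC find_customer @custname = 'ABC'
SET STATISTICS IO OFF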

The Case Study: Searching Orders

We will now turn to a more general case with many search terms. We will work with implementing a stored procedure that retrieves information about orders in the Northgale database, which is an inflated version of Microsoft's classic Northwind database. See later in this chapter how to install it.

This is the interface that we expose to the user (well rather to a GUI or middle-layer programmer):

CREATE PROCEDURE search_orders
                 @orderid     int          = NULL,
                 @fromdate    datetime     = NULL,
                 @todate      datetime     = NULL,
                 @minprice    money        = NULL,
                 @maxprice    money        = NULL,
                 @custid      nchar(5)     = NULL,
                 @custname    nvarchar(40) = NULL,
                 @city        nvarchar(15) = NULL,
                 @region      nvarchar(15) = NULL,
                 @country     nvarchar(15) = NULL,
                 @prodid      int          = NULL,
                 @prodname    nvarchar(40) = NULL,
                 @employeestr varchar(MAX) = NULL,
                 @employeetbl intlist_tbltype READONLY AS

SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
       c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
       c.PostalCode, c.Country, c.Phone, p.ProductID,
       p.ProductName, p.UnitsInStock, p.UnitsOnOrder, o.EmployeeID
FROM   Orders o
JOIN   [Order Details] od ON o.OrderID = od.OrderID
JOIN   Customers c ON o.CustomerID = c.CustomerID
JOIN   Products p ON p.ProductID = od.ProductID
WHERE  ???
ORDER  BY o.OrderID

You see in the SELECT list what information the user gets. Here is a specification of the parameters:

Parameter     Function
------------  --------------------------------------------------------------
@orderid      Retrieve this order only.
@fromdate     Retrieve orders made on @fromdate or later.
@todate       Retrieve orders made on @todate or earlier.
@minprice     Retrieve only order details that cost at least @minprice.
@maxprice     Retrieve only order details that cost at most @maxprice.
@custid       Retrieve only orders from this customer.
@custname     Retrieve only orders from customers whose name starts with
              @custname.
@city         Retrieve only orders from customers in this city.
@region       Retrieve only orders from customers in this region.
@country      Retrieve only orders from customers in this country.
@prodid       Retrieve only order details with this product.
@prodname     Retrieve only order details with a product starting with
              @prodname.
@employeestr  These two parameters serve the same purpose: return only orders
@employeetbl  for the specified employees. @employeestr is a comma-separated
              string with employee IDs, while @employeetbl is a table-valued
              parameter. See further the discussion below.

If the user leaves out a search condition, that search condition should not apply to the search. Thus, a plain EXEC search_orders should return all orders in the database. In this text I will discuss some different implementations of search_orders, unimaginatively named search_orders_1 etc. Some of them are included in whole in this text, others only in parts. All are available in the dynsearch-2008 directory on my web site. (The numbering of the procedures is quite out of order with the text due to the historic evolution of this article.)

The last two parameters in the parameter list serve the same purpose functionally. I have included both to illustrate two ways to handle multi-valued parameters. When I see these questions on the forums, people almost always have a comma-separated list. This list must be cracked into a table to be usable in SQL Server, which can easily be done with a table-valued function. In my article Arrays and Lists in SQL Server 2005 you can find a whole slew of such functions.
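
Just to give the idea, here is a minimal sketch of what such a function can look like. Northgale ships its own intlist_to_tbl, and I make no claim that this sketch matches that implementation exactly:

CREATE FUNCTION intlist_to_tbl (@list varchar(MAX))
RETURNS @tbl TABLE (number int NOT NULL) AS
BEGIN
   -- Walk the string, extracting one comma-separated element at a time.
   DECLARE @pos      int = 1,
           @nextpos  int,
           @valuelen int

   WHILE @pos <= len(@list)
   BEGIN
      SELECT @nextpos = charindex(',', @list, @pos)
      SELECT @valuelen = CASE WHEN @nextpos > 0 THEN @nextpos
                              ELSE len(@list) + 1
                         END - @pos
      INSERT @tbl (number)
         VALUES (convert(int, substring(@list, @pos, @valuelen)))
      SELECT @pos = CASE WHEN @nextpos > 0 THEN @nextpos + 1
                         ELSE len(@list) + 1
                    END
   END
   RETURN
END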

My own preference is to use a table-valued parameter and I discuss this in detail in my article Arrays and Lists in SQL Server 2008. I have not included these two parameters in all search_orders procedures; they are only present when I want to illustrate a technique to handle them. Some procedures have only one of them. (In case you wonder why @employeetbl does not have a default value: Table-valued parameters can never have an explicit default value, but instead they always have the implicit default value of an empty table.)

The search_orders example is not overly complicated; each condition can be implemented with a single condition using =, <=, >= or LIKE. In a real-life application you may encounter more complex requirements.

In the interest of keeping this article down in size, I have not included such parameters in search_orders. Nevertheless, I cover some of these points in the text that follows.

The Northgale Database

I wrote the first version of this article when SQL Server 2000 ruled the world. I worked from the Northwind database, which shipped with SQL 2000. However, since that database was so small, it was not possible for me to draw any conclusions at all about performance. For this reason, I composed the Northgale database, an inflated version of Northwind. Tables and indexes are the same, but I have exploded the data so that instead of 830 orders, there are 344035 of them.

To install Northgale, you first need to create the Northwind database on your server; download the script to install it from Microsoft's web site. (If you are on SQL 2012 or later, beware that the script will fail if you have a surrogate-aware collation. Change the CREATE DATABASE statement to force a different collation if this happens to you.) Once you have Northwind in place, run Northgale.sql. To install Northgale, you need 4.6 GB of disk space. When the install has completed you can reclaim 4 GB by removing the log file Northgale_log2. (The script attempts to do this, but it always seems to fail. Rerunning the last statement in the script a little later seems to work.) By default, the database is installed in the same directory as the master database, but you can edit the script to change that. (There is no issue with surrogate-aware collations for the Northgale script itself.)

The script for Northgale works by cross-joining the tables in Northwind. For important entities I have created new IDs by combining the existing ones, and the same goes for some of the names. I have reused the existing cities, countries and regions, together with some extras that I needed for this article, so these columns do not have very good selectivity.

Northgale includes the table type intlist_tbltype used for the @employeetbl parameter as well as the table-valued function intlist_to_tbl to crack @employeestr into a table.
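
If you want to experiment outside Northgale, a definition of the table type that is compatible with how it is used in this article could look like this (I don't claim this is letter by letter what the Northgale script runs):

CREATE TYPE intlist_tbltype AS TABLE (val int NOT NULL PRIMARY KEY)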

Keep in mind that Northgale still is a small database. It easily fits entirely into memory on a laptop with 4 GB of RAM. A poorly written query that requires a scan of, say, the Orders table, still returns within a few seconds. It's hopefully big enough to give a sense for how good or bad different solutions are, but I would advise you to not draw any far-reaching conclusions. It is also worth pointing out that the way the database was composed, the distribution of data is a bit skewed.

Static SQL with OPTION (RECOMPILE)

Why Static SQL?

Solutions for dynamic search conditions in static SQL almost always include the query hint OPTION (RECOMPILE), although there are a few simple cases where the hint is not needed, and we saw an example of this in the introductory chapter Alternate Key Lookup.

The advantages with these solutions are:

  1. The code is compact and comparably easy to read, test and maintain.
  2. Since the query is recompiled on each execution, you get a plan that is optimised for the actual parameter values.
  3. You don't have to bother about permissions; ownership chaining applies, just as for any other stored procedure.

But there are also disadvantages:

  1. The constant recompilation adds CPU load; if the searches are frequent, this can be a problem. (See the section When OPTION (RECOMPILE) Hurts You.)
  2. When the requirements grow too complex – for instance, user-selected sorting on columns of different data types – the static solution becomes unmanageable.

In the following I will elaborate on these points in more detail.

Note that this section assumes that OPTION (RECOMPILE) works like it does in SQL 2008 SP2, SQL 2008 R2 SP1 and later versions. See the section The History of Forced Recompilation for details on how the hint worked in older versions.

The Basic Technique

The basic technique for static SQL with OPTION (RECOMPILE) is illustrated by search_orders_3, which I initially show in a simplified form without the parameters @employeestr and @employeetbl.

CREATE PROCEDURE search_orders_3
                 @orderid   int          = NULL,
                 @fromdate  datetime     = NULL,
                 @todate    datetime     = NULL,
                 @minprice  money        = NULL,
                 @maxprice  money        = NULL,
                 @custid    nchar(5)     = NULL,
                 @custname  nvarchar(40) = NULL,
                 @city      nvarchar(15) = NULL,
                 @region    nvarchar(15) = NULL,
                 @country   nvarchar(15) = NULL,
                 @prodid    int          = NULL,
                 @prodname  nvarchar(40) = NULL AS

SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
       c.CustomerID, c.CompanyName, c.Address, c.City, c.Region,
       c.PostalCode, c.Country, c.Phone, p.ProductID,
       p.ProductName, p.UnitsInStock, p.UnitsOnOrder
FROM   Orders o
JOIN   [Order Details] od ON o.OrderID = od.OrderID
JOIN   Customers c ON o.CustomerID = c.CustomerID
JOIN   Products p ON p.ProductID = od.ProductID
WHERE  (o.OrderID = @orderid OR @orderid IS NULL)
  AND  (o.OrderDate >= @fromdate OR @fromdate IS NULL)
  AND  (o.OrderDate <= @todate OR @todate IS NULL)
  AND  (od.UnitPrice >= @minprice OR @minprice IS NULL)
  AND  (od.UnitPrice <= @maxprice OR @maxprice IS NULL)
  AND  (o.CustomerID = @custid OR @custid IS NULL)
  AND  (c.CompanyName LIKE @custname + '%' OR @custname IS NULL)
  AND  (c.City = @city OR @city IS NULL)
  AND  (c.Region = @region OR @region IS NULL)
  AND  (c.Country = @country OR @country IS NULL)
  AND  (od.ProductID = @prodid OR @prodid IS NULL)
  AND  (p.ProductName LIKE @prodname + '%' OR @prodname IS NULL)
ORDER  BY o.OrderID
OPTION (RECOMPILE)

The effect of all the @x IS NULL clauses is that if an input parameter is NULL, then the corresponding AND-condition is always true. Thus, the only conditions that are in effect are those where the search parameter has a non-NULL value. Sounds simple enough, but there is a very big difference in performance with or without that last line present:

OPTION (RECOMPILE)

The hint instructs SQL Server to recompile the query every time. Without this hint, SQL Server produces a plan that will be cached and reused. This has a very important implication: the plan must work with all possible input values of the parameters. Due to parameter sniffing, the plan will be optimised for the parameter combination of the first search. That plan is likely to perform poorly with entirely different parameters – and since it must work for all possible values, it would not even be perfect for the initial parameter combination. For optimal response times when the user provides a single order ID, we want the optimizer to use the indexes on OrderID in Orders and Order Details and ignore everything else. But if the user performs a search on a product ID or a product name, we want to use the index on ProductID in Order Details, and so on for other search criteria.

And this is exactly what we achieve with the hint OPTION (RECOMPILE). Since SQL Server is instructed to recompile the query every time, there is no need to cache the plan, and therefore SQL Server can handle all the variables as constants. Thus, if the procedure is called like this:

EXEC search_orders_3 @orderid = 11000

SQL Server will in essence optimise this WHERE clause:

WHERE  (o.OrderID = 11000 OR 11000 IS NULL)
  AND  (o.OrderDate >= NULL OR NULL IS NULL)
  AND  (o.OrderDate <= NULL OR NULL IS NULL)
  AND  (od.UnitPrice >= NULL OR NULL IS NULL)
  AND  (od.UnitPrice <= NULL OR NULL IS NULL)
  AND  (o.CustomerID = NULL OR NULL IS NULL)
  ...

SQL Server is smart enough to remove all these NULL IS NULL from the query, so in essence it works with this WHERE clause:

WHERE o.OrderID = 11000

The choice of using the indexes on OrderID to drive the query becomes a no-brainer. And if you take this call:

EXEC search_orders_3 @custid = 'ALFKI'

The effective WHERE clause becomes:

WHERE  (o.OrderID = NULL OR NULL IS NULL)
  AND  (o.OrderDate >= NULL OR NULL IS NULL)
  AND  (o.OrderDate <= NULL OR NULL IS NULL)
  AND  (od.UnitPrice >= NULL OR NULL IS NULL)
  AND  (od.UnitPrice <= NULL OR NULL IS NULL)
  AND  (o.CustomerID = N'ALFKI' OR N'ALFKI' IS NULL)
  ...

The optimizer decides that the index on CustomerID is good.

Here are some more test cases that you can try while looking at the query plans:

EXEC search_orders_3 @prodid  = 76
EXEC search_orders_3 @prodid  = 76, @custid = 'RATTC'
EXEC search_orders_3 @fromdate = '19980205', @todate = '19980209'
EXEC search_orders_3 @city = N'Bräcke', @prodid = 76

These two calls are particularly interesting:

EXEC search_orders_3 @custid = 'ERNTC',
                     @fromdate = '19980218', @todate = '19980218'
EXEC search_orders_3 @custid = 'BOLSR',
                     @fromdate = '19960101', @todate = '19961231'

The two calls have the same set of parameters, but yet they produce different query plans. The first call searches for a single day for a customer with many orders. The second call searches a full year for a customer with a single order.

To better understand the benefit of OPTION (RECOMPILE), you can play with search_orders_3b, which is identical to search_orders_3, except that it does not have the query hint. Run the examples above with search_orders_3b and compare with the query plans for search_orders_3. Here is an exercise that is particularly illuminating: First run

EXEC sp_recompile search_orders_3b

to make sure that there is no plan in the cache for the procedure. Now run search_orders_3b first with @orderid = 11000 and then with @prodid = 76. You will notice that the latter search is a tad slow. Flush the query plan with sp_recompile again and run these two searches in reverse order. That is, first run with @prodid = 76, and then with @orderid = 11000. You may find that it takes 20 seconds to retrieve that single order. This happens because the plan was optimised for a search on product ID, and that plan does not work well for a search on order ID – although it does produce the correct result eventually.

The Coalesce Trap

Rather than using OR like above, some people write one of:

o.orderID = coalesce(@orderid, o.OrderID)
o.orderID = isnull(@orderid, o.OrderID)

coalesce() is a function that takes a list of values as argument, and returns the first non-NULL value in the list, or NULL if there is no non-NULL value in the list. Thus, if @orderid is NULL, you get o.OrderID = o.OrderID, a complete no-op – or so it may seem. You can see a full example of this in search_orders_3a.

This yields code that is even more compact than using OR, but I strongly recommend that you stay away from this method, because there is a trap. Run this:

EXEC search_orders_3 @orderid = 10654
EXEC search_orders_3a @orderid = 10654

The first call returns nine rows, but the last returns no rows at all! Why? The reason is that for the customer on this order, the column Region is NULL. When @region is NULL, the condition

c.Region = coalesce(@region, c.Region)

becomes in essence NULL = NULL. But in SQL, NULL is not equal to NULL. NULL stands for "unknown value", and any comparison with NULL yields neither TRUE nor FALSE in the three-valued logic of SQL, but UNKNOWN. Hence, no rows are returned.

To avoid this trap, some people write things like:

coalesce(c.Region, '') = coalesce(@region, c.Region, '')

This is not only more kludgy, but since the column is entangled into an expression, this may preclude the use of any index on the column.

Another "workaround" is to write this particular condition as (c.Region = @region OR @region IS NULL) like in the original search_orders_3. But that begs the question why you should use the construct with coalesce or isnull at all, when it only works under some conditions. Thus, the simple advice is: stay away from this trap entirely.

Handling Multi-valued Parameters

Let's now look at how the parameters @employeestr and @employeetbl are handled in search_orders_3. Here are the last two conditions in the WHERE clause:

AND  (o.EmployeeID IN (SELECT number FROM intlist_to_tbl(@employeestr)) OR
      @employeestr IS NULL)
AND  (o.EmployeeID IN (SELECT val FROM @employeetbl) OR @hasemptbl = 0)

intlist_to_tbl is a function that cracks a comma-separated list into a table. This function is included in the Northgale database. @hasemptbl is a local variable which is defined first in the procedure:

 DECLARE @hasemptbl bit = CASE WHEN EXISTS (SELECT * FROM @employeetbl)
                               THEN 1
                               ELSE 0
                          END

This help variable is needed for the optimizer to understand that it can ignore @employeetbl when there are no rows in it. This does not work if the EXISTS test is in the main query itself.

Here are some test cases:

EXEC search_orders_3 @employeestr = '402,109,207', @custid = 'PERTH'

DECLARE @tbl intlist_tbltype 
INSERT @tbl (val) VALUES(402), (109), (207)
EXEC search_orders_3 @employeetbl = @tbl, @custid = 'PERTH'

Both should return 53 orders.

While this works, there is a difference compared to simple scalar parameters. For the latter, the optimizer knows the exact parameter values and builds the plan accordingly. As we saw in the example with @custid and a date interval, different parameter values can yield different plans. But this cannot happen with the multi-valued parameters in search_orders_3. All the optimizer knows about @employeetbl is the number of rows in the table variable; for @employeestr, it does not even know that. As long as the distribution of the values in the search column is fairly even, this may not be much of an issue. But if there are skews, it certainly matters.

One way to deal with this is to insert the values into a temp table. A temp table has distribution statistics, which gives the optimizer more information. But if there are just a small number of values, you may have to force a statistics update yourself, as auto-stats for a temp table do not kick in until six rows have been inserted or modified.
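
As a sketch of this approach (the temp-table name is my own), assuming @employeestr as in search_orders_3:

CREATE TABLE #emps (employeeid int NOT NULL PRIMARY KEY)

IF @employeestr IS NOT NULL
BEGIN
   INSERT #emps (employeeid)
      SELECT number FROM intlist_to_tbl(@employeestr)
   UPDATE STATISTICS #emps    -- force fresh statistics for small row counts
END

-- And then in the WHERE clause of the search query:
--   AND (o.EmployeeID IN (SELECT employeeid FROM #emps) OR
--        @employeestr IS NULL)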

A more elaborate method is illustrated in the procedure search_orders_3c. The idea is that most of the time the users only want to search for two or three values at a time. Therefore, if there are up to four elements in the list, the procedure uses an IN expression, else it uses a table variable. The procedure has this initial code:

DECLARE @rowc   int,
        @emp1   int,
        @emp2   int, 
        @emp3   int,
        @emp4   int,
        @emptbl bit = 0

IF @employeestr IS NOT NULL
BEGIN
   INSERT @employeetbl (rowno, employeeid)
      SELECT row_number() OVER(ORDER BY (SELECT 1)), number
      FROM   intlist_to_tbl(@employeestr)
   SELECT @rowc = @@rowcount

   IF @rowc BETWEEN 1 AND 4
   BEGIN
      SELECT @emp1 = employeeid FROM @employeetbl WHERE rowno = 1
      SELECT @emp2 = employeeid FROM @employeetbl WHERE rowno = 2
      SELECT @emp3 = employeeid FROM @employeetbl WHERE rowno = 3
      SELECT @emp4 = employeeid FROM @employeetbl WHERE rowno = 4
   END
   ELSE IF @rowc > 4
      SELECT @emptbl = 1
END

(For simplicity, search_orders_3c only has the parameter @employeestr, and @employeetbl resurfaces as a local work table.) If @employeestr has a value, we unpack the list into a table variable and we number the rows. If there are four values or less, we populate the variables @emp1 to @emp4. The last two conditions in the WHERE clause look like this:

AND  (o.EmployeeID IN (@emp1, @emp2, @emp3, @emp4) OR @emp1 IS NULL)
AND  (o.EmployeeID IN (SELECT employeeid FROM @employeetbl) OR @emptbl = 0)

That is, the first condition applies if @emp1 has a value, which it has if the list had one to four values. The second condition only applies if @emptbl is 1, which happens if there are five or more values in the list.

You can compare the query plans for these three cases:

DECLARE @empl intlist_tbltype
INSERT @empl(val) VALUES(805) --, (304)
EXEC search_orders_3 @employeetbl = @empl
EXEC search_orders_3  @employeestr = '805'  -- '805,304'
EXEC search_orders_3c @employeestr = '805'  -- '805,304'

When I tested this on SQL 2012 with only employee 805 (the employee with the smallest number of orders), the first two calls resulted in a parallel plan which did not use the index on EmployeeID, whereas the last call produced a serial plan that used this index. Interestingly enough, on SQL 2014, where the new cardinality estimator comes into play, the first call produced a serial plan fairly similar to the plan for the third call. When I added 304 (the employee with the most orders) to the selection, the first call retained the same plan, whereas the call to search_orders_3c got a parallel plan. That is, with the strategy in search_orders_3c, you can get a plan which is tailored to the selected values.

I like to point out that there is nothing magic about the number 4 here. You could set the limit to any number from one to ten, or even higher if you feel like it, but I doubt that you will see any actual benefit of such a long IN list over using a temp table or a table variable.

Choice of Sort Order

If users need to be able to choose the sort order, this can also easily be handled with static SQL. The basic pattern is:

ORDER BY CASE @sortcol WHEN 'OrderID'      THEN o.OrderID
                       WHEN 'EmployeeID'   THEN o.EmployeeID
                       WHEN 'ProductID'    THEN od.ProductID
         END,
         CASE @sortcol WHEN 'CustomerName' THEN c.CompanyName
                       WHEN 'ProductName'  THEN p.ProductName
         END,
         CASE @sortcol WHEN 'OrderDate'    THEN o.OrderDate 
         END

That is, you have a parameter that holds the column to sort by (or some other identifier that maps to the column name) and then you use CASE to select that column. One very important thing to observe is that all branches of a CASE expression must have a compatible data type. Recall that a CASE expression has a single static data type, which is determined according to the rules of data-type precedence in SQL Server. That is, THEN expressions of types with lower precedence are converted to the type with the highest precedence in the CASE expression. If you mix string and numeric columns in the same CASE expression, attempts to sort on a string column will die with a conversion error. Thus, you need to have one CASE expression for numeric columns, one for string columns, one for dates and so on.

While this looks a little daunting, the optimizer is able to reduce this to an ORDER BY with a single column and build the plan best suited for that sorting – as long as you use OPTION (RECOMPILE) of course.

If you want to support both ascending and descending sorts, you will need to double everything:

ORDER BY CASE WHEN @isdesc = 1 THEN
              CASE @sortcol WHEN 'OrderID'    THEN o.OrderID
                            WHEN 'EmployeeID' THEN o.EmployeeID
                            WHEN 'ProductID'  THEN od.ProductID
              END
         END DESC,
         CASE WHEN @isdesc = 0 THEN
              CASE @sortcol WHEN 'OrderID'    THEN o.OrderID
                            WHEN 'EmployeeID' THEN o.EmployeeID
                            WHEN 'ProductID'  THEN od.ProductID
              END
         END ASC

For brevity, I included only the numeric columns here, but as you see, it's starting to get a little ugly. Now, imagine that users should be able to select multiple sort columns with different data types, and also ascending/descending for each column. If you were to try the above strategy, it would grow into something completely unmanageable. So while you can do user-selected sorting with static SQL, it is only practical if you need to support a single sort column, or if all your sort columns have the same data type. If you encounter anything beyond that, it is time to consider a solution with dynamic SQL instead. Or – and this is not an option that should be overlooked – sort the data client-side.

Optional Tables

Sometimes you may have a situation that requires you to access a table only if a certain condition is given. Let's add one more parameter to our procedure: @suppl_country. If this parameter is provided, the procedure should only return information about products with a supplier from the given country. You could implement this by joining to the Suppliers table, but a suggestion that I originally got from Phillipp Sumi is that you should use an EXISTS clause in this way:

@suppl_country IS NULL OR EXISTS (SELECT *
                                  FROM   Suppliers s
                                  WHERE  s.SupplierID = p.SupplierID
                                    AND  s.Country    = @suppl_country)

To illustrate this, I wrote search_orders_9, which is the same as search_orders_3 with the condition above added (and without the parameters for searching on employees). If you run this:

EXEC search_orders_9 @country = 'Sweden'
EXEC search_orders_9 @suppl_country = 'Sweden' 

and then look at the query plans, you will see that the first plan does not include Suppliers. This should not really come as a surprise, given what you have learnt about OPTION (RECOMPILE).

However, there is one more revelation to make, and to this end you need to comment out OPTION (RECOMPILE) from search_orders_9 and recreate it. Run the above again preceded by the command SET STATISTICS IO ON. If you look at the query plan for the first execution, you will see that Suppliers now appears in the plan. However, when you look at the output from SET STATISTICS IO ON, you will see something like this:

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.
Table 'Products'. Scan count 7, logical reads 238, physical reads 1, read-ahead reads 79.
Table 'Customers'. Scan count 7, logical reads 661, physical reads 0, read-ahead reads 218.
Table 'Orders'. Scan count 214, logical reads 37214, physical reads 6, read-ahead reads 352.
Table 'Order Details'. Scan count 7, logical reads 8996, physical reads 1, read-ahead reads 8935.
Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0.

Note that Suppliers is not present in this output, whereas the output for the second execution starts off:

Table 'Suppliers'. Scan count 0, logical reads 11858, physical reads 0, read-ahead reads 0

That is, even without the RECOMPILE hint, SQL Server is able to avoid the access to Suppliers. If you look in the execution plan, you find a Filter operator above the Clustered Index Seek on Suppliers. If you hover over this Filter operator, you find that it has a Startup Expression Predicate. That is, SQL Server decides at run time whether to access the table. This is nothing new; we saw the same thing in the section Using OR for alternate key lookup.

This is something which is good to keep in mind if you encounter a situation where your only dynamic search condition is that, depending on a parameter, you should filter the data on whether matching rows exist in a certain table. It may be overkill to use OPTION (RECOMPILE) if a filter with a startup expression does the job. But let me remind you that you should always inspect the query plan and test that performance is acceptable.

Note: some people with a background in languages like C++ may find this trivial and think that it is just a matter of operator shortcutting. However, there is no operator shortcutting in SQL; in SQL, operands can be computed in any order, and the behaviour would be the same if the condition was written as EXISTS (...) OR @suppl_country IS NULL.

Alternate Tables

A scenario you may encounter is that depending on a parameter, you should read from different tables. For instance, say that there is a parameter @ishistoric. If this parameter is 1, you should read from the tables HistoricOrders and HistoricOrderDetails instead. There are no such tables in Northgale, so I cannot show a full-fledged procedure. But here is how the FROM clause in search_orders_3 would be written to accommodate the situation. You replace the join between Orders and Order Details with a derived table which is a UNION ALL of two join queries, each with a WHERE clause that includes or excludes the query depending on the parameter:

FROM  (SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
              o.CustomerID, od.ProductID
       FROM   Orders o
       JOIN   [Order Details] od ON o.OrderID = od.OrderID
       WHERE  @ishistoric = 0
       UNION  ALL
       SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
              o.CustomerID, od.ProductID
       FROM   HistoricOrders o
       JOIN   HistoricOrderDetails od ON o.OrderID = od.OrderID
       WHERE  @ishistoric = 1) AS u
JOIN   Customers c ON c.CustomerID = u.CustomerID
JOIN   Products p ON p.ProductID = u.ProductID

With OPTION (RECOMPILE), only one set of order tables will be accessed at run time. (Even without the hint, you could expect a plan with a startup expression.)

It is worth observing that the query has a certain amount of repetition – the SELECT list and the join conditions. The more alternate tables there are, the stronger the case for dynamic SQL grows, as with dynamic SQL you can avoid much of that repetition.
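
As a taste of what is to come: with dynamic SQL you would simply pick the table names when building the query string, so the SELECT list and the join condition appear only once. A sketch, where @sql is the variable holding the query string (a pattern shown in full in the dynamic SQL chapter):

SELECT @sql += CASE @ishistoric
                  WHEN 0 THEN ' FROM dbo.Orders o JOIN dbo.[Order Details] od'
                  ELSE ' FROM dbo.HistoricOrders o JOIN dbo.HistoricOrderDetails od'
               END + ' ON o.OrderID = od.OrderID'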

When OPTION (RECOMPILE) Hurts You

We have now seen many of the advantages with using OPTION (RECOMPILE), but before you start to use it all over town, you need to understand that too frivolous use of OPTION (RECOMPILE) can cause severe pain to your system. There is after all a reason why SQL Server in the normal case caches query plans. Without plan caching there would be many systems crumbling under the load of query compilation.

You need to understand how often your query will be executed. Say that users run search_orders once a minute at peak activity. In this case, the extra time we spend on compilation is clearly negligible, not least if 50 ms of compilation can reduce execution time from five minutes to 100 ms. But assume instead that there are over 100 calls to the procedure every second, and assume furthermore that @orderid is the only input parameter in the vast majority of these calls. Recompiling the query every time to produce the same plan all over again is a perfect waste of CPU resources.

One way to alleviate this situation is to introduce an IF statement so that you have:

IF @orderid IS NOT NULL
BEGIN
   SELECT ...
   WHERE  O.OrderID = @orderid
     AND  -- Conditions on Order Details here.
   -- No OPTION (RECOMPILE) here!
END
ELSE
BEGIN
   SELECT ...
   WHERE  -- same conditions as before
   OPTION (RECOMPILE)
END

By adding a separate branch for @orderid that does not have OPTION (RECOMPILE), the common requests for a single order can be served from a cached plan, whereas other conditions still result in compilation every time. Assume now that once you have this in production, you find that the load from compilation is still a tad high, and you identify that there are many requests with @custid and @fromdate, with @fromdate typically being at most a week ago.

This calls for one more branch without OPTION (RECOMPILE), and you can see a complete solution in search_orders_4. However, this procedure is not likely to work out well. You may recall the discussion on parameter sniffing earlier, and it strikes here. Say that the first call to search_orders_4 happens to be a search on @prodid. Now the two branches for order ID and recent orders for a customer will be optimised for a search on product ID, which will result in catastrophic performance. This can be addressed with index hints, or you can shove the SELECT statements without OPTION (RECOMPILE) into subprocedures, and this is what I have done in search_orders_4a.
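
In outline, the branching could look like this; the subprocedure names here are made up, and the actual files on my web site may differ in the details:

IF @orderid IS NOT NULL
   EXEC search_orders_byid @orderid
ELSE IF @custid IS NOT NULL AND @fromdate IS NOT NULL AND
        @todate IS NULL AND @prodid IS NULL AND
        @prodname IS NULL   -- ... and so on for the remaining parameters
   EXEC search_orders_bycustdate @custid, @fromdate
ELSE
   -- The general query with all conditions and OPTION (RECOMPILE),
   -- as in search_orders_3.
   SELECT ...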

But this is about as far as this path can take you. If you find more common input combinations that cause too much compilation, this gets out of hand, and that's when you need to look into dynamic SQL, since with dynamic SQL you can get one cached plan for each combination of input parameters. More about that later.

The History of Forced Recompilation

The RECOMPILE option is somewhat confusing, because through the course of SQL Server, it has existed in several flavours and with different behaviour. Here is a short history.

For as long as I have used SQL Server, there has been an option on procedure level to force recompilation:

CREATE PROCEDURE search_orders ... WITH RECOMPILE AS

There is also a similar option on the EXEC statement:

EXEC some_other_proc @p1, @p2, ... WITH RECOMPILE

The problem with these options is that SQL Server optimises a stored procedure query by query and performs no flow analysis. Thus, even if the procedure is a single query, SQL Server cannot assume that the input parameters will have the same values when the query is executed as they have at compile time. Therefore it builds a plan which produces correct results no matter the parameter values. Such a one-size-fits-all plan may be optimised for the sniffed parameter values, but it is rarely a very efficient plan. Thus, as long as we only had these options, static SQL was not a good solution for dynamic search conditions.

In SQL 2005, Microsoft introduced statement-level recompilation, and that included the hint OPTION (RECOMPILE). However, Microsoft had not drawn the inference that the variables in the query could be handled as constants, so the plan was still compiled to be correct for all possible values of the variables. Thus, in SQL 2005 too, static SQL was not a good solution.

Microsoft addressed this flaw with the release of SQL 2008. Now variables were handled as constants when you used OPTION (RECOMPILE) and static SQL became a serious contender as a solution for dynamic search conditions. However, SQL 2008 had only been out for a few months when a serious bug was revealed: if two users were both running a query with OPTION (RECOMPILE) with different parameters in parallel, they could get each other's results. This was not exactly a simple bug to fix, so Microsoft saw no choice but to revert to the old behaviour, which happened in SQL 2008 RTM CU4.

But there was a customer in Bulgaria who had invested heavily in the new behaviour, and they were able to convince Microsoft that they had to fix the bug for real and restore the good behaviour of OPTION (RECOMPILE). This happened in SQL 2008 SP1 CU5.

When it comes to SQL 2008 R2, it was released a little too soon after SQL 2008 SP1 CU5 to include the fix. As a consequence, SQL 2008 R2 RTM shipped with the old behaviour of OPTION (RECOMPILE) and the bug fix got into SQL 2008 R2 RTM CU1.

Thus, there are some versions of SQL 2008 for which OPTION (RECOMPILE) does not work as touted in this article. To determine whether your server has a correct implementation of OPTION (RECOMPILE), issue this command:

SELECT serverproperty('ProductVersion')

It should return 10.0.2746.0 (SQL 2008) or 10.50.1702.0 (SQL 2008 R2) or higher. Note that these versions are very old by now, and preferably you should be running the latest and last service pack, which is Service Pack 4 for SQL 2008 and Service Pack 3 for SQL 2008 R2. (As SQL 2008 and SQL 2008 R2 have gone out of mainstream support, there will be no more service packs.)

If you are on SQL 2012 or later, OPTION (RECOMPILE) always works in the same way, so you don't have to worry. Well, almost. In the autumn of 2014, a new bug with OPTION (RECOMPILE) was uncovered. The issue is the same as with the first bug: users running the same query with different parameters can get each other's results. But whereas the first bug was very simple to repro, this bug requires quite specific circumstances to exhibit itself, as described in KB2965069. I am not going to repeat the KB article here, but I certainly recommend that you read it and decide whether the scenario could apply to your system. If you think it does, you should certainly apply the applicable cumulative update (or any CU released later). The KB article does not list SQL 2008 and SQL 2008 R2, but I would assume that this is because they have gone out of mainstream support. I have no information on the matter, but my assumption is that the bug is present in these two versions as well. That is, this is a bug that has been there since the new behaviour was restored in SQL 2008 SP1 CU5, and it took four years before it was observed.

Dynamic SQL

Why Dynamic SQL?

It might as well be said directly: solutions with dynamic SQL require more from you as a programmer. Not only in skill, but foremost in discipline and understanding of what you are doing. Dynamic SQL is a wonderful tool when used correctly, but in the hands of the inexperienced it far too often leads to solutions that are flawed and hopeless to maintain.

That said, the main advantages with dynamic SQL are:

  1. Each generated query string contains only the conditions the user actually specified, and since each distinct string gets its own entry in the plan cache, you get plans tailored per parameter combination without recompiling on every call.
  2. It can gracefully handle complex requirements – for instance user-selected sorting on many columns – that are unmanageable with static SQL.

The disadvantages with dynamic SQL are:

  1. The code is longer and more difficult to read, test and maintain.
  2. Ownership chaining does not apply, so you must handle permissions one way or another.
  3. Used carelessly, it opens the door to SQL injection.

It may seem from this list that there are more disadvantages with dynamic SQL than advantages, but if you look more closely, you see that the lists are more or less complementary to the corresponding lists for static SQL. Static SQL with OPTION (RECOMPILE) is good for many everyday situations, but when the requirements of what the users should be able to do become too complex, or some of the searches are too frequent for recurring recompilation to be permissible, that is when you turn to dynamic SQL, fully aware that it comes with a price.

The Ways to Do Dynamic SQL

Whereas for static SQL there is a single basic pattern to work from, there is more than one way to implement solutions with dynamic SQL. There are three different choices of where to construct the SQL code:

  1. In a T‑SQL stored procedure.
  2. In a CLR stored procedure.
  3. Client-side.

And there are two choices for how to handle the input values from the users:

  1. Inline them into the query string, that is, do something like ' AND col = ' + convert(varchar, @value).
  2. Use parameterised queries with sp_executesql, so that the above fragment reads ' AND col = @value'.

Let's start with the latter, because this is not a matter of choice in the normal sense. You don't pick one or the other as a matter of taste, phase of the moon, roll of a dice or the whim of the day. It cannot be emphasised enough: your norm, and what you should use 99.9 % of the time, is parameterised queries. But there may be an occasional parameter where inlining is required for performance reasons. Once we have looked at the basic examples, we will look at such cases, as well as at why inlining is unacceptable in the general case.
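
If you have not worked with sp_executesql before, here is a minimal example of a parameterised query; the full pattern follows in search_orders_1 later in this chapter:

DECLARE @sql nvarchar(MAX) =
   N'SELECT o.OrderID FROM dbo.Orders o WHERE o.CustomerID = @custid'
EXEC sp_executesql @sql, N'@custid nchar(5)', @custid = N'ALFKI'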

On the other hand, the choice of where to construct the SQL code is to a large extent a matter of preference; none of the options is intrinsically better than the others. Many people use T‑SQL procedures, and I will show you two implementations of search_orders using dynamic SQL in T‑SQL, one with a parameterised query, and one with inlined parameter values. It may seem natural to use T‑SQL to build a T‑SQL query, but the process of building a query is a matter of scalar logic and string manipulation, for which traditional programming languages are better tools, and so I will also show implementations of search_orders as CLR stored procedures, one in C# and one in VB .NET.

However, once you have made the realisation that it's better to use a traditional language to build the query string, you might ask yourself why it should be a stored procedure at all, why not do it client-side altogether? You would not be wrong in asking that question. Implementing your dynamic search entirely in client code is a perfectly legit choice as long as it fits in with the general development pattern for your application. That is, if the rule for the application is that all data access should be in stored procedures, you should stick to that pattern.

That said, you may encounter very complex requirements for your dynamic search. Not only should users be able to select search and sort criteria, but they should also be able to select the exact set of columns to return, aggregate the data, freely choose the aggregate function (e.g. AVG, SUM) and choose what columns to group by. For something as ultra-dynamic as this, there is really only one reasonable choice: build the dynamic query client-side and use the full powers of object-oriented programming. If you were to do this in a stored procedure, you would need umpteen parameters to describe the conditions, and only defining the interface would take a week – and it would still be a kludge. Because the parameter list is the straitjacket of a stored procedure, be it T‑SQL or CLR.

You may think that you could do a hybrid and build some parts of the query client-side, sending for instance the WHERE clause as a parameter to a stored procedure. But this is the one thing you should never do. Either you build the query entirely client-side, or you build it entirely in a stored procedure. If you mix, you create a strongly coupled dependency between client and stored procedure. In all software systems, you want components to be as loosely coupled as possible. Say that the DBA makes some changes to the data model: columns are renamed or moved to different tables. If you have a stored procedure that accepts values for various search conditions, you only have to change the procedure; the client can remain unaffected and unaware of the change. But if you start to pass WHERE clauses, column lists and whatnot, you will have to change the client as well, and you have gained little with your stored procedure. In that case it's better to have it all in the client, fully aware that the client code has to be modified if the data model is changed.

I will not show any client-side examples of our order search in this article, since such an implementation should probably have an object-oriented interface with methods and properties quite different from the parameter interface of search_orders. Nevertheless, in the examples that follow, there are many points that are applicable also when you work client-side.

Permissions

With stored procedures using static SQL, you don't have to bother about permissions. As long as the procedure and the tables have the same owner, it is sufficient for the users to have rights to run the stored procedure because of a feature known as ownership chaining. But ownership chaining never applies to dynamic SQL. Even if you build the query string inside a stored procedure, the string is never part of the procedure itself, but constitutes its own owner-less scope.

If you build the query string client-side or in a CLR stored procedure, you have no choice but to grant the users SELECT permissions on the tables and views that appear in the query. This also includes any list-to-table functions you may use.

Depending on your application, and how it is set up, granting SELECT permissions may be entirely uncontroversial or absolutely impermissible. Thankfully, in the case you build the query string in a T‑SQL procedure, there are two alternatives to arrange for the permissions:

  1. Create a certificate and sign the procedure with this certificate. Create a user from the certificate, and grant this user the required SELECT permissions.
  2. Add the clause EXECUTE AS 'someuser' to the procedure. This should be a user created WITHOUT LOGIN and which has been granted the required SELECT permissions.

I will not go into further details on these techniques here. Rather I refer you to my article Granting Permissions through Stored Procedures, where I discuss these techniques in detail.
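
That said, just to give a flavour of the second alternative, a minimal sketch could look like this; the user name is my own, and you would need to grant SELECT on all tables the dynamic query can touch:

CREATE USER search_user WITHOUT LOGIN
GRANT SELECT ON dbo.Orders TO search_user
GRANT SELECT ON dbo.[Order Details] TO search_user
GRANT SELECT ON dbo.Customers TO search_user
GRANT SELECT ON dbo.Products TO search_user
go
CREATE PROCEDURE search_orders ... WITH EXECUTE AS 'search_user' AS
...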

Before you get any funny ideas, permit me to beat the dead horse of the hybrid solution a little more. You may think it sounds excellent: you could build the WHERE clause client-side and send it to a stored procedure which handles the permissions problem for you. But this is again an awfully bad idea. You would be exposing a stored procedure that accepts an arbitrary WHERE clause. How would you, in the stored procedure, verify that this WHERE clause does not violate any security rules about what the user may see?

Implementing search_orders with a Parameterised Query

After these proceedings, it's time to look at search_orders_1 which is a T‑SQL procedure that builds a parameterised SQL statement. Because of its length, I have numbered the rows in the right margin:

CREATE PROCEDURE search_orders_1                                   --  1
                 @orderid     int          = NULL,                 --  2
                 @fromdate    datetime     = NULL,                 --  3
                 @todate      datetime     = NULL,                 --  4
                 @minprice    money        = NULL,                 --  5
                 @maxprice    money        = NULL,                 --  6
                 @custid      nchar(5)     = NULL,                 --  7
                 @custname    nvarchar(40) = NULL,                 --  8
                 @city        nvarchar(15) = NULL,                 --  9
                 @region      nvarchar(15) = NULL,                 -- 10
                 @country     nvarchar(15) = NULL,                 -- 11
                 @prodid      int          = NULL,                 -- 12
                 @prodname    nvarchar(40) = NULL,                 -- 13
                 @employeestr varchar(MAX) = NULL,                 -- 14
                 @employeetbl intlist_tbltype READONLY,            -- 15
                 @debug       bit          = 0 AS                  -- 16
                                                                   -- 17
DECLARE @sql        nvarchar(MAX),                                 -- 18
        @paramlist  nvarchar(4000),                                -- 19
        @nl         char(2) = char(13) + char(10)                  -- 20
                                                                   -- 21
SELECT @sql =                                                      -- 22
    'SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,     -- 23
            c.CustomerID, c.CompanyName, c.Address, c.City,        -- 24
            c.Region,  c.PostalCode, c.Country, c.Phone,           -- 25
            p.ProductID, p.ProductName, p.UnitsInStock,            -- 26
            p.UnitsOnOrder, o.EmployeeID                           -- 27
     FROM   dbo.Orders o                                           -- 28
     JOIN   dbo.[Order Details] od ON o.OrderID = od.OrderID       -- 29
     JOIN   dbo.Customers c ON o.CustomerID = c.CustomerID         -- 30
     JOIN   dbo.Products p ON p.ProductID = od.ProductID           -- 31
     WHERE  1 = 1' + @nl                                           -- 32
                                                                   -- 33
IF @orderid IS NOT NULL                                            -- 34
   SELECT @sql += ' AND o.OrderID = @orderid' +                    -- 35
                  ' AND od.OrderID = @orderid' + @nl               -- 36
                                                                   -- 37
IF @fromdate IS NOT NULL                                           -- 38
   SELECT @sql += ' AND o.OrderDate >= @fromdate' + @nl            -- 39
                                                                   -- 40
IF @todate IS NOT NULL                                             -- 41
   SELECT @sql += ' AND o.OrderDate <= @todate'  + @nl             -- 42
                                                                   -- 43
IF @minprice IS NOT NULL                                           -- 44
   SELECT @sql += ' AND od.UnitPrice >= @minprice'  + @nl          -- 45
                                                                   -- 46
IF @maxprice IS NOT NULL                                           -- 47
   SELECT @sql += ' AND od.UnitPrice <= @maxprice'  + @nl          -- 48
                                                                   -- 49
IF @custid IS NOT NULL                                             -- 50
   SELECT @sql += ' AND o.CustomerID = @custid' +                  -- 51
                  ' AND c.CustomerID = @custid' + @nl              -- 52
                                                                   -- 53
IF @custname IS NOT NULL                                           -- 54
   SELECT @sql += ' AND c.CompanyName LIKE @custname + ''%''' + @nl -- 55
                                                                   -- 56
IF @city IS NOT NULL                                               -- 57
   SELECT @sql += ' AND c.City = @city' + @nl                      -- 58
                                                                   -- 59
IF @region IS NOT NULL                                             -- 60
   SELECT @sql += ' AND c.Region = @region' + @nl                  -- 61
                                                                   -- 62
IF @country IS NOT NULL                                            -- 63
   SELECT @sql += ' AND c.Country = @country' + @nl                -- 64
                                                                   -- 65
IF @prodid IS NOT NULL                                             -- 66
   SELECT @sql += ' AND od.ProductID = @prodid' +                  -- 67
                  ' AND p.ProductID = @prodid' + @nl               -- 68
                                                                   -- 69
IF @prodname IS NOT NULL                                           -- 70
   SELECT @sql += ' AND p.ProductName LIKE @prodname + ''%''' + @nl -- 71
                                                                   -- 72
IF @employeestr IS NOT NULL                                        -- 73
   SELECT @sql += ' AND o.EmployeeID IN' +                         -- 74
                   ' (SELECT number FROM intlist_to_tbl(@employeestr))' + @nl  -- 75
                                                                   -- 76
IF EXISTS (SELECT * FROM @employeetbl)                             -- 77
   SELECT @sql += ' AND o.EmployeeID IN (SELECT val FROM @employeetbl)' + @nl  -- 78
                                                                   -- 79
SELECT @sql += ' ORDER BY o.OrderID' + @nl                         -- 80
                                                                   -- 81
IF @debug = 1                                                      -- 82
   PRINT @sql                                                      -- 83
                                                                   -- 84
SELECT @paramlist = '@orderid     int,                             -- 85
                     @fromdate    datetime,                        -- 86
                     @todate      datetime,                        -- 87
                     @minprice    money,                           -- 88
                     @maxprice    money,                           -- 89
                     @custid      nchar(5),                        -- 90
                     @custname    nvarchar(40),                    -- 91
                     @city        nvarchar(15),                    -- 92
                     @region      nvarchar(15),                    -- 93
                     @country     nvarchar(15),                    -- 94
                     @prodid      int,                             -- 95
                     @prodname    nvarchar(40),                    -- 96
                     @employeestr varchar(MAX),                    -- 97
                     @employeetbl intlist_tbltype READONLY'        -- 98
                                                                   -- 99
EXEC sp_executesql @sql, @paramlist,                               -- 100
                   @orderid, @fromdate, @todate, @minprice,        -- 101
                   @maxprice,  @custid, @custname, @city, @region, -- 102
                   @country, @prodid, @prodname, @employeestr, @employeetbl    -- 103

Building the Query

On line 18, I declare the variable @sql which will hold my query string. The type should always be nvarchar(MAX). It should be MAX, so that you can fit any query in the variable. As for why it needs to be nvarchar, I will return to that. I'm ignoring the variable @paramlist for now. On line 20, I define the variable @nl which I set to the standard line-ending in Windows, CR-LF. While technically a variable, @nl is a constant in this procedure.

On lines 22-32, I compose the nucleus of the dynamic SQL query. That is, the query we will get when all input parameters are left out. Observe that I use two-part notation for the tables. You should always do this when you work with dynamic SQL for performance reasons. Exactly why, I will return to when I discuss caching and performance.

The condition WHERE 1 = 1 on line 32 is there so that all other conditions can be added as "AND something". I add @nl to the string, for reasons I will return to shortly.

On lines 34-72, I check all the single-valued search parameters. If a parameter is non-NULL, I add a condition for the corresponding column to the SQL string. I use the += operator which is a shortcut for @sql = @sql +. Note that if I want to include a quote in the query string, I need to double it, see for instance line 71. Again, I concatenate @nl after all conditions.

On lines 73-75, I handle the @employeestr parameter in the same manner as I did in search_orders_3. That is, I use the function intlist_to_tbl to crack the list into table format. You may have other ideas, but this is how you should do it as a first choice. As for alternatives, I will return to that later.

On lines 77-79, I handle the parameter @employeetbl, and this is perfectly straightforward. The only deviation is that I use EXISTS to see whether this parameter was specified rather than checking for NULL.

Finally, on line 80 I add the ORDER BY clause.

You may note that for parameters that match columns that appear in multiple tables (@orderid, @custid, @prodid), I add conditions against both tables. This is something I learnt very early in my SQL Server career as a way to help the optimizer. It mattered more in those days, when SQL Server did not have foreign-key constraints. Today, with proper foreign-key constraints (that have not been applied WITH NOCHECK), the optimizer can move the condition around even if you apply it to the "wrong" table. But I added it nevertheless, and it's not a bad habit, as you may have a denormalised database design with redundant columns and no foreign-key relation between them. Nevertheless, I did not do this in the procedures with static SQL. I felt that in those procedures, the extra conditions add extra noise, but this is less of an issue in code that builds a query string.

The Debug PRINT

On lines 82-83, I do something very important: if the parameter @debug is 1, I print the SQL string. This is one more of those things that cannot be emphasised enough: always include a parameter that permits you to print the SQL string. One of the distinct disadvantages of dynamic SQL is that you can happen to concatenate the strings incorrectly, leading to syntax errors, maybe only for some combinations of input parameters. Finding such an error by looking at the code that generates the dynamic SQL is hard, but once you see the SQL string, the error may be immediately apparent. For instance, a typical error is a missing space, leading to code that reads:

WHERE  1 = 1 AND o.OrderDate <= @todateAND p.ProductName LIKE @prodname

If you look closely at the procedure code, I have already taken precautions to avoid such goofs by having a leading space in most string literals, and also by adding @nl after each condition. However, that is not the main purpose of @nl. The reason I add @nl is to prevent the query string from becoming one single long line, which is very difficult to read when I have to look at it. Thus, not only should you keep the code that generates the SQL string tidy, you should also produce an SQL string that is reasonably tidy.

If the query string is very long, it may appear to be truncated in SQL Server Management Studio, because SSMS only prints the first 4000 characters. A workaround is to do:

IF @debug = 1
   SELECT @sql FOR XML PATH(''), TYPE

By default, SSMS displays the first 2 MB of a returned XML document, which hopefully should be sufficient. You will have to accept that characters that are special to XML, like <, are replaced by sequences such as &lt;.

Running the Query

On lines 85-98, I set up the parameter list, and on lines 100-103 I execute the query using sp_executesql. It is extremely important to understand this system procedure. This is the vehicle to run parameterised queries.

sp_executesql is a system procedure that takes two fixed parameters. The first parameter is a batch of SQL statements and the second parameter is a parameter list. These parameters must be of the type nvarchar; you cannot pass varchar. (And thus I have declared the parameters @sql and @paramlist accordingly.) The remaining parameters are the actual parameters passed to the parameters in the parameter list. They must match the data types in the parameter list in the same way as when you call a regular stored procedure. That is, implicit conversion applies.

The system procedure executes the batch using the parameters values you pass it. If this sounds a little abstract, you can think of sp_executesql this way:

CREATE PROCEDURE sp <parameter list> AS
  <SQL batch>
go
EXEC sp <actual parameter values>

That is, you create a nameless stored procedure and execute it in one go.
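To make this concrete, here is a minimal stand-alone call to sp_executesql; the query and values are only an illustration, not taken from search_orders_1:

DECLARE @sql       nvarchar(MAX)  = N'SELECT * FROM dbo.Orders WHERE OrderID = @orderid',
        @paramlist nvarchar(4000) = N'@orderid int'
EXEC sp_executesql @sql, @paramlist, @orderid = 11000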

In search_orders_1, the parameter names in the query string are the same as the parameter names in the surrounding stored procedure. But don't be deceived. They are physically separate, and I could have used completely different names and the procedure would still have worked. The key is that the dynamic SQL cannot see any variables in the calling procedure; the dynamic SQL is a procedure of its own.

You may note that the parameter list in @paramlist is static. That is, the set of parameters is always the same, even though some of them may not appear in the actual query string. While it would be possible to extend the parameter list as new non-NULL search parameters are encountered, that would only be unnecessarily complicated. There is no law in T‑SQL against having unused parameters.

For more information and examples with sp_executesql, please see the section on sp_executesql in my article The Curse and Blessings of Dynamic SQL.

Trying it out

Here are some test cases:

EXEC search_orders_1 @orderid = 11000
EXEC search_orders_1 @custid  = 'ALFKI'
EXEC search_orders_1 @prodid  = 76
EXEC search_orders_1 @prodid  = 76, @custid = 'RATTC'
EXEC search_orders_1 @fromdate = '19980205', @todate = '19980209'
EXEC search_orders_1 @city = N'Bräcke', @prodid = 76
EXEC search_orders_1 @employeestr = '402,109,207', @custid = 'PERTH'

DECLARE @tbl intlist_tbltype 
INSERT @tbl (val) VALUES(402), (109), (207)
EXEC search_orders_1 @employeetbl = @tbl, @custid = 'PERTH'

If you try these and inspect the query plans, you will see that the available indexes on the search columns are used with two exceptions: The index on Orders.EmployeeID is ignored, and on SQL 2008 and 2012, the index on Customers.City is not used. However, SQL 2014 uses this index. If you compare with the plans for search_orders_3, you will see that these are identical, except for the search on @custid alone.

I also encourage you to run the procedures with @debug = 1 to see the generated SQL.

Compilation and Caching

You have learnt that sp_executesql defines a nameless stored procedure and executes it directly. You may ask if this procedure is saved in the database. No, it is not. But, and this is the key, the query plan is saved in the cache. So the next time a user runs your search procedure with exactly the same set of search parameters, SQL Server will reuse the existing plan for that query.

That is, when you use static SQL and there are these three calls:

EXEC search_orders_3 @orderid = 11000
EXEC search_orders_3 @orderid = 11001
EXEC search_orders_3 @orderid = 11002

There are also three compilations, because nothing is cached and recompilation is forced every time. Whereas with:

EXEC search_orders_1 @orderid = 11000
EXEC search_orders_1 @orderid = 11001
EXEC search_orders_1 @orderid = 11002

There is a compilation the first time, and on the second and third calls, the existing plan is reused. But add one more parameter:

EXEC search_orders_3 @orderid = 11000, @prodid = 76

And this will be a new compilation and a new cache entry. The existing plan for the search on OrderID alone is unaffected.

There is one qualification to make here, and it is one we have already touched: the table names must be specified in two-part notation, that is, with schema and table. If the query has FROM Orders without schema, SQL Server needs to consider the possibility that there may be an Orders table in the default schema of the user. And even if there isn't one right now, it might appear later. Therefore, SQL Server needs to have different cache entries for users with different default schemas. Now, normally users have dbo as their default schema, in which case this is a non-issue, but this is nothing you should rely on.
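If you want to see the caching for yourself, you can peek into the plan cache. Here is a sketch; the LIKE filter is only one crude way to find the entries:

SELECT st.text, cp.usecounts
FROM   sys.dm_exec_cached_plans cp
CROSS  APPLY sys.dm_exec_sql_text(cp.plan_handle) st
WHERE  st.text LIKE '%dbo.Orders%'
  AND  st.text NOT LIKE '%dm_exec_cached_plans%'

Run one of the searches twice with the same set of parameters, and you should see usecounts tick up for the query rather than a new row appearing.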

Special Search Conditions

For static SQL, we looked at how to implement some special search conditions. Let's review these for dynamic SQL as well.

Columns with Filtered Indexes

This was nothing I brought up with static SQL, because this is a non-issue with OPTION (RECOMPILE). But with cached plans, you need to be aware of this. Say that there is a Status column in the Orders table for which there are four possible values: N – New order, P – In process, E – Error, C – Processing completed. 99 % of the rows have the value C, and there is a filtered index on the Status column:

CREATE INDEX status_ix ON Orders(Status) WHERE Status <> 'C'

Say now that the search has a parameter @status where the user can search for orders with a specific status. If you simply mimic the code above, and do:

IF @status IS NOT NULL
   SELECT @sql += ' AND o.Status = @status'

SQL Server will not use the filtered index, no matter what value the user passes, because it has to produce a plan that works with all parameter values, including @status = 'C'. You need to add some logic that appends an extra condition, so that the filtered index can be used:

IF @status IS NOT NULL
   SELECT @sql += ' AND o.Status = @status' +
                  CASE WHEN @status <> 'C' 
                       THEN ' AND o.Status <> ''C'''
                       ELSE ''
                  END

Here I assumed for simplicity that @status is a single-valued parameter, but it seems likely that in a real-world application @status would be multi-valued; that is, the user is able to select a number of status values he is interested in. You would need to perform the same kind of check and add the extra condition when the user's choice falls entirely within the filtered index (and you want it to be used), as in the sketch below.
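Here is a sketch of what that logic could look like. I assume a hypothetical comma-separated parameter @statuslist and a string version of the list-to-table function, here called strlist_to_tbl, returning the elements in a column str:

IF @statuslist IS NOT NULL
BEGIN
   SELECT @sql += ' AND o.Status IN (SELECT str FROM strlist_to_tbl(@statuslist))' + @nl
   -- If C is not among the requested statuses, the search falls entirely
   -- within the index filter, so repeat the filter condition to permit the index.
   IF charindex('C', @statuslist) = 0
      SELECT @sql += ' AND o.Status <> ''C''' + @nl
END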

Choice of Sort Order

It may seem that this could be done as simply as:

@sql += ' ORDER BY ' + @sortcol

This could easily handle multiple sort columns. That is, the client could pass a value like 'CustomerID, OrderID DESC'. This breaks the principle that the client should know nothing about the query. However, it is a little difficult to argue this point very strongly here. The syntax of the string is not very difficult. (And the assumption is that the application builds the string from the user's choices in a UI; the user does not write the call himself.) At first glance, some readers may think that the string above will result in an error about ambiguous column names. But since ORDER BY is evaluated after SELECT in an SQL query, you can use the column names defined in the SELECT list in the ORDER BY clause, and thus CustomerID and OrderID without prefix are legit here. Another argument against the client forming part of the query is that the data model may change. But these are columns in the result set, which the client needs to have knowledge of anyway to be able to bind them to columns in an output grid or similar. That is, if the DBA were to decide that OrderID should now be order_id, you would have to change the query to read:

SELECT o.order_id AS OrderID, ...

to avoid breaking the client.

There is however the issue of SQL injection. Maybe it is a web application. Maybe the sort parameters are passed in a URL. Maybe the web application connects as sa. (Bad! Bad! Bad!) And then some user out on the evil Internet passes '1 SHUTDOWN WITH NOWAIT'. That is, this use of @sortcol opens the door to SQL injection. You can easily prevent this with quotename():

@sql += ' ORDER BY ' + quotename(@sortcol)

quotename adds brackets around the value, doubling any right brackets there may be in it, so that you still have a legal quoted identifier. But now you can no longer easily accept multi-column sort conditions or ASC/DESC in a single parameter. You would have to parse @sortcol to put in quotename where it's needed. Which is not trivial at all.

So, after all, I think it is much better to use CASE to map the input value to a sort column:

SELECT @sql += ' ORDER BY ' + 
               CASE @sortcol WHEN 'OrderID'      THEN 'o.OrderID'
                             WHEN 'EmployeeID'   THEN 'o.EmployeeID'
                             WHEN 'ProductID'    THEN 'od.ProductID'
                             WHEN 'CustomerName' THEN 'c.CompanyName'
                             WHEN 'ProductName'  THEN 'p.ProductName'
                             ELSE 'o.OrderID'
               END + CASE @isdesc WHEN 0 THEN ' ASC' ELSE ' DESC' END

This is reminiscent of what we did in search_orders_3, but it is still a lot simpler, since we don't need different CASE expressions for different data types, but can use a single one. And it is also easy to handle a parameter to control ASC/DESC. If you want multiple sort columns, you will need to repeat the above (see the sketch after the next paragraph), but the code grows linearly and does not explode like it does for static SQL.

Observe that in the CASE for @sortcol I have an ELSE with a default sort order. Without the ELSE, the entire query string would become NULL if the application passes an unexpected value in @sortcol.
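To handle a second sort column, you simply repeat the pattern. A sketch, with hypothetical parameters @sortcol2 and @isdesc2:

IF @sortcol2 IS NOT NULL
   SELECT @sql += ', ' +
                  CASE @sortcol2 WHEN 'OrderID'    THEN 'o.OrderID'
                                 WHEN 'EmployeeID' THEN 'o.EmployeeID'
                                 ELSE 'o.OrderID'
                  END + CASE @isdesc2 WHEN 0 THEN ' ASC' ELSE ' DESC' END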

Optional Tables

The example with the @suppl_country parameter is of course trivial to handle with dynamic SQL:

IF @suppl_country IS NOT NULL
   SELECT @sql += ' AND EXISTS (SELECT *
                                FROM   dbo.Suppliers s
                                WHERE  s.SupplierID = p.SupplierID
                                  AND  s.Country    = @suppl_country)' + @nl

It is just another condition to add.

Alternate Tables

This is also something which is quite simple to handle with dynamic SQL. Say that we have the parameter @ishistoric. In this case, lines 28-29 in search_orders_1 would read:

FROM dbo.' + CASE @ishistoric
                  WHEN 0 THEN 'Orders'
                  WHEN 1 THEN 'HistoricOrders'
             END + ' o
JOIN dbo.' + CASE @ishistoric
                  WHEN 0 THEN '[Order Details]'
                  WHEN 1 THEN 'HistoricOrderDetails'
             END + ' od

You may get the idea that you could pass the table name from the application, but while that is almost acceptable for the sort column, it is entirely impermissible for table names. The DBA should be free to rename a table without affecting an application that only uses stored procedures. And what happens if the application passes a table name complete with schema, and you wrap the full name in quotename()?

No, always map the application input to a table name through CASE.

Using the CLR

Let's now look at how to implement search_orders in a CLR stored procedure. I've written two CLR procedures, search_orders_vb and search_orders_cs, that I will discuss in this section. As the code is fairly repetitive, I'm not including either of them in full here, but only highlight some important points. (These points also apply generally to the data-access part of a client-side implementation.)

Be aware that I will not go into any details on writing CLR stored procedures as such. If you have never worked with the CLR before, but are curious, I refer you to Books Online. At the end of this section there are instructions on how to create these two procedures in SQL Server.

The Parameter List

There is one deviation in the parameter list: the parameter @employeetbl is not there, since you cannot pass table-valued parameters to a CLR stored procedure. Thus, for CLR stored procedures, you will need to pass multi-valued parameters as a comma-separated string (or some other format like XML).

Setting up the Statement

This is how search_orders_cs starts off:

string Query;
SqlCommand Command = new SqlCommand();

Query = @"SELECT o.OrderID, o.OrderDate, od.UnitPrice, od.Quantity,
                 c.CustomerID, c.CompanyName, c.Address, c.City,
                 c.Region, c.PostalCode, c.Country, c.Phone,
                 p.ProductID, p.ProductName, p.UnitsInStock,
                 p.UnitsOnOrder
          FROM   dbo.Orders o
          JOIN   dbo.[Order Details] od ON o.OrderID = od.OrderID
          JOIN   dbo.Customers c ON o.CustomerID = c.CustomerID
          JOIN   dbo.Products p ON p.ProductID = od.ProductID
          WHERE  1 = 1 ";

As you can see this is very similar to search_orders_1, including the dbo prefix. The rule that you should use two-part notation to maximise query-plan reuse applies to CLR procedures as well.

Defining the Parameters

So this should be plain vanilla for anyone who has written the teeniest piece of data-access code with ADO .NET. Except that I have seen so many examples on forums where people inline the parameter values into the query string. And it is equally impermissible no matter whether you build the query in T‑SQL or some other language. Your queries should always be parameterised. (Except for the very few cases where inlining may be needed for performance.)

This is however not the place to give a full coverage of the SqlParameter class in .NET. If you have never heard of this class before (and you call yourself a .NET programmer), you will have to look at the examples and then study the class further in MSDN Library.

Here is how the @custid parameter is added in search_orders_cs:

if (! Custid.IsNull) {
   Query += " AND o.CustomerID = @custid" +
            " AND c.CustomerID = @custid";
   Command.Parameters.Add("@custid", SqlDbType.NChar, 5);
   Command.Parameters["@custid"].Value = Custid;
}

As in the T‑SQL example, the query string is extended with the conditions for the parameter in both Orders and Customers.

What is different from T‑SQL is how we define the parameter list and supply the values. In T‑SQL the parameter list is a string, which includes all possible parameters. When working with the CLR, we only define the parameters that actually are in use. The reason for this difference is entirely a matter of convenience. We define a parameter by adding it to the Parameters collection of the Command object. The Add method has a number of overloads, but I prefer the ones that take the parameter name and a type indicator from the SqlDbType enumeration. For parameters of the variable-length data types – char, varchar, nchar, nvarchar, binary and varbinary – I use an overload where I can also specify the length of the parameter. There is no means to pass precision or scale – used with the data types decimal, numeric, time, datetime2 and datetimeoffset – in the Add method, but the SqlParameter class has Precision and Scale properties which permit you to set these values after you have created the parameter.

Once the parameter is defined, I assign the value separately, although you could do it all on a single line if you feel like it:

Command.Parameters.Add("@custid", SqlDbType.NChar, 5).Value = Custid;

Here is the above in Visual Basic .NET:

If Not Custid.IsNull Then
   Query &= " AND o.CustomerID = @custid" & _
            " AND c.CustomerID = @custid" & VbCrLf
   Command.Parameters.Add("@custid", SqlDbType.NChar, 5)
   Command.Parameters("@custid").Value = Custid
End If

It's very similar to the C# example. Different operator for string concatenation, parentheses to address elements in the collection and no semicolons.

Don't Forget to Specify the Length!

There is one thing about the parameter definition that I would like to highlight:

Command.Parameters.Add("@custname", SqlDbType.NVarChar, 40);

I explicitly specify the length of the string parameter. ADO .NET permits you to leave out the length when you add the parameter. ADO .NET also supplies the method AddWithValue, which permits you to define a parameter and provide the value in a single call, whereupon ADO .NET guesses the data type. Do not fall into the trap of using these shortcuts! The reason these alternatives are bad is that when ADO .NET constructs the call to sp_executesql for you, it will use the length of the actual parameter value when it builds the parameter list. Thus, if one user enters Alfred, the parameter will be declared as:

@custname nvarchar(6)

But if another user enters Berglund, the parameter will be declared as

@custname nvarchar(8)

When SQL Server looks up a query in the cache, it hashes the query text and the parameter list and performs a lookup on that hash value. That is, differences in the parameter list will result in different cache entries and more compilations. Note here the difference to OPTION (RECOMPILE). With the latter you get a compilation every time, but the plans don't take up space in the cache. Here, every distinct parameter length produces a plan that does go into the cache. This leads to cache bloat, which under extreme circumstances can degrade the overall performance of the SQL Server instance.
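To see why, consider what the calls constructed by ADO .NET could look like in a trace. This is an abbreviated sketch, not actual trace output:

EXEC sp_executesql N'SELECT ... WHERE c.CompanyName LIKE @custname + ''%''',
                   N'@custname nvarchar(6)', @custname = N'Alfred'
EXEC sp_executesql N'SELECT ... WHERE c.CompanyName LIKE @custname + ''%''',
                   N'@custname nvarchar(8)', @custname = N'Berglund'

The query text is identical, but the parameter lists differ, so the hash values differ and each call gets its own cache entry.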

If you feel that you don't want to hardcode the length of the column in case it could change in the future, rather than leaving out the length, use the maximum length for the type, that is 8000 for char, varchar, binary and varbinary and 4000 for nchar and nvarchar.

Handling the Multi-Valued Parameter

I will have to admit that I was lazy and used the list-to-table function:

    if (! Employeestr.IsNull) {
       Query +=  " AND o.EmployeeID IN" +
                 " (SELECT number FROM intlist_to_tbl(@employeestr))";
       Command.Parameters.Add("@employeestr", SqlDbType.NVarChar, -1);
       Command.Parameters["@employeestr"].Value = Employeestr;
    }

While this works, the optimizer has no information about how many elements the list will return, and it will guess one. It would be a little prettier to use the CSV_splitter class that I present in my article Arrays and Lists in SQL Server 2008 to pass the value to a table-valued parameter, but I leave that as an exercise to the reader. That would permit SQL Server to know the number of elements when it builds the plan. As long as the lists are typically short, this is not likely to be any major issue.

Running the Query

Here is the code to run the query in C#:

using (SqlConnection Connection =
   new SqlConnection("context connection=true"))
{
  Connection.Open();

  if (Debug) {
     SqlContext.Pipe.Send(Query);
  }

  Command.CommandType = CommandType.Text;
  Command.CommandText = Query;
  Command.Connection  = Connection;
  SqlContext.Pipe.ExecuteAndSend(Command);
}

This is very much the standard way to run a query from a CLR procedure. You connect on the context connection, that is, the same connection you are already running on. SqlContext.Pipe.ExecuteAndSend runs the command and returns the result set to the client. SqlContext.Pipe.Send is how you do PRINT from a CLR procedure.

Loading the Examples

If you have any flavour of Visual Studio 2005 or later (including the Express editions), you can deploy search_orders_cs and search_orders_vb from Visual Studio. (But please don't ask me how to do it, Visual Studio just leaves me in a maze.)

Since the .NET Framework comes with Windows and includes compilers for the most common .NET languages, you can also load them without Visual Studio. First make sure that C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727 (or corresponding) is in your path. Then run from a command-line window:

csc /target:library search_orders_cs.cs
vbc /target:library search_orders_vb.vb

To load the DLLs into SQL Server, you can use load_clr_sp.sql. You will have to change the path in the CREATE ASSEMBLY command to where you placed the DLL. Note that the paths are as seen from SQL Server, so if you don't have SQL Server on your local machine, you will have to copy the DLLs to the SQL Server box, or specify a UNC path to your machine.

Trying it out

The same test cases as for search_orders_1:

EXEC search_orders_cs @orderid = 11000
EXEC search_orders_cs @custid  = 'ALFKI'
EXEC search_orders_cs @prodid  = 76
EXEC search_orders_cs @prodid  = 76, @custid = 'RATTC'
EXEC search_orders_cs @fromdate = '19980205', @todate = '19980209'
EXEC search_orders_cs @city = N'Bräcke', @prodid = 76
EXEC search_orders_cs @employeestr = '402,109,207', @custid = 'PERTH'

The query plans should be identical to search_orders_1, as the queries are the same. Hint: if you want to look at the query plans, you need to use Profiler, and get the Showplan XML event. For some reason, you don't see query plans for queries submitted from CLR code in Management Studio.

An Example with Unparameterised SQL

We will now turn to an example that does the forbidden: it inlines all parameters and does not use parameterised SQL. One reason I show you this example is pure deterrence: to make you see how much more difficult this is than parameterised SQL. But there are also situations when inlining an occasional parameter may be a good idea to resolve a performance problem. Thus, it may be good to know the technique if you need it, but I like to stress that this is not a step you should take lightly.

The name of the procedure for this example is search_orders_2. I don't include the procedure in full here, but only comment on certain parts.

General Notes

When building a non-parameterised query with the values inlined, you need to be very disciplined when you write your code. It's very easy to get lost in a maze of nested quotes. I often see people in SQL forums posting code like:

EXEC('SELECT col1, col2, ...
      FROM   ...
      WHERE  1 = 1' + CASE WHEN @par1 IS NOT NULL
                           THEN ' AND col = ''' + @par1 + '''' ELSE '' END + ...)

This is difficult to read and maintain, and if it goes wrong, you have no idea what SQL you are actually generating. search_orders_2 aggregates the SQL code into a variable, and there is a @debug parameter so that I can see the generated SQL code if needed. And to make the debug output easier to read, I use the variable @nl like I did in search_orders_1.

To run the code I use EXEC(), but it works equally well with sp_executesql without any parameter list. Again, keep in mind that the dynamic SQL is a separate scope and cannot refer to variables in the surrounding procedure.

Handling String Values

To understand the problem, we will start by looking at a parameter in search_orders_2 which is unique to this procedure: @city_bad, a bad implementation of the @city parameter that reflects what a naïve and inexperienced programmer may try:

SELECT @sql += ' AND c.City = ''' + @city_bad + '''' + @nl

This has a number of problems. One is the missing N for nvarchar. Try this:

EXEC search_orders_2 @city_bad = N'Łódź'
EXEC search_orders_2 @city_bad = N'Luleå'

While there are orders from customers in both Łódź (a Polish city) and Luleå (in Sweden), at least one of the procedure calls will come back empty-handed; exactly which depends on your collation. String literals preceded by N are nvarchar; those without are varchar, which can only hold characters from the code page of the collation, and other characters are replaced by fallback characters. You can see this by running this SELECT:

SELECT 'Łódź', 'Luleå'

Again, exactly what you see depends on your collation, but you may see Lódz, Lodz, Lulea, or even have question marks in some places.

That is, however, the small problem. Assume instead that we want to search for customers in the capital of Chad, N'Djamena:

EXEC search_orders_2 @city_bad = N'N''Djamena'

This ends with syntax errors:

Msg 102, Level 15, State 1, Line 11

Incorrect syntax near 'Djamena'.

Msg 105, Level 15, State 1, Line 11

Unclosed quotation mark after the character string ' ORDER BY o.OrderID'.

Not what you call a good user experience. And there is also, as I have said a few times, the issue of SQL injection. Try this:

EXEC search_orders_2 @city_bad = '''SHUTDOWN --'

As it happens, this will not shut down your server, but you will get the informational message The SHUTDOWN statement cannot be executed within a transaction or by a stored procedure. But the important point is: you were able to inject SQL code where there should have been a city name.

Let us now look at the proper way to inline a string parameter. We have already mentioned the function quotename(), but it is time to give it a closer look. This built-in function delimits a string with the delimiter(s) you specify, and, this is the important part: if the string includes the closing delimiter, this character is doubled. The default delimiter is [], and the main purpose of quotename is to quote identifiers when you generate queries from metadata, but you can use other delimiters as seen in these examples:

SELECT quotename('Order Details')                     -- => [Order Details]
SELECT quotename('Two o' + char(39) + 'clock', '''')  -- => 'Two o''clock'
SELECT quotename('abc{}def', '{')                     -- => {abc{}}def}

And using the second of these examples, we can now see how the @city parameter should be implemented:

IF @city IS NOT NULL
   SELECT @sql += ' AND c.City = N' + quotename(@city, '''') + @nl

Here I have added the N to mark a Unicode literal, and there is also @nl to make the debug output prettier. And I use quotename with a single quote (') as the delimiter. If you try these searches:

EXEC search_orders_2 @city = N'Łódź'
EXEC search_orders_2 @city = N'N''Djamena'
EXEC search_orders_2 @city = '''SHUTDOWN --'

You will find that the first two return lots of rows. The last does not return any rows – but nor does it attempt to shut down your SQL Server instance. (Credit for the idea of using quotename() this way goes to my MVP colleague Steve Kass.)

You should always use quotename when you inline string parameters in your dynamic searches. There is a caveat, though: quotename is designed for quoting object names, and the input parameter is nvarchar(128). If the input value is longer, you get NULL back. However, I find it extremely unlikely that there is a good case for inlining a string parameter that long. (In a dynamic search, I should hasten to add. There are other situations where you may need it, and in such a case you can use my function quotestring() from my Curse and Blessings article.)
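You can easily verify the limitation:

SELECT quotename(replicate('x', 128), '''')   -- A quoted 128-character string.
SELECT quotename(replicate('x', 129), '''')   -- NULL – the input is too long.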

No, we have not said all there is to say about inlining string parameters yet. Let's return to that N. You should include the N only when the type of the parameter is nvarchar or nchar. It should not be there if you are comparing the value with a varchar or char column. Because of the rules for data-type precedence in SQL Server, the (var)char column would be implicitly converted to n(var)char, and this affects how any index on the column can be used. If the column has a Windows collation, the index can still be used, but in a slower way. If the column has an SQL collation, the index is of no use at all, which is why that N can cause serious performance problems when applied incorrectly.

So as you see, inlining string parameters is anything but easy.

Datetime Parameters

Compared to string values, date/time values are easier to deal with, but only in a relative way. People often go wrong with these as well. In search_orders_2, I handle the @fromdate parameter this way:

SELECT @fromdatestr = convert(char(23), @fromdate, 126)
...
IF @fromdate IS NOT NULL
   SELECT @sql += ' AND o.OrderDate >= ' + quotename(@fromdatestr, '''') + @nl

The whole key is the format code 126. This results in a string like 2003-04-06T21:14:26.627. This is the format mandated by the ISO 8601 standard, and it is commonly used in XML. More importantly, it is one of the three formats for datetime literals in SQL Server whose interpretation does not depend on the settings for date format and language. If you don't understand what I'm talking about, try these statements:

SET DATEFORMAT mdy
SELECT convert(datetime, '02/07/09')
SET DATEFORMAT dmy
SELECT convert(datetime, '02/07/09')
go
SELECT convert(datetime, '2002-12-19')   -- Fails!
go
SET LANGUAGE Swedish
SELECT convert(datetime, 'Oct 12 2003')  -- Fails! (It's "Okt" in Swedish.)

If you always use code 126 to produce a datetime string for your dynamic SQL, you don't have to worry about this. (When it comes to the newer date/time data types, there are a few more safe formats, but there is still a lot of datetime out there, so stick with code 126.) As for using an extra variable, this is mainly a matter of keeping the code that produces the dynamic SQL clean.

Numeric Parameters

Numeric values are the easiest to inline, as you simply can apply convert:

IF @orderid IS NOT NULL
   SELECT @sql += ' AND o.OrderID = ' + convert(varchar(10), @orderid) +
                  ' AND od.OrderID = ' + convert(varchar(10), @orderid) + @nl

You do have to be careful, though, to make the string long enough to fit all possible values.

For some numeric parameters in the procedure, I use an intermediate string variable:

SELECT @minpricestr = convert(varchar(25), @minprice)
...
IF @minprice IS NOT NULL
   SELECT @sql += ' AND od.UnitPrice >= ' + @minpricestr + @nl

This is purely a matter of aesthetics.

If you get the idea to inline float or real values, you will need to be more careful, since with a casual use of convert you can lose precision. But it's a little difficult to see a case where you would need to do this in the first place.
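For the record, here is a sketch of the problem and one way out; style 2 for float gives scientific notation with 16 digits:

DECLARE @f float = 1e0 / 3e0
SELECT convert(varchar(30), @f)      -- 0.333333 – only six digits by default.
SELECT convert(varchar(30), @f, 2)   -- 3.333333333333333e-001 – 16 digits.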

Multi-valued Parameters

Let's now look at how the parameters @employeestr and @employeetbl should be handled. @employeestr is supposed to be a comma-separated string, so it sounds dirt simple:

IF @employeestr IS NOT NULL
   SELECT @sql += ' AND o.EmployeeID IN (' + @employeestr + ')' + @nl

But this would be a bad mistake, as it would open the code to SQL injection. And while it is kind of difficult to make a serious intrusion within the 15 characters of @city_bad, the data type of @employeestr is varchar(MAX), so there is all the space in the world for an intruder. But I pass my comma-separated list from SSRS, how could there be SQL injection? Answer: when you write your stored procedure, you should assume the worst. That is, a web application that runs as sa and sends data from a URL right into your parameter. Don't laugh, it happens. Make your procedure bullet-proof, and make no assumptions about the callers.

So here is how I handle the @employeestr parameter:

IF @employeestr IS NOT NULL
BEGIN
   SELECT @employeestr =
          (SELECT ltrim(str(number)) + ','
           FROM   intlist_to_tbl(@employeestr)
           FOR XML PATH(''))
   SELECT @employeestr = substring(@employeestr, 1, len(@employeestr) - 1)
END
...
IF @employeestr IS NOT NULL
   SELECT @sql += ' AND o.EmployeeID IN (' + @employeestr + ')' + @nl

That is, I unpack the string into table format with that list-to-table function. Then I produce a new comma-separated list using the FOR XML PATH syntax. The intended purpose of FOR XML PATH is to generate an XML document, but Microsoft designed it so that if you don't give any element name (that's what the empty string after PATH means), you get a concatenated list of the values from a single-column query. This results in a list with a trailing comma, which I remove in the subsequent SELECT. Then I can use the sanitised variable with IN, just as in the original example, and fully trust its contents. If @employeestr does not contain a comma-separated list of numbers as expected, this is likely to result in a conversion error inside intlist_to_tbl. Not very pretty, but it keeps the illegitimate input out.
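Here is a quick stand-alone check that the sanitising works, using the same function:

DECLARE @employeestr varchar(MAX) = '402, 109 ,207'
SELECT @employeestr =
       (SELECT ltrim(str(number)) + ','
        FROM   intlist_to_tbl(@employeestr)
        FOR XML PATH(''))
SELECT @employeestr = substring(@employeestr, 1, len(@employeestr) - 1)
SELECT @employeestr              -- => 402,109,207 – and nothing else.

Whereas input like '402); SHUTDOWN --' never makes it into the query string; it dies with a conversion error inside intlist_to_tbl.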

The same solution is used for the parameter @employeetbl (which is not directly accessible from the SQL string):

IF EXISTS (SELECT * FROM @employeetbl)
BEGIN
   SELECT @employeestr =
          (SELECT ltrim(str(val)) + ','
           FROM   @employeetbl
           FOR XML PATH(''))
END

Effects on Caching

I said previously that an advantage of using dynamic SQL over static SQL with OPTION (RECOMPILE) is that it reduces the amount of compilation, since the plan is cached. But this applies to parameterised SQL only. These two calls:

EXEC search_orders_1 @orderid = 11000
EXEC search_orders_1 @orderid = 21000

generate one query plan and one compilation. That is, the second call reuses the plan compiled by the first call. Whereas:

EXEC search_orders_2 @orderid = 11000
EXEC search_orders_2 @orderid = 21000

results in two query plans and two compilations. This is because when SQL Server caches the plan for a query string, it hashes the string exactly as it stands, so the smallest difference produces a different hash value, which is why the plan for o.OrderID = 11000 cannot be reused when the query instead reads o.OrderID = 21000. The only time a plan can be reused is when someone searches on exactly the same parameter values.

That is, if you inline all parameters, the behaviour is quite similar to static SQL with OPTION (RECOMPILE). Except that you also litter the cache with all those plans that are rarely or never reused. That's a serious waste of memory. Although, this is a statement that requires some qualification, because there are settings that change this behaviour.

Conclusion

I have now shown you a bad example, search_orders_2, which inlines all parameters, and I hope you have gathered why this is a really poor choice as a general solution. But as we shall see in the next section, there are situations where you may need to do this for a specific parameter.

When Caching Is Not What You Want

An advantage with parameterised dynamic SQL is that you get less compilation because plans can be reused. But caching is not always to your advantage. Earlier on in the article, we had this pair:

EXEC search_orders_3 @custid = 'ERNTC',
                     @fromdate = '19980218', @todate = '19980218'
EXEC search_orders_3 @custid = 'BOLSR',
                     @fromdate = '19960101', @todate = '19961231'

The choice of parameters is the same, but the profile is very different. ERNTC is the most active customer in Northgale with 591 orders in total, but the date interval is only one day. BOLSR, on the other hand, has only placed a single order, but the interval is the full year. There are indexes on both CustomerID and OrderDate. For the first call, the index on OrderDate looks like the best bet, whereas the index on CustomerID sounds more appealing for the second call. And, indeed, if you look at the query plans, this is what we get. (Sort of; the call for ERNTC uses both indexes and joins these.)

But what happens if we use search_orders_1 instead?

EXEC search_orders_1 @custid = 'ERNTC',
                     @fromdate = '19980218', @todate = '19980218'
EXEC search_orders_1 @custid = 'BOLSR',
                     @fromdate = '19960101', @todate = '19961231'

The plan is the same for both. When I tested, there was a yellow triangle on a hash-match operator in the execution plan for BOLSR, indicating that the hash table spilled to disk (which certainly is not good for performance).

In the following, I will discuss some tactics you can use when you work with a search that uses dynamic SQL and where the choice of plan is sensitive to the actual values.

OPTION (RECOMPILE)

For this particular case, OPTION (RECOMPILE) could be a simple way out. If the search has both @custid, @fromdate, and @todate, add OPTION (RECOMPILE):

IF @custid IS NOT NULL AND @fromdate IS NOT NULL AND @todate IS NOT NULL
   SELECT @sql += ' OPTION (RECOMPILE)' + @nl

More generally, when you identify that you have search parameters that both relate to indexed columns, and where the selectivity is very dependent on the actual parameter values, add OPTION (RECOMPILE), so that you always get the best plan you can get from the statistics.

Obviously, this strategy will add quite a few unnecessary compilations.

Changing the Query Text

As I discussed previously, SQL Server looks up queries in the cache by hashing the query text. This means that two queries with different text are different entries in the cache, even if they are logically equivalent. There are many ways to alter the query text, for instance you could do something like:

IF @fromdate IS NOT NULL AND @todate IS NOT NULL
BEGIN
   SELECT @sql = @sql + CASE WHEN @fromdate = @todate
                             THEN ''
                             WHEN datediff(DAY, @fromdate, @todate) <= 7
                             THEN ' AND 2 = 2 '
                             WHEN datediff(DAY, @fromdate, @todate) <= 30
                             THEN ' AND 3 = 3 '
                             ...

The advantage of changing the query text depending on the parameter values, over OPTION (RECOMPILE), is that you will not get as many compilations. But it also means that in some cases you will not run with the best plan, since predicting the exact breakpoint between different plans is hard. With some luck the damage is limited. Even if the plan for a single day is not the best for the span of a week, it may still yield acceptable performance. But you will have to know your data, and possibly tune as time goes by.

Could this technique also be used to deal with the fact that different customers have a very different number of orders? Probably not. Counting the number of orders for a customer before we construct the query is taking it too far in my opinion, and it could cost more than what you save at the other end. If you really wanted to do this, you would probably have to maintain a separate table with order counts per customer. You would not have to maintain this table in real time; it would be sufficient to update it nightly or weekly, as in the sketch below.
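If you were to go down that route, the supporting table could be as simple as this sketch (the names are hypothetical):

CREATE TABLE CustomerOrderCounts
       (CustomerID nchar(5) NOT NULL PRIMARY KEY,
        OrderCount int      NOT NULL)

-- Refreshed nightly from an Agent job:
TRUNCATE TABLE CustomerOrderCounts
INSERT CustomerOrderCounts (CustomerID, OrderCount)
   SELECT CustomerID, COUNT(*)
   FROM   dbo.Orders
   GROUP  BY CustomerID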

Inlining Some Values

Sometimes you can resolve this sort of problem by inlining a specific parameter into the query. Typically, this would be a parameter with a very skewed distribution. Say that the search includes a @status parameter, and there are only four possible values for Orders.Status, of which Completed accounts for 95 % of the values. A typical search may be for new orders, less than 1 % of the rows. This is a situation where it could make sense to inline the @status parameter. The four different values are really different searches.

You may recognize the situation from when I discussed filtered indexes, where I presented a somewhat different approach. I think both approaches are valid. The solution I presented for filtered indexes assumes that you know that there is a filtered index and that you also have some understanding of the distribution of the values. But maybe you don't know what the distribution among the small handful of values will be in production, or what indexes a DBA may add in the future. By inlining, you keep the doors open, and as long as it is only a matter of a small number of values, you will not produce that many extra plans.

What about the case where we had competing indexes like in the example with customer ID and a date interval? In this case, OPTION (RECOMPILE) works out better, since you avoid cache littering. Overall, inlining customer IDs on a general basis is likely to be a bad idea. But if you have five big-whiz customers that overtrump everything else, it may make sense to inline these customer IDs and parameterise the rest. The same applies to product IDs, if you have a very small number of huge-selling products and umpteen low-volume products. It's all about knowing your data.

Note: above I mentioned forced parameterisation. If your database is in this mode, inlining parameters is useless, since SQL Server will replace the constants with parameters anyway.

Index Hints and Other Hints

Sometimes index hints or query hints other than OPTION (RECOMPILE) can be useful. Returning to our example with ERNTC and BOLSR, we can make the observation that the index on CustomerID is always OK. It does not always give the best plan, but it never gives a really bad plan. Whereas, as I noted, the index on OrderDate resulted in hash spills for a long interval. So one strategy could be to do:

FROM   dbo.Orders o ' + CASE WHEN @custid IS NOT NULL AND
                                  (@fromdate IS NOT NULL OR
                                  @todate IS NOT NULL) 
                             THEN 'WITH (INDEX = CustomerID) '
                             ELSE ''
                         END

I cannot say that this is my favourite strategy. Overall, index hints are something you should use sparingly. Casually used, they can cause performance problems because you force the optimizer to use the completely wrong index.

A hint that makes a little more sense is the hint OPTIMIZE FOR. If you want plans to always be compiled with the big customers in mind, you could add

IF @custid IS NOT NULL
   SELECT @sql += ' OPTION (OPTIMIZE FOR (@custid = ''ERNTC''))'

Or if you want to discourage the optimizer from producing different plans depending on what it sniffs for the date parameters, you could add:

IF @fromdate IS NOT NULL AND @todate IS NOT NULL
   SELECT @sql += ' OPTION (OPTIMIZE FOR (@fromdate UNKNOWN, @todate UNKNOWN))'

The optimizer will now apply its standard assumption for a closed interval, which is 10 % or so, and typically too much to make the index interesting.

Conclusion

You have now learnt that there are two ways to implement dynamic search conditions: static SQL and dynamic SQL. Solutions with static SQL almost always use OPTION (RECOMPILE), except in very simple cases where the optimizer's use of startup filters can be good enough. You have also seen that solutions with static SQL are easy to implement as long as the requirements are moderate. Once the requirements increase in complexity, solutions with static SQL easily become unwieldy. A second disadvantage with solutions using OPTION (RECOMPILE) is that very frequent searches can incur an unwanted load on the system because of all the compilations.

More than half of the text in this article was taken up by the discussion on dynamic SQL. This is a good indication of the fact that dynamic SQL is more difficult to work with and requires more understanding from you as a programmer. But dynamic SQL has the advantage that when the requirements for what conditions to handle increase in diversity, the complexity of the code grows more linearly than with static SQL. Correctly used, dynamic SQL reduces the amount of resources needed for query compilation. A special issue with dynamic SQL that you must not forget is that you need to cater for permissions on the tables accessed by the dynamic SQL. Certificate signing is the best way to resolve this when direct SELECT permissions on the tables are not acceptable.

It is important to stress that you cannot simply apply the methods in this article on auto-pilot. You need to make your own judgements. And moreover, you need to test your queries, both for logic and for performance. Since you may have a multitude of parameter combinations, it may not be feasible to test all of them, but you should at least test all parameters in isolation, as well as the combinations you expect to be common. With dynamic SQL, you should also be careful to test each parameter in combination with some other parameter, because with dynamic SQL you can easily make a slip so that a certain combination results in a syntax error.

Finally, when testing for performance, you need a database of some size. The article was based on Northgale, and while larger than its source Northwind, it is still a very small database.

Feedback and Acknowledgements

All through the years of this article, and its predecessor, a lot of people have given valuable suggestions and input. My fellow SQL Server MVPs: Steve Kass, Marcello Poletti, Simon Sabin, Alejandro Mesa, Adam Machanic, and Umachandar Jaychandran. And all the rest of you: Mark Gordon, Phillipp Sumi, Thomas Joseph Olaes, Simon Hayes, Sloan Holliday, Travis Gan and Eric Garza. I would also like to give a special credit to Tony Rogerson, Steve Kass, Ivan Arjentinski and Graham Kent for their involvement with the bug with OPTION (RECOMPILE) in SQL 2008 RTM. My big thanks to all of you.

If you have questions or comments on the contents in the article, feel free to mail me at esquel@sommarskog.se. (And that most emphatically includes any spelling or grammar error that you spot!) If you are working with a specific problem and need help, you can mail me too. However, I would encourage you in such case to post your question on a public forum for SQL Server, as there are more people who can answer your questions. (And you may get your answer more rapidly!)

Revision History

2014-12-27
Not a total makeover, but a general overhaul in an attempt to make the presentation clearer. The introductions for static and dynamic SQL now both begin with a summary of advantages and disadvantages. Some significant additions to the content: ...and then there is a formatting makeover to adapt to my new style sheet.
2013-11-02
Travis Gan pointed out that in the section Performance: The Fine Print, there is a risk for parameter sniffing and I've added a short discussion on this. Also, corrected a silly error in the demo procedure search_orders_4. (The arguments to datediff were in the wrong order.)
2011-08-26
Håkan Borneland was kind to point out that my update of 2011-08-01 was not complete, but I still said that there was no service pack for SQL 2008 R2 in one place.
2011-08-01
Updated the article to reflect the release of Service Pack 1 for SQL 2008 R2.
2010-10-16
Service Pack 2 for SQL 2008 has now been released, and I have updated the section A Tale of Service Packs and Cumulative Updates accordingly.
2010-05-23
There is now a Cumulative Update for SQL 2008 R2 with the corrected behaviour of OPTION (RECOMPILE).
2010-05-13
Updated the article for SQL 2008 R2. Note that RTM-version of R2 has the old behaviour of OPTION (RECOMPILE).
2009-11-22
Republished the article, since CU5 of SQL 2008 SP1 restores the RTM behaviour of OPTION (RECOMPILE), but now without the bug.
2009-02-14
Pulled the article, since Microsoft had reverted to the old behaviour of OPTION (RECOMPILE) because of a bug.
2008-08-03
The first version of the article for SQL 2008. A lot of the material from the older version has been ripped out, and a new section on Alternate Key Lookup has been added. Static SQL is now covered before dynamic SQL.

Back to my home page.