The Short Version
An SQL text by Erland Sommarskog, SQL Server MVP. Latest revision 2023-03-11.
Copyright applies to this text.
This is a short article directed to readers with a limited experience of SQL Server programming that discusses how to handle a list of values delimited by commas or some other separator. There is an accompanying article, Arrays and Lists in SQL Server, The Long Version, which includes many more ways to crack lists into tables – and you could argue too many. This longer article is intended for an audience with a little more experience. Most likely, this short story should tell you all you need, but in a few places I will refer to the longer article for users with special needs.
Table of Contents
string_split – The Built-in Solution
Two Simple Multi-Statement Functions
What If You Cannot Use a Function?
Delimited Lists in a Table Column
I frequently see people asking in SQL forums why this does not work:
DECLARE @list varchar(23) = '1,2,3,4'
SELECT ... FROM tbl WHERE col IN (@list)
The answer is that it does work: just look at this:
CREATE TABLE #test (id int NOT NULL, col varchar(23) NOT NULL)
INSERT #test(id, col)
   VALUES(1, 'Something'), (2, '1,2,3,4'), (3, 'Anything')
DECLARE @list varchar(23) = '1,2,3,4'
SELECT id FROM #test WHERE col IN (@list)
The SELECT returns the row with id = 2, not the others.
People who ask why IN does not work have a misconception about IN. IN is not a function. IN is an operator and the expression
col IN (val1, val2, val3, ...)
is simply a shortcut for:
col = val1 OR col = val2 OR col = val3 OR ...
val1 etc. here can be table columns, variables or constants. The parser rewrites the IN expression to a list of ORs as soon as it sees it. (This explains why you get multiple error messages when you happen to misspell the name of the column to the left of IN.) There is no magical expansion of a variable's value. The value '1,2,3,4' means exactly that string, not a list of numbers.
Now you know why IN (@list) does not work as you hoped, but if you have a comma-separated list you still need to know how to work with it. That is what you will learn in this chapter.
The best approach in my opinion is to reconsider having a delimited list at all. After all, you are in a relational database, so why not use a table? That is, you should pass the data in a table-valued parameter (TVP) instead of a delimited list. If you have never used TVPs before, I have an article, Using Table-Valued Parameters in SQL Server and .NET, where I give a tutorial on passing TVPs from .NET to SQL Server. The article includes a detailed description of passing a comma-separated list to a TVP. You will find that it is astonishingly simple.
Unfortunately, not all environments support TVPs, so using a TVP is not always an option. In that case you need to split the list into table format, and that is what we will look at in the rest of this chapter.
string_split – The Built-in Solution

If you are on SQL 2016 or later, there is a very quick solution:
SELECT ... FROM tbl WHERE col IN (SELECT convert(int, value) FROM string_split('1,2,3,4', ','))
string_split is a built-in table-valued function that accepts two parameters. The first is a delimited list, and the second is the delimiter.
There are two forms of string_split. Used as in the example above, it returns a result set with a single column value. Take this query:
SELECT * FROM string_split ('a|b|v', '|')
You get back this result set:

value
-----
a
b
v
There are situations where you want to know the order of the elements in the list. For instance, you may have two lists that you want to keep in sync. (We will see an example of this at the end of the article.) To achieve this, you can add a third parameter which you specify as 1. This will add a second column, ordinal, to the result set. Here is an example:
SELECT * FROM string_split('a|b|v', '|', 1)
This is the output:

value ordinal
----- -------
a     1
b     2
v     3
Beware that this parameter is only available in SQL 2022 and later, not in SQL 2019 and earlier. It is also available in Azure SQL Database and Azure SQL Managed Instance.
While string_split certainly is useful, it has a couple of shortcomings, so it does not always meet your needs:

Say that the list has an empty element, for instance '1,2,,4'. What do you want to happen in this case? string_split will return an empty string, and when you apply the convert above, this will produce a 0, which is unlikely to be correct. You may prefer to get NULL back.

Furthermore, the delimiter must be a single character, so string_split is of no use if the elements are separated by a multi-character delimiter such as '<->'.
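If you would rather get NULL back for empty elements, a small workaround (a sketch, building on the convert example above) is to apply nullif before the conversion:

```sql
-- nullif turns the empty string into NULL, and convert(int, NULL) is NULL,
-- so empty elements become NULL instead of 0
SELECT convert(int, nullif(value, '')) AS n
FROM   string_split('1,2,,4', ',')
```

This returns 1, 2, NULL and 4 rather than 1, 2, 0 and 4.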
In the following sections we will look at alternatives to string_split.
Two Simple Multi-Statement Functions

If you search the web, there is no end of functions to split strings into table format. Here, I will present two simple functions that run on SQL 2008 or later, one for a list of integers and one for a list of strings. I should immediately warn you that these functions are not the most efficient and therefore not suitable if you have long lists with thousands of elements. But they are perfectly adequate if you are passing the contents of a multi-choice checkbox from a client where you rarely have as many as 50 elements.
I opted to share these functions because they are simple and you can easily adapt them if you want different behaviour with regards to the choices that I have made. In my long article, I describe methods that are faster, but they all require more setup than just a function.
Below is a function to split a delimited list of integers. The function accepts a parameter for the delimiter which can be up to 10 characters long. The function returns the list positions for the elements. An empty element is returned as NULL. If there is a non-numeric value in the list, there will be a conversion error.
CREATE FUNCTION intlist_to_tbl (@list nvarchar(MAX), @delim nvarchar(10))
   RETURNS @tbl TABLE (listpos int NOT NULL IDENTITY(1,1),
                       n       int NULL) AS
BEGIN
   DECLARE @pos      int = 1,
           @nextpos  int = 1,
           @valuelen int,
           @delimlen int = datalength(@delim) / 2

   WHILE @nextpos > 0
   BEGIN
      SELECT @nextpos = charindex(@delim COLLATE Czech_BIN2, @list, @pos)
      SELECT @valuelen = CASE WHEN @nextpos > 0 THEN @nextpos
                              ELSE len(@list) + 1
                         END - @pos
      INSERT @tbl (n)
         VALUES (convert(int, nullif(substring(@list, @pos, @valuelen), '')))
      SELECT @pos = @nextpos + @delimlen
   END
   RETURN
END
You are likely to be puzzled by the COLLATE clause. This is a small speed booster. By forcing a binary collation, we prevent SQL Server from employing the full Unicode rules when searching for the delimiter. This pays off when scanning long strings. Why Czech? The language does not matter here, so I just picked one with a short name.
And why datalength divided by 2 and not len? datalength returns the length in bytes, whence the division. len does not count trailing spaces, so it does not work if the delimiter is a space.
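To see the difference, consider a delimiter that consists of a single space:

```sql
-- datalength counts bytes (2 per nvarchar character); len ignores trailing spaces
SELECT datalength(N' ') / 2 AS delimlen_correct,   -- 1
       len(N' ')            AS delimlen_incorrect  -- 0
```

With len, the delimiter length would come out as 0 and the loop would never advance past a space delimiter.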
Here are two examples:
SELECT * FROM intlist_to_tbl('1,2,3, 677,7 , ,-1', ',')
SELECT * FROM intlist_to_tbl('1<->2<->3<-> 677<->7<-><->-1', '<->')
Since the values are the same in both lists, the output is the same:

listpos     n
----------- -----------
1           1
2           2
3           3
4           677
5           7
6           NULL
7           -1
Here is an example of how you would use it in a simple query:
SELECT ... FROM tbl WHERE col IN (SELECT n FROM intlist_to_tbl('1,2,3,4', ','))
If you find that you are only using comma-separated lists, you may grow tired of having to specify the delimiter every time. To that end, this wrapper can be handy:
CREATE FUNCTION intlisttotbl (@list nvarchar(MAX)) RETURNS TABLE AS
RETURN (SELECT listpos, n FROM intlist_to_tbl(@list, ','))
I leave it as an exercise to the reader to come up with a better name.
Here is a function for a list of strings. It accepts an input parameter of the type nvarchar(MAX), but the return table has both a varchar and an nvarchar column. I will return to why in a second. Like intlist_to_tbl it returns the list position. It trims leading and trailing spaces. In contrast to intlist_to_tbl, empty elements are returned as empty strings and not as NULL.
CREATE FUNCTION strlist_to_tbl (@list nvarchar(MAX), @delim nvarchar(10))
   RETURNS @tbl TABLE (listpos int            NOT NULL IDENTITY(1,1),
                       str     varchar(4000)  NOT NULL,
                       nstr    nvarchar(4000) NOT NULL) AS
BEGIN
   DECLARE @pos      int = 1,
           @nextpos  int = 1,
           @valuelen int,
           @nstr     nvarchar(4000),
           @delimlen int = datalength(@delim) / 2

   WHILE @nextpos > 0
   BEGIN
      SELECT @nextpos = charindex(@delim COLLATE Czech_BIN2, @list, @pos)
      SELECT @valuelen = CASE WHEN @nextpos > 0 THEN @nextpos
                              ELSE len(@list) + 1
                         END - @pos
      SELECT @nstr = ltrim(rtrim(substring(@list, @pos, @valuelen)))
      INSERT @tbl (str, nstr) VALUES (@nstr, @nstr)
      SELECT @pos = @nextpos + @delimlen
   END
   RETURN
END
Here are two examples:
SELECT * FROM strlist_to_tbl(N'Alpha (α) | Beta (β)|Gamma (γ)|Delta (δ)|', '|')
SELECT * FROM strlist_to_tbl(N'a///b///c///v///x', '///')
Here is the output:
listpos     str        nstr
----------- ---------- ----------
1           Alpha (a)  Alpha (α)
2           Beta (ß)   Beta (β)
3           Gamma (?)  Gamma (γ)
4           Delta (d)  Delta (δ)
5
listpos     str        nstr
----------- ---------- -----------
1           a          a
2           b          b
3           c          c
4           v          v
5           x          x
Note in the first result set that the Greek characters have been replaced by fallback characters in the str column. They are unchanged in the nstr column. (If you have a Greek or a UTF‑8 collation, the two columns will be identical, though.)
Here are two examples of using this function:
SELECT ... FROM tbl WHERE varcharcol IN (SELECT str FROM strlist_to_tbl('a,b,c', ','))
SELECT ... FROM tbl WHERE nvarcharcol IN (SELECT nstr FROM strlist_to_tbl('a,b,c', ','))
These examples illustrate why there are two columns. If you are going to use the list against a varchar column, you need to use the str column. This is important because of the type-conversion rules in SQL Server. If you mistakenly compare varcharcol to nstr, varcharcol will be converted to nvarchar, and this can render any index on varcharcol ineligible for the query, leading to a performance disaster as the table must be scanned. And conversely, if you have an nvarchar column, you need to compare it to the nvarchar value, since otherwise the result can be incorrect because of the character replacement in the conversion to varchar.
I would like to point out that these functions are by no means cast in stone; see them as suggestions. Feel free to modify them according to your preferences and needs.
What If You Cannot Use a Function?

If you are in the unfortunate situation that string_split does not work for you and you don't have the permission or authorisation to create functions, what can you do? One option is of course to incorporate the function body in your code, but that is not really appealing.
An alternative that is popular with some people is to convert the list into an XML document. This works on all versions from SQL 2005 and up:
DECLARE @list nvarchar(MAX) = '1<->99<->22<->33<->45',
        @xml  xml
SELECT @xml = '<x>' + replace(@list COLLATE Czech_BIN2, '<->', '</x><x>') + '</x>'
SELECT X.x.value('.', 'int') AS val
       --, row_number() OVER(ORDER BY X.x) AS listpos
FROM   @xml.nodes('/x/text()') X(x)
To give you an idea of what is going on, here is the resulting XML:

<x>1</x><x>99</x><x>22</x><x>33</x><x>45</x>
You can use the XML query in your main query directly, but it is probably easier to stick the result in a temp table and work from there.
As you can see, there is a listpos column in the query, but I have commented it out. This is because while it seems to give the desired result, it is to my knowledge not documented behaviour that you can rely on. That is, it could stop working at some point.
If you are on SQL 2016, SQL 2017 or SQL 2019, and you need the list position but you cannot write your own function, there is an option that is easier to use than XML, to wit JSON:
DECLARE @list nvarchar(MAX) = '1,99,22,33,45'
SELECT convert(int, [key]) + 1 AS listpos, convert(int, value) AS n
FROM   OPENJSON('[' + @list + ']')
That is, just wrap the list in brackets and off you go. Should you have a delimiter other than comma, you will need to replace that delimiter with a comma to adhere to the JSON syntax.
OPENJSON returns a result set with three columns, but only key and value are of interest to you. Both are nvarchar(4000), so you need to cast them to int. Note that the values in key are zero-based.
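As an example of the delimiter replacement, here is a sketch for a pipe-separated list:

```sql
DECLARE @list nvarchar(MAX) = '1|99|22|33|45'
-- Turn the pipes into commas so the bracketed string is valid JSON
SELECT convert(int, [key]) + 1 AS listpos, convert(int, value) AS n
FROM   OPENJSON('[' + replace(@list, '|', ',') + ']')
```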
In these examples, I used integer lists. I need to raise a word of warning if you are considering using XML or JSON for lists of strings. If the values are strictly alphanumeric, no sweat. But if there are characters that are special to XML or JSON, the method above will fall apart. It is possible to save the show with help of CDATA sections that protect special characters, as in this example I got from Yitzhak Khabinsky:
DECLARE @list nvarchar(MAX),
        @xml  xml
SET @list = 'Dog & [Pony],Always < then,Glenn & Co. > 100';
SELECT @xml = '<x><![CDATA[' +
              replace(@list COLLATE Czech_BIN2, ',', ']]></x><x><![CDATA[') +
              ']]></x>';
SELECT @xml
SELECT X.x.value('.', 'nvarchar(30)') AS val
       --, row_number() OVER(ORDER BY X.x) AS listpos
FROM   @xml.nodes('/x/text()') X(x);
The output is:
Dog & [Pony]
Always < then
Glenn & Co. > 100
If you feel that your head is starting to spin at this moment, you have my sympathy. Despite its complexity, it is probably the best solution when you cannot write a function. If you want an alternative, you can look at the CTE method that I describe in my longer article. This method can give you the list position in a guaranteed way.
When it comes to speed, XML and JSON are faster than the functions that I showed you in the previous section, and they should work well with lists of thousands of values. In particular, pay attention to the addition of the text() function in the .nodes method. Without it, shredding the XML takes about 50 % longer. (I owe this trick to Yitzhak Khabinsky.)
Amazingly enough, I still occasionally see people who use or propose dynamic SQL. That is, something like this:
SELECT @sql = 'SELECT ... FROM tbl WHERE col IN (' + @list + ')'
There are all sorts of problems here. Risk of SQL injection. It makes the code more difficult to read and maintain. (Just imagine that this is a large query spanning fifty lines that someone wrapped in dynamic SQL only because of the list.) Permissions can be a problem. It leads to cache littering. On top of that, performance is poor. Above I cautioned you that the functions I presented are not good for long lists – but they are certainly better than dynamic SQL. It takes SQL Server a long time to parse a long list of values for IN.
Do absolutely not use this!
Delimited Lists in a Table Column

Sometimes you may encounter a table column which holds a delimited list of values. This is an anti-pattern that appears to have become rather more popular over the years, despite the fact that relational databases are designed on the principle that a cell (that is, a column in a row) is supposed to hold one single atomic value. Storing delimited lists goes against that principle, and if you store data this way, you will have to pay a dear price in terms of performance and complex programming. The proper way is to store the data in the list in a child table.
Nevertheless, you may encounter a comma-separated list that someone else has designed. And even if you have the power to change the design, you still need to know how to handle it. Let's first get an example to work with:
CREATE TABLE orders (orderid    int          NOT NULL,
                     custid     int          NOT NULL,
                     orderdate  date         NOT NULL,
                     products   varchar(MAX) NOT NULL,
                     quantities varchar(MAX) NOT NULL,
                     prices     varchar(MAX) NOT NULL,
                     CONSTRAINT pk_orders PRIMARY KEY (orderid)
)
go
INSERT orders (orderid, custid, orderdate, products, quantities, prices)
   VALUES (1, 108, '20201215', 'A16769,B1234,B2679,DL123', '1,2,1,1', '100,123,9000,450'),
          (2, 985, '20201216', 'A16769,A8744,B1233,CBGB2,E98767', '3,4,1,1,7', '100,560,400,600,320'),
          (3, 254, '20201217', 'X5277', '19', '300')
go
SELECT * FROM orders
This is an unusually bad example with three comma-separated lists that are synchronised with each other. (Thankfully, I rarely see something this crazy in the wild!) To keep it simple, we first ignore the quantities and prices columns and run a query that lists the orders with one product per row:
SELECT o.orderid, o.custid, p.str AS prodid
FROM   orders o
CROSS  APPLY strlist_to_tbl(o.products, ',') AS p
ORDER  BY o.orderid, prodid
The key is the CROSS APPLY operator. APPLY is a kind of join operator. When you say A JOIN B, you add conditions with ON that correlate A and B, but B itself cannot refer to A. For instance, B cannot be a call to a table-valued function that takes a column from A as a parameter. But this is exactly what APPLY permits. On the other hand, there is no ON clause with APPLY, as the relation between A and B is expressed inside B. (B can also be a subquery.)
Here is the result set:
orderid custid prodid
----------- ----------- ---------
1 108 A16769
1 108 B1234
1 108 B2679
1 108 DL123
2 985 A16769
2 985 A8744
2 985 B1233
2 985 CBGB2
2 985 E98767
3 254 X5277
Note: There is also OUTER APPLY. The difference between CROSS APPLY and OUTER APPLY is outside the scope of this article, though.
The normal design is of course to have two tables, orders and orderdetails. Here is a script to create a new table and move the data in the products, quantities and prices columns to this new table:
CREATE TABLE orderdetails (orderid int         NOT NULL,
                           prodid  varchar(10) NOT NULL,
                           qty     int         NOT NULL,
                           price   int         NOT NULL,
                           CONSTRAINT pk_orderdetails PRIMARY KEY (orderid, prodid)
)

INSERT orderdetails (orderid, prodid, qty, price)
   SELECT o.orderid, p.str AS prodid, q.n AS qty, c.n AS price
   FROM   orders o
   CROSS  APPLY strlist_to_tbl(o.products, ',') AS p
   CROSS  APPLY intlist_to_tbl(o.quantities, ',') AS q
   CROSS  APPLY intlist_to_tbl(o.prices, ',') AS c
   WHERE  p.listpos = q.listpos
     AND  p.listpos = c.listpos

ALTER TABLE orders DROP COLUMN products, quantities, prices
To pair the values from the lists, we synchronise them on the listpos column.
Here is the same operation, but using string_split and making use of the third parameter to get the list position. As noted above, this requires SQL 2022 or later.
SELECT o.orderid, p.value AS prodid,
       convert(int, q.value) AS qty, convert(int, c.value) AS price
FROM   orders o
CROSS  APPLY string_split(o.products, ',', 1) AS p
CROSS  APPLY string_split(o.quantities, ',', 1) AS q
CROSS  APPLY string_split(o.prices, ',', 1) AS c
WHERE  p.ordinal = q.ordinal
  AND  p.ordinal = c.ordinal
Say now that you need to update one of the values in the list. Well, didn't I tell you: you need to change the design so that the delimited list becomes a child table? But, OK, you are stuck with the design, so what do you do? Answer: you will have to unpack the data into a temp table, perform your updates and then reconstruct the list(s). As I said, relational databases are not designed for this pattern.
Here is how you would rebuild a list if you are on SQL 2017 or later. For brevity, I only show how to build the products column. The other two are left as an exercise to the reader.
SELECT orderid, string_agg(prodid, ',') WITHIN GROUP (ORDER BY prodid)
FROM   orderdetails
GROUP  BY orderid
The string_agg function is an aggregate function like SUM or COUNT and it builds a concatenated list of all the input values delimited by the string you specify in the second parameter. The WITHIN GROUP clause permits you to specify the order of the list.
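Here is a minimal self-contained illustration of string_agg:

```sql
-- Aggregates the three values into one ordered, comma-separated string
SELECT string_agg(val, ',') WITHIN GROUP (ORDER BY val) AS list
FROM   (VALUES('b'), ('a'), ('c')) AS v(val)
-- Returns: a,b,c
```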
If you are on SQL 2016 or earlier, you can use FOR XML PATH which is a more roundabout way and the syntax is not very intuitive:
; WITH CTE AS (
   SELECT orderid, p.products.value('.', 'nvarchar(MAX)') AS products
   FROM   orders o
   CROSS  APPLY (SELECT od.prodid + ','
                 FROM   orderdetails od
                 WHERE  o.orderid = od.orderid
                 ORDER  BY od.prodid
                 FOR XML PATH(''), TYPE) AS p(products)
)
SELECT orderid, substring(products, 1, len(products) - 1)
FROM   CTE
I refrain from trying to explain how it works. Just try to mimic the pattern if you need this. Or redesign the table after all...
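To put the pieces together, here is a sketch of the unpack–update–rebuild cycle for the quantities column, assuming the orders table above (with its list columns still in place), the splitter functions from earlier, and SQL 2017 or later for string_agg. The names #details and qtylist are just for illustration:

```sql
-- 1. Unpack products and quantities into a temp table
SELECT o.orderid, p.listpos, p.str AS prodid, q.n AS qty
INTO   #details
FROM   orders o
CROSS  APPLY strlist_to_tbl(o.products, ',') AS p
CROSS  APPLY intlist_to_tbl(o.quantities, ',') AS q
WHERE  p.listpos = q.listpos

-- 2. Perform the desired change
UPDATE #details
SET    qty = 3
WHERE  orderid = 1 AND prodid = 'B1234'

-- 3. Rebuild the quantities column, preserving the original list order
UPDATE o
SET    quantities = d.qtylist
FROM   orders o
JOIN  (SELECT orderid,
              string_agg(qty, ',') WITHIN GROUP (ORDER BY listpos) AS qtylist
       FROM   #details
       GROUP  BY orderid) AS d ON o.orderid = d.orderid
```

Ordering the rebuilt list by listpos keeps the quantities in sync with the products column, which we did not touch.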
As long as you have only a handful of elements, the method you use to crack the list does not have any significant impact on performance. What is more important is how you get the values from the list into the rest of the query. For simplicity's sake, I have shown you examples like this:
SELECT ... FROM tbl WHERE col IN (SELECT n FROM intlist_to_tbl('1,2,3,4', ','))
However, the optimizer has a hard time coming up with the best plan, since it does not know much about what is coming out of the function. This gets more pronounced if the query is complex and includes a couple of joins and whatnots. This can result in poor performance, because the optimizer settles on a table scan where it should use an index or vice versa. This applies regardless of whether you use your own function, string_split or something with XML or JSON.
For this reason, I recommend that you unpack the list of values into a temp table and then use that temp table in your query like this:
CREATE TABLE #values (n int NOT NULL PRIMARY KEY)
INSERT #values(n)
   SELECT n FROM intlist_to_tbl('1,2,3,4', ',')
SELECT ... FROM tbl WHERE col IN (SELECT n FROM #values)
Because a temp table has statistics, the optimizer has knowledge about the actual values in the list, and therefore the chances are better that it will come up with a good plan.
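The same pattern works with string_split if you are on SQL 2016 or later:

```sql
CREATE TABLE #values (n int NOT NULL PRIMARY KEY)
INSERT #values(n)
   -- string_split returns strings, so convert to int before storing
   SELECT convert(int, value) FROM string_split('1,2,3,4', ',')
SELECT ... FROM tbl WHERE col IN (SELECT n FROM #values)
```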
Revision history: 2023-03-11 – Extended the section on string_split to describe the new third parameter that is now available in Azure SQL Database.