Using a Table of Numbers

(or a Table of Dates, Months etc)

An SQL text by Erland Sommarskog, SQL Server MVP. Most recent update 2024-10-31.
Copyright applies to this text. See here for font conventions used in this article.

Introduction

Beware! This article is directed to readers who need to work with legacy versions of SQL Server, that is SQL 2019 and earlier, or for other reasons need to support compatibility level 150 or lower. In SQL 2022, Microsoft introduced the function generate_series, which it is also available in Azure SQL Database and Azure SQL Managed Instance. With this function at hand, you don't need your own table of numbers. If you are lucky to be in that position, you should instead read the short story The Power of Generate_series.

In this short story, I will introduce the reader to the concept of a table of numbers (also called a "tally table" by some people). A table of numbers is a one-column table with all integer numbers from one (or zero) up to some limit, for instance one million. There are several query problems that can be solved with the help of such a table, so it is a good asset to add to your database. Many of these problems are related to date and time, so it can be useful to have a separate table of dates. There can also be situations that call for other tables of time units such as hours, months etc.

In the first chapter after this introduction, I first show how to create and fill up a table of numbers. This is followed by a couple of examples where this table comes in handy. Sometimes it may not be within your powers to create a table, and therefore I have a chapter where I discuss alternatives to use a permanent table. In the last chapter, I take a look at using tables of dates and other time units.

The examples in this article work with an demo database called NorthNumbers, and you can download a script to create it here. This database is cloned from Microsoft's old demo database Northwind, to which I have made a few alterations for the benefit of my examples. The database is small, around 100 MB in size.

In this article, I'm assuming that the reader is using at least SQL 2012, and I will not call out syntax that does not work on older releases.

Table of Contents

Introduction

Table of Numbers

Creating and Populating the Table

Show Data for All Dates

Finding Missing IDs

Picking Strings Apart

Exploding Values

Alternatives to a Full-Fledged Table

Recursive CTE

The Exploding CTEs

spt_values

Using the VALUES Clause

The Ultimate Solution – generate_series

Tables of Dates, Months, Hours etc

Table of Dates

Tables of Other Time-Related Entities

Conclusion

Revision History

Table of Numbers

In this chapter, we will first look at how to create a table of numbers, and the remaining sections includes examples of using that table.

Creating and Populating the Table

Here is a script to create a Numbers table.

CREATE TABLE Numbers (n int NOT NULL PRIMARY KEY);

WITH L0   AS (SELECT 1 AS c UNION ALL SELECT 1),
     L1   AS (SELECT 1 AS c FROM L0 AS A, L0 AS B),
     L2   AS (SELECT 1 AS c FROM L1 AS A, L1 AS B),
     L3   AS (SELECT 1 AS c FROM L2 AS A, L2 AS B),
     L4   AS (SELECT 1 AS c FROM L3 AS A, L3 AS B),
     L5   AS (SELECT 1 AS c FROM L4 AS A, L4 AS B),
     Nums AS (SELECT row_number() OVER(ORDER BY c) AS n FROM L5)
INSERT Numbers (n)
  SELECT n 
  FROM  Nums 
  WHERE n <= 1000000;

SELECT MIN(n) AS "min", MAX(n) AS "max", COUNT(*) AS "count" FROM Numbers;

The table itself is very simple with one single column. The list of CTEs with L0 to L5 and Nums is a bit bewildering, but I leave it uncommented for now, and I will return to it later. The last query serves to verify the result. This is what we see:

min         max         count

----------- ----------- -----------

1           1000000     1000000

This verifies that we have a contiguous range from one to one million. One million is just an arbitrary upper limit, and you can choose a different limit, if you wish. However, if you set the limit too low, you will get incorrect results if you use the table in a query where you need more numbers than you actually have in the table, and I will discuss this further in some of the examples.

The CTEs themselves are good for producing values up to 232, so you can pick any upper limit you like within the realm of an int. Observe though, that it will take quite some time to fill up the table to that maximum limit, and you will end up with a table of 3 GB in size. That may be way more numbers than you will ever need.

As for the lower limit, I have made the choice to start on one, but if you prefer it to start on zero, you can modify the query to do so.

To work with the NorthNumbers database, you don't need run the above, as it comes with a a Numbers table with one million rows (starting on one).

Show Data for All Dates

Say that we want to review the sales for NorthNumbers Traders for a period, for instance the month of December 1997. This is a query to do this:

DECLARE @startdate date  = '19971201',
        @enddate   date  = '19971231';

SELECT O.OrderDate, COUNT(*) AS [Count], SUM(OD.Amount) AS TotAmount
FROM   dbo.Orders O
JOIN   (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
        FROM   dbo.[Order Details] OD 
        GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID
WHERE  O.OrderDate BETWEEN @startdate AND @enddate
GROUP  BY O.OrderDate
ORDER  BY O.OrderDate;

This query returns 23 rows. But say now that the requirement is that all dates in the period should appear in the output, and not only those that actually have sales. In order to do this, you need something that spans the date dimension, and this can be achieved with help of the Numbers table. This table permits us to generate all days in the period with this CTE:

WITH dates AS (
   SELECT dateadd(DAY, n-1, @startdate) AS d
   FROM   dbo.Numbers
   WHERE  n BETWEEN 1 AND datediff(DAY, @startdate, @enddate) + 1
)

You can run the query on its own to verify that it returns all dates in December 1997 (with the two variables set as above).

Since Numbers starts on 1, the BETWEEN is a little superfluous, and we could let it suffice with:

WHERE  n <= datediff(DAY, @startdate, @enddate) + 1

However, as I discussed above, you may prefer to make the table zero-based, or some joker may even add negative numbers to it. Thus, BETWEEN serves as a safeguard.

Now that we have this CTE, we need to inject it into the rest of the query. It is the CTE that drives the query, as it provides the dates. Therefore, the CTE is what should appear directly after FROM, and to be sure that all dates are retained in the output, we should left-join it to the rest of the query. This leads us to:

DECLARE @startdate date  = '19971201',
        @enddate   date  = '19971231';

WITH dates AS (
   SELECT dateadd(DAY, n-1, @startdate) AS d
   FROM   dbo.Numbers
   WHERE  n BETWEEN 1 AND datediff(DAY, @startdate, @enddate) + 1
)
SELECT d.d AS OrderDate, COUNT(O.OrderID) AS [Count], 
       isnull(SUM(OD.Amount), 0) AS TotAmount
FROM   dates d
LEFT   JOIN (dbo.Orders O
             JOIN   (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
                     FROM   dbo.[Order Details] OD 
                     GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID)
    ON O.OrderDate = d.d
GROUP  BY d.d
ORDER  BY d.d;

This form of nested joins may be new to you. We use parentheses to mark that Orders and the aggregation from Order Details should logically be joined first, before being left-joined to the dates CTE. (The actual physical join order is as always up to the optimizer.)

Note two changes in the SELECT list: I have changed COUNT(*) to COUNT(O.OrderID). This is required to get a count of zero for the days with no orders. COUNT(*) counts all rows, whereas COUNT(col) only counts rows with a non-NULL value in col. I have also wrapped the sum of the amounts in isnull, so that I get zero rather than NULL in the output.

We will revisit this example a few more times in this article.

Finding Missing IDs

Say that you have a table where the business rules mandate the IDs to be contiguous. Every once in a while, you may want to check that there are no gaps to verify that the code the generates the IDs is working properly. Here is a simple-minded query to find if there are any IDs missing in the Orders table in NorthNumbers:

SELECT n.n AS MissingID
FROM   dbo.Numbers n
WHERE  n.n BETWEEN (SELECT MIN(O.OrderID) FROM dbo.Orders O) AND
                   (SELECT MAX(O.OrderID) FROM dbo.Orders O)
  AND  NOT EXISTS (SELECT *
                   FROM   dbo.Orders O
                   WHERE  O.OrderID = n.n);

It returns four ids: 10319, 10320, 10550 and 11064.

This works for our small Orders table with only 826 rows, but very many real-world tables have far more rows than you would ever have in your Numbers table. I included this example here to highlight that you need to be aware of this risk. Every time you use Numbers, you should ask yourself: can I run out of numbers?

There is a better solution using the LAG or LEAD functions. These functions return a value from the previous (LAG) or the next (LEAD) row in the result set. If we only want to find the start and the end of the gaps, we can do:

; WITH IdAndNext AS (
   SELECT OrderID, LEAD(OrderID) OVER(ORDER BY OrderID) AS NextOrderID
   FROM   dbo.Orders
)
SELECT OrderID + 1 AS StartRange, NextOrderID - 1 AS EndRange
FROM   IdAndNext
WHERE  NextOrderID - OrderID > 1
ORDER  BY StartRange;

This query uses LEAD. I leave as an exercise to the reader to rewrite it to use LAG instead.

To get a listing of all missing IDs, we need something to produce all values in a gap, and the Numbers table is a perfect match here:

; WITH IdAndNext AS (
   SELECT OrderID, LEAD(OrderID) OVER(ORDER BY OrderID) AS NextOrderID
   FROM   dbo.Orders
)
SELECT I.OrderID + n.n AS MissingID
FROM   IdAndNext I
JOIN   dbo.Numbers n ON n.n BETWEEN 1 AND I.NextOrderID - I.OrderID - 1
WHERE  I.NextOrderID - I.OrderID > 1
ORDER  BY I.OrderID;

For each gap we only need as many numbers as the gap is wide. It seems unlikely that we would run out of numbers here.

Picking Strings Apart

Sometimes you encounter problems where you need to loop over all characters in a string. Here is a first example that computes the frequency of all letters in the column Notes in the Employees table.

WITH chars AS (
  SELECT lower(substring(E.Notes, n.n, 1)) COLLATE Latin1_General_100_CI_AS ch
  FROM   dbo.Employees E
  JOIN   Numbers n ON n.n BETWEEN 1 AND len(E.Notes)
)
SELECT ch, COUNT(*) AS charfreq
FROM   chars
WHERE  ch BETWEEN 'a' AND 'z'
GROUP  BY ch
ORDER  BY charfreq DESC;

The trick is to join Employees with Numbers to get as many numbers as there are characters in the string.

Note: the purpose of the COLLATE clause is to ensure that a and z are really the first and last letters so that the BETWEEN operator works as intended. The alphabets of some languages (and thus collations) have letters that come after z. Swedish is one example of this.

Here is a more practical application. In the Customers table there is a column OrganisationNo. This is intended to be a Swedish organisation number for a juridical person. This is a 10-digit string, where the last digit is a check digit which is computed like this:

  1. The digits in the odd positions have a weight of 2, whereas digits in the even positions have a weight of 1.
  2. For each digit we compute a product which is digit * weight.
  3. From this product we compute a term which is the sum of the digits in the product. That is, if the digit is 7 and the weight is 2, the result is 5, as 7*2=14 and 1+4=5.
  4. We compute the sum of all the terms and keep only the last digit of that sum.
  5. Unless this last digit is 0, we subtract this last digit from 10. This gives us our check digit.

Here is a query that uses Numbers to validate the check digits in the Customers table:

WITH products AS (
   SELECT C.OrganisationNo, 
          IIF(n.n % 2 = 1, 2, 1) * 
              try_cast(substring(C.OrganisationNo, n.n, 1) AS int) AS product
   FROM   dbo.Customers C
   CROSS  JOIN (SELECT n FROM dbo.Numbers WHERE n BETWEEN 1 AND 9) AS n
   WHERE  len(C.OrganisationNo) = 10
     AND  C.OrganisationNo NOT LIKE '%[^0-9]%'
), checkdigits AS (
   SELECT OrganisationNo, 
         (10 - SUM(product / 10 + product % 10) % 10) % 10 AS checkdigit
   FROM   products
   GROUP  BY OrganisationNo
)
SELECT OrganisationNo, checkdigit
FROM   checkdigits
WHERE  substring(OrganisationNo, 10, 1) <> checkdigit;

In the products CTE, I perform step 1 and 2 above. I also filter out any values that have illegal format, that is, the wrong number of characters or strings that include non-digits. I still need to use try_cast in the SELECT list to avoid conversion errors for non-digits, as SQL Server may prefer to perform the filtering after doing the computation in the SELECT list. (As it happens, there are no illegal values in the OrganisationNo column, but as this is a general principle, I include it as a token of best practice.)

The next CTE, checkdigits, performs the steps 3 to 5 in the list above. The final query selects the rows with incorrect check digits. In total, there are six of them in NorthNumbers.

We will revisit the computation of check digits when we look at alternatives of using a Numbers table.

Here is another example on this theme. Say that you want to retrieve the orders for a number of customers during a certain date interval. The customer IDs in NorthNumbers are five-character strings, so you decide to pass them as a string of concatenated values with fixed length. You extract the values from this string with help of the Numbers table:

CREATE PROCEDURE get_orders @customers nvarchar(500),
                            @startdate date,
                            @enddate   date AS
   SELECT O.OrderID, O.CustomerID, C.CompanyName, O.OrderDate, O.ShippedDate,
          OD.Amount + O.Freight AS TotalAmount
   FROM   dbo.Orders O
   JOIN   dbo.Customers C ON O.CustomerID = C.CustomerID
   JOIN  (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
          FROM   dbo.[Order Details]
          GROUP  BY OrderID) AS OD ON O.OrderID = OD.OrderID
   WHERE  O.OrderDate BETWEEN @startdate AND @enddate
     AND  EXISTS (SELECT *
                  FROM   dbo.Numbers n
                  WHERE  n.n BETWEEN 1 AND len(@customers) / 5
                    AND  O.CustomerID = substring(@customers, (n.n - 1) * 5 + 1, 5));
go
EXEC get_orders 'RATTCFOLIGRICARLAZYK', '19970301', '19970331';

It is more common, though, to have numeric ids. And while it is possible to create a fixed-length strings from numeric ids, it is very popular to pass them as a comma-separated list. You can crack such a list into table format with help of the Numbers table too, albeit it is a little more complex. Here is a function do this, together with some examples:

CREATE FUNCTION split_me(@param nvarchar(MAX),
                         @delim varchar(10))
RETURNS TABLE AS
RETURN(SELECT try_cast(substring(@param, n, 
                       charindex(@delim, @param + @delim, n) - n) AS int) AS Value,
              row_number() OVER(ORDER BY n) AS Position
       FROM   dbo.Numbers
       WHERE  n BETWEEN 1 AND len(@param) + len(@delim) - 1
         AND  substring(@delim + @param, n, len(@delim)) = @delim);
go
SELECT Value, Position FROM dbo.split_me('12222,7,59644,19,1', ','); 
SELECT Value, Position FROM dbo.split_me('12222<->7<->59644<->19<->1', '<->');

You may know that starting with SQL 2016, SQL Server comes with a built-in function string_split for this purpose. Unfortunately, this function has several shortcomings. For instance, you cannot get the position of the elements in the list, and it only supports a single-character delimiter. The function split_me above overcomes both these issues. This version is also tailored to handle integer lists, so that you don't need to litter the queries with convert or cast.

Note: in SQL 2022, string_split accepts a third parameter with you can use the request the list position. But since you are reading this article, you may not be using SQL 2022. :-(

What the function does, logically, is to loop over the string to see if a delimiter starts in position n. @delim is prepended to @param, so that we also get a hit for the first element in the list. The value starts in position n and lasts until we find the next delimiter. We here take benefit of that charindex has a third parameter to specify the starting position for the search. In the call to charindex, we append @delim to @param, so that we also get a hit for the last element in the list. The extraction is wrapped in try_cast, so that any non-numeric value results in NULL rather than a conversion error.

While this function works and is usable, there are a few possible improvements that are outside the scope for this article. In my article Arrays and Lists in SQL Server, I discuss all sorts of methods to expand a list into table format, and one chapter is entitled Using a Table of Numbers where I discuss some enhancements.

Exploding Values

The final example we will look at may seem esoteric to some, but it is drawn from something I actually developed for a client.

In the Customers table in NorthNumbers there is a column BonusPoints. NorthNumbers Traders have decided to conduct a lottery among their customers, and the rule is that each bonus point equates to a lottery ticket. (And thus a customer may win more than one prize.) Winning a prize means that you lose one bonus point.

The task is to write a query which generates the ten lucky winners. The trick is join the Customers table with Numbers, so that we get one row for every bonus point. Then we apply the row_number function, ordering by newid() which is good for generating a random order, and then we pick the first ten of these. Finally, we deduct the bonus points in a separate query.

DROP TABLE IF EXISTS #winners
CREATE TABLE #winners (CustomerID    nchar(5)      NOT NULL,
                       CompanyName   nvarchar(40)  NOT NULL,
                       ContactName   nvarchar(40)  NOT NULL,
                       PrizeNo       int           NOT NULL PRIMARY KEY
);

WITH numbering AS (
  SELECT C.CustomerID, C.CompanyName, C.ContactName, C.BonusPoints,
         row_number() OVER (ORDER BY newid()) AS PrizeNo
  FROM   dbo.Customers C
  JOIN   dbo.Numbers n ON n.n BETWEEN 1 AND C.BonusPoints
)
INSERT #winners(CustomerID, CompanyName, ContactName, PrizeNo)
  SELECT CustomerID, CompanyName, ContactName, PrizeNo
  FROM   numbering
  WHERE  PrizeNo <= 10;

SELECT * FROM #winners ORDER BY PrizeNo;

UPDATE dbo.Customers
SET    BonusPoints -= w.cnt
FROM   dbo.Customers C
JOIN   (SELECT CustomerID, COUNT(*) AS cnt
        FROM   #winners
        GROUP  BY CustomerID) AS w ON C.CustomerID = w.CustomerID;

Alternatives to a Full-Fledged Table

You may be in the unfortunate situation where you for one reason or another cannot add a Numbers table to the database you work in. In this chapter, we will explore alternatives to using a fixed Numbers table.

Recursive CTE

This is not really the most elegant solution, nor is it the most efficient. But it is simple, and it is OK if you only need, say, less than a few hundred numbers. Here is a revised version of the query to show order data for all dates:

DECLARE @startdate date  = '19971201',
        @enddate   date  = '19971231';

WITH dates AS (
   SELECT @startdate AS d
   UNION ALL
   SELECT dateadd(DAY, 1, d)
   FROM   dates
   WHERE  d < @enddate
)
SELECT d.d AS OrderDate, COUNT(O.OrderID) AS [Count], 
       isnull(SUM(OD.Amount), 0) AS TotAmount
FROM   dates d
LEFT   JOIN (dbo.Orders O
             JOIN   (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
                     FROM   dbo.[Order Details] OD 
                     GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID)
    ON O.OrderDate = d.d
GROUP  BY d.d
ORDER  BY d.d
OPTION (MAXRECURSION 0);

The recursive CTE simply increments the previous value by 1 and keeps on going that way until we reach @enddate and thus we get a row for each date in the interval.

Note the hint OPTION (MAXRECURSION 0) at the end of the query. You should always include this hint when you use a recursive CTE to generate numbers or dates like this. By default, a recursive CTE raises an error if there are more than 100 recursions. The intended use case for recursive CTEs is to unwind tree structures, and 100 levels in a tree is a lot. But when you use a recursive CTE to generate numbers (or dates as here), you may need more than 100 numbers. MAXRECURSION 0 means that you permit any number of recursions.

The Exploding CTEs

You may recall the CTEs we used to fill the Numbers table. Let's take a closer look at them:

WITH L0   AS (SELECT 1 AS c UNION ALL SELECT 1),          -- 2 rows.
     L1   AS (SELECT 1 AS c FROM L0 AS A, L0 AS B),       -- 4 rows.
     L2   AS (SELECT 1 AS c FROM L1 AS A, L1 AS B),       -- 16 rows.
     L3   AS (SELECT 1 AS c FROM L2 AS A, L2 AS B),       -- 256 rows
     L4   AS (SELECT 1 AS c FROM L3 AS A, L3 AS B),       -- 65536 rows.
     L5   AS (SELECT 1 AS c FROM L4 AS A, L4 AS B),       -- 2^32 rows.
     Nums AS (SELECT row_number() OVER(ORDER BY c) AS n FROM L5)

As the comments indicate, each CTE from L1 to L5, squares the number of rows from the previous CTE. Then the final CTE uses row_number to produce the numbers. I did not come up with this query myself, but I got it from my friend and long-time SQL Server MVP Itzik Ben-Gan.

If it is within your powers to create functions, you can package the CTE in a function:

CREATE FUNCTION dbo.fnNumbers(@from int, @to int) 
RETURNS @n TABLE (n int NOT NULL PRIMARY KEY) AS 
BEGIN
   WITH L0   AS (SELECT 1 AS c UNION ALL SELECT 1),
        L1   AS (SELECT 1 AS c FROM L0 AS A, L0 AS B),
        L2   AS (SELECT 1 AS c FROM L1 AS A, L1 AS B),
        L3   AS (SELECT 1 AS c FROM L2 AS A, L2 AS B),
        L4   AS (SELECT 1 AS c FROM L3 AS A, L3 AS B),
        L5   AS (SELECT 1 AS c FROM L4 AS A, L4 AS B),
        Nums AS (SELECT row_number() OVER(ORDER BY c) AS n FROM L5)
   INSERT @n (n)
      SELECT n 
      FROM   Nums
      WHERE  n BETWEEN @from AND @to;
   RETURN
END;

Here is an example of using the function, a variation of the simple-minded query to look for gaps in the order IDs:

SELECT n.n AS MissingID
FROM   dbo.fnNumbers((SELECT MIN(O.OrderID) FROM dbo.Orders O),
                     (SELECT MAX(O.OrderID) FROM dbo.Orders O)) AS n
WHERE  NOT EXISTS (SELECT *
                   FROM   dbo.Orders O
                   WHERE  O.OrderID = n.n);

One advantage with using fnNumbers over the Numbers table is that you don't have to worry about running out of numbers, but the above would work with a hundred-million row table as well. The execution time may be a little longer, though.

Here is a version of the query based on LEAD to find the gaps, based on fnNumbers:

; WITH IdAndNext AS (
   SELECT OrderID, LEAD(OrderID) OVER(ORDER BY OrderID) AS NextOrderID
   FROM   dbo.Orders
)
SELECT I.OrderID + n.n AS MissingID
FROM   IdAndNext I
CROSS APPLY fnNumbers(1, I.NextOrderID - I.OrderID - 1) n
WHERE  I.NextOrderID - I.OrderID > 1
ORDER  BY I.OrderID;

Note that I use CROSS APPLY to invoke the function.

The observant reader may have noticed that the above is a multi-statement function. Normally, if you put a single SELECT statement into a function, you make it an inline table function, as this is normally more efficient. But this is an exception. The optimizer has very little understanding of all those cross joins in the CTEs and may go off a tangent to a compose a execution plan where it actually produces four milliard numbers. I first wrote fnNumbers as an inline function which I tested in the queries above. I killed both when they had been running for a minute. When I changed fnNumbers to be multi-statement, both queries were instant. (As they should be in this small database.)

If you are not permitted to create functions, you may be tempted to use the CTE above directly in a query. The same caveat applies as with the inline function: the performance may be completely abysmal. It may be better to insert the numbers you need into a temp table.

spt_values

SQL Server actually comes with a built-in numbers table, albeit not a very big one. spt_values is an undocumented table in the master database that holds various constants that are used by system procedures. All these values are divided into different "types", and type = P is a series of numbers from 0 to 2047, as you can view with this query:

SELECT * FROM master.dbo.spt_values WHERE type = 'P';

If you know that you don't need as many as 2000 numbers, you could use spt_values, at least for a quick thing. Keep in mind that it is unsupported and undocumented, so it could disappear or change in a future release.

Note: spt_values is not available in Azure SQL Database.

Using the VALUES Clause

If you only need a handful of numbers, you can use the VALUES clause to produce the numbers directly in the query. Here is a variation of the query to compute check digits:

WITH products AS (
   SELECT C.OrganisationNo, 
          IIF(n.n % 2 = 1, 2, 1) * 
              try_cast(substring(C.OrganisationNo, n.n, 1) AS int) AS product
   FROM   dbo.Customers C
   CROSS  JOIN (VALUES(1), (2), (3), (4), (5), (6), (7), (8), (9)) AS n(n)
   WHERE  len(C.OrganisationNo) = 10
     AND  C.OrganisationNo NOT LIKE '%[^0-9]%'
), checkdigits AS (
   SELECT OrganisationNo, 
         (10 - SUM(product / 10 + product % 10) % 10) % 10 AS checkdigit
   FROM   products
   GROUP  BY OrganisationNo
)
SELECT OrganisationNo, checkdigit
FROM   checkdigits
WHERE  substring(OrganisationNo, 10, 1) <> checkdigit;

The Ultimate Solution – generate_series

As I mentioned in the introduction, SQL 2022 comes with the function generate_series, and this function has been available in Postgres for many years. With this function you can say:

SELECT * FROM generate_series(1, 10000);

And this will return all numbers from 1 to 10000. generate_series also permits you to specify a step, so you could say:

SELECT * FROM generate_series(1, 11, 2);

to get the odd numbers 1, 3, 5, 7, 9, 11.

A big advantage with this function is that you don't have to worry about running out of numbers. Nor that some smart person gets the idea of deleting rows from Numbers. So if you have been thinking "is there really anything useful in SQL 2022?", you have the answer now. Yes, there is.

Finally, on Postgres, generate_series supports the timestamp data type, so you can use it to generate days or hours. This query returns all days in March 1997:

SELECT cast(generate_series AS date)
FROM   generate_series('1997-03-01'::timestamp, '1997-03-31', '1 day');

Note: On Postgres, as in ANSI SQL, timestamp is a data type for date and time with no relation to the type known as timestamp in SQL Server.

However, this not supported in SQL 2022.

Tables of Dates, Months, Hours etc

Table of Dates

If you have many queries where you use Numbers to generate dates, it is a good idea to create a separate Dates table, to make the dates-related queries are a little easier to write. Here is a script that creates and populate a Dates table with all dates from 1990-01-01 to 2149-12-31:

DROP TABLE IF EXISTS Dates
CREATE TABLE Dates (Date date NOT NULL PRIMARY KEY);

WITH L0   AS(SELECT 1 AS c UNION ALL SELECT 1),
     L1   AS(SELECT 1 AS c FROM L0 AS A, L0 AS B),
     L2   AS(SELECT 1 AS c FROM L1 AS A, L1 AS B),
     L3   AS(SELECT 1 AS c FROM L2 AS A, L2 AS B),
     L4   AS(SELECT 1 AS c FROM L3 AS A, L3 AS B),
     L5   AS(SELECT 1 AS c FROM L4 AS A, L4 AS B),
     Nums AS(SELECT ROW_NUMBER() OVER(ORDER BY c) AS n FROM L5
)
INSERT Dates(Date)
   SELECT dateadd(DAY, n - 1, '19900101')
   FROM   Nums
   WHERE  n BETWEEN 1 AND datediff(DAY, '19900101', '21500101');

SELECT MIN(Date), MAX(Date), COUNT(*), datediff(DAY, MIN(Date), MAX(Date)) + 1
FROM   Dates;

Rather than using the Numbers table, I use the wild CTEs instead. This is for the benefit of readers who may have been pointed directly to this section, and who have an urgent problem that calls for a table of dates.

The careful reader who runs the script above will notice that this results in 58439 dates, and thus we don't need the L5 CTE. However, I left it in, since there could be readers who need a wider range of dates.

In the beginning of the article we looked at a query to get an order count for all days in a month. With Dates we can simplify this query a little:

DECLARE @startdate date  = '19971201',
        @enddate   date  = '19971231';

SELECT D.Date AS OrderDate, COUNT(O.OrderID) AS [Count], 
       isnull(SUM(OD.Amount), 0) AS TotAmount
FROM   Dates D
LEFT   JOIN (dbo.Orders O
             JOIN   (SELECT OrderID, SUM(UnitPrice * Quantity) AS Amount
                     FROM   dbo.[Order Details] OD 
                     GROUP  BY OrderID) OD ON O.OrderID = OD.OrderID)
    ON O.OrderDate = D.Date
WHERE  D.Date BETWEEN @startdate AND @enddate
GROUP  BY D.Date
ORDER  BY D.Date;

Let's change the requirement a little bit and say that we only want to fill out the result set for weekdays, Monday to Friday. This can easily be achieved by adding this condition to the WHERE clause:

  AND  datename(weekday, D.Date) NOT IN ('Saturday', 'Sunday')

Note: I prefer to use datename over datepart here, since datepart is dependent on the SET DATEFIRST setting, and if you don't watch out, you could be filtering for Friday and Saturday when you wanted to filter for the weekend. True, the output from datename depends on the SET LANGUAGE setting and could return Samstag or domenica depending on the setting. But if so, you will not be filtering for anything at all, which is likely to stand out more than if you filter for the wrong days of the week. If nothing else, datename works with both SET LANGUAGE us_english and SET LANGUAGE British, whereas datepart does not.

A further refinement of this requirement could be that we only want to display business days. For instance, we may not want a row for 1997-12-25, since that was Christmas Day and a holiday in more than one country. This leads to an extension of the Dates table which is known as a calendar table. Just like Dates, a calendar table has one row per day, and in addition there are columns with properties about the dates. The exact set of columns depends on your needs, but common examples are IsHoliday, IsBankingDay etc. If you think about the filter with datename above, this filter means that orders placed during the weekends are lost. One way to address this is to have a column NearestWorkingDay, so you can specify that a Sunday should map to the Monday following or whatever you prefer.

Yet a refinement for an international context is to have a two-column key in your calendar table, where the second column is a country code, so that you can track that 17 May is always a holiday in Norway and likewise 6 June is always a holiday in Sweden.

A discussion of how to design and fill up a calendar table would take up too much space in this article and distract readers from its main focus; I only wanted to introduce you to the concept. For further reading I refer you to Ed Pollack's good article Designing a Calendar Table which explores this topic in detail.

Tables of Other Time-Related Entities

Depending on your application, you may have use for tables of this type for different type of intervals. For instance, in the system I spend most of my time with these days, we have a table of months as well as a table of weeks, because these are dimensions we need to span. The table of months has a single integer column, and the values are 201901, 201902, ... 201912, 202001 etc. The table of weeks is likewise a table of integers with numbers going from 202001 to 202053, 202101 to 202152 etc. In this table, we have one extra column which holds the date for Monday for each week.

You may also find use for tables with higher granularity than a day. As an example, say that you have to show statistics from a log file in slots of some length with the same requirement as when looked at the number of orders per day. That is, there should be a row for all slots, no matter whether there are events in a slot or not. Not surprisingly, you need a table of time slots to do this. However, the solution is somewhat different from when we filled up order dates, since the timestamps in the log table are contiguous, so you cannot make straight an equi-join from your table of slots.

Let's look at an example where we want to see data by 15-minute slots. In NorthNumbers there is a table ErrorLog which I've constructed by taking the contents from the SQL Server errorlog on one of my instances.

For our task, we need a table of quarters, which we create this way:

CREATE TABLE #quarters (q smalldatetime NOT NULL PRIMARY KEY);

INSERT #quarters(q)
   SELECT dateadd(MINUTE, 15*(n-1), '20100101')
   FROM   dbo.Numbers
   WHERE  n >= 1 
     AND  dateadd(MINUTE, 15*(n-1), '20100101') < '20300101';

I did this as a temp table, since if we only need this table in one place in our system, it may not be worthwhile to make it a permanent table. Obviously, we could skip the temp table, and put the SELECT in a CTE that we use directly in our main query.

You may note use the data type smalldatetime here. This makes sense, since we only need minute precision. (It assumes that we don't need to support dates beyond 2079-06-06, which is the upper limit of smalldatetime.)

In a real-world case, your log table may be big, and you make need to take some care to get acceptable performance. Therefore, I'm showing two different solutions. If you encounter a real-world problem of this kind, you may want to try both to see which performs the best. And maybe the winner is some combination of the two.

Here is the first solution. Here @date is a local variable which I set to the current date. In a real-world case, @date would be a parameter to a stored procedure or a parameterised batch.

DECLARE @date date = '2023-03-11';

WITH QuarterAndNext AS (
    SELECT q, LEAD(q) OVER(ORDER BY q) AS nextq 
    FROM   #quarters
    WHERE  q >= @date
      AND  q < dateadd(DAY, 1, @date)
)
SELECT q.q, COUNT(e.LogDate)
FROM   QuarterAndNext AS q
LEFT   JOIN dbo.ErrorLog e ON e.LogDate >= q.q
                          AND e.LogDate < q.nextq
GROUP  BY q.q
ORDER  BY q.q;

In the CTE QuarterAndNext, we extract the quarters for the given day, and we also add the next quarter, so that in the main query we can left-join it to the log table over an interval. Note that we should not use BETWEEN, but the interval should be open in the upper end. (So that events that fall on whole quarters, for instance 2020‑07‑24 22:15:00, are only counted in one interval.) As in the orders example, we don't use COUNT(*), but COUNT of some column from the log table to get 0 when there are no rows.

An interesting property of this solution is that it is entirely agnostic to the length of the slots. It would be no different (save for the names of the table and the CTE), if we wanted to have two-hour slots instead. It would also work for slots of different lengths, for instance quarters during office hours and hours in the evening and at night.

The other solution is somewhat longer:

DECLARE @date date = '2023-03-11';

; WITH midnight AS (
    SELECT LogDate, convert(datetime2(0), convert(date, LogDate)) AS Midnight
    FROM   dbo.ErrorLog
    WHERE  LogDate >= @date
      AND  LogDate < dateadd(DAY, 1, @date)
), MakeQuarters AS (
    SELECT dateadd(MINUTE, 15 * (datediff(MINUTE, Midnight, LogDate) / 15), 
                           Midnight) AS Quarter
    FROM   midnight
)
SELECT q.q, COUNT(mq.Quarter)
FROM   #quarters q
LEFT   JOIN MakeQuarters mq ON q.q = mq.Quarter
WHERE  q.q >= @date
  AND  q.q < dateadd(DAY, 1, @date)
GROUP  BY q.q
ORDER  BY q.q;

The first CTE, midnight, computes midnight for all entries in the log table for the chosen interval. Note again that we cannot use BETWEEN, but we need a combination of >= and <. In the next CTE, MakeQuarters, we normalise all values of LogDate to whole quarters, by first computing the minutes since midnight, divide that by 15 with integer division, and then multiply that value with 15 again. Finally, we add that to the midnight value. In the final query, we make a straight equi-join of the quarters to the normalised LogDate value.

Unlike the previous query, this one is hardwired to 15-minute slots, but it is straightforward to change the slot size to something else. It would be more work to adapt it to slots of different length, though. You may also note that the interval condition on @date is repeated for both tables. From a logical point of view, there is no need to apply that filter on the log table, but it is likely to be beneficiary for performance.

Note: SQL 2022 introduces the function date_bucket which performs the acrobatics in the last query for us.

Conclusion

We have now looked at how you can use a table of numbers to solve various problems. While I have given some examples, it is not a complete catalogue, and I am sure that once you have this table in your toolbox, you will find other uses for your Numbers table.

We have also looked at alternatives when in its not within your powers to create tables, but these solutions should be seen as a plan B, and you should firstly act to get a Numbers table created.

We have also looked at having a table of dates, and in practice, this may be what you will use the most. And while a one-column table of dates is good for many problems, it is not unlikely that you will need full-fledged calendar table, and again I recommend reading Ed Pollack's article if you need this. Or google for "calendar table" to find other sources.

And as a reminder: this article is useful if you need to support older versions of SQL Server where generate_series() is not available. Once you have access to this function, you don't need the Numbers table any more. Instead you should read the accompanying short story The Power of Generate_series.

If you have questions or comments directly related to this article, feel free to drop me a mail on esquel@sommarskog.se. I like to stress that you are more than welcome to point out spelling or grammar errors. On the other hand, if you have a specific problem you need help with, I recommend that you ask your question in a public forum for SQL Server, as more people will see your question and you may get help quicker than if you mail an individual.

Revision History

2024-10-31
A very small change with regards to the Dates table where I've changed the name of the column to align with the accompanying article The Power of Generate_series.
2023-04-14
With SQL 2022, we finally got generate_series() as a function in SQL Server! This is great news. Rather than covering generate_series in this article, I've written a new short story, The Power of Generate_series, which also looks at the new functions datetrunc and date_bucket. This article here has now been demoted to be ab article for legacy versions of SQL Server. There is a minor update in the section Tables of Other Time-Related Entities, as I've added an ErrorLog Table to NorthNumbers
2020-09-16
First version.

Back to my home page.