SearchCode

Search Your Database Code with Full-Text

An SQL text by Erland Sommarskog, SQL Server MVP. Last revision: 2018-12-01.
Copyright applies to this text. See here for font conventions used in this article.

Introduction

A frequent question on SQL Server forums is how to find which store procedures, functions etc that refer to a certain table or a column. SQL Server offers two catalog views to track dependencies: the older sys.sql_dependencies and the newer sys.sql_expression_dependencies. The latter is accompanied by two DMVs sys.dm_sql_referenced_entities and sys.dm_sql_referencing_entities. While they are useful, none of these four are fully reliable. There are two reasons for this:

In this article, I offer an alternate solution, SearchCode, to address these problems. SearchCode is a simple utility that permits you to import all SQL code in sys.sql_modules into a full-text indexed table, which you can query with CONTAINS and the other full-text predicates. The advantage over using the catalog views is that nothing (well, almost) can hide from the full-text search. But as we shall see, the solution has its own set of short-comings, so it is not a replacement for the built-in views, but rather a supplement.

In this short article, I will look both the built-in capabilities and SearchCode.

Looking at the Built-in Views

To have something to work with, we create this fairly non-sensical database:

CREATE DATABASE dependdemo 
go
USE dependdemo
go
CREATE TABLE tableone   (colone int NOT NULL, coltwo int NULL, colthree int NULL)
CREATE TABLE tabletwo   (colone int NOT NULL, coltwo int NULL, colthree int NULL)
CREATE TABLE tablethree (colone int NOT NULL, coltwo int NULL, colthree int NULL)
go
CREATE PROCEDURE procone AS 
CREATE TABLE #temp (a int NOT NULL)
SELECT colone, coltwo FROM tableone
SELECT two.colone, two.coltwo
FROM   tabletwo two
JOIN   #temp t ON two.colone = t.a
EXEC('SELECT colone, coltwo FROM tablethree')
go
EXEC procone

Let's first see what information we can find in sys.sql_dependencies:

SELECT o.name AS procname, o2.name AS tbl, c.name AS col, d.* 
FROM  sys.sql_dependencies d
JOIN  sys.objects o  ON d.object_id           = o.object_id
JOIN  sys.objects o2 ON d.referenced_major_id = o2.object_id
LEFT  JOIN  sys.columns c ON c.object_id = o2.object_id
                         AND c.column_id = d.referenced_minor_id

Note: As the result sets are a bit too wide to easily fit into the article, I don't include the result sets in the text, but I encourage you to run the queries yourself.

In the result set, there are two rows for the two columns in tableone, but there is no trace of the other two tables. The dependency on tabletwo is not recorded because of the temp table. When SQL Server creates the procedure, the temp table does not exist, and therefore SQL Server abandons analysis of the query that refers to #temp and tabletwo. As a consequence, the dependency to tabletwo is not recorded. tablethree is hidden in a string literal, do SQL Server does not see this reference at all, and thus neither that dependency is recorded.

You may note that the view includes a column is_updated, so you can tell whether a procedure updates a column. is_select_all means that the column is referred to by means of SELECT *. An important property of sys.sql_dependencies is that if you drop tableone and recreate it, the query above no longer returns any rows. This is due to that the dependency is recorded by object id, and thus is lost when the table is dropped.

sys.sql_dependencies was introduced in SQL 2005, as a replacement for the even older system table sysdepends in SQL 2000 and earlier. Today, sys.sql_dependencies is officially deprecated, in favour of sys.sql_expression_dependencies and the two DMVs, that all three were introduced in SQL 2008.

Next, we look at sys.sql_expression_dependencies:

SELECT o.name AS procname, ed.* 
FROM   sys.sql_expression_dependencies ed
JOIN   sys.objects o ON ed.referencing_id = o.object_id

We get two rows, one each for tableone and tabletwo. That is, tabletwo is detected here, despite the temp table. But tablethree is still missing. In this view, dependencies are recorded by name, so no information is lost if a table is dropped. You may note that we don't get information on column level in this view. (There is a referencing_minor_id in this view, but it appears to be non-zero only for references like CHECK constraints and indexes.) You can also tell from the columns referenced_server_name and referenced_database_name in the view that if procone were to refer to tables in other databases or other servers, those dependencies would also be recorded. (This is not the case with sys.sql_dependencies.)

Let's look at the first of the two DMVs:

SELECT s.name, o.name, re.*
FROM   sys.objects o
JOIN   sys.schemas s ON o.schema_id = s.schema_id
CROSS  APPLY sys.dm_sql_referenced_entities(s.name + '.' + o.name, 'OBJECT') re

This time we get four rows in the output. We get three rows for the references to tableone; one for the table itself and one for each column being referred to. (Note that colthree is not listed.) We also get one row for tabletwo, but we cannot see which columns are being referred to in this table. The reason for this is again the temp table. tablethree is once more missing. Just like sys.sql_dependencies, this DMV gives us information about whether a column is selected or updated. But in difference to sys.sql_dependencies, this information is not persisted, but it is determined on the demand when you run the DMV. If all you want to know is what a certain procedure touches, this is alright. But if you want to find where a certain column is updated, you need to run a query like the one above and invoke the DMV for all SQL modules in your database. If you have thousands of stored procedures, the response will not be instant.

Finally, here is a query with the second DMV:

SELECT s.name, o.name, re.*
FROM   sys.objects o
JOIN   sys.schemas s ON o.schema_id = s.schema_id
CROSS  APPLY sys.dm_sql_referencing_entities(s.name + '.' + o.name, 'OBJECT') re

This DMV relates which objects that refer to a specific table (or stored procedure). Here we get one row each for tableone and tabletwo, but tablethree is still missing. There is no information on column level.

Disclaimer: I will have to admit that I have mainly worked with sys.sql_dependencies and largely given the other three a miss. The last section will explain why. Thus, I may have missed some useful nuance of this trio.

Entering Full-text

Let's now look at the full-text solution. To be able to use SearchCode, your SQL Server instance must fulfil two prerequisites:

The full script for this chapter is in file SearchCode.sql. The script creates a database SearchCode and in this database it creates two tables Objects and Lines. Objects holds the name and type of the object (stored procedure, trigger etc), while Lines holds the SQL code split up into lines. The script loads the two tables with the SQL code in the dependdemo database and then creates a full-text index on the Lines table. The file also includes a sample query. If you want to use full-text to search your database, you can use the script as-is – almost. You have to replace dependdemo with the name of your database. Below I discuss the script in a little more detail.

The script starts off with creating two auxiliary objects: a Numbers table and a table-valued function inline_split_core, that I don't discuss further here. If you are curious, you can read about them in my article Arrays and Lists in SQL Server. These are the two tables that actually hold the data:

CREATE TABLE Objects (object_id int     NOT NULL,
                      schema_   sysname NOT NULL,
                      name      sysname NOT NULL,
                      type      char(2) NOT NULL,
                      CONSTRAINT pk_Objects PRIMARY KEY (object_id),
                      CONSTRAINT u_Objects UNIQUE (schema_, name)
)
go
CREATE TABLE Lines (ident       int IDENTITY   NOT NULL,
                    object_id   int            NOT NULL,
                    linenum     int            NOT NULL,
                    linetext    nvarchar(MAX)  NOT NULL,
                    CONSTRAINT pk_Lines PRIMARY KEY (object_id, linenum),
                    CONSTRAINT u_Lines UNIQUE(ident),
                    CONSTRAINT fk_Lines_Objects FOREIGN KEY(object_id)
                        REFERENCES Objects(object_id)
)

The object_id is the object id in the source database, would you need it. schema_ has an underscore since SCHEMA is a reserved keyword in T‑SQL (and I don't like typing brackets). type is the same as in sys.objects. The purpose of the ident column in Lines will appear later. linenum is, yes, the line number, and linetext is the actual code on that line.

The Objects table is loaded this way:

INSERT Objects (object_id, schema_, name, type)
   SELECT o.object_id, s.name, o.name, o.type
   FROM   dependdemo.sys.objects o
   JOIN   dependdemo.sys.schemas s ON o.schema_id = s.schema_id
   WHERE  EXISTS (SELECT *
                  FROM   dependdemo.sys.sql_modules sm
                  WHERE  o.object_id = sm.object_id)

Keep in mind that when you load SearchCode for your own database, you should replace dependdemo with name of your own database. The same applies to the statement that loads Lines:

INSERT Lines (object_id, linenum, linetext)
   SELECT sm.object_id, isc.Pos, isc.Value
   FROM   dependdemo.sys.sql_modules sm
   CROSS  APPLY inline_split_core(
      replace(sm.definition COLLATE Latin1_General_BIN2, char(13), ''), char(10)) isc

inline_split_core splits the input into rows on the given separator which is char(10) i.e., line-feed. Commonly, lines in the Windows world are terminated by CR-LF, but I also want to support code that was created from Unix or elsewhere and only have LF as terminator. Whence, I replace all CR by nothing at all. The reason I force a binary collation is performance – on a larger database, this makes a considerable difference.

Finally, we set up the full-text index:

CREATE FULLTEXT CATALOG fulltext WITH ACCENT_SENSITIVITY = ON AS DEFAULT
CREATE FULLTEXT INDEX ON Lines(linetext LANGUAGE 0x0) KEY INDEX u_Lines

All full-text indexes must belong to a full-text catalog, so we need to start by creating one, and since we mark it as default, we don't have to actually mention it when we create the index. When creating a full-text index we need to specify three things:

In difference to a regular index, a full-text index is populated asynchronously, so it can take some time before you can use the index. For the small dependdemo database it is only a matter of seconds but for a database with several thousands of store procedures it may take up to an hour. To see whether the index build has completed, you can use the query below. Once it returns 1, the index has been fully populated.

SELECT has_crawl_completed FROM sys.fulltext_indexes

So how to you use this? Here is a sample query:

SELECT O.schema_, O.name, O.type, L.linenum, L.linetext
FROM   Objects O
JOIN   Lines L ON O.object_id = L.object_id
WHERE  CONTAINS(L.linetext, 'tablethree')
ORDER  BY O.name, L.linenum

You will find that query returns one row, saying that tablethree is mentioned in procone on line 7. This is the reference that none of the of the built-in views were able to find, because it was hidden in dynamic SQL.

CONTAINS is a predicate that is special for full-text. Here the search-term was a single word. But you can also search on a sequence:

SELECT O.schema_, O.name, O.type, L.linenum, L.linetext
FROM   Objects O
JOIN   Lines L ON O.object_id = L.object_id
WHERE  CONTAINS(L.linetext, '"colone coltwo"')
ORDER  BY O.name, L.linenum

Note here that search term has to be enclosed in double-quotes. This is required when it is more than a single word. Note also that in procone, colone and coltwo are separated by comma on both occasions, but full-text cares about words, not interpunctuation, so it finds the sequence anyway. This can be useful to find where a table is updated; in this case you would search for UPDATE tablename.

You can also use OR:

SELECT O.schema_, O.name, O.type, L.linenum, L.linetext
FROM   Objects O
JOIN   Lines L ON O.object_id = L.object_id
WHERE  CONTAINS(L.linetext, '"colone" OR "coltwo"')
ORDER  BY O.name, L.linenum

Note that the OR should not be included in double quotes; they apply to the search terms only.

You can use * as a wild-card, but only at the end of the search-term. This finds both tabletwo and tablethree:

SELECT O.schema_, O.name, O.type, L.linenum, L.linetext
FROM   Objects O
JOIN   Lines L ON O.object_id = L.object_id
WHERE  CONTAINS(L.linetext, '"tablet*"')
ORDER  BY O.name, L.linenum

To learn more about the possibilities, look at the topic for CONTAINS. I should add that for my own part, I very rarely search for anything else but a single search term. (Typically, a table name or the name of a stored procedure.)

Here is one thing you should be aware of. You might expect this query to return two rows, since FROM appears twice in procone:

SELECT O.schema_, O.name, O.type, L.linenum, L.linetext
FROM   Objects O
JOIN   Lines L ON O.object_id = L.object_id
WHERE  CONTAINS(L.linetext, 'FROM')
ORDER  BY O.name, L.linenum

But it returns no rows at all. This is because FROM is defined as a stopword. (Also known as a noise word.) Stopwords are common words like articles, prepositions, etc that would take up a lot of space in the index, but be of little use, since they would return so many hits. (Recall that full-text is designed for searching written text, not programming code.) As long as the names of your tables and other objects are distinctive enough, this should be much of an issue.

Note: it is possible to manipulate the stoplists. I have never engaged in this myself, but if you have the need, start with the topic Configure and Manage Stopwords and Stoplists for Full-Text Search.

After some time, your SearchCode database may be inaccurate with regards to the contents in your database. The easiest way is to re-run the script. That is, drop the database and start over.

Let's now look at the properties of this solution. The main advantage is that since full-text extracts words, it does not matter where a name appears. That is, references are never hidden because of dynamic SQL (as long as you don't construct the names themselves dynamically!) nor are they hidden due to deferred name resolution of temp tables.

But there are also issues:

In the end, SearchCode is a not replacement for the built-in capabilities in SQL Server, but rather a supplement.

Note: you may find that running the script takes about ten seconds. What takes time is loading of the Numbers table, which I have not bother to optimize.

VCDBLOAD, Taking it One Step Further

I also have a more versatile solution, VCDBLOAD, which is part of my AbaPerls toolset. VCDBLOAD does not read the code from SQL Server, but from version-control – TFS or Visual SourceSafe, no support for Git, Subversion etc. It does not stop at reading the SQL code, but it reads all code it finds. This is a great asset to find references to all sorts of things.

Like SearchCode, VCDBLOAD stores the code line-by-line, but VCDBLOAD has an extra feature: it stores the code in two columns: one without comments (for languages it knows about) and one with comments, so you can search the code without getting false hits in the comments. (But both columns are full-text indexed, so if you want to search comments too, you can do that.)

AbaPerls is not exactly easy to get started with, but you can use VCDBLOAD on its own and ignore the rest.

AbaPerls also includes a tool DBBUILD to build an empty database from a source control. When AbaPerls loads a stored procedure, it first extracts all temp-table definitions in the procedure and creates them prior to creating the procedure. This lures SQL Server to perform full binding of these queries, and the main objective with the exercise is to get SQL Server to report errors in such queries early. A side-effect is that sys.sql_dependencies is populated with complete information for all such queries, and thus becomes fully reliable for all but dynamic SQL. I should immediately warn you that to use DBBUILD you need to more or less embrace AbaPerls in total, so it is not likely that you would use it only to get complete dependency information. But at least you can understand why I stick to sys.sql_dependencies. It is not possible to get that complete information with sys.dm_sql_referenced_entities.

Revisions

2018-12-01
First version.

Back to my home page.