Finding References in Your Database
An SQL text by Erland Sommarskog, SQL Server MVP. Last revision: 2022-09-18.
Copyright applies to this text. See here for font conventions used in this article.
It is common to see people in SQL Server forums asking how to find references to a certain table or column, or what tables and columns that are accessed by a stored procedure, or some variation of these questions. In this article, I will explore the available options, including some utilities that I have written myself over the years. None of the alternatives I present are good for all situations, but you will often have to work with more than one of them.
In summary, here are the options we will look at:
This article is applicable to all SQL Server versions from SQL 2008 and up. We will work with a very simple nonsensical demo database:
USE tempdb go IF db_id('dependdemo') IS NOT NULL DROP DATABASE dependdemo CREATE DATABASE dependdemo go USE dependdemo go CREATE TABLE tableone (colone int NOT NULL, coltwo int NULL, colthree int NULL) CREATE TABLE tabletwo (colone int NOT NULL, coltwo int NULL, colthree int NULL) CREATE TABLE tablethree (colone int NOT NULL, coltwo int NULL, colthree int NULL) go CREATE VIEW aview AS SELECT colone + colthree AS col31 FROM tableone go CREATE PROCEDURE firstproc AS CREATE TABLE #temp (a int NOT NULL) UPDATE tableone SET colone = 14 WHERE coltwo = 19 SELECT two.colone, two.coltwo FROM tabletwo two JOIN #temp t ON two.colone = t.a EXEC('SELECT colone, coltwo FROM tablethree') go CREATE PROCEDURE otherproc AS SELECT * FROM aview EXEC dbo.firstproc go EXEC otherproc
SQL Server offers one stored procedure, five catalog views and DMVs to track dependencies:
sp_depends sys.sysdepends |
The only option up to SQL 2000. Now obsolete, and I will not discuss them further. |
sys.sql_dependencies | Introduced in SQL 2005, but now officially deprecated. |
sys.sql_expression_dependencies sys.dm_sql_referenced_entities sys.dm_sql_referencing_entities |
Introduced in SQL 2008 and the official recommendation from Microsoft. |
While sys.sql_dependencies is deprecated, I still find it worthwhile to explore what it can offer. The query below gives you all dependencies in the database. In practice, you would add a WHERE clause to filter for the table, column or stored procedure you are interested in for the moment. (And this is true for most queries we will look at in this article.)
SELECT o.name AS procname, o2.name AS tbl, c.name AS col, d.* FROM sys.sql_dependencies d JOIN sys.objects o ON d.object_id = o.object_id JOIN sys.objects o2 ON d.referenced_major_id = o2.object_id LEFT JOIN sys.columns c ON c.object_id = o2.object_id AND c.column_id = d.referenced_minor_id
Here is the result set. For space reasons, I have split up the result set into two sections and left out one column class_desc, for which the value is always OBJECT_OR_COLUMN_REFERENCE_NON_SCHEMA_BOUND.
procname tbl col class object_id column_id
--------- --------- -------- ----- --------- ---------
aview tableone colone 0 949578421 0
aview tableone colthree 0 949578421 0
firstproc tableone colone 0 965578478 0
firstproc tableone coltwo 0 965578478 0
otherproc aview col31 0 981578535 0
otherproc firstproc NULL 0 981578535 0
referenced_major_id referenced_minor_id is_selected is_updated is_select_all
------------------- ------------------- ----------- ---------- -------------
901578250 1 1 0 0
901578250 3 1 0 0
901578250 1 0 1 0
901578250 2 1 0 0
949578421 1 0 0 1
965578478 0 0 0 0
You can see that the dependency from the view aview to tableone is fully recorded. Next you can see that firstproc refers to two of the columns in tableone, and thanks to the column is_updated you can see that the column colone is updated by the procedure.
However, the other two tables are not listed. The reason that the dependency on tabletwo is missing is the temp table. When SQL Server creates the procedure, the temp table does not exist, and therefore SQL Server abandons analysis of the query with #temp and tabletwo. The unfortunate consequence of this is that the dependency to tabletwo is not recorded. It is not really surprising that the reference to tablethree is missing, since it is hidden in a string literal which is executed through dynamic SQL. Technically, the dynamic SQL is not part of the procedure, but it is a nameless stored procedure on its own. Nevertheless, from a source-code maintenance perspective, we are likely to count firstproc as referring to tablethree, and in an ideal world, we would like to see this reference listed.
Note: the limitation with temp tables does not apply to table variables, as they are declared entities. If you replace #temp above with a table variable, you will find that the dependency on tabletwo is fully recorded.
Finally, you can see the references from otherproc to aview and firstproc. Note that for the reference to aview, the column is_select_all is 1 which reflects that the column reference is by SELECT *
.
There are a few more limitations with sys.sql_dependencies. Cross-database and cross-server dependencies are not recorded at all, so you cannot tell if a procedure is referencing things outside the database. Another issue is illustrated if you run this batch:
DROP TABLE tableone CREATE TABLE tableone (colone int NOT NULL, coltwo int NULL, colthree int NULL)
and then run the query above again. You will find that the result set consists of two rows only: the references from otherproc to aview and firstproc. Since sys.sql_dependencies records references by object id, all references were lost when tableone was dropped, and they were not magically restored when the table was recreated. To restore the information, you would also need to recreate firstproc and aview or run sp_refreshsqlmodule and sp_refreshview on them.
It was to address these shortcomings that Microsoft introduced sys.sql_expression_dependencies and the two DMVs in SQL 2008. Let's take a look of what they return, starting with sys.sql_expression_dependencies:
SELECT o.name AS procname, ed.* FROM sys.sql_expression_dependencies ed JOIN sys.objects o ON ed.referencing_id = o.object_id
This query returns quite a few more columns than the previous one. As I am not going to cover all of them, I have left out several columns:
procname referencing_id referencing_class_desc referenced_class_desc
--------- -------------- ---------------------- ---------------------
aview 949578421 OBJECT_OR_COLUMN OBJECT_OR_COLUMN
firstproc 965578478 OBJECT_OR_COLUMN OBJECT_OR_COLUMN
firstproc 965578478 OBJECT_OR_COLUMN OBJECT_OR_COLUMN
otherproc 981578535 OBJECT_OR_COLUMN OBJECT_OR_COLUMN
otherproc 981578535 OBJECT_OR_COLUMN OBJECT_OR_COLUMN
referenced_server_name referenced_database_name referenced_schema_name
---------------------- ------------------------ ----------------------
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL NULL
NULL NULL dbo
referenced_entity_name referenced_id referenced_minor_id
---------------------- ------------- -------------------
tableone 997578592 0
tableone 997578592 0
tabletwo 917578307 0
aview 949578421 0
firstproc 965578478 0
This view records dependencies by name, so therefore it does not matter that we dropped and recreated tableone. The dependency is still recorded. You may also note that the dependency from firstproc to tabletwo is included, so the temp table is not a limitation here. The reference to tablethree is still missing, but that is more or less to be expected.
Pay attention to the columns referenced_server_name and referenced_database_name. They are all NULL here, since we do not have any cross-server or cross-database references, but this view is perfectly capable or recording them. You may also note that the column referenced_schema_name is only populated for the reference from otherproc to firstproc – the only place in the database creation script where the dbo schema is specified explicitly.
An interesting property of this view is that it can help you to find orphaned references. Test dropping tableone again and run the query above before creating the table again. The reference is still there, but the column referenced_id is now NULL, indicating that this is a reference to a non-existing object.
In contrast to sys.sql_dependencies, there is no information about column references here, but we only see references on view/table level. (There is a referenced_minor_id in this view, but it appears to be non-zero only for references like CHECK constraints and indexes.)
To find column references, you need to use the DMV sys.dm_sql_referenced_entities, which accepts a schema-qualified name as input and gives you a list of objects that the input references. Here is a query that runs the DMV for all objects in the database. (Make sure that you have recreated tableone before you run this query):
SELECT s.name + '.' + o.name AS referrer, re.* FROM sys.objects o JOIN sys.schemas s ON o.schema_id = s.schema_id CROSS APPLY sys.dm_sql_referenced_entities(s.name + '.' + o.name, 'OBJECT') re
I have removed some columns from the output, including referenced_server_name and referenced_database_name, since they are all NULL in this example, but this DMV is good for finding cross-server and cross-database references.
referrer referenced_schema_name referenced_entity_name referenced_minor_name
-------------- ----------------------- --------------------------------------------
dbo.aview NULL tableone NULL
dbo.aview NULL tableone colone
dbo.aview NULL tableone colthree
dbo.firstproc NULL tabletwo NULL
dbo.firstproc NULL tableone NULL
dbo.firstproc NULL tableone colone
dbo.firstproc NULL tableone coltwo
dbo.otherproc dbo firstproc NULL
dbo.otherproc NULL aview NULL
dbo.otherproc NULL aview col31
referenced_id is_caller_dependent is_ambiguous is_selected is_updated
------------- ------------------- ------------ ----------- ----------
901578250 0 0 1 0
901578250 0 0 1 0
901578250 0 0 1 0
917578307 0 0 0 0
901578250 0 0 0 1
901578250 0 0 0 1
901578250 0 0 1 0
965578478 0 0 0 0
949578421 0 0 0 0
949578421 0 0 0 0
is_select_all is_all_columns_found is_insert_all is_incomplete
------------- -------------------- ------------- -------------
0 1 0 0
0 1 0 0
0 1 0 0
0 0 0 0
0 1 0 0
0 1 0 0
0 1 0 0
0 0 0 0
1 1 0 0
1 1 0 0
If you look at the lines for aview, you see that there is first a row for the reference to tableone as such, and then there are two more rows for the two columns in tableone included in the view. The same is true for the reference from firstproc to tableone and the reference from otherproc to aview. However, for the reference from firstproc to tabletwo, we only have the reference to the table itself, but the references to the columns are not included. Just like with sys.sql_dependencies, this is due to the temp table and deferred name resolution. So while the new features in SQL 2008 resolved some of the limitations with the older view, this issue still remains. As does the limitation with dynamic SQL – tablethree is still not listed.
There are quite a few columns that give us information about the nature of the reference. I will only discuss some of them. The triplet is_selected, is_updated and is_select_all are the same as in sys.sql_dependencies. You also get is_insert_all which is 1 if the table appears in an INSERT statement without a column list. (Something which is generally considered bad practice.) The column is_all_columns_found relates to the problem with the temp table. That is, SQL Server will set this column to 0 when a table appears in a query with a temp table (or some other missing table), preventing full analysis of that query. For instance, you can see that this column is 0 for the reference from firstproc to tabletwo. (It is also zero for the reference between the two stored procedures, since there are no columns at all in this case.)
Note: not all columns above are present in all versions of SQL Server. On SQL 2008, the last column to appear in the result set is is_ambiguous. The next four – is_selected, is_updated, is_select_all and is_all_columns_found – were added in SQL 2012. is_insert_all was added with SQL 2016. Books Online says that is_incomplete was added in SQL 2016 SP2, but I also see it on SQL 2014 SP3, so it appears to have been backported.
sys.dm_sql_referenced_entities does not rely on persisted information, but instead it runs binding on the object you pass to it. If you only want to know what objects a certain stored procedure refers to that is alright. But if you want to use this DMV to find all references to a certain table column, you need to invoke it for all modules in the database as in the query above and then filter for the column. If you have several thousands of stored procedures, this can take quite some time. You may also get error messages, because the DMV may fail for some modules. This fact combined with its failure to return complete information for queries with temp tables makes sys.dm_sql_referenced_entities less attractive to me.
The other DMV, sys.dm_sql_referencing_entities, gives you all reference to an object. Here is a query:
SELECT s.name, o.name, re.* FROM sys.objects o JOIN sys.schemas s ON o.schema_id = s.schema_id CROSS APPLY sys.dm_sql_referencing_entities(s.name + '.' + o.name, 'OBJECT') re
This is the full result set:
referred_to referencing_schema_name referencing_entity_name referencing_id
------------- ----------------------- ----------------------- --------------
dbo.tableone dbo aview 949578421
dbo.tableone dbo firstproc 965578478
dbo.tabletwo dbo firstproc 965578478
dbo.aview dbo otherproc 981578535
dbo.firstproc dbo otherproc 981578535
referencing_class referencing_class_desc is_caller_dependent
----------------- ---------------------- -------------------
1 OBJECT_OR_COLUMN 0
1 OBJECT_OR_COLUMN 0
1 OBJECT_OR_COLUMN 0
1 OBJECT_OR_COLUMN 0
1 OBJECT_OR_COLUMN 0
I will have to admit that I have not been able to figure out if this DMV provides any information which is not present is sys.sql_expression_dependencies.
Rather than querying these views directly, you can use View Dependencies from the context menu of an object in Object Explorer in SSMS. This will result in queries against the views discussed in this chapter.
An alternative to use the built-in views is to search the procedure definitions in sys.sql_modules directly with LIKE queries. This can be a tad slow if you have thousands of stored procedures, and it can be difficult to make the searches precise. A more sophisticated approach is to build a full-text index on the code, which makes these searches really quick.
Full-text is an optional component when you install SQL Server, so it may not be present on your instance. Run this query to find out:
SELECT serverproperty('IsFullTextInstalled')
If it returns 0, full-text is missing and you need to run SQL Server Setup to add it. Note that if you are using Express Edition, you need to have Express with Advanced Features to install full-text.
The full script for this chapter is in file SearchCode.sql. The script creates a database SearchCode and in this database the script creates two tables Objects and Lines. Objects holds the name and type of the object (stored procedure, trigger etc), while Lines holds the SQL code split up into lines. The script loads the two tables with the SQL code in the dependdemo database and then creates a full-text index on the Lines table. The file also includes two sample queries. If you want to use full-text to search your database, you can use the script as-is. You only have to replace dependdemo with the name of your database. Below I discuss the script in a little more detail.
The script starts off with creating two auxiliary objects: a Numbers table and a table-valued function tblnum_split_mstmt, that I don't discuss further here. If you are curious, you can read about them in my article Arrays and Lists in SQL Server. These are the two tables that actually hold the data:
CREATE TABLE Objects (object_id int NOT NULL, schema_ sysname NOT NULL, name sysname NOT NULL, type char(2) NOT NULL, CONSTRAINT pk_Objects PRIMARY KEY (object_id), CONSTRAINT u_Objects UNIQUE (schema_, name) ) go CREATE TABLE Lines (ident int IDENTITY NOT NULL, object_id int NOT NULL, linenum int NOT NULL, linetext nvarchar(MAX) NOT NULL, CONSTRAINT pk_Lines PRIMARY KEY (object_id, linenum), CONSTRAINT u_Lines UNIQUE(ident), CONSTRAINT fk_Lines_Objects FOREIGN KEY(object_id) REFERENCES Objects(object_id) )
The object_id is the object id in the source database, would you need it. schema_ has an underscore since SCHEMA is a reserved keyword in T‑SQL (and I don't like typing brackets). type is the same as in sys.objects. The purpose of the ident column in Lines will appear later. linenum is, yes, the line number, and linetext is the actual code on that line.
The Objects table is loaded this way:
INSERT Objects (object_id, schema_, name, type) SELECT o.object_id, s.name, o.name, o.type FROM dependdemo.sys.objects o JOIN dependdemo.sys.schemas s ON o.schema_id = s.schema_id WHERE EXISTS (SELECT * FROM dependdemo.sys.sql_modules sm WHERE o.object_id = sm.object_id)
Keep in mind that when you load SearchCode for your own database, you should replace dependdemo with name of your own database. The same applies to the statement that loads Lines:
INSERT Lines (object_id, linenum, linetext) SELECT sm.object_id, ism.Pos, ism.Value FROM dependdemo.sys.sql_modules sm CROSS APPLY tblnum_split_mstmt(replace(sm.definition COLLATE Latin1_General_BIN2, char(13), ''), char(10)) ism WHERE EXISTS (SELECT * FROM dependdemo.sys.objects o WHERE sm.object_id = o.object_id)
tblnum_split_mstmt splits the input into rows on the given separator which is char(10), i.e., line-feed. Commonly, lines in the Windows world are terminated by CR-LF, but I also want to support code that was created from Unix or elsewhere and which only has LF as the line terminator. Whence, I replace carriage return with nothing at all. The reason I force a binary collation is performance – on a larger database, this makes a considerable difference. You may be puzzled by the EXISTS clause, but sys.sql_modules can include code for modules that are not in sys.objects. The one such example I know of is DDL triggers. Obviously, I could have extended the INSERT into Objects to include DDL triggers, but I decided to leave that as an exercise for the reader. Also, a DDL trigger may be there for administrative reasons and not really be part of the application, so it may not be of interest anyway.
Note: You may be wondering why I am not using the built-in function string_split to split up the code in lines. There are two reasons for this: 1) I wanted to provide code that also runs on SQL 2014 and earlier, where string_split is not available. 2) On SQL 2019 and earlier, string_split is not able to return the position of the element in string, which I need to feed the linenum column.
Finally, we set up the full-text index:
CREATE FULLTEXT CATALOG fulltext WITH ACCENT_SENSITIVITY = ON AS DEFAULT CREATE FULLTEXT INDEX ON Lines(linetext LANGUAGE 0x0) KEY INDEX u_Lines
All full-text indexes must belong to a full-text catalog, so we need to start by creating one, and since we mark it as default, we don't have to actually mention it when we create the index. There are three things we need to specify with the index, though:
Unlike a regular index, a full-text index is populated asynchronously, so it can take some time before you can use the index. For the small dependdemo database it is only a matter of seconds but for a database with several thousands of stored procedures it may take up to an hour. To see whether the index build has completed, you can use the query below. Once it returns 1, the index has been fully populated.
SELECT has_crawl_completed FROM sys.fulltext_indexes
So how do you use this? Here is a sample query:
SELECT O.schema_, O.name, O.type, L.linenum, L.linetext FROM Objects O JOIN Lines L ON O.object_id = L.object_id WHERE CONTAINS(L.linetext, 'tablethree') ORDER BY O.name, L.linenum
This is the result set:
schema_ name type linenum linetext
-------- ---------- ---- ------- ---------------------------------------------
dbo firstproc P 11 EXEC('SELECT colone, coltwo FROM tablethree')
This is the reference that none of the built-in views were able to find, because it was hidden in dynamic SQL.
CONTAINS is a predicate that is special for full-text. Here the search-term was a single word. But you can also search on a sequence:
SELECT O.schema_, O.name, O.type, L.linenum, L.linetext FROM Objects O JOIN Lines L ON O.object_id = L.object_id WHERE CONTAINS(L.linetext, '"colone coltwo"') ORDER BY O.name, L.linenum
This returns the same output as above. Observe that the search term has to be enclosed in double-quotes when there is more than one word in it. Note that in the code above, there is actually a comma between the two column names, but the search string does not include the comma. Full-text cares about words, not interpunctuation, so it finds the sequence anyway. One situation where it can be useful to search for sequences is to find where a table is updated; in this case you would search for UPDATE tablename.
You can also use OR:
SELECT O.schema_, O.name, O.type, L.linenum, L.linetext FROM Objects O JOIN Lines L ON O.object_id = L.object_id WHERE CONTAINS(L.linetext, '"colone" OR "coltwo"') ORDER BY O.name, L.linenum
Note that the OR should not be included in double quotes; they apply to the search terms only.
You can use * as a wild-card, but only at the end of the search-term. This finds both tabletwo and tablethree:
SELECT O.schema_, O.name, O.type, L.linenum, L.linetext FROM Objects O JOIN Lines L ON O.object_id = L.object_id WHERE CONTAINS(L.linetext, '"tablet*"') ORDER BY O.name, L.linenum
To learn more about the possibilities, look at the topic for CONTAINS in Books Online. I should add that for my own part, I very rarely find the need to search for anything else but a single search term. (Typically, the name of a table, column or stored procedure.)
Full-text queries are always case-insensitive. If you have a database with a case-sensitive collation, and you only want to find hits on customer, but not Customer, you can add a traditional SQL condition to the query, for instance:
AND L.linetext LIKE '%customer%'Here is one thing you should be aware of. You might expect this query to return several rows:
SELECT O.schema_, O.name, O.type, L.linenum, L.linetext FROM Objects O JOIN Lines L ON O.object_id = L.object_id WHERE CONTAINS(L.linetext, 'FROM') ORDER BY O.name, L.linenum
But it returns no rows at all. This is because FROM is defined as a stopword. (Also known as a noise word.) Stopwords are common words like conjunctions, prepositions, etc that would take up a lot of space in the index, but be of little benefit, since they would return so many hits. (Recall that full-text is designed for searching human-written text, not programming code.) Thus, would you have a table or a column of which the name is considered to be a stopword by full-text you would not be able to find it this way. You can review the stopwords for language 0 with this query:
SELECT *FROM sys.fulltext_system_stopwords WHERE language_id = 0
Note: it is possible to manipulate the stoplists. I have never engaged in this myself, but if you have the need, start with the topic Configure and Manage Stopwords and Stoplists for Full-Text Search.
After some time, your SearchCode database may be inaccurate with regards to the contents in your database. The easiest way is to re-run the script. That is, drop the database and start over.
Let's now look at the properties of this solution. The main advantage is that since full-text extracts words, it does not matter where a name appears. That is, references are never hidden because of dynamic SQL (as long as you don't construct the names themselves dynamically!) nor are they hidden due to deferred name resolution with temp tables.
But there are also drawbacks:
SELECT *
or INSERT without a column list.In the end, full-text is not replacement for the built-in capabilities in SQL Server, but rather a supplement.
Note: you may find that running the script takes about ten seconds. What takes time is the loading of the Numbers table, which I have not bothered to optimise. When you load SearchCode for a real production database, it will take a lot more time to load Objects and Lines anyway.
This is an approach that Paul McMillan made me aware of. When using the plan cache to find references, you don't have to worry about references being hidden by temp tables or in dynamic SQL. The plan cache has it all. And in difference to using full-text, there are no problems with finding references to Customers.Status, since the plan cache has the exact references.
But that doesn't mean that this method is not without disadvantages. First of all, querying the plan cache is expensive, and querying the plan cache on a busy production server will add even more load. And you more or less need to run your queries on production, to make sure that you get as complete information as possible. Because you will only find references from stored procedures and other queries that actually are in the cache. There are several reasons why a stored procedure may not be in the plan cache:
But if we overlook the last point, there is a flip side of this as well: Say that through the previous methods you have found that table_that_smells_funny is referenced by dusty_old_procedure, and you have a feeling that the procedure may not be in use. By querying the plan cache, you can get more information on whether you can drop both of them. (But keep in mind that absence from the plan cache never proves anything, but only gives an indication.)
There is one more advantage with the plan cache over the other methods we have looked at so far: it will also give you references from ad-hoc queries from application code. Although, you cannot tell directly if a reference is from client code, or it was just someone who was curious and queried the table from SSMS. A little more on this later.
So how do you query the plan cache? Here is a query that I developed from a post on StackOverflow by Stephen Turner that Paul McMillan was kind to point me to. Again, keep in mind that it can take quite some time to run and add extra load to the server. Be careful!
WITH XMLNAMESPACES ( 'http://schemas.microsoft.com/sqlserver/2004/07/showplan' as ns1 ), CompiledPlan AS ( SELECT qp.query_plan, qp.dbid AS RefDbId, qp.objectid AS RefObjectId FROM sys.dm_exec_cached_plans cp CROSS APPLY sys.dm_exec_plan_attributes(cp.plan_handle) pa CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp WHERE pa.attribute = 'dbid' AND pa.value = convert(sql_variant, db_id()) ), ColumnReferences AS ( SELECT DISTINCT cp.RefDbId, cp.RefObjectId, p.value('@Database', 'sysname') AS TargetDatabase, p.value('@Schema', 'sysname') AS TargetSchema, p.value('@Table', 'sysname') AS TargetTable, p.value('@Column', 'sysname') AS TargetColumn FROM CompiledPlan cp CROSS APPLY query_plan.nodes('//ns1:ColumnReference') t(p) ) SELECT db_name(RefDbId) AS RefDB, object_schema_name(RefObjectId, RefDbId) AS RefSchema, object_name(RefObjectId, RefDbId) AS RefModule, TargetDatabase, TargetSchema, TargetTable, TargetColumn FROM ColumnReferences WHERE TargetDatabase = quotename(db_name()) AND TargetSchema <> '[sys]'
If you run this query in the dependdemo database, you will get an output like this:
RefDB RefSchema RefModule TargetDatabase TargetSchema TargetTable TargetColumn
----------- --------- --------- -------------- ------------ ------------ ------------
dependdemo dbo otherproc [dependdemo] [dbo] [tableone] colthree
dependdemo NULL NULL [dependdemo] [dbo] [tablethree] coltwo
dependdemo dbo firstproc [dependdemo] [dbo] [tabletwo] coltwo
dependdemo NULL NULL [dependdemo] [dbo] [tablethree] colone
dependdemo dbo otherproc [dependdemo] [dbo] [tableone] colone
dependdemo dbo firstproc [dependdemo] [dbo] [tableone] colone
dependdemo dbo firstproc [dependdemo] [dbo] [tabletwo] colone
dependdemo dbo firstproc [dependdemo] [dbo] [tableone] coltwo
That is, all references are returned, and none are missing due to temp tables or dynamic SQL. Well, there is a possibility that the two references to tablethree with RefModule NULL are missing when you run the query. This would happen if you have only executed otherproc once, and the configuration option optimize for ad hoc workloads is set to 1 (which is generally seen as best practice). With this setting, the plan for an unparameterised ad-hoc query, including dynamic SQL, is not put in the cache until it is executed a second time. If this happens to you, just run otherproc again. (While this may at first seem like one more limitation with using the plan cache for finding references, it can also be construed as a feature. You may not care for such singleton references.)
There is one thing to observe: the references from otherproc are listed against tableone and not aview. A view in SQL Server is just a macro which is replaced with its definition before optimisation, so you will generally not see view references with this query. (The one exception is if the optimizer has matched a query against an indexed view.) Also, note that since there are no plans for calls to stored procedures, we do not see the dependency from otherproc to firstproc here.
Let's take a closer look at the query. WITH XMLNAMESPACES is needed to be able get data out of the XML for the query plan. The first CTE, CompiledPlan, starts with sys.dm_exec_cached_plans which has metadata information about all plans in the plan cache. The plan itself is extracted from sys.dm_exec_query_plan. The query also calls sys.dm_exec_plan_attribute to retrieve the dbid, so we can restrict the query to plans that originate from the current database. Now, you may ask yourself two things:
The answer to both is mainly about performance. As I said, retrieving plans from the cache is expensive, so if we know that there are no references from other databases, it is a good idea to restrict the query to the current database. But if we would use the dbid from sys.dm_exec_query_plan for that filter, the query would still call that DMV for all plans, and not only those in the current database. To avoid this, we take the database id from the far less expensive sys.dm_exec_plan_attributes, so that sys.dm_exec_query_plan is only called for plans from the current database. When I run the query above in the dependdemo database on my instance, the response is instant. When I comment out the WHERE clause in CompiledPlans to remove the filter on dbid, the query may run for minutes. I should immediately say that this is very much "it depends". My instance is a lab server at home with many small junk databases. On a production server with only one production database, the query is likely to take a long time, with or without filter on dbid.
Note: My testing indicates that the order of the DMVs matters. That is, it is important that sys.dm_exec_plan_attributes comes before sys.dm_exec_query_stats in the query. Presumably, the optimizer has no information about the actual cost of DMVs, so it just takes them in order of appearance.
Beside the query plan, we also collect the id for the originating database and the object id for the referencing module. (The object id will be NULL if the reference is from dynamic SQL or an ad-hoc query.) We take this information to the next CTE, ColumnReferences, where we extract all column references there are in the plan. Since the same column may appear more than once, we apply DISTINCT. In the final query, we filter TargetDatabase on the current database. Without that filter, you may see junk references you don't care for. The query filters out references in the sys schema, of which there can be quite a few, but which are rarely of interest. To get the name of referencing modules, we use the metadata functions object_schema_name and object_name, rather than joining to the catalog views sys.schemas and sys.objects, as this allows us to look up references in any database. (So that we have this flexibility if we remove the database filter in the first CTE.)
As I mentioned above, when you see a reference with RefModule NULL, you cannot tell whether that is dynamic SQL from within SQL Server, an ad-hoc query submitted from an application or just a whimsical query from SSMS. If you want to dig further you can modify the CompiledPlan CTE to add a call to the DMV sys.dm_exec_sql_text which also accepts the plan_handle as parameter and which returns the query text in the text column. When you do this, you should probably apply filters to restrict the query to the tables or columns you are interested in to reduce the size of the output.
Let's finally look at this from the reverse direction. In the original question on Stack Overflow, the poster wanted to know which tables and columns a certain stored procedure touched. The query above serves this purpose well, at least as long as you are not concerned by columns in views being replaced by the underlying table columns. Run the procedure (so that you know that there is a plan for it in the cache) and add a filter on RefModule, in the query above. If you want to see references to tables outside the current database replace the line
WHERE TargetDatabase = quotename(db_name())
with
WHERE TargetDatabase IS NOT NULL
(Without this filter you may get lots of rows with intermediate expressions in query plans such as Expr1001.)
Note that for this purpose, there is no need to run the query on a production server. Since you only want to see the references for this particular procedure, you can (and should!) run the query on a test or dev server. This is even more important if you also want to find indirect references. That is, references through dynamic SQL constructed in the procedure or references made by sub-procedures. In this case, you would clear the plan cache with DBCC FREEPROCCACHE (something you absolutely should not run on a whim on a production server!) before you execute the procedure. Then run the query above to get all references. Again, note that you need to change the filter on TargetDatabase to see references outside the database.
An alternative to using the plan cache on SQL 2016 or later is Query Store. A big advantage of this is that you don't have to run any queries in production, but you can query the data in a copy restored to non-production environment.
Inspired by Stephen Turner's query, I wrote this script:
DROP TABLE IF EXISTS #plans SELECT TRY_CAST (p.query_plan AS xml) AS query_plan, q.object_id INTO #plans FROM sys.query_store_plan p JOIN sys.query_store_query q ON q.query_id = p.query_id WHERE try_cast(p.query_plan AS xml) is not null ; WITH XMLNAMESPACES ( 'http://schemas.microsoft.com/sqlserver/2004/07/showplan' as ns1 ), ColumnReferences AS ( SELECT DISTINCT p.object_id, CR.p.value('@Database', 'sysname') AS [Database], CR.p.value('@Schema', 'sysname') AS [Schema], CR.p.value('@Table', 'sysname') AS [Table], CR.p.value('@Column', 'sysname') AS [Column] FROM #plans p CROSS APPLY p.query_plan.nodes('//ns1:ColumnReference') CR(p) WHERE p.query_plan IS NOT NULL ) SELECT s.name + '.' + o.name AS Module, CR.[Database], CR.[Schema], CR.[Table], CR.[Column] FROM ColumnReferences CR LEFT JOIN (sys.objects o JOIN sys.schemas s ON o.schema_id = s.schema_id) ON CR.object_id = o.object_id WHERE CR.[Database] IS NOT NULL AND CR.[Schema] <> '[sys]'
To test it, you first need to enable Query Store and run otherproc again before you run the query above.
ALTER DATABASE CURRENT SET QUERY_STORE = ON EXEC otherproc
If you are on SQL 2016 or SQL 2017, the output should be the same as for the query against the plan cache. If you are on SQL 2019 or later, you will probably only see the references from firstproc. This is because there were some changes in SQL 2019 to what Query Store captures by default. Rather than capturing exactly everything, it now skips very trivial queries that only run occasionally. But if you execute otherproc repeatedly, you will eventually get eight rows back from the query above on SQL 2019 as well.
The advantage with using Query Store over the plan cache is that the data is persisted and survives a restart of SQL Server. This does not mean that Query Store is perfectly reliable for finding references. It is after all designed for a different purpose: to be an aid for performance tuning. Here are a couple of things to consider:
Performance is also a consideration. You may note that rather than using CTEs all the way, I have introduced an intermediate temp table. In my first version I did not have this temp table, but I found that even for the very small dependdemo database, the query could run for more than ten seconds. If your system has many different stored procedures or ad-hoc queries, there are a lot of plans to get through, much more than in the plan cache, since there can be many plans for the same query. So don't run the query above in production, but restore a backup to an idle test system. Or even better, use DBCC CLONEDATABASE which gives you a database with metadata and Query Store but no sensitive table data and copy that small database to your laptop or wherever.
In this section I will briefly discuss two possibilities that comes with my AbaPerls toolset. AbaPerls is not exactly to get started with, but you can use the first tool that I discuss, VCDBLOAD, without embracing the rest of AbaPerls. This is not exactly true for the second option.
VCDBLOAD gives you the same thing as SearchCode: a full-text indexed database where you can search your source code. But instead of loading the code from sys.sql_modules in the database, VCDBLOAD loads the source code from your version-control system (which must be TFS or SourceSafe, the only two I support). This is not restricted to SQL code, but VCDBLOAD loads all non-binary files it finds to the full-text indexed database. Furthermore, for languages it understands (SQL, C++, C#, Visual Basic, JavaScript), it stores the source code with the comments retained as well as with the comments stripped out, so that you can search without getting false hits in comments. This full-text database can be a very powerful tool to find references in your system, both server-side and client-side, and this is a very important tool at the client where I developed this solution.
The other tool is DBBUILD which builds an empty database from a source control, which must be organised according to a structure mandated by AbaPerls. (VCDBLOAD does not have any such restrictions). When AbaPerls loads a stored procedure, it first extracts all temp-table definitions in the procedure and creates them prior to creating the procedure. This lures SQL Server to perform full binding of these queries. The main objective with the exercise is to get SQL Server to report errors in such queries early. But a side-effect is that sys.sql_dependencies is populated with complete information for all such queries, and thus becomes fully reliable except for dynamic SQL and cross-database and cross-server queries. Note that it is only in this older view where you get the complete information. sys.sql_expression_dependencies does not hold information on column level, and since sys.dm_sql_referenced_entities starts over, it has no benefit of the tricks that AbaPerls played when the procedure was created.
We have now looked at a number of ways to find references in a database. Is this the full story, or are there more options? There are a few more alternatives I can think of, but which I did not find worthwhile to explore further:
As always, I appreciate feedback from my readers on my articles. The feedback could be of any size from pointing out a major omission to pointing out a spelling or grammar error or two. And, of course, if you think that there is something that is unclear and you have any questions directly related to the article, you are more than welcome to contact me. The mail address is esquel@sommarskog.se. (On the other hand, if you have a more general SQL question, I recommend that you ask in a public SQL forum, as you may get help faster that way.) I like to thank Paul McMillan and Daniel Adeniji for their input.