Steve Simon

Which Reporting Services dataset fields are being utilized by the reports?

March 18, 2015 by

Introduction

Have you ever felt like pulling your hair out, trying to ascertain exactly which fields in your existing Reporting Services datasets are being utilized by your reports. This happened to me recently during a corporate conversion and cleanup exercise for a database migration to the cloud.

The “aha moment came after having presented a paper at the PASS SQL Server Nordic Rally (March 2015), when one attendee came up to me and asked if I knew of a method to do this. As they say ‘necessity is the mother of invention’ and spiking my interest, I played around until I came up with the solution that we are going to chat about today. The end solution may be seen below

Let’s get started.

Getting Started

Management at SQL Shack Cars would like to know which fields in their database tables are NOT actively be utilized. The reason for this is that they wish to migrate the database to the cloud maintaining only those fields that are being used, in order to reduce the cost of pulling data from the cloud to their local data warehouse.

All the data that we are looking for resides within the Reporting Services database. The challenge that we have is that the data that we are looking for resides within the XML code of the reports themselves. This said, in order to isolate and extract the required data we have to integrate lines of code which are XML oriented.

As we shall be working with XML, we need to ensure that the necessary namespaces are present. The code below will ensure that our query has the necessary access to the required ‘libraries’. We shall be utilizing a Common Table Expression (CTE).

With this knowledge, we begin with our inner most or ‘core query’

Our ‘core query’ runs against the catalog table within the Reporting Services database.

We note that the query renders the path of the report (i.e. where the report resides on disc and the name of the report. We note also that there is a third field (contentXML) which contains the data for which we are looking.

By double clicking on the XML value in the first row, the XML is exploded and shows the structure of the data (see below).

The trick is now to parse the tree and extract what we require. In order to understand what is transpiring, let us step through the lines of code to obtain a better understanding.

We first issue our select statement calling for the Report Path which was extracted directly from the catalog table.

We must now locate the Data Source Name(s) and Data Set Name(s).

We select the “Query/DataSourceName” branch off of the XML root choosing the first XML descendant on this branch. “Query/DataSourceName” subs off from “/Report/DataSets/DataSet” as

may be seen below. Thus the whole chain thus far is

|      Root      |      Child      |

  “/Report/DataSets/DataSet/ Query/DataSourceName”.

The code for this may be seen below:

Next we wish to view the Query Command Text.

The astute reader will note that the command text falls under the <Query> element and the code to extract this field may be seen below:

This coded we are now in a position to extract the dataset field names. They are located on a different branch of the XML tree. The fields that we have identified thus far have been prefixed with an “R”. More on this when we discuss the CROSS APPLY in a few minutes. The CROSS APPLY links our current branch of the tree back from its “Dataset” to the root called “Report”. As we shall be discussing two distinctly different branches under ‘/Report/DataSets/DataSet’ we need to prefix them differently.

Note that we prefix this branch with the letter “Z”. The relationship between the two branches and the parent branch may be seen below:

We have now reached the portion of our code where the sub query (that we discussed above) is located. To refresh our memory this is the query that showed us the report path (see below).

As we discussed above, we need some way to link our children to the mother path. This is where the CROSS APPLY comes in (see below). Note that the one branch to the command text has been prefixed “R” and the other (to the field list) “Z”.

The entire query is now executed by the SELECT portion of the CTE (see below).

SELECT DEF.ReportPath ,DEF.DataSourceName ,DEF.DataSetName ,DEF.Fields ,DEF.CommandText FROM DEF

Quo Vadis?

In a prior “get together” we discussed the construction of a SQL Server Monitoring dashboard. At that time we developed one stored procedure that we are now going to reuse. This query tells us which reports have not been used over a given time period. The link to this article may be found below.

/monitoring-sql-server-reporting-services/

This combined with the query that we have just created will give us the answer that we are looking for. Both queries have a common field “Report Path”.

We create an inner join on the output of both queries (each extract has its own temporary table). The code listing may be seen in Addenda 2.

Note the result of the first query which gives us the start and end date of our period (see above). This is there for information only.

The second set, contains the reports that were not run during this time period. The key to this logic lies in the “Catalog.ItemID NOT IN” part of the predicate (see above).

The screen shot shown above shows the reports that were run during the time period 7/1/2014 to 6/30/2015. Note that the report in the “Cars1” directory is not showing. The reason being that that report has not been run on this instance of the reporting server.

Further, when I removed the “NOT” in the query predicate, then the reports that were actually run during the time interval, now become visible (see below).

Conclusions

Thus having found that the only report that had not been run (during the given time period) was the “SQLShackCars” report that resides in the “Car1” directory, we have achieved our goal of determining which reports have not been utilized. Thus any fields unique to these reports may not have to be ported to the database in the cloud.

Further, with those report that were run during the period, we know that their fields need to be ported to the new system. Most of these fields are located in varied tables within the database and should be easily located using the query in Addenda 3. The one gotcha to be aware of, is that some fields from datasets may be calculated fields and as such should not be present in any of the tables under consideration, assuming that we do not have a naming clash.

Thus we have come to the end of another get together. I hope that this exercise is of some use to you and as always, should you have any questions or concern, please feel free to contact me.

In the interim, happy programming!

See more

For fault tolerant auditing with centralized storage and reporting, consider ApexSQL Audit, an enterprise level SQL Server auditing and compliance tool.

Addenda 1

Addenda 2

Addenda 3

The code snippet below will find all tables (within the database) that have a field with that name.


Steve Simon

Steve Simon is a SQL Server MVP and a senior BI Development Engineer with Atrion Networking. He has been involved with database design and analysis for over 29 years.

Steve has presented papers at 8 PASS Summits and one at PASS Europe 2009 and 2010. He has recently presented a Master Data Services presentation at the PASS Amsterdam Rally.

Steve has presented 5 papers at the Information Builders' Summits. He is a PASS regional mentor.

View all posts by Steve Simon
Business Intelligence, SQL Server auditing

About Steve Simon

Steve Simon is a SQL Server MVP and a senior BI Development Engineer with Atrion Networking. He has been involved with database design and analysis for over 29 years. Steve has presented papers at 8 PASS Summits and one at PASS Europe 2009 and 2010. He has recently presented a Master Data Services presentation at the PASS Amsterdam Rally. Steve has presented 5 papers at the Information Builders' Summits. He is a PASS regional mentor. View all posts by Steve Simon

395 Views