Taking a deeper dive into XPATH queries

March 20, 2015 by Steve Simon

Introduction

In my last article entitled “Which fields do my reports actually use”, we had a quick look at a practical implementation of an XPATH query to obtain a list of database table fields being utilized by our reports. Reporting Services reports are really no more than XML documents. The fields utilized by our reports reside in Reporting Services datasets.

In that article, the eagle-eyed reader would have noted that only the first field for each dataset was pulled. This was done intentionally and was meant to act as an introduction to the more complex techniques needed to pull the complete suite of data fields.

Today’s end goal (to pull all the fields) may be seen in the screen shot shown below:

Getting Started

The small snippet of code shown below helps us obtain the names of the “data fields” utilized by the reports resident on our report server. It renders the Report Path (where the report is located), the name of the report, and the bulk of the data that we need, which is found in an image field called “Content” that we convert to XML (see below).
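As a starting point, a query along these lines pulls the three base columns from the ReportServer catalog. This is a hedged sketch: the table and column names (`Catalog`, `Path`, `Name`, `Content`, `Type`) are the standard ones in the ReportServer database, but verify them against your own instance.

```sql
-- Sketch: pull each report's path, name and RDL content from the catalog.
-- "Content" is stored as an image column, so it must pass through
-- VARBINARY(MAX) before it can be converted to XML.
SELECT
    [Path]  AS ReportPath,
    [Name]  AS ReportName,
    CONVERT(XML, CONVERT(VARBINARY(MAX), Content)) AS contentXML
FROM ReportServer.dbo.[Catalog]
WHERE [Type] = 2;   -- Type 2 = report (other types are folders, data sources, etc.)
```

The double conversion is the key detail: converting the image column straight to XML fails, so the `VARBINARY(MAX)` intermediate step is required.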

Obtaining the names of the dataset/reports fields is slightly more convoluted. In order to obtain this data we need to descend the following path (see the screen shot below).

/Report/DataSets/DataSet/Fields/Field/DataField

Our task is now to ensure that when we run this query, that all the necessary “report fields” are being successfully extracted. As always, the complete code listing may be found in Addenda 1.

One of the nagging issues that I have always encountered is that we are never really 100% confident of the maximum number of fields that a dataset will contain. In most of the datasets that I have utilized, I have found that the number of sibling fields/nodes rarely exceeds thirty fields per dataset.

We set our root path to Report/DataSets/DataSet

Obtaining the report name, the report path and the contentXML fields is fairly simple. They form the first portion of our “Select” statement (see below).

The code snippet above shows us the subquery containing these first three fields (i.e. RPT.ReportPath, ReportName, and contentXML) plus two additional fields in blue. The fields in blue are discussed shortly.
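The shape of that subquery can be sketched as follows. This is an assumption-laden reconstruction: the alias `RPT` matches the text above, and the inner derived table is the catalog query described earlier; the two “blue” fields (DataSourceName and DataSetName) come from the CROSS APPLYs discussed next and are omitted here.

```sql
-- Sketch of the outer select's first three columns, fed by a derived
-- table (aliased RPT) over the ReportServer catalog.
SELECT
    RPT.ReportPath,
    RPT.ReportName,
    RPT.contentXML
FROM (
    SELECT
        [Path] AS ReportPath,
        [Name] AS ReportName,
        CONVERT(XML, CONVERT(VARBINARY(MAX), Content)) AS contentXML
    FROM ReportServer.dbo.[Catalog]
    WHERE [Type] = 2
) AS RPT;
```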

This whole subquery may also be altered to ascertain if the report and fields have been utilized in the recent past (see below in bold text).
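One way to apply that “recent past” filter is an EXISTS check against the execution log. This sketch assumes the standard `ExecutionLogStorage` table in the ReportServer database and an arbitrary six-month window; adjust both to your environment.

```sql
-- Sketch: keep only reports that have been executed in the last six months.
SELECT
    CAT.[Path] AS ReportPath,
    CAT.[Name] AS ReportName
FROM ReportServer.dbo.[Catalog] CAT
WHERE CAT.[Type] = 2                         -- reports only
  AND EXISTS (
        SELECT 1
        FROM ReportServer.dbo.ExecutionLogStorage EL
        WHERE EL.ReportID  = CAT.ItemID      -- log rows link back to the catalog
          AND EL.TimeStart >= DATEADD(MONTH, -6, GETDATE())
      );
```

Bear in mind the point made in the conclusions: execution logs are often truncated periodically, so seasonal reports may not appear in the window you choose.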

The complete code listing OF THIS QUERY may be found in Addenda 2.

When it comes to the DataSourceName and the DataSetName, we must JOIN back to the “Catalog” table (in the Report Server database) utilizing the “Content” field and XPath statements to the DataSourceName and DataSetName nodes. When working with XML we utilize a CROSS APPLY instead of an inner join (see below).

As mentioned, these CROSS APPLYs contain the path to the necessary leaf nodes where we shall be able to find the data source name and the name of the dataset (above and below).
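A hedged sketch of the CROSS APPLY pattern is shown below. The paths reflect the standard RDL layout (each `DataSet` carries a `Name` attribute and a `Query/DataSourceName` child element), but note that this bare form only matches RDL without a default namespace; the namespace wrinkle is covered under Caveat Emptor.

```sql
-- Sketch: shred each report's RDL into one row per dataset using
-- .nodes(), then pull the dataset and data source names with .value().
SELECT
    CAT.[Path] AS ReportPath,
    CAT.[Name] AS ReportName,
    R.DS.value('@Name', 'VARCHAR(250)')                     AS DataSetName,
    R.DS.value('(Query/DataSourceName)[1]', 'VARCHAR(250)') AS DataSourceName
FROM ReportServer.dbo.[Catalog] CAT
-- First APPLY materializes the XML conversion so it can be reused.
CROSS APPLY (SELECT CONVERT(XML, CONVERT(VARBINARY(MAX), CAT.Content)) AS X) AS C
-- Second APPLY descends to the DataSet nodes, one row per dataset.
CROSS APPLY C.X.nodes('/Report/DataSets/DataSet') AS R(DS)
WHERE CAT.[Type] = 2;
```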

Obtaining the remaining fields

As discussed above, knowing the maximum number of fields that the largest of our report datasets contains is always a conundrum. The ones that I utilize normally have no more than 30 fields, the reason being that most of my reports contain matrices that show data for twelve months (both actuals and goals). This clearly accounts for twenty-four fields; the other six are descriptive.

Working from the XPATH expression “R” (shown immediately above), we are now in a position to pull all thirty plus fields within any dataset (see below).

ALL the way through to Field31
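The ordinal-predicate pattern behind Field1 through Field31 can be sketched as follows (only the first three ordinals are written out here; the original repeats the pattern through `[31]`). The column names mirror the article; the XPath assumes the un-namespaced RDL layout noted earlier.

```sql
-- Sketch: pull each sibling Field node by its ordinal position within
-- the dataset. Ordinals with no corresponding node simply return NULL.
SELECT
    R.DS.value('@Name', 'VARCHAR(250)')                       AS DataSetName,
    R.DS.value('(Fields/Field/DataField)[1]', 'VARCHAR(250)') AS Field1,
    R.DS.value('(Fields/Field/DataField)[2]', 'VARCHAR(250)') AS Field2,
    R.DS.value('(Fields/Field/DataField)[3]', 'VARCHAR(250)') AS Field3
    -- ... repeat the same pattern up to [31] for Field31
FROM ReportServer.dbo.[Catalog] CAT
CROSS APPLY (SELECT CONVERT(XML, CONVERT(VARBINARY(MAX), CAT.Content)) AS X) AS C
CROSS APPLY C.X.nodes('/Report/DataSets/DataSet') AS R(DS)
WHERE CAT.[Type] = 2;
```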

The observant reader will call “foul” as not all datasets will have thirty plus fields.

This is true, however those fields that do not exist will have “NULL” for the corresponding “Field” node value (see below).

…and

Obtaining our end list of report fields

Having obtained the name of the report, the data source name, the dataset name and a whole wad of other fields, we must now consolidate all of this into four distinct fields: “ReportName”, “DataSourceName”, “DataSetName” and “Fields1”. “Fields1” will contain the names of the report fields within each report (e.g. Last Name, First Name, City, State, Country etc.).

We achieve this by creating the following query.
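That consolidation query can be sketched as a union over the wide field columns. This is an assumption-laden reconstruction: `#RawFields` and `#ReportFieldList` are hypothetical names standing in for the intermediate and final result sets.

```sql
-- Sketch: unpivot Field1..Field31 into a single "Fields1" column via UNION.
-- UNION (rather than UNION ALL) also removes duplicate rows.
SELECT ReportName, DataSourceName, DataSetName, Field1 AS Fields1
INTO #ReportFieldList
FROM #RawFields
UNION
SELECT ReportName, DataSourceName, DataSetName, Field2 FROM #RawFields
UNION
SELECT ReportName, DataSourceName, DataSetName, Field3 FROM #RawFields;
-- ... one branch per field column, through Field31
```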

Once loaded, we still have one last task to do and that is to remove those records that have null values (see below).

Once again, these values are NULL as there were no corresponding fields in the “SalesPerson” dataset (as an example) for, say, “Field21” and “Field22”. The observant reader will note that at this stage the fields “Field2” through “Field31” no longer exist due to the “union”. The important point is that these records with NULL (as a value) must be removed from the recordset.

This said, we now issue the following T-SQL statement:
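The cleanup statement amounts to a simple delete on the consolidated table. `#ReportFieldList` is a hypothetical name for that table, matching the sketch style used above.

```sql
-- Sketch: remove the rows produced by ordinals that had no matching
-- Field node in their dataset (their Fields1 value is NULL).
DELETE FROM #ReportFieldList
WHERE Fields1 IS NULL;
```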

Caveat Emptor

As with any process, we shall find the normal “gotchas”.

When I first attempted this exercise, my hair turned grey: 95% of the XML data would come out correctly, while the other 5% came back NULL when it should NOT have. The reason for this is related to the namespaces utilized by the XML.

Indications are that some of the XML data utilized the 2010/01 namespace and other data the 2008/01 namespace (see below).

The screen shot above shows the “Content” field within the Reporting Services database “Catalog” table, which requires the usage of the 2008 namespace.
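One way to handle this is to declare the RDL namespace explicitly before the query. The two namespace URIs below are the standard 2008/01 and 2010/01 report-definition namespaces; which one a given report needs depends on the designer that saved it, so you may have to run the query once per namespace (or fall back to a namespace-agnostic `*:` wildcard in the XPath).

```sql
-- Sketch: bind the default namespace so the un-prefixed XPath steps
-- actually match namespaced RDL. Swap in
-- 'http://schemas.microsoft.com/sqlserver/reporting/2010/01/reportdefinition'
-- for reports saved with the 2010/01 schema.
;WITH XMLNAMESPACES (
    DEFAULT 'http://schemas.microsoft.com/sqlserver/reporting/2008/01/reportdefinition'
)
SELECT
    CAT.[Name] AS ReportName,
    R.DS.value('@Name', 'VARCHAR(250)') AS DataSetName
FROM ReportServer.dbo.[Catalog] CAT
CROSS APPLY (SELECT CONVERT(XML, CONVERT(VARBINARY(MAX), CAT.Content)) AS X) AS C
CROSS APPLY C.X.nodes('/Report/DataSets/DataSet') AS R(DS)
WHERE CAT.[Type] = 2;
```

This is exactly the 95%/5% split described above: rows whose RDL uses the other namespace silently produce no matches rather than raising an error.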

You have been warned!

Conclusions

When contemplating any database migration, one often tries to perform a bit of housekeeping prior to proceeding with the migration. In a recent migration to the cloud, the client wanted an inventory of the datasets and fields utilized by their reports, and to remove unnecessary or unused reports (for that matter). In reality this is a double-edged sword. On one hand, one tends to look at recent executions of reports as a guideline to what must be maintained, retained and moved. On the other hand, some reports tend to be seasonal, and most enterprises truncate report server logs on a periodic basis; thus if one does not pay attention to this fact, potentially useful and important reports tend to fall through the cracks.

This said, Addenda 1 contains the code to be executed regardless of the last runs of the report.

Addenda 2 (on the other hand) is execution based and will render data based upon execution times from the log table.

So we come to the end of another “get together”. As always, should you have any questions or concerns, please feel free to contact me.

In the interim, happy programming!

Addenda 1

Addenda 2

Code to report fields based upon recent usage


Steve Simon

Steve Simon is a SQL Server MVP and a senior BI Development Engineer with Atrion Networking. He has been involved with database design and analysis for over 29 years.

Steve has presented papers at 8 PASS Summits and one at PASS Europe 2009 and 2010. He has recently presented a Master Data Services presentation at the PASS Amsterdam Rally.

Steve has presented 5 papers at the Information Builders' Summits. He is a PASS regional mentor.
