Jefferson Elias

How to retrieve information about SSIS packages stored in MSDB Database

February 21, 2017 by

Introduction

Nowadays, most mid-size companies have implemented a Data Warehouse (DWH) solution. This DWH can be designed using a set of tools (or features) from Microsoft SQL Server. One of them is SQL Server Integration Services, also known as SSIS.

Basically, this feature (SSIS) allows us to extract, transform and load data from almost any kind of source to almost any kind of destination : flat file, Excel File, RDBMS. Based on our version of SQL Server, we will use either Visual Studio (starting SQL Server 2012) or SQL Server BI Development Studio (before SQL Server 2012).

Microsoft defines SSIS as a plaftform for building enterprise-level data integration and transformation solutions. We can easily imagine that building a data warehouse is not the only application SSIS is designed for. There is an article on Technet listing ten typical usages of SSIS : Typical Uses of Integration Services.

With some basic searching, you will find in Microsoft’s documentation, articles and blog posts that there are two kinds of deliverables when we work with SQL Server Integration Services : we can build and deploy SSIS deliverables either as packages or projects.

There are plenty of differences between those two modes :

  • You cannot deploy projects prior to SQL Server 2012 : project deployment appeared with this version. This mode of deployment overcomes package deployment, which is now referred to « legacy deployment mode ».

    So, before SQL Server 2012, only package deployment is available. Microsoft provides tools to migrate from an older version of a SSIS packages to a SSIS project containing this package.

  • A package can be compared to executable code or a workflow description. A project contains a set of packages and has advanced functionalities compared to a simple package. For instance, you can define a connection to a SQL Server instance or an Excel file at project-level. This connection will be accessible to all packages defined in the project. In legacy mode, you will need to create the exact same connection in each package you would group into a single project.

  • Packages and projects are not stored the same way. The former is stored either in msdb database or on filesystem while the latter is stored in a SQL Server database called the SSIS Catalog.

  • Etc.

You can find a comparison on MSDN : Deploy Integration Services (SSIS) Projects and Packages.

No matter these differences, in both modes, you can run packages using (scheduled) SQL Server Agent Jobs.

But this article is not about SSIS development or deployment. There are plethora of tutorials and courses on the subject available on the web. In fact, this article will focus on the extraction of information about SSIS Packages (not projects) deployed in MSDB database (not on file system), using a set of T-SQL queries. The results of those queries could be stored in tables inside a database.

This extraction process was designed to be the input of an analysis process in order to prepare a side-by-side migration and pinpoint problematic aspects before going any further. The objective was to plan preventive corrections a long time before the actual migration process, when it’s possible. This analysis process will eventually be the subject of another article. I think the entire process (extraction plus analysis) can also be used to ensure quality of development, or simply to document existing packages.

Now, let’s dive into SSIS fundamentals. We will view (or review) how a SSIS package is defined and designed. This will allow us to pinpoint relievant components or properties that would be useful to extract, just as a list or for further analysis.

SSIS Packages : what we need to know.

In the following sections, we will sometimes use screen captures of a SSIS package in SQL Server BI Development Studio. The package used is called “Lookup Sample” and taken from SQL Server Integration Services Product Samples. We will refer to this package as our demo package or example package.

SSIS Package composition and properties (overview)

Flows

First of all, when we open either a new or an existing SSIS package in SQL Server Business Intelligence Development Studio or Visual Studio, we will notice that it’s composed in two kinds of « flows » :

  • The control flow : a workflow which lists tasks to perform during package execution. Tasks can be grouped in containers. In this case, containers are seen as undividable tasks.

  • The data flow : a data flow exists for most tasks on data in the control flow. It’s the way data is treated. Some tasks can be defined without data flow. Particularly, database connections are used there (as Data Source or Data Destination)

Here is a diagram from Microsoft that summarize the explanation from above:

You will find below a screen capture with an example control task and its corresponding data flow view.

Connections

In data flow, we generally take data from a source (the extract), we eventually process these data (transform) and store it to a destination (load). Source and destination are defined by connections. There are various kinds of connections : XML file, CSV file, SQL Server, ODBC, Oracle Database, Excel … Have a look at this documentation page for further details. Multiple connections can be used in a single package.

Defining and editing connections is performed using a component of SSIS called « Connection Manager ».

Here are the connections defined in our example package.

During the creation, we will set properties that describe the physical connection to be used during the execution of this package. Amongst those properties, we will find the kind of connection (Excel, CSV, etc.) and the connection string.

Recommendation

When you define connection strings to a database instance or to a host…

  1. Use DNS aliases instead of IP Address or actual server name. This simplifies change management.

  2. Set the ApplicationName property whenever it’s possible. This will allow you to simplify debugging and tracing. For instance, in SQL Server Profiler, you will be able to filter on that particular ApplicationName. I recommend setting this property in any application you develop.

Identifier property

Each package has a unique identifier, the GUID. As an example, you will find ID for our demo Package.

This identifier is generated at creation and can be generated at any time, on demand. This identifier is used for logging and some other available functions. You will find below the way to generate a new identifier : you simply click on the arrow next to the current Id then on “<Generate New ID>”.

A lot of SSIS developers have the common habit that consists in creating a package template. The advantage is that they can reuse it for their future developments. A problem with this practice is that they don’t necessarily generate a new GUID. If that happens, multiple packages can share the same GUID, leading to a headache in case of debugging (when SSIS logging is used) !

Recommendation

If you defined a package template, don’t forget to generate a new GUID for the newly created package based on that template.

Variables and configurations

In every SSIS package, we can define variables. Here are some variables defined in our demo package :

As in any programming language, variables holds values available at runtime. For instance, the User ::ErrorCount variable will hold the number of errors that occurred during execution.

Another common usage of variables consists in keeping connection strings that are generated based, for instance, on the content of a SQL Server table.

Generally, variables come along with SSIS Package configurations. They are defined as follows :

A configuration is a property/value pair that you add to a completed package.

Package organization: folders

Finally, last but not least thing to know, packages can be grouped in package folders. We can build a more or less complex hierarchy or simply group packages by project or by functionality. For instance :

Description of a SSIS Package file

Let’s say our package is ready (or saved but still under development), we can build it anyway. This operations results in the creation of a ‘dtsx’ file. This file is actually an XML file that completely describes the SSIS package (flows, variables, connections…). You will find an example of its contents below.

As for any well-formed XML, there is a corresponding XML Schema Definition a.k.a. « XSD ». For those who are not used to this term, here is a definition from Wikipedia:

XSD (XML Schema Definition), a recommendation of the World Wide Web Consortium (W3C), specifies how to formally describe the elements in an Extensible Markup Language (XML) document. It can be used by programmers to verify each piece of item content in a document. They can check if it adheres to the description of the element it is placed in.

The corresponding definition can be seen on this page.

SSIS management objects (in msdb database)

In this article, we extract information on SSIS packages that are stored in msdb database. For that purpose, Microsoft provided us three tables. You will notice that only two of them are relievant to be used in the following of this article.

  • dbo.sysssispackages :

    Contains one row for each package that is saved to Microsoft SQL Server. There is a column called packagedata which is an exact copy of the content of the ‘dtsx’ file used for deployment.

  • dbo.sysssispackagefolders :

    Contains one row for each logical folder in the folder hierarchy that Microsoft SQL Server Integration Services uses.

  • dbo.sysssislog :

    Contains one row for each logging entry that is generated by packages or their tasks and containers at run time.

How to run a SSIS package using SQL Server Agent

Running a SQL Server Integration Services Package using SQL Server Agent consists of creating and scheduling a SQL Server Agent Job of type « SSIS ». Once created, you can either run it when desired or let it run automatically (based on the schedules you defined).

For those of you who are interested of a more straight forwards procedure, I advice you to have a look at the following MSDN page: Schedule a Package by using SQL Server Agent.

Anyway, SQL Server Agent will just take the role of an interface to a command line tool called « dtutil ». You will find on the documentation page (dtutil Utility ) of this tool that it takes a /SourceServer parameter, which allows you to perform calls to remote SSIS management.

What to collect?

We are now at a point where we should define the questions we can and we want to answer.

You will find such a list in the table below. The purpose of this list is not to be exhaustive, but feel free to contact me with suggestions.

Must Have Should Have Could Have Wont Have
# Question
1 Which SSIS packages are deployed on a server, in which folder ? X
2 Who is the owner of a given SSIS Package ? X
3 What storage capacity does a given package take ? X
4 What is the version of a given package ? X
5 What connections are defined in a given package ? X
6 What providers are used in Connection Manager ? X
7 Which variables are defined in a given package ? X
8 Which package configurations are defined in a given package ? X
9 Which packages are run using a SQL Server Agent Job ? X
10 Does the SSIS Package Owner meet enterprise standards ? X
11 Are there packages with connections that does not meet enterprise standards ?
For instance, references to local server should be pointed out.
X
12 Are there Agent jobs running SSIS Packages that does not meet enterprise standards ?
For instance, references to local server should be pointed out.
X

Based on these questions and everything we discussed above, we can list the following actions :

How will we collect data ?

Well, while the target is well known, we still need to take the time to answer the following question : « how will we do it ? ».

Actually, the answer will be pretty « simple ». As we’ve seen in a previous section, there is a column called packagedata in the dbo.sysssispackages table (stored in msdb database). While this table will allow us to list SSIS packages (action 1), the packagedata column can be used as input to extract attributes of a SSIS package like the connections (action 2). In fact, the content of this column is an XML and SQL Server provides everything we need to analyze/query XML. It’s the XQuery language that is part of T-SQL. This language is not the subject of this article, so it won’t be fully coverred. If you are interested in this subject, please refer to the following page on Microsoft’s documentation website : Introduction to XQuery in SQL Server 2005 or on the page describing all the T-SQL Methods to query XML Data Type.

As you may expect, we won’t use XQuery to get the list of SQL Agent jobs that run SSIS packages. Why ? Simply because it’s not stored in packagedata column and not even in dbo.sysssispackages. Instead, we will query tables related to SQL Server Agend, also in msdb :

The last action that remains (action 4) will exploit the results of prior actions, but’s it’s not part of this article.

In summary, here is the algorithm that we will follow :

Action 1 : Extract the list of SSIS packages

To extract the list of SSIS Packages, we will obviously use the msdb.dbo.sysssispackages table. This table alone should be enough, but if you want to get the full path of a given SSIS Package, there is some more work to do. This table contains a folderid column that refers to the key column with the same name in msdb.dbo.sysssispackagefolders table. Folders can be structured and have a parent folder. So we can find paths like /Finance/Billing or /CRM/Customers.

As we have this unique identifier for a given folder, we can take advandtage of Common Table Expression (CTE) feature and query recursion to get the full path of a given SSIS Package.

Let’s review the structure of target query. There is the CTE declaration followed by the actual SELECT query:

Let’s first review the details of a target temporary table that will hold the results of the query. You will find below the TSQL statements to create such a table.

The following table maps the columns created in last T-SQL statement with the answers we need to provide.

# Question List of columns
1 Which SSIS packages are deployed on a server, in which folder ? /td>
  • PackageRunningId
  • ParentFolderFullPath
  • PackageName
  • PackageDescription
2 Who is the owner of a given SSIS Package ?
  • PackageOwner
3 What storage capacity does a given package take ?
  • PackageSizeKb
4 What is the version of a given package ?
  • PackageVersionMajor
  • PackageVersionMinor
  • PackageVersionBuild
  • PackageVersionComments

Note

While the PackageRunningId column could be used as the primary key for this table. I would not recommend it and suggest, as I did with PackageUniqifier, to add another identifier (like an IDENTITY column), because of a common habits to use a template SSIS package without generating a new identifier.

There are other columns we also keep as they might be useful one day. For instance, the isEncrypted column may allow us to pinpoint the need to extract a certificate or to get a password used to encrypt this package. We will also take other columns for eventual analysis.

Now we know which information is needed, we will take the time to build the query that will fill the #SSISPackagesList temporary table step by step.

First, we’ll get the list of « root-level packages folders », i. e. folders that are created at the root of folders hierarchy. Notice that the following parameters are used :

  • @RootLabel will keep the string to be used as root label. For instance « / ».
  • @SeparatorChar will keep the folder separator character to be used.

The previous query will bring back the minimal set of folders, the first-level folders. It corresponds to the <FirstLevelQuery> tag introduced previously. Notice that there is a FullPath column that retains the full path of the folder (which is exactly what we want to get at the moment). To initiate recursion, we will take the union of this minimal set with another SELECT statement. This statement will use the msdb.dbo.sysssispackagefolders table joined with the CTE itself based on equality between the value of folderid column of this table and the value of parentfolderid column in CTE. It corresponds to the <SelfRecursiveQuery> tag.

Finally, we can get back the list of SSIS packages with their full path in folder hierarchy. The next statement is the complete T-SQL query that fills up #SSISPackagesList temporary table.

Action 2 : Extract the list of defined connection and connection providers

Here is the part of the XSD which defines the expected contents of a XML describing a SSIS package. It’s taken from 5.1 DTS XSD.

Let’s review some fields that I find interesting for further lookup.

First of all, there is a Retain property that you can set to « True » if you want to share this connection accross multiple tasks in a given package. It means that if it’s not set to « True », SSIS will create a new connection everytime it has to use it.

The ConnectionString property is the most interesting one as it’s the actual descriptor of the connection.

Furthermore, when we edit a new connection, we can tell SSIS to delay the validation of this connection until run time. We will then use the DelayValidation property.

Finally, there are two base properties ObjectName and Description that can be useful

This leads to the design of this temporary table :

And here is the T-SQL statement we can use to populate this table, based on the content of previous #SSISPackagesList temporary table.

Well, as you can see above, to get data about connections, you «  CROSS APPLY » with XML nodes of type ‘/DTS:Executable/DTS:ConnectionManager’.

It’s not difficult to understand that you will do the exact same thing to extract information about either package configuration or variables :

  • For configurations, nodes will be of type ‘/DTS:Executable/DTS:Configuration’.
  • For variables, nodes will be of type ‘/DTS:Executable/DTS:Configuration’.

Action 3 : Extract the list of SQL Server Agent Jobs of SSIS type

While previous action is a little bit more complex, the following one is pretty straightforwards. It consists more or less in a query against msdb.dbo.sysjobs and msdb.dbo.sysjobsteps tables. While the title of this section tends to say we are going to build a list of SQL Agent jobs, we actually will build a list of SQL Agent Job Steps. The first list is easy to get from second.

As always, let’s review which columns will compose our target temporary table.

We will first identity the corresponding SSIS package that is used for a given step. To do so, there is the GUID of the package to identify the package. We know that this is not perfect and we might change this by a unique identifier generated at run time.

We also will get back some general informations about the job :

  • its identifier and name,
  • whether this job is enabled or not,
  • whether this job has already run or not

Finally, as we get data on steps and not only jobs, we will collect the step id, the target server and the entire text of the command that composes the step.

You will find below the T-SQL statement to create the temporary table that will hold the list of SQL Agent job steps that call SSIS packages.

And we still need a query to populate this table. It’s what the following query does.

Summary

We have seen that a SSIS Package is composed of multiple components and properties that can be extracted using different sources and techniques, especially the XML column called packagedata, in msdb.dbo.sysssispackages table and XQuery.

The queries used in this article are grouped together in a SQL file you can download here.

What’s next

We extracted some information. We can take these as input of an analysis process to pinpoint deviations to enterprise standards.

References

Jefferson Elias
Integration Services (SSIS)

About Jefferson Elias

Living in Belgium, I obtained a master degree in Computer Sciences in 2011 at the University of Liege. I'm one of the rare guys out there who started to work as a DBA immediately after his graduation. So, I work at the university hospital of Liege since 2011. Initially involved in Oracle Database administration (which are still under my charge), I had the opportunity to learn and manage SQL Server instances in 2013. Since 2013, I've learned a lot about SQL Server in administration and development. I like the job of DBA because you need to have a general knowledge in every field of IT. That's the reason why I won't stop learning (and share) the products of my learnings. View all posts by Jefferson Elias

168 Views