This article explains the SSIS XML Source and the SSIS XML task, which are the XML-related components in the SQL Server Integration Services.
This article is the sixteenth of the SSIS features face-to-face series that aims to compare similar SSIS components in order to remove any confusion for the data integration developers that use this technology.
XML stands for Extensible Markup Language. It consists of codes, or tags, that describe the text in a digital document. XML was designed to store and transport data and to be both human- and machine-readable.
XML is a semi-structured file format that contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields within the data. In XML, schema can be defined within the XML file (called inline schema) or stored within a separate XML schema definition file (XSD).
XML is a generic data storage format that comes bundled with several tools and technologies. It is a standard data format that becomes widely used in data transfer worldwide, making it frequently used in the data warehousing process.
In SQL Server Integration Services (SSIS), one task and one component are provided, allowing developers to perform several XML-related operations.
This article will explain the SSIS XML source component and how to use it to import data from XML files. Besides, we will explain the XML task and what kind of operation it performs.
Unfortunately, SSIS does not have a destination component for storing data within XML files. But there are several workarounds and third-party components that can be used to achieve that. We have previously illustrated them in the following article: SSIS XML Destination.
- Side Note: All screenshots in this article are taken from Visual Studio 2019, SQL Server Integration Services (SSIS) projects extension version 3.4.
SSIS XML Source
As described in the Visual Studio toolbox, the SSIS XML Source extracts data from an XML file.
Figure 1 – XML Source description in the Visual Studio toolbox
In the SSIS XML Source, there are three data access modes:
- XML file location: Select an XML file manually from a specific path
- XML File from variable: Read an XML file path from an SSIS variable
- XML data from variable: Read an XML file from an SSIS variable
Figure 2 – XML Source data access modes
In addition, the SSIS XML source allows developers to select either the XML file contains an inline schema or to read the schema from an external XSD file.
Figure 3 – XML Source editor
If the XML file does not have a schema, the SSIS XML Source allows the user to generate a schema based on the data stored within the XML file. After selecting an XML file, the user can click on the “Generate XSD” button.
Not all properties are configurable from the SSIS XML Source editor. There is a property called “XMLIntegerMapping” located in the component properties grid. This property defines how XML integer types are mapped. Two options are available: reading numbers as decimals or reading them as 32-bit integers.
Figure 4 – XML Integer mapping property
Some of the SSIS XML Source properties can be evaluated as expressions. In the Data Flow Task properties tab, we can open the expressions window to access those properties (as shown in the image below).
Figure 5 – SSIS XML Source expressions
The “XMLData” property content differs from the Data Access Mode selected; it may contain the XML file path or the SSIS variable name containing the XML path or the XML data. The “XMLSchemaDefinition” property can be used to evaluate the XSD file path as an expression.
There are several articles and official documentation that contains useful information about the XML Source component:
- Import XML documents into SQL Server tables using SSIS packages
- XML Source – SQL Server Integration Services (SSIS) | Microsoft Docs
- Extract Data by Using the XML Source – SQL Server Integration Services (SSIS) | Microsoft Docs
SSIS XML Task
While the SSIS XML Source is used to read data from an XML file, the XML task is used to perform several operations that are not related to the data import process. In the SSIS toolbox, it is mentioned that the XML task works with XML data: it allows retrieving XML documents, using XPath expression, merging documents, validating, comparing, and perform other XML-related operations.
Figure 6 – XML task description in the SSIS toolbox
In the SSIS XML task, six operations are available in the “Operation Type” drop-down list. Those operations are defined as the following in the official documentation:
Table 1 – SSIS XML Task operation types
Figure 7 – Available operation types in the SSIS XML task
Each one of those operations has its own configuration. Some operations (such as Diff, Merge, and Patch) require two connections since they need two XML files, while the other operations (Validate, XPATH, XSLT) only need one XML file connection.
There are three ways to define an XML file connection within the SSIS XML Task:
Figure 8 – XML connection types available in the SSIS XML Task
Direct Input: This option allows the user to enter the XML data directly into a text box
Figure 9 – XML Direct input form
- File Connection: Since XML does not have a connection manager in SSIS, the File connection manager can be used to connect to an external XML file
- Variable: Set the source to a variable that contains the XML document
There are several articles and official documentation that contains valuable information about the SSIS XML task:
- An Overview of the XML Task in SSIS Packages
- XML Task – SQL Server Integration Services (SSIS) | Microsoft Docs
- Validate XML with the XML Task – SQL Server Integration Services (SSIS) | Microsoft Docs
This article explained briefly the SSIS XML source component and the SSIS XML Task, which are the XML-related component in the SQL Server Integration Services. In addition, it illustrated the use cases of each of those features and provided some useful resources for more details.
At the end of this article, it is worth mentioning that besides of the XML component provided in SSIS, there are several third-party components provided by leading companies such as CDATA, COZYROC, ZappySys and others.
Table of contents
- Learn SQL: Insert multiple rows commands - March 6, 2023
- Learn SQL standard deviation function - January 17, 2023
- An overview of the SQL GROUP BY clause - December 28, 2022