Introduction
We can use an SSIS package to perform various tasks such as data import, transform, and get output in various formats. Once we launch Visual Studio 2019 and create an integration services project, it shows you various tasks in the SSIS toolbox. In this article, we will explore the XML Task in the SSIS.
Overview of an XML Task
We use XML Task to validate, compare the XML documents, apply various transformations. It uses Extensible Stylesheet Language Transformations (XSLT) for applying the transformation on the XML data. We can also merge XML documents from various sources and create a comprehensive XML document.
Explore XML Task configurations
Let’s drag this XML task in the Control flow and explore its various configurations.
Now, double-click on the XML editor to open the XML task editor.
Let’s have a look at the options available in the task editor:
- Operation type: It provides a list of options that we can perform on the XML document. We get in the Input section
- Diff: We can compare two XML documents using the diff operation. It uses the first XML as source XML and compares it with the following XML specified. In the output, we get an XML Diffigram document
- Merge: Sometimes, we require to merge two copies of XML documents in a single document. Similar to the diff operation, it uses the first XML as a base document. It merges the second XML document with the base XML
- Patch: We can use the Patch method to create a new XML document that includes the output of the XML Diffigram document
- Validate: We use an XML Schema definition(XSD) to validate an XML document. An XSD document contains the tabs, attributes, definition of the XML document. We can validate the XML using this option
- XPATH: We can perform XPath queries o XML documents using this XPATH option
- XSLT: We can transform an XSL document using the XML documents
By default, it shows the XML editor in the Diff operation mode. For the Diff operation type, we have further options in the Diff Option group.
Expand the DiffOptions to check various options for Diff.
- DiffAlgorithm: It defines the comparison algorithm for the XML document. We have Auto, Fast and Precise modes. By default, it uses the Auto mode. In the precise mode, XML comparison might take a longer duration. Auto mode selects the mode depending upon the document size.
- DiffOptions: By default, all options in this group are false:
- IgnoreComments: If this value is True, SSIS compares the comments as well in both XML’s
- IgnoreNamespaces: It defines the comparison for the element uniform resource identifier(URI and its attributes. If we set this value to TRUE, XML elements with the same local name are considered as identical. It ignores the namespace value for both the elements
- IgnorePrefixes: It shows whether the XML task compares the element and its attributes. It ignores the prefix while making the comparison
- IgnoreXMLDeclaration: It ignores the XML declarations in the documents for comparison purpose
- IgnoreOrderOfChildElements: It ignores the order of the child elements in the XML document. If we set this property to TRUE, child element with the different positions is considered to be identical
- IgnoreWhiteSpaces: It controls the white space while making the comparison
- IgnoreProcessingInstructions: We can specify whether we want to compare the processing instructions or not
- IgnoreDTD: It defines whether we want to compare the document type declaration(DTD) or not
- FailOnDifference: If the XML task finds any difference in both the base and second XML document, we can control the SSIS package behaviors
- FailOnDifference= True: Package fails if any difference exists
- FailOnDifference=False: The package does not fail due to the XML document’s difference
- SaveDiffGram: Once the XML document compares the XML document, it creates a DiffGram file, if we set this property to TRUE. We also need to define a location where this DiffGram file is saved. We can store the difference in a variable as well
Compare XML documents in a SSIS package using XML task
Let’s create two XML documents and compare them using this XML task in the SSIS package. For the demonstration purpose, I use the XML specified in the Microsoft docs.
Base file:
Second XML File:
In the source type, create a new file connection and specify the location of the XML document we want to use as a base file.
Similarly, we need to provide the path of the second XML file in the Second Operand group. Select the SecondOperandTYpe as a File Connection. We can use the option DirectInput or variable as well from the drop-down menu.
Additionally, we need to configure the path for the DiffGram file that captures the difference in both XML documents. First, set the property SaveDiffGram to True. In the new file connection manager, select the option to create the file and specify the file name and directory.
We do not want our SSIS package to fill in case it finds any difference in both XML’s file. I would go with the default setting FailOnDifference as False. You can change it to True in case you want the package to fail in case of differences.
In the connection managers, we can see the base XML, second XML file along with the DiffGram file to capture the differences.
Click on Start to execute the SSIS Package. We can see the package is successful because we have selected the property FailOnDifference as False.
Let’s open the DiffGram file from the directory we specified during configuration. In the base and source file, we have additional data for the book id 102. It does not exist in the base file, so you got the difference in the output.
We have book id 101 in both the XML documents. Let’s change the content for book id 101 in the base XML.
Save this XML file and rerun the package. This time we get a failure for the SSIS package. Click on the Execution results, and we see that it failed because the DiffGram file already exists.
Go back to the control flow and open the XML task editor. In the output, we have the option to overwrite the output file if it already exists. Switch the value for OverWriteDestination as True.
Click Ok and execute the package. The package runs fine this time. Open the DiffGram file and note the difference in both XML’s. You can see here that it highlights the difference if it exists in an element value as well.
Merge XML’s using XML Task in SSIS packages
In the previous section, we compared the XML document using the diff operation of the XML task. Let’s use the merge operation. As highlighted earlier, merge operations add the content of the second document into the base document.
Let’s open the XML task editor and change the operation type to Merge. It changes the configuration options corresponding to the merge operation.
Before we go further, let’s view the XML documents we want to merge into a new file. Here, the left file is the base XML(1), and Right-side XML is the XML we wish to merge with the base file.
Let’s make the following changes in the merge file editor.
- Specify the path of the base XML file
- Specify the path of the file we wish to merge in the base file
-
We need to specify the merge options
- XPathStringSourceType: Direct Input
- XPatStringSource: It is the tags that we used in the XML file. If we look at both XML documents, it is embedded in the tags <catalog>. You need to specify this tag in this configuration, as shown above
- Specify the file name and destination for the new merge file. We can also choose to overwrite the existing file if it already exists
Once we have done the configuration for merge operation, click Ok and execute the SSIS package.
Package execution is successful. Let’s open the merge XML file and see if it merged the following XML into the base XML. In the following screenshot, we view the XML content into the base XML document.
Conclusion
In this article, we learned about the XML task in the SSIS package. We explored the XML files comparison and merge operations using the Diff and Merge operations. You can also explore other XML operations such as Patch, validate, XPath and XSLT.
- How to install PostgreSQL on Ubuntu - July 13, 2023
- How to use the CROSSTAB function in PostgreSQL - February 17, 2023
- Learn the PostgreSQL COALESCE command - January 19, 2023