This article guides you through importing XML documents into SQL tables using SSIS packages.
Introduction to XML
Extensible Markup Language(XML) is a simplified markup language. It is an open-source W3C standard language and mostly used to store and transport data. It was derived from the Standard Generalized Markup Language (SGML). It is different from the HTML but has some similarities with it.
- HTML is static while XML is in dynamic in nature
- In HTML, we use pre-defined tags while XML uses user-defined tags
- XML is a case sensitive language while HTML is not
- HTML focuses on the data presented while XML focus is on storing and transporting data
XML Schema Definition(XSD) is used to validate, format and describe an XML file. A document should be validated against the XSD to be a valid XML file. An XML document is valid if the XSD defines the tags, attributes for it. XSD is also a recommendation of W3C (World Wide Web Consortium)
Sometimes, we work with the various data format and gets the requirement to load the data into SQL Server. Source of data might be a different relational database, Excel, CSV, HTML.
We might get a requirement to load XML data into SQL Server tables. In the article, Working with XML Data in SQL Server, we talked about SQL Server data into XML formats. I encourage you to go through this article to understand data conversion in XML format from SQL tables.
SQL Server Integration package (SSIS Package) is an ETL (Extract-Transform-Load) tool to work with complex data structures, transform and load into SQL Server or any other destinations. In this article, we will explore the process of using an SSIS package to load XML files into SQL Server tables.
Sample data and prerequisites
In this article, we use the following XML to upload into SQL Server. I downloaded this sample XML from the Microsoft Docs.
- Install Visual Studio 2019 with SQL Server Data Tools. You can go through Microsoft docs for download and install it
You should gain basic knowledge of SSIS packages before proceeding with this article. In case you are a beginner, you can go through SSIS articles on SQLShack.
Steps to import XML data into SQL Server using an SSIS package
Launch the Visual Studio 2019 and create an integration service project. You should choose an appropriate project name and SSIS package. It is not a requirement but a best practice to recognize projects with their names.
In the SSIS package, add a Data Flow task in the control flow area. Right-click on it and rename the data flow task.
As you know, we have a source data in the XML format. For this purpose, we use an XML Source task from the SSIS package toolbox. Drag this XML Source in the Data Flow. SSIS adds the task in the data flow task created earlier.
For the source configuration, double-click on the XML Source, and you get an XML Source editor.
We have the following configuration option in the source editor.
Data Access Mode
By default, you get the data access mode as XML File Location. It is the appropriate option for us because of our source data in an XML file. We can choose other options – XML file from variable and XML data from the variable.
In the XML location, browse to the XML file directory. Once you select the XML file, you get an error that we have not defined the XML schema (XSD).
Use inline schema
If we have an XML schema defined in the XML file itself, we can use this option. You should verify your XML file if the XML file contains XSD or not.
If we have a separate file for the XML definition, we can browse to the file and select the XSD file. We do not have an XSD file in my demo. In most cases, if we get an XML file from any external source, you also get an appropriate XSD file.
SSIS provides you with an option to generate the XSD file from the XML file. This option works fine, however, for a complex XML file, and you should supply the XSD file separately in the XSD Location.
Let’s click on the Generate XSD option. It generates the XSD in the same folder as of the XML file. You can specify a separate location as well, if required. It is better to generate the file in the source location to avoid any conflict.
Let’s open the XSD file in a suitable program such as Edit Plus or Visual Studio. In the following image, we see that it has a definition for each data column along with their data type. We should be careful in making changes to this XSD because XML validation will fail in case of any consistency issue between XML and XSD files.
Click OK, and it reports the following warning messages. We can ignore this warning message. It does not stop us from importing the data. To resolve this error, you can specify the minimum and maximum length in the XSD file for each column. You should know XML language knowledge for making changes in XML and XSD files.
Click OK, and you see the data columns available in the XML file and XSD. If we want to stop the operation, click on Cancel.
Click OK, and you can see the XML source task is configured successfully. We do not get a red cross icon once a task configuration completes.
We want to insert data into the SQL Server table. For this, add an OLE DB Destination task and configure it with your instance details.
If you have an existing SQL table, you can select the table from the drop-down. Otherwise, click on New and specify a table name. SSIS automatically detects the data columns and their data types. You can change the data type and its value in the Create Table script as well if required. You should configure an appropriate data type and its value to avoid any data truncation.
Once you click OK, it shows the destination SQL table.
Now, click on Mappings. Here, we can define and verify the source XML file and destination SQL table column mappings. It is a mandatory step to validate the column mappings. We cannot proceed with the SSIS package without validating the mappings.
Click OK after column mapping is done, it configures the OLE DB destination task. We can see below the configured SSIS package to import an XML file, as shown below.
Let’s execute this package, and it successfully transferred three rows from the XML Source to the SQL table.
Connect to the SQL instance in the SSMS and verify the data in the SQL table.
SSIS Advanced Editor for XML Source
We can go to the Advanced Editor of XML source to configure various properties of source data. Right-click on the XML source -> Show Advanced Editor.
It opens the following Advanced Editor for XML source.
Navigate to Input and Output properties. It gives you column properties for both source and output data for individual columns.
Sometimes, we received error due to invalid data types. In this case, we can go to the respective column and modify the data type from a drop-down list.
We can quickly load data from an XML file into the SQL Server tables using SSIS packages. You should have a proper XML document and its definition in an XSD format file. You can generate an XSD file from SSIS, but it does not work correctly for complex XML data. You should also configure the error handling in SSIS packages to avoid package failures and investigate it in case of any issues quickly.
- SQL Server Choose() function introduction and examples - June 5, 2020
- Using SSIS packages to add row headers and data into flat files - June 4, 2020
- Working with SSIS packages on AWS RDS SQL Server - June 3, 2020