Marko Radakovic

Using SSIS packages to import MS Excel data into a database

March 15, 2014 by
Inserting the data in the database table can be hard and can take a long time. Especially today when the databases becoming bigger and bigger. Automated data importing is possible using different methods, depending on source file type and data organization inside

The easiest way to import the data into the SQL database is using the Import Data wizard from SQL Server Management Studio. Following the steps through the wizard, user can choose the source file, and the destination table to import the data. However, importing the data this way narrows the process and won’t allow user to execute more complex tasks with different kind of the data, and multiple files. The SSIS packages offer solution for this kind of problem. The SSIS package represents tool for the ETL (Extract-Transform-Load) processing, and it can be used not just to import the data into the database, but to transform, filter, group the data and many other tasks. This article will give explanation how to import the simple Excel worksheet into the SQL database

Before creating an SSIS package, the user needs to create new project in BIDS by choosing the Integration Services from the Business Intelligence list from the left, navigate to the Integration Services project , and define name for the project like shown in the image below

Creating new project in BIDS - defining a name for the project

Importing from an MS Excel worksheet

To be able to import the data from the Excel worksheet into the SQL database, the user first needs to define the Data Flow Task, dragging from the SSIS Toolbox into the Control Flow area like shown in the image below

Defining the Data Flow Task

Double-clicking the Data Flow Task will open the Data Flow tab. Before selecting and configuring the source and the destination, connections for both of them must be created. Since the Excel worksheet is the source, the Excel Connection will be created, and for the destination, the OLE DB connection. Both of these connections can be created by right click on the Connection Manager in the Solution Explorer, and choosing the New Connection Manager from the drop down list, like shown in the image below

Choosing the New Connection Manager from the drop down list in the Solution Explorer

From the list of the connections in the New Connection Manager dialog, the Excel connection will be used to connect the external Excel worksheet with the project. Clicking the Browse button, Excel file can be connected with the task like shown in the image below

Note: The external Excel worksheet must have the “xls” extension.

Connection with the destination, in this case SQL Server database, can be established through the OLE DB connection, choosing it from the list in the New Connection Manager window.

In the Connection Manager window, Native OLE DB\SQL Server Native Client 11.0 must be selected from the Provider drop down list. In the Server Name filed, if the server is local, it can be defined by entering “.” (without quotes). Otherwise, proper server name must be typed

If the server name is correct, the user will be able to choose one of the databases from the server, where the data from the Excel worksheet will be imported, like shown in the image below

Choosing the Database the Excel data will be imported to

When the connections with the source and the destination is created, the Data Flow Tasks can be defined and configured in the Data Flow window. From the SSIS Toolbox, the Excel Source, and the OLE DB Destination tasks must be imported into the Data flow window, as shown in the image below

Tasks are imported into the Data flow window from the SSIS Toolbox, the Excel Source, and the OLE DB Destination

Double click on the Excel Source task will open the Excel Source Editor. Selecting the Excel connection from the Connection manager list allows user to choose the Name of the Excel sheet from the list, like shown in the image below

Selecting the Excel connection and choosing the Name of the Excel sheet

In the Column tab, the user can choose what columns from the excel file will be imported by checking the boxes for desired columns, and rename output columns, as shown in the image below

Choosing what columns from the Excel file will be imported

To configure the OLE DB Destination it must be connected to the Excel Source first. Double click on the OLE DB Destination will open the editor shown below

OLEDB Destination Editor

In the OLE DB connection manager list, previously defined connection must be selected, in this case, connection is established with the AdventureWorks2012 database on the local server. From the table list, desired destination table can be chose, or new table can be created clicking the New button and entering the SQL syntax

After the destination database table is selected, the columns must be mapped before importing the data. That could be done from the Mappings tab like shown in the image below

OLEDB Destination Editor - Mappings tab

Not all the columns from the source need to be imported. Mapping is a way of filtering the input data, and pointing to the specific columns in the database table, where the column from the source will be imported. Choosing the <ignore> from the input column list will exclude it from importing

After the columns are mapped, the SSIS package can be executed by right click on it from the Solution Explorer, and choosing the Execute Package option from the drop down list like shown in the image below

Choosing the Execute SSIS package option from the Solution Explorer

After executing the package all the data from the Excel worksheet will be imported in selected database table. The image on the right side represents imported data into the SQL database, from the Excel worksheet to the left

Dialog showing imported data into the SQL database (on the right), from the Excel worksheet (on the left)

Using this method, data from the same excel sheet can be imported in the selected table until there is no modifications in excel with data types. If the data is inserted or overwritten with new ones, importing process will be successfully accomplished, and the data will be added to the table in SQL database

Useful resources:
Import data from excel to SQL server
SSIS packages for data importing

Marko Radakovic

Marko Radakovic

Marko is an IT and technical education teacher, who likes movies, video games, and heavy metal music.

He uses his spare time to play guitar, ride a bike and hang out with his friends. During winter, he likes skiing the most, but all other snow activities, too.

He is also author of various SQL Shack articles about SSIS packages and knowledgebase articles about ApexSQL Doc.

View all posts by Marko Radakovic
Marko Radakovic
SSIS packages

About Marko Radakovic

Marko is an IT and technical education teacher, who likes movies, video games, and heavy metal music. He uses his spare time to play guitar, ride a bike and hang out with his friends. During winter, he likes skiing the most, but all other snow activities, too. He is also author of various SQL Shack articles about SSIS packages and knowledgebase articles about ApexSQL Doc. View all posts by Marko Radakovic

20 comments
Pendu
Pendu

very good tutorial. AWESOME!

I have a question...


I am new to ETL . Is there a way I can automate the ETL?? 

To automate the importing of excel to DB when the .xls file is dropped in the folder ?


Thanks

MarkoRadakovic
MarkoRadakovic

@Pendu


Thanks for the kind words and for your question.


Executing SSIS can be scheduled using SQL agent job, but that cannot be set based on some condition (for example if you add a new file in folder).

However, you can schedule SSIS on a daily basis (once a day, or at some specific time schedule) and have some condition inside SSIS that will find out if the new file is added to the folder. If yes, the file will be processed and new data will be imported in a table, otherwise, the import will be skipped.

Another solution is to poll the folder constantly for new changes (in your case, adding new files), and have that as a step in a SQL job that will run before the package execution

Corey
Corey

Good Morning


Thank you for the great write up. I am experiencing a really weird occurrence. I have an SSIS package (or 3 lol) using exactly what is described in your post. They worked great for the longest time, recently they started opening the actual excel file but this only occurs when firing off the project with a .bat file , not when running from inside SSIS. Have you ever seen this behavior?

MarkoRadakovic
MarkoRadakovic

@Corey Hi there,

Thanks for the comments. I have to say that I didn't experienced such an issue so far.

Prahalad
Prahalad

Hi Marko,


Thank you for a great tutorial!


My data source is Excel on SharePoint and destination is SQL Server on a remote server which I am accessing from my local machine virtually.


After following the steps above, I tried to execute the package by right clicking on the package and selecting "Execute Package".


However, it did not finish executing. It was stuck and the output was as attached.


I have already installed the 32 bit version of Microsoft Access Database Engine 2010 Redistributable. But this error is still persistent.


I would greatly appreciate it if you could help me solve this! Thanks!


Prahalad

Alen Gubicak
Alen Gubicak

@Prahalad Thanks for the comment :)

It seems that you have Microsoft Office 2016 installed on your machine.

To overcome the error you are experiencing, you need to have the corresponding version of Microsoft Access Runtime installed.

Since you need Microsoft Access Runtime 2016, please download and install it from the link below:

https://www.microsoft.com/en-us/download/details.aspx?id=50040

You can read more about the discussion on this issue by following the link to the Stack Overflow topic below:

http://stackoverflow.com/a/32247214

Hope this will help :)

Satyard
Satyard

Great article. I've one question, how i'll be loading table in one server from multiple table with some filter logic in another server.

MarkoRadakovic
MarkoRadakovic

@Satyard

Thanks for the comments and for your kind words. :)

Importing data from multiple tables from SQL Server database to a single table in a destination differs a bit from this article.

However, you can perform the task using a little bit different task structure.

In general, you’ll need: Execute SQL Task, Foreach Loop Container, Script Task and Data Flow Task

This blog should help you to accomplish the task:

http://satishmsbiworld.blogspot.rs/2015/05/load-data-from-multiple-tables-into.html

Christina Piamonte
Christina Piamonte

Hi thank you for your great tutorial. Anyway I have a question, what if I have 3 columns on my stgTbl:

Name   |   Size          | Color


Tina      |  Medium     | Green

Kim      |   Small         | Orange

Dian     |  Medium     | Red


What I want to happen is on my finalTbl it will only display those details whose size is Medium so on my finalTbl the only details that will appear are Tina's and Dian's..


Hoping for your reply. Thank you.

MarkoRadakovic
MarkoRadakovic

@Christina Piamonte


Thanks for the comment and for your question.

There are variety of ways to achieve this. The one that comes on my mind is to add a Conditional split task between the Excel source and OLEDB Destination, and specify a case (e.g. ImportingMediumOnly with the following expression: Size == "Medium").


This will "extract" only those rows where the Size column has value "Medium".


Hopefully this will help. 

Manuel Garcia
Manuel Garcia

very good tutorial.

I have a question...

if you have multiple types of data in a column, as I do to recognize them?


Thanks!!

MarkoRadakovic
MarkoRadakovic

@Manuel Garcia

Thanks for the question. This is interesting but this is out of the scope presented in the article.

Most probably this can be fixed by processing each filed from the input (column), checking the exact data type that will be loaded into the destination table, and based on that load specific values in the destination table.



Rajesh
Rajesh

Hi Marko,


Thanks for your article. Please can you tell how to load from multiple sources(Excel,text file) into destination.


Regards,

Rajesh

MarkoRadakovic
MarkoRadakovic

@Rajesh


Thanks for the question. There are plenty of ways to perform this task. It depends on the nature of the files and the data nature inside them

.

Basically, you ‘ll need to “filter” each  type of data source and perform a separate load of the data into the destination table.


On another side you can have a loop for files with the same type/data structure.


Some reference that you might find useful:

https://social.technet.microsoft.com/Forums/sqlserver/en-US/deb93da0-cfb9-4ce4-b769-b075e5e2de67/ssis-import-multiple-source-files-single-ssis-package

https://www.mssqltips.com/sqlservertip/1322/merge-multiple-data-sources-with-sql-server-integration-services/

SobhanDutta
SobhanDutta

Hi Marko,


Great article !! Please reply my 3 questions:

1) If in my excel file, there is some unwanted rows on top of the sheet. How can I exclude those when importing into the table?

2) If I want want to reuse the package how I will execute that?

3) If I wanna use different excel on every time, will my previous data will be updated?


Waiting to hear from you :)

MarkoRadakovic
MarkoRadakovic

@SobhanDutta


Thanks for the kind words. I appreciate it.


1)There are several ways to manipulate the rows that will be picked from the excel file. You can add the Sort task, or you can specify the OpenRowet value. For more info, please see:

https://social.msdn.microsoft.com/forums/sqlserver/en-US/947fa27e-e31f-4108-a889-18acebce9217/ssis-excel-import-skip-first-rows

http://sqlserversolutions.blogspot.rs/2009/02/selecting-excel-range-in-ssis.html


2)It depends on where the package will be stored, but once created the package can be run whenever you need.

Here is another article that I wrote about executing SSIS packages: http://www.sqlshack.com/ways-use-execute-sql-server-integration-services-packages/


3)If you are referring to a separate excel file, you’ll need a separate connection in the package for each excel file. The data will be updated in the destination.

silentgut
silentgut

Is there a way for me to do this. my visual studio is 2010. there is no Business Intelligence Template for VS 2010 

MarkoRadakovic
MarkoRadakovic

@silentgut


Thanks for the comment and for your question. Actually, there is no way to do this in VS2010, as you said there is no BI templates for VS2010. As a workaround you can use VS2008 which can run side by side with VS2010. Please bear in mind that VS2008 works with SSIS 2008. If you are using the newer versions of SSIS package you'll need to install the appropriate version of Visual Studio/SSDT

Guest
Guest

Excellent tutorial. Could you please clarify what will happen if I delete some of the data from the excel? Will it automatically get deleted in the database when I run the package next time? 

MarkoRadakovic
MarkoRadakovic

@Guest


Thanks for the kind words.

If you delete entire row in excel table, next time you run the package, it will add all other rows except the one previously deleted.

In case you've deleted just a single field, all data from the input excel file will be added and the deleted field will be presented with "null".

So the answer is YES if you delete the data in the excel file, it will be deleted in the database, when you run the package.