SQL Server Integration Services provides a wide variety of features that helps developers to build a robust Extract, Transform, and Load process. After many years contributing to SSIS-related tags on Stackoverflow.com, I can say that many developers have some misunderstandings about SSIS features (SSIS OLE DB Source, SSIS Expressions, SQL Server destination …) especially those which are very similar and have some common usability.
This article is a part of a series called SSIS features face-to-face, which aim to remove any confusion and to illustrate some of the differences between similar features provided by SQL Server Integration Services.
When using a SSIS OLE DB Source, many Access modes can be used to read from the source:
- Table or View: Select a Table or View from a drop-down list
- Table name or View name variable: Choose a SSIS variable that contains the table or view name
- SQL Command: Write your own SQL query (single table query or complex query)
- SQL Command from variable: Select a SSIS variable that contains a SQL command
Many times I was asked on the difference between (1) using Table or View access mode and selecting specific Input Columns from the SSIS OLE DB Source and (2) using a SQL Command to select some columns from one table.
In order to learn more about SSIS OLE DB Source, you can refer to the following official documentation provided by Microsoft:
The main reason to run this experiment was a question posted on Stack Exchange Database Administrators website. Due to its importance, I decided to write a related article with more details in order to give more explanation.
In order to check the difference between these two options, I will run a small experiment using SQL Server profiler to check what is the command sent to a database in each method. The experiments will be conducted on the AdventureWorks database. In this section, I will describe each step in these experiments and illustrate the results.
- You can download the AdventureWorks database from the following page: AdventureWorks Installation and configuration.
Creating Test Packages
First of all, I will create two packages; each one contains a Data Flow Task where I will set up a SSIS OLE DB Source and a Row Count component.
Before adding these components, we have to create a new OLE DB connection manager in order to set up a connection with the database. Just right-click on the Connection Managers tab and click on New OLE DB connection manager:
Then you have to set up your connection by choosing the server name, authentication parameters and the database name (AdventureWorks) as shown in the image below:
After configuring the OLE DB Connection manager, we must add a Data Flow Task. And within the Data Flow Task, we have to add a SSIS OLE DB Source and a Row Count Transformation (dummy task) as we mentioned above.
In the first package, we will configure the SSIS OLE DB Source to read from [Sales].[Customer] Table using Table or View Data access mode, and I will select CustomerID and AccountNumber columns as Input.
In the other package, I will configure the SSIS OLE DB Source to read from the following SQL Command:
SELECT CustomerID, AccountNumber
Create and Start a Profiler Trace
Before executing the created packages, we have to create and configure a Trace using SQL profiler in order to monitor all commands executed on the AdventureWorks database.
If you are interested to learn more about SQL profiler, you can refer to the following Microsoft articles:
From the SQL Server Management Studio, Go to Tools menu strip and click on SQL Server Profiler:
Then a connection dialog appears where you have to set up a connection with the same Instance you already connected in SQL Server Management Studio.
After connecting to the Server, we have to configure the Trace as the following:
First, we have to select the Tuning Template and to set the trace destination to a new Table called OLEDBSourceTest as mentioned in the images below:
After that we must go to the Event Selection Tab and click on Column filters button, then we have to filter on AdventureWorks2017 database as shown in the images below:
Now, we have to run the Trace.
Note – it is more preferable to create the trace destination table in a different database to prevent showing unwanted queries in the trace log.
After executing the first package (Table or View approach), the following command is logged by the SQL profiler:
SELECT * FROM [Sales].[Customer]
Moreover, after executing the second package the following command is logged by the SQL profiler:
SELECT CustomerID, AccountNumber FROM [Sales].[Customer]
Discussion and Conclusion
After running this experiment, we can figure that even if we select some specific columns while using Table or View data access mode in a SSIS OLE DB Source, a SELECT * command is executed on the database engine and columns are filtered within the SSIS package which is different from using a SQL command to select specific columns since the second choice will only read these columns from the database engine and not filters are applied in SSIS.
Based on that, we can conclude that using the SQL command is more efficient from the performance perspective since it will decrease the memory usage and the amount of data retrieved. But make sure that the SQL command has its needed indexes created. Also, it will be more efficient to have the primary key columns included even if they are not needed.
Also, these results can be applied to Lookups Transformations since they also use the same Data access methods, and be aware that in lookups selecting needed columns is more critical since it affects performance more, especially when handling a huge volume of data.
You can refer to the following links to learn more about monitoring the SQL Server Database Engine and some SSIS best practices:
- Is this the end of SQL Profiler?
- Integration Services Performance Best Practices – Data Flow Optimization
- Integration Services Performance Best Practices – Writing to the Destination