In this article, I will give an overview of Execute SQL Task in SSIS and I will try to illustrate some of the differences between writing an expression to evaluate SqlStatementSource property or writing this expression within a variable and change the Execute SQL Task Source Type to a variable.
This article is the fourth article in the SSIS feature face to face series which aims to remove confusion and to illustrate some of the differences between similar features provided by SQL Server Integration Services.
Execute SQL Task
From the SSIS toolbox item description, Execute SQL Task in SSIS is used to execute SQL statements or stored procedures in a relational database.
Figure 1 – Execute SQL Task description
The SQL statement can be a:
- Single SQL Command
- Multiple SQL Commands
- Stored procedure
Figure 2 – Execute SQL Task in SSIS
This Task need a connection manager to establish a connection with a data source and it supports several data sources other than SQL Server since it can use the following connection managers:
- Excel: An Excel connection manager enables a package to connect to a Microsoft Excel workbook file
- OLE DB: An OLE DB connection manager enables a package to connect to a data source by using an OLE DB provider
- ODBC: An ODBC connection manager enables a package to connect to a variety of database management systems using the Open Database Connectivity specification (ODBC)
- ADO: An ADO connection manager enables a package to connect to ActiveX Data Objects (ADO) objects, such as a recordset
- ADO.NET: An ADO.NET connection manager enables a package to access data sources by using a .NET provider
- SQL Compact (SQLMobile): A SQL Server Compact connection manager enables a package to connect to a SQL Server Compact database
Note that the SQL Statements syntax differs based on the data source provider, as example you have to write T-SQL when data source is SQL Server. But when it is Excel or Access, SQL commands have to be supported by Microsoft.Jet.OLEDB and Microsoft.ACE.OLEDB providers.
Is Excel a relational database?
As mentioned above, Execute SQL Task in SSIS is used to execute SQL statements on a relational database, while Microsoft Excel is not a database engine. Because Excel is widely used to store data, and many users want to retrieve data using SQL Statements, Microsoft has given the ability to query tabular data stored in Excel and text file using Jet and ACE OLE DB providers, which pretend that Excel is a relational database and tries to force some data integrity rules while reading, i.e. forcing a single data type for each column by only reading a dominant data type and converting all other values to NULL.
Execute SQL Task configuration
As shown in the image below, the Execute SQL Task in SSIS has many options that can be configured. This article will only focus on the SQL Statement configuration. If you need more details on other options, you can refer to the official documentation.
Figure 3 – Execute SQL Task editor
There are different methods to define the SQL Statement that we need to execute. You can select the method in the SQLSourceType property:
- Direct Input: You can write the SQL Statement manually in SQLStatement property; it can be a simple query or you can just enter a stored procedure name and change the IsQueryStoredProcedure property to True
- File Connection: You can select a text file or .sql file that contains an SQL Statement using a File Connection manager
- Variable: You can select a variable that contains an SQL Statement; when changing the SQLSourceType property to Variable, a new property appears in the editor which is SourceVariable
Another method to set the SQL Statement is by using expressions, you can go to the Expression Tab, and select the SQLStatementSource property to write an expression. To learn more about writing expressions in SSIS, you can refer to the official documentation: Integration Services (SSIS) Expressions.
SqlStatementSource expression vs Variable Source Type
Many times I was asked on the difference between writing the SQL Statement within a variable and use it as a Source or using expressions to define the SQL Statement. And if one approach is more recommended.
In this section, I will first mention each method description from the official documentation of Execute SQL Task in SSIS, then I will try to clarify more based on my experience.
Figure 4 – SQLStatementSource expression
Referring to the official documentation of Execute SQL Task in SSIS:
- Variable Source: Set the source to a variable that defines the Transact-SQL statement. Selecting this value displays the dynamic option, SourceVariable
- SqlStatementSource expression: When you use an OLE DB connection manager, you cannot use parameterized subqueries because the Execute SQL Task in SSIS cannot derive parameter information through the OLE DB provider. However, you can use an expression to concatenate the parameter values into the query string and to set the SqlStatementSource property of the task
After mentioning each approach description, and based on my experience, I can say that there is no better approach, each one has its own use case.
As mentioned above using expression is needed if the SQL Command is built based on other variables, such as passing the table name dynamically, or passing parameter because Execute SQL Task in SSIS doesn’t support dynamic queries such as:
"SELECT * FROM [" + @[User::SchemaName] + "].[" + @[User::TableName] + "]"
The first approach can be used if the developer decided to store the whole SQL Command inside a variable even if the variable is evaluated as an expression.
In case that variable is evaluated as an expression, there is no difference between both approaches from the SQL command perspective, but it may differ in case that the statement is used multiple times within the package. As an example of the query mentioned above, if the table name is variable and this query needs to be executed at different steps within the package, evaluating the variable as an expression is more efficient than writing the expression multiple times. Also, using a variable may facilitate the debugging process since it can be easily monitored.
In addition, as I remember, in the old SSIS versions, the variable source option was not available and it was added later (I think in 2012) version to facilitate the developer work in case the whole SQL command is stored inside a variable rather than adding one variable expression in SQLStatementSource, as an example if the query is built within a Script Task and stored in a variable.
In the end, it is up to you to decide which approach you feel more comfortable with based on what we have mentioned.