Sifiso W. Ndlovu

Using SSIS ForEach Loop containers to process files in Date Order

August 3, 2017 by

One positive thing to come out of my recent project that involved rewriting one of the Data Marts from our Data Warehouse environment was a confirmation of my suspicions with regards to the behavior of SQL Server Integration Services’ (SSIS) ForEach Loop Container. You see, I have long suspected that the ForEach File Enumerator type in SSIS’s ForEach Loop Container does not process time stamped text files in an order that could be deemed correct to the human eye. For instance, Figure 1 shows a list of text files containing data relating to Marital Statuses of FIFA 2016 Ballon D’Or nominees.

Figure 1: Text Files loaded by Name

The data contained in the files created in the morning of June 30th (suffixed with “AM”) is similar – with Lionel Messi’s marital status set to Single as shown in Figure 2.

Figure 2: Text file without marital status changes

Later on, that day, Lionel Messi got married and as a result the “2PM” file contains changes to Leonel Messi’s marital status as shown in Figure 3.

Figure 3: Text file with marital status changes

It is interesting to note that the default sort order of these text files in Windows Explorer is by file name – which looks to be incorrect as the file suffixed with “2PM” is listed ahead of the “7AM” file. This means that if we were to load data from these files into a Type 2 Marital Status dimension, the latest version of the data would come from the “7AM” file.

The correct order of processing these text files is to have them sorted by date modified as shown in Figure 4 wherein the “2PM” file will be the last one to be imported.

Figure 4: Text Files loaded by Date modified

Processing of Text Files by File Name

The screenshot in Figures 1 and 4, indicates that listing of the file name can appear differently depending on whether you are sorting by file name or date modified. Whilst Windows Explorer shows that the listing of files can be sorted in multiple ways, it looks like the ForEach File Enumerator type only processes text files in an order sorted by file name which – as we had indicated in Figure 1 – is incorrect. To demonstrate this, we make use of a sample SSIS package shown in Figure 5. The package begins by using an Execute SQL Task to clear the staging table. The next step involves using a Data Flow Task inside a ForEach loop container that iteratively loads the text files.

Figure 5: Sample SSIS Package

As shown in Figure 6, ForEach Loop Container is configured to use ForEach File Enumerator type and it processes files with file name like MaritalStatus_FIFA*.

Figure 6: ForEach File Enumerator type

Following the successful execution of the SSIS package shown in Figure 5, we are able to view all data that was imported into the staging table as shown in Figure 7. As already predicted, the ForEach loop container using ForEach File Enumerator type processed the files in a file name order. This is incorrect as the latest record for Lionel Messi (at line 5 in Figure 7) is loaded ahead of the 7AM file.

Figure 7: Data in Staging Table

Processing of Text Files by File Creation Time

The dangers of relying on the ForEach File Enumerator type is that we don’t have control in the way files are processed. We can get around this limitation in two ways:

  1. Renaming Text Files to Military Time

    The simplest way of getting your time stamped text files processed in the correct order, is to adopt a file naming convention that uses military time instead of the standard hour clock. As it can be seen in Figure 8, renaming of the hour clock part of the file names to military time has resulted into the files listed in the correct order in the Windows Explorer.

    Figure 8: Text files with Military Time

    Following the SSIS package execution, the data in our staging table is updated as shown in Figure 9. As it can be seen, the processed text file contains accurate marital status for Lionel Messi.

    Figure 9: Data in Staging Table in the correct order

    One significant limitation of the military time approach is that as SSIS developers we often don’t have control in terms of naming the text files. I recall several instances whereby my SSIS solution processed files that were prepared and dumped to an FTP location by a legacy 3rd party program. In such instances, you are usually given read permission on the FTP location and thereby prevented from editing the files.

  2. Processing Text Files using Foreach ADO Enumerator

    The recommended approach in terms of processing multiple time-stamped text files is using Foreach ADO Enumerator type instead of ForEach File Enumerator. The switch to Foreach ADO Enumerator type requires several changes to your SSIS package as shown in Figure 10. Again, the first step involves using the Execute SQL Task to clear staging tables.

    Figure 10: SSIS Package using ADO Enumerator

    I then use a Script Task (ST – Populate ListOfFiles) that uses methods from LINQ to sort text files by creation time and insert the output into a staging table. The main code of the Script Task is shown in Script 1.

    Script 1

    The Execute SQL Task (ESTPopulate Object Variable) retrieves a list that was built by the Script Task and stores this list into a local package object variable type. As shown in Figure 11, the ForEach loop container is then configured to use Foreach ADO Enumerator type and sources its data from local object variable – varObj.

    Figure 11: ForEach ADO Enumerator type

    The rest of the settings inside the ForEach loop container are similar to the package using the Foreach File Enumerator. Following the package execution, the data stored in the staging table will be similar to what is shown in Figure 9.

Conclusion

If your SSIS solution does not process multiple text files using the Foreach File Enumerator type, then you are probably not affected by the issue that has been discussed. However, for those dealing with multiple text files, consider switching over to the Foreach ADO Enumerator type.

Downloads

See more

For SSIS package documentation, consider ApexSQL Doc, a tool that enables documenting SSIS packages from file system, SSIS package store, SQL Server, SSISDB catalog and SSIS project file in different output formats, and with various included details.

To compare SSIS packages with each other, consider ApexSQL Diff.

References


Sifiso W. Ndlovu

Sifiso W. Ndlovu

Sifiso is a Johannesburg based certified professional within a wide range of Microsoft Technology Competencies such SQL Server and Visual Studio Application Lifecycle Management.

He is the member of the Johannesburg SQL User Group and also hold a Master’s Degree in MCom IT Management from the University of Johannesburg.

He currently works for Clientele Life as an Assistant Manager in Business Software Solutions.

View all posts by Sifiso W. Ndlovu
Sifiso W. Ndlovu
SSIS packages

About Sifiso W. Ndlovu

Sifiso is a Johannesburg based certified professional within a wide range of Microsoft Technology Competencies such SQL Server and Visual Studio Application Lifecycle Management. He is the member of the Johannesburg SQL User Group and also hold a Master’s Degree in MCom IT Management from the University of Johannesburg. He currently works for Clientele Life as an Assistant Manager in Business Software Solutions. View all posts by Sifiso W. Ndlovu

1,089 Views
  • Faced with the same requirement, I came up with a similar solution that requires fewer tasks. My approach was to use .Net functions to filter and sort the file names in memory and then store them directly into an object variable that can be ‘shredded’ by the ForEach loop. That eliminates the need to create or purge a database table.

    Here’s a version of the script task I implemented (modified to integrate with your code example)

    public void Main()
    {
    // This script references 2 SSIS variables:
    // Read-only: USER::CURR_TEMP_Folder (string) – folder containing the files to be sorted
    // Read-Write: USER::varObj (object) – object variable to contain the sorted table of file names

    // Create a dataset, named it unsorted, because it’s not yet sorted
    DataSet dsUnsorted = new DataSet();

    // Create a new table in the dataset
    DataTable filelistTable = dsUnsorted.Tables.Add();
    filelistTable.Columns.Add(“FileName”, typeof(string)); // Filename used for sorting [optional].
    filelistTable.Columns.Add(“FileDate”, typeof(DateTime));// Filedate used for sorting [optional].

    // Get all files within the folder
    string[] allFiles = Directory.GetFiles(Dts.Variables[“User::CURR_TEMP_Folder”].Value.ToString());

    // Variable for storing file properties
    FileInfo fileInfo;

    // Loop through the files in the folder
    foreach (string currentFile in allFiles)
    {
    // Fill fileInfo variable with file information
    fileInfo = new FileInfo(currentFile);

    // Columns: FileName, FileDate
    filelistTable.Rows.Add(fileInfo.Name, fileInfo.CreationTime);
    }

    // Apply a filter if needed. Note: like uses * instead of %.
    DataRow[] rows = dsUnsorted.Tables[0].Select(“FileName like ‘MARIT*'”, “FileDate ASC”);

    // Create a new sorted dataset that the SSIS foreach loop uses.
    DataSet dsSorted = new DataSet();
    DataTable filelistTableSorted = dsSorted.Tables.Add();
    filelistTableSorted.Columns.Add(“FileName”, typeof(string));

    // Fill the new dataset with the sorted rows.
    foreach (DataRow row in rows)
    {
    filelistTableSorted.Rows.Add(row[“FileName”].ToString());
    }

    // Store the dataset in the SSIS variable
    Dts.Variables[“varObj”].Value = dsSorted;

    Dts.TaskResult = (int)ScriptResults.Success;
    }

    • Sifiso Ndlovu

      Yup, i just tried it and it works! Thanks!