In this article, we will give an overview of using Flat Files and Raw Files in SSIS, then we will illustrate some of the differences between using these two types.
This is the ninth article in the SSIS features face-to-face series which aims to remove confusion around similar features in SQL Server Integration Services.
In general, a flat file doesn’t contain an internal hierarchy; it may contain text, log entries, or data in tabular form. More specifically, SSIS Flat Files are text files that store tabular data and are manipulated line-by-line.
To handle Flat Files, you must create an SSIS Flat File connection manager, where you define the Flat File metadata. There are different formats for Flat Files that can be handled by the SSIS Flat File connection manager, as follows:
- Delimited: each row in the Flat File contains columns separated by a delimiter; the most popular format is comma-separated values (.csv)
- Fixed width: each row contains multiple columns having a fixed number of characters
- Ragged Right: This option is added to handle Flat Files that are not well-formatted. You can refer to the following article for more guidance: Working with ragged right formatted files in SSIS
If you open the SSIS Flat File connection manager editor, it contains four tabs:
- General Tab: Here, the following options are configured:
- Connection manager name and description
- Flat File path: Select the file path (if you need to create a new Flat File, then you should enter the desired name)
- Text encoding: Locale, Code page or Unicode
- Flat file Format: Choose if columns are delimited, fixed-width or ragged right
- Header row delimiter: Since some Flat Files are created from templates, you may have a file where the header row delimiter is different from other rows’ delimiters
- Header rows to skip: This option can be used to ignore header rows that contain meaningless information
- Column names in the first header: This option is to specify if the column names are found within the flat file. If this option is unchecked, then auto-generated column names are used (they can be changed in the advanced tab)
- Columns Tab: Here, you can configure how columns are extracted from each row:
- In delimited file format, you have to specify the column and row delimiter
- In fixed-width, you have to specify each column length
- In ragged right files, you have to specify the column markers
- Advanced Tab: Here, you can configure each column’s metadata such as length, data type, name and other attributes
- Preview Tab: Here, the data is reviewed based on the configurations you’ve made
Figure 1 – SSIS Flat File connection manager
In a Flat File, data are stored as text. When values with different data types are stored in a Flat File, you can implicitly convert them from the connection manager or the source component, or explicitly convert them using data conversion and derived column transformations. For more information, you can refer to the following article: SSIS Data types: Change from the Advanced Editor vs Data Conversion Transformations.
Figure 2 – SSIS Flat File source description from toolbox
To import or export data from Flat Files, you must use SSIS Flat File Source and SSIS Flat File Destination components within a Data Flow Task. Note, that if you are handling a Flat File that contains non-tabular data, you may need to read the Flat File using a script (task or component) and implement complex logic, or you may need to read each row as one column (length = 4000) and use a transformation to consume the file content. You can check the following links for examples of importing complex flat files in SSIS:
- How to load mixed record type fixed width file? And also file contain two header
- How to check column structure in SSIS?
- Fill SQL database from a CSV File
- SSIS ragged file not recognized CRLF
Figure 3 – SSIS Flat File destination description from toolbox
Raw Files are a kind of SSIS Flat File used to dump data between different ETL stages. The data is stored in binary format and can only be used by the SSIS Raw Files component.
To use Raw Files in SSIS, you don’t have to create a connection manager, since it can be defined within the Raw File source and Raw File destination components:
Figure 4 – SSIS Raw File destination
To create a Raw File, just add a Raw File destination in a data flow task. When you open the Raw File destination editor, there are two tabs:
- Connection Manager
Figure 5 – SSIS Raw File destination description from toolbox
In the connection manager tab, specify the file name (directly or from a variable), and choose the write mode option:
- Create always: Always creates a new file
- Create once: Creates a new file. If the file exists, the component fails
- Append: Appends data to a previously created file with similar metadata
- Truncate and Append: Overwrites the data in a previously created file with similar metadata
In the columns tab, select the columns you want to dump into the SSIS Raw File destination:
Figure 6 – SSIS Raw File source description from toolbox
After dumping data into a raw file, you must use a Raw File Source to read this data. This component is very similar to the destination component, except that there is no Write mode option:
Figure 7 – SSIS Raw File source
Note that the SSIS Raw File source can only be used to read a file created using a Raw File destination.
To read more about Raw Files, refer to the following official documentation:
Flat Files vs. Raw Files
Now I will illustrate the difference between both file types in SSIS.
SSIS Flat Files are widely used to dump data from relational databases to be used later for other purposes, but most people don’t know that they are not recommended from a performance perspective. Even though comma-separated values files (.csv) are one of the most popular data sources used, Raw Files are designed to deliver higher performance when transferring data.
SSIS Flat Files require parsing and validation, while the data in Raw Files are stored in native format and requires no translation and little parsing. In 2009, an experiment was conducted by John Welch to illustrate the difference between SSIS Flat Files and Raw Files from a performance perspective. You can read this article for more details:
Raw Files are very useful for implementing parallel data import logic since you can split a file over multiple Raw Files then import them in parallel.
On the other hand, Raw Files cannot be edited or consumed outside of SSIS, which makes them only usable for data staging purposes.
In conclusion, if you need to export data into a file for use in other systems or to be published, you will be best served with a data format that is widely used like SSIS Flat Files. But, if you need to dump data for use in a different ETL stage, Raw Files are recommended.