Hadi Fadlallah
This image shows a screen shot of the SSIS flat file connection manager

SSIS Flat Files vs Raw Files

October 29, 2019 by

In this article, we will give an overview of using Flat Files and Raw Files in SSIS, then we will illustrate some of the differences between using these two types.

This is the ninth article in the SSIS features face-to-face series which aims to remove confusion around similar features in SQL Server Integration Services.

Flat Files

In general, a flat file doesn’t contain an internal hierarchy; it may contain text, log entries, or data in tabular form. More specifically, SSIS Flat Files are text files that store tabular data and are manipulated line-by-line.

To handle Flat Files, you must create an SSIS Flat File connection manager, where you define the Flat File metadata. There are different formats for Flat Files that can be handled by the SSIS Flat File connection manager, as follows:

  • Delimited: each row in the Flat File contains columns separated by a delimiter; the most popular format is comma-separated values (.csv)
  • Fixed width: each row contains multiple columns having a fixed number of characters
  • Ragged Right: This option is added to handle Flat Files that are not well-formatted. You can refer to the following article for more guidance: Working with ragged right formatted files in SSIS

If you open the SSIS Flat File connection manager editor, it contains four tabs:

  1. General Tab: Here, the following options are configured:
    1. Connection manager name and description
    2. Flat File path: Select the file path (if you need to create a new Flat File, then you should enter the desired name)
    3. Text encoding: Locale, Code page or Unicode
    4. Flat file Format: Choose if columns are delimited, fixed-width or ragged right
    5. Header row delimiter: Since some Flat Files are created from templates, you may have a file where the header row delimiter is different from other rows’ delimiters
    6. Header rows to skip: This option can be used to ignore header rows that contain meaningless information
    7. Column names in the first header: This option is to specify if the column names are found within the flat file. If this option is unchecked, then auto-generated column names are used (they can be changed in the advanced tab)
  2. Columns Tab: Here, you can configure how columns are extracted from each row:
    1. In delimited file format, you have to specify the column and row delimiter
    2. In fixed-width, you have to specify each column length
    3. In ragged right files, you have to specify the column markers
  3. Advanced Tab: Here, you can configure each column’s metadata such as length, data type, name and other attributes
  4. Preview Tab: Here, the data is reviewed based on the configurations you’ve made

This image shows a screen shot of the SSIS flat file connection manager

Figure 1 – SSIS Flat File connection manager

In a Flat File, data are stored as text. When values with different data types are stored in a Flat File, you can implicitly convert them from the connection manager or the source component, or explicitly convert them using data conversion and derived column transformations. For more information, you can refer to the following article: SSIS Data types: Change from the Advanced Editor vs Data Conversion Transformations.

this image shows the description of SSIS flat file source component from the toolbox

Figure 2 – SSIS Flat File source description from toolbox

To import or export data from Flat Files, you must use SSIS Flat File Source and SSIS Flat File Destination components within a Data Flow Task. Note, that if you are handling a Flat File that contains non-tabular data, you may need to read the Flat File using a script (task or component) and implement complex logic, or you may need to read each row as one column (length = 4000) and use a transformation to consume the file content. You can check the following links for examples of importing complex flat files in SSIS:

this image shows the description of ssis flat file destination component from the toolbox

Figure 3 – SSIS Flat File destination description from toolbox

Raw Files

Raw Files are a kind of SSIS Flat File used to dump data between different ETL stages. The data is stored in binary format and can only be used by the SSIS Raw Files component.

To use Raw Files in SSIS, you don’t have to create a connection manager, since it can be defined within the Raw File source and Raw File destination components:

this image shows the raw file destination component editor

Figure 4 – SSIS Raw File destination

To create a Raw File, just add a Raw File destination in a data flow task. When you open the Raw File destination editor, there are two tabs:

  • Connection Manager
  • Columns

this image shows the raw file destination component description from the toolbox

Figure 5 – SSIS Raw File destination description from toolbox

In the connection manager tab, specify the file name (directly or from a variable), and choose the write mode option:

  • Create always: Always creates a new file
  • Create once: Creates a new file. If the file exists, the component fails
  • Append: Appends data to a previously created file with similar metadata
  • Truncate and Append: Overwrites the data in a previously created file with similar metadata

In the columns tab, select the columns you want to dump into the SSIS Raw File destination:

this image shows the raw file source component description from the toolbox

Figure 6 – SSIS Raw File source description from toolbox

After dumping data into a raw file, you must use a Raw File Source to read this data. This component is very similar to the destination component, except that there is no Write mode option:

this image shows the ssis raw file source editor

Figure 7 – SSIS Raw File source

Note that the SSIS Raw File source can only be used to read a file created using a Raw File destination.

To read more about Raw Files, refer to the following official documentation:

Flat Files vs. Raw Files

Now I will illustrate the difference between both file types in SSIS.

SSIS Flat Files are widely used to dump data from relational databases to be used later for other purposes, but most people don’t know that they are not recommended from a performance perspective. Even though comma-separated values files (.csv) are one of the most popular data sources used, Raw Files are designed to deliver higher performance when transferring data.

SSIS Flat Files require parsing and validation, while the data in Raw Files are stored in native format and requires no translation and little parsing. In 2009, an experiment was conducted by John Welch to illustrate the difference between SSIS Flat Files and Raw Files from a performance perspective. You can read this article for more details:

Raw Files are very useful for implementing parallel data import logic since you can split a file over multiple Raw Files then import them in parallel.

On the other hand, Raw Files cannot be edited or consumed outside of SSIS, which makes them only usable for data staging purposes.

In conclusion, if you need to export data into a file for use in other systems or to be published, you will be best served with a data format that is widely used like SSIS Flat Files. But, if you need to dump data for use in a different ETL stage, Raw Files are recommended.

Table of contents

SSIS OLE DB Source: SQL Command vs Table or View
SSIS Expression Tasks vs Evaluating variables as expressions
SSIS OLE DB Destination vs SQL Server Destination
Execute SQL Task in SSIS: SqlStatementSource Expressions vs Variable Source Types
Execute SQL Task in SSIS: Output Parameters vs Result Sets
SSIS Derived Columns with Multiple Expressions vs Multiple Transformations
SSIS Data types: Change from the Advanced Editor vs Data Conversion Transformations
SSIS Connection Managers: OLE DB vs ODBC vs ADO.NET
SSIS Flat Files vs Raw Files
SSIS Foreach Loop vs For Loop Container
SSIS: Execute T-SQL Statement Task vs Execute SQL Task
SSIS Lookup transformation vs. Fuzzy Lookup transformation
SSIS Pivot transformations vs. Unpivot transformations
SSIS Merge Join vs. Merge Transformation
Data Access Modes in SSIS OLE DB Destination: SQL Command vs. Table or View
SSIS XML Source vs XML task
SSIS Script task vs. Script Component
SSIS term extraction vs. term lookup
Hadi Fadlallah
ETL, Integration Services (SSIS)

About Hadi Fadlallah

Hadi is an SQL Server professional with more than 10 years of experience. His main expertise is in data integration. He's one of the top ETL and SQL Server Integration Services contributors at Stackoverflow.com . Also, he published several article series about Biml, SSIS features, Search engines, Hadoop, and many other technologies. Besides working with SQL Server, he worked with different data technologies such as NoSQL databases, Hadoop, Apache Spark. He is a MongoDB, Neo4j, and ArangoDB certified professional. On the academic level, Hadi holds two master's degrees in computer science and business computing. Currently, he is a Ph.D. candidate in data science focusing on Big Data quality assessment techniques. Hadi really enjoys learning new things everyday and sharing his knowledge. You can reach him on his personal website. View all posts by Hadi Fadlallah

168 Views