Rajendra Gupta
Configured SSIS package

Data Import from Amazon S3 SSIS bucket using an integration service (SSIS) package

October 28, 2019 by

This article explores data import in SQL Server from a CSV file stored in the Amazon S3 SSIS bucket.

Introduction

The SSIS package is an ETL tool for data extraction, transformation and loading data from various sources such as OLE DB, ODBC, Ado.NET, flat-file connections. Nowadays, many organizations have hybrid infrastructures model. It is a combination of both On-premise and cloud infrastructure. It is essential for a DBA and developer to be familiar with cloud solutions as well.

Recently I have a requirement in that to import S3 bucket property and file data with the following source and destination:

  • Source: CSV file stored in AWS S3 bucket
  • Destination: On-premise SQL Server database table

First, let’s take an overview of the AWS S3 bucket.

Overview of Amazon S3

AWS S3 is an acronym of Amazon Web Service Simple Storage service. It is a highly scalable and cost-effective cloud storage for data storage, archival. We can upload any file extension in the S3 bucket. Single file size cannot exceed beyond five TB. Users can control the file properties, accessibility such as public or private accessibility. We can store unlimited data in the S3 bucket. The best part is that you only pay for the used storage.

AWS Free Tier account

The starting point of the AWS journey is the AWS Console and creates an AWS free tier account. Go to this URL, and it opens the following free tier page of AWS:

AWS Free Tier account

We can create a free account, and it is 12 months free. You can use it as a starting point to learn AWS:

AWS Free Tier account usage

Few vital features of free tier account in AWS are as following:

  • 750 hours per month free usage of Amazon EC2 instances
  • 750 hours per month free usage of Amazon RDS t2.micro server
  • 30 GB per month usage of Amazon elastic storage
  • 5 GB free AWS S3 storage

You can go through the URL to read more about the free account in AWS. Click on Create a Free Account and go through the process by entering your personal information such as name, country, email address, etc.

Here is a catch; you need to provide a valid debit or credit card. It deducts 2 rupees from your credit or debit card. Do not worry; you will also get it back in a few days. It is a mandatory step to sign up for a free account. Amazon does not deduct any money until you have not exhausted the limit of the free tier account.

Amazon S3 configuration

To create an S3 bucket do the following:

  • Open the AWS web console and login to the account using IAM user having permissions to create, access, upload, and import data into the S3 bucket
  • Navigate to Services, Click on S3 in the Storage section:

    Amazon S3 configuration

    It lists out all existing AWS S3 Bucket. We can see bucket name, access level, bucket region and the created data of the S3 bucket

  • Click on Create bucket and provide the S3 bucket name:

    Create a new bucket

  • Amazon S3 is a region-specific resource; therefore; you need to select the region from the drop-down list. Then, click on Create:

    Specify bucket and region

    It creates the S3 bucket with default configurations and shows in the bucket list. This bucket belongs to Asia Pacific (Mumbai) region:

    S3 bucket properties

  • Open this S3 bucket, drag the CSV file from the source system, drop in the S3 bucket, and click the Upload button:

    Upload file into S3 bicket

    You can see the CSV file in the S3 bucket:

    Uploaded file properties

Amazon S3 SSIS data upload

As described earlier, we require data import from CSV file (stored in AWS S3 bucket) into the SQL server table.

Question: How do you connect an SSIS package with an AWS S3 bucket?

By default, the SSIS package does not allow you to connect with the AWS S3 bucket. It does not mean that we do have any mechanism to do so.

We can browse Visual studio Marketplace and use Amazon S3 SSIS Components (SSIS Productivity Pack) for our requirements:

Amazon S3 SSIS Components

It enables SSIS package and Amazon S3 SSIS bucket integration without writing any specific code. You can easily use this SSIS productivity pack to download complete the task effectively and efficiently.

Note: It is trial version software, and you can use this to test out software functionality using development tools such as SSDT. You do not require a license in this case. Read more about licensing and available licenses using the link:

Amazon S3 SSIS Components licensing

Download the appropriate 64-bit or 32-bit. Double-click on the installer and follow up the installation wizard for installing SSIS productivity pack with default configurations:

Download appropriate version

Amazon S3 SSIS package configuration

Once you installed SSIS Productivity Pack, launch Visual Studio (SSDT). I am using SSDT Visual Studio 2015 in this article.

S3 bucket connection

The first step is to create a connection pointing to the Amazon S3 bucket. Right-click on the connection manager and choose New connection:

Create a new connection

It opens a list of available connection managers. We require connecting with the AWS S3 bucket; therefore, select the Amazon S3 and click on Add. It opens the Amazon S3 Connection Manager.

In General page of the connection manager, we specify connection properties and login credentials to the AWS S3 bucket:

  • Access and secret key: We specify access and secret key on an IAM user. We get these keys from the IAM console at AWS web services

    • Logon to AWS Console and navigate to IAM in the services menu
    • Navigate to Users and click on a specific user
    • In the user, properties go to the Security credentials tab
    • Choose the access key and download the secret key in a CSV file

    Security credentials

  • You need to store the access key and secret key in a secured location. Specify it in the Amazon S3 Connection Manager. It authenticates user account in Amazon services using these keys, and you can see the bucket name in the drop-down list

    Amazon S3 connection Manager

    Click on Test Connection, and it checks for connection to the Amazon S3 SSIS bucket. Once the connection is successful, you get the following message:

    Test connection

    Click OK, and we can see AWS S3 Connection in the connection manager:

    New configured connection

Add a Data Flow task

In the next step, add a data flow task in the SSIS package for the Amazon S3 SSIS bucket:

data flow task

Rename the data flow task as AWS S3 Data Import:

Rename the data flow task as AWS S3 Data Import

Double-click on this task, and it takes you to the data flow screen. In the SSIS toolbox on the data flow tab, you can see options for Amazon S3 bucket:

  • Amazon S3 Destination
  • Amazon S3 Source
  • Premium Flat File source
  • Premium Flat File destination

Drag Premium Flat File Source in the data flow task from the SSIS toolbox:

Premium Flat file source

Double-click on Premium Flat File Source and it opens the editor window for configuration:

  • Connection Manager: Select the existing connection to the S3 bucket from the drop-down. If you do not have an existing connection to the S3 bucket, click on New. You require specifying keys (access and security key) information, as shown in previous steps
  • Source File Path: It is the CSV file path in the S3 bucket. Click on eclipse and browse to the CSV file path. You might have multiple files in the S3 bucket. The source file path should point to a valid CSV file

Click on eclipse (…) on the source file path and browse to the CSV file path:

Specify the path of CSV Amazon S3 SSIS

We can see the CSV file in the Source item path column.

  • The first row has the header: My CSV file first column contains column headers. Remove the check from here if the CSV file does not contains headers
  • Skip Empty Rows: CSV file might contain empty rows. Put a check on this to skip the empty rows

Premium flat file source editor

Click on columns tab to view excel file columns. We may uncheck the columns if we do not want to import them into SQL Server tables. It also shows the data type and length automatically. We can change the column data types and their properties in this window. For this demonstration, let’s go with default columns properties in the Amazon S3 SSIS package:

Columns and their properties

Click OK and add an OLE DB destination connection. This OLE DB destination should point to the SQL Server instance as per our requirement. Right-click and configure the destination with the SQL Server connection and SQL Server table. You also need to do the mapping between source and destination columns:

OLE DB destination connection

Once the OLE DB destination connection is successful, you can see the package as shown below:

Configured SSIS package

Before executing the Amazon S3 SSIS package, rename the tasks as follows:

  • Premium Flat File Source -> AWS S3 Bucket CSV file
  • OLE DB Destination editor: SQL Server destination table

    Rename the SSIS package tasks

Execute the SSIS package, and it transfers 100 rows from the source CSV file to SQL Server table.

Execute the SSIS package

Verify the records in the SQL Server table.

Verify the records in SQL Server

Conclusion

In this article, we explored the method to integrate the SSIS package and the Amazon S3 SSIS bucket. We can use the component specified in this article for data import, export between On-premise and cloud systems as well.

Rajendra Gupta
Latest posts by Rajendra Gupta (see all)
168 Views