Copy data between Azure data stores using Azure Data Factory

October 27, 2020 by Ahmad Yaseen

In the previous article, Starting your journey with Microsoft Azure Data Factory, we discussed the main concept of the Azure Data Factory, described the Data Factory components and showed how to create a new Data Factory step by step.

In this article, we will show how to use the Azure Data Factory to orchestrate copying data between Azure data stores.

Copy Activity Overview

The Copy activity in Azure Data Factory is used to copy data between different data stores located on-premises and in the cloud. The copied data can then be used in other transformation or analysis tasks, or the transformed or analyzed data can be copied to a final store to be visualized.

The Copy activity supports a large number of data stores, such as Azure data stores, on-premises relational and non-relational data stores, file stores, generic protocols such as HTTPS, and services and applications such as ServiceNow. Azure Data Factory supports reading from and writing to different file formats, including Avro, Binary, Delimited text, Excel, JSON, ORC, Parquet and XML. For a complete list of the supported data sources (called sources) and data targets (called sinks) in the Azure Data Factory Copy activity, check the Supported Data Stores and Formats.

Azure Data Factory uses the Integration Runtime (IR) as a secure compute infrastructure to run the copy activity across the different network environments and make sure that this activity is performed in the closest possible region to the data store. You can imagine it as a bridge between the copy activity and the linked services.

Azure Data Factory supports three types of Integration Runtimes: (1) the Azure Integration Runtime, which is used when copying data between data stores that are accessed publicly via the internet, (2) the Self-Hosted Integration Runtime, which is used to copy data from or to an on-premises data store or a network with access control, and (3) the Azure-SSIS Integration Runtime, which is used to run SSIS packages in the Data Factory, as we will see later in this article series. For more information, check the Integration Runtime in Azure Data Factory.

Take into consideration that you can perform the copy activity in the Data Factory using different tools and SDKs, including the Copy Data tool, the Azure portal, the .NET SDK, the Python SDK, Azure PowerShell, the REST API and an Azure Resource Manager template.
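
If you prefer the programmatic route, the following is a minimal sketch of connecting to an existing Data Factory with the Python SDK (the azure-identity and azure-mgmt-datafactory packages, recent versions); the subscription, resource group and factory names are placeholders you would replace with your own:

# Minimal sketch: connect to an existing Data Factory with the Python SDK.
# Assumes the azure-identity and azure-mgmt-datafactory packages are installed
# and that the placeholder names below are replaced with real values.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

subscription_id = "<your-subscription-id>"   # placeholder
resource_group = "<your-resource-group>"     # placeholder
factory_name = "<your-data-factory-name>"    # placeholder

# DefaultAzureCredential picks up Azure CLI, environment or managed identity credentials.
credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, subscription_id)

# Quick check that the factory is reachable.
factory = adf_client.factories.get(resource_group, factory_name)
print(factory.name, factory.location)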

Copy Data from Azure Blob Storage to Azure SQL Database

In order to copy data from a text file stored in an Azure Blob Storage to an Azure SQL Database table, we need to make sure that we have a storage account with a blob container that contains the data file, as shown below:

Blob Container

And an Azure SQL Database, with the Allow Azure services and resources to access this server firewall option enabled so that Azure Data Factory can access that database server, as below:

Db Firewall

Also, we will create a new table in the Azure SQL Database into which the data from the text file stored in the blob container will be inserted, as in the CREATE TABLE T-SQL statement below:

Create Azure SQL Database Table
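
The exact column list of the table depends on your source file, so the statement in the screenshot above is what matters; purely as a hypothetical illustration of scripting the same step in Python, a table could be created with pyodbc as follows (server, database, credentials, table name and columns below are all placeholders):

# Hypothetical example only: the real column list must match your source file.
import pyodbc

conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"   # placeholder server
    "Database=<your-database>;"                              # placeholder database
    "Uid=<your-user>;Pwd=<your-password>;Encrypt=yes;"
)

create_table_sql = """
IF OBJECT_ID(N'dbo.CopyDemo', N'U') IS NULL
CREATE TABLE dbo.CopyDemo
(
    ID INT NOT NULL,
    FirstName NVARCHAR(50),
    LastName NVARCHAR(50)
);
"""

with pyodbc.connect(conn_str) as conn:
    conn.execute(create_table_sql)
    conn.commit()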

Now we are ready to configure the Data Factory pipeline that will be used to copy the data from the blob container to the Azure SQL Database. To configure the copy process, open the Azure Data Factory from the Azure Portal and click on the Author & Monitor option under the Overview tab, as shown below:

Author & Monitor

From the opened Data Factory, you have two options to configure the copy pipeline. The first one is to create the pipeline components one by one manually, using the Create Pipeline option. The second option, which we will use in this demo, is to create the pipeline using the Copy Data tool, as shown below:

ADF Pipeline creation options

The first step in the Copy Data tool is to provide a unique name for the copy activity and specify whether to schedule this copy process or run it once, as shown below:

Copy Data tool - Step 1

After that, the Copy Data wizard will ask you to specify the type of source data store in order to create a Linked Service to connect to that data source. In this demo, we will copy data from an Azure Blob Storage, as shown below:

Source Linked service

When you click on the Azure Blob Storage source type and then click on Continue, a new window will be displayed, in which you need to provide a unique name for the Linked Service, the subscription where the storage account is created, the name of the storage account and the authentication method that will be used to connect to that storage account. After providing all the required information, click on Test Connection to verify that you can reach that storage account, then click on Create to create the Linked Service, as shown below:

Create Linked Service.

Once created successfully, the source Linked Service will be displayed in the data sources window, as below:

Data sources Window
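
For reference, the same source Linked Service can also be created with the Python SDK; the sketch below reuses the adf_client, resource_group and factory_name variables from the earlier snippet, uses a placeholder connection string, and the Linked Service name is an arbitrary choice (exact model names can vary slightly between SDK versions):

# Sketch: create the source Linked Service programmatically.
# Reuses adf_client, resource_group and factory_name from the earlier snippet.
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    AzureBlobStorageLinkedService,
    SecureString,
)

blob_connection_string = SecureString(
    value="DefaultEndpointsProtocol=https;AccountName=<account>;AccountKey=<key>"  # placeholder
)

blob_ls = LinkedServiceResource(
    properties=AzureBlobStorageLinkedService(connection_string=blob_connection_string)
)

adf_client.linked_services.create_or_update(
    resource_group, factory_name, "SourceBlobLinkedService", blob_ls
)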

After that, you need to create the Dataset for the data source, in which you simply need to specify the input data file, or folder in case you need to copy multiple files recursively, as shown below:

Source data set

When you specify the input file, the Data Factory will check the format for that file and allow you to review the file format settings, preview the source data and perform the required changes to fit your requirements, as shown below:

Input file format
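
The SDK equivalent of the last two steps is a delimited-text dataset that points at the blob container and carries the format settings the wizard detects for you; a hedged sketch, again continuing from the earlier snippets with placeholder container and file names, is shown below:

# Sketch: define the source dataset for a delimited text file in the blob container.
from azure.mgmt.datafactory.models import (
    DatasetResource,
    DelimitedTextDataset,
    AzureBlobStorageLocation,
    LinkedServiceReference,
)

source_ds = DatasetResource(
    properties=DelimitedTextDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="SourceBlobLinkedService"
        ),
        location=AzureBlobStorageLocation(
            container="<your-container>",       # placeholder
            file_name="<your-file>.txt",        # placeholder
        ),
        # Format settings that the Copy Data tool normally detects for you.
        column_delimiter=",",
        first_row_as_header=True,
    )
)

adf_client.datasets.create_or_update(
    resource_group, factory_name, "SourceBlobDataset", source_ds
)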

The next step is to configure the Linked Service and Dataset for the destination data store. In this demo, we will copy the input file to an Azure SQL Database table. To achieve that, choose Azure SQL Database from the New Linked Service list, then click Continue to configure the sink Linked Service, as shown below:

sink Linked service type

In the displayed New Linked Service window, provide a unique name for the sink Linked Service, the subscription name, the Azure SQL server and database names and, finally, the credentials that will be used by the Azure Data Factory to access the Azure SQL Database. After providing all the required information, click on the Test Connection option to verify the connection between the Data Factory and the Azure SQL Database, then click Create to create the Linked Service, as shown below:

Create Sink Linked Service

Once created successfully, the sink Linked Service will be listed in the Destination Data stores, as below:

Destination Data Sources
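
As with the source, the sink Linked Service has an SDK equivalent; the sketch below continues from the earlier snippets and uses a placeholder connection string for the Azure SQL Database:

# Sketch: create the sink Linked Service for the Azure SQL Database.
from azure.mgmt.datafactory.models import (
    LinkedServiceResource,
    AzureSqlDatabaseLinkedService,
    SecureString,
)

sql_connection_string = SecureString(
    value=(
        "Server=tcp:<your-server>.database.windows.net,1433;"
        "Database=<your-database>;User ID=<your-user>;Password=<your-password>;"  # placeholders
    )
)

sql_ls = LinkedServiceResource(
    properties=AzureSqlDatabaseLinkedService(connection_string=sql_connection_string)
)

adf_client.linked_services.create_or_update(
    resource_group, factory_name, "SinkSqlLinkedService", sql_ls
)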

After creating the Linked Service that points to the sink data store, we need to create the sink dataset that specifies the target database table, as shown below:

Sink Dataset
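
Scripted with the SDK, the sink dataset simply references the sink Linked Service and names the target table; the sketch below continues from the earlier snippets and uses the hypothetical dbo.CopyDemo table from the earlier pyodbc example:

# Sketch: define the sink dataset that points to the target Azure SQL Database table.
from azure.mgmt.datafactory.models import (
    DatasetResource,
    AzureSqlTableDataset,
    LinkedServiceReference,
)

sink_ds = DatasetResource(
    properties=AzureSqlTableDataset(
        linked_service_name=LinkedServiceReference(
            type="LinkedServiceReference", reference_name="SinkSqlLinkedService"
        ),
        table_name="dbo.CopyDemo",   # hypothetical table from the earlier sketch
    )
)

adf_client.datasets.create_or_update(
    resource_group, factory_name, "SinkSqlDataset", sink_ds
)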

Also, you need to review the schema mapping between the input data file and the sink data table and provide any script that should be executed before copying the data, as shown below:

Schema mapping
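
In SDK terms, the pre-copy script from this step belongs to the sink definition of the Copy activity, while explicit column mappings can be supplied through the activity's translator property; by default, columns are mapped by name. A small, hedged sketch follows (the TRUNCATE statement is only an illustration):

# Sketch: a sink definition with an optional pre-copy script.
# The TRUNCATE statement below is only an illustration; by default the Copy
# activity maps source columns to sink columns by name, and explicit mappings
# can be supplied through the activity's translator (TabularTranslator) property.
from azure.mgmt.datafactory.models import AzureSqlSink

sql_sink = AzureSqlSink(
    pre_copy_script="TRUNCATE TABLE dbo.CopyDemo;"   # hypothetical script
)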

Now we have the Linked Services and Datasets for both the source and sink data stores configured successfully. The next step now is to configure the copy activity settings, such as the performance and fault tolerance settings, as below:

Copy Activity Settings
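
At this point, all the pieces can be tied together into a pipeline; the sketch below builds a Copy activity from the datasets and sink defined in the earlier snippets, leaving the performance and fault-tolerance options at their defaults (they correspond to additional Copy activity properties):

# Sketch: assemble the Copy activity and pipeline from the earlier objects.
from azure.mgmt.datafactory.models import (
    CopyActivity,
    DatasetReference,
    DelimitedTextSource,
    PipelineResource,
)

copy_activity = CopyActivity(
    name="CopyBlobToSql",
    inputs=[DatasetReference(type="DatasetReference", reference_name="SourceBlobDataset")],
    outputs=[DatasetReference(type="DatasetReference", reference_name="SinkSqlDataset")],
    source=DelimitedTextSource(),
    sink=sql_sink,   # AzureSqlSink from the previous sketch
    # Performance and fault-tolerance settings (e.g. data integration units,
    # skipping incompatible rows) are additional Copy activity properties.
)

pipeline = PipelineResource(activities=[copy_activity])

adf_client.pipelines.create_or_update(
    resource_group, factory_name, "CopyBlobToSqlPipeline", pipeline
)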

From the Summary window, review all the copy process configurations, with the ability to edit them before running the pipeline, as shown below:

Summary window

Now that we have reviewed all the pipeline settings, the pipeline is ready for deployment and execution. The Data Factory will create all pipeline components and then execute the pipeline. Executing the pipeline means executing the copy activity, which will copy the data from the input file that is stored in the Azure Blob Storage and write it to the Azure SQL Database table, as shown below:

Pipeline deployment
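
The scripted equivalent of deploying and executing the pipeline is to trigger a run and poll its status, roughly as follows (continuing from the earlier snippets):

# Sketch: trigger the pipeline and check the run status.
import time

run_response = adf_client.pipelines.create_run(
    resource_group, factory_name, "CopyBlobToSqlPipeline", parameters={}
)

# Poll the run until it finishes (simplified; add a timeout in real code).
while True:
    pipeline_run = adf_client.pipeline_runs.get(
        resource_group, factory_name, run_response.run_id
    )
    if pipeline_run.status not in ("Queued", "InProgress"):
        break
    time.sleep(15)

print("Pipeline run status:", pipeline_run.status)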

To verify that the data is copied to the Azure SQL Database table, connect to the Azure SQL Database using SSMS, then run a SELECT statement to read the table data, and you will see that the data has been copied successfully, as shown below:

Verify copy using SSMS
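
The same verification can also be done from Python instead of SSMS; a minimal sketch with pyodbc, using placeholder connection details and the hypothetical dbo.CopyDemo table:

# Sketch: verify the copied rows from Python instead of SSMS.
import pyodbc

conn_str = (
    "Driver={ODBC Driver 17 for SQL Server};"
    "Server=tcp:<your-server>.database.windows.net,1433;"   # placeholders
    "Database=<your-database>;Uid=<your-user>;Pwd=<your-password>;Encrypt=yes;"
)

with pyodbc.connect(conn_str) as conn:
    cursor = conn.cursor()
    cursor.execute("SELECT COUNT(*) FROM dbo.CopyDemo;")   # hypothetical table
    print("Rows copied:", cursor.fetchone()[0])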

Let us see where the different pipeline components are created in the Azure Data Factory. Click on the Author button of the Data Factory and review the available pipelines and datasets: the pipeline that was created using the Copy Data tool is saved in the Data Factory pipelines list, and the source and destination datasets are available under the Datasets list. Clicking on the created pipeline, you will see the Copy activity that is used to copy the data from the source file to the destination table, as shown below:

ADF Author

Clicking on the Monitor button, you can review the execution logs of the created pipeline, including the execution time, duration, and result, as shown below:

ADF monitor
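
The same run details can also be pulled programmatically; the sketch below queries the activity runs of the pipeline run started in the earlier snippet, using a one-day window around the current time:

# Sketch: query the activity runs of the pipeline run from the earlier snippet.
from datetime import datetime, timedelta
from azure.mgmt.datafactory.models import RunFilterParameters

filter_params = RunFilterParameters(
    last_updated_after=datetime.utcnow() - timedelta(days=1),
    last_updated_before=datetime.utcnow() + timedelta(days=1),
)

activity_runs = adf_client.activity_runs.query_by_pipeline_run(
    resource_group, factory_name, run_response.run_id, filter_params
)

for run in activity_runs.value:
    print(run.activity_name, run.status, run.duration_in_ms)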

The Linked Services of the Data Factory pipelines can be checked by clicking on the Manage button, allowing you to review all available Linked Services and edit them based on your requirements, as shown below:

ADF Monitor
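
For completeness, the Linked Services visible in the Manage tab can also be listed with the SDK, continuing from the earlier snippets:

# Sketch: list the factory's Linked Services, mirroring the Manage tab.
for ls in adf_client.linked_services.list_by_factory(resource_group, factory_name):
    print(ls.name, type(ls.properties).__name__)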

Conclusion

In this article, we showed how to copy data between Azure data stores and verify the pipeline components and results. In the next article, we will see how to copy data from an On-premises data store to an Azure data store using Azure Data Factory. Stay tuned!

Table of contents

Starting your journey with Microsoft Azure Data Factory
Copy data between Azure data stores using Azure Data Factory
Copy data from On-premises data store to an Azure data store using Azure Data Factory
Transform data using a Mapping Data Flow in Azure Data Factory
Run SSIS packages in Azure Data Factory
How to use iterations and conditions activities in Azure Data Factory
How to schedule Azure Data Factory pipeline executions using Triggers
How to Debug a Pipeline in Azure Data Factory
Dependencies in Azure Data Factory
Using Templates in Azure Data Factory
Using Source Control in Azure Data Factory
How to monitor Azure Data Factory

About Ahmad Yaseen

Ahmad Yaseen is a Microsoft Big Data engineer with deep knowledge and experience in the SQL BI, SQL Server Database Administration and Development fields. He is a Microsoft Certified Solutions Expert in Data Management and Analytics, a Microsoft Certified Solutions Associate in SQL Database Administration and Development, an Azure Developer Associate and a Microsoft Certified Trainer. He also contributes SQL tips to many blogs.
