Copy data from On-premises data store to an Azure data store using Azure Data Factory

October 29, 2020 by Ahmad Yaseen

In the previous article, Copy data between Azure data stores using Azure Data Factory, we discussed how to copy the data stored in an Azure Blob Storage container to an Azure SQL Database table using Azure Data Factory, and reviewed the created pipeline components and results.

In this article, we will show how to copy data from an on-premises SQL Server database to an Azure SQL Database using Azure Data Factory.

Prepare On-premises SQL Server

In order to copy data from an on-premises SQL Server database to an Azure SQL Database using Azure Data Factory, make sure that the SQL Server version is 2005 or later, and that you have a SQL Server or Windows authentication login that can connect to the on-premises SQL Server instance.
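As a quick sketch of that preparation step, you can check the instance version and create a dedicated SQL authentication login for the copy activity; the login, password, database and table names below are hypothetical examples only:

```sql
-- Check the SQL Server version (should be 2005 or later)
SELECT SERVERPROPERTY('ProductVersion') AS [Version],
       SERVERPROPERTY('ProductLevel')   AS [ServicePack],
       SERVERPROPERTY('Edition')        AS [Edition];

-- Create a SQL authentication login for the copy activity
-- (login name and password are hypothetical)
CREATE LOGIN ADFCopyUser WITH PASSWORD = '<StrongPassword>';
GO
USE SourceDB;  -- hypothetical source database
GO
CREATE USER ADFCopyUser FOR LOGIN ADFCopyUser;
GO
-- db_datareader is sufficient for a read-only copy source
ALTER ROLE db_datareader ADD MEMBER ADFCopyUser;
```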

As the SQL Server instance is located inside an on-premises network, you need to set up a self-hosted integration runtime to connect to it. A self-hosted integration runtime provides a secure infrastructure to run the copy activities between a cloud data store and a data store in a private network. The self-hosted integration runtime should be installed on the on-premises machine where the SQL Server instance is hosted.

In order to set up the self-hosted integration runtime, open the Azure Data Factory from the Azure Portal, click on the Manage icon at the left side of the window, select the Integration Runtime tab and click on the New option, as shown below:

New SH-IR

From the displayed Integration Runtime Setup window, choose the type of the integration runtime that you plan to create, which is Self-Hosted Integration Runtime in our scenario, then click Continue, as below:

IR Setup-Select type

After that, provide a unique meaningful name for your SH-IR and click Create to proceed. A new window will be displayed, providing you with the link that will be used to download the latest version of the Self-Hosted IR and the authentication key that will be used to register the SH-IR. Click on the download link, copy the installation media to the machine that hosts the on-premises SQL Server instance and copy the authentication key to register it once installed, as below:

SH-IR download link and key

Now run the installation wizard of the Self-Hosted IR on the on-premises machine where the SQL Server instance is hosted, where you will be asked to accept the license agreement terms and specify the installation path of the SH-IR, then install it, as shown below:

SH-IR installation wizard

Once installed, a new window will be displayed to register the Self-Hosted IR to the Azure Data Factory using the authentication key copied previously, as shown below:

SH-IR registration

Then it will show the nodes that are assigned for the SH-IR, with the ability to configure the security of the SH-IR connection using certificates, as below:

SH-IR nodes

Finally, the Self-Hosted Integration Runtime is installed to the on-premises machine, registered and connected to the Azure Data Factory, as shown below:

SH-IR connected

You can verify the SH-IR registration and connection from the Azure Data Factory, under the Integration Runtimes tab of the Manage page, where the SH-IR is listed with a Running status. From that page, you can also upgrade the SH-IR to the latest version, as shown below:

SH-IR status from ADF

Copy from On-premises to Azure SQL Database

With the Self-Hosted Integration Runtime installed and registered on the On-premises SQL Server machine, the source data store is ready. From the Azure SQL Database side, we need to make sure that the Allow Azure Services and resources to access this server firewall option is enabled, in order to allow the Azure Data Factory to access that database server.
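The same firewall option can also be enabled from T-SQL; a minimal sketch, run in the master database of the Azure SQL logical server:

```sql
-- Run in the master database of the Azure SQL logical server.
-- The special 0.0.0.0 - 0.0.0.0 range corresponds to the
-- "Allow Azure services and resources to access this server" option.
EXEC sp_set_firewall_rule
     @name = N'AllowAllWindowsAzureIps',
     @start_ip_address = '0.0.0.0',
     @end_ip_address = '0.0.0.0';
```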

To create a pipeline that will be used to copy data from the on-premises SQL Server to the Azure SQL Database, open the Azure Data Factory from the Azure Portal and click on the Author & Monitor option under the Overview tab, as shown below:

Author & Monitor

From the opened Data Factory, you can create and configure the copy pipeline using one of two methods: creating the pipeline components manually, one by one, using the Create Pipeline option, or creating the complete pipeline using the Copy data tool, as shown below:

ADF-Copy Data Tool

First, you need to provide a unique name for the copy activity that indicates the main purpose of that pipeline then specify whether to schedule this copy process or run it once, as shown below:

Copy Data - Properties

After that, you need to create the Linked Service that connects to the source data store, which is the on-premises SQL Server instance in our demo. To select the source data store type, click on the Database category, click on Create new connection, then choose the SQL Server data store, as shown below:

Source data store type

The New Linked Service wizard will ask you to provide a unique name for the source Linked Service, the integration runtime that will be used to connect to the source data store, which is the Self-Hosted IR, the name of the on-premises SQL Server, the source database name and the credentials that will be used to connect to that database server. To verify the provided connection properties, click on the Test Connection option, then click Create to proceed with the Linked Service creation, as shown below:

Source Linked Service

And the created source Linked Service will be displayed under the Source Data stores, as shown below:

Source Data stores

With the source Linked Service created successfully, you need to create the Dataset, which is a reference to the source table(s) you will read from. You will be requested to select the tables to be copied to the sink data store, or to write a query that filters the data you are interested in, as shown below:

Source Dataset

You also have the ability to apply additional filters to the source data, as below:

Filter the source data
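The query option accepts any valid T-SQL statement; a minimal sketch, assuming a hypothetical dbo.Customers source table:

```sql
-- Hypothetical source query for the dataset; the table and
-- column names are examples only.
SELECT CustomerID, FirstName, LastName, OrderDate
FROM dbo.Customers
WHERE OrderDate >= '2020-01-01';
```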

The next step is to configure the destination Linked Service, which is the Azure SQL Database that can be found under the Azure category, as shown below:

Destination data store type

In the destination New Linked Service window, you need to provide a unique name for the destination Linked Service, the subscription name, the Azure SQL server and database names and finally the credentials that will be used by the Azure Data Factory to access the Azure SQL Database. After providing all required connection properties, click on the Test Connection option to verify the connection between the Data Factory and the Azure SQL Database then click Create to create the Linked Service, as shown below:

Destination Linked Service creation

And the destination Linked Service will be displayed under the Linked Services list, as below:

Destination Linked Service list

With the destination Linked Service created, you need to create the destination dataset by selecting the destination table in the Azure SQL Database. A new table, based on the source schema, will be created automatically if it does not exist; otherwise, the copied data will be appended to the existing table, as shown below:

Destination dataset

In addition, you need to review the schema mapping between the source and destination tables, with the ability to provide any script that should be executed before copying the data, as below:

Tables schema mapping
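The pre-copy script option accepts any T-SQL batch that should run against the destination before the copy starts; for example, a sketch that empties the destination table before each run (the table name is hypothetical):

```sql
-- Hypothetical pre-copy script: clear the destination table
-- before each copy run to avoid duplicating rows.
TRUNCATE TABLE dbo.Customers;
```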

And finally, you are asked to configure the copy activity settings, such as the performance and fault tolerance settings, as below:

Copy Data Settings

When you finish configuring all required settings, review all your choices in the Summary window, where you can still edit them, before starting the pipeline deployment process, then click Next to proceed:

Summary

At that step, the Linked services and the pipeline will be created then executed as shown below:

Pipeline deployment

To edit the created pipeline components, click on the Edit pipeline button in the previous window, or click directly on the Author tab, where you will be able to perform changes to the pipeline copy activity properties and the created datasets, as shown below:

Pipeline Author

Clicking on the Monitor button will move you to the Pipeline runs window that shows the execution logs for all pipelines, including the execution time, duration and result, as shown below:

Pipeline Monitor

The Azure Data Factory Linked Services can be checked and edited from the copy activity properties or directly from the Manage option, as shown below:

Pipeline Manage

Finally, to verify that the data is copied from the on-premises SQL Server database to the Azure SQL Database, connect to the Azure SQL Database using SSMS and run the SELECT statement below to check the copied data:

Verify copy data
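A minimal verification query, assuming the same hypothetical dbo.Customers table, could look like the following, comparing the destination row count with the source:

```sql
-- Check how many rows were copied to the destination table
SELECT COUNT(*) AS CopiedRows
FROM dbo.Customers;

-- Or inspect a sample of the copied rows
SELECT TOP (10) *
FROM dbo.Customers
ORDER BY CustomerID;
```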

Conclusion

In this article, we showed how to copy data from an on-premises data store to an Azure data store. In the next article, we will show how to perform a transformation process for the data using the Azure Data Factory. Stay tuned!

Table of contents

Starting your journey with Microsoft Azure Data Factory
Copy data between Azure data stores using Azure Data Factory
Copy data from On-premises data store to an Azure data store using Azure Data Factory
Transform data using a Mapping Data Flow in Azure Data Factory
Run SSIS packages in Azure Data Factory


About Ahmad Yaseen

Ahmad Yaseen is a Microsoft Big Data engineer with deep knowledge and experience in SQL BI, SQL Server Database Administration and Development fields. He is a Microsoft Certified Solution Expert in Data Management and Analytics, a Microsoft Certified Solution Associate in SQL Database Administration and Development, an Azure Developer Associate and a Microsoft Certified Trainer. He also contributes SQL tips to many blogs.
