
Copy Data tool to import data into Azure SQL Database from web sources

November 8, 2021 by Rajendra Gupta

This article will explore the Copy Data tool for importing data into Azure SQL Database from a web source.

Introduction

Suppose you need to import data into Azure database tables from a CSV file, and the source of the CSV file is a Web URL. Usually, we download the file to a local directory and then import the data into a database table. In the article, Copy Data tool to export data from Azure SQL Database into Azure Storage, we used the Copy Data tool in Azure Data Factory to perform the following tasks:

  • Export data from Azure Database tables into CSV files
  • Store the CSV files into the Azure blob storage container

The Copy Data tool includes 90+ built-in connectors that allow you to configure a data loading task without expertise in Azure Data Factory entities (linked services, datasets, pipelines).

Pre-requisites

  • You require the URL of a CSV file for data import. For this article, we will refer to a sample data set from stats.govt.nz

    Sample Data

  • Azure SQL Database Server: We will import the CSV file into a database table. Therefore, you need an Azure database. For this article, we will use the following SQL Database:

    Azure SQL Database instance

  • Azure Data Factory (ADF) V2 instance: We configured the ADF instance in the previous article, as shown below. You can use it or deploy a new ADF instance from the Azure portal -> Data Factory.

    Azure Data Factory

Azure Data Factory

Azure Data Factory is a cloud-based ETL (Extract-Transform-Load) service that provides data-driven pipelines for data movement and transformation. It consists of interconnected systems that together provide an end-to-end data integration platform.

  • Data Ingest: The Azure Data Factory (ADF) has 90+ standard connectors for various data sources. It collects data at a centralized location for subsequent processing
  • Mapping Data Flow: The ADF builds data transformations through a graphical interface. It simplifies the configuration of source and destination tasks
  • Azure Compute: The ADF can execute data-driven workflows without you deploying the underlying Azure compute resources. You can deploy an ADF service and start building the various components and data integrations
  • Data Ops: The ADF can integrate with GitHub and Azure DevOps to simplify managing data pipelines
  • Monitoring: The ADF can integrate with Azure Monitor, PowerShell, and the Azure portal health panel. This way, you can monitor execution progress with both scripts and the portal

In the ADF dashboard, click on Ingest to launch the Copy Data tool for data import and export.

Data ingestion

For this article, we will use the Built-in copy task. For the task cadence, we keep the default option – Run once now.

Copy data tool tasks

You can choose Schedule or Tumbling window for additional scheduling configuration, as shown below.

Task cadence

Source Data Store configurations

In the source data store, choose the source type as HTTP. You can use this source for both HTTP and HTTPS URLs.

Source data format

Click on New Connection. In the new connection (HTTP) window, specify a name, an optional description, and the CSV file's Web URL in the Base URL section.

New HTTP request

By default, it uses basic authentication, which requires you to enter credentials for authenticating to the data source.

Server authentication types

For the Web URL, we do not require any credentials to access the CSV file. Before creating the connection, click on Test Connection to validate the connection to the Web URL.

URL authentication

Click on Create, and it returns to the source data store page with the base URL information.

Source data request timeout, max connections

Click on Next, and it launches the file format settings page.

File format settings

You get the option – Detect text format. It automatically detects the file format, column delimiter, and row delimiter. As shown below, it has automatically selected the option – First row as header.

Detect text format

Click on the option – Preview Data. It opens a pop-up window and displays a few data rows from the source. You can validate the data format. If it is not correct, modify the file format settings and preview the data again.

Preview data

On the preview data page, click on the schema to check the source data columns and their types. As shown below, all columns use the string data type.

View schema

Destination Data Source configuration

On the next page, choose the destination data source from the drop-down list. For this article, we choose Azure SQL Database.

Destination data store

Select the Azure SQL Database Server, database, authentication type, and enter the credentials for importing the data.

Destination connection accounts

Once you create a destination datastore connection, it displays an additional configuration, as highlighted below.

  • Source: HTTP file
  • Target: Blank

Source to target mapping

This tool can automatically create the SQL table for you in Azure SQL Database. However, I recommend that you deploy the table in advance based on the source columns and their data types. For this article, I have created a demo table with varchar data types.
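As a rough sketch (the column names below are hypothetical placeholders, so replace them with the columns shown on the Preview Data / schema page of your CSV file), the demo table can be created in advance along these lines:

    -- Destination table for the CSV import; all columns are varchar
    -- because the source schema reports every column as a string.
    -- Column names are placeholders: align them with the source CSV columns.
    CREATE TABLE [dbo].[DemoImport]
    (
        [Year]          VARCHAR(100) NULL,
        [Industry]      VARCHAR(200) NULL,
        [VariableName]  VARCHAR(200) NULL,
        [Value]         VARCHAR(100) NULL
    );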

Create the table in your existing Azure SQL Database, and instead of choosing the option – Auto-create a destination table with the source schema, click on the option – Use existing table and select your table from the drop-down list.

Use existing table

Here, we use the table [dbo].[DemoImport] for importing data from the CSV file stored in the Web URL.

Select the table

On the next page, verify the column mapping between your source (CSV) and destination (SQL table). The Type column shows the source and destination column data types.

Column Mappings
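To double-check the destination side of the mapping from SQL itself, a quick metadata query such as the following (a minimal sketch against the [dbo].[DemoImport] table used in this article) lists the destination columns and their data types:

    -- List the destination table's columns and data types to compare
    -- against the source schema shown by the Copy Data tool.
    SELECT COLUMN_NAME, DATA_TYPE, CHARACTER_MAXIMUM_LENGTH
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_SCHEMA = 'dbo'
      AND TABLE_NAME = 'DemoImport'
    ORDER BY ORDINAL_POSITION;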

If any mappings are incorrect or columns are left unmapped, click on New mapping and choose the source and destination columns. Next, enter a name for the configured copy tool task. The remaining options, such as task description, fault tolerance, and logging, are optional.

Copy data tool settings

The summary page displays the source and destination data stores graphically. You can verify the configurations and modify them if required.

View summary page

The next step validates the copy runtime environment.

Data validation

It does the following tasks.

  • Creates source and destination datasets
  • Creates pipelines
  • Creates triggers
  • Starts triggers

Click on Finish and view the pipeline runs. As shown below, the run status is Succeeded.

Pipeline Monitor

Hover your mouse over the activity run and click on Details.

Activity runs

It gives helpful information for monitoring the data flow. The information captured is as follows.

  • Data read: Total amount of data retrieved from the source data store, including data copied in this run and data resumed from the last failed execution
  • Files read: Total number of files copied from the source data store, including files copied in this run and files resumed from the last failed run
  • Number of rows read
  • Source peak connections: Peak number of concurrent connections established to the source data store during the Copy activity run
  • Data written: The actual amount of data written/committed to the sink. The size may differ from the Data read size, as it relates to how each data store stores the data
  • Rows written: Number of rows copied to the sink. This metric does not apply when copying files as-is without parsing them, for example, when the source and sink datasets are binary format type or other format types with identical settings
  • Target peak connections: Peak number of concurrent connections established to the sink data store during the Copy activity run
  • Copy duration
  • Throughput (KB per second)
  • Used DIUs: The effective Data Integration Units during the copy
  • Used parallel copies: The effective parallel copies during the copy
  • Reading from source: The amount of time spent retrieving data from the source data store
  • Writing to sink: The amount of time spent writing data to the sink data store

ADF success data

You can connect to Azure SQL Database, validate the number of rows (14,946), and view the imported data.

Verify Azure data
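A couple of simple queries along these lines (a minimal sketch against the demo table) confirm the row count and let you inspect the imported data:

    -- The row count should match the Rows written metric
    -- reported by the Copy activity (14,946 in this run).
    SELECT COUNT(*) AS ImportedRows
    FROM [dbo].[DemoImport];

    -- Inspect a few of the imported rows
    SELECT TOP (10) *
    FROM [dbo].[DemoImport];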

Conclusion

This article explored the Copy Data tool of Azure Data Factory for importing data into Azure SQL Database from a Web source. ADF provides a quick solution for data import and export with various data sources and customizable pipelines. It eliminates the requirement of downloading files to a local directory before importing them.
