
Link Google BigQuery resources into an Azure SQL resource

November 20, 2020 by Frank Solomon

In the previous article, Build a Google BigQuery Resource, we saw how to build a BigQuery resource, and then build a Google OAuth refresh token. In Part Two here, we’ll build an Azure SQL data resource, and then build, configure, and test the Azure Data Factory that will link the cloud resources we built.

Set up an Azure SQL resource

In the SQL Shack article Create an Azure SQL Database with built-in sample data, one of my fellow authors showed how to set up an Azure SQL database. Build a new Azure SQL database named BigQueryFrontEnd, and in that database, build a table called BQ_SP_Result_Set with this script:
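The script itself appears in the screenshot that follows. As a rough sketch, assuming columns that mirror the BigQuery zbp11totals table we’ll copy from (the actual column names and types may differ; match them to your source table):

```sql
-- Illustrative sketch only: the actual script appears in the screenshot
-- that follows. Column names and types here are assumptions.
CREATE TABLE dbo.BQ_SP_Result_Set (
    zip  NVARCHAR(10)  NULL,  -- zip code
    name NVARCHAR(100) NULL,  -- area name
    emp  INT           NULL,  -- employee count
    ap   INT           NULL   -- annual payroll
);
```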

The screenshot shows the script in SQL Server Management Studio:

The BQ_SP_Result_Set table creation script in SQL Server Management Studio.

Build an Azure Data Factory

First, we’ll set up an Azure account. Then, we’ll create an Azure Data Factory named BQDemoDataFactory as shown in this section. This Data Factory will connect the Azure SQL and BigQuery resources. To do this, at portal.azure.com/#home, click Data factories to open the Data factories blade. At the Basics tab, we’ll pick the subscription and resource group values, and set the Instance details, as shown in this screenshot:

Create an Azure Data Factory.

Configure Git later, as shown in this screenshot:

Configure Git at a later time.

Disable Managed Virtual Network, and select Public endpoint as the Connectivity method, as shown in this screenshot:

Configure networking for this data factory.

This choice on the Networking tab might require a SQL Server firewall configuration that we’ll handle later. Also, a private endpoint could offer better security for a production data factory, but for this demo, a public endpoint is easier. Click through the Tags tab, as shown in this screenshot:

Click through the Tags tab.

Click “Create” at the last tab, as shown in this screenshot:

Click Create to create the data factory.

This screen shows that the data factory deployment completed successfully:

The data factory creation completed successfully.

At All Resources, click BQDemoDataFactory as shown in this screenshot:

Click the new data factory at the Azure All Resources page.

Configure the Azure Data Factory

Now, click “Author & Monitor” as shown in this screenshot:

Click Author & Monitor to start building the data factory.

On the next screen, click the pencil icon as shown in this screenshot:

Start building the Data Factory that connects the resources we'll use.

The Data Factory definition page will open. As a basic definition, an Azure data factory connects two or more resources that must exchange or transform data between them. Here, we want to connect the BigQuery and Azure SQL resources we built, so we’ll use an Azure data factory. Microsoft defines a dataset as “a named view of data that simply points or references the data you want to use in your activities as inputs and outputs.” Since a data factory builds on dataset components, we’ll first build one dataset for the BigQuery resource, and one for the Azure SQL resource. Drill down to New dataset, as shown in this screenshot:

Build a new dataset.

On the next page, search for bigquery, click the Google BigQuery data store, and then click Continue, as shown in this screenshot:

A BigQuery data store will become the data source.

We’ll first build a linked service for the dataset. A linked service works like a classic connection string: it tells the dataset how to connect to a specific data resource. On the next page, click New, as shown in this screenshot:

Build a linked service for the dataset.

The New linked service page will open. On this page, use the Project name value defined in the earlier Set Up A BigQuery Resource section of Part One for the Project ID value. Then, use these values, defined in the earlier Build a Google OAuth Refresh Token section of Part One, for the similarly-named fields:

  • Client ID
  • Client Secret
  • Refresh Token

These values will build the linked service, as shown in this screenshot:

Set up the linked service values.

Scrolling down, add the Refresh Token value created in Part One in the lowest text box. Click Test connection. If the test returns Connection successful, click Create, as shown in this screenshot:

Test the BigQuery resource linked service.
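Behind the portal UI, Azure Data Factory stores a linked service as a JSON definition. A sketch of roughly what this one looks like, with an illustrative name and placeholder values (the data factory’s own JSON view is the authoritative version):

```json
{
    "name": "BQDemoBigQueryLinkedService",
    "properties": {
        "type": "GoogleBigQuery",
        "typeProperties": {
            "project": "<Project ID from Part One>",
            "authenticationType": "UserAuthentication",
            "clientId": "<Client ID>",
            "clientSecret": { "type": "SecureString", "value": "<Client Secret>" },
            "refreshToken": { "type": "SecureString", "value": "<Refresh Token>" }
        }
    }
}
```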

Back at the dataset page, test the connection, refresh the table dropdown, and pick the BigQueryDemoAppDS.zbp11totals table, as shown in this screenshot:

Test the BigQuery dataset connection.

Click Preview data to see a sample of the BigQuery resource data, as seen in this screenshot:

Preview the BigQuery dataset data.

In the Properties pane, name the dataset BQDemoBigQueryDataset, and add a description as shown in this screenshot:

Add the dataset name and description.

We also need a dataset for the target Azure SQL resource. Build a new Azure SQL dataset, as shown in this screenshot:

Pick the data store for the dataset.

On the next page, name the dataset BQDemoAzureSQLDataset, and click New to start building its linked service, as shown in this screenshot:

Build a linked service for the dataset.

The New linked service page will open. Use the Azure subscription value defined in the earlier Build an Azure Data Factory section of this article, along with these values, defined in the earlier Set up an Azure SQL resource section:

  • Server name
  • Database name
  • User name

Build the linked service with the values shown in this screenshot:

Set up the linked service values.

Scroll down, and add the Azure SQL Server password. Test the connection, and if it succeeds, click Create, as seen in this screenshot:

Test the linked service.

Back at the Set properties page, pick the dbo.BQ_SP_Result_Set table, and click OK, as shown in this screenshot:

Pick the target table.

These settings will point the dataset to the dbo.BQ_SP_Result_Set table in the Azure SQL resource. In the Properties pane, set the Description as shown in this screenshot:

Add the dataset description.

On the Schema tab, click Import schema as shown in this screenshot:

Import the table schema.

Now we can build the Azure pipeline that will move the data. Back at the Factory Resources page, drill down to New pipeline as shown in this screenshot:

Build a new data factory pipeline to move the data.

In the Activities pane, click and drag Copy data to the design pane, as shown in this screenshot:

Expand the pipeline creation space.

To expand the space we’ll need, pull up the General tab separator at the bottom. On the General tab, set the Name and Description boxes, and keep the default values in the other boxes, as shown in this screenshot:

Add name and description values for the pipeline.

Click the Source tab. At the Source dataset dropdown, pick and preview the BQDemoBigQueryDataset defined earlier; for Use query, click Table as shown in this screenshot:

Add the BigQuery dataset as the pipeline data source.

For the Sink tab, pick the BQDemoAzureSQLDataset in the Sink dataset dropdown, and pick None in the Stored procedure name dropdown, as shown in this screenshot:

Add the pipeline data sink, or "target."

At the Mapping tab, click Import schemas, and then Preview source as shown in this screenshot:

Import the data table schema at the Mapping tab.
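Behind the scenes, the Copy data activity we just configured becomes pipeline JSON along these lines (the pipeline and activity names here are illustrative; check the pipeline’s JSON view for the actual definition):

```json
{
    "name": "BQDemoPipeline",
    "properties": {
        "activities": [
            {
                "name": "CopyBigQueryToAzureSQL",
                "type": "Copy",
                "inputs": [
                    { "referenceName": "BQDemoBigQueryDataset", "type": "DatasetReference" }
                ],
                "outputs": [
                    { "referenceName": "BQDemoAzureSQLDataset", "type": "DatasetReference" }
                ],
                "typeProperties": {
                    "source": { "type": "GoogleBigQuerySource" },
                    "sink": { "type": "AzureSqlSink" }
                }
            }
        ]
    }
}
```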

To validate the pipeline, click Validate as shown in this screenshot:

Validate the pipeline.

To save the data factory resources that we build, we need to publish them. Click the upper-right control to close the validation pane, and then click Publish all to save the pipeline, as shown in this screenshot:

Click "Publish all" to save the resources we built.

At this point, we’ve set up the Azure data factory. Click Debug to launch it, as shown in this screenshot:

Click Debug to test the pipeline.

When we run the pipeline, we might see a failed status, as shown in this screenshot:

If the data factory pipeline fails, we need to figure out why.

This can happen if we clicked “Public endpoint” when we created the data factory earlier. Click the circled icon shown above to open the error message box, as shown in this screenshot:

A potential pipeline failure message.

To solve the problem, run stored procedure sp_set_firewall_rule on the master database in the Azure SQL Server resource. Use IP address ‘xxx.xxx.xxx.xxx’, as seen in the message (masked here), to build this SQL Server command:
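The command follows the sp_set_firewall_rule signature; the rule name below is illustrative, and the address stays masked as it appears in the message:

```sql
-- Run on the master database of the Azure SQL Server resource.
-- Replace the masked address with the IP shown in the error message.
EXECUTE sp_set_firewall_rule
    @name = N'AllowDataFactoryIP',
    @start_ip_address = 'xxx.xxx.xxx.xxx',
    @end_ip_address = 'xxx.xxx.xxx.xxx';
```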

In the Azure resource, run the command on the master database, as shown in this screenshot:

Configure the firewall rule to solve the pipeline failure.

Finally, if we click Debug again, we should see that the pipeline worked:

The Azure Data Factory pipeline worked.

We can test the data in the target database with these queries:
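The exact queries appear in the screenshot that follows; a minimal pair along the same lines, assuming the BQ_SP_Result_Set target table built earlier:

```sql
-- Count the rows copied into the target table
SELECT COUNT(*) AS CopiedRows FROM dbo.BQ_SP_Result_Set;

-- Sample the copied rows
SELECT TOP (10) * FROM dbo.BQ_SP_Result_Set;
```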

This screenshot shows the results of those queries:

The data copy worked.

Conclusion

Organizations often host different sets of data across different data products. To extract useful insights across those datasets, developers must somehow link the host products together. This article series showed how to link a Google BigQuery resource with an Azure SQL resource. Unfortunately, while BigQuery offers stored procedures, we can’t yet call a BigQuery stored procedure from an Azure SQL resource. If we could, we’d save significant time and money, because a BigQuery stored procedure call would return a filtered result set instead of the entire table, as shown here. However, Microsoft and Google improve their products all the time, and we can expect that product enhancements will eventually make these calls possible. Still, this article showed how to set up BigQuery and Azure SQL resources, and how to connect them. These techniques solve real problems today, and prepare us for the problems that future product enhancements will address.

Frank Solomon
Azure, Azure Data Studio

About Frank Solomon

Frank Solomon started out building Microsoft stack products, and he gradually focused on SQL Server. Some years ago, he began a parallel shift to writing and technical writing. He has written published articles, he blogs at Bit Vectors, and he co-wrote The SQL Workshop for Packt Publishing with SQL Shack writer Prashanth Jayaram. Frank is looking for his next writing / technical writing role. He leverages sharp software development skills and sharp writing skills to add value and solve big problems. He has plenty of remote work experience, and he relies on the active voice to build high-quality writing products. See more about Frank at LinkedIn, and reach him at fbs.author@gmail.com
