Frank Solomon
Start the Google Cloud resource creation process.

Build a Google BigQuery Resource

November 12, 2020 by

Introduction

We can rely on Azure SQL to build reliable, high-quality relational database solutions. In the cloud, Google offers BigQuery as a big data product that has large data capacities, and a standard SQL syntax. Although it can handle data manipulation, it works better as a data warehouse product because of certain product limitations. If we can integrate these products, they become even more useful. This two-part article will show how to link these products together. Part one here will show how to build a BigQuery resource, and then build a Google OAuth refresh token, or security key. Part two will show how to set up an Azure SQL data resource, and build, configure, and test the Azure Data Factory that will link those resources.

Set up the resource

Awhile back, I wrote articles showing how set up BigQuery, but because so much has changed about the product, we’ll examine the process here. Although we could use the sample datasets that Google provides, we’ll see how to upload our own dataset from scratch. The Google Cloud Free Tier provides $300 of Google Cloud resources for ninety days, which will cover the resources we’ll build. Our later steps will transfer this data to the Azure SQL resource. We’ll place the data from the zbp11totals.zip file, available from the U.S. Census Bureau here, into the resource we’ll build. This file has basic payroll information for American zip codes. After the download, extract CSV-format file zbp11totals.txt from it.

First, log into a Google account, and register for the free Google tier here. Even though it’s free, you’ll need a credit card for this step. At the resource page, click “Go to console” as shown in this screenshot:

Start the Google Cloud resource creation process.

When the console opens, click “My First Project” as shown in this screenshot:

Create a new project - the first step.

“Select a project” will open; click NEW PROJECT to open the project creation page, as shown in this screenshot:

Create a new project.

At the New Project page, name the project BigQueryDemoApp, ignore the Location box, and click CREATE, as shown in this screenshot:

Name the new BigQuery project.

Back at the console, click the project dropdown as seen in this screenshot:

The project dropdown.

Highlight BigQueryDemoApp and click OPEN, as shown here:

Select a specific project.

This opens the console once again.

Now, we need to create a dataset in BigQueryDemoApp. The dataset will hold the uploaded data as a SQL-like table. Highlight bigquerydemoapp, and click CREATE DATASET, as shown in this screenshot:

Create a project dataset.

The Create dataset window will open. Name this dataset BigQueryDemoApp, keep the defaults, and click Create dataset, as shown in this screenshot:

Set up the name and details for the BigQuery project dataset.

Back at the console, highlight the BigQueryDemoApp dataset, and click CREATE TABLE to create the new table, as shown here:

Create a a table for the project dataset.

When the Create Table page opens, name the table zbp11totals. a BigQuery file upload can auto-detect the column data formats, but here, this would build a ZIP column, for example, with an integer data type. As a result, it would truncate the first two characters of ‘00501’ as a ZIP column value. To solve the problem, place the text below in the Edit as text textbox of the Schema section.

ZIP:string, NAME:string, EMPFLAG:string, EMP_NF:string, EMP:integer, QP1_NF:string, QP1:integer, AP_NF:string, AP:integer, EST:integer, CITY:string, STABBR:string, CTY_NAME:string

This will specify the exact data types of the table columns that BigQuery will build, in the table it creates. Set the rest of the metadata choices as shown in this screenshot:

Set up details for the dataset table.

Next, scroll down, and expand Advanced options at the lower left. Uncheck Auto detect at the left- center, and click Advanced options at the lower left. The zbp11totals.txt source file has a header row, so set the Header rows to skip textbox to 1. Keep the other defaults, and click Create table as shown in this screenshot:

For the table data import, skip the source table header row.

Back at the console, we can see that we successfully created the table, as seen in this screenshot:

We successfully created the dataset table.

To test everything, type the below query in the Query editor textbox, and click Run.

This query uses a conventional, case-sensitive SQL Server syntax. In a BigQuery query, include the dataset name ahead of the table name. The result set will appear in the Query results section, as seen in this screenshot:

Test the table data with a simple query.

Build a Google OAuth refresh token

A later Azure configuration step will need the specific OAuth refresh token security value that we’ll generate now in the Google cloud resources. Start at this URL:

Drill down to APIs & Services -> Credentials as shown in this screenshot:

Start the process to generate Google Cloud security credentials.

Drill down to CREATE CREDENTIALS -> OAuth client ID as shown in this screenshot:

Create an OAuth client ID value.

The Create OAuth client ID page will open, set the below:

  • Application type = Web application
  • Name = BigQueryDemoApp
  • Authorized Redirect URIs -> URIs = http://127.0.0.1/

Scroll down to click CREATE, as shown in this screenshot:

Set up details for the OAuth client ID.

Scroll down, and click Create. Save the generated Client ID and Client Secret values, as seen in this screenshot:

Save the generated Client ID and Client Secret values for later.

Now, we need to build values for security credentials. Starting with this URL “template”

https://accounts.google.com/o/oauth2/v2/auth?
access_type=offline&
client_id={Client ID}&
prompt=consent&
redirect_uri={URL Encoded Redirect URI}&
response_type=code&
scope={URL Encoded Scope}

Build this list of values:

  • {Client ID} = the new Client ID value built at the Create OAuth client ID page above
  • {URL Encoded Redirect URI} = http%3A%2F%2F127.0.0.1%2F

This value started with the Authorized Redirect URI value http://127.0.0.1/ used in the Create OAuth client ID page above, and replaced the original embedded “:” with “%3A” and “/” with “%2F” as URL-encoded values:

{URL Encoded Scope} = https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery

Starting with the scope value “https://www.googleapis.com/auth/bigquery”, we made character replacements identical to those just above.

With these values, including a custom fake Client ID value for this example, the template will look like this:

https://accounts.google.com/o/oauth2/v2/auth?
access_type=offline&
client_id= dghw3456Wg3q4asdfhFZdnhtuio68shflsdn3vbhw3hs6.apps.googleusercontent.com&
prompt=consent&
redirect_uri=http%3A%2F%2F127.0.0.1%2F&
response_type=code&
scope=https%3A%2F%2Fwww.googleapis.com%2Fauth%2Fbigquery

Copy this URL, place it in a browser URL bar, and hit enter. It will first return an authorization page, as shown in this screenshot:

Choose an account for the OAuth authorization process that will cover the BigQuery resource.

Log in with the Google account used for this BigQuery development work. On the next page, click Advanced as shown in this screenshot:

OAuth authorization warning message.

Next, click Go to BigQueryDemoApp (unsafe) as shown in this screenshot:

Accept the warning message to continue.

At the confirmation page, click Allow as shown in this screenshot:

Confirm the authorization.

This will return an error page, as shown in this screenshot:

Ignore the page error message. Focus on the URL address bar.

Ignore that error – don’t worry about it. Instead, focus on the URL address bar value. It will look like this:

https://127.0.0.1/?
code=4/4ertyRerheyi56mns5ztuiodfghcvSRTsdfereuaffQqwec5h77zgsnM7thsdERQ45y354hebwEBs7AIDFHdfgh&
scope=https://www.googleapis.com/auth/bigquery

Pull out the “code” parameter from the URL string, and substitute “%2F” for the “/” value:

code=4%2F4ertyRerheyi56mns5ztuiodfghcvSRTsdfereuaffQqwec5h77zgsnM7thsdERQ45y354hebwEBs7AIDFHdfgh

This will become the authorization code value that we’ll need shortly. Now, we’ll move to a second URL. For this second URL “template”, build the below list of values:

https://www.googleapis.com/oauth2/v4/token?&
client_id={Client ID}&

client_secret={Client Secret}&
code={Authorization Code Value}&
grant_type=authorization_code&
redirect_uri={URL Encoded Redirect URI}

Where:

  • {Client ID} = the Client ID value built at the Create OAuth client ID page above
  • {Client Secret} = the Client Secret value built at the Create OAuth client ID page above
  • {Authorization Code Value} = the authorization code value we built in the previous step just above
  • {URL Encoded Redirect URI} = http%3A%2F%2F127.0.0.1%2F

With these values, including fake Client ID, Client Secret, and Authorization Code Values for this example, the template will look like this:

https://www.googleapis.com/oauth2/v4/token?&
client_id=dghw3456Wg3q4asdfhFZdnhtuio68shflsdn3vbhw3hs6.apps.googleusercontent.com&
client_secret=45wgfeWQRTqwe341gZQ2Er34&
code=4%2F4ertyRerheyi56mns5ztuiodfghcvSRTsdfereuaffQqwec5h77zgsnM7thsdERQ45y354hebwEBs7AIDFHdfgh&
grant_type=authorization_code&
redirect_uri=http%3A%2F%2F127.0.0.1%2F

To build the BigQuery security values we want, we need the POST value that this URL will return. We’ll use Postman for this. First, build a Postman account, open Postman, and place the templated URL in the Untitled Request bar as seen in this screenshot:

Use Postman to help build the security credentials.

Click SEND to generate the refresh token, as seen in this screenshot:

Click Send to generate the refresh token.

Save that generated refresh_token value. We’ll need it in Part Two. Note carefully that to build a new refresh token in this process, we can’t re-use any of the values previously used in this process, except for the application name. Here, we used “BigQueryDemoApp” as the application name. To build a new refresh token, we would need to start at the first step of this process.

Conclusion

In this article, we saw how to set up a BigQuery resource, and build a Google OAuth refresh token, or security key. Part Two will first show how to set up an Azure SQL data resource. Then, it will show how to build, configure, and test an Azure Data Factory that connects the Azure SQL and BigQuery resources.

Frank Solomon
Azure, Azure Data Studio

About Frank Solomon

Frank Solomon started out building Microsoft stack products, and he gradually focused on SQL Server. Some years ago, he began a parallel shift to writing and technical writing. He wrote published articles, he blogs at Bit Vectors, and he co-wrote The SQL Workshop for Packt Publishing, with SQL Shack writer Prashanth Jayaram. Frank is looking for his next writing / technical writing role. He levers his sharp software development skills, and sharp writing skills, to add great value and solve big problems. He has plenty of remoting experience, and he uniquely relies on the active voice to build high-quality writing products. See more about Frank at LinkedIn, and reach him at fbs.author@gmail.com

128 Views