This article will walk you through creating a new SQL pool within an existing Azure SQL Server as well as catalog the same using the Azure Purview service.
Data is generated by transactional systems and typically stored in relational data repositories. This data is generally used by live applications and for operational reporting. As this data volume grows, this data is often required by other analytical repositories and data warehouses where it can be used for referential purposes and adding more context to other data from across the organization. Transactional systems (also known as Online Transaction Processing (OLTP) systems) usually need a relational database engine, while analytical systems (also known as Online Analytical Processing (OLAP) systems) usually need analytical data processing engines. On Azure cloud, it is usually known that for OLTP requirements, SQL Server or Azure SQL Database can be employed, and for analytical data processing needs, Azure Synapse and other similar services can be employed. SQL Pools in Azure Synapse host the data on an SQL Server environment that can process the data in a massively parallel processing model, and the address of this environment is generally the name of the Azure Synapse workspace environment. At times, when one has already an Azure SQL Server in production or in use, the need is to have these SQL Pools on an existing Azure SQL Server instance, so data in these SQL pools can be processed per the requirements on an OLAP system as well as the data can be co-located with data generated by OLTP systems. This can be done by creating SQL Pools within the Azure SQL Server instance itself. In this article, we will learn to create a new SQL Pool within an existing Azure SQL Server followed by cataloging the same using the Azure Purview service.
As we intend to create a new SQL Pool in an existing Azure SQL Server instance, we need to have an instance of Azure SQL in place. Navigate to Azure Portal, search for Azure SQL and create a new instance of it. We can create an instance with the most basic configuration for demonstration purposes. Once the instance is created, we can navigate to the dashboard page of the instance and it would look as shown below.
As we are going to catalog the data in the dedicated SQL Pool hosted on Azure SQL instance, we also need to create an instance of Azure Purview. We would be using the Azure Purview studio from the dashboard of this instance, tonregister this SQL Pool as the source and catalog the instance.
Creating SQL pool in Azure SQL
Now that we have the Azure SQL Server instance, we can start creating the new SQL Pool. Click on the New dedicated SQL pool to initiate this process. This feature was earlier known as SQL Data warehouse which is not rebranded as SQL Pool. This feature is identical to SQL Pools in Azure Synapse. So, if you are already familiar with the SQL Pools in Azure Synapse, you do not need to learn anything new to operate these SQL Pools. The only difference here would the hosting server address that is holding these SQL Pools. Once you click the button to create the new pool, it would invoke a wizard as shown below.
In this step, we need to provide the basic details like the name of the new SQL Pool being created. We cannot change the Azure subscription, resource group and server name as it’s already selected based on the Azure server instance under which we intend to create the new SQL Pool.
Next, we need to configure the capacity of the SQL Pool. We need to be mindful of the capacity that we select for the pool, as it can be quite expensive for some, depending on the selected capacity and the duration for which the pool is kept live. By default, the pool capacity is 1000 DW units, which may be ideal for production environments with reasonable data loads. For a demo environment when we are creating a brand-new pool, we can use the smallest available capacity unit to save on costs.
Once you select the 100 capacity units, you would be able to see the cost would come down to $1.51 per hour. For this exercise, we would be terminating the instance in less than an hour, so the max we may end up paying would be $1.15 for this pool.
After the capacity is selected, the next step is to configure networking. This is though an optional step and we can skip it or continue with the default options as well. Keep in view that by default, Azure services are not allowed to access this server. So, we may need to provide explicit permission to allow other Azure Services like Azure Purview to access this pool.
In the next step, in the additional settings section, we can select to restore an existing backup or even create some Sample data that typically comes out of the box when creating database instances in the Azure SQL Server instance. As we may want to catalog some data from this pool with Azure Purview, we can use this sample data or optionally one can create some sample data manually as well.
Once these settings are configured, complete the next steps, review the configuration, and create the instance of this SQL Pool within Azure SQL. Once this pool is created, log on to this SQL Server instance using SQL Server Management Studio (SSMS), and you would be able to see the pool as shown below. For comparison purposes, the below screen shows how the SQL Pool would be shown in an Azure Synapse workspace instance as well as an Azure SQL Server instance. If you expand these pools in each instance and compare the object hierarchy in these pools, you will find that it is identical.
Now that the SQL Pool is created and we can access it as well, it’s now time to catalog the instance. Navigate to the Azure Purview instance on the Azure portal, open Purview Studio, and click on the Register button under the Data section to register a new data source. When we click on the button, it will open a new wizard as shown below. In the list of supported data sources, you can find a data source named Azure Dedicated SQL Pool. This source should not be confused with the SQL Pool in Azure Synapse as this pool is expected to be on Azure SQL Server. Azure Synapse has native integration with Azure Purview.
Once the source is selected, provide a relevant name for this data source, and register it. Once the source is registered, we need to schedule a scan of this data source in Azure Purview, so that it can catalog data assets within this data source. Click on the button on the registered data source that says Schedule Scan, and it would open a new dialog as shown below. By default, you won’t find any database i.e. the SQL Pool listed, as we have not provided explicit permissions on the SQL Pool to let Azure Purview access it. Expand the see more link, use the managed identity name, and add it as a user in the dedicated SQL pool using SSMS. Also, provide permissions to this user to access the data objects in this pool. Once done, you would be able to see the name of the pool listed in the database name as shown below.
Once the scan has been completed, the data objects in the SQL pool will get cataloged and available for exploration and use from Azure Purview as shown below.
In this way, we can create a dedicated SQL Pool in the Azure SQL Server instance and catalog it with Azure Purview.
In this article, we learned how to create a dedicated SQL Pool in the Azure SQL Server instance, right from the dashboard of Azure SQL. We also learned the similarities and differences between SQL Pools hosted on Azure SQL and Azure Synapse, and finally, we learned how to catalog the same using Azure Purview.
- Getting started with Azure Database for PostgreSQL – Citus server - July 30, 2021
- Getting started with Azure Database for PostgreSQL – Flexible server - July 14, 2021
- Data Wrangling with Power Query in Azure Data Factory - July 6, 2021