Gauri Mahajan
Purview Integration

Integrating Azure Purview with Azure Synapse Analytics

June 14, 2021 by

In this article, we will learn how to integrate Azure Purview and Azure Synapse Analytics capabilities to access data catalog assets hosted in Purview from Azure Synapse.

Introduction

Data exists in various formats on various types of repositories on different clouds as well as on-premises. With the growing data landscape, two of the most common capabilities required to manage as well as extract value out of data are data cataloging and data warehousing. Data cataloging or metadata cataloging enables to keep track of the metadata evolution as well acts as a guiding beacon for all data pipelines that move data from source to destination. Data warehousing provides an approach and capabilities to process large volumes of data efficiently when data across the enterprise is collated for deriving insights. The gap between these two capabilities is that if these two capabilities are not integrated, the teams managing these two capabilities would not have any view of each other’s landscape. Typically, the data warehousing capability acts as one of the biggest consumers of data catalogs like many other data capabilities. Azure provides Purview for data cataloging and governance and Azure Synapse Analytics for data warehousing. In this article, we will see how to integrate these two capabilities to access data catalog assets hosted in Azure Purview from Azure Synapse.

Pre-requisite

As we are going to work with Azure Purview as well as Azure Synapse, we need a few things in place before we can start configuring these tools to integrate with each other. It is assumed that one has the required privileges to administer and operate Purview and Azure Synapse services on their Azure account.

First, we need an instance of Purview, which would provide access to the Purview Studio tool. Using this tool, some data repositories should be cataloged so that when we search for data assets cataloged in this tool, we would find some results. A good example would be creating an Azure SQL Database with the sample data that comes built-in and catalog it with Purview. It is assumed that this Azure Purview setup is already in place and data assets are already cataloged.

Next, we need an instance of Azure Synapse Workspace created, which would provide access to the Synapse Studio tool. This is the primary administrative console that facilitates operating the Synapse pool. Once this setup is in place, it would look as shown below and with this, we are ready to start our exercise of integrating Azure Synapse with Azure Purview.

Synapse Studio

Configuring Azure Purview for integrating with Azure Synapse Analytics

Open the Azure Synapse Studio by clicking on the Open Synapse Studio link from the dashboard page of Azure Synapse Workspace. Click on the Manage blade and you will see Azure Purview (Preview) under the External connections section as shown below. This feature is still in Preview as of the draft of this article. This feature allows us to integrate Synapse with Purview.

Purview Integration

As shown above, we need to start by connecting our Azure Purview account here. Click on the button named Connect to a Purview account. It would pop-up a screen as shown below. If you have the Azure Purview account under the same Azure subscription in which the Azure Synapse Analytics account is created, when you select the “From Azure Subscription”, you will find the Purview account name as shown below.

Purview account from Azure Subscription

Select the purview account and click on the Apply button. This will register the account with Azure Synapse as well as integrate it with Purview. Once done, you will receive a successful registration confirmation as shown below.

Successful Registration

The benefit of connecting Azure Synapse with Azure Purview is that we can access the data assets from the catalog right in Azure Synapse Studio, and also use this information to initiate different actions supported by Synapse. To start accessing the Purview catalog from Synapse Studio, navigate to the Data tab and click on the search bar at the top of the screen as shown below. There would be a drop-down in the search bar which would have two options – Workspace and Purview. Ensure to select Purview as shown below. Now we are ready to start searching the catalog for data assets.

Search data from Purview

Type a full or partial name of the database object that we intend to search as shown below, and it would show a list of database objects that match the search criteria. These search results should not be confused with the database objects hosted in the Synapse pools which are part of the Synapse Workspace. As we are searching in the Purview catalog, the result would consist of data assets held in the specific purview account instance only. If we want to search for items within the workspace, we need to select the Workspace option in the drop-down which would list search results of objects in Azure Synapse.

The results are divided into two panes – the filters pane and the results pane. The filters pane shows the data asset type, classification and other such filters related to cataloged data assets. The results that meet the filter criteria as shown on the right pane. The results show the name of the data objects as well as the type of repository that holds the data object and address of the same.

Search Results

Let’s say that we intend to explore the details of a particular data asset to understand whether it is suitable to be used as a source of data for data warehousing. We can click on the item in the results pane and it would show the results as shown below. In this case, it’s an Azure SQL Database table, so the details like Schema, Lineage, Data Classification, Related database objects, etc. are shown. On the right side of this screen, we can find the hierarchy under which this database object belongs.

Data object details from catalog

Another interesting and useful feature of these results can be found in the related tab. At times, we may be searching for a database object but that may not be the exact match. Finding objects that are similar or related to the object being search can elevate the possibilities of finding the database object of interest. The related tab shows database objects like database, schemas, tables, or view depending on the hierarchy selected as shown below.

Related data objects

Once the data object of interest has been discovered, the next step is to take corresponding actions like creating a linked service, integration dataset, or a new data flow to source the data from the corresponding data repository. The Connect and Develop menu item provide links to initiate such actions as shown below. Clicking on these links would open a new pop-up window or wizard which would have the details of the data source and the data object already pre-populated. We can provide the credentials, build the corresponding artifact in Azure Synapse, and start sourcing the data from the targeted object.

Initiate actions

The benefit of this integration is that we do not need to switch between two sets of services, gain access to the catalog which may be maintained by a data steward or data quality team, and port details back and forth from Azure Purview to Azure Synapse. The built-in integration eliminates all this overhead and provides the convenience of a catalog right within the operational console of a data warehousing environment.

Conclusion

In this article, we created an instance of Azure Synapse and Azure Purview. We cataloged data in Purview, integrated it with the instance of Synapse, searched for data sets from the Purview catalog using Synapse Studio and learned how to initiate actions in Synapse Studio based on the data asset of choice.

Gauri Mahajan
Azure, SQL Azure

About Gauri Mahajan

Gauri is a SQL Server Professional and has 6+ years experience of working with global multinational consulting and technology organizations. She is very passionate about working on SQL Server topics like Azure SQL Database, SQL Server Reporting Services, R, Python, Power BI, Database engine, etc. She has years of experience in technical documentation and is fond of technology authoring. She has a deep experience in designing data and analytics solutions and ensuring its stability, reliability, and performance. She is also certified in SQL Server and have passed certifications like 70-463: Implementing Data Warehouses with Microsoft SQL Server. View all posts by Gauri Mahajan

196 Views