This article will get you started with Azure Purview Studio and the different facets associated with this tool.
Azure Purview is a new service offering, currently in preview, that addresses the needs of data governance as well as data management. In my previous article, we explored the basics of Azure Purview: its features and functionality, how to create an account, and how to query the resources in the account using Azure Resource Graph Explorer. Once an account or instance is created, Purview, like Azure Synapse, provides a web-based environment to operate the service, known as Azure Purview Studio. This tool is the central console through which developers, administrators, and end-users work with Purview. In this article, we will get acquainted with this tool and understand the different aspects and functionalities it offers.
Before we start the step-by-step exploration of Purview Studio, we need an account in place, along with the privileges required to administer the account and the credentials and identities that may be used with it. Once the account is created, you should see a screen as shown below, which is the Purview account dashboard. In this interface, you will find a link titled “Open Purview Studio”. This is the gateway to the tool. If you cannot view or access this link, the likely cause is insufficient privileges. Once you have access, clicking the link opens the Purview Studio interface in a new tab or window.
Once Purview Studio opens, it looks as shown below. Keep in mind that this service is still in preview at the time of writing, so Azure may change the layout of the dashboard in the future. Several observations can be made from the dashboard view.
- The name “mypurviewaccount001” at the top-center of the page shows the account that is being operated through this tool
- Below it, we can see that no data sources are registered. One can register different data sources for cataloging and governance
- Database objects hosted in these data sources are known as assets. As we have not registered any data sources, we do not have any assets yet
- We can also define a glossary of business or domain-related terms. As this is a brand-new account, there are no glossary terms either
- Below the counts of data sources, assets, and terms, there is a search bar for searching the metadata once it has been extracted from data repositories and registered in Azure Purview
- The four major operations that different user personas can perform from Azure Purview Studio are exploring the samples and tutorials through the links in the Knowledge Center, registering data sources to extract metadata, browsing assets to explore and search the metadata, and managing the glossary of business terms and templates that can be used to tag or map the extracted metadata. These four operations can be started from the four tiles listed below the search bar
- On the left-hand side is the vertical menu bar, which opens the pages specific to these operations
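The relationship between the dashboard counters and the search bar can be sketched in a few lines of Python. This is a toy in-memory model, not the Purview API: the `Catalog` and `Asset` classes, the substring-based `search`, and the source name `sales-sql-db` are all assumptions made purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    """A database object (table, view, file, ...) discovered in a data source."""
    name: str
    source: str


@dataclass
class Catalog:
    """Hypothetical in-memory stand-in for the catalog of a Purview account."""
    account: str
    sources: list = field(default_factory=list)
    assets: list = field(default_factory=list)
    glossary_terms: list = field(default_factory=list)

    def summary(self) -> dict:
        # Mirrors the three counters shown at the top of the dashboard.
        return {
            "sources": len(self.sources),
            "assets": len(self.assets),
            "glossary_terms": len(self.glossary_terms),
        }

    def search(self, keyword: str) -> list:
        # Naive case-insensitive substring match on asset names.
        needle = keyword.lower()
        return [a for a in self.assets if needle in a.name.lower()]


catalog = Catalog(account="mypurviewaccount001")
print(catalog.summary())  # a brand-new account: every counter is zero

# Registering a source and scanning it is what produces assets to search.
catalog.sources.append("sales-sql-db")
catalog.assets.append(Asset(name="dbo.Customers", source="sales-sql-db"))
catalog.assets.append(Asset(name="dbo.Orders", source="sales-sql-db"))
print([a.name for a in catalog.search("customer")])
```

The point of the sketch is only the dependency order: search has nothing to return until at least one source has been registered and its metadata extracted.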
Click on the Sources icon to open the page related to the data sources in this Purview account. On this page, we can find a button to register different supported data sources. If we click on the Register button, a new pop-up appears listing all the supported data sources as shown below. All the Azure native data sources, such as Azure Blob Storage, Azure Cosmos DB (SQL API only), Azure Data Explorer, Azure Data Lake Storage Gen1, Azure Data Lake Storage Gen2, Azure SQL Database, Azure SQL Managed Instance, and Azure Synapse, are supported. In addition, Power BI and SQL Server (hosted on-premises or in a virtualized environment) are supported, provided connectivity to them is available. One can organize these data sources into different collections. A collection is a logical grouping of data sources, for example by geography, environment, or business group.
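As a rough illustration of collections as a logical grouping, the snippet below groups a few registrations by collection name. The source names and the collection names `Production` and `Development` are hypothetical, and the code does not call Purview; it only shows the shape of the grouping.

```python
from collections import defaultdict

# Hypothetical registrations: (data source name, source type, collection).
registrations = [
    ("sales-sql-db", "Azure SQL Database", "Production"),
    ("raw-datalake", "Azure Data Lake Storage Gen2", "Production"),
    ("sandbox-blob", "Azure Blob Storage", "Development"),
]


def group_by_collection(rows):
    """Return {collection name: [data source names]}, preserving input order."""
    grouped = defaultdict(list)
    for name, _source_type, collection in rows:
        grouped[collection].append(name)
    return dict(grouped)


by_collection = group_by_collection(registrations)
for collection, names in by_collection.items():
    print(f"{collection}: {', '.join(names)}")
```

The same grouping key could just as well be a geography or a business group, depending on how the organization wants to browse its sources.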
Next, click on the Glossary Terms icon to open the glossary page. From here, we can create a new term or a new term template by clicking on the New term button. By default, Purview provides a system term template with some basic fields already defined.
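The idea of a term template, a named set of fields that every term built from it is expected to fill in, can be sketched as follows. The class names and the field names (`name`, `definition`, `status`) are illustrative assumptions; check the system template in the Studio for the fields it actually defines.

```python
from dataclasses import dataclass, field


@dataclass
class TermTemplate:
    """Hypothetical template: a name plus the fields each term should carry."""
    name: str
    fields: tuple


@dataclass
class GlossaryTerm:
    """A glossary term created from a template, holding the field values."""
    template: TermTemplate
    values: dict = field(default_factory=dict)

    def missing_fields(self) -> list:
        # Template fields that this term has not filled in yet.
        return [f for f in self.template.fields if not self.values.get(f)]


# Illustrative field names only, not the actual system template.
basic_template = TermTemplate(name="Basic", fields=("name", "definition", "status"))

term = GlossaryTerm(
    template=basic_template,
    values={"name": "Customer", "definition": "A party that buys goods or services."},
)
print(term.missing_fields())
```

A custom template would simply carry a different tuple of fields, which is why templates are useful when different business domains need different attributes on their terms.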
The next section in the list is the Insights section. Click on the icon to open a new page as shown below. Once the data repositories are registered and their metadata is extracted, assets are created. Purview then analyzes these assets automatically to create several insights that can be explored from this section. From here, one can explore the assets, view the list of scans scheduled for crawling the registered data repositories, view the glossary of term templates, and explore the classifications of different metadata fields as well as the labels attached to them based on detected data sensitivity.
Finally, the last section in the list is the account information section. Click on the last icon in the list to open the page shown below. Here we can see the account that is being operated by this tool and set a friendly name for it.
- When data repositories are scanned and assets are created and managed, different metrics are generated for monitoring. These can be reached through the Metrics link in the General section
- From the External connections section, one can view and manage the connections to Azure Data Factory, from which data lineage information is extracted automatically
- As Azure Purview provides centralized metadata management and governance for the registered data repositories, access to these capabilities is managed from the “Security and Access” section at the bottom of the menu, as shown below
If we click on the classifications link under the metadata management section, we can see all the built-in system classification rules that come pre-configured in Azure Purview. These rules are used to classify metadata when it is extracted from database objects.
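To make the notion of a classification rule more concrete, here is a minimal sketch of pattern-based classification over sampled column values. It is not how Purview implements its system rules: the two regular expressions, the rule names, and the `min_match_ratio` threshold are assumptions chosen only to show the general technique.

```python
import re

# Hypothetical rules: classification name -> pattern a value must fully match.
RULES = {
    "Email Address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "IPv4 Address": re.compile(r"(?:\d{1,3}\.){3}\d{1,3}"),
}


def classify_column(sample_values, min_match_ratio=0.8):
    """Return the classifications whose pattern matches enough sampled values."""
    values = [v for v in sample_values if v]  # ignore empty values
    if not values:
        return []
    labels = []
    for label, pattern in RULES.items():
        matches = sum(1 for v in values if pattern.fullmatch(v))
        if matches / len(values) >= min_match_ratio:
            labels.append(label)
    return labels


print(classify_column(["ann@example.com", "bob@example.org", "carol@example.net"]))
print(classify_column(["10.0.0.1", "192.168.1.20"]))
print(classify_column(["hello", "world"]))
```

Requiring a minimum share of matching values, rather than a single match, is one simple way to keep a stray value from labelling an entire column.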
Metrics in Azure Purview are integrated with and reported through Azure Monitor. Click on the Metrics section to find a link to Azure Monitor, which opens in a new tab as shown below. From this interface, we can filter and split different metrics, plot them on different types of graphs, and build custom dashboards to visualize the metrics and evaluate performance. The available metrics depend on the features enabled in the Azure Purview account. Metrics may have different levels of granularity, which can lead to a significant amount of data. This data can be aggregated in the dashboards using the available aggregation functions.
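The effect of granularity and aggregation can be sketched without touching Azure Monitor at all. The snippet below buckets a handful of made-up data points into fixed time grains and reduces each bucket with a chosen function. The data, the `aggregate` helper, and the aggregation names are assumptions for illustration, not the Azure Monitor API.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data points: (minute of the hour, metric value).
points = [(0, 4.0), (1, 6.0), (2, 5.0), (5, 10.0), (6, 14.0), (11, 3.0)]

AGGREGATIONS = {"avg": mean, "sum": sum, "min": min, "max": max, "count": len}


def aggregate(data, grain_minutes, how):
    """Bucket points into fixed time grains and reduce each bucket."""
    buckets = defaultdict(list)
    for minute, value in data:
        bucket_start = (minute // grain_minutes) * grain_minutes
        buckets[bucket_start].append(value)
    reducer = AGGREGATIONS[how]
    return {start: reducer(values) for start, values in sorted(buckets.items())}


print(aggregate(points, grain_minutes=5, how="avg"))
print(aggregate(points, grain_minutes=5, how="max"))
```

A coarser grain produces fewer points to plot, at the cost of hiding short spikes, which is the trade-off to keep in mind when choosing the granularity of a chart.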
Now that we understand how to navigate the Azure Purview interface, the next logical step is to identify the data sources that need to be registered and the database objects that need to be cataloged in Purview.
In this article, we learned about the different features and functionality offered by Azure Purview Studio, such as data sources, assets, scans, classifications, and metrics. We explored different aspects of these features and learned how data cataloging and metadata management are organized and operated in Purview.