In this article, we will explore the business glossary capability of Azure Purview and understand it through a practical walkthrough.
In a typical data architecture diagram, the vertical areas range across data capture, data curation, and data consumption. These major areas house different components of the data practice, such as databases, data pipelines, data standardization, master data management, reporting and dashboarding, and data anonymization. One component that spans all of these areas is the data catalog, which sits at the center of any data ecosystem. A data catalog can be considered a central directory that holds a data dictionary, business terminology mapped to data definitions, and the metadata of data objects across the various repositories in the data ecosystem.
Detecting and cataloging metadata is the most common use case of a data catalog and is found in almost every mature data ecosystem. When this inventory of metadata definitions is enhanced with annotations at the attribute level, it takes the form of a centralized data dictionary. Both use cases are implementation-centric and directly benefit technical users. Business or functional users also make use of the data catalog by defining a glossary of business terms that are used organization-wide or that are industry standards. Data stewards may define and maintain these glossaries and attach the terms to data objects and attributes for classification purposes. This part of the data catalog is often underappreciated and underutilized. Maintaining a business glossary in a data catalog is crucial, especially in a multi-party business where analogous business terms are likely to coexist and cause confusion, which can lead to data discrepancies.
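To make the role of a glossary concrete, here is a minimal sketch, assuming a toy in-memory catalog rather than anything Azure Purview actually exposes. It attaches business labels to attributes (a tiny data dictionary) and resolves analogous labels to one canonical glossary term, so attributes from different repositories can be found under the same business concept. All names (`GLOSSARY`, `CATALOG`, `resolve_term`, `attributes_for_term`) and the sample data are hypothetical.

```python
from __future__ import annotations

# Hypothetical business glossary: canonical term -> analogous labels used across teams.
GLOSSARY = {
    "Client": {"Customer", "Account Holder"},
}

# Hypothetical attribute-level metadata (a tiny data dictionary):
# (repository, data object, attribute) -> business label the owning team uses.
CATALOG = {
    ("crm_db", "customers", "customer_id"): "Customer",
    ("billing_db", "invoices", "account_holder_id"): "Account Holder",
    ("sales_dw", "dim_client", "client_key"): "Client",
}


def resolve_term(label: str) -> str | None:
    """Map a team's label to its canonical glossary term, or None if it is unknown."""
    for term, analogous in GLOSSARY.items():
        if label == term or label in analogous:
            return term
    return None


def attributes_for_term(term: str) -> list[tuple[str, str, str]]:
    """List every catalogued attribute classified under the given canonical term."""
    return sorted(key for key, label in CATALOG.items() if resolve_term(label) == term)


for attribute in attributes_for_term("Client"):
    print(".".join(attribute))
```

Without the glossary, a search for "Client" would find only one of the three attributes; resolving labels to a canonical term is what removes that discrepancy.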
Azure Purview is the metadata catalog offering on the Azure cloud platform. It offers all the capabilities discussed above, including the ability to define a glossary of business terms.
To get started with the exercise that follows, we need an Azure cloud account with the privileges required to administer the Azure Purview service; it is assumed that such an account is already in place. Next, we need to create a new Azure Purview instance. Navigate to the Azure Purview service dashboard and click the Create button. This opens the account creation wizard, as shown below. Select the subscription and resource group in which the Purview account will be created, then provide an appropriate name for the account and the location in which it will be created. Once done, proceed to the next step.
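The values entered on this first page of the wizard together identify the new account. As a sketch, assuming placeholder values for the subscription, resource group, and account name, the snippet below models them with a hypothetical `PurviewAccountSettings` class and derives the Azure Resource Manager resource ID of the account, which follows the standard ARM pattern with the `Microsoft.Purview/accounts` resource type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PurviewAccountSettings:
    """Values entered on the first page of the creation wizard (hypothetical model)."""
    subscription_id: str
    resource_group: str
    account_name: str
    location: str

    def resource_id(self) -> str:
        # Standard Azure Resource Manager ID pattern for a Purview account.
        return (
            f"/subscriptions/{self.subscription_id}"
            f"/resourceGroups/{self.resource_group}"
            f"/providers/Microsoft.Purview/accounts/{self.account_name}"
        )


settings = PurviewAccountSettings(
    subscription_id="00000000-0000-0000-0000-000000000000",  # placeholder
    resource_group="rg-data-catalog",                       # hypothetical name
    account_name="contoso-purview",                         # hypothetical name
    location="eastus",
)
print(settings.resource_id())
```

The resource ID is handy later when granting permissions on the account or referencing it from scripts.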
In the networking section, we can continue with the default option as shown below.
In the next section, we need to configure the capacity units. For this exercise, the default minimal capacity is sufficient. Also note the C1 checkbox, which relates to the business glossary and lineage visualization; it is relevant here because we intend to use the business glossary feature. We also need to provide the name of a managed resource group, which will be used to create additional objects that are integrated with this Azure Purview account. Once this configuration is complete, click the Review + Create button, review the details, and click the Create button to create the new Azure Purview account.
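For readers who prefer to script account creation, the wizard choices roughly correspond to an ARM request body. The sketch below builds such a body as a plain dictionary; the field names `managedResourceGroupName` and `publicNetworkAccess` are assumed to follow the `Microsoft.Purview/accounts` schema and should be verified against the API version you target. Capacity settings are deliberately omitted because their shape depends on that API version, and treating the default networking option as public network access is also an assumption. The helper `build_account_body` is hypothetical.

```python
import json


def build_account_body(location: str, managed_resource_group: str,
                       public_network_access: str = "Enabled") -> dict:
    """Sketch of an ARM request body for a Purview account (field names assumed)."""
    if not managed_resource_group:
        raise ValueError("a managed resource group name is required")
    return {
        "location": location,
        "identity": {"type": "SystemAssigned"},
        "properties": {
            # Resource group where Purview creates its additional integrated objects.
            "managedResourceGroupName": managed_resource_group,
            # Assumed to mirror the default networking choice (public access).
            "publicNetworkAccess": public_network_access,
        },
    }


body = build_account_body("eastus", "managed-rg-contoso-purview")  # hypothetical names
print(json.dumps(body, indent=2))
```

The explicit check on the managed resource group name reflects the wizard, which also requires this value before the account can be created.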
Once the new account is created, open it to navigate to the Azure Purview dashboard. It contains a link named Open Purview Studio, which is the console from which we can use the different features of the Azure Purview data catalog. Click this link to open Purview Studio in a new tab, as shown below. The screen below lists a few data sources and data assets; for a brand-new account, these counts would be zero. In some cases, the Azure Purview account may already be in use and one may want to add a business glossary to the same account. In that case, the home page of Purview Studio would show statistics similar to those shown below.
Click the Glossary terms icon to open the home page for the term glossary, as shown below. By default, this page does not list any terms. Let’s say that we are a business or functional user tasked with creating a list or hierarchy of terms. We will walk through a sample set of terms to understand how different business terms can be created and linked with each other.
Click the New term button to open a new page, as shown below. First, we need to select a template for creating the term. A system template is available by default, and we will use it here. More advanced users can also create a custom template that can be shared by a group of users.
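A template essentially decides which attributes a term must or may carry. The minimal sketch below models that idea with a hypothetical `TermTemplate` class; the attribute sets for the system template and for the custom "Finance terms" template are assumptions for illustration, not the actual definitions Purview ships.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TermTemplate:
    """Hypothetical model of a term template: attributes a term must or may carry."""
    name: str
    required: set[str] = field(default_factory=set)
    optional: set[str] = field(default_factory=set)

    def validate(self, term: dict) -> list[str]:
        """Return a list of problems; an empty list means the term fits the template."""
        problems = [f"missing required attribute: {a}"
                    for a in sorted(self.required) if not term.get(a)]
        allowed = self.required | self.optional
        problems += [f"unknown attribute: {a}" for a in sorted(term) if a not in allowed]
        return problems


SYSTEM = TermTemplate("System", required={"name"},
                      optional={"definition", "parent", "acronym", "resources"})
# Hypothetical custom template that makes the definition mandatory and adds a field.
CUSTOM = TermTemplate("Finance terms", required={"name", "definition"},
                      optional={"parent", "acronym", "resources", "regulation"})

term = {"name": "Client", "acronym": "CLT"}
print(SYSTEM.validate(term))  # []
print(CUSTOM.validate(term))  # ['missing required attribute: definition']
```

A custom template like this is one way a team could enforce the recommendation, discussed next, that every term carries a definition.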
In this step, we can define the details of the term and the metadata attributes related to it. The page is divided into three sections: Overview, Related, and Contacts. The Overview section is shown first by default. Let’s say we intend to create a top-level term named Client, so we set the name of this term to Client. We can provide an optional business or functional description for the term. A business glossary does for terminology what master data management does for data: where master data management standardizes data across data repositories, the glossary standardizes the terms used for data objects across those repositories. Although the definition is optional, it is strongly recommended to provide one with as much clarity as possible, as terms defined without any functional detail are unlikely to be used. As this is a top-level term, the Parent field is set to None. Typically, one term has many analogous forms that are used across the organization; these can be captured in the Acronym field, as shown below. If a wiki page or other resource explains the definition of a client at length, it can be referenced by adding a link to it in the Resources section.
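The Overview fields can be sketched as a small record with a review step that flags the gaps most likely to hurt adoption: a missing definition and resource links that are not proper web URLs. The `GlossaryTerm` class, its `review` method, the sample definition, and the wiki URL are all hypothetical; the record only mirrors the fields named in the Overview section.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class GlossaryTerm:
    """Hypothetical model of the Overview section of a glossary term."""
    name: str
    definition: str = ""
    parent: str | None = None  # None for a top-level term such as Client
    acronyms: list[str] = field(default_factory=list)
    resources: dict[str, str] = field(default_factory=dict)  # display name -> URL

    def review(self) -> list[str]:
        """Flag gaps that make a term less likely to be adopted."""
        issues = []
        if not self.definition.strip():
            issues.append("no definition: add a clear business description")
        for label, url in self.resources.items():
            parts = urlparse(url)
            if parts.scheme not in ("http", "https") or not parts.netloc:
                issues.append(f"resource '{label}' is not a valid web link")
        return issues


client = GlossaryTerm(
    name="Client",
    definition="A party that holds at least one engagement with the organization.",  # hypothetical
    resources={"Client wiki": "https://wiki.example.com/client"},  # placeholder URL
)
print(client.review())  # []
```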
In the Related section, we can add synonyms as well as related terms. As this is a top-level term, these remain blank (None) for now.
A data scientist, data modeler, or data architect may initiate or drive the creation of this business glossary, while data stewards and subject matter experts may specialize in defining the business context of a specific term. In the Contacts section, we can tag such users so that they can be consulted when needed.
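Contacts are essentially a mapping from a role to the people who can answer questions about a term. The minimal sketch below, assuming the two roles are named "Experts" and "Stewards" and using made-up addresses, shows how such a mapping can answer "whom should I ask about Client?". `CONTACTS` and `who_to_consult` are hypothetical names.

```python
from __future__ import annotations

# Hypothetical Contacts data: term -> role -> people to consult (placeholder addresses).
CONTACTS = {
    "Client": {
        "Experts": ["data.architect@example.com"],
        "Stewards": ["client.steward@example.com", "sme.sales@example.com"],
    },
}


def who_to_consult(term: str, role: str | None = None) -> list[str]:
    """Return contacts for a term, optionally filtered to a single role."""
    roles = CONTACTS.get(term, {})
    if role is not None:
        return list(roles.get(role, []))
    # Without a role filter, return each contact once, in role order.
    seen: list[str] = []
    for people in roles.values():
        for person in people:
            if person not in seen:
                seen.append(person)
    return seen


print(who_to_consult("Client", "Stewards"))
```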
Click the Create button to create this term. Once it is created, it looks as shown below.
Let’s say that we intend to create another term named Engagement, a child term that is directly related to the term Client. Using the steps explained above, we can add it as shown below. Here we specify the previously created term Client as the parent of this term. When a term is selected as the parent of another term, the parent term’s name is prefixed to the child’s formal name by default, as shown below.
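The prefixing behavior can be sketched as a walk up the parent chain. In this illustration the `_` separator is an assumption, so check the formal name Purview actually displays in your account; `PARENTS` and `formal_name` are hypothetical names. The function also guards against accidental cycles in the parent links.

```python
from __future__ import annotations

# Hypothetical parent links: term -> parent term (None for top-level terms).
PARENTS = {"Client": None, "Engagement": "Client"}


def formal_name(term: str, parents: dict[str, str | None], sep: str = "_") -> str:
    """Prefix a term with its ancestors; the "_" separator is an assumption."""
    chain: list[str] = []
    current: str | None = term
    while current is not None:
        if current in chain:
            raise ValueError(f"cycle detected at {current!r}")
        chain.append(current)
        current = parents.get(current)
    return sep.join(reversed(chain))


print(formal_name("Engagement", PARENTS))  # Client_Engagement
```

Deeper hierarchies work the same way: a grandchild would be prefixed with both of its ancestors.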
In the Related section, we can specify that the term Client is related to the term Engagement, as shown below. As no other term is synonymous with this one, we leave the synonyms blank.
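The Related section holds two kinds of links: synonyms and related terms. The sketch below stores both in a hypothetical `TermRelations` class. As a design choice of this sketch, each link is recorded in both directions so either term can be used to find the other; it does not claim that Purview mirrors links the same way.

```python
from __future__ import annotations

from collections import defaultdict


class TermRelations:
    """Hypothetical store for synonyms and related terms, recorded in both directions."""

    def __init__(self) -> None:
        self.synonyms: defaultdict[str, set[str]] = defaultdict(set)
        self.related: defaultdict[str, set[str]] = defaultdict(set)

    def add_related(self, a: str, b: str) -> None:
        if a == b:
            raise ValueError("a term cannot be related to itself")
        self.related[a].add(b)
        self.related[b].add(a)

    def add_synonym(self, a: str, b: str) -> None:
        if a == b:
            raise ValueError("a term cannot be its own synonym")
        self.synonyms[a].add(b)
        self.synonyms[b].add(a)


rel = TermRelations()
rel.add_related("Engagement", "Client")
print(sorted(rel.related["Client"]))       # ['Engagement']
print(sorted(rel.synonyms["Engagement"]))  # [] (no synonyms yet)
```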
Once both terms are created, they are displayed hierarchically as shown below. Both terms are currently in the Draft state.
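The hierarchical view can be reproduced from nothing more than each term’s parent and state. As a minimal sketch with hypothetical names (`TERMS`, `render_hierarchy`), the function below groups terms by parent and prints them as an indented tree with children sorted by name.

```python
from __future__ import annotations

# Hypothetical glossary data: term -> (parent, state).
TERMS = {
    "Client": (None, "Draft"),
    "Engagement": ("Client", "Draft"),
}


def render_hierarchy(terms: dict[str, tuple[str | None, str]]) -> str:
    """Render terms as an indented tree (children sorted by name) with their state."""
    children: dict[str | None, list[str]] = {}
    for name, (parent, _state) in terms.items():
        children.setdefault(parent, []).append(name)

    lines: list[str] = []

    def walk(parent: str | None, depth: int) -> None:
        for name in sorted(children.get(parent, [])):
            lines.append(f"{'  ' * depth}{name} [{terms[name][1]}]")
            walk(name, depth + 1)

    walk(None, 0)
    return "\n".join(lines)


print(render_hierarchy(TERMS))
# Client [Draft]
#   Engagement [Draft]
```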
We can also change the state of these terms by editing a term and selecting the desired state. Users can then decide, based on a term’s state, how to use it for classifying data objects.
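One way a team might act on term states is to agree on a policy for which states may be applied to data objects. The sketch below assumes the state names Draft, Approved, Alert, and Expired (verify them against the state list in your Purview Studio term editor). The mapping in `classification_guidance` is a hypothetical team policy, not Purview behavior.

```python
from __future__ import annotations

from enum import Enum


class TermStatus(Enum):
    # State names assumed from the Purview Studio term editor; verify in your account.
    DRAFT = "Draft"
    APPROVED = "Approved"
    ALERT = "Alert"
    EXPIRED = "Expired"


def classification_guidance(status: TermStatus) -> str:
    """Hypothetical team policy mapping a term's state to how it may be used."""
    if status is TermStatus.APPROVED:
        return "use"
    if status is TermStatus.ALERT:
        return "use with caution"
    return "do not use"  # Draft and Expired terms are not applied to data objects


terms = {"Client": TermStatus.APPROVED, "Engagement": TermStatus.DRAFT}
usable = sorted(name for name, s in terms.items() if classification_guidance(s) == "use")
print(usable)  # ['Client']
```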
In this way, we can use Azure Purview to create a glossary of business terms in the data catalog and use it for data classification and a variety of data cataloging purposes.
In this article, we learned about the significance of data cataloging and of the business glossary. We also learned how to create an Azure Purview account, create business terms, and associate these terms with each other so that they can be used effectively for metadata management in the data catalog.