This article will show how to instrument queries executed on a dedicated SQL pool of Azure Synapse Analytics.
Transactional databases, Analytical databases and Data warehouses serve data requests by using queries. All these data repositories log queries and the metrics related to query execution which helps performance engineering teams to debug and optimize the performance of the query execution. While the metrics of the query execution can be monitored and analyzed, the purpose of the query at times is not obvious enough. Data analysts and support personnel often fire ad-hoc queries on different database objects. In regulated industries, one may have to justify the purpose of a query on confidential customer data. The purpose of these queries cannot be derived just by looking at the structure or the result set of the query. If the system or the personnel who executed the query would explicitly specify the intention of the query, then it would be easier to identify specific types of queries that were executed on the database or the data.
Data warehouses generally act as a central repository to a variety of downstream applications like reports and dashboards, machine learning workloads, ad-hoc analysis by data scientists, data analysts and many more. So, the probability of a variety of queries being executed on data warehouses is much higher comparatively, and most of such queries tend to be ad-hoc. In such an environment, it becomes harder to understand the purpose of various queries executed against a vast amount of data held in tens to hundreds of database objects held in data warehouses. Azure Synapse is Microsoft’s data warehouse offering on the Azure cloud. Azure Synapse integrates with a variety of Azure-based services, and the ability to instrument queries executed on the dedicated SQL pool can be very useful to understand the purpose of queries being executed on the data warehouse. Let’s go ahead and see how it is possible in Azure Synapse.
Using Query Labels with Azure Synapse
Azure Synapse supports a concept known as “query labels” that allows tagging any DDL or DML queries that are executed on the dedicated SQL pool. These labels can be queried using the dynamic management views (DMVs). One can use these labels to describe the purpose of the query or add any metadata to the query being executed and the same can be used later for instrumenting the queries, specifically to identify the queries that meet the desired search criteria. Let’s walk through a step-by-step exercise to understand this concept practically.
The first thing we need to have in place is an Azure Synapse Analytics workspace instance created on the Azure cloud account. It is assumed that this instance is already in place. Azure Synapse supports three types of pools – serverless pool, dedicated SQL pool and Spark pool, but the query labels are specifically supported only with dedicated SQL pool. So, one needs to create at least one SQL pool with the minimum capacity to perform this exercise. Create a dedicated SQL pool and navigate to the dashboard of the dedicated SQL pool. From the dashboard of the dedicated SQL pool, capture the host endpoint of the pool and connect to this endpoint using SQL Server Management Studio installed on a local machine. Once this connectivity is established, one can execute the desired queries on the dedicated SQL pool of Azure Synapse. Ensure that SSMS is successfully connected to the dedicated SQL pool as shown below.
Let’s look at one of the examples of how labels can be used in DDL queries while creating or modifying database objects. Let’s say that we intend to create a table from another table by using the CREATE TABLE AS command. Usually when one looks at a table definition, one won’t be able to identify that the table was created from another table. If we specify the label while executing a DDL statement that the table was created from another table, then we can use the label to analyze all DDL statements that used CTAS.
As shown below, here we are using the CREATE TABLE DDL statement to create a new table from an existing table named Address. We need to mandatorily specify a distribution key while using a CTAS command in Azure Synapse, so we have specified the same using the distribution option. After specifying the table definition where we select all the fields from a different table using the select query, we can specify the label using the table options as shown in the last line below. Once this query is executed it will result in the creation of a new table named sampletable and the query will be logged with the specified label.
Now let’s say that we intend to find all the DDL queries that were executed and have the label ‘CTAS from Address’ as the label. We can use the dynamic management view named dm_pdw_exec_requests under the sys schema. This view returns all the execution requests, and it contains a specific field named label which we can use to filter the queries of interest. As shown below, we can query this view to find the query that we just executed using the label that we tagged to the query.
Let’s say that we intend to execute a DML query where we intend to query all the tables available which can be used to create new tables. An easy way to do this is by querying the sys.tables view. Let’s say that the security policy required that one needs to specify the reason to query systems view, tables and other database objects, we can tag a label to the query which specifies the purpose of executing this query. As shown below, we are querying the sys.tables view and using the table option label, we are specifying that the purpose of our query is to check the CTAS candidate tables. Execute this query and we should be able to view all the tables and the query would be logged with the specified label.
Now that we have executed the query, let’s say that we intend to find all the queries that have CTAS as the keyword in the label. We do not intend to find an exact match as we cannot predict the exact text of the label for each query. But as long as the text contains the keyword ‘CTAS’, we intend to find out all such queries irrespective of whether it’s a DDL or a DML query. We can query the same view that we queried earlier, but this time using a like operator as shown below. This query retrieves all the queries which we executed in this exercise as well as any older queries that used the keyword ‘CTAS’ in the label.
In this way, using query labels, one can easily add textual metadata in the form of tags to the queries that are executed on the dedicated SQL pool of Azure Synapse to instrument queries. Applications, APIs, Microservices, ad-hoc queries executed by the different persona of users can specify the context using query labels for the queries executed by them, which can later be analyzed by administrators to understand the pattern of data usage by users as well as applications.
In this article, we learned the concept of query instrumentation by using metrics as well as query labels and understood the difference between the same. Then we worked on an Azure Synapse analytics cluster with a dedicated SQL pool. We learned how to add query labels to DDL and DML queries, and how to retrieve the queries based on any filter criteria for query labels.
- Visualizing data with dashboards in Azure Databricks - October 18, 2021
- Populating Delta Lake Tables in Azure Databricks with Azure Data Factory - October 13, 2021
- Creating Delta Lake Tables in Azure Databricks - October 11, 2021