Gauri Mahajan
Data Explorer in Cosmos DB

Querying JSON documents in Azure Cosmos DB using the SQL API

May 21, 2021 by

In this article, we will look briefly at the Azure Cosmos DB service and how you can query JSON documents to store, retrieve and process data using the SQL Application Programming Interface (API).

Introduction

Azure Cosmos DB is a multi-model PaaS offering on Microsoft Azure. It is a NoSQL globally distributed highly available database system with less than 10ms of latency for both reads and writes.

It works on the concept of Request Units per second (RU/s) for managing the performance and cost of databases, in short, the amount of computation and I/O resources required to perform a simple read request. We can provision throughput on databases or containers. We will walk through how we can set it up later while creating a Cosmos DB account.

Azure Cosmos DB is great for IoT, retail, marketing, web, and mobile applications. Apart from these, it is also used for mission-critical applications running globally like Skype, Xbox, and Azure. Additionally, Azure Cosmos DB offers not just the Strong and Eventual consistency levels but also a range of other levels including Bounded Staleness, Session and Consistent Prefix. To read more information on Consistency levels, you can check this Microsoft documentation.

Azure Cosmos DB supports several APIs like Core (SQL), Cassandra (to access Columnar data), Gremlin (for Graph data), MongoDB API (for Document data), and Azure Table (Azure Table Storage for Key-value data) that can be used to access a variety of data. This makes the job of various data professionals easier to work with Cosmos DB. We will be looking at Core (SQL) API in this article.

Azure Cosmos DB is generally available in Documents. It allows us to access these documents using the above interfaces. You can consider a document as a collection of fields provided with a Key, and it uses a structured JSON format with no pre-defined schema. That means, the fields don’t have to be the same in every entity. Documents in an Azure Cosmos DB are organized into containers, they are further grouped into partitions in Containers. We will be creating these shortly in a lab.

Pre-requisites

To follow along with this article, you will either need to have an Azure subscription or Cosmos DB trial account. With the latter, you will have 30 days of free access with no subscription required.

Azure Cosmos DB account

Let’s quickly jump to the Azure portal and quickly create a Cosmos DB account. Search for Azure Cosmos DB and click Add. This will bring you to a new Create Azure Cosmos DB Account window. Provide in detail like the Resource Group, Account Name, API – Core (SQL), Location, etc. and click on Review + create.

Quick note: Core (SQL) API is the native API in Cosmos DB and is also called SQL API. It supports a SQL-like query language that allows retrieving documents using SELECT and other basic SQL commands.

If you want a step-by-step guide, I would recommend you go through this great article, Start your journey with Azure Cosmos DB, to create a Cosmos DB in detail.

Azure Cosmos DB Account

You can choose between Provisioned throughput and Serverless (preview). We have selected the former that gives you 400 RU/s for free.

Creating a Cosmos DB Account

Once you click on Review + create, it will validate all the details, and then select Create button to create this resource.

Creating Cosmos DB Account

Click on Go to resource button to navigate to this newly created service. You can see below, Azure Cosmos DB named, apisqlcosmosdb is created and different statistics can be seen on the Overview tab as shown below.

Overview of Cosmos DB

Adding a database and container to the Azure Cosmos DB account

Click Data Explorer blade on the left side and click on New Container and a new Add Container window gets opened on the right. We need to provide Database id (Employees), Container id (Personnel) and Partition key (/employeeId). If you notice below, we have an option to set Throughput at both the database or container level, the minimum throughput that can be allocated to a database or container is 400 RU/s.

Data Explorer in Cosmos DB

You can scroll down to fill in other details.

Adding a container in Cosmos DB

There are two options (Autoscale and Manual) to provision Throughput. We are selecting 400 RU/s for now. In case, you predict to have large workloads in the future, you need to increase the throughput to avoid getting an HTTP 429 error (Request rate is too large).

Finally hit OK to create the Employees database and Personnel container in it.

Container properties in Cosmos DB

Once done, expand DATA under your SQL API account, and you would find the Items option under the Personnel container. Click on it and you will see New Item in the top menu bar to add data to your container.

Creating JSON items in the container

Since items are stored as JSON, we will add three records in JSON format that would include hypothetical employees’ personal information like name, age, gender, pets they own, phone numbers etc. Here, streetAddress, city and state attributes are all nested in the Address property.

You can see three employee ids (3 items) added as shown below:

New items added in the database Personnel.

Querying JSON data using the SQL API

Now comes the main part of this article, i.e. learning to work with JSON data using SQL query language in an Azure Cosmos DB account. Click on the New SQL Query icon on the top menu bar to open a query window. We will start with the basic queries using SELECT, WHERE, ORDER BY, TOP, Between and IN clauses, and further understand Joins, aggregations, arrays, different formats in which we can retrieve the output.

Querying JSON data using the SQL API.

SELECT clause

Type “Select * FROM Personnel” and click on Execute Query button on the top right-hand side. This will return all the records present in the container as shown below:

SELECT clause in the Cosmos DB account

SELECT clause using dot(.) notation for Projections

In order to retrieve only selected properties from the documents, you can use (.) notation in the SELECT clause. Projection helps us limit the properties that are returned in the result.

  • Quick note: The attribute name after (.) is also case sensitive

You can also use the quoted notation (“) in place of (.) and it will give the same results. One example is shown below:

SELECT clause

FROM clause with Property names

You can use the expression <containername>.<propertyname> in the FROM clause to return the list of properties/fields. Below are some examples:

Projection in Cosmos DB

Projection in Cosmos DB

WHERE and Between clauses

Just like SQL queries, the WHERE clause helps to apply a filter on the JSON data as well in order to retrieve selected data. For example, you want to return data based on the age of an employee. We have used BETWEEN clause to output the range of such data.

WHERE and Between clause

TOP and ORDER BY clauses

The below query will show how you can use the TOP and ORDER BY clause to display the top 2 results in the descending order of employee ids.

ORDER BY clause

The above clauses were straight-forward and the only point to keep in mind here is even though the language is SQL, the output is in JSON format.

Aliasing and JSON expressions

While working with data, data professionals often have to format data that is originally stored in Cosmos DB. One such example is shown below where the city is retrieved as CityName using the alias keyword, AS and since nothing was provided for Postal Code and Pets properties, the clause used implicit argument variable names $1 and $2 respectively for them. We are fetching results for only New Jersey state.

Aliasing JSON expressions

Working with arrays in JSON using the SQL API

In the JSON items used in this article, we have Address property as the nested property and apart from that JSON also supports arrays. For example, property phoneNumbers is an array with type and number as its attributes. Like for any array, we access them by referencing its position [0],[1],[2].. and so on. The following query retrieves records for employees whose phone number type is Travel. [0] indicates, it scans the first record in the phoneNumbers array.

Working with arrays in JSON using the SQL API.

Using JOINS

In conventional relational database systems, Joins are usually applied on Keys like Primary Keys or Foreign Keys, however, containers in Azure Cosmos DB contain JSON items that are schema-free, so we are basically working with the denormalized data model and self-joins here. Below are a few JOINS examples using the SQL API.

Here, this query returns the selected properties with the age of employees more than 30 and ordered by lastName attribute.

JOINS in Cosmos DB

The below query displays a more conventional form of Join using the IN keyword and helps to show results based on the type of phone numbers (Work/Travel/Home) available for employees.

JOINS in Cosmos DB using the SQL API.

Another interesting example could be the below where we want to find ‘work’ phone numbers of employees in the company.

JOINS in Cosmos DB

Aggregates in JSON

Aggregations play a crucial role in the data world. The SQL API supports various aggregation functions like SUM and AVG (for numeric properties) and COUNT, MIN, MAX (for strings and numbers). The following query shows an aggregation example that calculates the average age and count of employees.

Querying Aggregate functions using the SQL API.

Using Group By clause

Like in SQL, Group By clause is used to arrange similar data in different groups, using various functions like Count, Min, Max etc. The following example grouped the employees based on the gender property i.e. Male or Female.

GROUP BY clause in Cosmos DB using the SQL API

However, there are some limitations for using the GROUP BY clause in Azure Cosmos DB, for example, the ORDER BY clause cannot be used along with it. For more information on this, check out this Microsoft documentation.

Conclusion

We began this article learning briefly about Azure Cosmos DB and the different facets associated with it. Along the way, we explored how an account can be created and added JSON data to a container. And finally, we learned how we can work with JSON documents to store, retrieve, and process data in the Cosmos DB account using the SQL API.

Gauri Mahajan
Azure, Azure Cosmos DB, JSON, SQL Azure

About Gauri Mahajan

Gauri is a SQL Server Professional and has 6+ years experience of working with global multinational consulting and technology organizations. She is very passionate about working on SQL Server topics like Azure SQL Database, SQL Server Reporting Services, R, Python, Power BI, Database engine, etc. She has years of experience in technical documentation and is fond of technology authoring. She has a deep experience in designing data and analytics solutions and ensuring its stability, reliability, and performance. She is also certified in SQL Server and have passed certifications like 70-463: Implementing Data Warehouses with Microsoft SQL Server. View all posts by Gauri Mahajan

168 Views