Aveek Das

Learn NoSQL in Azure: An overview of Azure Cosmos DB

June 16, 2021 by Aveek Das

In this article, we are going to learn about Azure Cosmos DB. This article is part of the series Learn NoSQL in Azure, where we will explore the different types of non-relational databases that Azure currently supports. Azure is one of the most popular public cloud platforms, with a large market share worldwide. Cosmos DB belongs to the Databases category in Azure and allows customers to create and use NoSQL, or non-relational, databases and consume them at scale. You can use Cosmos DB to build highly scalable and robust cloud-based applications that support modern big data workloads. Let us first understand what a NoSQL database is and how it differs from a relational database. Although this article focuses on NoSQL in Azure, note that there are also open-source NoSQL databases, such as Apache Cassandra. Those are out of the scope of this article, which focuses mostly on Azure.

Why do we need a NoSQL Database?

Over the decades, developers have used relational database management systems to build applications across all domains, and relational databases are still used heavily in most modern applications. However, as applications and databases grew in size, relational databases became difficult to scale, and the need for highly scalable databases grew. Applications needed to be highly responsive and available most of the time, so databases had to be scaled out and distributed to achieve high performance and low latency.

However, relational databases are built around relationships, and distributing them across multiple systems becomes very costly because those relationships (for example, foreign keys and the joins that rely on them) have to be maintained across all the nodes in the cluster. These databases were originally architected to run on a single server in order to preserve data integrity. As a result, relational databases are typically scaled vertically rather than horizontally. Vertical scaling means adding resources such as CPU, memory, or storage to a single server, and it is limited by the largest server available, whereas horizontal scaling simply adds more servers. These limitations led to the evolution of NoSQL databases, which can be scaled both vertically and horizontally without having to keep relationships intact across nodes.
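To make horizontal scaling a little more concrete, here is a minimal Python sketch that assumes self-contained records which never reference each other. Each record is placed on one of several hypothetical nodes by hashing its key. The `node_for` and `distribute` helpers, the node count and the sample order keys are illustrative assumptions, not how any specific database (including Cosmos DB) partitions data internally.

```python
import hashlib


def node_for(key: str, node_count: int) -> int:
    """Map a record key to one of node_count nodes using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(records: dict[str, dict], node_count: int) -> list[dict[str, dict]]:
    """Place each self-contained record on the node chosen by its key."""
    nodes: list[dict[str, dict]] = [{} for _ in range(node_count)]
    for key, record in records.items():
        # No record depends on another, so no cross-node coordination is needed.
        nodes[node_for(key, node_count)][key] = record
    return nodes


# Hypothetical sample data: ten independent orders spread over three nodes.
orders = {f"order-{i}": {"orderId": i, "items": []} for i in range(1, 11)}
nodes = distribute(orders, node_count=3)
for index, shard in enumerate(nodes):
    print(index, sorted(shard))
```

Because no record depends on another, adding capacity means adding nodes rather than coordinating relationships across them. Plain modulo hashing is used here only for brevity; it moves most keys whenever `node_count` changes, which is why production systems typically rely on schemes such as consistent hashing or range-based partition maps instead.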

Introduction to NoSQL Databases

As the name suggests, a NoSQL database is a non-relational database. Unlike a relational database management system (RDBMS), which stores data in tables, a document-oriented NoSQL database stores data in documents. Since there are no tables, there are no enforced relationships between the different entities in the database. There are many types of NoSQL databases, such as key-value databases, columnar databases, document databases, and graph databases. In document databases, and in Cosmos DB in particular, data is commonly represented as JSON. Let us look at what a NoSQL database looks like.


Figure 1 – Comparing SQL with NoSQL Databases (Source)

As you can see in Figure 1, on the left there are two relational tables, "Orders" and "OrderDetails", and on the right there is a JSON document that represents the same data. In a document database, this JSON is stored as a single document. The detail rows from OrderDetails are nested inside the corresponding order. This denormalized form of the data allows faster reads than fetching and joining data from multiple tables. In a NoSQL document database, we therefore store one document per order. As the number of orders grows, these documents can be distributed across multiple nodes and the database can be scaled out accordingly. Notice that because the details are nested within the same document, there is no need to maintain relationships between the two entities.
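The reshaping shown in Figure 1 can be sketched in a few lines of Python. The column names (`OrderID`, `CustomerID`, `OrderDate`, `ProductID`, `Quantity`) and the sample rows are assumptions for illustration, since the exact columns in the figure may differ. The function groups the detail rows by order and embeds them, producing one document per order.

```python
import json

# Hypothetical rows from the relational "Orders" and "OrderDetails" tables.
orders = [
    {"OrderID": 1, "CustomerID": "C001", "OrderDate": "2021-06-01"},
    {"OrderID": 2, "CustomerID": "C002", "OrderDate": "2021-06-02"},
]
order_details = [
    {"OrderID": 1, "ProductID": "P10", "Quantity": 2},
    {"OrderID": 1, "ProductID": "P20", "Quantity": 1},
    {"OrderID": 2, "ProductID": "P10", "Quantity": 5},
]


def to_documents(orders: list[dict], order_details: list[dict]) -> list[dict]:
    """Embed each order's detail rows inside the order: one document per order."""
    details_by_order: dict[int, list[dict]] = {}
    for detail in order_details:
        # The foreign key is redundant once the line is nested inside its order.
        line = {k: v for k, v in detail.items() if k != "OrderID"}
        details_by_order.setdefault(detail["OrderID"], []).append(line)
    return [
        {**order, "OrderDetails": details_by_order.get(order["OrderID"], [])}
        for order in orders
    ]


documents = to_documents(orders, order_details)
print(json.dumps(documents[0], indent=2))
```

Reading order 1 now needs a single document lookup instead of a join between two tables. The trade-off is that any data shared across orders, if it were embedded (for example a product name), would be duplicated and would have to be updated in every document that contains it.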

Azure Cosmos DB to the rescue

Now that you have some idea about what NoSQL databases are, let us look into the main NoSQL database service available in Azure. According to Wikipedia, "Azure Cosmos DB is Microsoft's proprietary globally-distributed, multi-model database service for managing data at planet-scale launched in May 2017. It is schema-agnostic, horizontally scalable and generally classified as a NoSQL database." As the definition notes, Cosmos DB is a multi-model database system; we will look at the database models it supports later in this article.

Cosmos DB is a fully managed service in Azure, which means customers do not need to manage any infrastructure before using it; an active Azure subscription is all that is required to get started. It offers rich features such as global distribution and multi-region writes (also known as multi-master replication), which allow data to be replicated to, and written in, multiple Azure regions. Cosmos DB can also support analytical and artificial intelligence workloads built on the data stored in its databases.

Azure Cosmos DB Structure

Now that we have some idea about Cosmos DB, let us understand the underlying structure of the components within it.


Figure 2 – Cosmos DB Structure

As you can see in the figure above, the top-level resource is the database account. A database account can contain one or more databases, and each database contains one or more containers. Azure Cosmos DB offers multiple APIs for NoSQL databases, and depending on the API, a container can be a collection, a table, or a graph. Regardless of the API, every container can hold the same set of entities: items, stored procedures, user-defined functions, triggers, merge procedures, and conflicts. As with containers, the type of an item depends on the database API being used. The following table shows the association between the database API, the container type, and the item type.

Database API | Container Type | Item Type
Key Value | Table | Item
Document | Collection | Document
Columnar Store | Table | Rows
Graph | Graph | Nodes and Edges
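The hierarchy described above (account, then databases, then containers, then items) can be modelled with a toy, in-memory Python sketch. It is not the Azure Cosmos DB SDK: the class names and the `database`, `container` and `put_item` methods are hypothetical, and real accounts are created and managed through Azure. Items are keyed by an `id` field, similar to the `id` property that Cosmos DB items have.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    """Holds items; depending on the API this would be a collection, table or graph."""
    id: str
    items: dict[str, dict] = field(default_factory=dict)

    def put_item(self, item: dict) -> None:
        # Insert or overwrite the item stored under its "id" value.
        self.items[item["id"]] = item


@dataclass
class Database:
    """Groups one or more containers."""
    id: str
    containers: dict[str, Container] = field(default_factory=dict)

    def container(self, container_id: str) -> Container:
        # Return the existing container, creating it on first use.
        if container_id not in self.containers:
            self.containers[container_id] = Container(container_id)
        return self.containers[container_id]


@dataclass
class DatabaseAccount:
    """Top-level resource that holds one or more databases."""
    name: str
    databases: dict[str, Database] = field(default_factory=dict)

    def database(self, database_id: str) -> Database:
        # Return the existing database, creating it on first use.
        if database_id not in self.databases:
            self.databases[database_id] = Database(database_id)
        return self.databases[database_id]


account = DatabaseAccount("my-cosmos-account")
orders = account.database("SalesDb").container("Orders")
orders.put_item({"id": "order-1", "customerId": "C001"})
print(len(account.databases["SalesDb"].containers["Orders"].items))  # 1
```

The nesting mirrors the figure: you always reach an item through its account, database and container, which is also how Cosmos DB addresses resources.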

Key features of Azure Cosmos DB

In this section, I am going to highlight some of the key features of Cosmos DB.


Figure 3 – Key Features of Azure Cosmos DB (Source)

  • Global Distribution – Cosmos DB can replicate data across multiple Azure regions around the world, so applications can read from (and, with multi-region writes enabled, write to) a region close to their users with low latency
  • Linear Scalability – Cosmos DB scales horizontally by partitioning data across many nodes, so throughput grows roughly in proportion to the capacity provisioned. This lets it support very large numbers of reads and writes every second
  • Multi-Model Database – Cosmos DB offers multiple database APIs within the same service. Users can choose between document, key-value, columnar, and graph data models
  • High Availability – Cosmos DB offers a very high availability SLA: 99.999% for multi-region reads and writes, and 99.99% for single-region reads and writes. This means very little downtime, so applications can keep running almost all the time
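To put the SLA percentages into perspective, the following Python snippet converts an availability percentage into the maximum downtime it allows over a 365-day year. This is only back-of-the-envelope arithmetic for intuition; it ignores how Azure actually measures and credits the SLA.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def max_downtime_minutes(sla_percent: float) -> float:
    """Upper bound on yearly downtime implied by an availability percentage."""
    return (1 - sla_percent / 100) * MINUTES_PER_YEAR


for sla in (99.99, 99.999):
    print(f"{sla}% -> {max_downtime_minutes(sla):.2f} minutes per year")
```

Running it shows roughly 52.6 minutes of allowed downtime per year at 99.99% and roughly 5.3 minutes at 99.999%, so the extra nine cuts the downtime budget by a factor of ten.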

Conclusion

In this article, we have learned about the various types of non-relational databases and how Azure Cosmos DB can be used to work with them. NoSQL, or non-relational, databases are primarily of four types: key-value, document, columnar, and graph databases. These databases have some advantages over traditional relational databases and are widely used in today’s modern cloud-based applications. NoSQL databases can be scaled both horizontally and vertically, which is one of the main reasons they are used so frequently.

To learn more about Azure Cosmos DB, you can start by reading the official documentation.

Table of contents

Learn NoSQL in Azure: An overview of Azure Cosmos DB
Learn NoSQL in Azure: Diving Deeper into Azure Cosmos DB
Learn NoSQL in Azure: Getting started with DocumentDB SQL API

About Aveek Das

Aveek is an experienced Data and Analytics Engineer, currently working in Dublin, Ireland. His main areas of technical interest include SQL Server, SSIS/ETL, SSAS, Python, big data tools like Apache Spark and Kafka, and cloud technologies such as AWS and Azure. He is a prolific author, with over 100 articles published on various technical blogs, including his own blog, and a frequent contributor to different technical forums. In his leisure time, he enjoys amateur photography, mostly street imagery and still life. Some glimpses of his work can be found on Instagram. You can also find him on LinkedIn.
