8 things to know about Azure Cosmos DB (formerly DocumentDB)

September 4, 2017 by Minette Steynberg

Introduction

Azure Cosmos DB is a low-latency, high-throughput, globally distributed, multi-model database that can scale within minutes and offers 5 consistency options, letting you decide how to handle the trade-offs described by the CAP theorem.

Azure Cosmos DB used to be known as DocumentDB, but as additional features were added it morphed into Azure Cosmos DB. The name was chosen to spark the innovation and imagination of developers around the globe.

Figure 1: Azure Cosmos DB Architecture Symbol

1. Azure Cosmos DB is globally distributed

Azure Cosmos DB is classified as a foundational service, which means that it is available in every Azure region. This allows you to have your data replicated to as many data centres in as many regions as you wish.

An additional region can be added by selecting the Replicate data globally option, and then simply clicking on the regions you want to replicate to on the map.

Figure 2: Enable global replication

It also allows you to specify priorities for your regions, so you can decide where your data should fail over to in case of a disaster. You can choose either manual or automatic failover.

Figure 3: Manual or automatic failover

Figure 4: Specify failover priorities

Because your application connects to logical endpoints, nothing has to be changed when there is a regional failover.
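The idea behind failover priorities can be sketched in a few lines of Python. This is only an illustration of the selection logic, not the Cosmos DB SDK or the service's actual algorithm: the Region class, the pick_failover_region function and the convention that 0 means highest priority are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Region:
    name: str
    priority: int           # 0 = highest priority (assumed convention for this sketch)
    available: bool = True  # simulated health flag, not a real service property


def pick_failover_region(regions: Sequence[Region]) -> Optional[Region]:
    """Return the available region with the lowest priority number, or None."""
    candidates = [r for r in regions if r.available]
    return min(candidates, key=lambda r: r.priority, default=None)


regions = [
    Region("West Europe", priority=0, available=False),  # simulated outage
    Region("North Europe", priority=1),
    Region("East US", priority=2),
]

chosen = pick_failover_region(regions)
print(chosen.name if chosen else "no region available")
```

With the first region marked as unavailable, the sketch falls through to the next region in the priority list, which is the behaviour the priority setting is meant to control.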

2. Azure Cosmos DB has 5 consistencies to choose from

If you are familiar with the CAP theorem, you will know that there is no such thing as perfect consistency. There are always trade-offs. Azure Cosmos DB offers 5 consistency models at the moment so that you can decide for yourself what you deem more important and what you are willing to sacrifice. Consistency is a topic on its own so I will only touch on it briefly here.

The current available consistencies are:

  • Strong
    With strong consistency, you are guaranteed to always read the latest version of an item, similar to read committed isolation in SQL Server. You only ever see data that has been durably committed. Strong consistency is scoped to a single region.
  • Bounded-staleness
    With bounded-staleness consistency, reads lag behind writes by at most a configured bound. It guarantees global order and is not scoped to a single region.
    When configuring bounded-staleness consistency you need to specify the maximum lag by:
    • Operations
      For a single region, the maximum operations lag must be between 10 and 1 000 000; for multiple regions, it must be between 100 000 and 1 000 000.
    • Time
      The maximum lag must be between 5 seconds and 1 day, for both single-region and multi-region accounts.
  • Session
    This is the most popular consistency level, since it provides consistency guarantees within a client session (such as reading your own writes) while offering better throughput than the stronger levels.
  • Consistent Prefix
    A global order is preserved, and prefix order is guaranteed. A user will never see writes in a different order than the order in which they were written.
  • Eventual
    Basically, this is like asynchronous replication. It guarantees that all changes will be replicated eventually, and as such, it also has the lowest latency because it does not need to wait on any commits.
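The bounded-staleness limits listed above can be expressed as a small validation routine. This is a minimal sketch that simply encodes the ranges quoted in this article; the function name and constants are made up for the example, and the service may enforce different values today.

```python
from datetime import timedelta
from typing import List

# Limits as quoted in this article (assumption: they may differ in the live service).
MIN_OPS_SINGLE_REGION = 10
MIN_OPS_MULTI_REGION = 100_000
MAX_OPS = 1_000_000
MIN_LAG = timedelta(seconds=5)
MAX_LAG = timedelta(days=1)


def validate_bounded_staleness(max_ops: int, max_lag: timedelta,
                               multi_region: bool) -> List[str]:
    """Return a list of problems; an empty list means the settings are in range."""
    problems = []
    min_ops = MIN_OPS_MULTI_REGION if multi_region else MIN_OPS_SINGLE_REGION
    if not min_ops <= max_ops <= MAX_OPS:
        problems.append(f"operations lag must be between {min_ops} and {MAX_OPS}")
    if not MIN_LAG <= max_lag <= MAX_LAG:
        problems.append("time lag must be between 5 seconds and 1 day")
    return problems


# A lag of 100 operations is in range for one region but too low for several.
print(validate_bounded_staleness(100, timedelta(seconds=30), multi_region=False))
print(validate_bounded_staleness(100, timedelta(seconds=30), multi_region=True))
```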

When you create your Azure Cosmos DB, you can choose which consistency level to use as the default, but a different consistency level can be specified every time you read from the database.

Session consistency is the out-of-the-box default because it is the most widely used. To change it, go to the Settings section on your Cosmos DB blade and click Default consistency, where you can choose a different default.

Figure 5: Select the default consistency

3. Azure Cosmos DB is a multi-model database

Azure Cosmos DB uses an atom-record-sequence (ARS) system. Basically, Cosmos DB translates all data models into ARS-based models.

So, everything becomes either an atom, a record or a sequence.

  • An atom is a primitive type.
  • A record is a struct.
  • A sequence is an array of atoms, records or sequences.
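To make the three categories concrete, here is a small Python sketch that maps the values of a JSON document onto atoms, records and sequences. It only illustrates the classification; it is not how the Cosmos DB engine represents data internally, and the ars_kind function and the sample document are invented for this example.

```python
import json
from typing import Any


def ars_kind(value: Any) -> str:
    """Classify a parsed JSON value as an atom, a record or a sequence."""
    if isinstance(value, dict):
        return "record"    # a struct of named fields
    if isinstance(value, list):
        return "sequence"  # an array of atoms, records or sequences
    if value is None or isinstance(value, (str, int, float, bool)):
        return "atom"      # a primitive value
    raise TypeError(f"not a JSON value: {type(value).__name__}")


doc = json.loads('{"id": "1", "tags": ["a", "b"], "address": {"city": "Pretoria"}}')
for key, value in doc.items():
    print(key, "->", ars_kind(value))
```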

At the moment Azure Cosmos DB supports:

  • Key-value pairs
  • Column family
  • Document
  • Graph

Figure 6: Supported models

4. Azure Cosmos DB supports multiple APIs

Azure Cosmos DB supports multiple APIs which can interact with all the data regardless of what data model is being used, so developers are able to develop in their preferred technology. It currently supports 4 APIs.

  • SQL
  • MongoDB API
  • Table API
  • Gremlin

5. Azure Cosmos DB indexes automatically

Azure Cosmos DB handles indexing automatically and does not need any schema or any secondary indexes. It uses Bw-trees instead of B-trees. Bw-trees make use of multi-core technology, do not use latches, and never update a memory page in place, so as to avoid cache invalidations. They also take advantage of flash storage.

Even though Cosmos DB takes care of indexing automatically, it still allows developers to perform fine-grained tuning by implementing a custom indexing policy.

You can change the policy in the Azure Portal by going to your Cosmos DB blade, clicking on Settings and then on the indexing policy option.

Figure 7: Custom indexing policy option
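A custom indexing policy is a JSON document. The sketch below builds one in Python and prints it, as a rough idea of what could be pasted into the indexing policy editor. The property names reflect the policy format as I understand it, and the excluded path /payload/* is a hypothetical property chosen purely for illustration, so check the current documentation before using it.

```python
import json

# Assumed policy shape: index everything except one large, rarely queried subtree.
indexing_policy = {
    "indexingMode": "consistent",  # keep the index up to date with every write
    "automatic": True,             # index items without an explicit opt-in
    "includedPaths": [
        {"path": "/*"},            # start from all paths...
    ],
    "excludedPaths": [
        {"path": "/payload/*"},    # ...then skip this hypothetical subtree
    ],
}

policy_json = json.dumps(indexing_policy, indent=2)
print(policy_json)
```

Excluding paths that are never used in queries is one way to trade index coverage for cheaper writes.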

6. Azure Cosmos DB has very low latency

Azure Cosmos DB guarantees less than 10 ms latency for reads and less than 15 ms latency for writes, measured at the 99th percentile.

Within those 15 milliseconds, the data is actually written, committed, replicated and indexed.

Basically, all reads and writes are served from a local region and then replicated according to the selected consistency model.

7. Azure Cosmos DB can scale storage and throughput independently

If you need more storage you can add it without having to pay for additional throughput. These two things scale independently of each other.

Before scaling throughput, the first thing you should do is estimate how much throughput and storage you will actually need. You can use the handy Capacity Planner site to perform the calculation. It will give you an estimate of how many RUs (Request Units) you need as well as the amount of data storage.

Once that is done, the number of RUs and storage can be increased using the Scale option on the Cosmos DB server blade in Azure.
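For a rough feel of the arithmetic behind such an estimate, here is a back-of-the-envelope sketch in Python. It is not the Capacity Planner's formula: the per-operation costs (1 RU per read and 5 RUs per write of a small item) and the rounding to steps of 100 RU/s are assumptions for this example, and real charges depend on item size, indexing and consistency level.

```python
import math


def estimate_rus_per_second(reads_per_sec: float, writes_per_sec: float,
                            ru_per_read: float = 1.0,
                            ru_per_write: float = 5.0) -> int:
    """Sum the assumed RU cost of the expected reads and writes per second."""
    return math.ceil(reads_per_sec * ru_per_read + writes_per_sec * ru_per_write)


def round_up_to_increment(rus: int, increment: int = 100) -> int:
    """Round up to the next provisioning step (assumed to be 100 RU/s)."""
    return math.ceil(rus / increment) * increment


needed = estimate_rus_per_second(reads_per_sec=520, writes_per_sec=90)
print(needed)                         # 970
print(round_up_to_increment(needed))  # 1000
```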

8. Azure Cosmos DB has a free emulator for testing

When you develop an application to use Azure Cosmos DB, you don’t need to spend your money until you are ready to launch your application. You can use the Azure Cosmos DB Emulator, which will allow you to develop and test your applications before having to use your Azure Cosmos DB account.

Figure 8: Explorer view of Azure Cosmos DB Emulator


About Minette Steynberg

Minette Steynberg has over 15 years’ experience in working with data in different IT roles including SQL developer and SQL Server DBA to name but a few. Minette enjoys being an active member of the SQL Server community by writing articles and giving the occasional talk at SQL user groups. Minette currently works as a Data Platform Solution Architect at Microsoft South Africa.