This article will help you understand the storage options available in the Microsoft Azure cloud. Microsoft Azure provides various services to store data depending on its type, nature, shape, and size. Data could be anything: an image, a video, a text file, a database file that holds customer data, or data that comes from a digital medium such as an online retail website or YouTube. There are many types of data, and one storage solution cannot fit every storage requirement.
Before computers, people stored data on paper or in physical files and kept it in a safe location for later use. With computers, data storage became digital, using media such as magnetic tapes, floppy disks, and hard drives. As data grew in size and took on new shapes over time, storage solutions evolved significantly as well. We have progressed from punch cards to magnetic tape drives, hard disk drives, floppy disks, CDs, DVDs, SD cards, USB drives, SSDs, and finally cloud storage. This evolution happened because the shape and size of data kept changing, and existing solutions could not meet the requirements of the new sizes and types of data.
Data storage solutions will keep evolving as new requirements and challenges emerge. Let’s look at the storage solutions Microsoft Azure offers to store and process data in the Azure cloud environment.
Azure storage provides various options to store data in the Microsoft cloud environment. It comes loaded with features that reduce the complexity of managing and maintaining storage. Azure storage provides redundancy by storing multiple copies of data to keep it highly available and secure in case of a hardware failure or an issue with the underlying systems. You can configure your data to replicate within a single data center (LRS), across availability zones in a region (ZRS), or to another geographical region (GRS) to protect it from natural or man-made disasters.
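The choice between the three replication options can be sketched as a small helper. This is a minimal illustration rather than an official selection rule: the function name and its two parameters are hypothetical, it only covers the three options named above, and it returns the Standard-tier SKU names used for them.

```python
def choose_redundancy(survive_zone_failure: bool, survive_region_failure: bool) -> str:
    """Return a Standard-tier redundancy SKU name for the stated protection needs.

    Hypothetical helper: it only knows the three options discussed in the text.
    """
    if survive_region_failure:
        # Geo-redundant: copies are also kept in a secondary region.
        return "Standard_GRS"
    if survive_zone_failure:
        # Zone-redundant: copies are spread across availability zones in one region.
        return "Standard_ZRS"
    # Locally redundant: all copies stay within a single data center.
    return "Standard_LRS"


print(choose_redundancy(survive_zone_failure=True, survive_region_failure=False))
```

Ordering the checks from the widest failure scope to the narrowest keeps the function short, because protection against a regional outage is the stronger requirement in this simplified model.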
All data stored in Azure storage is encrypted to ensure data protection. Azure storage protects data at rest and in transit. Nobody can access data stored in Azure storage unless they have been authorized to access that data.
Azure storage can be scaled easily as workload or demand increases. It is fully managed from a maintenance perspective, which means Microsoft Azure handles all maintenance and updates. You can access Azure storage from anywhere in the world over the internet. It also supports most programming languages: whether your applications use .NET, Java, or PHP, they can access data stored in Azure cloud storage through their respective APIs.
Azure storage provides the below storage options to store customers’ data based on its shape, size, and nature.
- Azure BLOBs
- Azure Files
- Azure Queues
- Azure Tables
- Azure Disks
Each option is designed to address different requirements. There are other ways to store data in the Azure cloud as well, such as Azure SQL Database and Azure Cosmos DB, but the underlying storage for these services also builds on the options above.
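Four of these options (BLOBs, Files, Queues, and Tables) are reached through a per-service endpoint on the storage account. The sketch below builds those endpoints, assuming the default public-cloud domain `core.windows.net`; the helper name is hypothetical, and `mdsstorage1` is the account name used later in this article. Azure Disks are attached to virtual machines and are not addressed this way, so they are left out.

```python
# Services that expose an endpoint on the storage account itself.
SERVICES = ("blob", "file", "queue", "table")


def service_endpoints(account_name: str) -> dict[str, str]:
    """Return the default public-cloud endpoint URL for each storage service."""
    return {
        service: f"https://{account_name}.{service}.core.windows.net"
        for service in SERVICES
    }


for service, url in service_endpoints("mdsstorage1").items():
    print(f"{service:5} -> {url}")
```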
Let’s discuss each of the above Azure storage solutions a little more in the below sections.
BLOB stands for Binary Large Object. Azure BLOB storage is used to store enormous amounts of unstructured data, which might come in the form of images, text, files, videos, or a mix of all these types. It also serves as the underlying storage for Azure data lake analytics solutions and for the managed disk subsystem of Azure virtual machines.
Azure BLOB is suitable for:
- Storing unstructured data like images, text, files, videos, or documents
- Accessing unstructured data directly from storage
- Streaming video or audio applications
- Database backups, archive files, etc.
- Azure data lake analytics solutions
Azure Files is designed to address network file share requirements. It is a fully managed file share solution that can be accessed from anywhere using the SMB protocol. We can use it for cloud as well as on-premises workloads.
Azure Files is suitable for:
- File shares that various applications use
- Network file share which can be accessed by various virtual machines
- Replacing on-premises file share
- Lift and shift application migration
- Keeping any file in a centralized location that can be accessed by various other machines
- Placing log files, metrics files, and crash dump files which can be accessed and analyzed later using another machine
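Because an Azure file share is reached over SMB, clients address it with an ordinary network path. The sketch below builds that path for Windows (UNC form) and for a Linux SMB mount, assuming the default public-cloud domain; the function names and the share name `reports` are hypothetical. SMB uses TCP port 445, so that port must be reachable from the client.

```python
def smb_unc_path(account_name: str, share_name: str) -> str:
    """Windows-style UNC path to a file share in the given storage account."""
    return f"\\\\{account_name}.file.core.windows.net\\{share_name}"


def linux_mount_source(account_name: str, share_name: str) -> str:
    """Source string in the form used by SMB (cifs) mounts on Linux."""
    return f"//{account_name}.file.core.windows.net/{share_name}"


print(smb_unc_path("mdsstorage1", "reports"))
print(linux_mount_source("mdsstorage1", "reports"))
```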
Azure Queues is designed to address messaging requirements. It is a message store that holds large numbers of messages exchanged between application components. These messages can be accessed from anywhere with the right authentication credentials.
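The idea of one component leaving messages for another can be sketched with a local stand-in. The example below uses Python's in-memory `queue.Queue` rather than the Azure SDK, so it only illustrates the decoupling pattern; the function names and the message format are hypothetical. The 64 KiB check mirrors the per-message size limit of Azure Queue storage.

```python
import queue

# Azure Queue storage limits a single message to 64 KiB.
MAX_MESSAGE_BYTES = 64 * 1024


def enqueue_order(q: queue.Queue, order_id: str) -> None:
    """Producer side: the web front end records that an order needs processing."""
    body = f"order:{order_id}"
    if len(body.encode("utf-8")) > MAX_MESSAGE_BYTES:
        raise ValueError("message exceeds the 64 KiB limit")
    q.put(body)


def process_next(q: queue.Queue) -> str:
    """Consumer side: a background worker picks up a message and returns the order id."""
    body = q.get()
    return body.split(":", 1)[1]


orders: queue.Queue = queue.Queue()  # local stand-in for a storage queue
enqueue_order(orders, "1001")
enqueue_order(orders, "1002")
print(process_next(orders))
```

The producer and the consumer never call each other directly; they only share the queue, which is what lets the two components scale and fail independently.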
Azure Tables stores NoSQL workloads, which fall under the semi-structured data category. Azure Cosmos DB, the NoSQL offering from Microsoft Azure, provides a Table API that is compatible with it. Azure Tables can be used to store large data sets used in web applications.
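In Azure Table storage, each entity is addressed by a `PartitionKey` and a `RowKey`, and the remaining properties can differ from one entity to the next. The sketch below mimics that shape with a plain dictionary as an in-memory stand-in; it does not call the Azure SDK, and the helper names and sample products are hypothetical.

```python
def make_entity(partition_key: str, row_key: str, **properties) -> dict:
    """Build an entity: two required keys plus any number of free-form properties."""
    return {"PartitionKey": partition_key, "RowKey": row_key, **properties}


def upsert(table: dict, entity: dict) -> None:
    """Insert or replace the entity addressed by (PartitionKey, RowKey)."""
    table[(entity["PartitionKey"], entity["RowKey"])] = entity


products: dict = {}  # in-memory stand-in for a table
upsert(products, make_entity("kitchen", "p1", name="Kettle", price=25.0))
# A second entity in the same table carries a property the first one lacks.
upsert(products, make_entity("lighting", "p2", name="Lamp", warranty_months=12))

print(products[("lighting", "p2")])
```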
Azure Disks are designed to serve the requirements of virtual machines. An Azure disk is a virtual hard disk, similar to the disks you use in on-premises machines. Data stored on an Azure disk is accessed from the virtual machine it is attached to and cannot be accessed from outside that virtual machine. This option is also popularly known as Azure Managed Disks. Azure Disks use Azure BLOB as their underlying storage, so they can be categorized under the Azure BLOB storage option.
Azure Disks are suitable for:
- Azure virtual machines
- Storing database files for SQL Server workloads
- Data drives for IaaS machines
Selecting the right data storage solution
We should choose a data storage solution carefully, so that it offers easy access and suits the nature of the data. Storing data is not the only activity we perform; we also need to access that data over time for application logic and business requirements. When choosing a storage solution, we should make sure the data can be accessed smoothly, without performance issues.
Before choosing a storage solution, you must understand the nature of your data: its classification and how you are going to use it. Once you are clear about the data classification, its uses, and the acceptable performance, choosing an optimal storage solution becomes easy.
Let’s discuss how to understand the nature of data.
Understand your data classification
The very first step towards choosing the right Azure storage solution is to understand your data: its type, size, nature, and growth. Is it images, videos, text files, or a mix of all these? Do you want to store data for an online application that sells products, or for a video streaming website? There could be many reasons to store data, and you cannot have one solution for all of them. You need a different storage solution for each data requirement. To understand this better, we can classify all types of data into the three categories below.
- Structured Data
- Semi-structured Data
- Unstructured Data
As its name suggests, structured data is organized in a specific form so that you or your application can easily access it later. The data in this category follows a schema with defined fields and properties, which makes frequent changes somewhat complex. RDBMS database systems like SQL Server, Oracle, and DB2 come under the structured data category. RDBMS stands for Relational Database Management System, in which data is organized in tables, rows, and columns, and tables are related to one another through key columns to make the data easy to access. We use a query language, SQL (Structured Query Language), to access data from the tables created in these databases. Structured data is suitable for applications that need to maintain ACID properties, such as banking applications, ticket booking systems, and ERP applications.
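The ideas of key columns, SQL queries, and ACID transactions can be seen in a few lines. The sketch below uses Python's built-in `sqlite3` module with an in-memory database as a local stand-in for an RDBMS such as SQL Server; the table names and sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE orders ("
    " id INTEGER PRIMARY KEY,"
    " customer_id INTEGER NOT NULL REFERENCES customers(id),"
    " amount REAL NOT NULL)"
)

# The connection as a context manager commits on success and rolls back on error.
with conn:
    conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Asha')")
    conn.execute("INSERT INTO orders (id, customer_id, amount) VALUES (10, 1, 250.0)")

# A failure in the middle of a transaction leaves no partial changes behind.
try:
    with conn:
        conn.execute("INSERT INTO orders (id, customer_id, amount) VALUES (11, 1, 99.0)")
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

# Relate the two tables through the key column.
rows = conn.execute(
    "SELECT c.name, o.amount FROM customers c JOIN orders o ON o.customer_id = c.id"
).fetchall()
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(rows, order_count)
```

The second insert never becomes visible because its transaction is rolled back, which is the atomicity part of ACID that a banking or booking application relies on.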
Initially, data was classified into only two categories: structured and unstructured. Over time, data grew enormously, and the evolution of digital technology amplified the challenges facing existing database systems. This forced IT professionals to rethink how they handled data, and a new classification emerged: semi-structured data. It addresses the issues and challenges that structured data solutions could not. Databases that store this type of data are popularly known as NoSQL databases, where NoSQL means Not Only SQL.
Semi-structured data is stored in a non-relational format and is not organized like structured data, which is why SQL is not well suited to accessing it. Data in this category is stored as key-value pairs inside specific tags rather than in tables, rows, and columns. The tag-based key-value format makes it easy to add new fields as requirements change, and you do not need to update older data when a new field is added. This data is typically represented in serialization formats such as XML, JSON, or YAML.
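The flexibility of the key-value format is easiest to see with two JSON documents that do not share exactly the same fields. The sketch below uses Python's built-in `json` module; the product records and the `warranty_months` field are hypothetical.

```python
import json

# An older record, written before the warranty field existed.
old_product = json.loads('{"id": "p1", "name": "Kettle", "price": 25.0}')
# A newer record that carries the extra field.
new_product = json.loads(
    '{"id": "p2", "name": "Lamp", "price": 40.0, "warranty_months": 12}'
)

catalog = [old_product, new_product]


def warranty(product: dict) -> int:
    """Read the optional field, falling back to 0 for records that predate it."""
    return product.get("warranty_months", 0)


for product in catalog:
    print(product["name"], warranty(product))
```

The older record is never rewritten; the reader simply supplies a default, which is the property that makes adding new product attributes cheap.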
Finally, unstructured data does not follow a specific schema, definition, or data model the way structured data does. You can classify your data as unstructured if it is in the form of text, images, videos, files, or a mix of all these formats.
Now that you have identified the nature of your data, the next step is to identify how it will be used, which I will cover in the section below.
What is the primary purpose of your data? Do you want to use it for an online application that sells products, for an online streaming website, or do you just want to store it for data protection? You can answer this by working through the questions below.
- Will your data be used by an application, or will you access it directly from Azure storage?
- Does the application require DML operations on data stored in the database?
- Do you need to JOIN tables and access fields using their IDs?
- Does your application need to maintain ACID properties for the database transactions?
- How frequently will the application access the data?
- What are the performance expectations when accessing the data?
- Do you need file shares to place your files in a centralized place?
You will understand your data uses by evaluating the answers to the above questions.
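The answers to these questions can be turned into a rough first suggestion. The sketch below is a deliberately simplified decision helper, not a complete selection method: the function name, its parameters, and the order of the rules are assumptions, and it only returns options that are named in this article.

```python
def suggest_storage(classification: str,
                    needs_file_share: bool = False,
                    needs_acid: bool = False) -> str:
    """Map simplified answers to one of the storage options named in this article."""
    if needs_file_share:
        # Centralized files shared between machines.
        return "Azure Files"
    if needs_acid or classification == "structured":
        # Relational data with joins and transactional guarantees.
        return "Azure SQL Database"
    if classification == "semi-structured":
        return "Azure Cosmos DB or Azure Tables"
    if classification == "unstructured":
        return "Azure BLOB storage"
    raise ValueError(f"unknown classification: {classification!r}")


print(suggest_storage("unstructured"))
print(suggest_storage("structured", needs_acid=True))
```

Real decisions also weigh access frequency and performance expectations, which this sketch leaves out.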
I will demonstrate using a case study to showcase how I am going to choose different storage solutions for our application.
Use case of choosing the right Azure Storage types
Let’s work through an example to identify which Azure storage solution suits each requirement.
Suppose you are developing an online retail portal. You will need to store and access various data sets while a customer searches for and buys products on your portal. When someone searches for a specific product, a list of matching products from all sellers appears along with availability, images, videos, and feedback ratings, and when buying, the customer must provide financial data online. Multiple data sets are involved in this process, such as the product catalog, product inventory, images, videos, sales data, shared files, and reports, and we need to evaluate each of them before deciding on its storage solution.
Product Catalog & Inventory
- Requirement: The data should be well organized because the product catalog and inventory must be updated instantly to reflect the remaining stock for other users. If a product is sold, the inventory must be updated with the reduced number. This data set sees high read and write operations.
- Data classification: With structured or relational data, adding new product lines in the future would be complex. So, the best choice for this requirement is semi-structured.
- Storage solution: Azure Cosmos DB. We could also consider Azure SQL Database or Azure Tables, but both have some limitations compared with Azure Cosmos DB.
Product photos, videos
- Requirement: Customers will not update or modify this data, but it must be served quickly during product searches.
- Storage solution: Azure BLOB storage
Sales data
- Requirement: This data is used mostly for analytical purposes based on its history.
- Storage solution: Azure SQL Database
Shared files and reports
- Requirement: Network file shares are needed so that business analysts can access centralized files and analytics reports.
- Storage solution: Azure File shares
Create an Azure Storage account
The sections above gave an overview of Azure storage and explained how to choose the right storage solution in the Azure cloud. Now, I will show you how to use the Azure portal to create a storage account, under which you can configure your selected storage solution. The storage account works as a container for all the Azure storage services created under it.
Log in to the Azure portal.
Type “Storage Accounts” in the search box and click on this option once it appears in the search dropdown. The screen below appears once you click on storage accounts.
Click on the “+ Create” option, highlighted with a dark red rectangle in the above screen. A “Create a storage account” form appears, in which we need to fill in all required details.
There are 6 tabs in this form.
The first tab, Basics, has two sections. Under project details, you choose the subscription in which you want to create this storage account and its resource group. Under instance details, you provide the name of the storage account, the region where you want to create it, whether you need premium or standard storage under Performance, and its redundancy setting.
- Resource Group name
- Storage account name
Have a look at the below screenshot of this tab. If you want to create a legacy storage account type such as general-purpose v1, you can click on the link just above the storage account name. It gives you the option to choose a legacy storage type.
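The portal validates the storage account name as you type. The same rule can be checked locally before you open the form: a name must be 3 to 24 characters long and may contain only lowercase letters and digits. The sketch below covers only this format rule; the function name is hypothetical, and the name must also be globally unique, which can only be verified by Azure.

```python
import re

# 3 to 24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """Check the format rule only; global uniqueness cannot be checked locally."""
    return _ACCOUNT_NAME.fullmatch(name) is not None


for candidate in ("mdsstorage1", "MDS-Storage", "ab"):
    print(candidate, is_valid_storage_account_name(candidate))
```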
The second tab is Advanced, where we configure the type of Azure storage we want to create along with its security settings.
- Require secure transfer for REST API operations
- Enable infrastructure encryption
- Enable BLOB public access
- Enable storage account key access
- Default to Azure Active Directory authentication in the Azure portal
- Minimum TLS version
Data Lake Storage Gen2
- Enable hierarchical namespace
- Enable network file share
- Allow cross tenant replication
- Access tier
- Enable large file shares
Tables and Queues
- Enable support for customer-managed keys
Have a look at the below screenshot.
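Several of the Advanced options also appear as properties when a storage account is described in an ARM template instead of the portal. The sketch below builds such a description as a Python dictionary and prints it as JSON. Treat it as an assumption-laden illustration: the property names follow the `Microsoft.Storage/storageAccounts` resource schema as I understand it and should be verified against the current template reference, the API version and the `eastus` location are assumptions, and the values are examples rather than the defaults kept in this walkthrough.

```python
import json

# Illustrative resource description; values are examples, not portal defaults.
storage_account = {
    "type": "Microsoft.Storage/storageAccounts",
    "apiVersion": "2023-01-01",   # assumed API version
    "name": "mdsstorage1",
    "location": "eastus",          # hypothetical region
    "kind": "StorageV2",
    "sku": {"name": "Standard_LRS"},
    "properties": {
        "supportsHttpsTrafficOnly": True,   # Require secure transfer for REST API operations
        "minimumTlsVersion": "TLS1_2",      # Minimum TLS version
        "allowBlobPublicAccess": False,     # Enable BLOB public access
        "allowSharedKeyAccess": True,       # Enable storage account key access
        "isHnsEnabled": False,              # Enable hierarchical namespace
        "accessTier": "Hot",                # Access tier
    },
}

print(json.dumps(storage_account, indent=2))
```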
The third tab is Networking where we need to configure Network connectivity and Network routing settings.
The fourth tab is Data Protection, where we choose Recovery settings to protect data from accidental deletes and Tracking settings to manage versions.
The fifth tab is for Tags, and finally the Review + create tab validates all our settings before creating this storage account.
I recommend choosing each of the above settings carefully. Make sure you understand what you are choosing for your configuration. Almost all settings are self-explanatory. For this storage account, I have not made any changes and have kept all default configurations as is.
Storage account mdsstorage1 has been created as shown in the above screen. You can click on the “Go to resource” tab shown in the above image to switch to this storage overview dashboard page.
We can create any data storage option in the newly created storage account from the data storage section shown in the below image. Click on any option based on your requirement and create that type of Azure storage in this storage account. You can create multiple or mixed data storage services in the same storage account.
Microsoft Azure offers multiple options to store data. Azure storage is a Microsoft cloud solution to address data storage requirements. I have explained Azure storage, its features, and its various types in this article. I have also described how to choose the right storage solution for your workload. First, we need to identify the nature of the data and then its uses like how we are going to use that data. This will help us identify the right data storage solutions.