This article explains how to create an Azure Synapse Analytics workspace and introduces some of the features related to it.
In my previous article, Understanding Azure Synapse Analytics (formerly SQL DW), we learned how to get started with Azure Synapse Analytics. The first step toward a practical implementation is to create an Azure Synapse Analytics workspace, which provides an environment for accessing the different features of the service. In this article, we will learn how to create a workspace and how to access its various features once it is created.
Azure Synapse Analytics Workspace
At the time of writing, the Synapse workspace is in preview, so more features may be added to its user interface, and there may be minor changes to the interface before the service reaches general availability. Keep this in mind while following the exercise below.
We are going to create a new workspace. To get started, search for Azure Synapse Analytics in the search bar and open the service. You will land on the home page of this service, as shown below.
Assuming that this is the first time you are creating a workspace, you will find a blank page. Click on the Create Synapse workspace button to invoke the wizard that will lead us through the creation of a new workspace. The first page of the wizard looks as shown below. In the Basics section, we need to provide basic details, starting with the Azure Subscription and Resource Group. When you are creating your very first workspace, you may see a warning stating that the Synapse resource provider is not registered with your subscription, as shown below. Until the resource provider of a service is registered with a subscription, that service cannot be used within the subscription.
Click on the “Click here to register” button to register the Synapse resource provider with the subscription. Once it is registered, the warning no longer appears, as shown below.
The next step is to fill in the remaining configuration options. Provide the name of the resource group in which the workspace should be created. Then provide the details of the workspace itself, namely the Workspace name and the region in which the workspace should be created, as shown below.
Azure Synapse can access data from different repositories, and it can natively read data from Azure Data Lake Storage Gen2. For this, we need a storage account as well as a file system in which the files are stored. If you have already created an account, you can specify it by selecting it from the subscription or by manually entering the URL of the account. If you do not have an Azure Data Lake Storage Gen2 account in place, you need to create a new one. This can be done easily by clicking on the Create New button under Account Name, which will pop up a dialog as shown below.
Provide an appropriate name for the account and click on the OK button. This will create a new Azure Data Lake Storage Gen2 account. Under this account, we also need a file system in which we can store files. Click on the Create New button under the File System Name box, which will pop up a similar dialog. Provide an appropriate name for the file system and click on the OK button. This will create a new file system, as shown below.
To access data in the Azure Data Lake Storage Gen2 account, one needs at least Contributor-level access on this account. As explained in the information section, different Synapse features require this access to work with data in Azure Data Lake Storage Gen2. So, select the checkbox titled “Assign myself the Storage Blob Data Contributor role on the Data Lake Storage Gen2 account”, as shown below. Click on the Next button to move on to the Security + networking section.
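When an existing account is specified manually, the wizard asks for its URL. The following is a minimal Python sketch of how such a URL, and the abfss URI of a file inside a file system, can be composed. It assumes the Azure public cloud endpoint suffix dfs.core.windows.net and the general storage account naming rule (3 to 24 lowercase letters and digits). The account name synapsedemolake, the file system name rawdata and the file path are hypothetical values chosen only for illustration.

```python
import re

# Storage account names: 3-24 characters, lowercase letters and digits only.
_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")

# Assumption: Azure public cloud; other clouds use a different suffix.
_DFS_SUFFIX = "dfs.core.windows.net"


def adls_gen2_account_url(account_name: str) -> str:
    """Return the Data Lake Storage Gen2 endpoint URL for an account."""
    if not _ACCOUNT_NAME.fullmatch(account_name):
        raise ValueError(f"invalid storage account name: {account_name!r}")
    return f"https://{account_name}.{_DFS_SUFFIX}"


def abfss_uri(account_name: str, file_system: str, path: str = "") -> str:
    """Return the abfss:// URI of a path inside a file system."""
    adls_gen2_account_url(account_name)  # reuse the account name validation
    return f"abfss://{file_system}@{account_name}.{_DFS_SUFFIX}/{path.lstrip('/')}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    print(adls_gen2_account_url("synapsedemolake"))
    print(abfss_uri("synapsedemolake", "rawdata", "/sales/2020.csv"))
```

The first function gives the kind of account URL that can be typed into the wizard, while the abfss form is the one typically used later when reading files from Spark.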
In this step, we need to provide the administrator credentials that will be used to connect to the SQL pools. In Azure Synapse Analytics, two types of runtime can be created: the SQL runtime and the Spark runtime. Each runtime is accessed by creating pools. SQL pools are available in the form of SQL on-demand as well as the SQL Standard pool, and to access them we need to provide credentials for the SQL Standard pool, as shown below.
Apart from the credentials, we can integrate data pipelines (identical to the ones available in Azure Data Factory) with Azure Synapse Analytics. To allow these pipelines to use the managed identity provided by the workspace, so that they can access the SQL pools, select the checkbox titled “Allow pipelines (running as workspace’s system assigned identity) to access SQL pools”, as shown below.
The next section covers networking. If we want the traffic between the workspace and the data sources to always stay on the Azure internal network via Azure Private Link, rather than travel over the open internet, we can enable a Synapse-managed virtual network. To do so, click on “Enable managed virtual network”, as shown below. Keep in mind that this feature incurs a minor additional cost.
The other networking aspect to configure is which IP addresses can connect to this workspace. There is an option to allow all IP addresses to connect to the workspace by selecting “Allow connections from all IP addresses”. Selecting this option does not disable authentication; it only means that the workspace firewall will not filter connections by source IP address. This is required so that Azure client tools like Azure Synapse Studio, which is a web-based SaaS tool on Azure, can connect to this workspace. To restrict access to specific IPs, you can configure the firewall settings later. Once done, click on the Next button to move on to the next section.
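To illustrate how restricting IPs in the firewall settings works conceptually, here is a small Python sketch that checks whether a client address falls inside any configured start/end range. It is not the service's implementation; the FirewallRule type, the rule names and the address ranges (taken from the reserved documentation ranges) are hypothetical.

```python
from ipaddress import IPv4Address
from typing import Iterable, NamedTuple


class FirewallRule(NamedTuple):
    """A hypothetical firewall rule: a named, inclusive IPv4 range."""
    name: str
    start_ip: str
    end_ip: str


def is_allowed(client_ip: str, rules: Iterable[FirewallRule]) -> bool:
    """Return True if client_ip lies inside at least one rule's range."""
    ip = IPv4Address(client_ip)
    return any(
        IPv4Address(rule.start_ip) <= ip <= IPv4Address(rule.end_ip)
        for rule in rules
    )


if __name__ == "__main__":
    # Hypothetical rules using documentation-only address ranges.
    rules = [
        FirewallRule("office", "203.0.113.10", "203.0.113.20"),
        FirewallRule("vpn", "198.51.100.0", "198.51.100.255"),
    ]
    for candidate in ("203.0.113.15", "203.0.113.21", "198.51.100.7"):
        print(candidate, is_allowed(candidate, rules))
```

A check like this can be handy for verifying a planned set of ranges against known client addresses before entering them in the firewall settings.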
In this section, we can add Tags to attach metadata to this workspace instance, as shown below.
We have now provided all the details required to create the Azure Synapse workspace. Click on the Next button to navigate to the Summary section, where you can review all the details that we have specified so far, as shown below.
Another important point to note is that when the workspace is created, the SQL on-demand pool is created and provisioned by default. The estimated cost of a SQL on-demand pool is $5/TB of data scanned. The rest of the pools need to be created explicitly as and when required. Review the details and click on the Create button to start the creation of the workspace.
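As a rough illustration of the $5/TB estimate mentioned above, the following Python sketch converts a volume of scanned data into an approximate charge. It assumes 1 TB = 10^12 bytes and ignores any minimum charge or rounding the service may apply, so treat the result as a ballpark figure only. The function name is hypothetical.

```python
PRICE_PER_TB_USD = 5.0     # estimated rate quoted in the wizard summary
BYTES_PER_TB = 10 ** 12    # assumption: decimal terabytes


def estimate_scan_cost_usd(bytes_scanned: int) -> float:
    """Return the approximate cost in USD for the given bytes scanned."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must not be negative")
    return bytes_scanned / BYTES_PER_TB * PRICE_PER_TB_USD


if __name__ == "__main__":
    # A query that scans 250 GB costs roughly a quarter of the per-TB rate.
    cost = estimate_scan_cost_usd(250 * 10 ** 9)
    print(f"${cost:.2f}")  # prints "$1.25"
```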
Within a short time, the Azure Synapse workspace should be created, as shown below. Click on the Go to resource button to open the workspace and land on the dashboard page.
The workspace dashboard page looks as shown below. You can view the different properties and endpoints from this page. You can also create new pools, reset your credentials, change the firewall settings to allow access from desired IP ranges, open Synapse Studio, and perform monitoring and administrative operations from this workspace.
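The endpoints listed on the dashboard are derived from the workspace name. The Python sketch below shows the naming pattern these endpoints typically follow in the Azure public cloud; this pattern is an assumption on my part, and the values shown on your own dashboard are the authoritative ones. The workspace name synapse-demo-ws is hypothetical.

```python
from typing import Dict


def workspace_endpoints(workspace_name: str) -> Dict[str, str]:
    """Return the endpoints a workspace typically exposes (public cloud pattern)."""
    return {
        # Endpoint for provisioned SQL pools.
        "sql": f"{workspace_name}.sql.azuresynapse.net",
        # Endpoint for the SQL on-demand pool created with the workspace.
        "sql_on_demand": f"{workspace_name}-ondemand.sql.azuresynapse.net",
        # Development endpoint used by Synapse Studio and client tools.
        "development": f"https://{workspace_name}.dev.azuresynapse.net",
    }


if __name__ == "__main__":
    # Hypothetical workspace name, for illustration only.
    for kind, endpoint in workspace_endpoints("synapse-demo-ws").items():
        print(f"{kind}: {endpoint}")
```

The SQL endpoints are the server names a SQL client would use together with the administrator credentials provided during creation.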
In this way, one can create an Azure Synapse Workspace, which acts as the central console to access a wide variety of tools and features related to Azure Synapse Analytics.
In this article, we learned about the Azure Synapse Analytics workspace and walked through the different settings and options that we can provide to create a new workspace. We also learned how to configure integration options such as managed networking and an Azure Data Lake Storage Gen2 account.