Azure Machine Learning (also known as Azure ML) is Microsoft's cloud-based machine learning solution. It is a fully managed cloud service for creating and training predictive analytics solutions. Another advantage of Azure ML is that you can access and modify your machine learning models from anywhere with the help of Microsoft Azure Machine Learning Studio.
We can sign in to Azure Machine Learning Studio in a web browser and create, change and train machine learning experiments. We can also convert a training experiment to a predictive experiment and publish it as a web service. In my opinion, the major feature of Azure ML Studio is drag-and-drop, because it allows us to create and connect analysis modules very easily and to write very little code. Another key feature of Azure ML is R module support.
According to Microsoft documentation, Azure ML supports more than 400 R packages. This gives us access to a large collection of R functions and increases the power of Azure ML. The main advantages of Azure ML can be summarized as follows:
- A fully-managed cloud-based service
- An interactive, easily accessible workspace in Microsoft Azure Machine Learning Studio
- R and Python support
- Easy deployment and usage with web service support
After this brief introduction to Azure ML, we can focus on the main idea of this article: how to use JSON data as a dataset in an Azure ML experiment. Data is an indispensable part of machine learning experiments. It is the essential input, because the algorithm selected for the experiment processes this dataset to produce its output. JSON, in turn, is one of the most popular key-value data interchange formats, and a great number of applications use it.
In the article How to use JSON data in SSRS, you can find detailed information about JSON and the JSON data structure.
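To make the key-value idea concrete, here is a minimal sketch of how the jsonlite package (the one used later in this article) reads JSON text. The two sample records are made up for illustration and are not taken from any dataset in this article.

```r
library(jsonlite)

# A single JSON object: its key-value pairs become a named list.
one_post <- fromJSON('{"id": 1, "title": "first post"}')

# An array of objects with the same keys becomes a data frame,
# with one row per object and one column per key.
posts <- fromJSON('[
  {"id": 1, "title": "first post"},
  {"id": 2, "title": "second post"}
]')

print(one_post$title)
print(posts)
```

The second form, an array of similar objects, is the shape that maps most naturally to a dataset.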
How to import data to Azure ML?
The Import Data module is the main module for loading data from external sources into Azure ML, but it has no option that supports JSON. It supports the following data sources, and none of them is a provider for JSON data.
- Web URL via HTTP
- Hive Query
- Azure SQL Database
- Azure Table
- Import from Azure Blob Storage
- Data Feed Providers
- Import from On-Premises SQL Server Database
- Azure Cosmos DB
Microsoft recommends using the Execute Python Script or Execute R Script module if we need to import data from JSON. In the following demonstrations, we will use the Execute R Script module, which is used to execute R script code in Azure ML Studio.
The Execute R Script module has three input ports: Dataset1, Dataset2 and Script Bundle. With the help of the Dataset1 and Dataset2 inputs, you can import data into the Execute R Script module. The Script Bundle port takes a zip file, and this zip file can contain several file types. In our demonstration, we will use the Script Bundle port to load JSON.
On the right side of the screen, you can find the R Script text box, where we write the R script code to execute.
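When working with the Script Bundle port, it can help to check which files actually reached the script. The following is a minimal sketch that assumes the contents of the zip file are available in a directory named src relative to the script, which is the location the script in this article reads from. The variable names are only illustrative.

```r
# Assumption: the Script Bundle zip contents are available under "src".
bundle_dir <- "src"

# List every file found in the bundle directory, including subfolders.
bundle_files <- list.files(bundle_dir, recursive = TRUE)

if (length(bundle_files) == 0) {
  message("No files found under '", bundle_dir,
          "'. Is a zip file connected to the Script Bundle port?")
} else {
  print(bundle_files)
}
```

Outside Azure ML Studio the directory usually does not exist, so the sketch simply reports that no files were found.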
How to upload and use a JSON file as a dataset in Azure Machine Learning
In this part, we will demonstrate how to import a zipped JSON file into Azure ML. Imagine that we have a JSON data file and we want to use it as a dataset for Azure ML. You can download the sample JSON data from JSONPlaceholder, and then you have to zip this JSON file.
Select the Dataset tab in Azure ML Studio and click (+) NEW.
Select FROM LOCAL FILE.
Select the path of the zipped JSON file and choose Zip File under SELECT A TYPE FOR THE NEW DATASET.
Click the check icon.
After the upload succeeds, you will get the notification Upload of the dataset 'SampleJSONData.zip' has completed, and the SampleJSONData file appears in the MY DATASETS tab.
Select the Experiments tab and click (+) NEW.
Choose the Blank Experiment option and then create a new experiment.
Find SampleJSONData.zip, then drag and drop it onto the design panel.
Drag and drop the Execute R Script module.
Connect the SampleJSONData.zip output to the Script Bundle port of the Execute R Script module.
Paste the following R script code into the R Script text box.
library(jsonlite)
myjsondata <- fromJSON("src/posts.txt")
maml.mapOutputPort("myjsondata")
Now, we will explain the above R script code line by line.
library(jsonlite): This line allows us to use the jsonlite package of R.
myjsondata <- fromJSON("src/posts.txt"): This line loads the posts.txt JSON file and assigns the result to the myjsondata variable. The "src/posts.txt" part is the path to posts.txt in the src directory, where the contents of our zip file are made available.
maml.mapOutputPort("myjsondata"): This line sends the myjsondata variable to the Result Dataset output port of the Execute R Script module.
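The reading step can also be rehearsed on a local machine before uploading anything. The sketch below recreates a src/posts.txt layout in a temporary folder and reads it with the same fromJSON call. The two sample records only imitate the shape of the JSONPlaceholder posts, and maml.mapOutputPort is left out because it is only available inside Azure ML Studio.

```r
library(jsonlite)

# Made-up sample that imitates the shape of the JSONPlaceholder posts.
sample_json <- '[
  {"userId": 1, "id": 1, "title": "first post",  "body": "hello"},
  {"userId": 1, "id": 2, "title": "second post", "body": "world"}
]'

# Recreate the src/posts.txt layout in a temporary working folder.
work_dir <- tempfile("bundle_")
dir.create(file.path(work_dir, "src"), recursive = TRUE)
posts_path <- file.path(work_dir, "src", "posts.txt")
writeLines(sample_json, posts_path)

# Same call as in the experiment, pointed at the temporary copy.
myjsondata <- fromJSON(posts_path)

# Inspect the column names and types before relying on them.
str(myjsondata)
```

If the structure printed by str() is a data frame with the expected columns, the same file should be a reasonable candidate for the zipped dataset.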
Run the experiment, then right-click the Result Dataset port of the Execute R Script module and select Visualize.
Finally, we have read the JSON data from the zip file and converted it to a format that is usable in Azure ML experiments. Next, we will get JSON data from a web site.
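The posts data used here is flat, so every key becomes one plain column. If your own JSON contains nested objects, the parsed data frame will contain nested columns. The following sketch shows the flatten argument of fromJSON, which turns them into ordinary columns. The nested sample is hypothetical, and whether the Result Dataset port needs this step is an assumption that this article does not test.

```r
library(jsonlite)

# Hypothetical nested records; not part of the article's dataset.
nested_json <- '[
  {"id": 1, "name": "Ann", "address": {"city": "Ankara", "zip": "06000"}},
  {"id": 2, "name": "Bob", "address": {"city": "Izmir",  "zip": "35000"}}
]'

# Default parsing: "address" is itself a data frame inside the data frame.
nested <- fromJSON(nested_json)

# flatten = TRUE: nested keys become columns named like address.city.
flat <- fromJSON(nested_json, flatten = TRUE)

print(names(nested))
print(names(flat))
```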
How to get JSON data from a web site in Azure Machine Learning?
This option is very similar to the previous demonstration; we only change the source of the JSON data. In the previous demonstration we used a zip file, while in this demonstration we will use JSON data served by a web site.
Modify the fromJSON line of the R script as follows.
myjsondata <- fromJSON("https://jsonplaceholder.typicode.com/posts")
Run the experiment.
When you visualize the Result Dataset of the Execute R Script module, you can see that it contains the JSONPlaceholder posts data.
The image below compares the raw JSON form with the visualized form.
In this article, we learned in detail how to use JSON data in Azure Machine Learning. The solution is based on the R support of Azure Machine Learning. Azure Machine Learning is definitely bringing its A-game with R and Python support.
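Reading from a web site can fail, for example when the site is unreachable or returns something that is not JSON. The following is a minimal sketch of a defensive wrapper around fromJSON. The helper name read_json_or_empty is hypothetical, and returning an empty data frame on failure is only one possible choice. The checks at the bottom use inline text instead of a URL so that they run without network access.

```r
library(jsonlite)

# Hypothetical helper: returns the parsed data frame,
# or an empty data frame if reading or parsing fails.
read_json_or_empty <- function(json_source) {
  result <- tryCatch(
    fromJSON(json_source),
    error = function(e) {
      message("Could not read JSON: ", conditionMessage(e))
      NULL
    }
  )
  if (is.data.frame(result)) result else data.frame()
}

# Offline checks with inline text instead of a URL.
ok  <- read_json_or_empty('[{"id": 1, "title": "first post"}]')
bad <- read_json_or_empty("this is not JSON")
print(nrow(ok))
print(nrow(bad))

# In the experiment, the call would take the URL used in this article:
# myjsondata <- read_json_or_empty("https://jsonplaceholder.typicode.com/posts")
```

How the later modules of an experiment react to an empty data frame is not covered here, so stopping with a clear error message may be the better choice in a real experiment.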
Esat Erkec has focused most of his career on SQL Server database administration and development. His current interests are database administration and Business Intelligence.