Dinesh Asanka

Recommender Systems for Customer Reviews

July 29, 2021 by Dinesh Asanka


Over the last couple of articles, we have discussed several aspects of Text Analytics in Azure Machine Learning. In this article, we will discuss Recommender Systems for Customer Reviews.

Before that, we discussed the most popular machine learning techniques in Azure Machine Learning, using different sample datasets, including data accessed from Azure SQL Database. These techniques were:
Regression Analysis
Classification Analysis
Clustering
Recommender Systems
Time Series Anomaly Detection

We also discussed several data engineering techniques that are important processes in machine learning:
Basic data cleaning
Feature selection
Principal Component Analysis
Comparing models
Cross-validation
Tuning model hyperparameters

In the first article on Text Analytics, we discussed language detection and text preprocessing in detail, which organize textual data for better analytics. We then looked at how to recognize named entities. In the last couple of articles of the series, we discussed how to perform Filter Based Feature Selection in Text Analytics and Latent Dirichlet Allocation.

This article will combine the discussions of Recommender Systems and Latent Dirichlet Allocation to build Recommender Systems for Customer Reviews.

Azure Machine Learning Experiment

Before we move into the details of Recommender Systems for Customer Reviews, you can download and view the full Azure Machine Learning experiment from here. We will not introduce any new controls, because all the controls in this experiment were discussed in previous articles.




Amazon review dataset: A sample dataset from Amazon was used.
Select Columns in Dataset: Remove unnecessary columns to improve readability.
Detect Languages: Detect English-language text and eliminate text in other languages.
Split Data: Split the data on a criterion (to keep only English reviews) and with the Recommender Split mode (to create training and test data).
Enter Data Manually: Enter the stop words for the reviews.
Preprocess Text: Preprocess the text so that the content can be modelled.
Clean Missing Data: Remove empty rows, if any exist.
Latent Dirichlet Allocation: Apply LDA to the dataset to build topics for each review.
Apply SQL Transformation: Calculate averages for users and products.
Train Matchbox Recommender: Train the recommender model.
Score Matchbox Recommender: Provide the recommendations.
Evaluate Recommender: Evaluate the recommendations.


This exercise needs a real-world dataset to examine the features of Recommender Systems for Customer Reviews. Therefore, a sample dataset was downloaded from https://jmcauley.ucsd.edu/data/amazon/. Please note that this data is in JSON format. It must be converted to CSV, because Azure Machine Learning does not support JSON as an input format.

You can convert the compressed (gzipped) JSON file to a CSV file by using an R or Python script. In this example, the JSON file was converted to CSV with a tool.
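The script option could look like the following minimal Python sketch. It assumes the downloaded file is gzipped and holds one review record per line, with the fields described below (reviewerID, asin, helpful, reviewText, overall). Some older files on that page store each record as a Python literal rather than strict JSON, so the sketch falls back to `ast.literal_eval`. The function name `convert_reviews` is hypothetical.

```python
import ast
import csv
import gzip
import json

# Output columns; the two-element "helpful" field is split into helpful_0 and helpful_1
FIELDS = ["reviewerID", "asin", "helpful_0", "helpful_1", "reviewText", "overall"]


def parse_line(line):
    """Parse one review record, falling back to a Python literal for non-strict JSON."""
    try:
        return json.loads(line)
    except json.JSONDecodeError:
        return ast.literal_eval(line)


def convert_reviews(in_path, out_path):
    """Convert a gzipped, one-record-per-line review file to CSV; return rows written."""
    count = 0
    with gzip.open(in_path, "rt", encoding="utf-8") as src:
        with open(out_path, "w", newline="", encoding="utf-8") as dst:
            writer = csv.DictWriter(dst, fieldnames=FIELDS)
            writer.writeheader()
            for line in src:
                if not line.strip():
                    continue  # skip blank lines
                record = parse_line(line)
                helpful = record.get("helpful", [0, 0])
                writer.writerow({
                    "reviewerID": record.get("reviewerID", ""),
                    "asin": record.get("asin", ""),
                    "helpful_0": helpful[0],
                    "helpful_1": helpful[1],
                    "reviewText": record.get("reviewText", ""),
                    "overall": record.get("overall", ""),
                })
                count += 1
    return count
```

For example, `convert_reviews("reviews_sample.json.gz", "reviews.csv")` writes a CSV that can then be uploaded as a dataset. Both file names are hypothetical.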

Let us look at a sample of the dataset.

Sample of the selected dataset in order to build Recommender Systems for Customer Reviews

In this review data, reviewerID is the user ID and asin is the product ID. The helpful_0 and helpful_1 columns are the two parts of the review's helpfulness rating. For example, a rating of 2/3 gives helpful_0 = 2 and helpful_1 = 3. The reviewText attribute is the review text entered by the user, and the overall attribute is the rating column.

The hybrid recommender in Azure Machine Learning (Matchbox) takes three inputs to find better recommendations.

The first input is the user-item-rating triplet. In this example, that is reviewerID-asin-overall.

The second input is the user dataset. Here, reviewerID is the user ID, and the helpful_0 and helpful_1 attributes describe the user profile. One user can have multiple helpful values for different products, so the helpful attributes need to be aggregated for each user. If you have other data about users, such as designation, marital status, and gender, you can also use it to build the recommender system for customer reviews.

The third input is the product dataset, for which we will use the review text. With Latent Dirichlet Allocation, the review text can be converted to multiple topics, as we did in the previous article.
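The split into three inputs can be sketched in Python as follows. This is an illustration only. It assumes each review row is a dictionary keyed by the column names above. In the experiment itself, the selection is done with Select Columns in Dataset. The function name `split_inputs` is hypothetical.

```python
from typing import Dict, List


def split_inputs(rows: List[Dict[str, str]]):
    """Split review rows into the three inputs of a hybrid recommender."""
    # Input 1: user-item-rating triplets, columns kept in user, item, rating order
    ratings = [(r["reviewerID"], r["asin"], float(r["overall"])) for r in rows]
    # Input 2: raw user features, still one row per review (aggregated per user later)
    users = [(r["reviewerID"], int(r["helpful_0"]), int(r["helpful_1"])) for r in rows]
    # Input 3: raw product text, to be turned into topics and aggregated per product
    items = [(r["asin"], r["reviewText"]) for r in rows]
    return ratings, users, items
```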

Select Data for Ratings, User and Products in Azure Machine Learning.

As shown in the above figure, the data is selected to suit the three inputs of the Recommender System for Customer Reviews. Please remember to connect the datasets to the inputs of Train Matchbox Recommender in the order Ratings-User-Product. Also keep the columns of the first input in user-item-rating order.

reviewerID-asin-overall triplets

As mentioned, the first input contains the reviewerID-asin-overall triplets shown in the above figure. For the second input, we need the Select Columns in Dataset and Apply SQL Transformation controls, as shown below.

Applying transformation for the user dataset.

Following is the SQL query used for the Apply SQL Transformation control.

After the transformation, the output is as shown in the figure below.

The Output of the SQL Transformation.
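The following Python sketch illustrates this kind of per-user aggregation. The query in it is a hypothetical example, not necessarily the one used in the experiment. It averages helpful_0 and helpful_1 per reviewer, in line with the averages listed in the experiment overview. It runs in an in-memory SQLite database, because Apply SQL Transformation uses SQLite and names its first input table t1. The function name `aggregate_users` is hypothetical.

```python
import sqlite3

# Hypothetical query: one row per reviewer with averaged helpful values.
# Apply SQL Transformation exposes its first input as table t1.
USER_QUERY = """
SELECT reviewerID,
       AVG(helpful_0) AS helpful_0,
       AVG(helpful_1) AS helpful_1
FROM t1
GROUP BY reviewerID
ORDER BY reviewerID
"""


def aggregate_users(user_rows):
    """Collapse per-review (reviewerID, helpful_0, helpful_1) rows to one row per reviewer."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE t1 (reviewerID TEXT, helpful_0 REAL, helpful_1 REAL)")
        con.executemany("INSERT INTO t1 VALUES (?, ?, ?)", user_rows)
        return con.execute(USER_QUERY).fetchall()
    finally:
        con.close()
```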

The third input provides the product details, for which we will use the customer reviews. The following figure shows how to generate this dataset.

Applying Text Analytics for Customer Review Data.

After selecting the necessary attributes with Select Columns in Dataset, we need to keep only the reviews in English. In this dataset, the Detect Languages control identified 12 rows in Spanish. After those rows are eliminated with the Split Data control, we apply Preprocess Text, which removes special characters, URLs, numbers, and stop words. The stop words can be configured with the Enter Data Manually control. Clean Missing Data is then used to remove empty rows, because preprocessing can leave some rows empty.
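The cleaning steps can be approximated in plain Python. This is a rough regex-based sketch, not the Preprocess Text implementation. It works as follows:

1. Lowercase the text.
2. Remove URLs, digits, and special characters.
3. Drop stop words from a caller-supplied set, which stands in for Enter Data Manually.
4. Discard rows that become empty, as Clean Missing Data does here.

The function names are hypothetical.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits and special characters


def preprocess(text, stop_words):
    """Lowercase, strip URLs, digits and special characters, then drop stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    tokens = [t for t in text.split() if t not in stop_words]
    return " ".join(tokens)


def clean_reviews(reviews, stop_words):
    """Preprocess (asin, text) pairs and drop rows whose text ends up empty."""
    cleaned = []
    for asin, text in reviews:
        processed = preprocess(text, stop_words)
        if processed:  # mimics Clean Missing Data removing empty rows
            cleaned.append((asin, processed))
    return cleaned
```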

After the data is cleaned, Latent Dirichlet Allocation is used to convert each review to five topics. Then the necessary columns are selected and the following is the output.

Latent Dirichlet Allocation is used to convert each review to five topics, followed by column selection.
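To show what converting each review to five topics means, here is a small LDA sketch in Python that uses collapsed Gibbs sampling. It is a swapped-in technique for illustration only; the Azure module has its own implementation and settings. The input is a list of tokenized reviews, and the output is one five-value topic-proportion vector per review, comparable to the topic columns in the figure above. The function name and the alpha, beta, and iteration defaults are assumptions.

```python
import random


def lda_gibbs(docs, num_topics=5, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d}))}
    V = len(vocab)
    ndk = [[0] * num_topics for _ in docs]       # document-topic counts
    nkw = [[0] * V for _ in range(num_topics)]   # topic-word counts
    nk = [0] * num_topics                        # tokens assigned to each topic
    z = []                                       # topic assignment of every token
    # Random initial assignment of a topic to every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][vocab[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = vocab[w], z[d][i]
                # Remove the current token from the counts
                ndk[d][k] -= 1
                nkw[k][wid] -= 1
                nk[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant
                weights = [(ndk[d][t] + alpha) * (nkw[t][wid] + beta) / (nk[t] + V * beta)
                           for t in range(num_topics)]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wid] += 1
                nk[k] += 1
    # Smoothed document-topic proportions; each row sums to 1
    return [[(ndk[d][t] + alpha) / (len(doc) + num_topics * alpha) for t in range(num_topics)]
            for d, doc in enumerate(docs)]
```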

As with users, there can be multiple rows for each product, but the recommender needs a unique row per product. Therefore, we need to aggregate the data with Apply SQL Transformation by using the following SQL query.

You can use other aggregation functions, such as AVG, MIN, MAX, and SUM, to suit the Recommender System for Customer Reviews. As with users, if product details such as color, brand, and category are available, they can be used to provide better recommendations.
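The per-product aggregation can be sketched in Python as follows. It assumes AVG is the aggregation function and that each review has already been reduced to a list of topic values. It is equivalent in spirit to a `GROUP BY asin` query with `AVG` over each topic column. The function name `aggregate_products` is hypothetical.

```python
from collections import defaultdict


def aggregate_products(item_topics):
    """Average the topic vectors of all reviews of each product (one row per asin)."""
    sums = {}
    counts = defaultdict(int)
    for asin, topics in item_topics:
        if asin not in sums:
            sums[asin] = [0.0] * len(topics)
        sums[asin] = [s + t for s, t in zip(sums[asin], topics)]
        counts[asin] += 1
    return {asin: [s / counts[asin] for s in vec] for asin, vec in sums.items()}
```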

With this configuration, all the inputs for the Train Matchbox Recommender control are ready. The Score Matchbox Recommender control is then added to generate the recommendations.

Recommender controls in Azure Machine Learning.

In order to receive recommendations in the Recommender system for Customer Reviews, we have configured Score Matchbox Recommender control as below.

Configurations for Score Matchbox Recommender control

In the above configuration, we are looking for item recommendations with a maximum of five items and a minimum of two items as shown below.

Recommendations of Products for each user.

As shown in the above figure, a maximum of five items is recommended for each user. A web service output can be connected to the output of Score Matchbox Recommender so that this experiment can be deployed.
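The ranking step behind these settings can be sketched as follows. This is a hypothetical simplification, not the Matchbox scoring logic. It assumes predicted scores are already available per user and item, and it returns at most `max_items` items ordered by score. It also interprets the minimum of two as the smallest candidate pool a user must have to receive recommendations; that interpretation is an assumption.

```python
def recommend(scores, max_items=5, min_pool=2):
    """Return up to max_items item IDs per user, highest predicted score first.

    scores: {user_id: {item_id: predicted_score}} (hypothetical structure).
    Users with fewer than min_pool candidate items get no recommendations
    (an assumed interpretation of the minimum setting).
    """
    result = {}
    for user, item_scores in scores.items():
        if len(item_scores) < min_pool:
            continue
        # Sort by descending score; ties broken by item ID for a stable order
        ranked = sorted(item_scores, key=lambda item: (-item_scores[item], item))
        result[user] = ranked[:max_items]
    return result
```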

As in any predictive project, we need to measure the accuracy of the outcome. For recommender systems, Azure Machine Learning provides the normalized discounted cumulative gain (NDCG) measure.
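For reference, a common form of this measure is NDCG = DCG / IDCG. DCG sums rel_i / log2(i + 1) over the ranked positions i, and IDCG is the DCG of the ideal ordering. The Python sketch below implements that form. Whether Azure Machine Learning uses exactly this gain and discount is an assumption, and using the true ratings as relevance values is an illustrative choice.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1) for ranks i = 1..n."""
    return sum(rel / math.log2(rank + 1) for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances):
    """DCG of the given ranking divided by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


def mean_ndcg(per_user_relevances):
    """Average NDCG over users, e.g. true ratings listed in recommended order."""
    values = [ndcg(r) for r in per_user_relevances]
    return sum(values) / len(values) if values else 0.0
```

A ranking that is already in ideal order scores 1.0, so a value such as 0.963 means the recommended order is close to the ideal one.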

NDCG score for the Recommender System

As you can see from the above screen, the NDCG is 96.30%. This indicates high accuracy for the Recommender System for Customer Reviews: the ranking of the recommended items is close to the ideal ranking.


In this article of the Azure Machine Learning series, we discussed how to build Recommender Systems for Customer Reviews. We used the Matchbox recommender controls together with Latent Dirichlet Allocation to produce recommendations in Azure Machine Learning. We also used the NDCG score to evaluate the Recommender System for Customer Reviews.


Following are the references for the dataset:

  • Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering – R. He, J. McAuley, WWW, 2016
  • Image-based recommendations on styles and substitutes – J. McAuley, C. Targett, J. Shi, A. van den Hengel, SIGIR, 2015

Table of contents

Introduction to Azure Machine Learning using Azure ML Studio
Data Cleansing in Azure Machine Learning
Prediction in Azure Machine Learning
Feature Selection in Azure Machine Learning
Data Reduction Technique: Principal Component Analysis in Azure Machine Learning
Prediction with Regression in Azure Machine Learning
Prediction with Classification in Azure Machine Learning
Comparing models in Azure Machine Learning
Cross Validation in Azure Machine Learning
Clustering in Azure Machine Learning
Tune Model Hyperparameters for Azure Machine Learning models
Time Series Anomaly Detection in Azure Machine Learning
Designing Recommender Systems in Azure Machine Learning
Language Detection in Azure Machine Learning with basic Text Analytics Techniques
Azure Machine Learning: Named Entity Recognition in Text Analytics
Filter based Feature Selection in Text Analytics
Latent Dirichlet Allocation in Text Analytics
Recommender Systems for Customer Reviews
AutoML in Azure Machine Learning
AutoML in Azure Machine Learning for Regression and Time Series
Building Ensemble Classifiers in Azure Machine Learning
