Recommender Systems for Customer Reviews

Introduction

So far in this series, the last couple of articles have covered several Text Analytics topics in Azure Machine Learning. In this article, we will discuss Recommender Systems for Customer Reviews.

Before this article, we discussed the most popular machine learning techniques, such as Regression analysis, Classification analysis, Clustering, Recommender Systems, and Anomaly detection of Time Series, in Azure Machine Learning by using different sample datasets, including data accessed from Azure SQL Database. Further, we discussed data engineering techniques that are important processes in Machine Learning: basic cleaning techniques, feature selection techniques, Principal Component Analysis, Comparing Models, Cross-Validation, and Tune Model Hyperparameters.

In the first article on Text Analytics, we had a detailed discussion on Language detection and Preprocessing of text in order to organize textual data for better analytics and how to recognize Named entities in Text Analytics. In the last couple of articles of the series, we have discussed how to perform Filter Based Feature Selection in Text Analytics and Latent Dirichlet Allocation.

This article will combine the discussions of Recommender Systems and Latent Dirichlet Allocation to build Recommender Systems for Customer Reviews.

Azure Machine Learning Experiment

Before we move into the details of Recommender Systems for Customer Reviews, you can download and view the complicated Azure Machine Learning experiment from here. We will not introduce any new controls, as all the controls of this experiment were discussed in previous articles.

Control	Purpose
Dataset	Sample data set from Amazon was used.
Select Columns in Dataset	Remove unnecessary columns to improve readability.
Detect Languages	Detect English language text and remove text in other languages.
Split Data	Split data on a criterion and with the Recommender Split mode
Enter Data Manually	Enter the stop words for the reviews
Preprocess Text	Preprocess the text so that the content can be modelled
Clean Missing Data	Remove empty rows, if any exist
Latent Dirichlet Allocation	Apply LDA for the dataset to build topics for each review
Apply SQL Transformation	Calculate Averages for users and products
Train Matchbox Recommender	Training the Recommender model
Score Matchbox Recommender	Provide the recommendations
Evaluate Recommender	Evaluate the recommendations

Dataset

This exercise needs a real-world data set to examine the features of Recommender systems for Customer Reviews. Therefore, a sample dataset was downloaded from https://jmcauley.ucsd.edu/data/amazon/. Please note that this has data in JSON format which must be converted to CSV as the JSON data format is not supported by Azure Machine Learning.

However, you can convert the compressed JSON file to a CSV file with an R or Python script. In this example, the JSON file was converted to CSV by a tool.
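If you prefer a script over a tool, the conversion can be sketched in Python as follows. This is a minimal sketch, not the method used for this article. It assumes the file is gzip-compressed with one JSON object per line, and it flattens list values such as helpful into helpful/0 and helpful/1 columns to match the column names used in the SQL query later in this article. The function names are illustrative.

```python
import csv
import gzip
import json


def flatten(record):
    """Flatten one review: list values become name/0, name/1, ... columns."""
    flat = {}
    for key, value in record.items():
        if isinstance(value, list):
            for i, item in enumerate(value):
                flat[f"{key}/{i}"] = item
        else:
            flat[key] = value
    return flat


def json_gz_to_csv(src_path, dst_path):
    """Convert a gzipped file with one JSON object per line to CSV.

    Assumption: the whole file fits in memory, which is fine for a sample.
    Returns the number of rows written.
    """
    with gzip.open(src_path, "rt", encoding="utf-8") as src:
        rows = [flatten(json.loads(line)) for line in src if line.strip()]

    # Collect column names in first-seen order, since records may differ.
    columns = []
    for row in rows:
        for name in row:
            if name not in columns:
                columns.append(name)

    with open(dst_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=columns)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

For example, json_gz_to_csv("reviews.json.gz", "reviews.csv") would write one CSV row per review, where both file names are placeholders for your own paths.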

Let us look at a sample of the dataset.

In this review data, reviewerID is the user ID and asin is the product ID. The helpful_0 and helpful_1 attributes are the helpful options selected by the user (they appear as helpful/0 and helpful/1 in the SQL query later in this article). The reviewText attribute is the review text entered by the user. The overall attribute is the rating column.

The Hybrid Recommender in Azure Machine Learning takes three inputs to find better recommendations. The first input is the user-item-rating triplet, which in this example is reviewerID-asin-overall. The second input is the user dataset. Here, reviewerID is the user ID, while the helpful_0 and helpful_1 attributes are relevant to the user profile. Since one user can provide different helpful values for different products, we need to aggregate the helpful attributes for each user. If you have other user data such as designation, marital status, or gender, you can use it to build a recommender system for Customer Reviews. The third input is the product dataset, for which we will use the review data. By using Latent Dirichlet Allocation, the review text can be converted to multiple topics, as we did in the previous article.

As shown in the above figure, data is selected to suit the three inputs of the Recommender System for Customer Reviews. Please remember that the first input of the recommender training must have its columns in the order user, item, rating.

As mentioned, the first input has reviewerID-asin-overall triplets as shown in the above figure. For the second input, we need the Select Columns in Dataset and Apply SQL Transformation controls as shown below.
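To make the shape of the first input concrete, the following Python sketch builds user-item-rating triplets from CSV rows. The sample rows and the to_triplets helper are made up for illustration. In the experiment itself, this selection is done with the Select Columns in Dataset control.

```python
import csv
import io

# Hypothetical sample rows that mimic the converted review CSV.
SAMPLE_CSV = """reviewerID,asin,helpful/0,helpful/1,reviewText,overall
U1,P1,2,4,Great phone,5.0
U2,P1,1,1,Stopped working,2.0
"""


def to_triplets(rows):
    """Return (user, item, rating) tuples, in that column order."""
    return [(r["reviewerID"], r["asin"], float(r["overall"])) for r in rows]


triplets = to_triplets(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for triplet in triplets:
    print(triplet)
```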

Following is the SQL query used for the Apply SQL Transformation control.

SELECT reviewerID,
       AVG([helpful/0]) AS Helpfull0,
       AVG([helpful/1]) AS Helpfull1
FROM t1
GROUP BY reviewerID;

After the transformation, the output would be as shown in the below figure.
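If you want to try the per-user aggregation outside Azure Machine Learning, the same query can be run against an in-memory SQLite database from Python. This is only a sketch. The rows are made up, and an ORDER BY clause is added so that the printed output has a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE t1 (reviewerID TEXT, asin TEXT, "helpful/0" REAL, "helpful/1" REAL)'
)
# Hypothetical rows: user U1 reviewed two products, user U2 reviewed one.
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?, ?)",
    [
        ("U1", "P1", 2, 4),
        ("U1", "P2", 0, 2),
        ("U2", "P1", 1, 1),
    ],
)

query = """
SELECT reviewerID,
       AVG([helpful/0]) AS Helpfull0,
       AVG([helpful/1]) AS Helpfull1
FROM t1
GROUP BY reviewerID
ORDER BY reviewerID;
"""
# One row per user, so the result can serve as the user feature input.
user_features = conn.execute(query).fetchall()
for row in user_features:
    print(row)
```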

The third input is used for Product details and we will be using Customer reviews for this purpose. The following figure shows how to generate the dataset.

After selecting the necessary attributes with Select Columns in Dataset, we need to keep only the reviews written in English. In this dataset, the Detect Languages control identified 12 rows as Spanish. After eliminating those rows with the Split Data control, we apply Preprocess Text. With this control, we remove special characters, URLs, numbers, and stop words. The stop words are configured with the Enter Data Manually control. Finally, Clean Missing Data is used to remove empty rows, because preprocessing can leave some reviews empty.
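The cleaning steps described above can be approximated in plain Python as follows. This is a rough sketch and not what the Preprocess Text control does internally. The stop word list, the regular expressions, and the sample reviews are assumptions made for illustration.

```python
import re

# Hypothetical stop word list; in the experiment it comes from Enter Data Manually.
STOP_WORDS = {"the", "a", "is", "and", "it", "this", "to", "of"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")  # digits, punctuation, special characters


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, then strip URLs, numbers, special characters, and stop words."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    tokens = [token for token in text.split() if token not in stop_words]
    return " ".join(tokens)


reviews = [
    "This phone is GREAT!!! See http://example.com for 2 more pics.",
    "123 !!!",
]
cleaned = [preprocess(review) for review in reviews]

# Reviews that end up empty are dropped, as Clean Missing Data does for empty rows.
non_empty = [text for text in cleaned if text]
print(non_empty)
```

The second sample review contains only numbers and punctuation, so it becomes empty after preprocessing and is dropped.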

After the data is cleaned, Latent Dirichlet Allocation is used to convert each review to five topics. Then the necessary columns are selected and the following is the output.

As with users, there can be multiple rows for each product, but the Recommender needs a single row per product. Therefore, we need to aggregate the data with the Apply SQL Transformation control by using the following SQL query.

SELECT asin,
       AVG([Topic1]) AS Topic1,
       AVG([Topic2]) AS Topic2,
       AVG([Topic3]) AS Topic3,
       AVG([Topic4]) AS Topic4,
       AVG([Topic5]) AS Topic5
FROM t1
GROUP BY asin;

Instead of AVG, you can use other aggregation functions such as MIN, MAX, or SUM to suit the Recommender System for Customer Reviews. As with users, if we can find product details such as color, brand, or category, we can provide much better recommendations.

With this configuration, all the inputs for the Train Matchbox Recommender control are ready. Then the Score Matchbox Recommender control is added to produce the recommendations.
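As a small illustration of swapping the aggregation function, the sketch below aggregates topic scores per product in an in-memory SQLite database, using AVG for one topic and MAX for another. Only two topic columns are used to keep it short, and all the values are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1 (asin TEXT, Topic1 REAL, Topic2 REAL)")
# Hypothetical topic scores: product P1 has two reviews, P2 has one.
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?, ?)",
    [
        ("P1", 0.75, 0.2),
        ("P1", 0.25, 0.6),
        ("P2", 0.1, 0.9),
    ],
)

query = """
SELECT asin,
       AVG(Topic1) AS Topic1,  -- mean topic weight across the reviews
       MAX(Topic2) AS Topic2   -- strongest topic weight in any review
FROM t1
GROUP BY asin
ORDER BY asin;
"""
# One row per product, so the result can serve as the product feature input.
product_features = conn.execute(query).fetchall()
for row in product_features:
    print(row)
```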

In order to receive recommendations in the Recommender system for Customer Reviews, we have configured Score Matchbox Recommender control as below.

In the above configuration, we are looking for item recommendations with a maximum of five items and a minimum of two items as shown below.

As shown in the above figure, a maximum of five items are recommended for each user. A web service can be attached to the output of the Score Matchbox Recommender so that this experiment can be deployed.

As in any predictive project, we need to measure the accuracy of the outcome. For the Recommender System for Customer Reviews, Azure Machine Learning provides the normalized discounted cumulative gain (NDCG) measure.

As you can see from the above screen, the NDCG is 96.30%. NDCG ranges from 0 to 1, where 1 means the items are ranked in the ideal order, so this value indicates that the Recommender System for Customer Reviews has high accuracy.
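To see what NDCG measures, here is a minimal Python sketch of one common formulation, in which each rating is discounted by the base-2 logarithm of its rank position plus one. The exact formula used by the Evaluate Recommender control may differ, for example in how the gain is defined, so treat this as an illustration only.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: ratings discounted by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances, start=1)
    )


def ndcg(relevances):
    """DCG of the given order divided by the DCG of the ideal order."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Hypothetical true ratings of three recommended items, in recommended order.
score = ndcg([5, 3, 4])
print(f"NDCG = {score:.4f}")
```

In this example, the item rated 4 is ranked below the item rated 3, so the score falls slightly below 1. A list that is already sorted from the highest to the lowest rating scores exactly 1.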

Conclusion

In this article of the Azure Machine Learning series, we have discussed how to build a Recommender System for Customer Reviews. We used the Recommender controls together with Latent Dirichlet Allocation to produce recommendations in Azure Machine Learning. Further, we used the NDCG score to evaluate the Recommender System for Customer Reviews.

References

Following are the references for the dataset:

Ups and downs: Modeling the visual evolution of fashion trends with one-class collaborative filtering – R. He, J. McAuley, WWW, 2016
Image-based recommendations on styles and substitutes – J. McAuley, C. Targett, J. Shi, A. van den Hengel, SIGIR, 2015

Introduction to Azure Machine Learning using Azure ML Studio

Data Cleansing in Azure Machine Learning

Prediction in Azure Machine Learning

Feature Selection in Azure Machine Learning

Data Reduction Technique: Principal Component Analysis in Azure Machine Learning

Prediction with Regression in Azure Machine Learning

Prediction with Classification in Azure Machine Learning

Comparing models in Azure Machine Learning

Cross Validation in Azure Machine Learning

Clustering in Azure Machine Learning

Tune Model Hyperparameters for Azure Machine Learning models

Time Series Anomaly Detection in Azure Machine Learning

Designing Recommender Systems in Azure Machine Learning

Language Detection in Azure Machine Learning with basic Text Analytics Techniques

Azure Machine Learning: Named Entity Recognition in Text Analytics

Filter based Feature Selection in Text Analytics

Latent Dirichlet Allocation in Text Analytics

Recommender Systems for Customer Reviews

AutoML in Azure Machine Learning

AutoML in Azure Machine Learning for Regression and Time Series

Building Ensemble Classifiers in Azure Machine Learning

Text Classification in Azure Machine Learning using Word Vectors

Author

Dinesh Asanka

Dinesh Asanka has been an MVP in the SQL Server category for the last 8 years. He has been working with SQL Server for more than 15 years, has written articles, and has coauthored books. He is a presenter at various user groups and universities. He is always available to learn and share his knowledge.
