Gauri Mahajan
Diamonds Dataset

Import data using R in Power BI

July 27, 2022 by

In this article, we will learn how to configure Power BI to work with R and source the required datasets from it for reporting in Power BI Reports.

Introduction

In the data science practice, many frameworks offer a variety of functionalities for various areas of data science. Out of this multitude of data science frameworks, two of the frameworks that are leaders in this space are R and Python. The adoption and popularity of these frameworks are so huge that many tools have opened integration with this framework. In modern data engineering tools, almost along the entire data pipeline from data collection to data consumption, R and Python can be used at every hop of data in the end-to-end process. Examples of areas where R and Python can be used are ETL pipelines, batch processing using frameworks like Hadoop, rendering visualizations in reports and many more. The core engine of data science frameworks is the libraries that are used with these frameworks and the datasets that are used with them. Generally, any data science exercise starts with exploratory data analysis which uses data samples and then a major part of the exercise is refining this dataset to make it as accurate as possible so that machine learning or artificial intelligence models can infer from this distilled dataset. While these frameworks are not a database that hosts data and data structures, but the datasets created by data scientists as well as sample datasets that ship with thousands of libraries that are using in these frameworks, it makes them a source of data. This translates to frameworks like R becoming a source of data, which may need to be reported at times for various purposes. Reporting tools like Power BI provides options to source data from frameworks like R and render it on PowerBI reports.

R Installation and Setup

We would be using R as the source of data for Power BI reports. So, we have two pre-requisites in this case, a setup of R and a setup of PowerBI on the same machine. It is assumed that the latest version of PowerBI is already installed on the development machine. While report developers may be familiar with installing PowerBI, some may be new to the installation and setup of R. There are two ways in which one can set up and configure R on the local machine. The first and the most straightforward way of installing R is by downloading R from the official site. This installs the R setup at the location of choice and sets the path of the R installation in the path variable.

Another approach to installing R is by selecting the relevant components in the SQL Server installation. Power BI and SQL Server both are part of the Microsoft technology stack and generally, go together for data storage and reporting. If you happen to have Microsoft SQL Server installed on the local machine already, you can consider just adding R in the share features section and have it installed as part of the SQL Server installation. Once it is installed, you would find it in the directory named R Server as shown below. One can use this setup of R with PowerBI as well as with the SQL Server database engine. This mechanism of sharing the R installation with SQL Server and PowerBI helps in using common libraries and datasets seamlessly using a shared R installation.

R Installation

It is assumed that at least one R installation is available on the same machine on which PowerBI is installed.

Now that R is installed on the local machine, it is time to configure the Power BI Desktop tool to point to this R installation. Open PowerBI, click on the File menu -> Options and Settings -> Options menu -> R Scripting menu item. This would open the page relevant to R Scripting configuration in PowerBI. By default, the Detected R Home directories may not have any value. Click on the drop-down and navigate to the path of the R Home directory installation. Once done, it would show the path of the home directory as shown below.

Power BI Configuration

If we are using the setup of R that is installed using Microsoft SQL Server, some basic libraries would already be installed. In some cases, we may have to install some basic libraries if a specific or custom setup of R is installed. For now, we will assume that an installation of R with adequate libraries is installed and configured to be used with Power BI.

Importing data from R can be done in two ways. R itself is a data processing and programming framework, so it allows importing of data from various data sources using R scripting. PowerBI itself supports more than a hundred connectors to source data directly in PowerBI reports. At times, data analysts or data scientists create scripts that source data and process it as well using the R script. One may want to reuse such scripts to continue sourcing data in a consistent way without the need to stage such data in an interim repository to source it in PowerBI. For such use-cases, PowerBI provides an option to execute R Script using the R Script connector, which executes scripts using the R set up to which we pointed the Power BI Desktop tool earlier. Let’s say that we intend to import data from a file that is hosted on GitHub. Using a single line of code, we would be able to source data from such a file. The pre-requisite for sourcing data using an R connector is that the output should be a data frame, only then it would be sourced in PowerBI.

Click on the Get Data menu item and select the R Script connector. It would open a new dialog to specify R scripts. Add a script like the one shown below to read data from the CSV file hosted on GitHub.

Reading files from Github

Click on the OK button, and then Power BI would attempt to execute this script using the R installation. If the script execution is successful, the extracted data would be shown on the preview page as shown below. This means that it was able to successfully source data. We also have the option to transform this data in flight before its loaded in the report.

Data sources using R

Another use-case of importing data from an R installation is to source sample datasets that ship with the libraries that are installed or datasets that are created as an output of executing certain scripts in R. Such datasets can be sourced from R using only R script connector in Power BI. One of the most popular datasets that are widely used by data professionals who work with R is the diamonds dataset that ships with the ggplot2 library. Probably, this library would not be installed by default. One can navigate to the R executable, and execute the install.packages(“ggplot2”) command to install this library with its associated dependencies. We will be using this library in the upcoming example and it’s assumed that this library is already installed on your R setup.

Repeat the steps shown above and enter the script as shown below. Here we are importing the ggplot2 library in our script in the first step. In the next step, we are point to the diamond’s dataset using the data function.

R Script to source datasets from R

If the script gets executed successfully, we would be able to view the data on the preview page as shown below.

Diamonds Dataset

In this way, with just a few lines of code and using the R Script connector, we can start sourcing data from and through the R installation for use with Power BI reports.

Conclusion

In this article, we first learned how to install R directly as well as with the installation of SQL Server. We configured the Power BI Desktop tool to point to the R installation, and then we learned how to use the R connector and scripting to source data from R as well as from other sources using R.

Gauri Mahajan
PowerBI, Reporting Services (SSRS)

About Gauri Mahajan

Gauri is a SQL Server Professional and has 6+ years experience of working with global multinational consulting and technology organizations. She is very passionate about working on SQL Server topics like Azure SQL Database, SQL Server Reporting Services, R, Python, Power BI, Database engine, etc. She has years of experience in technical documentation and is fond of technology authoring. She has a deep experience in designing data and analytics solutions and ensuring its stability, reliability, and performance. She is also certified in SQL Server and have passed certifications like 70-463: Implementing Data Warehouses with Microsoft SQL Server. View all posts by Gauri Mahajan

222 Views