Rajendra Gupta
SQL Machine Learning R scripts installation

An overview of SQL Machine Learning with R scripts

November 25, 2020 by

In this article, we will focus on SQL Machine Learning using R scripts and the use of external packages.

SQL Server revolves now no more just around a traditional database system. It extended its capability using machine learning by R, Python and Java languages.

  • Python: Python is an interactive, high-level and Object-oriented programming language. You can run the Python code for complex algorithms and versatile workflows using friendly commands. You can browse SQLShack articles for more details on Python
  • R language: It is a popular programming language for statistical data computing. It is widely used by data scientists, statistical and mathematical computations. You can use it on UNIX, Windows and macOS

Install R in Machine Learning for SQL Server 2019

In this article, we don’t intend to go into details about R installation. You can refer to the article, Machine Learning Services – Configuring R Services in SQL Server for detailed steps.

In the SQL Server installation, put a check for the SQL Machine Learning Service and Language Extensions for R Scripts.

SQL Machine Learning R scripts installation

You can choose to select R, Python and Java as your preferred SQL Machine Learning language.

  • Machine Learning Services (In-Database): It installs the SQL Server Launchpad service to support R and Python script execution
  • R: It installs the R packages, interpreter, and open-source R to execute the R scripts
  • Python: It installs Microsoft Python packages, the Python 3.5 executable and default libraries

In the above screenshot, we see a Red box having shared features for the R and Python machine learning server (standalone). Do not select these shared features on the server where you installed the Machine learning Services. It is required for a standalone server where R services don’t exist.

Once the installation wizard completes, you get the following confirmation screen.

Confirmation screen

Create an environment variable for SQL Machine Learning

In the Windows Server, navigate to Control panel -> System and Security ->System ->Advanced System Settings -> Environment Variables.

Create an environment variable with the following details:

  • Variable Name: MJK_CBWR
  • Variable value: Auto

Create an environment variable

Click Ok and restart the Windows server.

Enable script execution using sp_configure

Connect to the SQL instance using SSMS and enable external services. It enables the interaction of SQL instance with the external SQL machine learning services for R scripts.

Reconfigure

Now, restart the SQL Services. After restarting SQL services, both SQL Server service and SQL Server Launchpad service should be in running state as shown below.

Launchpad service for SQL Machine Learning

Default R library

By default, R services are installed in the C:\Program Files\Microsoft SQL Server\MSSQL15.INST1\R_SERVICES where C is the root directory for your SQL instance. here, the INST1 is the SQL Server instance name.

Default R library

In this path, open the library folder, and it shows the installed instance library. SQL Server uses loads the respective functions from the instance library for running the R script.

installed instance library

Run the following SQL script to verify the default R package library. It uses the sp_execute_external_script stored procedure and specifies R in the language parameter.

verify the default R package library

For SQL Machine learning and statistical computing, R script uses various functions and packages to process the required data. By default, SQL Server installs useful packages. To get a list of installed packages in the R services, you can run the below T-SQL.

It gives you a list of 58 packages preconfigured with the SQL Server R installations.

58 packages preconfigured

Let’s go through the useful external R libraries.

Dplyr

Dplyr is useful for data manipulations. You can load data from the SQL tables or data sources to filter data or columns, sort or group data, add new variables, create new columns. Refer to the Dplyr documentation for more details.

Esquisse

This package is useful for data visualization purposes. You can visualize your data interactively using the graphs, charts, curves, histograms, scatter plots. You get drag and drop functionality to customize your visuals. Refer to these documents for more details.

Lubridate

We use the Lubridate library in R for the data wrangling. Many times, you need to deal with the data time data for time series data analysis. You can work with the date-range and time-spans in a friendly way. You can also update the components such as years, months, days, hours, minutes, and seconds with it. Refer to Lubridate documentation.

RODBC

RODBC is useful for the ODBC database interface to various data sources. Refer to this documentation.

Leaflet

The Leaflet package contains the JavaScript library for working with the interactive maps in the R console. You can design, customize the maps using the library functions. You can use these maps in the R studio or the markdown documents. Refer to this Documentation.

ggplot2

It is a widely used library for the data visualizations in R services by the data scientists. You can use this library to build graphics, plots, visual statistics. Refer to this documentation.

Janitor

Data cleaning is an essential aspect of data analysis. You do not want duplicate values, duplicate columns or columns without any value for your data analysis. It contains useful functions to remove duplicates, adds tabulating functionality such as adds a new column based on the other column values. Refer to this documentation.

XLConnect, xlsx2dfs ,xlutils3 xlsx

These packages are useful to interact with Microsoft Excel files.

  • XLConnect: It is an excel connection in the R scripts
  • xlsx2dfs: it helps you to read and write into the excel sheets using your data frame
  • xlutils3: If you need to work with multiple excel files, xlutils3 is helpful for you in extracting multiple sheets together
  • xlsx: You can read, write, format the excel files with the XLSX extension

readr

The Read Rectangular Text Data (readr) enables you to read the following files:

  • Comma-separated values (CSV)
  • Tab-separated values(TSV)
  • delimited values
  • fixed-width files(FWF)

mlr

It is also a popular package in the R scripts to perform data classifications and regressions. You can perform various activities such as bootstrapping, subsampling, Hyperparameter tuning and nested resampling with the mlr library. Refer documentation.

Magick

The magick library is useful for image processing using R scripts. You can access various file formats such as JPB, PNG, JPEG, PDF, TIFF using the magick. It gives you the ability to perform image manipulations operations such as rotate, scale, crop, trim, flip, blur.

You can refer to this documentation.

There are many packages in the R scripts that can be used as per your data and analysis requirements. You can check the package availability in your SQL R instance, using the stored procedure sp_execute_external_script. For example, if we want to check for the image processing package magick, use the following query.

You get the error message that there is no package called ‘magick’ for your R scripts.

Error message

To install a specific external package in R service, launch the R from the C:\Program Files\Microsoft SQL Server\MSSQL15.INST1\R_SERVICES\bin.

specific external package

Now, right-click on the R and launch it an administrator. It opens the following Microsoft R Client application.

Check the Version

Run the following command to install the magick package in R.

>install.packages(“magick”)

install the magick library

It downloads the magick R package, its dependencies as shown below.

install the magick package

Once the external package in R is downloaded, extracted and installed, you get the following screen.

Confirmation message

Now, you can rerun the query to check the magick package, and it gives you the following output. In this output, it returns the enabled and disabled features in the magick package.

Magick library version

Conclusion

In this article, we looked at the R services configuration in SQL Server 2019. Further, we explored useful libraries and installation methods. You can install the required package for your data analysis purpose. In the next article, we will explore the data processing using the R script in the SQL Server.

Rajendra Gupta
Machine learning, R, SQL Server 2019

About Rajendra Gupta

As an MCSA certified and Microsoft Certified Trainer in Gurgaon, India, with 13 years of experience, Rajendra works for a variety of large companies focusing on performance optimization, monitoring, high availability, and disaster recovery strategies and implementation. He is the author of hundreds of authoritative articles on SQL Server, Azure, MySQL, Linux, Power BI, Performance tuning, AWS/Amazon RDS, Git, and related technologies that have been viewed by over 10m readers to date. He is the creator of one of the biggest free online collections of articles on a single topic, with his 50-part series on SQL Server Always On Availability Groups. Based on his contribution to the SQL Server community, he has been recognized with various awards including the prestigious “Best author of the year" continuously in 2020 and 2021 at SQLShack. Raj is always interested in new challenges so if you need consulting help on any subject covered in his writings, he can be reached at rajendra.gupta16@gmail.com View all posts by Rajendra Gupta

1,894 Views