Rajendra Gupta
Import data from a CSV to SQL Server tables using R and SQL Server

Import and Export data using R and SQL Server

December 3, 2020 by

Data professionals get requests to import, export data into various formats. These formats can be such as Comma-separated data(.CSV), Excel, HTML, JSON, YAML, Tab-separated data(.TSV). Usually, we use SQL Server integration service ETL packages for data transformations, import or export data.

SQL Machine Learning can be useful in dealing with various file formats. In the article, External packages in R SQL Server, we explored the R services and the various external packages for performing tasks using R scripts.

In this article, we explore the useful Rio package developed by Thomas J. Leeper to simplify the data export and import process.

Environment requirements

For this article, you should have the SQL Server environment. In this article, I use the followings:

  • Version: SQL Server 2019
  • Machine learning Language: R
  • Node: SQNode2\INST1
  • SQL Server Launchpad and Database engine service should be in running state
  • SQL Server Management Studio

You can follow An overview of SQL Machine Learning with R scripts, to meet these requirements.

Data Import and Export using SQL Machine Learning R Scripts

R Script in SQL Server is capable of performing advanced analytic and predictive applications using the Microsoft R runtime. We can execute the R code directly from the SQL Server using the sp_execute_external_script stored procedure.

We can use the Rio package for data import and export in various formats using the R scripts. It is an external package and does not install when you configure the SQL Server Machine Learning for R.

To check the Rio package in your environment, you can run the following script.

As shown below, it gives an error message that there is no package called Rio.

error message

To install the Rio package, open the administrative R console from the C:\Program Files\Microsoft SQL Server\MSSQL15.INST1\R_SERVICES\bin path and run the following command.

install.packages(“rio”)

It downloads the required configuration files and installs it for your R server. You should have an internet connection for downloading these components.

administrative R console for R and SQL Server

Install the package

Once the Rio package installation completes, you get the following screen.

Rio package installation

You can find the Rio library in the default SQL instance directory C:\Program Files\Microsoft SQL Server\MSSQL15.INST1\R_SERVICES\library.

Rio library

Now, rerun the stored procedure, and it returns the Rio package version.

Rio package version

Download a CSV file from the Web URL and import it

For the article, let’s download a sample CSV file from the following download.file() function. It downloads the file and places it into the default R directory.

> download.file(“http://bit.ly/BostonSnowfallCSV”, “WinterSnowfalls.csv”)

Download a CSV file

You can see the CSV file in the C:\Program Files\Microsoft SQL Server\MSSQL15.INST1\R_SERVICES\bin folder.

Downloaded file

The Rio package uses the import() function to read the CSV file. You can use the import() function to read data from compressed directories and URLs (HTTP or HTTPS). Refer to the GitHub repository for supported file formats.

> Mydata <- rio::import(“WinterSnowfalls.csv”)
> Mydata

You can see data in the R client console, as shown below.

import() function

In the below R and SQL Server, we use \\ for specifying the CSV file path. R script considers \ as an escape character.

As shown below, using the R script in SQL Server, we get the data from the CSV file. It reads a total of 76 records from the specified CSV file.

  • Note: You can use any directory for the CSV file however your SQL service account should have permissions to access the files

Use of escape character

In the above screenshot, we get the data from the CSV, but it does not display any column names. Our CSV file has the column names. To display it with R and SQL Server, you can use WITH RESULT SETS and define columns, their t-SQL data types. In the below query, we define two columns [Winter], [Total] and their data types varchar(100) and float respectively.

In the query output, the T-SQL script returns the column names as well.

get the data from the CSV

You can use the print() function in R script. It displays the column names, their values in the message tab of SSMS.

the print() function in R script

Convert CSVs into Excel files using R and SQL Server

Run the following R scripts in SQL Server to convert the CSV into an excel file.

> CSV <- import(“WinterSnowfalls.csv”)
> OutputDataSet <- export(CSV,”WinterSnowfalls.xlsx”)

Run the script in the R console client from the bin directory.

Convert the CSV into an Excel file

It converts the file into an excel file and saves it into the source file directory.

Check excel file

To execute the R script from the SQL Server, embed it into the sp_execute_external_script stored procedure. Here, you need to specify the path to save the excel file.

Import an excel file using R and SQL Server

The Rio package uses the same import() function for the data import from an excel file. It can read both XLS and XLSX format files.

In the previous example, we converted the CSV into Excel using R scripts. Let’s try to read the excel file, and it should give output similar to the CSV data because we haven’t modified the contents.

As we can see below, it returned 76 rows from the excel file. You can compare the output of CSV and Excel files as well.

Import an excel file

Similar to the example shown for CSV, you can use the WITH RESULT SET or Print() function to display the column names as well.

Import JSON files using R and SQL Server

As highlighted earlier, the rio module can work with various file formats. Java Script Notation (JSON) is a popular format for storing log data. It is also used widely to manage the cloud infrastructure as well.

To generate the JSON data for this article, we use Azure Data Studio. It has an integrated functionality to save the T-SQL output in a JSON format.

Import JSON File

It converts the query output in JSON format, as shown below.

JSON format

Use import() function similar to CSV, Excel examples and it returns the JSON data in the output of sp_execute_external_script.

Import the JSON file

You can export the JSON into the CSV, Excel as per your requirements. In the below R script, we export the file and save it as AdventureWorks.xlsx

ToJSON <- import(“AdventureWorks.JSON”)
OutputDataSet <- export(ToJSON,”Adventureworks.xlsx”)

export the JSON into the CSV

You can validate the JSON data converted into Excel using the import() function. Here, I specified the varchar(100) for all columns for demonstration purposes. You should use the appropriate data types for every column.

Import data from a compressed file using R and SQL Server

Usually, to read data from a compressed file, we have to extract it and then read data. Now, using the Rio package in the R and SQL server, we can read data directly from a compressed file.

To prepare the data for this demonstration, right-click on the AdventureWorks.xlsx and navigate Send to-> Compressed (zipped) folder.

validate the JSON data

You can use third-party applications such as WinZip, G-Zip to prepare the compressed file. In the below screenshot, we have a compressed (zip) file. We do not see much compressed because our source file size is small.

compressed file

In the below R script, note that we provide input of a compressed file (C://Temp//AdventureWorks.zip)

It returns a similar result of data import directly from an excel file.

Import data from a compressed file

Import data from a CSV to SQL Server tables using RIO package

In the earlier examples, we imported data from a CSV, Excel or compressed file but did not store into SQL Server tables.

Before we import data directly into SQL Server tables, create the table with appropriate data types and columns. For example, for the above example data, we create a SQL table with the following script.

Now, we use Insert Into statement before executing the sp_execute_external_script stored procedure in the R script and SQL Server.

It imports data directly into the specified SQL table [DataImportR]. You can use the Select statement to verify the data.

Import data from a CSV to SQL Server tables using R and SQL Server

Conclusion

In this article, we explored the useful Rio package using R and SQL Server. It can read data from various file formats using the import() function. Further, we can export or convert data into another format. You can directly read a compressed file and import it into the SQL Server tables.

It is an exciting feature of the machine learning language in SQL Server. I would recommend you explore it.

Rajendra Gupta
Latest posts by Rajendra Gupta (see all)
168 Views