Data professionals get requests to import, export data into various formats. These formats can be such as Comma-separated data(.CSV), Excel, HTML, JSON, YAML, Tab-separated data(.TSV). Usually, we use SQL Server integration service ETL packages for data transformations, import or export data.
SQL Machine Learning can be useful in dealing with various file formats. In the article, External packages in R SQL Server, we explored the R services and the various external packages for performing tasks using R scripts.
In this article, we explore the useful Rio package developed by Thomas J. Leeper to simplify the data export and import process.
Environment requirements
For this article, you should have the SQL Server environment. In this article, I use the followings:
- Version: SQL Server 2019
- Machine learning Language: R
- Node: SQNode2\INST1
- SQL Server Launchpad and Database engine service should be in running state
- SQL Server Management Studio
You can follow An overview of SQL Machine Learning with R scripts, to meet these requirements.
Data Import and Export using SQL Machine Learning R Scripts
R Script in SQL Server is capable of performing advanced analytic and predictive applications using the Microsoft R runtime. We can execute the R code directly from the SQL Server using the sp_execute_external_script stored procedure.
We can use the Rio package for data import and export in various formats using the R scripts. It is an external package and does not install when you configure the SQL Server Machine Learning for R.
To check the Rio package in your environment, you can run the following script.
1 2 3 4 5 6 |
EXECUTE sp_execute_external_script @language = N'R', @script = N' library(rio) print (packageVersion("rio")) ' |
As shown below, it gives an error message that there is no package called Rio.
To install the Rio package, open the administrative R console from the C:\Program Files\Microsoft SQL Server\MSSQL15.INST1\R_SERVICES\bin path and run the following command.
install.packages(“rio”)
It downloads the required configuration files and installs it for your R server. You should have an internet connection for downloading these components.
Once the Rio package installation completes, you get the following screen.
You can find the Rio library in the default SQL instance directory C:\Program Files\Microsoft SQL Server\MSSQL15.INST1\R_SERVICES\library.
Now, rerun the stored procedure, and it returns the Rio package version.
Download a CSV file from the Web URL and import it
For the article, let’s download a sample CSV file from the following download.file() function. It downloads the file and places it into the default R directory.
> download.file(“http://bit.ly/BostonSnowfallCSV”, “WinterSnowfalls.csv”)
You can see the CSV file in the C:\Program Files\Microsoft SQL Server\MSSQL15.INST1\R_SERVICES\bin folder.
The Rio package uses the import() function to read the CSV file. You can use the import() function to read data from compressed directories and URLs (HTTP or HTTPS). Refer to the GitHub repository for supported file formats.
> Mydata <- rio::import(“WinterSnowfalls.csv”)
> Mydata
You can see data in the R client console, as shown below.
In the below R and SQL Server, we use \\ for specifying the CSV file path. R script considers \ as an escape character.
1 2 3 4 5 6 7 |
EXECUTE sp_execute_external_script @language = N'R', @script = N' library(rio) CSV <- import("C://Program Files//Microsoft SQL Server//MSSQL15.INST1//R_SERVICES//bin//WinterSnowfalls.csv") OutputDataSet <- CSV ' |
As shown below, using the R script in SQL Server, we get the data from the CSV file. It reads a total of 76 records from the specified CSV file.
- Note: You can use any directory for the CSV file however your SQL service account should have permissions to access the files
In the above screenshot, we get the data from the CSV, but it does not display any column names. Our CSV file has the column names. To display it with R and SQL Server, you can use WITH RESULT SETS and define columns, their t-SQL data types. In the below query, we define two columns [Winter], [Total] and their data types varchar(100) and float respectively.
1 2 3 4 5 6 7 8 |
EXECUTE sp_execute_external_script @language = N'R', @script = N' library(rio) CSV <- import("C://Program Files//Microsoft SQL Server//MSSQL15.INST1//R_SERVICES//bin//WinterSnowfalls.csv") OutputDataSet <- CSV ' WITH RESULT SETS((Winter varchar(100), Total float)) |
In the query output, the T-SQL script returns the column names as well.
You can use the print() function in R script. It displays the column names, their values in the message tab of SSMS.
Convert CSVs into Excel files using R and SQL Server
Run the following R scripts in SQL Server to convert the CSV into an excel file.
> CSV <- import(“WinterSnowfalls.csv”)
> OutputDataSet <- export(CSV,”WinterSnowfalls.xlsx”)
Run the script in the R console client from the bin directory.
It converts the file into an excel file and saves it into the source file directory.
To execute the R script from the SQL Server, embed it into the sp_execute_external_script stored procedure. Here, you need to specify the path to save the excel file.
1 2 3 4 5 6 7 |
EXECUTE sp_execute_external_script @language = N'R', @script = N' library(rio) CSV <- import("C://Program Files//Microsoft SQL Server//MSSQL15.INST1//R_SERVICES//bin//WinterSnowfalls.csv") OutputDataSet <- export(CSV,"C://temp//WinterSnowfalls.xlsx") ' |
Import an excel file using R and SQL Server
The Rio package uses the same import() function for the data import from an excel file. It can read both XLS and XLSX format files.
In the previous example, we converted the CSV into Excel using R scripts. Let’s try to read the excel file, and it should give output similar to the CSV data because we haven’t modified the contents.
1 2 3 4 5 6 7 |
EXECUTE sp_execute_external_script @language = N'R', @script = N' library(rio) Excel <- import("C://Temp//WinterSnowfalls.xlsx") OutputDataSet <-Excel ' |
As we can see below, it returned 76 rows from the excel file. You can compare the output of CSV and Excel files as well.
Similar to the example shown for CSV, you can use the WITH RESULT SET or Print() function to display the column names as well.
Import JSON files using R and SQL Server
As highlighted earlier, the rio module can work with various file formats. Java Script Notation (JSON) is a popular format for storing log data. It is also used widely to manage the cloud infrastructure as well.
To generate the JSON data for this article, we use Azure Data Studio. It has an integrated functionality to save the T-SQL output in a JSON format.
It converts the query output in JSON format, as shown below.
Use import() function similar to CSV, Excel examples and it returns the JSON data in the output of sp_execute_external_script.
1 2 3 4 5 6 7 |
EXECUTE sp_execute_external_script @language = N'R', @script = N' library(rio) JSON <- import("C://Temp//AdventureWorks.json") OutputDataSet <-JSON ' |
You can export the JSON into the CSV, Excel as per your requirements. In the below R script, we export the file and save it as AdventureWorks.xlsx
ToJSON <- import(“AdventureWorks.JSON”)
OutputDataSet <- export(ToJSON,”Adventureworks.xlsx”)
You can validate the JSON data converted into Excel using the import() function. Here, I specified the varchar(100) for all columns for demonstration purposes. You should use the appropriate data types for every column.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
EXECUTE sp_execute_external_script @language = N'R', @script = N' library(rio) Excel <- import("C://Temp//AdventureWorks.xlsx") OutputDataSet <- Excel ' WITH RESULT SETS(( [BusinessEntityID] varchar(100) ,[Title] varchar(100) ,[FirstName] varchar(100) ,[MiddleName] varchar(100) ,[LastName] varchar(100) ,[Suffix] varchar(100) ,[JobTitle] varchar(100) ,[PhoneNumber] varchar(100) ,[PhoneNumberType] varchar(100) ,[EmailAddress]varchar(100) ,[EmailPromotion] varchar(100) ,[AddressLine1] varchar(100) ,[AddressLine2] varchar(100) ,[City] varchar(100) ,[StateProvinceName] varchar(100) ,[PostalCode] varchar(100) ,[CountryRegionName]varchar(100) ,[AdditionalContactInfo] varchar(100) )) |
Import data from a compressed file using R and SQL Server
Usually, to read data from a compressed file, we have to extract it and then read data. Now, using the Rio package in the R and SQL server, we can read data directly from a compressed file.
To prepare the data for this demonstration, right-click on the AdventureWorks.xlsx and navigate Send to-> Compressed (zipped) folder.
You can use third-party applications such as WinZip, G-Zip to prepare the compressed file. In the below screenshot, we have a compressed (zip) file. We do not see much compressed because our source file size is small.
In the below R script, note that we provide input of a compressed file (C://Temp//AdventureWorks.zip)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
EXECUTE sp_execute_external_script @language = N'R', @script = N' library(rio) Excel <- import("C://Temp//AdventureWorks.zip") OutputDataSet <- Excel ' WITH RESULT SETS(( [BusinessEntityID] varchar(100) ,[Title] varchar(100) ,[FirstName] varchar(100) ,[MiddleName] varchar(100) ,[LastName] varchar(100) ,[Suffix] varchar(100) ,[JobTitle] varchar(100) ,[PhoneNumber] varchar(100) ,[PhoneNumberType] varchar(100) ,[EmailAddress]varchar(100) ,[EmailPromotion] varchar(100) ,[AddressLine1] varchar(100) ,[AddressLine2] varchar(100) ,[City] varchar(100) ,[StateProvinceName] varchar(100) ,[PostalCode] varchar(100) ,[CountryRegionName]varchar(100) ,[AdditionalContactInfo] varchar(100) )) |
It returns a similar result of data import directly from an excel file.
Import data from a CSV to SQL Server tables using RIO package
In the earlier examples, we imported data from a CSV, Excel or compressed file but did not store into SQL Server tables.
Before we import data directly into SQL Server tables, create the table with appropriate data types and columns. For example, for the above example data, we create a SQL table with the following script.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
Create table dataimportR ( [BusinessEntityID] varchar(100) ,[Title] varchar(100) ,[FirstName] varchar(100) ,[MiddleName] varchar(100) ,[LastName] varchar(100) ,[Suffix] varchar(100) ,[JobTitle] varchar(100) ,[PhoneNumber] varchar(100) ,[PhoneNumberType] varchar(100) ,[EmailAddress]varchar(100) ,[EmailPromotion] varchar(100) ,[AddressLine1] varchar(100) ,[AddressLine2] varchar(100) ,[City] varchar(100) ,[StateProvinceName] varchar(100) ,[PostalCode] varchar(100) ,[CountryRegionName]varchar(100) ,[AdditionalContactInfo] varchar(100) ) |
Now, we use Insert Into statement before executing the sp_execute_external_script stored procedure in the R script and SQL Server.
1 2 3 4 5 6 7 8 9 |
Insert into AdventureWorks2019.dbo.dataimportR EXECUTE sp_execute_external_script @language = N'R', @script = N' library(rio) Excel <- import("C://Temp//AdventureWorks.zip") OutputDataSet <- Excel ' Select * from AdventureWorks2019.dbo.dataimportR |
It imports data directly into the specified SQL table [DataImportR]. You can use the Select statement to verify the data.
Conclusion
In this article, we explored the useful Rio package using R and SQL Server. It can read data from various file formats using the import() function. Further, we can export or convert data into another format. You can directly read a compressed file and import it into the SQL Server tables.
It is an exciting feature of the machine learning language in SQL Server. I would recommend you explore it.
- How to install PostgreSQL on Ubuntu - July 13, 2023
- How to use the CROSSTAB function in PostgreSQL - February 17, 2023
- Learn the PostgreSQL COALESCE command - January 19, 2023