Prashanth Jayaram

How to use Python in SQL Server 2017 to obtain advanced data analytics

June 20, 2017 by

On the 19th of April 2017, Microsoft held an online conference called Microsoft Data Amp to showcase how Microsoft’s latest innovations put data, analytics and artificial intelligence at the heart of business transformation. Microsoft has, over the last few years, made great strides in accelerating the pace of innovation to enable businesses to meet the demands of a dynamic marketplace and harness the incredible power of data—more securely and faster than ever before.

After the conference, there were a few questions some of us had, though. Is Microsoft SQL Server 2017 emerging as an enterprise solution for data science? Does it provide the required capabilities—is the engine capable of handling huge data? It seems the answer is “Yes”, as starting with the CTP 2.0 release of SQL Server 2017, Microsoft has brought Python-based intelligence to data in SQL Server.

Python has gathered a lot of interest recently as a language of choice for data analysis. This language has the right set of libraries for data analysis and predictive modeling, not to mention a simpler learning curve.

The growing trends of data science and modeling predict a massive growth in data in the upcoming years. The propulsion towards innovation and adaptation to leading trends in the data technology might intrigue us enough to make us take a look at the current release of SQL Server 2017.

Data science is a combination of Data Mining, Machine Learning, Analytics and Big Data. The integration of SQL 2016 with data science language, R, into database the engine provides an interface that can efficiently run models and generate predictions using SQL R services. Python builds on the foundation laid for R Services in SQL Server 2016, and extends that mechanism to include Python support for in-database analytics and machine learning.

Moreover, the R–Python integration in SQL Server, apart from emphasizing productivity and code readability, also can take advantages of parallel query processing, security and better resource governance.

Now, Microsoft has renamed R Services as Machine Learning Services, bringing R and Python under the umbrella. The renamed Microsoft Machine Learning Services component enables Python to run directly on the database server, or along with embedded T-SQL scripts.

Developers have used stored procedure called sp_execute_external_script to run R code, whose first parameter is @language. Microsoft had designed this stored procedure to be open-ended.

The engine inherits the features of R, in order to adopt Python. To run Python code in SQL Server, we have to install SQL Server 2017 CTP 2.0, with the Machine Learning Services with Python feature. It’s noteworthy that other versions of SQL Server do not support Python integration.

The article explores

  • The installation of the database engine and the configuration of Machine Learning Services 
  • Configuration of Python
  • Configuration of the instance to allow execution of scripts that use an external executable
  • Data-type support
  • Samples to understand the use of Python in SQL

Installation

  • Download SQL Server 2017 CTP 2.0. During Feature Selection at the time of installation, Python is listed as a part of Machine Learning Services.
  • Run the setup wizard for SQL Server 2017.
  • On the Installation tab, click New SQL Server stand-alone installation or add features to an existing installation.
  • On the Feature Selection page, select both of the options as you see in the screenshot.
    • Database Engine Services: To use Python with SQL Server, you must install an instance of the database engine.
    • Machine Services (In-Database): This option installs the database services that support Python script execution.
  • Python: Check this option to get the Python 3.5 executable and select libraries from the Anaconda distribution

On the below page, Consent to Install Python, click Accept.

On the Ready to Install page, verify that selected components are included

After a successful installation, the instance is ready for enabling an external script execution parameter.

To enable SQL Instance to run Python scripts:

  • Open the SQL Server Management Studio.
  • Connect to the instance where Machine Learning Services is installed, and run the following command:

    sp_configure

  • To enable the external scripting feature that supports Python, run the following statement.

    EXEC sp_configure ‘external scripts enabled’, 1
    RECONFIGURE WITH OVERRIDE

  • Restart the SQL Instance.

sp_execute_external_script

The sp_execute_external_script is a system stored procedure that executes with a provided R/Python script as an argument. To enable normal functioning of this external stored procedure, you must have administrator access to your SQL Server instance, so that you can run the sp_configure command. The procedure will invoke the launchpad service to the respective library for its execution.


Parameter Purpose
@language = N’Python’ Scripting language parameter, In this case it’s Python
@script = N’ ‘ Python script body
@input_data_1 = N’ T-SQL Statement’ The T-SQL statement reads data from SQL Table
@output_data_1_name = N’ Data Frame Name’ Holds the data frame generated inside the Python Script.
WITH RESULT SETS ((Col1 DataType,Col2 DataType )) Specifies the Output column and Datatype of the dataframe columns. This is optional.

Execute Python Code in SQL Server

Microsoft has made it possible to embed Python code directly in SQL Server databases by including the code as a T-SQL stored procedure.

Datatype Support

Python supports a limited number of data types in comparison to SQL Server. As a result, whenever you use data from SQL Server in Python scripts, the data might be implicitly converted to a type compatible with Python. However, often an exact conversion cannot be performed automatically, and an error is returned. This table lists the implicit conversions that are provided. Other data types are not supported.

SQLTYPE PYTHON TYPE
bigint numeric
binary raw
bit bool
char str
float float64
int int32
nchar str
nvarchar str
nvarchar(max) str
real float32
smallint int16
tinyint uint8
varbinary bytes
varbinary(max) bytes
varchar(n) str
varchar(max) str

Examples

Print the input value

Find Mean of a given list


Format operator


Using loops and branches


Passing a table as input and generating a computed column called bonus

  • Create the EMP table
  • Inserting dummy values
  • Execute the Python script to generate the compute column

Read content from a text file using the DictReader function.


The following sample shows how to import the Pandas library to get access to dataframes and do the following tasks

  • Read CSV file using Pandas library
  • Find the number of Rows using Pandas object pyo
  • Fetch the first row of the csv file
  • Calculate the mean of every statistical column

Find the mean over the statistical column

The data is fed from the table and the computation happens using the Pandas library


Summary

This article covered how SQL Server 2017 introduces support for data analytics, and the use of Python in addition to R scripts. I’ve detailed the installation procedures, configuration details and execution of a sample Python SQL script.

The close integration of R/Python to SQL Server machine eliminates unnecessary movement of data across machines; think of moving millions/billions of rows to the client for modeling or scoring over the network—it is cumbersome, and a tedious job. This is one of the reasons why data scientists rely on sampling (Test set v/s Train Set). It’s a useful approach, especially where there are issues of data sovereignty and compliance. Your code runs inside the SQL Server security boundaries, triggered by a single call from the T-SQL stored procedures.

References


Prashanth Jayaram

Prashanth Jayaram

I’m a Database technologist having 11+ years of rich, hands-on experience on Database technologies. I am Microsoft Certified Professional and backed with a Degree in Master of Computer Application.

My specialty lies in designing & implementing High availability solutions and cross-platform DB Migration. The technologies currently working on are SQL Server, PowerShell, Oracle and MongoDB.

View all posts by Prashanth Jayaram
Prashanth Jayaram
2,655 Views