Hadi Fadlallah
passing parameters to python from SQL Server using xp_cmdshell

Different approaches to execute Python in SQL Server

May 10, 2021 by

This article briefly explains how to execute scripts written in Python in SQL Server and how Machine Learning services facilitate this process.

Introduction

Python is an interpreted programming language developed as a successor of an old programming language called ABC. It was first released in 1991 by a dutch developer called Guido van Rossum. In the past few years, Python became one of the most popular and efficient languages used in data science and machine learning. There are plenty of powerful libraries widely used to perform data manipulation, cleansing, analytics, and other operations.

Before SQL Server 2017, there was no straightforward solution to execute scripts written in Python in SQL Server. In SQL Server 2017, a new feature was released, allowing developers to execute Python scripts within SQL code.

In the next section, we will explain how to execute scripts written in Python in SQL Server without the Machine Learning Services (before SQL Server 2017).

Executing Python scripts before SQL Server 2017

In general, to run a Python script, we should use a Python interpreter. The Python interpreter is called using the command-line shell. As an example, the following command executes a Python script stored within a file called “HelloWorld.py”:

Since the Python interpreter is called using the command-line shell, it can execute any Python command from the SQL Server using the xp_cmdshell built-in extended procedure.

To learn more on how to get started with this extended procedure, you can refer to the following article previously published on SQL Shack: How to use the xp_cmdshell extended procedure.

    Note: We are running this example on a machine where Python 3.9 is installed and environment variables are configured. Also, xp_cmdshell is enabled on SQL Server

As an example, we created a python script named HelloWorld.py on the following path: “E:\HelloWorld.py”. This script contains only the following line:

Then, we created a batch file, “E:\ExecutePythonScript.bat”. This file contains the following command:

As mentioned in the official documentation, the xp_cmdshell spawns a Windows command shell and passes in a string for execution. Any output is returned as rows of text. So, if we need to store the output of Python in SQL Server, we should create a table having an NVARCHAR with 255 or higher length column like the following:

The xp_cmdshell output will show each line of code executed in the batch file with the output generated.

Executing python in SQL Server using xp_cmdshell

Figure 1 – Executing Python script from SQL Server code

The next step is to pass a parameter from SQL Server into the Python script. We edited E:\HelloWorld.py file as follows:

And we edited the “E:\ExecutePythonScript.bat” batch file to read one argument and to pass it to the python script as follows:

Now let’s try to run the following SQL script:

This script selects the last modified person’s name in the AdventureWorks person tables and defines it as a parameter in the command line. As shown in the image below, the values are correctly passed from SQL Server into the Python script.

passing parameters to python from SQL Server using xp_cmdshell

Figure 2 – Passing a parameter from SQL Server into the python script

One main limitation of using this approach to execute a script written in Python in SQL Server is that we can only pass a single value as a parameter to the script. Simultaneously, sometimes we need to execute a Python script over a resultset of an SQL command.

Using Machine Learning services to execute Python scripts

As shown in the previous section, execute scripts written in Python in SQL Server is complex and has some limitations. In SQL Server 2016, Microsoft introduced R Services, which allows users to execute external scripts written in R from SQL Server. In SQL Server 2017, this service was renamed Machine Learning Services, and Microsoft added support to Python.

In this section, we will not explain how to install machine learning services in SQL Server. There are plenty of articles describing how to install machine learning services in online and offline modes. We will briefly describe how SQL Server executes the Python scripts and the main advantage over the previous approach.

A built-in stored procedure called sp_execute_external_script was added to SQL Server, giving the ability to execute external scripts. This procedure is enabled using the following command:

When installing the Python external script execution support, a python interpreter with a bunch of data science python libraries is installed within the SQL Server installation. This Interpreter is using to execute the scripts written in Python in SQL Server. The main advantage of this feature is that users now have the ability to pass a SELECT statement result as an input to a python script. Manipulate this result using Python and give a result back to the SQL code. The SQL resultset will be converted into a Pandas data frame when passed to the python script.

For example, assume that we need to get all persons listed in the AdventureWorks person table, and we need to remove all rows containing a NULL value in any columns using the Pandas.datafram.dropna() function. We can simply use the following script:

Executing python in SQL Server using sp_execute_external_script

Figure 3 – Executing a python script using sp_execute_external_script

If interested to learn more about executing scripts written in Python in SQL Server, you can refer to the following articles published previously on SQL Shack:

Also, you can refer to the Microsoft official website, SQL Server machine learning tutorials.

Conclusion

This article explained two different approaches briefly to execute scripts written in Python in SQL Server and provided several examples and external resources.

Hadi Fadlallah
Python, SQL Server 2017

About Hadi Fadlallah

Hadi is an SQL Server professional with more than 10 years of experience. His main expertise is in data integration. He's one of the top ETL and SQL Server Integration Services contributors at Stackoverflow.com . Also, he published several article series about Biml, SSIS features, Search engines, Hadoop, and many other technologies. Besides working with SQL Server, he worked with different data technologies such as NoSQL databases, Hadoop, Apache Spark. He is a Neo4j and ArangoDB certified professional. On the academic level, Hadi holds two master's degrees in computer science and business computing. Currently, he is a Ph.D. candidate in data science focusing on Big Data quality assessment techniques. Hadi really enjoys learning new things everyday and sharing his knowledge. You can reach him on his personal website. View all posts by Hadi Fadlallah

288 Views