Prashanth Jayaram

The importance of Python in SQL Server Administration

January 8, 2018 by

Some of my previous articles on Python provided insight of the basics and the usage of Python in SQL Server 2017.

This article is an effort to collect all the missing pieces and try to showcase the importance of using Python programming in SQL Server.

Many say that PowerShell has the upper hand over Python in some aspects of the database administration. I too am under the same impression as many technical enthusiasts, but with very limited knowledge, we can also see the power of Python. Perhaps, instead of pitting PowerShell and Python against each other, we can look at them as complementing technologies.

In 2016, R, the statistical computing programming language was integrated with the SQL Server version, named for the same year. The other side of the coin was missing since Python is also a leading machine learning language and even having a large user base. In SQL Server 2017, Python is integrated. Now, R and Python are under the same umbrella of the feature called machine learning services.

As Python is being a common programming language adopted by data scientists and database administrators, the ability to run Python code as T-SQL script enables the machine learning capabilities, directly when it comes to dealing with large amounts of data. First, data no longer needs to be extracted from the database before it can be processed through a client program. This provides us with significant benefits in terms of security, integrity, and compliance that arise when data is otherwise moved outside of the highly controlled environment within the database engine. Further, computations are performed on the server itself without having to first transfer the data to a client, thereby placing a large load on network traffic. This also means that you can perform computations across the entire dataset without having to take representative samples as is common when processing data on a separate machine. And because the data stays in place, you can take full advantage of the performance benefits brought by SQL Server technologies such as in-memory tables and column-store indexes. Python code is also very easy to deploy and can be written directly inside of any Transact-SQL command.

Next, SQL Server 2017 supports the installation of any Python packages that we might need so you can build on top of the extensive collection of open source capabilities that have been developed by the wider Python community. And finally, Python integration is available in every edition of SQL Server 2017, even the free-to-use Express edition. So, no matter the scale of your application, you can take advantage of Python integration.

To get you started, SQL Server 2017 includes a number of libraries from Anaconda, a very popular data science platform. In addition to that, Microsoft has created two libraries that are installed with Machine Learning Services. 

  1. Revoscalepy

    Revoscalepy is a library of functions that supports distributed computing, remote compute contexts, and high performance algorithms

  2. Microsoftml

    The Microsoftml library contains functions for machine learning algorithms including the creation of linear models, decision trees, logistic regression, neural networks, and anomaly detection.

Let’s begin

The traditional ways of analysis within SQL Server using various Microsoft components such as SQL, MDX, DAX in PowerPivot gives the flexibility to transform data. Now, the R language, another rich superset of machine learning modules for data analytics is directly integrated with SQL Server 2016. R is another language which has a large user base, along with Python. With the available modules in Python, data analysis becomes more efficient and effective.

Let me take you through a few examples to prove that the use of Python within SQL Server is an efficient way to pull data from remote servers.

  1. See how to connect to a SQL Server data source using pyodbc
  2. Execute the SQL query, in this case build the connecting string which points to remote SQL instance and execute the Dynamic Management View query sys.dm_os_waitstas
  3. Assign the SQL output to the data frames
  4. Display the top 10 rows of the SQL result set using the head function
    The head command is used to display the first ‘n’ rows of the dataframe. This is just like the top command of the SQL Server. The first column (showing 0 to 9) is the default index for the dataframe
  5. In order to convert the dataframe data into related SQL columns, the WITH RESULTS SET clause is defined at the end of the code. This gives us the flexibility to define the column and associated types of each column.
  6. WITH RESULT SETS requires that the number of columns in the result definition must be equal to the number of columns returned by the stored procedure/SQL query. In the following examples, the output of the SQL query returns four columns: servername, waiting_type, waiting_tasks_count, wait_time_ms. These match the definition of the with result sets clause.

The above SQL query output can also be returned as a SQL table using with result clause.

Let us look at an example to push the sample data into the table. The following SQL query sample yields the database internal information, printed as a string output.

The following SQL creates a demo table on the target instance. SQLShackDemoDB is the name of the target database.

The stored procedure named P_SampleDBInventory_Python is created with two connections.

  1. Load the pyodbc Python module for data connectivity
  2. Build the target connection string. It includes the destination object details such as target instance, database and table

  3. For the target connection string,

  4. Open the cursor
  5. The second connection string is built using a file. The file is the source of the input server names. The query should traverse across the lists to generate the data sets. Then the dataset is traversed to pull the column details in the destination using the target connection string
  6. Build the SQL statement using triple quotation marks. The triple quotes are used to build regular strings that can span into multiple lines
  7. The SQL statement is executed using the defined cursor
  8. Results are loaded into the destination table
  9. Call the stored procedure
  10. Verify the SQL Output


Wrapping Up

So, there you have it. While the examples provided were fairly straightforward, I hope you can see how useful Python could be for a database administrator as well, and not just a BI or Data Scientist.

The real beauty is that all this code could happily sit inside a stored procedure; something that you can’t do all that well with PowerShell.

Python integration in SQL Server 2017 gives data scientists an easy way to interact with their data directly in the database engine and it gives developers an easy way to integrate Python models into their application through simple stored procedures.

Prashanth Jayaram

About Prashanth Jayaram

I’m a Database technologist having 11+ years of rich, hands-on experience on Database technologies. I am Microsoft Certified Professional and backed with a Degree in Master of Computer Application. My specialty lies in designing & implementing High availability solutions and cross-platform DB Migration. The technologies currently working on are SQL Server, PowerShell, Oracle and MongoDB. View all posts by Prashanth Jayaram