Rajendra Gupta
Python SQL Scripts example

Python scripts for pivot tables in SQL Server

March 13, 2020 by

This article talks about Python scripts for creating pivot tables in multiple ways.

Introduction

You might use PIVOT tables in Microsoft Excel for data analytics, preparing reports. It helps us to extract meaningful information from a large data set. We can transpose row into column along with aggregations on it.

Python SQL Scripts example

We can generate PIVOT data in different ways in SQL Server.

We can use Recursive CTE, While loop, SQL Concatenation, FOR XML PATH or CLR

Python is an interactive and easy to use programming language. We explored many use cases of Python scripts on SQLShack. This article explores the use of Python for Pivot data.

Python Scripts and Pivot

We will explore a few methods that you can use in Python for a Pivot table. For the demonstration, create the following table and insert data into it.

Open the following attachment to insert data into this table.

Method 1: Create a Pivot using data stored in the SQL table

For this demonstration, I am using data stored in the [SalesData] table of the SQLShackDemo database.

Sample data

We want a PIVOT table as highlighted in the following screenshot. Here, we require Quantity, SubTotal and UnitPrice for pivot column ProductName.

PIVOT table

We run Python scripts in SQL Server using sp_execute_external_script stored procedure.

Pre-requisites for running Python scripts in SQL Server

  • You should be on SQL Server 2017 or SQL Server 2019 for this article
  • Install Machine Services (In-Database) – Python
  • Enable external scripts execution using sp_configure command

For detailed instructions, refer to this article How to use Python in SQL Server 2017 to obtain advanced data analytics.

Once we have prepared the environment, execute the following Python code, and you get the output per our requirement specified above.

Let’s understand the Python script.

  • In the first block, we use sp_execute_external_script stored procedure and define Python language for the script

    We import the Pandas and Numpy python module in the script. These modules provide useful functions for Pivot, data sort, aggregations.

    sp_execute_external_script stored procedure

  • In the next part, we define a data frame for the input data set. We have a pivot_table Python function for creating a pivot table from input data

    In data.pivot_table, we define indexes and their value column. Here, we define [ProductName] as index column and [UnitPrice],[Quantity], [SubTotal] as data value columns. Data frame OutputDataSet captures this data. We can display the output using a PRINT function

    pivot_table Python function

  • We specify the SQL query to retrieve source data from [SalesData] table in the @input_data_1 argument

    Input data

Similarly, let’s look at another example of the following source data.

Source data

We want to generate pivot data in two formats.

  • Total Sales by each employee
  • Total Sales by each employee in a sales year

Execute the following Python script in SSMS.

You get the following PIVOT table in the output and it satisfies our first requirement.

PIVOT table in the output

For the second requirement, let’s add a new row for Raj and now we have two entries for Raj in the year 2019.

Add a new row

Now, for this requirement, we add [Year] column as well in the index columns for Pivot.

In the output, we get Pivot for employees along with yearly detail. We can note that for Raj it combined both entries of year 2019 and give a sum of sales in pivot output.

Pivot for employees data

Method 2: Read data from a CSV file and prepare a PIVOT data using Python scripts in SQL Server

In the previous examples, our source data was in SQL tables. Python can read CSV, Excel files as well using pandas’ modules.

We can store the CSV file locally in a directory, or it can read directly from a Web URL. In this demo, we use a CSV file stored on a Web URL.

In the following code, we use pd.read_csv function and input a Web URL as source data. Python directly takes data from this URL, but you should have an active internet connection for it.

Later, we define index columns (Team, Salary) and values (college) for it. We also use aggregate function np.sum on this.

Execute this Python script and view pivot data.

Read data from a CSV file

In the output, we can see a few records show zero in the college column. We do not have a college name in the CSV file for those records, and in PIVOT, it shows zero for the NULL or blank cells.

Suppose we do not want those records in the pivot table. Let’s drop these records having blank cells in CSV using the dropna() Python function. We use argument inplace=true to make changes in the data frame itself.

The complete code after adding the dropna() function is below.

Now, execute it and notice the change in pivot tables. It eliminated records with NULL values in the output as shown in the following image.

Eliminate NULL values

Method 3: PIVOT tables using groupby and lambda function in Python scripts

We can use groupby and lambda functions as well in the Python scripts for Pivot tables. For this example, I have a data set of a few states of India and their cities in a SQL table.

PIVOT tables using groupby

We need a pivot table from this data. In the output, it should list all cities for a state in a column; it should use || as a city name separator.

We use the following function for our scripts.

  • Groupby(): We use groupby() function in Python pandas data frame to split the data into the group as per defined criteria. It is a similar function as of SQL GROUP BY function. In the query, we specify the groupby() function on the state column
  • Lambda(): We can use a lambda function to construct anonymous functions in Python. We define expressions in this function
  • Sorted(): It sorts the results in an ascending or descending order. It is similar to an ORDER BY clause in SQL Server
  • Join(): The Join() function creates a concatenated string. We require it to concatenate || in the result set

Execute the following Python script to get the desired pivot data.

View Output Python query

Conclusion

In this article, we explored Python scripts for creating PIVOT tables similar to Microsoft Excel. You should explore Python as it is a popular, versatile, and useful programming language.

Rajendra Gupta
371 Views