Rajendra Gupta
Execcute and monitor status of SQL Server Python SSIS package

Using Python SQL scripts for Importing Data from Compressed files

March 4, 2019 by

Using Python SQL scripts is a powerful technical combination to help developers and database administrators to do data analytics activities. Python provides many useful modules to perform data computation and processing of data efficiently. We can run Python scripts starting from SQL Server 2017. We can create the ETL solutions to extract data from various sources and insert into SQL Server.

Suppose we are getting data in a flat file in a compressed format, we can use the ETL process to import these as well. It requires files to be extracted first using the tool to extract these files such as 7z.

I have seen the developers using an external SSIS tool CozyRoc. It includes a Zip task to compress and decompress the files in various formats such as Zip, GZip, and BZip2 etc. In my previous article, we imported the compressed CSV file using the 7z compression utility. Python can also play an important role in importing data into SQL Server from the compressed files.

In this article, I am using the SQL Server 2019 CTP 2.0 Verify SQL Server version

You should install the Machine Learning Services (Python) to run the Python SQL scripts. If you do have installed it before, you can start the SQL Server installer. We need to add a feature in the existing SQL Server installation. In the Feature selection page, put a check on the Machine Learning Services and Python as shown in the following image:

SQL Server feature selection for Python

It installs the SQL Server Launchpad service to run the Machine Learning Service. In the next page, you can set the service account it.

Specify service account for the SQL Server Launchpad service

In the next page, we need to provide Consent to install the Python. Once you click on Accept, it downloads the required software and does the installation.

Give consent to install Python

You can review the SQL Server 2019 feature before installation in the summary page. You can see we are going to install the Python with the Machine Learning Services.

Verify the SQLServer set up options

In the SQL Server Configuration Manager, SQL Server service and the SQL Server Launchpad service should be running to use the Python scripts in the SQL Server.

Verify SQL Server sevice and Launchpad service for Python

Connect to the SQL Server. We need to enable parameter external scripts enabled using the sp_configure command to run the Python SQL scripts

In the below screenshot, we can see the external script enabled is configured successfully.

Verify 'external scripts enabled' Python SQL results

Once You have enabled the external scripts, you need to restart both the services. You might get the following error if the SQL Services and the Launchpad service is not started after we run the sp_configure command mentioned above.

Error while using Python SQL script in SQL Server

We can run the following to test if the Python SQL script will process correctly in SQL Server.

Verify sample script out in Python

For this example, let us prepare data using the below query in the WideWorldImporters database.

Extract the records to import in SQL Server using Python

Save the output (1111 rows) in the CSV format in a designated directory.

Save the output in a CSV format.

Now right click on this CSV file and go to 7-Zip and click Add to archive.

Add to archive

We need to create a bzip2 compressed archive file.

Create a bzip2 compressed archive file

Click Ok and it creates a compressed file Employee.csv.bzip2.

Create a compressed file

Connect to SQL Server instance and run the following Python SQL script

You get data from the compressed CSV file as a output in SSMS. SQL Server uses Python code to interact with the compressed file and extract data using Python modules.

Verify the output using Python SQL script

Let us understand this query in the Python language.

Part 1: Import Python Module: We can use Pandas module in Python to extract data from the compressed file. Python does not have a data type for the date; therefore we also need to import a module datetime.

Part 2: Define the CSV columns and compression format: In this part, we need to define the following parameters.

  1. Absolute path for the compressed file in Python function pd.read.csv- In this query, we specified the CSV file path as OutputDataSet = pd.read_csv(“C:\sqlshack\Draft articles\Data\person.csv.bz2”
  2. Define the column names from the CSV – We can define CSV column name in the names parameter similar to names = [“PersonID”, “FullName”, “PreferredName”, “SearchName”, “IsPermittedToLogon”, “ValidFrom”]
  3. Header for the CSV file – In this query, we specified CSV header in the first column using header = 0
  4. Compression format – in this query, we specified compression format using compression = “bz2”

Part 3 – Define the output table column with data type: In this section, we need to define the table columns and their data types.

Now we need to import this data into the SQL Server. We can directly use Insert statement in Python query to insert in a SQL Server table. We should have a SQL Server table with the columns similar to the CSV file columns. Let us create a table in this article using the following query.

Once we have the SQL Server table in place, we need to insert data using Python SQL script.

We get the following output that shows the number of inserted records.

Python SQL script results

Let us understand this query to insert records directly into SQL Server table.

Part 1: Insert statement: In this step, we need to specify an insert statement into our existing SQL Server table.

Part 2: Define the CSV columns and compression format: This step is precisely similar to the step we looked while viewing data in SSMS.

We do not need to specify the column and their data type Part 3- Define the output table column with data type while importing data into SQL Server tables directly.

SSIS Package to Import compressed data into SQL Server using Python SQL script

We can also use Python query in an SSIS package and import data into SQL Server tables.

Open Visual Studio 2017 and create an Integration Service Project. Specify a valid directory and solution name for this SSIS package.

Open Visual Studio 2017 and create an Integration Service Project.

It creates a solution for the PythonImport SSIS package. Drag a Data Flow task in the Control Flow area.

Drag a Data Flow task in the Control Flow area.

Double click on Data Flow Task and drag OLE DB Source in data flow task area.

Specify drag OLE DB Source

In this OLE DB Source, specify the connection to SQL Server instance. We need to run Python command in this OLE DB Source to get the required data. Select data access mode as SQL Command and in SQL Command text, paste the Python code. It is same Python SQL code that we used to display data in SSMS.

Configure the OLE DB Source with connection and script

Click Ok and add an OLE DB Destination. In this OLE DB Destination, specify SQL Server instance and SQL Server table in which we want to insert data from CSV using the Python SQL query.

Select the name of the table to insert data

Click on Mapping and verify the mapping between CSV columns and SQL Server table columns. We can change the mapping between these columns if column name differs in CSV and table.

Verify the mapping between CSV columns and table columns

Click Ok, and you can see the SSIS package in the following screenshot. I renamed both Source and Destination as below.

  • Source: Python SQL Query
  • Destination: SQL Server Table

View the SSIS package configuration

If there is any configuration error, you will get a red colour cross on the source or destination task.

We will truncate our destination SQL Server table using the following query. It ensures that table is completely empty before running the SSIS package.

Now execute this SSIS package. In the following screenshot, you can see that we inserted 1111 rows from Python SQL Query to SQL Server Table. The SSIS package execution is successful as well.

Execcute and monitor status of SQL Server Python SSIS package

Verify records in SQL Server destination table. We have 1111 rows that are equivalent to number of rows in CSV file.

Verify the table count

We can perform data transformation as well in the SSIS package based on the requirement. Python makes the process easier for us to extract data from the compressed file. We do not need to extract the compressed file first.

Conclusion:

In this article, we explored to import compressed data into SQL Server using a Python SQL command.

Table of contents

Using Python SQL scripts for Importing Data from Compressed files
Importing Data into SQL Server from Compressed Files
Rajendra Gupta
Latest posts by Rajendra Gupta (see all)
168 Views