Rajendra Gupta
Problem sheet

Python scripts to format data in Microsoft Excel

March 9, 2020 by

In this article, we will use Python scripts for data formatting in Microsoft Excel sheet with various examples.

Introduction

Python is an object-oriented, high-level programming language popular among the data scientists. SQL Server also supports executing Python code from Azure data studio and SQL Server Management Studio. We have many useful publications on SQLShack for the Python, along with its use-cases, especially for DBAs and developers.

Before you go further, I would recommend you to go through the following articles for the basic understanding of it:

Usually, DBAs share export required data in an Excel sheet, do the formatting, and share it with the concerned authorities. It is a regular practice. Python integrates with Microsoft Excel very well.

This article covers the following topics:

  • Create a sample excel file using the Python
  • Import data from an excel file using Python
  • Format data in excel sheet using Python
  • Prepare excel charts using Python

Pre-requisites

  • In this article, we use SQL Notebooks of Azure Data Studio. Refer to SQL Notebooks introduction and overview for detailed information on it
  • Download the Python latest version 3.8.2 for Windows from this URL

    Download Python

  • Configure SQL Notebook for Python kernel. Refer to the article Use Python SQL scripts in SQL Notebooks of Azure Data Studio for it
  • Install XlsxWriter and xlrd Python module using pip utility

    • XlsxWriter: We use the XlsxWriter Python module to write the Microsoft Excel files in XLSX format. It provides various formatting options, charts, textboxes, Rich multi-format strings, Macros for excel files

      For the installation of XlsxWrite, use the following command in Python3 kernel of the SQL Notebook:

      It downloads the package and installs it as shown in the following screenshot:

      Download XlsxWrite package

    • Xlrd: It is also a Python library and useful to read data from the excel files. It supports both XLS and XLSX extension for reading data and formatting information from Excel files. It gives many formatting features as well for excel files

      Use the following command for installing the xlrd Python module:

      Download Xlrd package

    Work with excel files using Python scripts and libraries

    In the previous step, we installed the Python libraries XlsxWriter and xlrd using SQL Notebooks of Azure Data Studio.

    Create a basic excel file

    Let’s create a sample excel file without any formatting. Run the following code in Python:

    • Import Xlsxwrite module
    • Creates a workbook Welcome.xlsx in the C:\temp folder
    • Add a new worksheet in this workbook
    • Writes the text ‘Welcome to Python’ in the A1 column
    • Closes the workbook

    You can browse the directory and open the Welcome.xlsx excel workbook. It shows the data entered by the Python code:

    Verify excel

    Split the words into multiple columns

    Let’s modify the above code and split the words into different columns A1, B1, and C1:

    Once we execute the above code, it overwrites the existing Welcome.xlsx file. Open it, and you see words in different columns A1, B1, and C1:

    View output

    Data in multiple rows and columns along with column headers

    Suppose we want to create an excel sheet with the following data using Python scripts. It includes multiple columns and rows:

    Problem sheet

    We use the following Python scripts for preparing this kind of excel sheet:

    • We defined column names using worksheet.write()

    • We start index for rows and columns. For the first row and first column, it uses a zero indexing counter. We specified column names in the previous step, so my counter starts from row=1 and col=0
    • We write the required data in an array format in the data variable. It contains the data that we wish to display in the excel:

    • The code contains a FOR interaction loop to go through each row and column. It writes in the respective row and column for the worksheet:


    The complete Python scripts for preparing this excel is as follows:

    Bold characters using Python Scripts

    In many cases, we make column names in bold characters so that users can differentiate column with the actual data. In Python, we enable the bold property, as shown below:

    In the excel, you can view columns Name and Department in bold letters:

    Bold Characters using Python Scripts

    Modify a column width of Microsoft Excel columns

    Look at the following excel sheet. In this, text written in column B is span across multiple columns. Usually, in excel, we change the column width to display it appropriately. It does not conflict with text written in other columns:

    Change column width of Microsoft Excel columns

    We can define the row and column width for excel cells in Python as well. For this we use worksheet.set_column function along with column width. We specify column in format of [column:column].

    Look at the difference in the output. It changed the column width to the appropriate size.

    Change font color and size

    Now, let’s look at changing font color and size for the column headers.

    We define font color using font_color and font size using set_font_size variables in Python scripts. For this demonstration, let’s add red color for the column header with font size 16:

    Verify output

    Executing the code gives the following output:

    Change font color and size

    Add an underline for the column header

    We can add a column header using the cell format function set_underline(). Add the following line in the code, and it generates column headers with an underline:

    Let’s execute the following code for the column header underline:

    It gives the following output:

    Add an underline for the column header

    We have few underline options available as below:

    • 1 = Default format
    • 2 = Double underline
    • 33 = Single accounting underline
    • 34 = Double accounting underline

    We can format data for double accounting underline using the below code:

    Add an double accounting underline for the column header

    If we change the format to double underline, you get the following output:

    change the format

    Text Alignments

    In a Microsoft Excel worksheet, we can arrange a text in the left, right, center in respective columns. In the following screenshot, we applied formatting options:

    • The header should be center-aligned
    • Other rows of data color should be in blue and center-aligned

    Text Alignments

    In this case, we have two formats – one for the column header and another for the rest of the columns. In the following Python scripts, we defined cell_format and cell_format1:

    In the Cell_format, we add another line for font alignment.

    In another format, we defined font color using property font_color and set the alignment to center.

    We have two kinds of text format now. We need to apply them to appropriate columns only. For example, column headers look the same as in the previous example with a difference that it is aligned center. We use cell_format for the column headers:

    For the rest of the data, we require center alignment and font in blue color. We define another cell format for this:

    Later, we apply this new cell format for data from B2 to B5 and A1 to A5 using the worksheet.set_column function:

    We can view the complete code below and execute it to get the required results:

    Conclusion

    In this article, we explored Python scripts for data formatting in Microsoft Excel. Python is a powerful language, and we can do much work with a few lines of code. SQL Server 2017 onwards, we can execute Python code inside SQL Server. This article will help you in working with data in Python itself without doing formatting in Excel.

    Rajendra Gupta
17,911 Views