In this article, we will learn how to use Python to source data in Power BI using the python programming language.
Python is arguably one of the most popular and pervasive programming languages in the industry. On one hand, python can be used for web and application development, on the other hand, it is very popular in the data world as well especially for data science and data engineering. The pervasiveness of python has led to its adoption by many market-leading tools, software, frameworks and SDKs. For example, in the Microsoft technology stack, tools like Microsoft Visual Studio, Microsoft Power BI, Microsoft SQL Server and other such popular tools and technologies have very strong and integrated support for Python. Python has a unique ability to process data, being used for programming, frameworks like skikitlearn for data science, as well as rich libraries for rendering visualizations. Power BI is Microsoft’s primary tool and technology stack for reporting and dashboarding on-premises as well as on the Azure cloud. It supports rendering visualizations using Python as well as importing data using Python as well. PowerBI is equipped with more than a hundred connectors that support almost all the standard, established and emerging data sources. Data professionals use Python for a variety of data processing and data consumption related use-cases, and there is a large potential of reusing their existing codebase to source and process the data and use it in tools like PowerBI.
Python Installation and Setup
As we are going to use Python with Microsoft Power BI, we will need a setup of Python installed on the local machine. Python can be installed directly by downloading it on a local machine from Python.org. This is one of the most straight-forward ways of configuring a python setup on the local machine. Another way of installing python is configuring it as a part of Microsoft SQL Server in case you happen to have a SQL Server setup on a local machine. Installing SQL Server to install Python would not be an efficient approach. But the other way round can be one of the ways to set up Python. The limitation of using the Python Server component of SQL Server is that there can be some limitations or constraints as there are some wrappers that SQL Server may install on its python installation. Also, the distribution of Python that SQL Server installs is Anaconda distribution, and one may want a specific edition of Python, in that case, one should consider downloading the desired edition of Python and installing it independently on the local machine. Going forward, in this article, it is assumed that an installation of Python is already configured on the local machine.
Before we proceed with the actual exercise, let’s look at some basic but important locations of Python setup that one may need to use for administering Python specially to install packages i.e., libraries. The home directory of a Python installation would look as shown below.
The directory that hosts the binaries to build and install packages using the pip tool is the Scripts directory. Using the pip command on the command prompt, one can easily install libraries in the python installation. The actual libraries are installed in the Lib directory. One can either use the pip command to install the library or download an already built library and copy it in the Lib directory. To use python with Power BI, three libraries are mandatorily required to be installed – os, pandas and matplotlib. Generally, OS and Pandas (with Numpy) are the libraries that come pre-installed with most editions of Python. But one would have to install matplotlib library using the command “pip install matplotlib”. Once this library is successfully installed then the python setup can be considered ready to be used with Power BI. To build libraries, the pip tool in Python makes use of Visual C++ 14.0 or later, on the Microsoft Windows platform. Ensure that you have the corresponding redistributable installed on the local machine.
Assuming that all the required libraries are installed on the local machine and the latest version of PowerBI is installed as well on the local machine, open PowerBI. Navigate to the File menu -> Options and Settings -> Options -> Python scripting menu item. We need to configure this section to point PowerBI to the home directory of Python, so that when we use Python script to source datasets with or without pre-processing, it will use the configured Python setup for executing these scripts. Generally, PowerBI will detect the home directory of Python installation if the system variable PATH has been set. In any case, if it does not detect the home directory of Python, we can manually point PowerBI to the home directory of Python installation.
Once the Python setup has been configured, we can navigate back to the designer window. Click on the Get Data menu item and select Python scripting connector. It will open a new window to submit python scripts for execution. Only data frames would be parsed as tables and considered for sourcing in the Power BI report. As PowerBI already supports a wide variety of data sources, we can use direct connectors for sourcing data from such sources. In data science, it is a very common practice to scrape data from the web from sources like GitHub or other similar sources for exploratory data analysis. This is something that may need scripting for reading and formatting such data before sourcing it. So, we would be considering a similar example for use with our Python connector in Power BI.
Key in the python script as shown below. In this script, in the first line, we are importing the pandas python library that is generally used to read data and populate it in tabular data structures known as data frames. In the second line of the script, we are creating a variable named URL and assigning it the value of a CSV file hosted on GitHub. In the last line of the python script, we are using the pandas object and executing the read_csv file and passing it the URL variable as a parameter. This would result in data being read from this file and the resultant output being assigned to the variable df. As the output of the read_csv function is a data frame, the type of the variable df would be a data frame, which is exactly what we need to detect the output as a table for importing in our Power BI report.
Click on the Ok button to proceed to the next step. In the next step, Power BI would execute this script using the Python installation that we configured earlier, and then show the output of data for preview as shown below. We can choose to format the data in the script as well as transform the data beyond this stage in PowerBI as well before importing the final dataset in the PowerBI report.
Click on the Load button to populate the Power BI report with the data sourced using the Python script. In this way, we can use Python scripts in the Python Scripting connector in PowerBI to source data using a Python installation.
In this article, we learned how to download, install, and configure Python on a local machine. We explored the tools and utilities in python that allow us to install the required packages which are typically used for processing data. We configured PowerBI to point to the python installation and then used the Python connector for executing scripts that allow us to source data in the Power BI report.
- Introduction to basic psql commands - September 30, 2022
- Integrating Azure Data Explorer cluster with Azure Synapse - September 19, 2022
- Ingesting data into Azure Synapse Data Explorer Pool - September 16, 2022