Ben Richardson
Selecting the extract function via the add column menu

Extract column information using the Extract function in Power BI

December 3, 2020 by

In this article, you will see how to use the Extract function in Power BI to extract information from columns in a Power BI dataset.

For those of you who are not familiar with Power BI, it is Microsoft’s data analysis and BI tool. It is available under a number of different licenses (see this Power BI article for full details licensing) but in this article, we use the free Power BI Desktop which is available free.

While Microsoft’s Power BI comes with a lot of options for data analysis, one of the most exciting options is its ability to extract specific information from within a column. For example, with Power BI you can extract year, months, days, hours etc. from a complete date. You can also create new categories using a range of data.

In this article, we will look at a number of examples of using the extract function in Power BI.

Importing the Dataset

As Power BI is a data analytics tool, it goes without saying that you need to import data into Power BI before it can be analyzed. To demonstrate the use of the extract function in Power BI, we will use a freely available dataset on Boston crime that contains information about incidents that took place in Boston. The dataset is freely available at this Kaggle link. Download the dataset. You will be using the “crime.csv” file from the downloaded dataset. Follow these steps to import the dataset into Power BI.

Open the Power BI desktop application. From the dashboard click the “Get Data” option (in the top menu) and then select the “Text/CSV” option from the list of options that appears asking you to define what type of dataset you are working with. Finally, click “Connect”.

Dropdown menu when importaing data in power bi

The dataset will now be imported into Power BI (see screenshot below).

Now you have two options:

  1. You can directly import the data to reports view by click the “Transform Data” button, or
  2. You can import the data to the “Query Editor” by clicking the “Load” button

The extract function in Power BI is available in the Query editor, so please click the “Load” button to import the data.

Importing data in Power BI

Accessing Extract Function in Power BI

In the query editor, you have two options when using the Extract function. You can access the “Extract’ function from the “Add Column” menu as shown below.

Accessing the extract function in the Add Column menu

The other way to select the “Extract” function is via the “Transform” which is in the top menu as shown below.

Accessing Extract in the Transform menu

There is one main difference between the extract function in the “Add Column” menu and the extract function in the “Transform” menu. When you apply the “Extract” function via the “Add Column” menu, a new column is created and your extracted content appears in that new column. When you apply the extract function in Power bi via the Transform menu, no new column will be created. Instead, the extracted text will replace the old text in the same column.

Let’s look at both of these in more detail.

Extracting Content via the Add Column Menu

Let’s first see a couple of examples of extracting content using the extract function in Power BI from the “Add Column” menu.

Extracting Text

If you look at your dataset, you can see that it contains a “UCR_PART” column. This contains text like “Part One”, “Part Two” and “Part Three”. Let’s extract the number in the form of text i.e. “One”, “Two” and “Three” from the “UCR_PART” column.

To do this, click the header of the “UCR_PART” column, go to the “Add Column” option from the top menu, and then select the “Extract” function. From the dropdown list, select “Text After Delimiter”.

Selecting the extract function via the add column menu

We want to extract the text that exists after a single space. In the dialogue box that has appeared (shown below), enter a single space and then click “OK”.

Selecting text that appears after a single space

You will now see that a new column has been inserted after the last column called “Text After Delimiter”. You can see that the column only contains the numbers in the form of text i.e. “One”, “Two”, “Three”.

New column after running the extract function

You can rename the column name by clicking the column header and typing a new name. In the following screenshot, I’ve changed the column name to “UCR_PART_INTEGER”.

The extracted data in a renamed column

Next, let’s try changing the text numbers into digits. You can do this by clicking “Replace Values” from the “Transform” menu as shown.

Selecting replace values from the transform menu

In the replace values dialogue box, replace the text “One” with the digit ‘1’. Repeat the same step to replace “Two” with 2, “Three” with 3 and so on.

Specifying the values to be replaced and their replacements.

If you look at the newly created “UCR_PART_INTEGER” column you will see that it contains “ABC” before the column header name, which shows that it is a text column. However, the “UCR_PART_INTEGER” column now contains integers. To convert it to an integer type, click the “Detect Data Type” option from the “Transform” menu and you will see that the column type is changed to numbers as shown below:

Using detect data type to ensure column type is correct.

Extracting the Day from a Date

With the extract function in Power BI, you can also extract information such as the year, month, day, or time from a date-time column.

We will extract the day of the month from the “OCCURRED_ON_DATE” column. If you look at the column, the day of a month is located inside two forward slashes. To extract the day, you use the “Text Between Delimiters” option of the extract function in Power BI.

Selecting text between delimiters in the extract function.

Enter a forward slash “/” in both the “Start delimiter” and “End delimiter” fields as depicted by the following figure and click the “OK” button. This selects what is between the first two forward slashes in the field.

Configuring the Text Between Delimiters tool correctly.

You will see a new column containing the day of the month now appear at the end of the list of columns. Rename the column to “Day of Month”.

Renaming the new column which contains the day of the month data

Click the column header and drag it next to the “MONTH” column. Look at the following screenshot for reference. Here you can see that Day of Month column contains the day value as shown in the “OCCURRED_ON_DATE” column.

Repositioning the new column to ensure that the extraction has worked correctly.

Extracting the Time from a Date

Like the day, you can also extract the time from the “OCCURRED_ON_DATE” column. You can see that there is a single space between the date and time information. This means that again you can use the “Text After Delimiter” option from the extract function in Power BI to extract the time.

As we did previously add a single space in the “Text After Delimiter” field and click “OK”. Again a new column “Text After Delimiter” will be created. You should rename it “Time”.

The type of the newly created time column is “Text” at the moment. Again click the “Detect Data Type” option from the “Transform” menu, the “Time” column will be converted into a date-time column. This is shown by the clock icon which will now be visible before the column name in the “Time” column. Look at the following screenshot for reference.

The "Time" column showing the click icon to denote that it is a time date column.

Extracting Content via Transform Menu

As mentioned in the introduction, you can also use the Transform menu to extract text from a column. However, no new column will be created. Instead, the extracted text will replace the text in the selected column. Let’s look at an example.

Let’s transform the DISTRICT column by extracting only the numbers from the DISTRICT column. If you look at the DISTRICT column it contains the alphabet A, B, C, D or E, followed by a single- or two-digit number.

Let’s extract the last two characters from this column. Select “Transform” from the top menu and then click “Extract”. From the dropdown list, select “Last Characters

Choosing 'Last Characters' under the extract menu.

In the “Extract Last Characters” dialogue box, enter 2 and then click OK.

Configuring the 'Extract Last Characters' tool.

You will see that the DISTRICT column will be updated to now contain only the last two characters from the original values.

Updated data having run the extract function in power bi.

Since we want the DISTRICT column to contain numbers only, we will need to remove the letters by replacing them with empty spaces. To do this select the “Replace Values” option from the “Transform” menu.

Configuring the 'Replace Values' tool

You will see that the DISTRICT column now contains only digits.

Updated district data having removed the letters.

Finally, remember to change the type of the DISTRICT column from text to numeric by using the “Detect Data Type” option from the “Transform” menu.

Conclusion

This article has explained how to use Extract function in Power BI. We’ve looked at examples of using it to both insert new columns to contain the output, thus preserving the data we have run it on, and also using it to replace the data that we have run it on.

Ben Richardson
880 Views