Rajendra Gupta
Polybase - Query with predicate without pushdown

Enhanced PolyBase SQL 2019 – External tables SQL Server, Catalog view and PushDown

November 6, 2018 by

This article is part 4 of the series for SQL Server 2019 Enhanced PolyBase. Let quickly recap the previous articles.

  • Part 1: We installed SQL Server 2019 PolyBase feature along with Azure Data Studio and SQL Server 2019 preview extension to explore its features
  • Part 2: In this part, we learned to create an External table using Azure Data Studio ‘External table wizard’ for the Oracle data source
  • Part 3: We learned the useful features of External tables like joins and created an external table using t-SQL instead of the GUI mode for Oracle database in this series article

We have learned earlier that PolyBase in SQL Server 2019 Preview allows access to various data sources such as SQL Server, Oracle, MongoDB, Teradata, and ODBC based sources etc. Azure Data Studio SQL Server 2019 preview extension currently supports for SQL Server and Oracle data sources only from the External table wizard.

In this series, we will create an external table for SQL Server and explore some more features around it.

Launch Azure Data Studio and connect to the SQL Server 2019 preview instance. Right click on the database and launch ‘Create External Table’.

PolyBase - Create External Table SQL 2019

This opens up the wizard to create the external tables. Recently, I faced an issue where the wizard stuck in the ‘step 1’. Progress bar icon keeps rotating and does not show any error message or any progress.

PolyBase Create External table Error SQL 2019


After some time, we get the error message ‘Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding.’ This is the general error message and does not point to clear error message.

PolyBase Create External table Error SQL 2019

I tried to close the wizard multiple times and launch it again but it remains the same. It does not show any error message to troubleshoot it further.

Later, during the investigation, I figured out that PolyBase services were in stopped condition.

  • SQL Server PolyBase Data Management Service
  • SQL Server PolyBase Engine

PolyBase service  SQL 2019

Let us start these services. PolyBase service  SQL 2019

Now again launch the ‘Create External Table’ wizard in Azure Data Studio. This starts the wizard successfully. Therefore, monitor the service status before launching the wizard to avoid any issues.

PolyBase External data source in Azure data studio SQL Server 2019

External table for SQL Server

In this section, we will use the below source and destination instances.

  • Source Instance (here we will create external table): SQL Server 2019 (Named instance – SQL2019)
  • Destination Instance (External table will point here): SQL Server 2019 (Default instance – MSSQLSERVER)

Click on the ‘SQL Server’ in the data source type of wizard and proceed to the next step. In the next step, create the Database Master Key to secure the credentials used by the external data source. We should use a complex password with a combination of lower case, upper case, alphanumeric and special characters.

PolyBase Create Master key SQL 2019

Go to the next step and create data source connection.

Server Name should in the format of [Instance Name IP Address].[Port]

PolyBase Create Data Source SQL 2019

This credential should have permission onto the SQL Server where we will point out external tables.

PolyBase data source error SQL 2019

  • Connect to the instance, create a login, and provide read permission to the user on WideWorldImporters database.

  • Now we can go further and see that no login failure issue occurred. Select the table from the database.

    PolyBase - Choose object for external table SQL 2019

We see here that the mapping as below:

  • Source Table: Sales.Invoices
  • Destination table: dbo.invoices

It automatically selects the destination schema as dbo since sales schema does not exist in our database. Therefore, let us create the schema and refresh the schema list to show it here:

Create schema

Select the sales schema from the drop-down. Click Next to view summary:

PolyBase Select the schema in drop down  SQL 2019

summary of create external table Azure data studio

Click on Create to configure an external table. Once the external table is created, we can access the data from it.

status message

PolyBase- View the records from external table

Catalog views for PolyBase

We can view the external tables in using the catalog view sys.external_tables. It shows all the external tables in the current database:

PolyBase catalog view sys.external_tables


We can also get the information about the data source using the catalog view sys.external_data_sources.

Using below query, we can see the name of the data source, location (location contains database and instance IP address along with the instance port address):


PolyBase catalog view the sys.external_data_sources.

Understand the script generated by the wizard

The Create an External table wizard can also be used to create scripts for the whole process. Let us understand the script generated by the Azure Data Studio for creating an external table in the above example by breaking the query into multiple parts.

  1. Below query creates the Master Key Encryption

    Create Master Key Encryption

  2. In this step, it created the database scoped credential.

    Create database scoped credential

  3. Create external data source pointing to SQL Server. In this external data source query, we need to specify the location in the format of <vendor>://<server>[:<port>].

    Since we are creating the external table for SQL Server, we need to specify the vendor as ‘sqlserver’.

    We also need to specify the port address with the colon. For example, in below query, we specified port number as 5290.

    Create external data source

  4. In the below section, we will create an external table. We need to create an external table similar to the relational database table with the column properties. We also need to specify a location for the object along with the data source.

    For example, in below query, we specify the location as [WideWorldImporters].[Sales].[Invoices] and DATA_SOURCE as [SQLServer]

    Create external table script

PUSHDOWN in PolyBase

We normally use predicates in the query in order to get a subset of the rows from the table. This subset allows pulling the records based on the conditions defined with where clause. These predicates can be as following as per the docs.

  • Binary comparison operators ( <, >, =, !=, <>, >=, <= ) for numeric, date, and time values.
  • Arithmetic operators ( +, -, *, /, % ).
  • Logical operators (AND, OR).
  • Unary operators (NOT, IS NULL, IS NOT NULL).

In PolyBase, we can use pushdown to improve the performance of the query for the external table. Mostly, we use this feature for the scale-out cluster cases where we can see significant improvement of the query performance. In this example, we will be using the standalone PolyBase configuration.

When we create an external data source for external table, we have the option to specify the value for PUSHDOWN as ON or OFF. The default value for pushdown is ON. Therefore, we do not need to specify a pushdown value if we want to enable it. Using PUSHDOWN, we can choose to move the computation to source system or not.

The syntax for an external data source with pushdown is as below:

We have already created a data source with a default value (Pushdown=ON) for the external table pointing to another SQL Server instance. Therefore, we will run the query with predicate with and without pushdown. To disable, pushdown we can use the predicate OPTION (DISABLE EXTERNAL PUSHDOWN) in the query.

Similarly, while creating the external data source if we disabled the pushdown, we can enable it while running the query as OPTION (FORCE EXTERNALPUSHDOWN);

Let us run the query and see the difference in performance.

  • Execute query with predicate and enabling Pushdown: In this query, we do not specify OPTION (FORCE EXTERNALPUSHDOWN)since it is by default enabled in the data source.

    Polybase -Execute query with predicate and enabling Pushdown SQL Server 2019

  • Query with predicate without pushdown: In this query, we disabled the pushdown with predicates OPTION (DISABLE EXTERNALPUSHDOWN):

    Polybase - Query with predicate without pushdown

We can see here the query without pushdown took 30.524 seconds while query with pushdown took 19.754 seconds so there is a significant performance improvement with this approach. PUSHDOWN allows moving computation source, which we can see improvement in performance.

Conclusion

In this latest article in our series, we have learned to create an external table for SQL Server data source with the Azure Data Studio Create external table wizard along with T-SQL as well. We also learned about the PushDown approach for computation queries. In the next series of the article, we will explore more on PolyBase for different data sources.

Table of contents

Enhanced PolyBase SQL 2019 – Installation and basic overview
Enhanced PolyBase SQL 2019 – External tables for Oracle DB
Enhanced PolyBase SQL 2019 – External tables using t-SQL
Enhanced PolyBase SQL 2019 – External tables SQL Server, Catalog view and PushDown
Enhanced PolyBase SQL 2019 – MongoDB and external table

Rajendra Gupta
Latest posts by Rajendra Gupta (see all)
168 Views