In this article on PolyBase, we will explore more use case scenarios for external tables using T-SQL.
SQL Server 2019 provides Data Virtualization through PolyBase to various data sources such as SQL Server, Oracle, Teradata, and ODBC based data sources etc. In my previous articles, we explored
- Installation of PolyBase in SQL Server 2019 Preview (CTP 2.0)
- Installation of Azure Data Studio
- Installation and setup database and objects in Oracle 11g Express Edition
- Creating with Azure Data Studio
- Accessing data
The external table works only if PolyBase is able to connect to the external data source. We might get the below error if there is an issue in the connection.
08:32:21 Started executing query at Line 1
OLE DB provider “SQLNCLI11” for linked server “(null)” returned message “Login timeout expired”.
OLE DB provider “SQLNCLI11” for linked server “(null)” returned message “A network-related or instance-specific error has occurred while establishing a connection to SQL Server. Server is not found or not accessible. Check if instance name is correct and if SQL Server is configured to allow remote connections. For more information see SQL Server Books Online.”
Msg 10061, Level 16, State 1, Line 0 TCP Provider: No connection could be made because the target machine actively refused it.
Total execution time: 00:01:04.003
It might be due to the following reasons
- The external database is not running.
- PolyBase Services are not running. If the External database is up and running but we do not have a PolyBase service in running status, we also get the same error message.
In part 2 of the series, we saw that the external table could be accessed similarly to a relational database table. One more advantage is that we can join them with any relational tables.
Let us see how we can join the external table with the relational DB tables. I have saved the data into a CSV file so we will import the table using my earlier article, SQL Server Data Import using Azure Data Studio. Therefore, you can follow the article in the same way in the Azure Data Studio also. I will just give high-level steps to import data from flat file into Azure Data Studio in this article.
-
Launch Import Wizard in Azure Data Studio
-
Specify the input file, table name, and schema name.
-
Preview data
-
Review the column properties and make changes if required
-
Import Data into SQL Server and we can see the successful message below:
We now have two tables in our table as highlighted below:
- Dbo.Departments: SQL Database relational table
- Dbo.Employees: External table pointing to Oracle Database
Now let us run the below query in Azure Data Studio to pull up data from both the tables.
1 2 3 4 5 6 7 |
SELECT e.employee_id, e.first_name, e.last_name, e.department_id, d.location_id,d.department_name FROM employees e, departments d WHERE e.department_id = d.department_id |
We can see here that we can join the external data with a relational database table like a normal join operation only. This provides a good way to use the external tables with the way we want in our queries. In the applications as well we can pull up the required information from external data sources using Polybase without going to each data source separately.
PolyBase external tables do not allow the update or insert data as of now. We can only select the data from that table. If we try to update one, we get the below error message:
Msg 46519, Level 16, State 16, Line 1 DML Operations are not supported with external tables.
Until now, we have used the External Table Wizard to create tables for Oracle DB. Now let us create the table using t-SQL. We will also use the SQL Server Management Studio 18.0 Preview 4.
In this example, we will use the below:
- SQL Server Database: ExternalTableDemo
- External DataSource: Oracle Database (Service Name – XE)
Steps to configure using t-SQL
-
The first step is to create a database master key. We need to specify the password to encrypt the master key in the specified DB. We should use complex password meeting the password policy.
Use the below query to create a database master key in the ExternalTableDemo database.
1CREATE MASTER KEY ENCRYPTION BY PASSWORD = 'Pass@word1';
-
Now we need to create a database-scoped credential. SQL Server uses this credential to access the external data source. We can use the below query to create database scoped credentials. We need to specify the Identity (name of the account ) and SECRET (password) to connect to Oracle Data Source.
1234Use ExternalTableDemoGoCREATE DATABASE SCOPED CREDENTIAL [OracleDB]WITH IDENTITY = 'system', SECRET = 'ABC@system1';
-
In this step, we will configure the external data source. We need to specify the connection string for the data source along with the credential created in above step. The connection string should be in format of ‘<vendor>://<server>[:<port>]’.
12CREATE EXTERNAL DATA SOURCE [OracleDBSource]WITH (LOCATION = 'oracle://192.168.225.185:1521', CREDENTIAL = [OracleDB]);
We can see the Location specified as oracle: //192.168.225.185:1521′ in the format specified above.
Vendor: Oracle Server: 192.168.225.185 Port: 1521
-
Now, we have completed setting up the external data source pointing to Oracle DB with the credentials specified. In this step, we need to create an external table in the SQL Server. We need to specify the table columns, data types, properties similar to a relational database table. We also need to specify the table location in the format of <database_name>.<schema_name>.<object_name>.
In the below query, we can see the location as [XE].[DEMOUSER].[REGIONS] which is in the line of the format specified above.
123456CREATE EXTERNAL TABLE [dbo].[REGIONS]([REGION_ID] FLOAT NOT NULL,[REGION_NAME] VARCHAR(25) COLLATE Latin1_General_CI_AS)WITH (LOCATION = '[XE].[DEMOUSER].[REGIONS]', DATA_SOURCE = [OracleDBSource]);Note: If there are any issues in the authentication to Oracle Data Source due to invalid username or password in the credential step, we will get the below error. You can see the error code is ORA-01017 that shows error raised by Oracle DB due to invalid credentials.
You can either recreate the credential or alter the credential with the correct password.
Now let us run the query again to create the external table. We can see below that query is executed successfully now.
-
We can view the table now in the database.
Run the select statement to view the data into the table.
To verify the records, let us view this table directly into the Oracle DB. We can see that the records are same in both the Oracle DB and the SQL DB. This also shows that the External Tables do not contains data instead they pull up the data from the external data source.
-
Create Statistics to improve performance.
This is an optional step.
1CREATE STATISTICS RegionsKeyStatistics ON regions (region_ID) WITH FULLSCAN;We will not see much performance improvement in this example due to only a few records, but you can see the difference as shown.
Conclusion
In this article on PolyBase, we explored the additional use case of the external case along with creating an external table with t-SQL. You can now create them using both the External table Wizard in Azure Data Studio and using t-SQL as well. I will cover creating an external table with SQL Server as Data Source in my next article. Stay tuned!
Table of contents
- How to install PostgreSQL on Ubuntu - July 13, 2023
- How to use the CROSSTAB function in PostgreSQL - February 17, 2023
- Learn the PostgreSQL COALESCE command - January 19, 2023