Peter Hoffmann Follow Director Data Engineering at Blue Yonder. Python Developer, Conference Speaker, Mountaineer May 25, 2020

Using turbodbc to access Azure Synapse SQL-on-Demand endpoints

Azure Synapse SQL-on-Demand offers a web client, the desktop version Azure Data studio and odbc access with turbodbc to query parquet files in the Azure Data Lake.

ODBC access via turbodbc/python

Azure Synapse SQL-on-Demand pools can be accessed though an odbc compatible client from python.

First you need to grant access to the sql endpoint for an external DB user:

CREATE LOGIN testuser WITH password='xxx';

SQL Analytics on-demand query reads files directly from Azure Storage. Since the storage account is an object that is external to SQL Analytics on-demand, appropriate credentials are required. A user needs the appropriate permissions granted to use the requisite credential.

Delegation of access to Azure blob storage accounts can be done with AAD pass-through or giving manual credentials

CREATE CREDENTIAL [https://blob.dfs.core.windows.net/benchmark]
WITH IDENTITY='SHARED ACCESS SIGNATURE'
, SECRET = 'sv=2018-03-28xxxx'
GO

GRANT REFERENCES ON CREDENTIAL::[https://blob.dfs.core.windows.net/benchmark] TO [testuser];

To connect to an Azure SQL-on-Demand endpoint you need to follow the installation of the ODBC driver for debian.

curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/debian/9/prod.list > /etc/apt/sources.list.d/mssql-release.list
apt-get update
ACCEPT_EULA=Y apt-get install msodbcsql17
ACCEPT_EULA=Y apt-get install mssql-tools

echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc

Now you can connect with turbodbc to the SQL-on-demand pool to execute your queries:

import turbodbc

server ='mysynapse-ondemand.sql.azuresynapse.net'
port = 1433
database="master"
uid="testuser"
pwd="xxx"

con = turbodbc.connect(driver='ODBC Driver 17 for SQL Server',
                       server=server,
                       port=port,
                       database=database,
                       uid=uid,
                       pwd=pwd)

stm = '''
SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://blob.dfs.core.windows.net/benchmark/*/01.parquet',
        FORMAT='PARQUET'
    ) AS [r];
'''

cur = con.cursor()
cur.execute(stm)
print(cur.fetchall())

You can also use all the PyArrow/Pandas features in turbodbc to efficiently run workflows for data intensive machine learning applications.

cursor.execute(stm)
table = cursor.fetchallarrow()
df = table.to_pandas()

Clients

In addition to the odbc interface Azure offers a web and a desktop client to run ad hoc queries on the SQL service.

Azure Synapse Studio

Azure Synapse Studio is the integrated web client to interact with an Azure Synapse Workspace. It offers an online sql script editor and a browser for Azure Blob Storage Accounts. Based on parquet file inspection it can infer schemata and generate create external tables for parquet data in the storage accounts.

Access to the Workspace is based on the azure managed identities (AAD). Permissions can be granted to the SQL pools in the workspace. During creation of the workspace one can grant the managed identity CONTROL permissions on SQL pools.

Azure Synapse Studio

Azure Synapse Studio offers keyword completion, syntax highlighting and some keyboard shortcuts. Run on-demand SQL queries, view and save results as CSV export.

Azure Data Studio

Azure Data Studio is a cross platform sql editor and database tool from Microsoft. It supports connecting to a Azure Synapse SQL on Demand server through the managed azure identities (AAD).

Azure Data Studio

Azure Data Studio offers multiple tab windows, a rich SQL editor, IntelliSense, keyword completion, code snippets, code navigation, and source control integration (Git). It can run on-demand SQL queries, view and save results as CSV, JSON, or Excel.