Peter Hoffmann

Using turbodbc to access Azure Synapse SQL-on-demand endpoints

ODBC access via turbodbc/Python

Azure Synapse SQL-on-Demand pools can be accessed through an ODBC-compatible client from Python.

First, you need to grant access to the SQL endpoint for an external database user:

CREATE LOGIN testuser WITH password='xxx';

SQL on-demand queries read files directly from Azure Storage. Because the storage account is external to SQL on-demand, appropriate credentials are required. The user must have permission to use the credential.

Delegation of access to Azure Blob Storage can be done with AAD pass-through or manual credentials.

CREATE CREDENTIAL [https://blob.dfs.core.windows.net/benchmark]
WITH IDENTITY='SHARED ACCESS SIGNATURE',
     SECRET = 'sv=2018-03-28xxxx';
GO

GRANT REFERENCES ON CREDENTIAL::[https://blob.dfs.core.windows.net/benchmark] TO [testuser];

To connect to an Azure SQL-on-Demand endpoint, install the ODBC driver for Debian.

curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add -
curl https://packages.microsoft.com/config/debian/9/prod.list > /etc/apt/sources.list.d/mssql-release.list
apt-get update
ACCEPT_EULA=Y apt-get install msodbcsql17
ACCEPT_EULA=Y apt-get install mssql-tools

echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile
echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc

Now you can connect using turbodbc to the SQL-on-Demand pool and execute your queries:

import turbodbc

server = 'mysynapse-ondemand.sql.azuresynapse.net'
port = 1433
database = "master"
uid = "testuser"
pwd = "xxx"

con = turbodbc.connect(driver='ODBC Driver 17 for SQL Server',
                       server=server,
                       port=port,
                       database=database,
                       uid=uid,
                       pwd=pwd)

stm = '''
SELECT
    TOP 100 *
FROM
    OPENROWSET(
        BULK 'https://blob.dfs.core.windows.net/benchmark/*/01.parquet',
        FORMAT='PARQUET'
    ) AS [r];
'''

cur = con.cursor()
cur.execute(stm)
print(cur.fetchall())

You can also use the PyArrow/Pandas integration in turbodbc to efficiently run data-intensive machine learning workflows.

cur.execute(stm)
table = cur.fetchallarrow()
df = table.to_pandas()

Clients

In addition to the ODBC interface, Azure offers a web and a desktop client to run ad hoc queries on the SQL service.

Azure Synapse Studio

Azure Synapse Studio is the integrated web client to interact with an Azure Synapse Workspace. It offers an online SQL script editor and a browser for Azure Blob Storage accounts. Based on Parquet file inspection, it can infer schemas and generate CREATE EXTERNAL TABLE statements for Parquet data in the storage accounts.

Access to the Workspace is based on Azure AD managed identities (AAD). Permissions can be granted to the SQL pools in the workspace. During creation of the workspace, you can grant the managed identity CONTROL permission on SQL pools.

Azure Synapse Studio

Azure Synapse Studio offers keyword completion, syntax highlighting, and keyboard shortcuts. You can run on-demand SQL queries and view and save results as CSV exports.

Azure Data Studio

Azure Data Studio is a cross-platform SQL editor and database tool from Microsoft. It supports connecting to an Azure Synapse SQL-on-Demand server through managed Azure identities (AAD).

Azure Data Studio

Azure Data Studio offers multiple tab windows, a rich SQL editor, IntelliSense, keyword completion, code snippets, code navigation, and source control integration (Git). It can run on-demand SQL queries and view and save results as CSV, JSON, or Excel.