Azure Synapse SQL-on-Demand offers a web client, the desktop version Azure Data studio and odbc access with turbodbc to query parquet files in the Azure Data Lake.
Azure Synapse SQL-on-Demand pools can be accessed though an odbc compatible client from python.
First you need to grant access to the sql endpoint for an external DB user:
CREATE LOGIN testuser WITH password='xxx';
SQL Analytics on-demand query reads files directly from Azure Storage. Since the storage account is an object that is external to SQL Analytics on-demand, appropriate credentials are required. A user needs the appropriate permissions granted to use the requisite credential.
Delegation of access to Azure blob storage accounts can be done with AAD pass-through or giving manual credentials
CREATE CREDENTIAL [https://blob.dfs.core.windows.net/benchmark] WITH IDENTITY='SHARED ACCESS SIGNATURE' , SECRET = 'sv=2018-03-28xxxx' GO GRANT REFERENCES ON CREDENTIAL::[https://blob.dfs.core.windows.net/benchmark] TO [testuser];
To connect to an Azure SQL-on-Demand endpoint you need to follow the installation of the ODBC driver for debian.
curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add - curl https://packages.microsoft.com/config/debian/9/prod.list > /etc/apt/sources.list.d/mssql-release.list apt-get update ACCEPT_EULA=Y apt-get install msodbcsql17 ACCEPT_EULA=Y apt-get install mssql-tools echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bash_profile echo 'export PATH="$PATH:/opt/mssql-tools/bin"' >> ~/.bashrc
Now you can connect with turbodbc to the SQL-on-demand pool to execute your queries:
import turbodbc server ='mysynapse-ondemand.sql.azuresynapse.net' port = 1433 database="master" uid="testuser" pwd="xxx" con = turbodbc.connect(driver='ODBC Driver 17 for SQL Server', server=server, port=port, database=database, uid=uid, pwd=pwd) stm = ''' SELECT TOP 100 * FROM OPENROWSET( BULK 'https://blob.dfs.core.windows.net/benchmark/*/01.parquet', FORMAT='PARQUET' ) AS [r]; ''' cur = con.cursor() cur.execute(stm) print(cur.fetchall())
You can also use all the PyArrow/Pandas features in turbodbc to efficiently run workflows for data intensive machine learning applications.
cursor.execute(stm) table = cursor.fetchallarrow() df = table.to_pandas()
In addition to the odbc interface Azure offers a web and a desktop client to run ad hoc queries on the SQL service.
Azure Synapse Studio is the integrated web client to interact with an Azure Synapse Workspace. It offers an online sql script editor and a browser for Azure Blob Storage Accounts. Based on parquet file inspection it can infer schemata and generate create external tables for parquet data in the storage accounts.
Access to the Workspace is based on the azure managed identities (AAD). Permissions can be granted to the SQL pools in the workspace. During creation of the workspace one can grant the managed identity CONTROL permissions on SQL pools.
Azure Synapse Studio offers keyword completion, syntax highlighting and some keyboard shortcuts. Run on-demand SQL queries, view and save results as CSV export.
Azure Data Studio is a cross platform sql editor and database tool from Microsoft. It supports connecting to a Azure Synapse SQL on Demand server through the managed azure identities (AAD).
Azure Data Studio offers multiple tab windows, a rich SQL editor, IntelliSense, keyword completion, code snippets, code navigation, and source control integration (Git). It can run on-demand SQL queries, view and save results as CSV, JSON, or Excel.