TIL how to use Font Awesome markers with folium.
#python
#pydata
#visualization
#til
Forecasts deserve a rating that reflects their quality relative to what is
possible in theory and what is reasonable to expect in practice.
#pydata
#python
#meetup
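One common way to make such a rating concrete is a skill score that compares the forecast's error against a naive reference forecast, the "reasonable in practice" baseline. A minimal sketch in plain Python (the metric choice and example numbers are illustrative, not taken from the talk):

```python
def mae(pred, actual):
    """Mean absolute error."""
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

def skill_score(forecast, reference, actual):
    """1.0 = perfect forecast, 0.0 = no better than the reference,
    negative = worse than the naive reference."""
    return 1.0 - mae(forecast, actual) / mae(reference, actual)

actual = [10, 12, 11, 13]
persistence = [9, 10, 12, 11]   # naive "yesterday's value" reference
forecast = [10, 11, 11, 12]
print(skill_score(forecast, persistence, actual))  # ~0.67
```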
Snowflake offers several ways to run Python inside its compute
infrastructure. This post shows how to use Python in user-defined
functions, in stored procedures, and via Snowpark.
#python
#sql
#snowflake
#pydata
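As a sketch of the UDF route: the handler is a plain Python function that Snowflake registers via DDL. The function below runs anywhere; the CREATE FUNCTION statement in the comment follows Snowflake's documented Python UDF syntax, with the function name and logic invented for illustration:

```python
# Handler for a Snowflake Python UDF; plain Python, testable locally.
def normalize_email(email: str) -> str:
    """Lower-case and strip an email address."""
    return email.strip().lower()

# Registered in Snowflake roughly like this (names are illustrative):
#
#   CREATE OR REPLACE FUNCTION normalize_email(email STRING)
#   RETURNS STRING
#   LANGUAGE PYTHON
#   RUNTIME_VERSION = '3.10'
#   HANDLER = 'normalize_email'
#   AS $$
#   def normalize_email(email: str) -> str:
#       return email.strip().lower()
#   $$;

print(normalize_email("  Alice@Example.COM "))
```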
Using SQLAlchemy to create OPENROWSET common table expressions for Azure Synapse SQL-on-Demand
#python
#sql
#pydata
#azure
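A sketch of the idea (the storage URL and column names are placeholders): pass the OPENROWSET call through as a textual FROM clause, declare the columns explicitly so SQLAlchemy can reference them, and turn the select into a CTE:

```python
from sqlalchemy import column, select, text

# OPENROWSET is raw SQL from SQLAlchemy's point of view; declaring the
# columns lets the rest of the query refer to them by name.
source = text(
    "OPENROWSET(BULK 'https://example.dfs.core.windows.net/lake/*.parquet', "
    "FORMAT = 'PARQUET') AS r"
)
parquet_rows = (
    select(column("id"), column("value"))
    .select_from(source)
    .cte("parquet_rows")
)
stmt = select(parquet_rows).where(parquet_rows.c.value > 0)
print(stmt.compile(compile_kwargs={"literal_binds": True}))
```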
Azure Synapse SQL-on-Demand offers a web client, the desktop application
Azure Data Studio, and ODBC access with turbodbc to query Parquet files in
Azure Data Lake.
#python
#sql
#pydata
#azure
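The ODBC route can be sketched as follows. The query uses Synapse's OPENROWSET syntax; the connection string, storage path, and result handling are assumptions, and actually running it requires turbodbc plus an ODBC driver configured for the Synapse endpoint:

```python
QUERY = """
SELECT TOP 10 *
FROM OPENROWSET(
    BULK 'https://example.dfs.core.windows.net/lake/events/*.parquet',
    FORMAT = 'PARQUET'
) AS rows
"""

def fetch_sample(connection_string: str):
    """Run QUERY via turbodbc and return the result as NumPy arrays."""
    import turbodbc  # imported lazily; only needed when actually querying
    connection = turbodbc.connect(connection_string=connection_string)
    cursor = connection.cursor()
    cursor.execute(QUERY)
    return cursor.fetchallnumpy()
```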
Last summer, Microsoft rebranded the Azure Kusto query engine as Azure Data Explorer. While it does not support fully elastic scaling, it at least allows scaling a cluster up and out via an API or the Azure portal to adapt to different workloads. It also offers Parquet support out of the box, which made me spend some time looking into it.
#python
#pydata
#azure
#parquet
Apache Parquet is a columnar file format for working with gigabytes of
data. Reading and writing Parquet files is efficiently exposed to Python
with pyarrow. Column statistics allow clients to use predicate pushdown
and read only subsets of the data, reducing I/O.
Organizing data by column also allows for better compression, as the data
within a column is more homogeneous. Better compression in turn reduces
the bandwidth required to read the input.
#python
#pydata
#parquet
#arrow
#pandas
Microsoft has released a beta version of the Python client azure-storage-file-datalake for the Azure Data Lake Storage Gen2 service, with support for hierarchical namespaces.
#python
#pydata
Exasol on Microsoft Azure – automatic deployment in less than 30 minutes
#exasol
#azure
#pydata
#talk
Cloudera Kudu is a distributed storage engine for fast data analytics.
The Python API is in an alpha stage but already usable.
#python
#pydata
#spark
Apache Spark is a computational engine for large-scale data processing.
PySpark exposes the Spark programming model to Python. It offers both an
API for Resilient Distributed Datasets (RDDs) and the DataFrame API.
#python
#pydata
#spark
#talk
This talk from PyData Berlin 2015 gives an overview of the PySpark DataFrame API.
#python
#pydata
#spark
#talk