Cloudera Kudu is a distributed storage engine for fast data analytics.
The Python API is still in alpha but already usable.
#python
#pydata
#spark
Apache Spark is a computational engine for large-scale data processing.
PySpark exposes the Spark programming model to Python. It provides APIs
for both Resilient Distributed Datasets (RDDs) and DataFrames.
#python
#pydata
#spark
#talk
This talk from PyData Berlin 2015 gives an overview of the PySpark DataFrame API.
#python
#pydata
#spark
#talk