Cloudera Kudu is a distributed storage engine for fast data analytics. Its Python API is still in alpha but already usable.
Apache Spark is a computational engine for large-scale data processing. PySpark exposes the Spark programming model to Python, providing both an API for Resilient Distributed Datasets (RDDs) and the DataFrame API.
This talk from PyData 2015 Berlin gives an overview of the PySpark DataFrame API.