Cloudera Kudu is a distributed storage engine for fast data analytics. Its Python API is in an alpha stage but is already usable.
Apache Spark is a computational engine for large-scale data processing. PySpark exposes the Spark programming model to Python. It provides two APIs: one for Resilient Distributed Datasets (RDDs) and the DataFrame API.
This talk from PyData 2015 Berlin gives an overview of the PySpark DataFrame API.
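
As a rough illustration of the two PySpark APIs mentioned above, the sketch below applies the same filter once through the RDD API and once through the DataFrame API. It is not taken from the talk. The sample data and the names `SAMPLE_ROWS`, `older_than`, and `run_with_spark` are hypothetical, and the code assumes Spark 2.0 or later, where `SparkSession` is the entry point (a 2015 setup would have used `SQLContext` instead).

```python
# Hypothetical sample data: (name, age) pairs invented for illustration.
SAMPLE_ROWS = [("alice", 34), ("bob", 45), ("carol", 29)]


def older_than(rows, min_age):
    """Plain-Python reference for the filter used in both Spark variants."""
    return [name for name, age in rows if age > min_age]


def run_with_spark(rows=SAMPLE_ROWS, min_age=30):
    """Run the same filter through the RDD API and the DataFrame API."""
    # Imported lazily so this module loads even where PySpark is not installed.
    # Assumption: Spark 2.0+ (SparkSession); older versions use SQLContext.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")
        .appName("rdd-vs-dataframe-sketch")
        .getOrCreate()
    )
    try:
        # RDD API: functional transformations on a distributed collection.
        rdd = spark.sparkContext.parallelize(rows)
        rdd_names = (
            rdd.filter(lambda r: r[1] > min_age)
               .map(lambda r: r[0])
               .collect()
        )

        # DataFrame API: named columns and declarative column expressions.
        df = spark.createDataFrame(rows, ["name", "age"])
        df_rows = df.filter(df["age"] > min_age).select("name").collect()
        df_names = [row["name"] for row in df_rows]
    finally:
        spark.stop()

    # Sort because Spark does not guarantee result order in general.
    return sorted(rdd_names), sorted(df_names)
```

With a local Spark installation, calling `run_with_spark()` should return two lists that both match `sorted(older_than(SAMPLE_ROWS, 30))`. The RDD variant works on plain tuples and Python lambdas, while the DataFrame variant refers to columns by name, which is what lets Spark optimize the query.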