#SPARK

#python #pydata #spark

Peter Hoffmann

Cloudera Kudu is a distributed storage engine for fast data analytics. The Python API is in alpha stage but already usable.


#python #pydata #spark #talk

Peter Hoffmann

Apache Spark is a computational engine for large-scale data processing. PySpark exposes the Spark programming model to Python. It provides APIs for Resilient Distributed Datasets (RDDs) and for DataFrames.


#python #pydata #spark #talk

Peter Hoffmann

This talk from PyData 2015 Berlin gives an overview of the PySpark DataFrame API.