#SPARK

Getting started with the Cloudera Kudu storage engine in Python

Cloudera Kudu is a distributed storage engine for fast data analytics. The Python API is still in alpha but already usable.
#python #pydata #spark

Peter Hoffmann

EuroPython 2015 PySpark - Data Processing in Python on top of Apache Spark

Apache Spark is a computational engine for large-scale data processing. PySpark exposes the Spark programming model to Python, providing APIs for both Resilient Distributed Datasets (RDDs) and DataFrames.
#python #pydata #spark #talk

Peter Hoffmann

PyData 2015 Berlin - Introduction to the PySpark DataFrame API

This talk from PyData 2015 Berlin gives an overview of the PySpark DataFrame API.
#python #pydata #spark #talk

Peter Hoffmann