Apache Spark is a computational engine for large-scale data processing. PySpark exposes the Spark programming model to Python. It defines an API for Resilient Distributed Datasets (RDDs) and the DataFrame API.
This Talk from PyData 2015 Berlin gives an overview of the PySpark Data Frame API.
When your application grows beyond one machine you need a central space to log, monitor and analyze what is going on. Logstash and elasticsearch store your logs in a structured way. Kibana is a web fronted to search and aggregate your logs.