Using Spark in conjunction with Pandas

In my domain normalisation project, I used Spark for the heavy lifting: loading the data into a DataFrame and aggregating it (group by and sum). I then used Pandas for the domain manipulation itself. Finally, I converted the Pandas DataFrame back to a Spark DataFrame so that I could write it to HDFS.

An example of this is included below. It shows how Pandas and Spark can be used together in the same job.
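The outline of that round trip can be sketched roughly as follows. This sketch assumes PySpark and pandas are installed. The column names (`domain`, `hits`), the normalisation rule in the hypothetical `normalise_domain` helper, the other helper names and the output path are all illustrative assumptions, not the project's actual code. The important part is the ordering. Spark aggregates first, which means the DataFrame handed to `toPandas()`, which collects everything onto the driver, is already small.

```python
import pandas as pd


def normalise_domain(domain: str) -> str:
    """Hypothetical normalisation rule (an assumption for illustration):
    trim and lowercase, drop any port, drop a trailing dot, drop a leading 'www.'."""
    d = domain.strip().lower()
    d = d.split(":", 1)[0].rstrip(".")  # 'Other.org:8080' -> 'other.org'
    if d.startswith("www."):
        d = d[len("www."):]
    return d


def normalise_in_pandas(pdf: pd.DataFrame) -> pd.DataFrame:
    """Pandas step: rewrite the 'domain' column, then re-sum 'hits', because
    several raw domains can collapse into the same normalised domain."""
    out = pdf.assign(domain=pdf["domain"].map(normalise_domain))
    return out.groupby("domain", as_index=False)["hits"].sum()


def run_pipeline(output_path: str) -> None:
    # PySpark is imported here so the pandas step above can be tested without Spark.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("domain-normalisation").getOrCreate()

    # Spark: load raw data (a tiny in-memory stand-in for the real source) and aggregate.
    raw = spark.createDataFrame(
        [("www.Example.com", 3), ("example.com.", 2), ("Other.org:8080", 5)],
        ["domain", "hits"],
    )
    summed = raw.groupBy("domain").agg(F.sum("hits").alias("hits"))

    # Pandas: collect the (now much smaller) aggregate to the driver and normalise it.
    pdf = normalise_in_pandas(summed.toPandas())

    # Spark again: convert the pandas result back and write it out (e.g. to an hdfs:// path).
    spark.createDataFrame(pdf).write.mode("overwrite").parquet(output_path)
    spark.stop()
```

On a cluster you would call something like `run_pipeline("hdfs:///tmp/normalised_domains")`, where the path is hypothetical. Note the second group-and-sum in the pandas step. After normalisation, raw keys such as `www.Example.com` and `example.com.` become the same domain, so their counts have to be combined again.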