Author: Kieran Keene

Join me on this career development project as I set out to develop the skills required to progress up the technology career ladder! Check out https://netshock.co.uk/about/ to find out more.

Kieran Keene has published 291 posts.


PySpark & Coalesce

Let’s say you have a big dataset. It’s made up of 1,000,000 files of around 1MB each, so we have a dataset of roughly 1TB. On HDFS, the default block size is 128MB. Dividing the dataset size by the HDFS block size gives around 7,800 partitions (1,000,000MB / 128MB ≈ 7,812), meaning your job...
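To make the arithmetic concrete, here is a minimal sketch of the size-divided-by-block-size estimate in plain Python. The function name `estimate_partitions` is hypothetical (it is not part of PySpark), and the sketch assumes decimal units, so 1TB is treated as 1,000,000MB. The number of partitions Spark actually creates may differ, since it depends on how the files are read and on the cluster's configuration.

```python
import math


def estimate_partitions(num_files, file_size_mb, block_size_mb=128):
    """Rough estimate: total dataset size divided by the block size.

    Hypothetical helper for illustration only. It mirrors the
    back-of-the-envelope calculation in the text, not Spark's
    actual partitioning logic.
    """
    total_mb = num_files * file_size_mb
    # Round up, because a partially filled block still counts as one.
    return math.ceil(total_mb / block_size_mb)


# 1,000,000 files of about 1MB each, with a 128MB block size
partitions = estimate_partitions(1_000_000, 1)
print(partitions)
```

Rounding up gives a slightly higher figure than the rounded 7,800 quoted above, but the order of magnitude is the same.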