Big data can be defined using the six V’s: Volume, Variety, Velocity, Veracity, Valence and Value.
Let’s start with volume, the V that is most talked about and the one we can all relate to. Volume is the size of the data: the greater the volume, the greater our data accessibility, storage and query challenges. As part of our big data initiative, we aren’t looking to limit the volume of data we analyse; our goal is to ingest it all, analyse it as efficiently as possible and gain maximum organizational insight.
Next we have variety. This refers to the different types of data ingested into the big data cluster, for example images, audio, text and video. When we ask how heterogeneous our data is, we’re asking how much variety there is in the data we ingest. With variety comes great complexity.
The speed at which we ingest data into the cluster is the velocity. Imagine we had two Hadoop clusters, each of which was going to receive 100 GB of data. One would receive all 100 GB within 1 hour; the other would have the same data drip-fed to it over the next month. The two clusters would need entirely different architectures and capabilities. The speed (or velocity) of the data greatly influences the design decisions we make.
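To put numbers on the two-cluster example, here is a small back-of-the-envelope sketch of the average ingest rate each cluster would have to sustain. It assumes decimal units (1 GB = 10^9 bytes) and treats "the next month" as 30 days; the function name `required_ingest_rate` is hypothetical and only for illustration.

```python
# Back-of-the-envelope sizing for the two-cluster velocity example.
# Assumptions: decimal units (1 GB = 10**9 bytes) and a 30-day month.

GB = 10**9  # bytes (decimal convention, an assumption)


def required_ingest_rate(total_bytes: int, window_seconds: float) -> float:
    """Average bytes per second needed to ingest total_bytes within window_seconds."""
    return total_bytes / window_seconds


burst = required_ingest_rate(100 * GB, 60 * 60)               # 100 GB within 1 hour
trickle = required_ingest_rate(100 * GB, 30 * 24 * 60 * 60)   # 100 GB over 30 days

print(f"1 hour : {burst / 10**6:.1f} MB/s")    # 1 hour : 27.8 MB/s
print(f"30 days: {trickle / 10**3:.1f} KB/s")  # 30 days: 38.6 KB/s
print(f"ratio  : {burst / trickle:.0f}x")      # ratio  : 720x
```

The total volume is identical, yet the first cluster must sustain roughly 720 times the average throughput of the second. These are averages only. Real ingest is often bursty, so peak rates could be higher still.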
Another factor that makes our lives as big data architects particularly challenging is the veracity of the data. We can think of veracity as the uncertainty of data: how reliable is the data, and can we trust it to make informed decisions? Low-quality data brings all sorts of challenges in the big data space. Veracity problems are generally associated with user-generated data, which is often of lower quality than system-generated data.
The fifth V of big data is valence. This is the connectedness of data: the number of connections that actually exist between data items, relative to the total number of possible connections.
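One way to make the valence ratio concrete is to treat data items as nodes and connections as edges, then divide the number of actual connections by the number of possible ones. This is the same quantity as the density of an undirected graph. The sketch below is written under the assumptions that connections are undirected, that self-links do not count, and that n items therefore have n(n − 1)/2 possible connections. The `valence` function and the sample data are hypothetical.

```python
def valence(items, connections):
    """Fraction of possible pairwise connections that actually exist.

    Assumes undirected connections between distinct items (no self-links),
    so n items have n * (n - 1) / 2 possible connections.
    """
    item_set = set(items)
    n = len(item_set)
    possible = n * (n - 1) // 2
    if possible == 0:
        return 0.0
    # frozenset makes (a, b) and (b, a) count as one connection;
    # self-links and links to unknown items are ignored.
    actual = {
        frozenset((a, b))
        for a, b in connections
        if a != b and a in item_set and b in item_set
    }
    return len(actual) / possible


items = ["alice", "bob", "carol", "dave"]
connections = [("alice", "bob"), ("bob", "carol"), ("carol", "bob")]
print(f"{valence(items, connections):.2f}")  # 0.33 (2 of 6 possible connections)
```

As data becomes more connected, valence approaches 1. High-valence data tends to call for graph-oriented storage and processing rather than independent per-record handling.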
So, if we were to assign one word to each of the V’s to make them easy to remember, we would assign:
- Variety: Complexity
- Volume: Size
- Velocity: Speed
- Valence: Connectedness
- Veracity: Quality