The big data guide

The term is bandied about a lot, with many companies believing that they have ‘big data’. But what is big data? How much data do you need before it can be classed as ‘big’? And why would you want it anyway?

What is Big Data?

First and foremost, let’s tackle the biggest question: what is big data? The best way I can describe it is to walk you through a real-world example.
Let’s look at eBay. Millions of people visit its website each day: some buy items, some add products to their watch list and others simply come and go without interacting with any of the products on the site.

Every action taken by a buyer, seller or website guest is stored somewhere in the depths of eBay’s databases. From this data, eBay can find out which products each user has been looking at, which products are most popular and what the conversion rates are (the number of people buying a product compared with the number viewing it), along with plenty of other information.
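
To make the conversion-rate idea concrete, here is a minimal Python sketch that works out, per product, what share of viewers went on to buy. The event format (user, product, action tuples) and the product names are hypothetical; eBay’s real data model is not described in this guide.

```python
from collections import defaultdict

# Hypothetical event log: (user_id, product_id, action)
events = [
    ("u1", "camera", "view"),
    ("u2", "camera", "view"),
    ("u2", "camera", "buy"),
    ("u3", "camera", "view"),
    ("u3", "camera", "view"),
    ("u4", "camera", "view"),
    ("u1", "lamp", "view"),
    ("u1", "lamp", "watch"),
]

def conversion_rates(events):
    """Return {product: buyers / viewers}, counting each user once per product."""
    viewers = defaultdict(set)
    buyers = defaultdict(set)
    for user, product, action in events:
        if action == "view":
            viewers[product].add(user)
        elif action == "buy":
            buyers[product].add(user)
    return {
        product: len(buyers[product]) / len(users)
        for product, users in viewers.items()
    }

print(conversion_rates(events))  # {'camera': 0.25, 'lamp': 0.0}
```

Counting distinct users rather than raw events stops one person viewing a product several times from dragging its conversion rate down.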

But storing the data isn’t much use to the organization on its own; it needs to be able to analyse and use the data to gain a competitive advantage. It needs to analyse the data in real time to work out what sort of items you’re interested in and to make tailored recommendations based on your behaviour and perceived interests.
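
One very simple way to turn browsing behaviour into recommendations is item co-occurrence: suggest products that other users viewed alongside the ones you viewed. This is a toy sketch of that single technique, not how eBay actually does it, and the browsing histories are made up.

```python
from collections import Counter
from itertools import combinations

# Hypothetical browsing histories: user_id -> set of product_ids viewed
histories = {
    "u1": {"camera", "tripod", "sd_card"},
    "u2": {"camera", "tripod"},
    "u3": {"camera", "sd_card"},
    "u4": {"lamp", "bulb"},
}

def co_occurrence(histories):
    """Count how often each ordered pair of products appears in the same history."""
    counts = Counter()
    for items in histories.values():
        for a, b in combinations(sorted(items), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts

def recommend(user, histories, k=2):
    """Suggest up to k products the user hasn't seen, ranked by co-occurrence score."""
    counts = co_occurrence(histories)
    seen = histories[user]
    scores = Counter()
    for (a, b), n in counts.items():
        if a in seen and b not in seen:
            scores[b] += n
    # Highest score first; ties broken alphabetically so results are deterministic
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]

print(recommend("u2", histories))  # ['sd_card']
```

A production recommender would also weight recent activity and normalise for very popular items, but the principle of learning from what similar users did is the same.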

The process of crunching this big dataset to gain insights is what we would class as a big data project. Now that we know what a big data project is, that raises another question: how big does big data need to be?

Well, that’s one of those ‘how long is a piece of string’ questions. Whether a dataset counts as a big data project depends on the capabilities of the organization that owns it. For example, for a small company with only Microsoft Excel at its disposal, 10GB of data could be huge and could take a long time to process. Conversely, global organizations such as Amazon process thousands of gigabytes of data each hour, so to them a 10GB dataset would not be a big data project at all.

Another factor that helps us decide whether something should be considered big data is how fast the data grows and how quickly it needs to be processed. So, you might have 100GB of data, but if it only grows by 1GB a year and doesn’t need to be processed quickly, I would not consider it to be big data.

The final dimension that influences whether something should be considered big data is the variety of the data. A true big data project aggregates data from multiple sources and file types and crunches the datasets in parallel. As you can imagine, this provides real value to a business, as it can see in a single view how all aspects of its product or service are performing. We could, for example, pull data from local databases, CSV files, text files and even web sources such as Google Analytics.
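
As a small illustration of pulling several sources into a single view, the sketch below merges sales held in a database (an in-memory SQLite table standing in for a local database) with page views from a CSV export, using only Python’s standard library. The table name, column names and figures are hypothetical; a real project would read files from disk or from a service such as Google Analytics rather than from inline strings.

```python
import csv
import io
import sqlite3

# Source 1: a local database (in-memory SQLite stands in for it here)
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (product TEXT, units INTEGER)")
db.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("camera", 3), ("lamp", 1), ("camera", 2)],
)

# Source 2: a CSV export (e.g. from a web analytics tool), inlined for the example
views_csv = io.StringIO("product,page_views\ncamera,120\nlamp,40\ntripod,15\n")

def single_view(db, views_file):
    """Merge per-product units sold (database) with page views (CSV) into one dict."""
    summary = {}
    for row in csv.DictReader(views_file):
        summary[row["product"]] = {"page_views": int(row["page_views"]), "units_sold": 0}
    for product, units in db.execute(
        "SELECT product, SUM(units) FROM sales GROUP BY product"
    ):
        summary.setdefault(product, {"page_views": 0, "units_sold": 0})
        summary[product]["units_sold"] = units
    return summary

report = single_view(db, views_csv)
print(report["camera"])  # {'page_views': 120, 'units_sold': 5}
```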

If you can answer yes to all of the questions below, then you have a big data project on your hands:

  1. Is the data so big that Microsoft Excel is unable to handle it?
  2. Does the data grow at a fast rate?
  3. Does the data need to be processed soon after collection or in real time?
  4. Do you pull data from multiple sources / file types?
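
The four questions can be written down as a simple checklist function. The Excel limit used for question 1 is real (a worksheet in Excel 2007 and later holds at most 1,048,576 rows), but the growth and latency thresholds are arbitrary assumptions for illustration; as discussed above, what counts as ‘big’ depends on the organization.

```python
from dataclasses import dataclass

EXCEL_MAX_ROWS = 1_048_576  # row limit of a single worksheet in Excel 2007 and later

@dataclass
class Dataset:
    rows: int
    growth_gb_per_month: float
    max_processing_delay_hours: float  # how soon after collection results are needed
    source_count: int

def is_big_data(d: Dataset,
                fast_growth_gb_per_month: float = 10.0,      # assumed threshold
                near_real_time_hours: float = 1.0) -> bool:  # assumed threshold
    """Return True only if the answer to all four checklist questions is yes."""
    answers = [
        d.rows > EXCEL_MAX_ROWS,                               # 1. too big for Excel?
        d.growth_gb_per_month >= fast_growth_gb_per_month,     # 2. grows fast?
        d.max_processing_delay_hours <= near_real_time_hours,  # 3. needed soon / in real time?
        d.source_count > 1,                                    # 4. multiple sources?
    ]
    return all(answers)

clickstream = Dataset(rows=50_000_000, growth_gb_per_month=200,
                      max_processing_delay_hours=0.1, source_count=4)
# Roughly the 100GB archive from earlier: grows ~1GB a year, no urgency, one source
archive = Dataset(rows=2_000_000, growth_gb_per_month=1 / 12,
                  max_processing_delay_hours=24 * 30, source_count=1)
print(is_big_data(clickstream), is_big_data(archive))  # True False
```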

What are the benefits?

We’ve just covered what a big data project is, but we also need to understand the benefits of undertaking a technology project that is so heavy on both time and investment.

The major benefit of big data projects is that you can obtain far more insight from your data than ever before, and much of it can be delivered in real time. No more waiting for scheduled reports to arrive in your inbox (or worse, through the mail): you can see how your product or service is performing in real time from any internet-connected device.

Customers come first

The customer comes first, so let’s start with the benefits to them. Through your big data projects, you’ll be able to provide a much more tailored service to your customers. You’ll be able to analyse their behaviours, likes and dislikes and send them special offers and discounts for the services they’re interested in, which is far more tailored and relevant than the spam mail we’re so used to receiving.

You can go a step further, too. Why not remodel your web pages based on each user’s likes and dislikes and the way they interact with your pages? If you know that a user always skips past the top two sections of a page without reading them, why not hide those sections for them and get them to the content they want faster?
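
Here is one hedged way the ‘hide what they always skip’ idea could work: record, per user, whether each page section was read or skipped, and hide a section once the user has skipped it on most visits. The section names, the minimum of three visits and the 80% threshold are all hypothetical choices.

```python
from collections import defaultdict

def sections_to_hide(interactions, min_visits=3, skip_threshold=0.8):
    """interactions: list of (section, skipped) pairs for one user.

    Returns the sections skipped on at least `skip_threshold` of visits,
    once that section has been seen at least `min_visits` times.
    """
    seen = defaultdict(int)
    skipped = defaultdict(int)
    for section, was_skipped in interactions:
        seen[section] += 1
        if was_skipped:
            skipped[section] += 1
    return sorted(
        s for s in seen
        if seen[s] >= min_visits and skipped[s] / seen[s] >= skip_threshold
    )

history = [
    ("hero_banner", True), ("hero_banner", True), ("hero_banner", True),
    ("news", True), ("news", False), ("news", True),
    ("deals", False), ("deals", False), ("deals", False),
    ("footer_promo", True),
]
print(sections_to_hide(history))  # ['hero_banner']
```

The minimum-visits guard avoids hiding a section on the strength of a single skipped visit, as with `footer_promo` here.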

Along the same lines, you’ll be able to understand how the demographics of your client base change over time. Many businesses’ customer demographics stay fairly static, but I bet that Facebook’s customer profiles have changed drastically over the last few years. Why do I think that? Well, Facebook used to be for young, trendy people in their teens and early twenties; now the social network is pulling in users of all ages. Even my mum is on Facebook!

Facebook will have real-time analytics telling them that lots of their users’ parents and grandparents are registering for the service. The arrival of these new age groups on the platform will influence the way they design new features and the way they communicate with their users.
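
To show the kind of shift described, the sketch below computes what share of each year’s new registrations falls into each age band. The age bands and registration records are made up for illustration; they are not Facebook data.

```python
from collections import Counter, defaultdict

AGE_BANDS = [(13, 24, "13-24"), (25, 44, "25-44"), (45, 120, "45+")]  # hypothetical bands

def age_band(age):
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} outside the defined bands")

def band_share_by_year(registrations):
    """registrations: iterable of (year, age). Returns {year: {band: share}}."""
    per_year = defaultdict(Counter)
    for year, age in registrations:
        per_year[year][age_band(age)] += 1
    result = {}
    for year in sorted(per_year):
        counts = per_year[year]
        total = sum(counts.values())
        result[year] = {band: counts[band] / total for band in sorted(counts)}
    return result

registrations = [
    (2006, 17), (2006, 19), (2006, 22), (2006, 30),
    (2012, 18), (2012, 35), (2012, 52), (2012, 67),
]
print(band_share_by_year(registrations))
# {2006: {'13-24': 0.75, '25-44': 0.25}, 2012: {'13-24': 0.25, '25-44': 0.25, '45+': 0.5}}
```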

You can see that Facebook is starting to tie people together, too: if you have mutual friends with someone, it suggests that person as a potential member of your friendship circle. As time goes on, that functionality will become more refined and more accurate.
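
The mutual-friends suggestion can be sketched as a friend-of-friend count: rank people you aren’t yet connected to by how many friends you share with them. This is a textbook approach applied to a made-up friendship graph, not a description of Facebook’s actual algorithm.

```python
from collections import Counter

# Hypothetical undirected friendship graph: person -> set of friends
friends = {
    "ann": {"ben", "cat"},
    "ben": {"ann", "cat", "dan"},
    "cat": {"ann", "ben", "dan", "eve"},
    "dan": {"ben", "cat"},
    "eve": {"cat"},
}

def suggest_friends(person, friends, k=3):
    """Rank non-friends by the number of mutual friends they share with `person`."""
    mutual = Counter()
    for friend in friends[person]:
        for candidate in friends[friend]:
            if candidate != person and candidate not in friends[person]:
                mutual[candidate] += 1
    # Most mutual friends first; ties broken alphabetically
    ranked = sorted(mutual.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]

print(suggest_friends("ann", friends))  # [('dan', 2), ('eve', 1)]
```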

Identifying problems, before they become major problems

As much as you hope and pray that your service or product won’t face any major problems, the likelihood is that it will. Let’s look at Myspace for a minute. Back in the day, it was a major player in the social network market, but as people started to move over to Facebook, it was caught on the back foot. Could real-time analytics have helped it keep its customers?

Well, yes and no. There’s no doubt that the service was inferior to Facebook, so trying to persuade the masses to stay may not have worked. However, there were probably some early warning signs that Myspace missed because it lacked detailed data about its users.

Let’s say it had had access to data on the way people interacted with its site: it could quickly have identified the bits people used and the bits they didn’t. By providing a more streamlined, slick user journey, it could have improved user satisfaction and possibly retention.

Anyway, I digress. Real-time analytics helps you to identify problems with your service before they become major problems. Examples include brute-force attacks, DDoS attacks, drops in site performance, site downtime and plenty more. All of this is very useful information that will help you maintain a solid service that performs well.
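
A common way to flag a possible brute-force attack in real time is a sliding-window counter: if one source IP produces too many failed logins within a short window, raise an alert. The 60-second window and the limit of five failures below are assumptions; the IP addresses come from the ranges reserved for documentation.

```python
from collections import defaultdict, deque

class FailedLoginMonitor:
    """Flags an IP when it exceeds `limit` failed logins within `window_seconds`."""

    def __init__(self, limit=5, window_seconds=60):
        self.limit = limit
        self.window = window_seconds
        self.failures = defaultdict(deque)  # ip -> timestamps of recent failures

    def record_failure(self, ip, timestamp):
        """Record a failed login; return True if this IP should be flagged."""
        times = self.failures[ip]
        times.append(timestamp)
        # Drop failures that have fallen out of the sliding window
        while times and times[0] <= timestamp - self.window:
            times.popleft()
        return len(times) > self.limit

monitor = FailedLoginMonitor()
alerts = [monitor.record_failure("203.0.113.7", t) for t in range(0, 12, 2)]
print(alerts)  # [False, False, False, False, False, True]
```

Because old timestamps are discarded as new ones arrive, memory per IP stays bounded and each check is cheap enough to run on every login attempt.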

Identify process inefficiencies

Having your data crunched is useful for a number of reasons, which we’ve discussed above, but for me, process optimization is one of the major ones.

As an example, my company has an IT support system. When a customer raises an issue with a product or service, it gets sent to first-line support. If they can’t resolve it, it gets sent on to second-line support, and so on. Sometimes, though, problems seem to sit with a certain team for a very long time, resulting in disgruntled customers when they realize that their issue hasn’t even been reviewed yet.

The system we use to monitor these support calls does not show on screen how long each call sat with each team; however, that data is held in the database. As such, a big data project would be able to identify the weak links in the process and enable the business to make the necessary corrections.
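
Because the hand-off history is already in the database, finding where tickets wait is mostly arithmetic: sort each ticket’s assignment events by time and measure how long it sat with each team before moving on. The record layout, team names, the `resolved` end marker and the timestamps are hypothetical, since the support system’s schema isn’t described here.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical assignment history: (ticket_id, team, assigned_at).
# Time with a team ends when the ticket moves to the next team or is resolved.
assignments = [
    ("T1", "first_line", "2024-03-01 09:00"),
    ("T1", "second_line", "2024-03-01 11:00"),
    ("T1", "resolved", "2024-03-04 11:00"),
    ("T2", "first_line", "2024-03-02 10:00"),
    ("T2", "second_line", "2024-03-02 10:30"),
    ("T2", "resolved", "2024-03-03 10:30"),
]

def average_hours_per_team(assignments):
    """Return {team: mean hours a ticket waited with that team}."""
    by_ticket = defaultdict(list)
    for ticket, team, ts in assignments:
        by_ticket[ticket].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), team))
    totals = defaultdict(float)
    counts = defaultdict(int)
    for events in by_ticket.values():
        events.sort()
        # Pair each assignment with the next event to get time spent with that team
        for (start, team), (end, _) in zip(events, events[1:]):
            totals[team] += (end - start).total_seconds() / 3600
            counts[team] += 1
    return {team: totals[team] / counts[team] for team in totals}

print(average_hours_per_team(assignments))  # {'first_line': 1.25, 'second_line': 48.0}
```

In this made-up data, second-line support is the weak link: tickets wait there for two days on average.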

This is a very small-scale example, but imagine how this could improve the efficiency of your supply chain!

How can I calculate ROI?

Big data initiatives are still in their infancy, so making a compelling argument for adopting big data tools based on potential return on investment is pretty tough. Remember how hard it was to get your company to adopt social media as a marketing channel? If that was difficulty level 2, this is level 100. Why so much higher? Well, it costs a lot more!

There are ways to sell these ideas to the business, but before we look at them, it’s important to remember that big data isn’t going to answer questions you haven’t asked. That is to say, you need to work out which questions you need answering, and what data you need to answer them, before you even start to consider the potential advantages of a business intelligence initiative.

Once you have that information, consider how long it would take you to get that information now and how often you need it. Maybe there’s someone in the organization who already spends a large proportion of their time pulling data together. For example, at one of my clients’ companies, someone spends half of every month pulling together a web statistics report from Google Analytics (among other sources). That is not an effective use of his time, so a big data project that automated the report would return the equivalent of 50% of his salary.

So, as above, one of the biggest returns on investment we can see from big data tools is the shift in resourcing requirements. At one of my current client companies, I can easily automate 200 hours of reporting per week using a tool called QlikView (which we will discuss in more detail later on). If each individual works an average of 8 hours a day, that’s 25 working days spent on reporting each week, the equivalent of five full-time staff. If each of them earns £40,000 a year, the company is spending around £3,846 a week (£200,000 a year) on reporting that could be automated.
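
The arithmetic behind these reporting figures is easy to reproduce, which also makes it easy to rerun with your own numbers. This sketch assumes a 5-day, 8-hour working week and 52 paid weeks a year; the 200 hours and the £40,000 salary come from the example above.

```python
def reporting_cost(hours_per_week, salary_per_year,
                   hours_per_day=8, days_per_week=5, weeks_per_year=52):
    """Estimate the staff cost of manual reporting that could be automated."""
    person_days = hours_per_week / hours_per_day      # 200 / 8 = 25 days
    full_time_staff = person_days / days_per_week     # 25 / 5 = 5 people
    yearly_cost = full_time_staff * salary_per_year   # 5 * £40,000
    weekly_cost = yearly_cost / weeks_per_year
    return person_days, full_time_staff, weekly_cost, yearly_cost

days, staff, weekly, yearly = reporting_cost(200, 40_000)
print(f"{days:.0f} days, {staff:.0f} staff, £{weekly:,.0f}/week, £{yearly:,.0f}/year")
# 25 days, 5 staff, £3,846/week, £200,000/year
```

Salary alone understates the real cost of staff time (pension, office space and so on are excluded), so treat this as a conservative starting point for an ROI case.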

Now, that is a compelling sales pitch! Imagine being able to say that you could save the company around £200,000 a year on reporting resources. Oh, and by the way, the reports can be provided much more frequently, too. It’s a win-win situation.

To really prove the value of a big data solution, you should always run a proof of concept. Companies like Qlik provide free trials of their software, so you can test the reporting possibilities against all your data sources before you commit. Having something tangible to show your superiors can make the difference between gaining approval or not.

