Big data series: an introduction to graph analysis

Graph data can be extremely useful for identifying relationships between things and for surfacing opinions, communities and links between elements.

On a graph, we call the circles ‘nodes’ or ‘entities’ and the lines ‘connections’ or ‘edges’.
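To make the terminology concrete, here is a minimal sketch of how nodes and edges might be stored in plain Python. The names (Alice, Bob, Leeds) and the relationship labels are made up purely for illustration, and a list of labelled edges is only one of several reasonable representations.

```python
from collections import defaultdict

# Each edge is (source node, relationship label, target node).
# These names and labels are hypothetical, not taken from a real dataset.
edges = [
    ("Alice", "friends_with", "Bob"),
    ("Alice", "lives_in", "Leeds"),
    ("Bob", "lives_in", "York"),
]


def build_adjacency(edge_list):
    """Map each node to the set of (label, neighbour) pairs it touches."""
    adjacency = defaultdict(set)
    for source, label, target in edge_list:
        adjacency[source].add((label, target))
        adjacency[target].add((label, source))  # treat every edge as undirected
    return adjacency


adjacency = build_adjacency(edges)
nodes = set(adjacency)  # every node that appears on at least one edge
```

Here the nodes are the people and places, and each tuple in `edges` is one line on the drawing.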

Before going into any detail, let’s look at the example below. John lives in Bristol and is looking for a café in London. He doesn’t know anyone who lives in London, as he has spent his whole life living and working in Bristol. However, through his social network, we can see that he knows people who live in Surrey and, in turn, they know people who live in London.

As we work through the graph, we can see that Cafe1 is the most popular. Why? Because it has the most blue ‘like’ lines connected to it. The power of the network has enabled us to find the most popular café, as determined by friends of friends.

Red lines = friends with
Green lines = live in
Blue lines = likes
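The café example can be sketched in a few lines of Python: follow the 'friends with' edges two steps out from John, then count the 'likes' edges that land on each café. John, Helen, James, April and Cafe1 come from the example; the friends-of-friends (Tom, Priya, Sam), the other cafés and the exact edges are assumptions made up for illustration, since the real ones are only visible in the diagram.

```python
from collections import Counter

# 'Friends with' edges, stored one-directionally for simplicity.
# Tom, Priya and Sam are hypothetical friends-of-friends.
friends = {
    "John": ["Helen", "James", "April"],
    "Helen": ["Tom", "Priya"],
    "James": ["Priya", "Sam"],
    "April": ["Sam"],
}

# 'Likes' edges; Cafe2 and Cafe3 are hypothetical.
likes = {
    "Tom": ["Cafe1"],
    "Priya": ["Cafe1", "Cafe2"],
    "Sam": ["Cafe1", "Cafe3"],
}


def likes_among_friends_of_friends(person):
    """Count 'likes' edges from people exactly two friendship steps away."""
    direct = set(friends.get(person, []))
    second = set()
    for friend in direct:
        second.update(friends.get(friend, []))
    second -= direct | {person}  # keep only genuine friends-of-friends

    counts = Counter()
    for friend_of_friend in second:
        counts.update(likes.get(friend_of_friend, []))
    return counts


counts = likes_among_friends_of_friends("John")
top_cafe, top_likes = counts.most_common(1)[0]
```

With this made-up data, `top_cafe` is `"Cafe1"` because three 'likes' edges point to it, which mirrors counting the blue lines by eye.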

Now we can start talking about neighbourhoods. In the example below, the dotted lines mark each neighbourhood. Helen, James and April are in John’s first neighbourhood, and everyone else is in his second neighbourhood, as they’re two steps away from John.
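One way to compute neighbourhoods is a breadth-first search that records how many steps each node is from the starting node. The sketch below assumes a small friendship graph: Helen, James and April come from the example, while Tom, Priya and Sam are hypothetical stand-ins for the people two steps away.

```python
from collections import deque

# Undirected 'friends with' edges. Tom, Priya and Sam are hypothetical.
friendships = [
    ("John", "Helen"),
    ("John", "James"),
    ("John", "April"),
    ("Helen", "Tom"),
    ("James", "Priya"),
    ("April", "Sam"),
]


def neighbourhoods(edge_list, start):
    """Group nodes by their number of steps (hops) from `start`."""
    adjacency = {}
    for a, b in edge_list:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    # Breadth-first search: the first time we reach a node is the shortest route.
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)

    rings = {}
    for node, hops in distance.items():
        if hops > 0:  # leave the start node out of its own neighbourhoods
            rings.setdefault(hops, set()).add(node)
    return rings


rings = neighbourhoods(friendships, "John")
```

`rings[1]` holds John's first neighbourhood and `rings[2]` his second, so the dotted lines in the diagram correspond to the keys of this dictionary.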

Where nodes are densely clustered, we can class them as a community. This is well represented in the image below (source). It shows a social network in which we can clearly see friendship groups, or communities, interlinked by just a few people who join those communities together.
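As a rough illustration of the idea, the sketch below splits a tiny graph into communities by removing 'bridge' edges, meaning edges whose removal disconnects the graph, and treating whatever stays connected as a group. This is a deliberately simple stand-in, not how production community detection usually works, and the six-node graph (A to F) is invented for the example.

```python
def connected_components(nodes, edge_list):
    """Return the groups of nodes that can reach each other."""
    adjacency = {node: set() for node in nodes}
    for a, b in edge_list:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen, components = set(), []
    for node in sorted(nodes):
        if node in seen:
            continue
        stack, group = [node], set()
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current] - group)
        seen |= group
        components.append(group)
    return components


def split_on_bridges(edge_list):
    """Remove every bridge edge, then report the remaining groups."""
    nodes = {node for edge in edge_list for node in edge}
    baseline = len(connected_components(nodes, edge_list))
    # Brute force: an edge is a bridge if dropping it creates an extra component.
    bridges = [
        edge
        for edge in edge_list
        if len(connected_components(nodes, [e for e in edge_list if e != edge])) > baseline
    ]
    remaining = [edge for edge in edge_list if edge not in bridges]
    return connected_components(nodes, remaining), bridges


# Two hypothetical friendship triangles joined by a single link, C to D.
edges = [
    ("A", "B"), ("B", "C"), ("A", "C"),
    ("C", "D"),
    ("D", "E"), ("E", "F"), ("D", "F"),
]
communities, bridges = split_on_bridges(edges)
```

Here C and D play the role of the few people who join the groups together. The brute-force bridge check is fine for a toy graph but would be far too slow for a large network.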