LATEST POSTS

Strategy
Kieran Keene

An overview of the industry lifecycle

The industry lifecycle provides a view of the typical birth-to-death cycle of an industry. It has four phases: introduction, growth, maturity and decline – we will use the

Strategy
Kieran Keene

Corporate vs business strategy – what are they?

It’s been a little while, but we’re back with a series on business strategy and to kick it off, we’re going to look at the difference between corporate and business

Big Data
Kieran Keene

Overview of decision trees & random forests

A decision tree builds a model in the form of a tree structure – almost like a flow chart. In order to calculate the expected outcome, it uses decision points

Big Data
Kieran Keene

Using Spark in conjunction with Pandas

When completing my domain normalisation project, I used Spark to do the heavy lifting – getting data into a dataframe & aggregating (group by and sum) and then used

Big Data
Kieran Keene

Hive: Partition an un-partitioned table

There is no way to automatically partition an un-partitioned table, so we have to follow the simple process below as a workaround: Create a new table #SHOW THE CREATE STATEMENT

Big Data
Kieran Keene

How does machine learning work?

Machine learning uses statistical techniques to give computer systems the ability to ‘learn’ rather than being explicitly programmed. By learning from historical inputs, we’re able to achieve far greater accuracy

Big Data
Kieran Keene

Python: finding most travelled customers

Using customer usage logs, I need to identify the customers that travel the most each day and understand the most popular routes across the world. Justification: Understanding the most travelled customers

Big Data
Kieran Keene

Getting started with PySpark

In this article, we’ll look at some of the key components of PySpark, which is one of the most in-demand big data technologies at the current time. Spark Session Spark