LATEST POSTS

Technology blog news
Kieran Keene

Netshock topics for remainder of 2018

Below is the list of articles planned for Netshock: Python Data Analysis: Sequence Functions; Python Data Analysis: NumPy basics; Python Data Analysis: Pandas basics; Python Data Analysis: Working with text,

Big Data
Kieran Keene

Big Data Crash Course: HBase

HBase is a NoSQL database that runs on top of HDFS. It’s suited to real-time read and write access to large datasets that have a flexible schema. What does that

Big Data
Kieran Keene

Big Data Crash Course: Hive

Apache Hive provides us with a familiar SQL-like query language to access and analyse the data stored in HDFS. Hive translates our SQL-like queries into MapReduce jobs – so even

Big Data
Kieran Keene

Big Data Crash Course: Map Reduce

Map Reduce is a parallel computing framework that enables us to distribute computation across multiple data nodes. Let’s look at an example. We have a file that has the below

Big Data
Kieran Keene

Big Data Crash Course: Hadoop Security Overview

Below are the core components of a security policy in Hadoop: ADMINISTRATION: Define policies for the cluster. Who can access what? From where? Handled by: Apache Ranger. AUTHENTICATION: Requires

Big Data
Kieran Keene

Big Data Crash Course: HDFS

HDFS stands for Hadoop Distributed File System. A distributed file system manages files and folders across multiple servers for the benefit of resiliency and rapid data processing by utilizing parallelisation

Big Data
Kieran Keene

Big Data Crash Course: Apache Ranger

Apache Ranger enables us to monitor and manage data security across our Hadoop cluster. It enables us to define a security policy for users or groups of the cluster once

Big Data
Kieran Keene

Big Data Crash Course: Kerberos, Knox and Atlas

Kerberos: Kerberos authenticates users. It can be complex to configure, so to simplify things we can carry out Kerberos setup, configuration and maintenance through Ambari. Kerberos

Big Data
Kieran Keene

Big Data Crash Course: Combining Ingestion Tools

Earlier in this series, we discussed the benefits of Flafka (Flume and Kafka) and NiFi coupled with Kafka. Here, we will review those configurations and also look at other

Big Data
Kieran Keene

Big Data Crash Course: Spark Streaming

Apache Spark is a framework available in Hadoop. That framework comprises Spark SQL, Spark Streaming and the Spark Machine Learning Library (MLlib). We will go through the entire framework in

Big Data
Kieran Keene

Big Data Crash Course: Apache Storm

Apache Storm provides us with a distributed, real-time computation platform. It has been designed to reliably process large, high-velocity streams of data. To put this into something we