Big Data Crash Course: Apache Ranger

Apache Ranger lets us monitor and manage data security across a Hadoop cluster. We define a security policy for users or groups once, and Ranger applies it across all supported components in the Hadoop stack.

Ranger currently supports security policies for HDFS, YARN, Hive, HBase, Storm, Knox, Solr and Kafka. The diagram below highlights the ecosystem it supports.

In the diagram above, the policy admin server is the central interface for security administration tasks. From here, we can create and update policies, view our audit data and manage users. The policy admin server supports LDAP, AD and Unix authentication.
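As a rough illustration of the "create policies" step, policies can also be defined programmatically through the admin server's public REST API (`POST /service/public/v2/api/policy`). The sketch below only builds the JSON payload; the service name, path and user names are hypothetical, not a real deployment:

```python
import json

# Hypothetical HDFS policy payload in the shape used by Ranger's
# public v2 REST API. Service name, path and principals are illustrative.
policy = {
    "service": "cluster_hadoop",   # name of the registered HDFS service (assumed)
    "name": "sales-data-read",
    "isEnabled": True,
    "resources": {
        "path": {"values": ["/data/sales"], "isRecursive": True}
    },
    "policyItems": [
        {
            "users": ["analyst1"],
            "groups": ["analysts"],
            "accesses": [
                {"type": "read", "isAllowed": True},
                {"type": "execute", "isAllowed": True},
            ],
        }
    ],
}

body = json.dumps(policy)
```

With a live admin server, `body` would be POSTed (e.g. with the `requests` library) to `http://<admin-host>:6080/service/public/v2/api/policy` using admin credentials.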

The user sync server enables us to synchronise users from our UNIX, LDAP or AD groups. The users are stored in Ranger, while the policy definitions are stored in the policy database.

Plugins pull policies from the policy admin server, cache them locally, and act as an authorisation module, evaluating each request before granting access. The plugin also collects data from each request and stores it in the audit database.
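The plugin's evaluate-then-audit flow can be sketched in miniature. This is a deliberately simplified model, not Ranger's actual engine (which also handles deny conditions, wildcards and policy priorities):

```python
# Minimal sketch of plugin-side evaluation against a local policy cache.
# Each cached policy maps a resource to permitted users/groups and access types.
cached_policies = [
    {
        "resource": "/data/sales",
        "users": {"analyst1"},
        "groups": {"analysts"},
        "accesses": {"read", "execute"},
    }
]

audit_log = []  # stand-in for the audit database


def is_allowed(user, user_groups, resource, access):
    """Return True if any cached policy permits this request."""
    allowed = False
    for p in cached_policies:
        if resource.startswith(p["resource"]) and access in p["accesses"]:
            if user in p["users"] or user_groups & p["groups"]:
                allowed = True
                break
    # Every decision is recorded for auditing, allowed or denied.
    audit_log.append({"user": user, "resource": resource,
                      "access": access, "allowed": allowed})
    return allowed
```

For example, `is_allowed("analyst1", {"analysts"}, "/data/sales/2021", "read")` is permitted by the cached policy, while `is_allowed("guest", set(), "/data/sales", "write")` is not, and both decisions land in `audit_log`.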

In relation to HDFS, the plugin should be installed on all NameNodes. The plugin evaluates each request and grants or denies access based on the policy rules.

Permissions for a user or group can be enforced as policies on databases, tables, columns, folders and files and can be integrated with our directory service (e.g. AD) if we wish.

Within Hive, Ranger can control who can carry out select, update, create, drop, alter, index and lock operations.
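A hedged sketch of what such a Hive policy's resource and access specification might look like in the admin API's JSON model; the database, table, column and group names are made up for illustration:

```python
# Illustrative Hive policy: column-level select plus DDL-style rights
# for one group. All names are hypothetical.
hive_policy = {
    "service": "cluster_hive",   # assumed registered Hive service name
    "name": "marts-orders-access",
    "resources": {
        "database": {"values": ["marts"]},
        "table": {"values": ["orders"]},
        "column": {"values": ["order_id", "amount"]},  # column-level scope
    },
    "policyItems": [
        {
            "groups": ["etl"],
            "accesses": [{"type": t, "isAllowed": True}
                         for t in ("select", "update", "create",
                                   "drop", "alter", "index", "lock")],
        }
    ],
}
```

Restricting the `column` resource is how column-level permissions from the previous paragraph are expressed: users outside the policy see neither `order_id` nor `amount`.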

A core Ranger component is its Key Management Service (KMS), a scalable key management service that enables us to encrypt data at rest in HDFS. Hadoop itself manages the encryption of network traffic as it enters the cluster (via RPC, HTTP, the data transfer protocol (DTP) or JDBC).
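Operationally, encryption at rest is set up by creating a key in the KMS and declaring an HDFS encryption zone backed by it. The key and path names below are illustrative; this is a sketch of the standard Hadoop CLI steps on a cluster where Ranger KMS is configured as the key provider:

```shell
# Create an encryption key in the KMS (key name is illustrative)
hadoop key create sales_key

# Create an empty directory and make it an encryption zone backed by the key;
# files written under /secure/sales are then transparently encrypted at rest
hdfs dfs -mkdir /secure/sales
hdfs crypto -createZone -keyName sales_key -path /secure/sales

# Confirm the zone exists
hdfs crypto -listZones
```

Ranger policies can then govern who may use `sales_key`, separating key administration from data access.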