Big Data Crash Course: Kerberos, Knox and Atlas



Kerberos authenticates users. It can be complex to configure, so Ambari offers a simplified way to set up, configure and maintain Kerberos.

Kerberos is a network authentication protocol. It is designed to provide strong authentication for client/server applications by using secret-key cryptography. It has the following characteristics:

  • It is secure: it never sends a password unless it is encrypted.
  • Only a single login is required per session. User credentials are defined at login and are then passed between resources without the need for additional logins.
  • The concept depends on a trusted third party – a Key Distribution Center (KDC). The KDC is aware of all systems in the network and is trusted by all of them.
  • It performs mutual authentication, where a client proves its identity to a server and a server proves its identity to the client.

Kerberos introduces the concept of a Ticket-Granting Server (TGS). A client that wishes to use a service has to receive a ticket – a time-limited cryptographic message – giving it access to the server.

Kerberos also requires an Authentication Server (AS) to verify clients. The two servers combined make up a KDC. In Windows environments, Active Directory performs the functions of the KDC. The following figure shows the sequence of events required for a client to gain access to a service using Kerberos authentication. Each step is shown with the Kerberos message associated with it.

Step 1 : The user logs on to the workstation and requests service on the host. The workstation sends a message to the authentication server requesting a ticket-granting ticket (TGT).

Step 2 : The authentication server verifies the user’s access rights in the user database and creates a TGT and session key. The authentication server encrypts the results using a key derived from the user’s password and sends a message back to the user workstation.

The workstation prompts the user for a password and uses the password to decrypt the incoming message. When decryption succeeds, the user will be able to use the TGT to request a service ticket.
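Steps 1 and 2 can be sketched in a few lines of Python. This is only an illustration: the salt, the password, the helper names (derive_key, seal, unseal) and the toy cipher are all assumptions made for the example, and real Kerberos uses its own standardised key-derivation and encryption types. The point is simply that the workstation can read the reply only if it derives the same key from the password as the authentication server holds.

```python
import hashlib
import hmac
import os


def derive_key(password: str, salt: bytes) -> bytes:
    """Derive a 32-byte key from a password (PBKDF2 here, as a stand-in)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 10_000)


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream built from SHA-256 in counter mode. Illustration only."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt and tag a message so that tampering or a wrong key is detected."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    body = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(key, nonce + body, hashlib.sha256).digest()
    return nonce + tag + body


def unseal(key: bytes, blob: bytes) -> bytes:
    """Check the tag, then decrypt. Raises ValueError on a wrong key."""
    nonce, tag, body = blob[:16], blob[16:48], blob[48:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("wrong key or tampered message")
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


# Step 2, authentication server side: encrypt the session key for the user.
salt = b"EXAMPLE.COMalice"                 # hypothetical realm + user name
session_key = os.urandom(32)
as_reply = seal(derive_key("correct horse", salt), session_key)

# Workstation side: the password typed by the user unlocks the reply.
recovered = unseal(derive_key("correct horse", salt), as_reply)
```

Note that the password itself never crosses the network in this sketch; only a message encrypted with a key derived from it does.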

Step 3 : When the user wants access to a service, the workstation client application sends a request to the Ticket-Granting Server (TGS) containing the TGT, the client name, realm name and a timestamp. The user proves their identity by sending an authenticator encrypted with the session key received in Step 2.

Step 4 : The TGS decrypts the ticket and authenticator, verifies the request, and creates a ticket for the requested server. The ticket contains the client name and optionally the client IP address. It also contains the realm name and ticket lifespan. The TGS returns the ticket to the user workstation. The returned message contains two copies of a server session key: one encrypted for the client (with the session key from Step 2), and one, inside the ticket, encrypted with the service’s secret key.

Step 5 : The client application now sends a service request to the server containing the ticket received in Step 4 and an authenticator. The service authenticates the request by decrypting the session key. The server verifies that the ticket and authenticator match, and then grants access to the service.

Step 6 : If mutual authentication is required, then the server will reply with a server authentication message.
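Steps 3 to 6 can be simulated without any real cryptography. In the sketch below, the Sealed class is a hypothetical stand-in for an encrypted message that only opens with the key it was sealed with; the key names, the user "alice" and the function names are likewise assumptions for illustration, not part of any Kerberos library.

```python
import os
import time


class Sealed:
    """Stand-in for an encrypted message: it opens only with the sealing key."""

    def __init__(self, key: bytes, payload: dict):
        self._key, self._payload = key, dict(payload)

    def open(self, key: bytes) -> dict:
        if key != self._key:
            raise PermissionError("cannot decrypt: wrong key")
        return dict(self._payload)


TGS_KEY = os.urandom(16)      # long-term secret of the TGS (hypothetical)
SERVICE_KEY = os.urandom(16)  # long-term secret of the target service (hypothetical)


def tgs_issue_ticket(tgt, authenticator, lifetime_s=600):
    """Step 4: validate the TGT and authenticator, then issue a service ticket."""
    tgt_data = tgt.open(TGS_KEY)
    auth = authenticator.open(tgt_data["session_key"])
    if auth["client"] != tgt_data["client"]:
        raise PermissionError("authenticator does not match TGT")
    service_session_key = os.urandom(16)
    ticket = Sealed(SERVICE_KEY, {
        "client": tgt_data["client"],
        "session_key": service_session_key,
        "expires": time.time() + lifetime_s,
    })
    # Second copy of the new session key, readable only by the client.
    reply = Sealed(tgt_data["session_key"], {"session_key": service_session_key})
    return ticket, reply


def service_accept(ticket, authenticator):
    """Steps 5 and 6: check ticket and authenticator, return a mutual-auth proof."""
    data = ticket.open(SERVICE_KEY)
    auth = authenticator.open(data["session_key"])
    if auth["client"] != data["client"] or time.time() > data["expires"]:
        raise PermissionError("ticket rejected")
    return Sealed(data["session_key"], {"timestamp": auth["timestamp"]})


# Outcome of Steps 1 and 2: the client holds a TGT and a session key.
# (In reality the authentication server builds the TGT, not the client.)
tgs_session_key = os.urandom(16)
tgt = Sealed(TGS_KEY, {"client": "alice", "session_key": tgs_session_key})

# Step 3: request a service ticket, proving identity with an authenticator.
ticket, reply = tgs_issue_ticket(
    tgt, Sealed(tgs_session_key, {"client": "alice", "timestamp": time.time()}))

# Step 5: present the ticket and a fresh authenticator to the service.
service_session_key = reply.open(tgs_session_key)["session_key"]
sent_at = time.time()
proof = service_accept(
    ticket, Sealed(service_session_key, {"client": "alice", "timestamp": sent_at}))

# Step 6: the client checks that the service echoed its timestamp.
mutual_ok = proof.open(service_session_key)["timestamp"] == sent_at
```

The client never sees SERVICE_KEY and the service never sees the user's password; each side only handles keys it is entitled to, which is what makes the ticket trustworthy.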

The Kerberos server knows “secrets” (encrypted passwords) for all clients and servers under its control, or it is in contact with other secure servers that have this information. These “secrets” are used to encrypt all of the messages shown in the figure above.

To prevent “replay attacks,” Kerberos uses timestamps as part of its protocol definition. For timestamps to work properly, the clocks of the client and the server need to be closely synchronised. Since the clocks of two computers are rarely in perfect sync, administrators can set a policy that defines the maximum clock difference Kerberos will tolerate between a client and a server. If the difference between the client’s clock and the server’s clock is less than this maximum, any timestamp used in a session between the two computers will be considered authentic. The maximum difference is usually set to five minutes.
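The clock-skew rule can be sketched as a small check. The five-minute limit comes from the text above; the ReplayGuard class and its record of already-seen timestamps are assumptions added for illustration, showing one simple way a server could also refuse an exact repeat of an earlier message inside the allowed window.

```python
from datetime import datetime, timedelta, timezone

MAX_SKEW = timedelta(minutes=5)  # the usual default mentioned above


class ReplayGuard:
    """Accept a timestamp only if it is fresh and has not been seen before."""

    def __init__(self, max_skew: timedelta = MAX_SKEW):
        self.max_skew = max_skew
        self.seen = set()

    def accept(self, client: str, client_ts: datetime, server_now: datetime) -> bool:
        # Reject timestamps too far from the server clock, in either direction.
        if abs(server_now - client_ts) > self.max_skew:
            return False
        # Reject an exact repeat of a message that was already accepted.
        if (client, client_ts) in self.seen:
            return False
        self.seen.add((client, client_ts))
        return True


guard = ReplayGuard()
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

first = guard.accept("alice", now - timedelta(minutes=2), now)   # fresh
replay = guard.accept("alice", now - timedelta(minutes=2), now)  # same message again
stale = guard.accept("alice", now - timedelta(minutes=10), now)  # outside the window
```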


Apache Knox provides perimeter security for Hadoop. It lets us expose Hadoop through REST APIs without clients having to deal with Kerberos directly, while still maintaining compliance with security policies. We can install, start, stop and configure Knox via Ambari.

Knox integrates with Apache Ranger for service-level authorization and sits on top of Kerberos. It provides a single access point for REST and HTTP interactions with the Hadoop cluster.
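As a rough sketch of what the single access point looks like from a client, the snippet below builds a WebHDFS request routed through a Knox gateway. The host name, credentials and helper function are hypothetical; the port 8443 and the "default" topology are common defaults but depend on the installation, and HTTP Basic authentication is assumed to be enabled for the topology. The request is only constructed here, not sent.

```python
import base64
import urllib.parse
import urllib.request


def knox_webhdfs_request(host, path, op, user, password,
                         port=8443, topology="default"):
    """Build (but do not send) a WebHDFS request that goes through Knox."""
    url = "https://{}:{}/gateway/{}/webhdfs/v1/{}?{}".format(
        host, port, topology,
        urllib.parse.quote(path.lstrip("/")),
        urllib.parse.urlencode({"op": op}),
    )
    # The client authenticates to Knox; Knox handles Kerberos inside the cluster.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


# Hypothetical gateway host and credentials.
req = knox_webhdfs_request("knox.example.com", "/tmp", "LISTSTATUS",
                           "guest", "guest-password")
print(req.full_url)
```

Passing the request to urllib.request.urlopen would send it; the client needs no Kerberos ticket of its own because it only ever talks to the gateway.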


Apache Atlas provides a metadata store for Hive, Ranger, Sqoop, Storm, Kafka and Falcon. This gives us a single view of metadata across the cluster, which makes searching far more powerful. Additionally, it enables us to map data lineage: for example, we’ll know if a table was derived from another table.
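Conceptually, lineage is a graph of "derived from" relationships. The toy example below is not the Atlas API; the table names and the upstream function are invented for illustration. It shows the kind of question lineage metadata lets us answer: which tables did this one ultimately come from?

```python
# Hypothetical lineage: each table maps to the tables it was derived from.
DERIVED_FROM = {
    "sales_summary": ["sales_clean"],
    "sales_clean": ["sales_raw", "customers"],
}


def upstream(table, edges):
    """Return every table that the given table depends on, directly or not."""
    found = []
    stack = list(edges.get(table, []))
    while stack:
        current = stack.pop()
        if current not in found:
            found.append(current)
            stack.extend(edges.get(current, []))
    return found


sources = sorted(upstream("sales_summary", DERIVED_FROM))
print(sources)
```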

A major strength of Atlas is its ability to assign tags to metadata entries. We can then work with Apache Ranger to define policies based on those tags.
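The idea behind tag-based policies can be shown with a small model. Everything here is a simplified assumption: the entity names, the "PII" tag, the group names and the can_read function are made up, and Ranger's real policy evaluation is considerably richer. The sketch only shows why tagging is convenient: the policy is written once against the tag, and applies to every entry that carries it.

```python
# Hypothetical tags assigned to metadata entries (as Atlas would hold them).
ENTITY_TAGS = {
    "hr.employees.ssn": {"PII"},
    "hr.employees.department": set(),
}

# Hypothetical tag policies: tag -> groups allowed to read tagged data.
TAG_POLICIES = {"PII": {"hr_admins"}}


def can_read(entity: str, user_groups: set) -> bool:
    """Deny access if any tag on the entity restricts it to other groups."""
    for tag in ENTITY_TAGS.get(entity, set()):
        allowed = TAG_POLICIES.get(tag)
        if allowed is not None and not (allowed & user_groups):
            return False
    return True


analyst_sees_ssn = can_read("hr.employees.ssn", {"analysts"})
admin_sees_ssn = can_read("hr.employees.ssn", {"hr_admins"})
analyst_sees_dept = can_read("hr.employees.department", {"analysts"})
```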

With Atlas, we can implement data classification rules and, again, work with Apache Ranger to implement policies around data of different classifications including