Kerberos & Hadoop: Securing Big Data (part II)

Celeste Duran | Big Data Architecture, Technology

In this second part of the Kerberos series, we’re going to review how to configure our system to get a properly secured environment. The post will provide some important tips about the configuration, but it isn’t a typical command guide. Also, if you are starting with the basics, our previous post Kerberos & Hadoop: Securing Big Data (part I) explains why we need security in a Hadoop environment, why Kerberos is the best option for it, and some key points on how Kerberos works.

Kerberos has friends too

The overall perception is that Kerberos alone is the key to a secure system, but the best option is to combine Kerberos with LDAP or Active Directory. We should protect the information within our organization and between workgroups: it’s very likely that the Human Resources team doesn’t need to access the same information as the Finance team. By using Kerberos with LDAP or Active Directory we can create user groups, and the applications can, in turn, use these groups to define roles.

Solution Design

First, it’s important to correctly choose the technology that we’ll use for securing the environment: Kerberos + LDAP or Active Directory. For this post we’ll follow the LDAP + Kerberos path, where we’ll install a dedicated authentication server running the OpenLDAP and MIT Kerberos services.

Second, we’ll need to design how groups will be generated. For example we can apply some of the following criteria:

  • Read users, write users and admin users. This is the simplest scheme and the easiest to administer, but it should only be used if you have few users.
  • User classification based on work departments: Developers, Human Resources, Administrators, etc. For example, the Human Resources group should have read-write (or admin) rights over the payroll data, but read-only rights for everything else. This approach is optimal when there are few interdepartmental movements and/or when data is strongly differentiated between departments.
  • Based on the data that you’re going to store: weather data, driving data, social network data, etc. Additionally, we would recommend replicating each of the data groups in order to hold three versions of it, each with one of the following restrictions: read-only, read-write and admin. If we use weather data as an example, we would create 3 groups for it: weather_ro, weather_rw, weather_a. This way, each user can belong to several groups (e.g. weather_ro and social_rw). In our experience, this is the best solution if you’re going to have a big diversity of users and data. Ultimately, this option is the most flexible, but also the most complex to administer because of the large number of groups (a sketch of what these groups could look like in LDAP follows this list).
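As an illustration of the last scheme, the per-dataset groups could be created with something like the sketch below. The suffix dc=example,dc=com, the ou=groups branch, the manager DN and the gidNumbers are placeholders for this example, not values prescribed by this guide:

```bash
# Hypothetical example: one read-only, one read-write and one admin group for the "weather" dataset
ldapadd -x -D "cn=Manager,dc=example,dc=com" -W <<'EOF'
dn: cn=weather_ro,ou=groups,dc=example,dc=com
objectClass: posixGroup
cn: weather_ro
gidNumber: 10001

dn: cn=weather_rw,ou=groups,dc=example,dc=com
objectClass: posixGroup
cn: weather_rw
gidNumber: 10002

dn: cn=weather_a,ou=groups,dc=example,dc=com
objectClass: posixGroup
cn: weather_a
gidNumber: 10003
EOF
```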

Another important aspect to take into account is whether we’ll allow generic users or not. When we use Kerberos, we’re implementing a customized environment, and if all users are going to be physical users, this may cause problems should the company structure change. Imagine that someone has a scheduled task over a specific dataset (data ingestion of some sort, for example), this person leaves the company and her user is deleted. What happens to the ingestion? If you are going to have a lot of processes running over your data, it’s recommended to create generic users to run them: by department, by application, by type of data, …

Configure the LDAP Server

Below, we’ll walk you through the configuration of the LDAP server:

1. Install OpenLDAP:
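Assuming a yum-based distribution such as RHEL/CentOS, the installation would look roughly like this:

```bash
# Install the OpenLDAP server and the client tools used in the following steps
yum install -y openldap openldap-servers openldap-clients
```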

2. Configure OpenLDAP. Go to the OpenLDAP configuration directory (the cn=config database) to make the changes: /etc/openldap/slapd.d/

2.1. Choose the LDAP password:
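A hashed version of the chosen password is what goes into the configuration, and slappasswd generates it:

```bash
# Generate an SSHA hash of the chosen password; the output is used as olcRootPW in the next step
slappasswd
# Example output (yours will differ): {SSHA}2dL9hOpAqkZtMXEIGbE13PzVuS0uKZkL
```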

2.2. Change the generic fields of the LDAP configuration in this file: /etc/openldap/slapd.d/cn=config/olcDatabase={2}bdb.ldif:
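The fields in question are the suffix, the root DN and the root password. After editing, they would look roughly like the sketch below; dc=example,dc=com, cn=Manager and the hash are placeholders carried over from the previous step:

```bash
# Check the values after editing the file (the hash is the one produced by slappasswd above)
grep -E 'olcSuffix|olcRootDN|olcRootPW' '/etc/openldap/slapd.d/cn=config/olcDatabase={2}bdb.ldif'
# olcSuffix: dc=example,dc=com
# olcRootDN: cn=Manager,dc=example,dc=com
# olcRootPW: {SSHA}2dL9hOpAqkZtMXEIGbE13PzVuS0uKZkL
```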

2.3. Review the LDAP access rules in the file /etc/openldap/slapd.d/cn=config/olcDatabase={1}monitor.ldif
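What matters here is that the manager DN referenced by the olcAccess rule matches the root DN chosen above (cn=Manager,dc=example,dc=com in this sketch) rather than the packaged default:

```bash
# Inspect the access rule and edit it if it still points at the default manager DN
grep olcAccess '/etc/openldap/slapd.d/cn=config/olcDatabase={1}monitor.ldif'
```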

3. Copy the example database configuration to the LDAP data directory: /var/lib/ldap
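On RHEL/CentOS the openldap-servers package ships a sample DB_CONFIG, so this step is typically:

```bash
# Copy the sample BDB configuration and make sure the ldap user owns the data directory
cp /usr/share/openldap-servers/DB_CONFIG.example /var/lib/ldap/DB_CONFIG
chown -R ldap:ldap /var/lib/ldap
```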

4. Activate LDAP. Change the configuration in file /etc/sysconfig/ldap (fields SLAPD_LDAP and SLAPD_LDAPS)
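A minimal way to do this on a sysvinit system (assumption: CentOS 6-style service management) is:

```bash
# Enable both plain LDAP and LDAPS, then start slapd and make it persistent across reboots
sed -i 's/^SLAPD_LDAP=.*/SLAPD_LDAP=yes/;s/^SLAPD_LDAPS=.*/SLAPD_LDAPS=yes/' /etc/sysconfig/ldap
service slapd start
chkconfig slapd on
```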

5. Generate your signing certificate in the directories that you configured in the second step.
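A self-signed certificate is enough for a sketch; the paths, the hostname and the validity period below are assumptions and should match whatever your slapd configuration points at:

```bash
# Generate a self-signed certificate and key for LDAPS
openssl req -new -x509 -nodes -days 3650 \
  -subj "/CN=ldap.example.com" \
  -out /etc/openldap/certs/ldap.example.com.crt \
  -keyout /etc/openldap/certs/ldap.example.com.key
chown ldap:ldap /etc/openldap/certs/ldap.example.com.*
```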

6. Populate LDAP using the ldapadd command.
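For instance, a minimal base tree (suffix plus one ou for users and one for groups) could be loaded like this; all names are placeholders for this sketch:

```bash
# Create the base entry and two organizational units
ldapadd -x -D "cn=Manager,dc=example,dc=com" -W <<'EOF'
dn: dc=example,dc=com
objectClass: dcObject
objectClass: organization
dc: example
o: Example Org

dn: ou=users,dc=example,dc=com
objectClass: organizationalUnit
ou: users

dn: ou=groups,dc=example,dc=com
objectClass: organizationalUnit
ou: groups
EOF
```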

Configure the Kerberos Server

With the LDAP Server already in place we’ll proceed with the installation of the Kerberos server.

1. Install MIT Kerberos packages: krb5-libs, krb5-server, krb5-workstation.
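On RHEL/CentOS this is a single yum command; note that krb5-server-ldap is not listed in the post, but it is the package that (on this packaging) ships the LDAP backend plugin and the kerberos.schema file used in the next steps:

```bash
# Install the MIT Kerberos packages plus the LDAP backend
yum install -y krb5-libs krb5-server krb5-workstation krb5-server-ldap
```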

2. Add the Kerberos schema to the LDAP configuration and add the new ldif (with all its dependencies) to LDAP.
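One common way to do this is to convert the kerberos.schema shipped with krb5-server-ldap into cn=config LDIF with slaptest and load the result. The paths below are assumptions (the exact doc directory depends on the package version):

```bash
# Convert kerberos.schema into cn=config LDIF and load it into the running slapd
cp /usr/share/doc/krb5-server-ldap-*/kerberos.schema /etc/openldap/schema/
echo "include /etc/openldap/schema/kerberos.schema" > /tmp/schema_convert.conf
mkdir -p /tmp/ldif_output
slaptest -f /tmp/schema_convert.conf -F /tmp/ldif_output
# Clean up the generated file (fix its dn/cn to cn=kerberos,cn=schema,cn=config and remove the
# generated operational attributes), then add it:
ldapadd -Y EXTERNAL -H ldapi:/// -f '/tmp/ldif_output/cn=config/cn=schema/cn={0}kerberos.ldif'
```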

3. Modify the LDAP configuration to prepare the bdb database for Kerberos. Use ldapmodify to change this field:
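The post does not preserve the exact attribute being changed here; one change commonly made at this point (and the assumption behind this sketch) is adding an index for the Kerberos principal attribute to the bdb database:

```bash
# Assumption: the field being modified is the bdb index list, so that principal lookups are indexed
ldapmodify -Y EXTERNAL -H ldapi:/// <<'EOF'
dn: olcDatabase={2}bdb,cn=config
changetype: modify
add: olcDbIndex
olcDbIndex: krbPrincipalName eq,pres,sub
EOF
```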

4. Check whether there’s a branch in LDAP to store our Kerberos principals. If there isn’t, create it using the ldapadd command:
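For example, a dedicated organizational unit could act as the container for the realm data; the branch name and suffix below are placeholders:

```bash
# Create a branch to hold the Kerberos realm data
ldapadd -x -D "cn=Manager,dc=example,dc=com" -W <<'EOF'
dn: ou=kerberos,dc=example,dc=com
objectClass: organizationalUnit
ou: kerberos
EOF
```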

5. Select the admin user that Kerberos will use to communicate with LDAP. Usually you can use the same admin user that you already created for the LDAP server.

6. Modify your Kerberos configuration files: /etc/krb5.conf and /var/kerberos/krb5kdc/kdc.conf

KRB5.CONF

6.1. Select the realm.

6.2. Configure the kdc, admin_server and database_module for your realm.

6.3. Configure your realm’s domains (Dorne, The Vale of Arryn, The Iron Islands…). Oops, I think we got slightly carried away! Sorry, let’s continue.

6.4. Configure the LDAP module options
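Putting steps 6.1 to 6.4 together, the relevant parts of /etc/krb5.conf might look roughly like the sketch below, written here as a heredoc that replaces the default file. The realm EXAMPLE.COM, the hostnames, the stash path and the DNs are placeholders; merge with your existing settings as needed:

```bash
# Sketch of /etc/krb5.conf for an LDAP-backed realm (all names are placeholders)
cat > /etc/krb5.conf <<'EOF'
[libdefaults]
 default_realm = EXAMPLE.COM

[realms]
 EXAMPLE.COM = {
  kdc = kdc.example.com
  admin_server = kdc.example.com
  database_module = openldap_ldapconf
 }

[domain_realm]
 .example.com = EXAMPLE.COM
 example.com = EXAMPLE.COM

[dbmodules]
 openldap_ldapconf = {
  db_library = kldap
  ldap_servers = ldaps://ldap.example.com
  ldap_kerberos_container_dn = ou=kerberos,dc=example,dc=com
  ldap_kdc_dn = cn=Manager,dc=example,dc=com
  ldap_kadmind_dn = cn=Manager,dc=example,dc=com
  ldap_service_password_file = /var/kerberos/krb5kdc/miFile.stash
 }
EOF
```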

KDC.CONF

6.5. Configure listen ports for your Kerberos server
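A minimal kdc.conf with explicit listen ports could look like this (the port values are the MIT defaults; the realm and file paths are the placeholders used throughout this sketch):

```bash
# Sketch of /var/kerberos/krb5kdc/kdc.conf
cat > /var/kerberos/krb5kdc/kdc.conf <<'EOF'
[kdcdefaults]
 kdc_ports = 88
 kdc_tcp_ports = 88

[realms]
 EXAMPLE.COM = {
  acl_file = /var/kerberos/krb5kdc/kadm5.acl
  dict_file = /usr/share/dict/words
  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab
 }
EOF
```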

7. Create the Kerberos database:
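With the LDAP backend, the database is created inside the branch from step 4 using kdb5_ldap_util; realm, DNs and subtree below are the same placeholders as before:

```bash
# Create the realm inside LDAP; -s stashes the master key
kdb5_ldap_util -D "cn=Manager,dc=example,dc=com" create \
  -subtrees "ou=kerberos,dc=example,dc=com" -r EXAMPLE.COM -s
```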

8. Create the password stash file for Kerberos (the path is usually /var/kerberos/krb5kdc/miFile.stash):
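This file stores the password that the KDC and kadmind use to bind to LDAP (the DN and path below are the ones used throughout this sketch):

```bash
# Stash the LDAP bind password for the admin DN configured in ldap_kdc_dn / ldap_kadmind_dn
kdb5_ldap_util stashsrvpw -f /var/kerberos/krb5kdc/miFile.stash "cn=Manager,dc=example,dc=com"
```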

9. Very important step: change the permissions in LDAP in order to enable access to the Kerberos branch. Use the ldapmodify command to change this field:
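The exact rule depends on your tree; as an assumption, the sketch below grants the DN that the KDC binds with write access to the Kerberos subtree and hides it from everyone else (DNs are the placeholders used above):

```bash
# Add an access rule for the Kerberos branch on the bdb database
ldapmodify -Y EXTERNAL -H ldapi:/// <<'EOF'
dn: olcDatabase={2}bdb,cn=config
changetype: modify
add: olcAccess
olcAccess: {0}to dn.subtree="ou=kerberos,dc=example,dc=com"
  by dn.exact="cn=Manager,dc=example,dc=com" write
  by * none
EOF
```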

Conclusions


When we need to deploy a Hadoop environment, we need to design it with the users we have in mind, the data we’re going to store and how this data is going to be used. We also need to think about the data growth rate and whether our company’s users often move between departments. Remember to pay attention to generic users in your environment too!

When dealing with LDAP and Kerberos configuration: relax, it isn’t easy, but it isn’t impossible.

In the next post we’ll explain how to configure Hadoop to put this Kerberos+LDAP configuration to good use!
