Friday, November 21, 2014

Hopes for Hadoop security ride on Apache Ranger

Enterprise interest in Hadoop has been tempered by security worries.

Hadoop has begun a slow and inexorable move from prototypes and experimental projects to core enterprise data management. As the ecosystem of Hadoop projects has matured, new capabilities have filtered into Hadoop to grant it enterprise-ready status.

Most notably, Hadoop is finally gaining serious capabilities to provide the level of security expected of a platform entrusted with the crown jewels of a firm’s data. In May 2014, Hortonworks acquired XA Secure and began the process of open-sourcing that company's Hadoop security management system. The source code from the XA Secure platform has been donated to the Apache Software Foundation, where it forms the basis of the Apache Ranger project. Ranger (briefly known as Apache Argus) is now part of the Apache Incubator and recently celebrated its first release as an Incubator project.

Ranger provides a centralized, comprehensive platform for managing authorization, access control, auditing, administration, and data protection for data stored in Hadoop. Ranger hooks into HDFS, WebHDFS, Hive, HBase, Knox, and Storm, and it offers a central authorization provider that each of those projects can use to validate data requests. Ranger also provides a comprehensive audit log for viewing requests and their status, as well as a centralized, Web-based administration console for configuring access rights. As you'd expect for any enterprise-grade security solution, Ranger supports syncing user/group information from LDAP/Active Directory.

To deliver this functionality, Ranger runs a stand-alone daemon that syncs user and group information from LDAP/AD and distributes policies to nodes in the cluster. A lightweight agent runs embedded in the individual Hadoop components that need data protection (Hive, HBase, and so on) and uses the security hooks built into those components.
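
The agent pattern is easy to picture in code. The following is a minimal, hypothetical sketch of how an embedded authorizer hooks into a host component; every class and method name here is invented for illustration and is not the real Ranger plugin API.

    // Hypothetical sketch of the embedded-agent pattern: the host
    // component (Hive, HBase, and so on) calls an authorizer hook, and
    // the agent checks the request against locally cached policies that
    // the stand-alone daemon pushed out earlier. All names are invented.
    public class EmbeddedAuthorizerSketch {

        // Minimal stand-ins for the host component's hook types.
        interface AccessRequest {
            String user();
            String resource();   // e.g. a Hive table or an HDFS path
            String accessType(); // e.g. "select", "read", "write"
        }

        interface PolicyCache {
            // Refreshed in the background from the admin daemon, so each
            // check is a local, in-memory lookup -- one reason enforcement
            // adds little overhead at runtime.
            boolean isAllowed(String user, String resource, String accessType);
        }

        interface AuditLogger {
            void log(AccessRequest request, boolean allowed);
        }

        private final PolicyCache policies;
        private final AuditLogger audit;

        EmbeddedAuthorizerSketch(PolicyCache policies, AuditLogger audit) {
            this.policies = policies;
            this.audit = audit;
        }

        // Called by the component's security hook for each data request.
        public boolean authorize(AccessRequest request) {
            boolean allowed = policies.isAllowed(
                    request.user(), request.resource(), request.accessType());
            audit.log(request, allowed); // every decision lands in the audit log
            return allowed;
        }
    }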

This agent also runs as part of the NameNode to provide access control for HDFS, and it gathers request details that are stored in the audit log. Policy enforcement happens locally on each node, so Ranger imposes no significant performance penalty at runtime.
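
For HDFS specifically, the decision flow can be sketched roughly as follows: consult the centrally managed Ranger policies first, then optionally fall back to ordinary HDFS file-level permissions when no policy matches (a mode noted in the next paragraph). The types and names below are invented for illustration, not Ranger internals.

    // Hedged sketch of the HDFS enforcement flow described above: the
    // NameNode-side agent evaluates Ranger policies first and can fall
    // back to plain HDFS file permissions when no policy matches.
    public class HdfsEnforcementSketch {

        enum Decision { ALLOW, DENY, NO_MATCH }

        interface RangerPolicies {
            Decision evaluate(String user, String path, String accessType);
        }

        interface HdfsPermissions {
            boolean posixCheck(String user, String path, String accessType);
        }

        private final RangerPolicies policies;
        private final HdfsPermissions hdfs;
        private final boolean fallbackToHdfsPerms; // optional fallback mode

        HdfsEnforcementSketch(RangerPolicies p, HdfsPermissions h, boolean fallback) {
            this.policies = p;
            this.hdfs = h;
            this.fallbackToHdfsPerms = fallback;
        }

        boolean checkAccess(String user, String path, String accessType) {
            Decision d = policies.evaluate(user, path, accessType);
            if (d != Decision.NO_MATCH) {
                return d == Decision.ALLOW;
            }
            // No central policy matched: optionally defer to file-level permissions.
            return fallbackToHdfsPerms && hdfs.posixCheck(user, path, accessType);
        }
    }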

Ranger deepens its integration with Hadoop by working with SQL standard-based authorization in Hive 0.13, allowing the use of SQL grant/revoke statements at the table level. In addition, Ranger provides a mode that can validate access within HDFS by using Hadoop file-level permissions. Along with the Web-based administration console, Ranger supports a REST API for policy administration.
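
To give a feel for the REST side, the following sketch posts a policy definition to the Ranger admin server. The endpoint path, port, JSON field names, and credentials are assumptions made for illustration; check the Ranger documentation for the actual API contract rather than treating this as authoritative.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Base64;

    // Hedged sketch: create a Ranger policy through the admin REST API.
    // Endpoint, JSON fields, and credentials below are illustrative
    // assumptions, not the documented Ranger API.
    public class RangerPolicySketch {
        public static void main(String[] args) throws Exception {
            // Assumed admin endpoint and port for this illustration.
            URL url = new URL("http://ranger-admin:6080/service/public/api/policy");
            HttpURLConnection conn = (HttpURLConnection) url.openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");

            // Basic auth against the admin console (placeholder credentials).
            String auth = Base64.getEncoder()
                    .encodeToString("admin:admin".getBytes(StandardCharsets.UTF_8));
            conn.setRequestProperty("Authorization", "Basic " + auth);

            // Illustrative policy: give the "analysts" group SELECT on one table.
            String policy = "{"
                    + "\"policyName\": \"sales-read-only\","
                    + "\"repositoryType\": \"hive\","
                    + "\"databases\": \"sales\","
                    + "\"tables\": \"orders\","
                    + "\"permMapList\": [{\"groupList\": [\"analysts\"],"
                    + " \"permList\": [\"select\"]}]"
                    + "}";
            try (OutputStream out = conn.getOutputStream()) {
                out.write(policy.getBytes(StandardCharsets.UTF_8));
            }
            System.out.println("HTTP " + conn.getResponseCode());
        }
    }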

Ranger interoperates with existing identity management (IdM) and single sign-on (SSO) solutions, including CA SiteMinder, IBM Tivoli Access Manager, Oracle Access Management Suite, and solutions based on SAML. With Knox integration built in, Ranger can be used to protect any REST endpoint that provides perimeter access to data stored in the Hadoop cluster. Ranger even works to secure ODBC/JDBC connections through HiveServer2, so long as the Thrift gateway is configured to use HTTP for those calls.
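
As a concrete illustration of that last point, here is a minimal sketch of a JDBC session against HiveServer2 over the HTTP transport, issuing one of the table-level grant statements discussed earlier. It assumes the Hive JDBC driver is on the classpath and that HiveServer2 is configured with transportMode=http; the host, port, httpPath, user, database, table, and role names are all placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Hedged sketch: connect to HiveServer2 over the HTTP transport,
    // where Ranger's Hive agent authorizes each statement. Host, port,
    // and httpPath are placeholders for a real deployment's values.
    public class HiveHttpJdbcSketch {
        public static void main(String[] args) throws Exception {
            String url = "jdbc:hive2://hive-host:10001/default"
                    + ";transportMode=http;httpPath=cliservice";
            try (Connection conn = DriverManager.getConnection(url, "analyst1", "");
                 Statement stmt = conn.createStatement()) {
                // SQL-standard authorization: table-level grant/revoke,
                // validated against the centrally managed policies.
                stmt.execute("GRANT SELECT ON TABLE sales.orders TO ROLE analysts");
                try (ResultSet rs = stmt.executeQuery(
                        "SELECT COUNT(*) FROM sales.orders")) {
                    while (rs.next()) {
                        System.out.println("rows: " + rs.getLong(1));
                    }
                }
            }
        }
    }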

Long term, Ranger's stated goals cover the following aspects of Hadoop security:
  • Centralized security administration to manage all security related tasks
  • Fine-grained authorization for specific actions and operations within each Hadoop component or tool, managed through a central administration interface
  • Standardized authorization method across all Hadoop components
  • Enhanced support for different authorization methods, including role-based access control, attribute-based access control, and so on
  • Enablement of tag-based global policies
  • Centralized auditing of all user access and administrative actions related to security across all Hadoop components

Despite the substantial functionality in Ranger today, open questions persist about how it will fit into the larger Hadoop security ecosystem. For example, some Ranger goals overlap with those of Apache Sentry, and there seems to be little consensus to date about how the projects may synchronize their efforts. Also, because most Hadoop subcomponents are developed as separate projects (usually within the ASF), with different groups of committers and different PMCs, it's unclear whether all Hadoop projects will actively choose to use Ranger.

To increase its odds, the Ranger team must cultivate buy-in and adoption from the teams building most of the other Hadoop components. Fortunately, with Hortonworks, Ranger has the backing of a Hadoop vendor well positioned to support and shepherd the project into the future. Hortonworks recently filed for an IPO and has made Ranger a core element of its HDP 2.2 release.

Already, if you're evaluating Hadoop for any deployment involving sensitive data, Ranger is likely to play a significant role in protecting that data. Assuming that the proposed integrations with Falcon, Accumulo, and other Hadoop tools unfold as planned, Ranger may well evolve into the de facto Hadoop standard for centralized, comprehensive security management.
