Securing Hadoop Cluster Part 1 (SSL/TLS for HDFS and YARN)
Hadoop in Secure Mode:
Security features of Hadoop consist of authentication, service-level authorization, authentication for web consoles, and data confidentiality.
- For client interaction, authentication and service-level authorization can be achieved by using Kerberos. The data transferred between Hadoop services and clients can be encrypted by setting hadoop.rpc.protection to "privacy" in core-site.xml.
- Data transfer between web consoles and clients can be secured by implementing SSL/TLS (HTTPS).
- Finally, the data communication between DataNodes can be secured using encryption. You need to set dfs.encrypt.data.transfer to "true" in hdfs-site.xml in order to activate encryption for the DataNode data transfer protocol. If dfs.encrypt.data.transfer is set to true, it supersedes the setting for dfs.data.transfer.protection and enforces that all connections use a specialized encrypted SASL handshake.
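Once the configuration changes described later in this post are in place, you can sanity-check that these values are actually being picked up; a minimal sketch using the standard hdfs getconf utility:
# hdfs getconf -confKey hadoop.rpc.protection
# hdfs getconf -confKey dfs.encrypt.data.transfer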
First we will go through the steps for securing the web consoles and all HTTP communication by implementing SSL.
SSL implementation
Procedure (in this example we will be securing a Hadoop cluster with three hosts):
nn1.hadoop.com (Name node and Resource Manager)
dn1.hadoop.com (Secondary Name Node, Data Node and Node manager)
dn2.hadoop.com (Data Node and Node manager)
To enable SSL encryption for the web interfaces, you must install an SSL certificate, either a self-signed certificate or a trusted certificate that is signed and issued by a Certificate Authority (CA). A self-signed certificate is signed by the same entity whose identity it certifies, and is signed with its own private key. For more information, see How do I create an SSL Certificate? or How do I create and use my own Certificate Authority (CA)?
Set up SSL for Hadoop HDFS operations by using a self-signed certificate:
1. Create SSL certificates, a keystore file, and a truststore file.
We will use /opt/hadoop/security/CAcert as the working directory for generating the certificates, and the final keystore and truststore files will be placed under /opt/security/CAcerts and /opt/security/jks, so create these directories on all hosts.
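A minimal sketch for creating these directories (run on every host; the paths match the keystore/truststore locations used later in this post):
# mkdir -p /opt/hadoop/security/CAcert /opt/security/CAcerts /opt/security/jks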
Run the following command on nn1.
# keytool -genkey -alias nn1.hadoop.com -keyalg RSA -keysize 1024 -dname "CN=nn1.hadoop.com,OU=demo,O=MyOrg,L=Pune,ST=MH,C=IN" -keypass host@123 -keystore nn1-keystore.jks -storepass host@123
# keytool -genkey -alias dn1.hadoop.com -keyalg RSA -keysize 1024 -dname "CN=dn1.hadoop.com,OU=demo,O=MyOrg,L=Pune,ST=MH,C=IN" -keypass host@123 -keystore dn1-keystore.jks -storepass host@123
# keytool -genkey -alias dn2.hadoop.com -keyalg RSA -keysize 1024 -dname "CN=dn2.hadoop.com,OU=demo,O=MyOrg,L=Pune,ST=MH,C=IN" -keypass host@123 -keystore dn2-keystore.jks -storepass host@123
Note: Ensure that the values for -keypass and -storepass are the same; otherwise the services will throw the error "unable to recover key" during startup.
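If a keystore was already generated with a different key password, it can be aligned with the store password using keytool's -keypasswd subcommand instead of regenerating the key (a sketch for nn1; "oldPass" is a placeholder for whatever -keypass value was used originally):
# keytool -keypasswd -alias nn1.hadoop.com -keystore nn1-keystore.jks -storepass host@123 -keypass oldPass -new host@123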
- For each host, export the certificate public key to a certificate file (do this on nn1).
# keytool -export -alias nn1.hadoop.com -keystore nn1-keystore.jks -rfc -file nn1.crt -storepass host@123
# keytool -export -alias dn1.hadoop.com -keystore dn1-keystore.jks -rfc -file dn1.crt -storepass host@123
# keytool -export -alias dn2.hadoop.com -keystore dn2-keystore.jks -rfc -file dn2.crt -storepass host@123
- For each host, import the certificate into a host-specific truststore file (do this on nn1).
# keytool -import -noprompt -alias nn1.hadoop.com -file nn1.crt -keystore nn1-truststore.jks -storepass host@123
# keytool -import -noprompt -alias dn1.hadoop.com -file dn1.crt -keystore dn1-truststore.jks -storepass host@123
# keytool -import -noprompt -alias dn2.hadoop.com -file dn2.crt -keystore dn2-truststore.jks -storepass host@123
- Create a single truststore file that contains the public keys from all certificates.
# keytool -import -noprompt -alias nn1.hadoop.com -file nn1.crt -keystore truststore.jks -storepass host@123
# keytool -import -noprompt -alias dn1.hadoop.com -file dn1.crt -keystore truststore.jks -storepass host@123
# keytool -import -noprompt -alias dn2.hadoop.com -file dn2.crt -keystore truststore.jks -storepass host@123
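To confirm that the combined truststore really contains all three public certificates, list its entries (the output should show three trustedCertEntry lines, one per host alias):
# keytool -list -keystore truststore.jks -storepass host@123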
- Copy the certificates to the required path and change the ownership and permissions.
cp truststore.jks /opt/security/jks/truststore.jks
cp nn1-keystore.jks /opt/security/CAcerts/host-keystore.jks
cp nn1-truststore.jks /opt/security/CAcerts/host_truststore.jks
You can verify the certificate contents using the following commands:
# keytool -list -v -keystore /opt/security/CAcerts/host-keystore.jks
# keytool -list -v -keystore /opt/security/CAcerts/host_truststore.jks
- Now copy the respective certificates to each host.
Note: While copying, we keep the certificate file names the same on all nodes (i.e. host-keystore.jks, host_truststore.jks) to avoid having to change the path and name in each host's configuration files. Also set the owner and group to hadoop:hadoop and the permissions to 755 or 750, as shown in the sketch after the copy commands below.
# scp dn1-keystore.jks dn1:/opt/security/CAcerts/host-keystore.jks
# scp dn1-truststore.jks dn1:/opt/security/CAcerts/host_truststore.jks
# scp dn2-keystore.jks dn2:/opt/security/CAcerts/host-keystore.jks
# scp dn2-truststore.jks dn2:/opt/security/CAcerts/host_truststore.jks
# scp truststore.jks dn1:/opt/security/jks/truststore.jks
# scp truststore.jks dn2:/opt/security/jks/truststore.jks
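As mentioned in the note above, set the ownership and permissions on each host after copying (a minimal sketch, assuming the hadoop user and group already exist; 750 is used here):
# chown hadoop:hadoop /opt/security/CAcerts/host-keystore.jks /opt/security/CAcerts/host_truststore.jks /opt/security/jks/truststore.jks
# chmod 750 /opt/security/CAcerts/host-keystore.jks /opt/security/CAcerts/host_truststore.jks /opt/security/jks/truststore.jks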
2. Update the Hadoop configuration files
- Stop the services if they are already running (stop-dfs.sh, stop-yarn.sh, or stop-all.sh)
- core-site.xml
for all nodes:
<property>
<name>hadoop.rpc.protection</name>
<value>privacy</value>
</property>
- For hdfs-site.xml:
for all nodes:
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>192.168.1.30:50091</value>
</property>
<property>
<name>dfs.namenode.https-address</name>
<value>nn1.hadoop.com:50470</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.https.enable</name>
<value>true</value>
</property>
<property>
<name>dfs.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
<property>
<name>dfs.encrypt.data.transfer</name>
<value>true</value>
</property>
<!-- for data node, change the datanode hostname on each datanode -->
<property>
<name>dfs.datanode.https.address</name>
<value>dn2.hadoop.com:50475</value>
</property>
- For yarn-site.xml
<property>
<name>yarn.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>nn1.hadoop.com:8089</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>https://nn1.hadoop.com:19889/jobhistory/logs</value>
</property>
<!-- change the below on each node manager host -->
<property>
<name>yarn.nodemanager.webapp.https.address</name>
<value>0.0.0.0:8090</value>
</property>
- For mapred-site.xml
<property>
<name>mapreduce.jobhistory.http.policy</name>
<value>HTTPS_ONLY</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.https.address</name>
<value>nn1.hadoop.com:19889</value>
</property>
- Now we need to add/update two more configuration files (i.e. ssl-server.xml and ssl-client.xml) at the same path where core-site.xml and hdfs-site.xml are present.
- Set the following in ssl-server.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>ssl.server.truststore.location</name>
<value>/opt/security/CAcerts/host_truststore.jks</value>
<description>Truststore to be used by NN and DN. Must be specified.
</description>
</property>
<property>
<name>ssl.server.truststore.password</name>
<value>host@123</value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.server.truststore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
<property>
<name>ssl.server.truststore.reload.interval</name>
<value>10000</value>
<description>Truststore reload check interval, in milliseconds.
Default value is 10000 (10 seconds).
</description>
</property>
<property>
<name>ssl.server.keystore.location</name>
<value>/opt/security/CAcerts/host-keystore.jks</value>
<description>Keystore to be used by NN and DN. Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.password</name>
<value>host@123</value>
<description>Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.keypassword</name>
<value>host@123</value>
<description>Must be specified.
</description>
</property>
<property>
<name>ssl.server.keystore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
</configuration>
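Because ssl-server.xml stores the keystore and truststore passwords in plain text, it is a good idea to restrict read access to it; a sketch assuming the Hadoop configuration directory is /opt/hadoop/etc/hadoop (adjust to your installation):
# chown hadoop:hadoop /opt/hadoop/etc/hadoop/ssl-server.xml
# chmod 640 /opt/hadoop/etc/hadoop/ssl-server.xml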
=================
- Set the following in ssl-client.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
<name>ssl.client.truststore.location</name>
<value>/opt/security/jks/truststore.jks</value>
<description>Truststore to be used by clients like distcp. Must be
specified.
</description>
</property>
<property>
<name>ssl.client.truststore.password</name>
<value>host@123</value>
<description>Optional. Default value is "".
</description>
</property>
<property>
<name>ssl.client.truststore.type</name>
<value>jks</value>
<description>Optional. The keystore file format, default value is "jks".
</description>
</property>
</configuration>
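Both files, along with the updated core-site.xml, hdfs-site.xml, yarn-site.xml and mapred-site.xml, need to be present on every host. A minimal copy sketch, again assuming /opt/hadoop/etc/hadoop as the configuration directory (remember that per-host values such as dfs.datanode.https.address in hdfs-site.xml still have to be adjusted on each node afterwards):
# scp /opt/hadoop/etc/hadoop/ssl-server.xml /opt/hadoop/etc/hadoop/ssl-client.xml dn1:/opt/hadoop/etc/hadoop/
# scp /opt/hadoop/etc/hadoop/ssl-server.xml /opt/hadoop/etc/hadoop/ssl-client.xml dn2:/opt/hadoop/etc/hadoop/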
- Once the configuration files have been updated on all hosts, start the services and verify that the NameNode UI now responds over https://nn1.hadoop.com:50470/ and, similarly, the Resource Manager UI over https://nn1.hadoop.com:8089 . If you select the Nodes section in the Resource Manager and pick any node, you will be redirected to its HTTPS web console, e.g. in our case https://dn1.hadoop.com:8090/node
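The HTTPS endpoints can also be checked from the command line; a quick sketch using curl and openssl (-k skips CA validation, which is expected here because the certificates are self-signed):
# curl -kI https://nn1.hadoop.com:50470/
# curl -kI https://nn1.hadoop.com:8089/cluster
# openssl s_client -connect nn1.hadoop.com:50470 </dev/null 2>/dev/null | openssl x509 -noout -subject -dates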
With this we have completed the SSL setup for HDFS and YARN; it can be implemented in a similar way for other services as well (Impala, HBase, Hue, etc.).
****************************************************************************************
THANK YOU