VM setup for Hadoop Cluster (CentOS)

Before you get hands-on with the Hadoop ecosystem, whether for Hadoop administration or development, you will need your own small cluster setup, which will help you understand how Hadoop works internally. A distributed Hadoop cluster requires multiple hosts/servers, and for obvious reasons not all of us can have multiple servers at home. The good news is that we can now use virtual machines, even on laptops with 8-16 GB of RAM and a reasonable amount of free hard disk space.

In this document I will walk you through the VM instance setup and all the prerequisites we need to complete before starting the Hadoop cluster setup.

 

Table of Contents

Network Setup

Some Pre-requisites for Hadoop Setup

User creation and passwordless SSH setup

SSH Setup

 

First, create 2-4 VM instances (depending on the number of nodes you want in your cluster).

I have created 4 VM instances from a CentOS image and have given 3 GB of RAM to my master node (where the NameNode and ResourceManager will run); the remaining 3 instances have 2 GB of RAM each (where the DataNode and NodeManager services will run).

Once your basic VM instances are ready, please go through the following steps.

 

Network Setup:

Step 1: Disable the firewall, stop NetworkManager (the network bridge does not work well with it), and use the network service instead.

Commands to run (as the root user or with sudo privileges*):

# service iptables save
# service iptables stop
# service NetworkManager stop
# chkconfig NetworkManager off
# service ip6tables stop
# chkconfig iptables off
# chkconfig ip6tables off
# service iptables status
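If you also want to confirm that these services will stay off after a reboot, you can check their runlevel settings (an optional sanity check):

# chkconfig --list iptables
# chkconfig --list ip6tables
# chkconfig --list NetworkManager   (all runlevels should show "off")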


*Note: In order to grant sudo privileges to a user (e.g. the hadoop user), run "# visudo" as the root user and add the entry below.

hadoop  ALL=(ALL)       NOPASSWD: ALL
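Once that user exists (the hadoop user is created later in this document), you can quickly confirm the grant works:

# su - hadoop
$ sudo -l        (should list "(ALL) NOPASSWD: ALL" for this user)
$ sudo whoami    (should print "root" without asking for a password)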

Step 2: Update the IP address and make it static.

To do that, edit the file "/etc/sysconfig/network-scripts/ifcfg-eth0" and make the following changes.

# vi /etc/sysconfig/network-scripts/ifcfg-eth0

DEVICE="eth0"
BOOTPROTO=static
NM_CONTROLLED="yes"
ONBOOT="yes"
IPADDR=192.168.1.20
GATEWAY=192.168.1.1
DNS1=192.168.1.20
DNS2=4.2.2.2
TYPE=Ethernet
Note: change the IP address, gateway, and DNS1 according to your host network settings and the type of internet connection you are using.
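For reference, this is roughly what the same file could look like on one of the data nodes (dn1), assuming the example addressing used in the /etc/hosts section later in this document; adjust every value to your own network:

# /etc/sysconfig/network-scripts/ifcfg-eth0 on dn1 (example values only)
DEVICE="eth0"
BOOTPROTO=static
NM_CONTROLLED="yes"
ONBOOT="yes"
IPADDR=192.168.1.30
GATEWAY=192.168.1.1
DNS1=192.168.1.20
DNS2=4.2.2.2
TYPE=Ethernet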

 

Step 3: Update the host name of your VM.

To do that, edit the "/etc/sysconfig/network" file and replace "localhost.localdomain" with the desired host name, as shown below.

# vi /etc/sysconfig/network
NETWORKING=yes
HOSTNAME=nn1.hadoop.com
GATEWAY=192.168.1.1

And run following command as well

# hostname  nn1.hadoop.com

Step 4: Now run the following commands to enable the network service and start it.

# chkconfig network on
# service network start

It should return the following output:

# sudo service network start
Bringing up loopback interface:                            [  OK  ]
Bringing up interface eth0:                                [  OK  ]

 

Note: an error might occur when starting the network service if the VM image is a copy of an existing VM. In that case, remove the HWADDR and MACADDR entries from the "/etc/sysconfig/network-scripts/ifcfg-eth0" file, delete the file "/etc/udev/rules.d/70-persistent-net.rules", and then reboot the VM and restart the network service.
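Those manual edits can also be done from the command line; a rough sketch (it assumes the HWADDR and MACADDR entries start at the beginning of the line, as they do in the default file):

# sed -i '/^HWADDR/d;/^MACADDR/d' /etc/sysconfig/network-scripts/ifcfg-eth0   (remove the hardware-address lines)
# rm -f /etc/udev/rules.d/70-persistent-net.rules   (udev regenerates this file at the next boot)
# reboot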

Once the network service has started successfully:

Run the following command to verify that the IP address has been set correctly.

# ip addr

It should return output containing an inet entry with the IP address we set in the ifcfg-eth0 file in Step 2.

Also verify that the host name is set correctly by running the following command.

# hostname   (it should return the host name set in Step 3)

If the IP address or host name is not set correctly, go back and review Step 1 through Step 4.

Once all the above steps have been completed on all the VM instances, update the "/etc/hosts" file and add an entry for each host.

# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.20  nn1.hadoop.com nn1
192.168.1.30 dn1.hadoop.com dn1
192.168.1.31 dn2.hadoop.com dn2

Now copy this file to all hosts using scp, or edit the file on each host and add the entries there.

scp command to copy:
scp /etc/hosts dn1.hadoop.com:/etc/hosts

It will prompt for a password; please enter the password for the destination host.
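If you have several nodes, a small shell loop saves repeated typing. A sketch, assuming the host names from the /etc/hosts example above; until passwordless SSH is configured later, it will ask for each host's password in turn:

for host in dn1.hadoop.com dn2.hadoop.com; do
    # push the shared hosts file to each node
    scp /etc/hosts "$host":/etc/hosts
done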

Some Pre-requisites for Hadoop Setup:

Step 5: Disable SELinux.

Run the following command to disable SELinux:
# setenforce 0   (this disables SELinux only for the current session)

To disable it permanently, edit the "/etc/selinux/config" file and replace the word "enforcing" with "disabled":

# vi /etc/selinux/config
SELINUX=disabled

To check whether SELinux is disabled, run the following command.
# getenforce
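If you prefer making the permanent change from the command line instead of opening an editor, a sed one-liner such as this should work (assuming the file still contains the default SELINUX=enforcing line):

# sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config
# grep ^SELINUX= /etc/selinux/config   (should now print SELINUX=disabled)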

 

Step 6: Reduce vm.swappiness from the default of 60 to 0 or 10.

To do that, run the following command:

# sysctl -w vm.swappiness=10
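Note that sysctl -w only changes the value for the running system; to keep the setting across reboots, also add it to /etc/sysctl.conf, for example:

# echo "vm.swappiness=10" >> /etc/sysctl.conf   (persist the setting)
# sysctl -p                                     (reload /etc/sysctl.conf)
# cat /proc/sys/vm/swappiness                   (should print 10)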

 

User creation and passwordless SSH setup

 

Step 7: Create the user and group

(e.g. we will use a hadoop user and a hadoop group for our initial setup)

Run all of the following commands as the root user or with sudo.

To create the group "hadoop":
# groupadd hadoop

To create the user hadoop and map it to the group hadoop:
# useradd -g hadoop hadoop

Or

# adduser hadoop (this will only create the user)
#  gpasswd -a hadoop hadoop  (add user to hadoop group)

Change password for user:
# passwd hadoop             (where hadoop is user id)
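A quick way to confirm the user and group mapping is correct:

# id hadoop   (the gid and groups fields should both show the hadoop group)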

 

SSH Setup

In order to set up passwordless SSH, please perform the following steps.

In the section below we will set up passwordless SSH for the hadoop user from host nn1 to dn1.

First, log in to host nn1 as the hadoop user and run the following command. Just hit Enter at all three prompts: the file path (keep the default), the passphrase (leave empty), and the passphrase confirmation (leave empty).

# ssh-keygen

Generating public/private rsa key pair.
Enter file in which to save the key (/home/hadoop/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
6e:d4:c4:32:14:5d:c1:35:9d:f5:12:b4:17:3d:99:89
hadoop@nn1.hadoop.com
The key's randomart image is:
+--[ RSA 2048]----+
|        oo oo+*oO|
|       . .. .E X=|
|        o o   o +|
|         =     o |
|        S .      |
|       o         |
|        o        |
|       .         |
|                 |
+-----------------+
[hadoop@nn1 hadoop]$

Now we need to copy the contents of the id_rsa.pub file to the authorized_keys file on each host to which we want to connect via SSH without being prompted for a password.

Command:

# ssh-copy-id -i .ssh/id_rsa.pub dn1.hadoop.com

It will prompt you to enter a password; please provide the hadoop user's password on dn1.
Once the above command has executed successfully, verify that the SSH setup is working:

# ssh dn1.hadoop.com

You should be logged in to dn1.hadoop.com without being prompted for a password. To return to nn1.hadoop.com, just type "exit".
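The key only has to be generated once on nn1; to get passwordless SSH to every node in the cluster (including nn1 itself), repeat the ssh-copy-id step for each host, for example with a loop like this (a sketch, assuming the host names from the /etc/hosts example):

for host in nn1.hadoop.com dn1.hadoop.com dn2.hadoop.com; do
    # copy the public key into each host's authorized_keys; asks for that host's password once
    ssh-copy-id -i ~/.ssh/id_rsa.pub "$host"
done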

 

Now we are done with the VM setup. The next document will help you set up a small core Apache Hadoop 2 distributed cluster.

 

Thank you

