Category: Big data
-
AWS CloudFormation Template for Creating an EMR Cluster with Autoscaling, CloudWatch Metrics and Lambda
There are various ways to spin up an EMR cluster. The manual approaches are the AWS console and the AWS CLI, where a cluster can be created with a command as simple as:
aws emr create-cluster --name "Test Spark cluster" --release-label emr-5.7.0 --applications Name=Spark --ec2-attributes KeyName=test1_Ec2_keypair --instance-type m4.xlarge --instance-count 3 --use-default-roles
or, with advanced options:
aws emr create-cluster --release-label emr-5.7.0 --name…
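CLI commands copied from web pages often have their `--` turned into typographic dashes and their straight quotes into curly ones, which the shell rejects. One way to avoid that, sketched here as an assumption rather than the approach used in this post, is to build the argument list in Python and pass it to `subprocess.run`, so no shell quoting is involved. The flags mirror the basic command above; by default the sketch only prints the command and calls the AWS CLI only when `dry_run=False`.

```python
import shlex
import subprocess


def emr_create_cluster_args(name: str, release: str, key_name: str,
                            instance_type: str = "m4.xlarge",
                            instance_count: int = 3) -> list[str]:
    """Build argv for a basic `aws emr create-cluster` call with Spark."""
    return [
        "aws", "emr", "create-cluster",
        "--name", name,
        "--release-label", release,
        "--applications", "Name=Spark",
        "--ec2-attributes", f"KeyName={key_name}",
        "--instance-type", instance_type,
        "--instance-count", str(instance_count),
        "--use-default-roles",
    ]


def create_cluster(args: list[str], dry_run: bool = True) -> str:
    """Return the shell-quoted command; run it only when dry_run is False."""
    printable = shlex.join(args)  # plain ASCII dashes and safe quoting
    if not dry_run:
        # Requires the AWS CLI on PATH and configured credentials.
        subprocess.run(args, check=True)
    return printable


if __name__ == "__main__":
    argv = emr_create_cluster_args("Test Spark cluster", "emr-5.7.0",
                                   "test1_Ec2_keypair")
    print(create_cluster(argv))
```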
-
Oozie ssh action on EMR cluster
Prerequisites for the Oozie ssh action on an EMR cluster: note that with the ssh action, Oozie tries to ssh into the remote host as the oozie user. Hence we first need to ensure that we can ssh into the remote host from the Oozie server as the oozie user. We also…
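A quick way to confirm that prerequisite is to attempt a non-interactive login as the oozie user from the Oozie server. The sketch below is an illustration, not the procedure from this post: it assumes `sudo` access on the Oozie server and uses OpenSSH's `BatchMode=yes`, which makes ssh fail instead of prompting for a password. The remote user and host passed in are placeholders (on EMR the default login user is `hadoop`).

```python
import subprocess


def oozie_ssh_check_cmd(remote_user: str, remote_host: str) -> list[str]:
    """Build a command that tries a password-less ssh login as the oozie user."""
    return [
        "sudo", "-u", "oozie",          # run ssh as the oozie user
        "ssh",
        "-o", "BatchMode=yes",          # fail rather than prompt for a password
        "-o", "ConnectTimeout=10",      # do not hang on an unreachable host
        f"{remote_user}@{remote_host}",
        "true",                         # trivial remote command; exit 0 on success
    ]


def can_ssh_as_oozie(remote_user: str, remote_host: str) -> bool:
    """Return True if the oozie user can ssh to the remote host without a prompt."""
    result = subprocess.run(oozie_ssh_check_cmd(remote_user, remote_host),
                            capture_output=True, text=True)
    return result.returncode == 0
```

For example, `can_ssh_as_oozie("hadoop", "<remote-host>")` returning `False` suggests the oozie user's key is not yet authorized on the remote host.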
-
Unable to import graphframes with pyspark
You might hit the error message below while trying to import the graphframes module into your pyspark session on an EMR cluster:
>>> print(spark.version)
2.1.0
>>> from graphframes import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named graphframes
>>>
It will need…
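An `ImportError` like this means the Python interpreter behind the session cannot find a `graphframes` module on its path. The standard-library check below, a small sketch rather than a fix, reports whether a top-level module is importable and which interpreter was checked; running it inside the pyspark session shows what the driver actually sees.

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def report(name: str) -> str:
    """Summarize whether a module is importable and which interpreter was checked."""
    status = "available" if module_available(name) else "NOT found"
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    return f"{name}: {status} (python {version}, {sys.executable})"


if __name__ == "__main__":
    print(report("graphframes"))
```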
-
ZooKeeper Setup
Required software: ZooKeeper runs on Java 1.6 or later (JDK 6 or later), so download and install a JDK first. ZooKeeper runs as an ensemble of servers, which should be an odd number because ZooKeeper requires a majority to be available. For example, with four machines ZooKeeper can only handle the failure of a single machine;…
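The odd-number advice follows from the majority rule: an ensemble of n servers needs n // 2 + 1 of them running, so it survives the failure of the rest. The helper below is only an illustration of that arithmetic, not part of ZooKeeper; it shows that four servers tolerate no more failures than three.

```python
def majority(n: int) -> int:
    """Smallest number of servers that forms a quorum in an ensemble of n."""
    if n < 1:
        raise ValueError("an ensemble needs at least one server")
    return n // 2 + 1


def tolerated_failures(n: int) -> int:
    """How many servers can fail while a majority is still available."""
    return n - majority(n)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} servers: quorum={majority(n)}, "
              f"tolerates {tolerated_failures(n)} failure(s)")
```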
-
Securing Hadoop Cluster Part 2: Kerberos Setup
Contents: Kerberos; Kerberos installation and setup; Kerberos KDC server setup; Kerberos client setup; creating service principals and keytabs for Hadoop services; updating the configuration files for each Hadoop service.
Kerberos: a secure network authentication system developed at MIT in the 1980s.…
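The principal-and-keytab step usually comes down to a couple of `kadmin` statements per service and host. The generator below is a sketch under assumptions: it targets MIT Kerberos `kadmin.local`, uses the common Hadoop `service/host@REALM` principal form, and the realm, host name, keytab directory, and service list in the example are hypothetical values to adjust for your cluster.

```python
def kadmin_commands(realm: str, host: str, services: list[str],
                    keytab_dir: str = "/etc/security/keytabs") -> list[str]:
    """Return kadmin statements creating one principal and keytab per service."""
    cmds = []
    for svc in services:
        principal = f"{svc}/{host}@{realm}"
        keytab = f"{keytab_dir}/{svc}.service.keytab"
        # -randkey: service principals get a random key instead of a password.
        cmds.append(f"addprinc -randkey {principal}")
        # xst (alias of ktadd) exports the principal's keys into the keytab file.
        cmds.append(f"xst -k {keytab} {principal}")
    return cmds


if __name__ == "__main__":
    # Hypothetical realm and host; nn/dn/rm/nm/HTTP are common Hadoop service names.
    for stmt in kadmin_commands("EXAMPLE.COM", "node1.example.com",
                                ["nn", "dn", "rm", "nm", "HTTP"]):
        print(f'kadmin.local -q "{stmt}"')
```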
-
Securing Hadoop Cluster Part 1 (SSL/TLS for HDFS and YARN)
Hadoop in Secure Mode: Hadoop's security features consist of authentication, service-level authorization, authentication for web consoles, and data confidentiality. For client interaction, authentication and service-level authorization can be achieved with Kerberos. The data transferred between Hadoop…
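On the web-console side, moving HDFS and YARN to HTTPS largely comes down to a few properties. As an illustrative sketch, the helper below renders Hadoop-style `*-site.xml` snippets with the standard library. `dfs.http.policy` and `yarn.http.policy` set to `HTTPS_ONLY` are the Hadoop properties for this, but keystore settings (in `ssl-server.xml`) and anything else your cluster needs are left out, so check the full property set against your Hadoop version.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(props: dict[str, str]) -> str:
    """Render a Hadoop *-site.xml <configuration> block from name/value pairs."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


# Serve the HDFS and YARN web UIs over HTTPS only.
HDFS_SITE = {"dfs.http.policy": "HTTPS_ONLY"}
YARN_SITE = {"yarn.http.policy": "HTTPS_ONLY"}

if __name__ == "__main__":
    print("hdfs-site.xml:\n" + hadoop_site_xml(HDFS_SITE))
    print("yarn-site.xml:\n" + hadoop_site_xml(YARN_SITE))
```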