Hadoop2 HA (High Availability) Setup and Environment Variable Configuration


VMWare Installation and Basic Server Setup

  1. Create three CentOS 7 virtual servers in VMWare
    • Server01 IP: 192.168.111.128
    • Server02 IP: 192.168.111.129
    • Server03 IP: 192.168.111.130
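The post assumes each VM already has the fixed IP shown above. If the addresses still need to be assigned, a minimal sketch for CentOS 7 follows; the interface name ens33 and the gateway 192.168.111.2 are assumptions (check with ip addr and your VMWare NAT settings).

# On Server01 (repeat on the other servers with their own IPADDR)
sudo vi /etc/sysconfig/network-scripts/ifcfg-ens33

BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.111.128
PREFIX=24
GATEWAY=192.168.111.2      # assumption: typical VMWare NAT gateway
DNS1=8.8.8.8

# Apply the change
sudo systemctl restart network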

  2. Configure the hostname and hosts file on each server
    • sudo vi /etc/hosts
      192.168.111.128   Server01
      192.168.111.129   Server02
      192.168.111.130   Server03
      
    • sudo vi /etc/hostname : assign the name matching each server (Server01, Server02, Server03)
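On CentOS 7 the same result can be achieved with hostnamectl, and it is worth checking that the servers can resolve each other before moving on; a short sketch:

# On Server01 (use Server02 / Server03 on the other machines)
sudo hostnamectl set-hostname Server01

# Confirm that the /etc/hosts entries work from every server
ping -c 1 Server02
ping -c 1 Server03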

  3. If CentOS was installed from the minimal image, install the basic tools
    • yum install net-tools, then verify the IP configuration with ifconfig
    • yum install java, then check the Java version with java -version
    • yum install java-devel, then check the Java compiler version with javac -version
    • yum groupinstall 'Development Tools'
    • yum groupinstall 'Additional Development'

  4. Install Protocol Buffers as root
   su
   cd /usr/local
   wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
   tar xvfz protobuf-2.5.0.tar.gz
   cd protobuf-2.5.0
   ./configure
   make
   make install
   
   Verify that Protocol Buffers installed correctly with protoc --version

  5. Set up password-less SSH connections from the host server to the remote servers
    • On Server01, generate an SSH key with ssh-keygen -t rsa
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@Server02
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@Server03
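Before relying on the fencing and remote start commands later in this guide, confirm that the copied keys actually allow password-less logins; each command below should print the remote hostname without asking for a password:

ssh hadoop@Server02 hostname
ssh hadoop@Server03 hostname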

Hadoop2 Installation and Basic Environment Variable Setup

  1. Download the Hadoop2 release (see the sketch below)
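The original post does not show the download commands; a sketch that matches the hadoop-2.7.6 paths used later in this guide, pulling the release from the Apache archive:

cd /home/hadoop
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.6/hadoop-2.7.6.tar.gz
tar xvfz hadoop-2.7.6.tar.gz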

  2. Set the basic environment variables
cd ~
vi .bash_profile

#hadoop
export HADOOP_HOME=/home/hadoop/hadoop-2.7.6
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_COMMON_LIB_NATIVE_DIR"
export HADOOP_PID_DIR=${HADOOP_HOME}/pids
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib/*

#java
export JAVA_HOME=/usr/lib/jvm/java/
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/bin:$JAVA_HOME/lib/tools.jar

Then apply the settings with source .bash_profile
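A quick check that the new variables are in effect (assuming the hadoop-2.7.6 tarball was extracted to the home directory as above):

hadoop version       # should report Hadoop 2.7.6
echo $HADOOP_HOME    # should print /home/hadoop/hadoop-2.7.6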

  3. Configure Hadoop2
cd hadoop-2.7.6/etc/hadoop

vi hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java/
export HADOOP_PID_DIR=/home/hadoop/hadoop-2.7.6/pids

vi slaves
Server01
Server02
Server03

vi core-site.xml
<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoop-cluster</value>
        </property>
        <property>
                <name>ha.zookeeper.quorum</name>
                <value>Server01:2181,Server02:2181,Server03:2181</value>
        </property>
         <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/hadoop/tmp/hadoop-${user.name}</value>
        </property>
</configuration>

vi hdfs-site.xml
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/home/hadoop/data/dfs/namenode</value>
        </property>
        <property>
                <name>dfs.namenode.checkpoint.dir</name>
                <value>/home/hadoop/data/dfs/namesecondary</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/home/hadoop/data/dfs/datanode</value>
        </property>
        <property>
            <name>dfs.journalnode.edits.dir</name>
            <value>/home/hadoop/data/dfs/journalnode</value>
        </property>
        <property>
            <name>dfs.nameservices</name>
            <value>hadoop-cluster</value>
        </property>
        <property>
            <name>dfs.ha.namenodes.hadoop-cluster</name>
            <value>nn1,nn2</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.hadoop-cluster.nn1</name>
            <value>Server01:8020</value>
        </property>
        <property>
            <name>dfs.namenode.rpc-address.hadoop-cluster.nn2</name>
            <value>Server02:8020</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.hadoop-cluster.nn1</name>
            <value>Server01:50070</value>
        </property>
        <property>
            <name>dfs.namenode.http-address.hadoop-cluster.nn2</name>
            <value>Server02:50070</value>
        </property>
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://Server01:8485;Server02:8485;Server03:8485/hadoop-cluster</value>
        </property>
        <property>
            <name>dfs.client.failover.proxy.provider.hadoop-cluster</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>sshfence
                shell(/bin/true)
            </value>
        </property>
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/home/hadoop/.ssh/id_rsa</value>
        </property>
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
</configuration>

vi mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.job.ubertask.enable</name>
        <value>true</value>
    </property>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

vi yarn-site.xml
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/home/hadoop/data/yarn/nm-local-dir</value>
    </property>
    <property>
        <name>yarn.resourcemanager.fs.state-store.uri</name>
        <value>/home/hadoop/data/yarn/system/rmstore</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>Server01</value>
    </property>
    <property>
        <name>yarn.web-proxy.address</name>
        <value>0.0.0.0:8089</value>
    </property>
</configuration>
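The configuration above refers to several local directories under /home/hadoop, and the same installation and settings are needed on all three servers; a sketch that creates the directories and copies everything over, assuming the password-less SSH set up earlier and identical home-directory layouts:

# Create the local directories referenced in core-site.xml, hdfs-site.xml and yarn-site.xml
mkdir -p /home/hadoop/tmp
mkdir -p /home/hadoop/data/dfs/{namenode,namesecondary,datanode,journalnode}
mkdir -p /home/hadoop/data/yarn/nm-local-dir
mkdir -p /home/hadoop/hadoop-2.7.6/pids

# Copy the Hadoop installation (including etc/hadoop) to the other servers,
# then repeat the directory creation and .bash_profile setup there as well
scp -r /home/hadoop/hadoop-2.7.6 hadoop@Server02:/home/hadoop/
scp -r /home/hadoop/hadoop-2.7.6 hadoop@Server03:/home/hadoop/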

  4. Download and configure ZooKeeper
tar xvfz zookeeper-3.4.10.tar.gz
cd zookeeper-3.4.10
mkdir data
vi data/myid

cf. The myid must be 1 on Server01, 2 on Server02, and 3 on Server03 (a sketch follows after the zoo.cfg settings below).

cd conf/
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
dataDir=/home/hadoop/zookeeper-3.4.10/data

Add the following settings at the very bottom:
server.1=Server01:2888:3888
server.2=Server02:2888:3888
server.3=Server03:2888:3888


cf) You can check which servers were elected leader and follower with zkServer.sh status (ZooKeeper must be started first; see the sketch below)
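zkServer.sh status only shows a leader or follower once the ensemble is running, so ZooKeeper has to be started on every server before the HA daemons; a sketch, assuming the same zookeeper-3.4.10 path on all three machines:

# Write the per-server id (run the matching line on each server)
echo 1 > /home/hadoop/zookeeper-3.4.10/data/myid   # on Server01
echo 2 > /home/hadoop/zookeeper-3.4.10/data/myid   # on Server02
echo 3 > /home/hadoop/zookeeper-3.4.10/data/myid   # on Server03

# Start ZooKeeper on every server, then check the election state
/home/hadoop/zookeeper-3.4.10/bin/zkServer.sh start
/home/hadoop/zookeeper-3.4.10/bin/zkServer.sh status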


  5. Start the HA environment
    • On Server01, initialize the HA state in ZooKeeper: hdfs zkfc -formatZK
    • On all servers, start the JournalNode: hadoop-daemon.sh start journalnode
    • On Server01, format the NameNode: hdfs namenode -format
    • On Server01, start the NameNode: hadoop-daemon.sh start namenode
    • On Server01, start the FailoverController: hadoop-daemon.sh start zkfc
    • On all servers, start the DataNode: hadoop-daemon.sh start datanode
    • On Server02, which will run in standby mode, run hdfs namenode -bootstrapStandby
    • On Server02, start the NameNode: hadoop-daemon.sh start namenode
    • On Server02, start the FailoverController: hadoop-daemon.sh start zkfc
    • On Server01, run start-yarn.sh to start the YARN ResourceManager and NodeManagers
    • On Server01, start the history server: mr-jobhistory-daemon.sh start historyserver
    • Check the JVM processes on each of the three servers with jps

@Server01
NameNode
ResourceManager
DataNode
NodeManager
JournalNode
JobHistoryServer
DFSZKFailoverController

@Server02
NameNode
DataNode
NodeManager
JournalNode
DFSZKFailoverController

@Server03
DataNode
NodeManager
JournalNode
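Once all of the processes above are running, you can confirm which NameNode is active and which is standby; nn1 and nn2 are the ids defined in hdfs-site.xml:

hdfs haadmin -getServiceState nn1   # typically active
hdfs haadmin -getServiceState nn2   # typically standby (the roles may be swapped after a failover)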

  6. Run the MapReduce wordcount example
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/hadoop
hdfs dfs -mkdir /user/hadoop/input

vi test.txt
*********************************************
I am a girl
I am a boy
I am a student
*********************************************

hdfs dfs -put test.txt /user/hadoop/input/

yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount input output

Check the result with hdfs dfs -cat /user/hadoop/output/part-r-00000
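Note that MapReduce refuses to write into an existing output directory, so the previous result has to be removed before running the example again; a small sketch:

hdfs dfs -rm -r /user/hadoop/output
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount input output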
