Hadoop2 HA (High Availability) Setup and Environment Variable Configuration
VMware Installation and Basic Server Setup
- Create three CentOS 7 virtual servers in VMware
- Server01 IP: 192.168.111.128
- Server02 IP: 192.168.111.129
- Server03 IP: 192.168.111.130
- Configure the hostname and hosts file on each server
- sudo vi /etc/hosts
192.168.111.128 Server01
192.168.111.129 Server02
192.168.111.130 Server03
- sudo vi /etc/hostname : assign each server its matching name
- If you installed the minimal version, install the basic tools
- After yum install net-tools, verify the IP settings with ifconfig
- After yum install java, check the Java version with java -version
- After yum install java-devel, check the Java compiler version with javac -version
- yum groupinstall 'Development Tools'
- yum groupinstall 'Additional Development'
- Install Protocol Buffers as root
su
cd /usr/local
wget https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz
tar xvfz protobuf-2.5.0.tar.gz
cd protobuf-2.5.0
./configure
make
make install
Verify the installation with protoc --version
- Set up password-less SSH connections between the host server and the remote servers
- Generate an SSH key on Server01 with ssh-keygen -t rsa
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@Server02
ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@Server03
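The key-generation step above can be tried locally before touching the cluster. The sketch below creates a passphrase-less key pair in a temporary directory (which stands in for /home/hadoop/.ssh); the commented ssh-copy-id lines are what actually runs against the remote nodes.

```shell
# Sketch: what ssh-keygen produces before ssh-copy-id distributes it.
# A temp dir stands in for /home/hadoop/.ssh so this runs anywhere.
tmp=$(mktemp -d)
ssh-keygen -t rsa -N "" -q -f "$tmp/id_rsa"   # empty passphrase, so logins are non-interactive
ls "$tmp"                                     # id_rsa (private key) and id_rsa.pub (public key)
# On the real cluster, push the public key to each node:
#   ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@Server02
#   ssh-copy-id -i /home/hadoop/.ssh/id_rsa.pub hadoop@Server03
```

An empty passphrase is what makes the later hadoop-daemon.sh scripts able to reach the other nodes without prompting; sshfence in hdfs-site.xml relies on the same key.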
Hadoop2 Installation and Basic Environment Variable Setup
- Download the Hadoop2 tarball
- Visit http://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-2.7.6/hadoop-2.7.6.tar.gz
- Copy the link of one of the listed mirror sites
- Download Hadoop 2.x.x with wget <copied link>
- tar xvfz hadoop-2.x.x.tar.gz
- Set the basic environment variables
cd ~
vi .bash_profile
#hadoop
export HADOOP_HOME=/home/hadoop/hadoop-2.7.6
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_COMMON_LIB_NATIVE_DIR"
export HADOOP_PID_DIR=${HADOOP_HOME}/pids
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
export CLASSPATH=$CLASSPATH:$HADOOP_HOME/lib/*
#java
export JAVA_HOME=/usr/lib/jvm/java/
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/bin:$JAVA_HOME/lib/tools.jar
Then apply the settings with source .bash_profile
- Hadoop2 configuration
cd hadoop-2.x.x/etc/hadoop
vi hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/java/
export HADOOP_PID_DIR=/home/hadoop/hadoop-2.7.6/pids
vi slaves
Server01
Server02
Server03
vi core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://hadoop-cluster</value>
</property>
<property>
<name>ha.zookeeper.quorum</name>
<value>Server01:2181,Server02:2181,Server03:2181</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp/hadoop-${user.name}</value>
</property>
</configuration>
vi hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop/data/dfs/namenode</value>
</property>
<property>
<name>dfs.namenode.checkpoint.dir</name>
<value>/home/hadoop/data/dfs/namesecondary</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop/data/dfs/datanode</value>
</property>
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/home/hadoop/data/dfs/journalnode</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>hadoop-cluster</value>
</property>
<property>
<name>dfs.ha.namenodes.hadoop-cluster</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster.nn1</name>
<value>Server01:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.hadoop-cluster.nn2</name>
<value>Server02:8020</value>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-cluster.nn1</name>
<value>Server01:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.hadoop-cluster.nn2</name>
<value>Server02:50070</value>
</property>
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://Server01:8485;Server02:8485;Server03:8485/hadoop-cluster</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.hadoop-cluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence
shell(/bin/true)
</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/hadoop/.ssh/id_rsa</value>
</property>
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
</configuration>
vi mapred-site.xml
<configuration>
<property>
<name>mapreduce.job.ubertask.enable</name>
<value>true</value>
</property>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
vi yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/hadoop/data/yarn/nm-local-dir</value>
</property>
<property>
<name>yarn.resourcemanager.fs.state-store.uri</name>
<value>/home/hadoop/data/yarn/system/rmstore</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>Server01</value>
</property>
<property>
<name>yarn.web-proxy.address</name>
<value>0.0.0.0:8089</value>
</property>
</configuration>
- Downloading and configuring ZooKeeper
- Download ZooKeeper: wget https://apache.org/dist/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz
- After configuration, start the ZooKeeper server on all three nodes with zkServer.sh start
tar xvfz zookeeper-3.4.10.tar.gz
cd zookeeper-3.4.10
mkdir data
vi data/myid
cf. Server01 must be given myid 1, Server02 myid 2, and Server03 myid 3
cd conf/
cp zoo_sample.cfg zoo.cfg
vi zoo.cfg
dataDir=/home/hadoop/zookeeper-3.4.10/data
Add the following lines at the very bottom
server.1=Server01:2888:3888
server.2=Server02:2888:3888
server.3=Server03:2888:3888
cf) zkServer.sh status shows which node was elected leader and which are followers
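The myid/zoo.cfg pairing above can be sketched as follows. A temporary directory stands in for /home/hadoop/zookeeper-3.4.10; the single number written to data/myid must match that host's server.N entry in zoo.cfg.

```shell
# Sketch: prepare the data dir and myid file as done on Server01
# (write 2 or 3 instead on the other nodes).
ZK_HOME=$(mktemp -d)          # stands in for /home/hadoop/zookeeper-3.4.10
mkdir -p "$ZK_HOME/data"
echo 1 > "$ZK_HOME/data/myid" # must match server.1=Server01:2888:3888 in zoo.cfg
cat >> "$ZK_HOME/zoo.cfg" <<'EOF'
dataDir=/home/hadoop/zookeeper-3.4.10/data
server.1=Server01:2888:3888
server.2=Server02:2888:3888
server.3=Server03:2888:3888
EOF
cat "$ZK_HOME/data/myid"      # prints: 1
```

If the myid value does not match any server.N entry, the node cannot join the quorum, so it is worth checking this file on each server before running zkServer.sh start.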
- Starting the HA environment
- Initialize the ZooKeeper HA state on Server01: hdfs zkfc -formatZK
- Start the JournalNode on every server: hadoop-daemon.sh start journalnode
- Format the NameNode on Server01: hdfs namenode -format
- Start the NameNode on Server01: hadoop-daemon.sh start namenode
- Start the FailoverController on Server01: hadoop-daemon.sh start zkfc
- Start the DataNode on every server: hadoop-daemon.sh start datanode
- On Server02, which will run as the standby, run hdfs namenode -bootstrapStandby
- Start the NameNode on Server02: hadoop-daemon.sh start namenode
- Start the FailoverController on Server02: hadoop-daemon.sh start zkfc
- On Server01, run start-yarn.sh to start the ResourceManager and the NodeManagers for YARN
- Start the history server on Server01: mr-jobhistory-daemon.sh start historyserver
- Check the JVM processes on all three servers with jps
@Server01
NameNode
ResourceManager
DataNode
NodeManager
JournalNode
JobHistoryServer
DFSZKFailoverController
@Server02
NameNode
DataNode
NodeManager
JournalNode
DFSZKFailoverController
@Server03
DataNode
NodeManager
JournalNode
- Running the MapReduce wordcount example
hdfs dfs -mkdir /user
hdfs dfs -mkdir /user/hadoop
hdfs dfs -mkdir /user/hadoop/input
vi test.txt
*********************************************
I am a girl
I am a boy
I am a student
*********************************************
hdfs dfs -put test.txt /user/hadoop/input/
yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.6.jar wordcount input output
Check the result with hdfs dfs -cat /user/hadoop/output/part-r-00000
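The wordcount result can be sanity-checked without the cluster. The pipeline below mimics the job's map (tokenize), shuffle (sort), and reduce (count) phases on the same test.txt; it is a local stand-in, not how Hadoop actually executes the job.

```shell
# Local stand-in for the wordcount job: tokenize, sort (shuffle), count (reduce).
cat > test.txt <<'EOF'
I am a girl
I am a boy
I am a student
EOF
tr ' ' '\n' < test.txt | LC_ALL=C sort | uniq -c | awk '{print $2 "\t" $1}'
# Output (byte-order sort, as Hadoop's Text keys also sort "I" before "a"):
#   I        3
#   a        3
#   am       3
#   boy      1
#   girl     1
#   student  1
```

The counts should match what hdfs dfs -cat shows in part-r-00000.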