Building a Hadoop Cluster and an R Analysis Server
Rurony
2015. 12. 8. 20:07
※ Development server configuration
/etc/hosts
192.168.0.101 namenode
192.168.0.102 datanode01
192.168.0.103 datanode02
192.168.0.104 datanode03
Register these hosts on each server.

※ Hadoop environment setup
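Before going further it is worth confirming that the hosts file actually covers every node. A minimal sketch (my addition, not from the original post) that lints a hosts file for the four cluster hostnames:

```shell
# Check that a hosts file contains an entry for every cluster hostname.
check_hosts() {
  local file="$1" rc=0 h
  for h in namenode datanode01 datanode02 datanode03; do
    grep -qw "$h" "$file" || { echo "missing host entry: $h" >&2; rc=1; }
  done
  return $rc
}

# Usage on each server:
#   check_hosts /etc/hosts && echo "hosts OK"
```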
- Register environment variables -> (each server)
~/.bash_profile
# User specific environment and startup programs
############################################################
### Java
############################################################
export JAVA_HOME=/usr/java/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin
############################################################
### Hadoop
############################################################
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
#export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
alias HADOOP_START_ALL=$HADOOP_HOME/sbin/start-all.sh
alias HADOOP_STOP_ALL=$HADOOP_HOME/sbin/stop-all.sh
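Since the same profile must be set on every node, a tiny lint helper (a sketch of mine, not part of the original post) can confirm each file exports the variables the cluster relies on before it is sourced:

```shell
# Verify that a profile file exports the variables every node needs.
profile_has_exports() {
  local f="$1" v
  for v in JAVA_HOME HADOOP_HOME; do
    grep -q "^export $v=" "$f" || { echo "missing export: $v" >&2; return 1; }
  done
}

# Usage on each server:
#   profile_has_exports ~/.bash_profile && source ~/.bash_profile
```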
$HADOOP_HOME/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
#export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
#export HADOOP_OPTS="${HADOOP_OPTS} -Djava.library.path=$HADOOP_HOME/lib"
$HADOOP_HOME/etc/hadoop/core-site.xml
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.0.101:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/data/hadoop/tmp/</value>
</property>
</configuration>
$HADOOP_HOME/etc/hadoop/hdfs-site.xml -> namenode
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>datanode01:50090</value>
</property>
<property>
<name>dfs.namenode.secondary.https-address</name>
<value>datanode01:50091</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/data/hadoop/dfs/name</value>
</property>
</configuration>
$HADOOP_HOME/etc/hadoop/hdfs-site.xml -> datanodes
<configuration>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>/data/hadoop/dfs/data</value>
</property>
</configuration>
$HADOOP_HOME/etc/hadoop/mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
$HADOOP_HOME/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>namenode</value>
</property>
<property>
<name>yarn.nodemanager.hostname</name>
<value>namenode</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
$HADOOP_HOME/etc/hadoop/slaves
namenode
datanode01
datanode02
datanode03
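With the config files in place, HDFS has to be formatted once on the namenode before the first start. A sketch of the initial start-up (the `jps` lists are what I would expect given that `slaves` includes the namenode itself):

```shell
$ hdfs namenode -format      # run ONCE on namenode, only on a fresh cluster
$ start-dfs.sh
$ start-yarn.sh              # or start-all.sh via the alias above (deprecated)
$ jps
# namenode   -> NameNode, DataNode, ResourceManager, NodeManager
# datanodeXX -> DataNode, NodeManager
```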
NameNode Information --> http://192.168.0.101:50070/
Nodes of the cluster --> http://192.168.0.101:8088/
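Besides the web UIs, a quick HDFS smoke test confirms reads and writes work end to end (file and directory names here are arbitrary examples):

```shell
$ echo "hello hdfs" > /tmp/hello.txt
$ hdfs dfs -mkdir -p /user/test
$ hdfs dfs -put /tmp/hello.txt /user/test/
$ hdfs dfs -cat /user/test/hello.txt
```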
※ ZooKeeper environment setup
- Register environment variables -> (each server)
~/.bash_profile
export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
$ZOOKEEPER_HOME/conf/zoo.cfg <- register dataDir & servers
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/data/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=namenode:2888:3888
server.2=datanode01:2888:3888
server.3=datanode02:2888:3888
server.4=datanode03:2888:3888
- Create the ZooKeeper myid file --> the server's own unique ID (each server)
/data/zookeeper/myid
namenode -> 1
datanode01 -> 2
datanode02 -> 3
datanode03 -> 4
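The per-server myid files can be generated with a small helper (a sketch of mine; the hostname-to-id mapping mirrors the server.N lines in zoo.cfg above):

```shell
# Map this cluster's hostnames to the server.N ids declared in zoo.cfg.
myid_for() {
  case "$1" in
    namenode)   echo 1 ;;
    datanode01) echo 2 ;;
    datanode02) echo 3 ;;
    datanode03) echo 4 ;;
    *) echo "unknown host: $1" >&2; return 1 ;;
  esac
}

# On each server:
#   mkdir -p /data/zookeeper
#   myid_for "$(hostname)" > /data/zookeeper/myid
```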
- Start ZooKeeper -> (each server)
$ zkServer.sh start
- Verify ZooKeeper is running
$ jps
QuorumPeerMain
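Beyond `jps`, `zkServer.sh status` shows whether the quorum actually formed; exactly one node should report "Mode: leader" and the rest "Mode: follower". A sketch, assuming passwordless ssh between the nodes:

```shell
$ for h in namenode datanode01 datanode02 datanode03; do
    echo "== $h =="
    ssh "$h" '/usr/local/zookeeper/bin/zkServer.sh status'
  done
```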
※ HBase environment setup
$HBASE_HOME/conf/hbase-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_79
export HBASE_CLASSPATH=/usr/local/hadoop/etc/hadoop <- hadoop conf dir
export HBASE_MANAGES_ZK=false
$HBASE_HOME/conf/regionservers -> only datanodes
datanode01
datanode02
datanode03
$HBASE_HOME/conf/hbase-site.xml
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://namenode:9000/hbase</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>namenode,datanode01,datanode02,datanode03</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/data/zookeeper</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.dynamic.jars.dir</name>
<value>/usr/local/hbase/lib</value>
</property>
</configuration>
- Start HBase -> namenode
$ start-hbase.sh
HBase Master: namenode --> http://192.168.0.101:60010/
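A quick smoke test from the HBase shell verifies the master can talk to the region servers and to HDFS ('test_table' and 'cf' are example names):

```shell
$ hbase shell <<'EOF'
create 'test_table', 'cf'
put 'test_table', 'row1', 'cf:a', 'value1'
scan 'test_table'
exit
EOF
```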
※ R-HBase integration setup
Install the dependencies for Thrift.
$ yum -y install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel openssl-devel
Download Thrift archive
$ wget http://archive.apache.org/dist/thrift/0.8.0/thrift-0.8.0.tar.gz
$ tar -xvzf thrift-0.8.0.tar.gz
$ cd thrift-0.8.0
$ ./configure --without-ruby --without-python
$ make
$ make install
$ ln -s /usr/local/lib/libthrift-0.8.0.so /usr/lib64
- Start the HBase Thrift server (on the namenode; R connects through it) and verify:
$ hbase-daemon.sh start thrift
$ jps
ThriftServer
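For the R side, the rhbase package from the RHadoop project is one common choice; it builds against Thrift via pkg-config. A sketch of the install, assuming Thrift's .pc file landed in /usr/local/lib/pkgconfig during `make install` (the tarball name is a placeholder for whichever rhbase release you downloaded):

```shell
# Point pkg-config at the Thrift install from above.
$ export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
$ pkg-config --cflags thrift     # should print the Thrift include flags

# Install the package into R (placeholder name -- use the downloaded file).
$ R CMD INSTALL rhbase_<version>.tar.gz
```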