Rurony's Training Gym

Rurony's Training Dojo! by Rurony


Building a Hadoop Cluster Environment and an R Analysis Server

※ Development server layout

/etc/hosts

192.168.0.101   namenode
192.168.0.102   datanode01
192.168.0.103   datanode02
192.168.0.104   datanode03

Register these host entries on every server.
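
Because the same four entries have to be added on every server, the step can be scripted so that re-running it does not create duplicates. The sketch below is a minimal illustration: the `add_host` helper and the `HOSTS_FILE` variable are names made up for this example, and the script writes to a scratch file by default rather than to /etc/hosts.

```shell
#!/usr/bin/env bash
# Append cluster host entries to a hosts file, skipping names already present.
# HOSTS_FILE defaults to a scratch file (an assumption, to keep the sketch safe);
# set HOSTS_FILE=/etc/hosts and run as root on a real server.
HOSTS_FILE="${HOSTS_FILE:-./hosts.sample}"

add_host() {
    local ip="$1" name="$2"
    touch "$HOSTS_FILE"
    # Look for the hostname as a whole word on a non-comment line.
    if ! grep -Eq "^[^#]*[[:space:]]${name}([[:space:]]|\$)" "$HOSTS_FILE"; then
        printf '%s   %s\n' "$ip" "$name" >> "$HOSTS_FILE"
    fi
}

add_host 192.168.0.101 namenode
add_host 192.168.0.102 datanode01
add_host 192.168.0.103 datanode02
add_host 192.168.0.104 datanode03
```

Running the script a second time leaves the file unchanged, since every name is already registered.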

※ Hadoop environment setup

  • Install Hadoop
    $ wget http://apache.mirror.cdnetworks.com/hadoop/common/hadoop-2.6.2/hadoop-2.6.2.tar.gz
    $ tar xvfz hadoop-2.6.2.tar.gz
    $ mv hadoop-2.6.2 /usr/local/hadoop
    $ chown -R hadoop:hadoop /usr/local/hadoop
    
  • Create the hadoop account -> hadoop/hadoop (each server)
    $ useradd hadoop
    $ passwd hadoop
    
  • Distribute the hadoop account's SSH key
    $ ssh-keygen -t rsa
    $ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@namenode
    $ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode01
    $ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode02
    $ ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@datanode03
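
The four ssh-copy-id calls above differ only in the target host, so they can be written as a loop. This is a sketch under assumptions: the `NODES` and `DRY_RUN` variables and the `distribute_key` function are illustrative names, and dry-run mode is the default so the script only prints what it would execute.

```shell
#!/usr/bin/env bash
# Copy the hadoop user's public key to every node in one loop.
# DRY_RUN=1 (the default in this sketch, an assumption) only prints the commands;
# run with DRY_RUN=0 as the hadoop user to actually copy the key.
NODES="namenode datanode01 datanode02 datanode03"
DRY_RUN="${DRY_RUN:-1}"

distribute_key() {
    local node
    for node in $NODES; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop@${node}"
        else
            ssh-copy-id -i ~/.ssh/id_rsa.pub "hadoop@${node}"
        fi
    done
}

distribute_key
```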
    
  • Create the Hadoop data directories -> (each server)
    $ mkdir -p /data/hadoop/tmp
    $ mkdir -p /data/hadoop/dfs/name
    $ mkdir -p /data/hadoop/dfs/data
    
    $ chown -R hadoop:hadoop /data/hadoop/
    
  • Register environment variables -> (each server)

~/.bash_profile

# User specific environment and startup programs
############################################################
### Java
############################################################
export JAVA_HOME=/usr/java/jdk1.7.0_79
export CLASSPATH=.:$JAVA_HOME/jre/lib:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin

############################################################
### Hadoop
############################################################
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

#export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"

alias HADOOP_START_ALL=$HADOOP_HOME/sbin/start-all.sh
alias HADOOP_STOP_ALL=$HADOOP_HOME/sbin/stop-all.sh
  • Hadoop configuration

$HADOOP_HOME/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_79
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export YARN_CONF_DIR=$HADOOP_HOME/etc/hadoop
#export HADOOP_COMMON_LIB_NATIVE_DIR=${HADOOP_HOME}/lib/native
#export HADOOP_OPTS="${HADOOP_OPTS} -Djava.library.path=$HADOOP_HOME/lib" 

$HADOOP_HOME/etc/hadoop/core-site.xml

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://192.168.0.101:9000</value>
        </property>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/data/hadoop/tmp/</value>
        </property>
</configuration>
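
After editing the *-site.xml files, it can help to read a value back and confirm it is what was intended. The `get_prop` function below is a hypothetical helper written for this example, not part of Hadoop. It is a line-based sketch that assumes each `<name>` and `<value>` element sits on its own line, as in the listings here, and it does not skip properties wrapped in XML comments.

```shell
#!/usr/bin/env bash
# Print the value of one property from a Hadoop-style XML config file.
# get_prop is an illustrative helper; it is not a real XML parser.
get_prop() {
    # $1 = XML file, $2 = property name
    awk -v want="$2" '
        /<name>/  { n = $0; sub(/.*<name>/, "", n);  sub(/<\/name>.*/, "", n) }
        /<value>/ { v = $0; sub(/.*<value>/, "", v); sub(/<\/value>.*/, "", v)
                    if (n == want) { print v; exit } }
    ' "$1"
}

# Example (path as used in this post):
#   get_prop "$HADOOP_HOME/etc/hadoop/core-site.xml" fs.defaultFS
```

On a running installation, `hdfs getconf -confKey fs.defaultFS` reports the value Hadoop itself resolved.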

$HADOOP_HOME/etc/hadoop/hdfs-site.xml -> namenode

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>datanode01:50090</value>
        </property>
        <property>
                <name>dfs.namenode.secondary.https-address</name>
                <value>datanode01:50091</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>/data/hadoop/dfs/name</value>
        </property>
        <!--
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/data/hadoop/dfs/data</value>
        </property>
        -->
</configuration>

$HADOOP_HOME/etc/hadoop/hdfs-site.xml -> datanodes

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
        <property>
                <name>dfs.permissions.enabled</name>
                <value>false</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>/data/hadoop/dfs/data</value>
        </property>
</configuration>

$HADOOP_HOME/etc/hadoop/mapred-site.xml

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

$HADOOP_HOME/etc/hadoop/yarn-site.xml

<configuration>
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>namenode</value>
        </property>
        <property>
                <name>yarn.nodemanager.hostname</name>
                <value>namenode</value> <!-- each node's own hostname: namenode, or datanode01, datanode02, datanode03 -->
        </property>
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>

$HADOOP_HOME/etc/hadoop/slaves

namenode
datanode01
datanode02
datanode03
  • Distribute the Hadoop installation and configuration files
    $ scp -r /usr/local/hadoop hadoop@datanode03:/usr/local/hadoop
    $ scp -r /usr/local/hadoop hadoop@datanode02:/usr/local/hadoop
    $ scp -r /usr/local/hadoop hadoop@datanode01:/usr/local/hadoop
    
    $ chown -R hadoop:hadoop /usr/local/hadoop
    
  • Start Hadoop --> namenode
    -- HDFS format
    $ hadoop namenode -format <- first run only
    
    -- start
    $ $HADOOP_HOME/sbin/start-all.sh
    
    -- stop
    $ $HADOOP_HOME/sbin/stop-all.sh
    
  • Verify that Hadoop is running
    $ jps
    -- namenode
        NodeManager
        ResourceManager
        NameNode
        DataNode
    
    -- datanode
        SecondaryNameNode (192.168.0.102 only)
        NodeManager
        DataNode
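
Comparing the jps output with the expected daemon list by eye is easy to get wrong, so the check can be scripted. `check_daemons` below is a hypothetical helper for this example. It assumes the usual `<pid> <name>` output format of jps, reads that output on standard input, and reports any expected daemon that is missing.

```shell
#!/usr/bin/env bash
# Report expected daemons that do not appear in `jps` output.
# check_daemons is an illustrative helper, not a Hadoop tool.
check_daemons() {
    # stdin = output of `jps`; arguments = expected daemon names
    local out missing=0 d
    out="$(cat)"
    for d in "$@"; do
        # Compare the second column exactly, so NameNode does not match SecondaryNameNode.
        if ! printf '%s\n' "$out" | awk '{print $2}' | grep -qx "$d"; then
            echo "missing: $d"
            missing=1
        fi
    done
    return "$missing"
}

# Example on the namenode:
#   jps | check_daemons NameNode DataNode ResourceManager NodeManager
# Example on a datanode:
#   jps | check_daemons DataNode NodeManager
```

The function returns a non-zero status when anything is missing, so it can be used in a larger health-check script.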
    
  • Hadoop web admin console

NameNode Information --> http://192.168.0.101:50070/
Nodes of the cluster --> http://192.168.0.101:8088/

  • Verify HDFS: add a file from the namenode or a datanode, then confirm it can be listed from the cluster namenode

    $ hadoop fs -mkdir /input
    $ hadoop fs -copyFromLocal start-all.sh /input

  • Verify MapReduce

    $ hadoop fs -copyFromLocal README.txt /input
    $ hadoop jar WordCount-1.0-SNAPSHOT.jar WordCount /output /input/README.txt
    $ hadoop fs -cat /output/part-r-00000 | more

※ R environment setup

  • Extra Packages for Enterprise Linux (EPEL)
    $ wget http://mirror.us.leaseweb.net/epel/6/x86_64/epel-release-6-8.noarch.rpm
    $ rpm -ivh epel-release-6-8.noarch.rpm
    
  • Install R base
    $ yum install R
    
  • Install RStudio Server
    $ wget https://download2.rstudio.org/rstudio-server-rhel-0.99.489-x86_64.rpm
    $ sudo yum install --nogpgcheck rstudio-server-rhel-0.99.489-x86_64.rpm
    
  • Verify the RStudio Server installation
    $ rstudio-server verify-installation
        rstudio-server stop/waiting
        rstudio-server start/running, process 32626
    
  • RStudio web admin console

http://192.168.0.101:8787/
Developer user: log in with an OS account

※ Zookeeper environment setup

  • Install Zookeeper
    $ wget http://apache.mirror.cdnetworks.com/zookeeper/current/zookeeper-3.4.6.tar.gz
    $ tar xvzf zookeeper-3.4.6.tar.gz
    $ mv zookeeper-3.4.6 /usr/local/zookeeper
    $ chown -R hadoop:hadoop /usr/local/zookeeper
    
  • Create the Zookeeper data directory -> (each server)
    $ mkdir -p /data/zookeeper
    
    $ chown -R hadoop:hadoop /data/zookeeper/
    
  • Register environment variables -> (each server)

~/.bash_profile

export ZOOKEEPER_HOME=/usr/local/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
  • Zookeeper configuration

$ZOOKEEPER_HOME/conf/zoo.cfg <- register dataDir & servers

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/data/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1

server.1=namenode:2888:3888
server.2=datanode01:2888:3888
server.3=datanode02:2888:3888
server.4=datanode03:2888:3888
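
The ensemble above lists four servers. ZooKeeper stays available only while a strict majority of its voting servers is up, so the arithmetic below shows how many failures each ensemble size tolerates. The function names are illustrative, chosen for this sketch.

```shell
#!/usr/bin/env bash
# Majority-quorum arithmetic for a ZooKeeper ensemble of n voting servers.
quorum_size()      { echo $(( $1 / 2 + 1 )); }          # smallest strict majority
tolerated_losses() { echo $(( $1 - ($1 / 2 + 1) )); }   # servers that may fail

for n in 3 4 5; do
    echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_losses "$n")"
done
```

A four-server ensemble tolerates one failure, the same as a three-server one, which is why odd ensemble sizes are generally recommended.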
  • Distribute the Zookeeper installation and configuration files
    $ scp -r /usr/local/zookeeper root@datanode01:/usr/local/zookeeper
    $ scp -r /usr/local/zookeeper root@datanode02:/usr/local/zookeeper
    $ scp -r /usr/local/zookeeper root@datanode03:/usr/local/zookeeper
    
    $ chown -R hadoop:hadoop /usr/local/zookeeper
    
  • Create the Zookeeper myid file --> each server's own unique ID (each server)

/data/zookeeper/myid

namenode -> 1
datanode01 -> 2
datanode02 -> 3
datanode03 -> 4
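
Each myid value has to match the `server.N` entry for that host in zoo.cfg, so the ID can be derived from the config instead of typed by hand. `myid_for` below is a hypothetical helper for this example. It assumes plain hostnames like the ones used here, because the name is inserted into a sed pattern unescaped.

```shell
#!/usr/bin/env bash
# Look up a host's ZooKeeper server ID from the server.N lines in zoo.cfg.
# myid_for is an illustrative helper, not part of ZooKeeper.
myid_for() {
    # $1 = hostname, $2 = path to zoo.cfg
    sed -n "s/^server\.\([0-9][0-9]*\)=$1:.*/\1/p" "$2"
}

# Example, run on each server (paths as used in this post):
#   myid_for "$(hostname)" "$ZOOKEEPER_HOME/conf/zoo.cfg" > /data/zookeeper/myid
```

If the host is not listed in zoo.cfg the function prints nothing, so check that the resulting myid file is not empty before starting the server.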
  • Start Zookeeper -> (each server)
    $ zkServer.sh start
    
  • Verify that Zookeeper is running
    $ jps
        QuorumPeerMain
    

※ HBase environment setup

  • Install HBase
    $ wget http://mirror.apache-kr.org/hbase/0.98.16.1/hbase-0.98.16.1-hadoop2-bin.tar.gz
    $ tar xvzf hbase-0.98.16.1-hadoop2-bin.tar.gz
    $ mv hbase-0.98.16.1-hadoop2 /usr/local/hbase
    $ chown -R hadoop:hadoop /usr/local/hbase
    
  • Register environment variables -> (each server)
    export HBASE_HOME=/usr/local/hbase
    export PATH=$PATH:$HBASE_HOME/bin
    
  • HBase configuration

$HBASE_HOME/conf/hbase-env.sh

export JAVA_HOME=/usr/java/jdk1.7.0_79
export HBASE_CLASSPATH=/usr/local/hadoop/etc/hadoop   # hadoop conf dir
export HBASE_MANAGES_ZK=false

$HBASE_HOME/conf/regionservers -> only datanodes

datanode01
datanode02
datanode03

$HBASE_HOME/conf/hbase-site.xml

<configuration>
        <property>
                <name>hbase.rootdir</name>
                <value>hdfs://namenode:9000/hbase</value>
        </property>
        <property>
                <name>hbase.zookeeper.quorum</name>
                <value>namenode,datanode01,datanode02,datanode03</value>
        </property>
        <property>
                <name>hbase.zookeeper.property.dataDir</name>
                <value>/data/zookeeper</value>
        </property>
        <property>
                <name>hbase.cluster.distributed</name>
                <value>true</value>
        </property>
        <property>
                <name>hbase.dynamic.jars.dir</name>
                <value>/usr/local/hbase/lib</value>
        </property>
</configuration>
  • Distribute the HBase installation and configuration files
    $ scp -r /usr/local/hbase root@datanode01:/usr/local/hbase
    $ scp -r /usr/local/hbase root@datanode02:/usr/local/hbase
    $ scp -r /usr/local/hbase root@datanode03:/usr/local/hbase
    
    $ chown -R hadoop:hadoop /usr/local/hbase
    
  • Start HBase -> namenode
    $ start-hbase.sh
    
  • Verify that HBase is running
    $ jps
    -- namenode
        HMaster
    
    -- datanode
        HRegionServer
    
  • HBase web admin console

HBase Master: namenode --> http://192.168.0.101:60010/

※ R and HBase integration setup

  • Install Apache Thrift

Install the dependencies for Thrift.

$ yum -y install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel openssl-devel

Download the Thrift archive, then build and install it.

$ wget http://archive.apache.org/dist/thrift/0.8.0/thrift-0.8.0.tar.gz
$ tar -xvzf thrift-0.8.0.tar.gz
$ cd thrift-0.8.0
$ ./configure --without-ruby --without-python
$ make
$ make install
$ ln -s /usr/local/lib/libthrift-0.8.0.so /usr/lib64
  • Register the environment variable (root user)
    export PKG_CONFIG_PATH=/usr/local/lib/pkgconfig
    
  • Start Thrift
    $ hbase-daemon.sh start thrift
    
  • Verify that Thrift is running
    $ jps
        ThriftServer
    
  • Install the rhbase package (R CMD INSTALL)
    $ wget -O rhbase_1.2.1.tar.gz "https://github.com/RevolutionAnalytics/rhbase/blob/master/build/rhbase_1.2.1.tar.gz?raw=true"
    
    $ R CMD INSTALL rhbase_1.2.1.tar.gz

