DEPLOYING A HADOOP CLUSTER ON DIGITALOCEAN DROPLETS

With the arrival of private networking on DigitalOcean, I wanted to replace my local physical Cloudera Hadoop cluster with a droplet-based one. The best part of working with DigitalOcean droplets is that you can snapshot any image and destroy the VMs whenever they are not in use and you no longer need them. The downside is that DigitalOcean's private networking implementation provides no security guarantees against other hosts on the same private network, which you should keep in mind when using it; the bandwidth is free, but the network is not truly private.

 

Here we outline a four-host cluster that costs $0.15/hour (1 × $0.06/hour + 3 × $0.03/hour), which makes for a very cost-effective, well-connected platform.
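As a rough provisioning sketch (the doctl CLI invocation, host names, region, image, and droplet size below are my illustrative assumptions, not from the original setup), the four hosts could be created with private networking enabled like this:

# Illustrative sketch: create four droplets on the same private network
for host in cm-master worker1 worker2 worker3; do
  doctl compute droplet create "$host" \
    --region nyc3 \
    --image ubuntu-20-04-x64 \
    --size s-2vcpu-4gb \
    --enable-private-networking
done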

 

If you are not familiar with DigitalOcean: they offer very simple, cheap virtual servers (called droplets in DigitalOcean parlance).

 

I will follow the Cloudera Manager automated installer guide; I have found Cloudera Manager to be an excellent tool for managing a cluster.
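As a minimal sketch of bootstrapping that installer on the master node (the URL is the historical CM5 installer path on the Cloudera archive; verify it against the guide you follow):

# Download and run the Cloudera Manager automated installer
wget https://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
chmod u+x cloudera-manager-installer.bin
sudo ./cloudera-manager-installer.bin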


HADOOP 2.6 WORDCOUNT EXAMPLE
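The session below runs the stock wordcount example that ships with Hadoop 2.6 against a large synthetic input. As a minimal sketch (the tokens, file name, and line count simply mirror what the transcript happens to show), a similar test file can be generated like this:

# Build a large synthetic input: millions of repeated tokens plus two unique words
yes "FKDSFJKS HFLESFL" | head -n 7395227 > f2.txt
echo "hello world" >> f2.txt
# Create the HDFS input directory used below
hdfs dfs -mkdir -p /home/hadoop/input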

root@hadoop2-VirtualBox:/usr/local/hadoop/share/hadoop/mapreduce# pwd
/usr/local/hadoop/share/hadoop/mapreduce

root@hadoop2-VirtualBox:/usr/local/hadoop/share/hadoop/mapreduce# cat f3.txt >> f2.txt
root@hadoop2-VirtualBox:/usr/local/hadoop/share/hadoop/mapreduce# wc -l f2.txt 
7395228 f2.txt
root@hadoop2-VirtualBox:/usr/local/hadoop/share/hadoop/mapreduce# hdfs dfs -put f2.txt /home/hadoop/input/f2.txt
15/02/06 21:06:50 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable





root@hadoop2-VirtualBox:/usr/local/hadoop/share/hadoop/mapreduce# hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /home/hadoop/input /home/hadoop/output4
15/02/06 21:07:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/02/06 21:07:26 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/02/06 21:07:27 INFO input.FileInputFormat: Total input paths to process : 2
15/02/06 21:07:27 INFO mapreduce.JobSubmitter: number of splits:2
15/02/06 21:07:27 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1423212832128_0004
15/02/06 21:07:28 INFO impl.YarnClientImpl: Submitted application application_1423212832128_0004
15/02/06 21:07:28 INFO mapreduce.Job: The url to track the job: http://hadoop2-VirtualBox:8088/proxy/application_1423212832128_0004/
15/02/06 21:07:28 INFO mapreduce.Job: Running job: job_1423212832128_0004
15/02/06 21:07:36 INFO mapreduce.Job: Job job_1423212832128_0004 running in uber mode : false
15/02/06 21:07:36 INFO mapreduce.Job:  map 0% reduce 0%
15/02/06 21:07:47 INFO mapreduce.Job:  map 50% reduce 0%
15/02/06 21:07:52 INFO mapreduce.Job:  map 58% reduce 0%
15/02/06 21:07:55 INFO mapreduce.Job:  map 63% reduce 0%
15/02/06 21:07:58 INFO mapreduce.Job:  map 65% reduce 0%
15/02/06 21:08:02 INFO mapreduce.Job:  map 68% reduce 0%
15/02/06 21:08:05 INFO mapreduce.Job:  map 71% reduce 0%
15/02/06 21:08:06 INFO mapreduce.Job:  map 71% reduce 17%
15/02/06 21:08:08 INFO mapreduce.Job:  map 77% reduce 17%
15/02/06 21:08:11 INFO mapreduce.Job:  map 80% reduce 17%
15/02/06 21:08:14 INFO mapreduce.Job:  map 100% reduce 17%
15/02/06 21:08:16 INFO mapreduce.Job:  map 100% reduce 100%
15/02/06 21:08:16 INFO mapreduce.Job: Job job_1423212832128_0004 completed successfully
15/02/06 21:08:16 INFO mapreduce.Job: Counters: 50
	File System Counters
		FILE: Number of bytes read=269
		FILE: Number of bytes written=317703
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=125719110
		HDFS: Number of bytes written=49
		HDFS: Number of read operations=9
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters 
		Killed map tasks=1
		Launched map tasks=3
		Launched reduce tasks=1
		Data-local map tasks=3
		Total time spent by all maps in occupied slots (ms)=65341
		Total time spent by all reduces in occupied slots (ms)=26179
		Total time spent by all map tasks (ms)=65341
		Total time spent by all reduce tasks (ms)=26179
		Total vcore-seconds taken by all map tasks=65341
		Total vcore-seconds taken by all reduce tasks=26179
		Total megabyte-seconds taken by all map tasks=66909184
		Total megabyte-seconds taken by all reduce tasks=26807296
	Map-Reduce Framework
		Map input records=7395229
		Map output records=14790458
		Map output bytes=184880720
		Map output materialized bytes=65
		Input split bytes=222
		Combine input records=14790470
		Combine output records=16
		Reduce input groups=4
		Reduce shuffle bytes=65
		Reduce input records=4
		Reduce output records=4
		Spilled Records=20
		Shuffled Maps =2
		Failed Shuffles=0
		Merged Map outputs=2
		GC time elapsed (ms)=714
		CPU time spent (ms)=14010
		Physical memory (bytes) snapshot=560402432
		Virtual memory (bytes) snapshot=2402254848
		Total committed heap usage (bytes)=378994688
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=125718888
	File Output Format Counters 
		Bytes Written=49


root@hadoop2-VirtualBox:/usr/local/hadoop/share/hadoop/mapreduce# hdfs dfs -ls  /home/hadoop/output4
15/02/06 21:16:46 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 2 items
-rw-r--r--   1 root supergroup          0 2015-02-06 21:08 /home/hadoop/output4/_SUCCESS
-rw-r--r--   1 root supergroup         49 2015-02-06 21:08 /home/hadoop/output4/part-r-00000
root@hadoop2-VirtualBox:/usr/local/hadoop/share/hadoop/mapreduce# 

root@hadoop2-VirtualBox:/usr/local/hadoop/share/hadoop/mapreduce# hdfs dfs -cat  /home/hadoop/output4/part-r-00000
15/02/06 21:17:18 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
FKDSFJKS	7395228
HFLESFL	7395228
hello	1
world	1


HADOOP SINGLE-NODE QUICK DEPLOYMENT



# Install Java 7 and create a dedicated user and group for Hadoop
sudo apt-get update
sudo apt-get install openjdk-7-jdk
java -version
cd /usr/lib/jvm
sudo ln -s java-7-openjdk-amd64 jdk
sudo addgroup hadoop_group
sudo adduser --ingroup hadoop_group hduser1
sudo adduser hduser1 sudo


# Switch to the Hadoop user and set up passwordless SSH to localhost
su - hduser1
ssh-keygen -t rsa -P ""
cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys
# Accept the host key on the first connection, then exit the test session
ssh localhost

# Set Hadoop-related environment variables
export HADOOP_HOME=/usr/local/hadoop
# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin

# Download and unpack Hadoop, then move it into place
wget http://ftp.yz.yamagata-u.ac.jp/pub/network/apache/hadoop/common/current/hadoop-2.7.0.tar.gz
tar -zxvf hadoop-2.7.0.tar.gz
sudo mv hadoop-2.7.0 /usr/local/hadoop

vi ~/.bashrc
Append the following:

 #Hadoop variables
 export JAVA_HOME=/usr/lib/jvm/jdk/
 export HADOOP_INSTALL=/usr/local/hadoop
 export PATH=$PATH:$HADOOP_INSTALL/bin
 export PATH=$PATH:$HADOOP_INSTALL/sbin
 export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
 export HADOOP_COMMON_HOME=$HADOOP_INSTALL
 export HADOOP_HDFS_HOME=$HADOOP_INSTALL
 export YARN_HOME=$HADOOP_INSTALL
 ###end of paste
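
After saving the file, reload it so the variables take effect in the current shell:

source ~/.bashrc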



 vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Change the JAVA_HOME line to:

 export JAVA_HOME=/usr/lib/jvm/jdk


vi /usr/local/hadoop/etc/hadoop/core-site.xml
Edit it so that the <configuration> block reads:
<configuration>
 <property>
 <name>fs.default.name</name>
 <value>hdfs://localhost:9000</value>
 </property>
</configuration>
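
Note that fs.default.name is the deprecated key for this setting; on Hadoop 2.x the preferred spelling is fs.defaultFS, which takes the same value:

<property>
 <name>fs.defaultFS</name>
 <value>hdfs://localhost:9000</value>
</property>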



vi /usr/local/hadoop/etc/hadoop/yarn-site.xml

Change it to:
<configuration>
 <property>
 <name>yarn.nodemanager.aux-services</name>
 <value>mapreduce_shuffle</value>
 </property>
 <property>
 <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
 <value>org.apache.hadoop.mapred.ShuffleHandler</value>
 </property>
</configuration>




cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml


vi /usr/local/hadoop/etc/hadoop/mapred-site.xml 

Change it to:

<configuration>
 <property>
 <name>mapreduce.framework.name</name>
 <value>yarn</value>
 </property>
</configuration>


# Create local directories for the NameNode metadata and DataNode blocks
sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
sudo chown hduser1 /usr/local/hadoop_store/hdfs/namenode
sudo chown hduser1 /usr/local/hadoop_store/hdfs/datanode


vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml


Change it to:
<configuration>
 <property>
 <name>dfs.replication</name>
 <value>1</value>
 </property>
 <property>
 <name>dfs.namenode.name.dir</name>
 <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
 </property>
 <property>
 <name>dfs.datanode.data.dir</name>
 <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
 </property>
</configuration>



# Hand the storage tree to the Hadoop user; chmod 777 is convenient for a sandbox but overly permissive
sudo chown hduser1:hadoop_group -R /usr/local/hadoop_store
sudo chmod 777 -R /usr/local/hadoop_store

# Format the NameNode (first run only; reformatting destroys HDFS metadata)
cd /usr/local/hadoop/
hdfs namenode -format


cd /usr/local/hadoop/
start-all.sh
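
In Hadoop 2.x, start-all.sh still works but prints a deprecation notice; the equivalent explicit form is to start HDFS and YARN separately:

start-dfs.sh
start-yarn.sh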


jps
10477 SecondaryNameNode
10757 NodeManager
10974 Jps
10113 NameNode
10623 ResourceManager
10251 DataNode
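
With all five daemons running, a quick smoke test confirms HDFS accepts writes; on Hadoop 2.x defaults the web UIs are also reachable at http://localhost:50070 (NameNode) and http://localhost:8088 (ResourceManager):

# Create a home directory for hduser1 and list the filesystem root
hdfs dfs -mkdir -p /user/hduser1
hdfs dfs -ls /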



TYPICAL HDFS CLUSTER

A typical HDFS cluster:

[diagram: HDFS cluster]
