Preparation before installing Hadoop
On an installed Ubuntu system, first add a user with sudo privileges.

root@nodeA:~# sudo adduser zyx
Adding user `zyx' ...
Adding new group `zyx' (1001) ...
Adding new user `zyx' (1001) with group `zyx' ...
Creating home directory `/home/zyx' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for zyx
Enter the new value, or press ENTER for the default
        Full Name []: ^C
adduser: `/usr/bin/chfn zyx' exited from signal 2. Exiting.
root@nodeA:~#
root@nodeA:~# sudo usermod -G admin -a zyx
root@nodeA:~#

Set up passwordless SSH login

(1) Passwordless login to the local machine on the namenode

zyx@nodeA:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/home/zyx/.ssh'.
Your identification has been saved in /home/zyx/.ssh/id_dsa.
Your public key has been saved in /home/zyx/.ssh/id_dsa.pub.
The key fingerprint is:
65:2e:e0:df:2e:61:a5:19:6a:ab:0e:38:45:a9:6a:2b zyx@nodeA
The key's randomart image is:
+--[ DSA 1024]----+
(randomart omitted)
+-----------------+

zyx@nodeA:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
zyx@nodeA:~$

(2) Passwordless login from the namenode to the other datanodes

hadoop@nodeB:~$ scp hadoop@nodea:/home/hadoop/.ssh/id_dsa.pub /home/hadoop
hadoop@nodea's password:
id_dsa.pub                                   100%  602     0.6KB/s   00:00
hadoop@nodeB:~$ cat id_dsa.pub >> .ssh/authorized_keys
hadoop@nodeB:~$ sudo ufw disable
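
With several datanodes, the same copy-and-append can be scripted from the namenode. A minimal sketch, assuming every node uses the same account name and the datanodes are called nodeB and nodeC (adjust to your hosts):

for node in nodeB nodeC; do
  scp ~/.ssh/id_dsa.pub $node:/tmp/namenode_key.pub
  ssh $node 'mkdir -p ~/.ssh && cat /tmp/namenode_key.pub >> ~/.ssh/authorized_keys && rm /tmp/namenode_key.pub'
done

# Each of these should now log in without a password prompt:
ssh nodeB hostname
ssh nodeC hostname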



Copy the JDK file (jdk-6u20-linux-i586.bin) to Linux

Use the F-Secure SSH File Transfer trial tool and drag and drop the file directly.

Installing and configuring jdk-6u20-linux-i586.bin

(1) Installation

zyx@nodeA:~$ ls
Examples  jdk
zyx@nodeA:~$ cd jdk
zyx@nodeA:~/jdk$ ls
jdk-6u20-linux-i586.bin
zyx@nodeA:~/jdk$ chmod a+x jdk-6u20-linux-i586.bin
zyx@nodeA:~/jdk$ ./jdk-6u20-linux-i586.bin

The license agreement is displayed next; answer yes and press Enter, and the installation finishes.

zyx@nodeA:~/jdk$ ls
jdk1.6.0_20  jdk-6u20-linux-i586.bin

(2) Configuration

Open .bashrc (root@nodeA:/home/zyx# vi .bashrc) and append the following lines at the end:

export JAVA_HOME=/home/zyx/jdk/jdk1.6.0_20
export JRE_HOME=/home/zyx/jdk/jdk1.6.0_20/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:$HOME/bin
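
To confirm the variables take effect, reload .bashrc and query the JVM (a quick sanity check, assuming the paths above):

zyx@nodeA:~$ source ~/.bashrc
zyx@nodeA:~$ echo $JAVA_HOME      # expect /home/zyx/jdk/jdk1.6.0_20
zyx@nodeA:~$ java -version        # expect java version "1.6.0_20"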



Installing Hadoop

Download URL:

http://labs.renren.com/apache-mirror/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz

Put hadoop-0.20.2.tar.gz under /home/zyx/hadoop, then extract it:

zyx@nodeB:~/hadoop$ tar -zvxf hadoop-0.20.2.tar.gz

Set the environment variables by adding them to /home/zyx/.bashrc:

zyx@nodeA:~$ vi .bashrc
export HADOOP_HOME=/home/zyx/hadoop/hadoop-0.20.2
export PATH=$HADOOP_HOME/bin:$PATH
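
Reload .bashrc and check that the hadoop command now resolves (assuming the paths above):

zyx@nodeA:~$ source ~/.bashrc
zyx@nodeA:~$ which hadoop         # expect /home/zyx/hadoop/hadoop-0.20.2/bin/hadoop
zyx@nodeA:~$ hadoop version       # expect Hadoop 0.20.2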



Configuring Hadoop

Set the Java environment in conf/hadoop-env.sh:

export JAVA_HOME=/home/zyx/jdk/jdk1.6.0_20

Configure the conf/masters and conf/slaves files; this only needs to be done on the namenode. For example:
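
In a small cluster where nodeA runs the namenode (and the secondary namenode) while nodeB runs a datanode and tasktracker, the two files could look like this (the hostnames are illustrative assumptions):

conf/masters (hosts running the secondary namenode):
nodeA

conf/slaves (hosts running a datanode/tasktracker):
nodeB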

Configure core-site.xml, hdfs-site.xml, and mapred-site.xml.

zyx@nodeC:~/hadoop-0.20.2/conf$ more core-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- hdfs://192.168.1.103:54310 -->
    <value>hdfs://192.168.1.103:9000</value>
  </property>
</configuration>







zyx@nodeC:~/hadoop-0.20.2/conf$ more hdfs-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>







zyx@nodeC:~/hadoop-0.20.2/conf$ more mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- hdfs://192.168.1.103:54320 -->
    <value>hdfs://192.168.1.103:9001</value>
  </property>
</configuration>
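
All nodes should see the same configuration, so after editing these files copy the conf directory to every other node. A minimal sketch, assuming the same account and install path ~/hadoop-0.20.2 on every machine (adjust hostnames and paths to your cluster):

zyx@nodeC:~$ for node in nodeA nodeB; do scp -r ~/hadoop-0.20.2/conf zyx@$node:~/hadoop-0.20.2/; done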







Running Hadoop

(0) Format the namenode:

zyx@nodeC:~/hadoop-0.20.2/bin$ hadoop namenode -format
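
After formatting, start all the daemons from the same directory (stop-all.sh, used in step (3) below, is its counterpart):

zyx@nodeC:~/hadoop-0.20.2/bin$ start-all.sh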

(1) Check the daemon processes with jps:

zyx@nodeC:~/hadoop-0.20.2/bin$ jps
31030 NameNode
31488 TaskTracker
31283 SecondaryNameNode
31372 JobTracker
31145 DataNode
31599 Jps

(2) Check the cluster status:

zyx@nodeC:~/hadoop-0.20.2/bin$ hadoop dfsadmin -report
Configured Capacity: 304716488704 (283.79 GB)
Present Capacity: 270065557519 (251.52 GB)
DFS Remaining: 270065532928 (251.52 GB)
DFS Used: 24591 (24.01 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 192.168.1.103:50010
Decommission Status : Normal
Configured Capacity: 304716488704 (283.79 GB)
DFS Used: 24591 (24.01 KB)
Non DFS Used: 34650931185 (32.27 GB)
DFS Remaining: 270065532928 (251.52 GB)
DFS Used%: 0%
DFS Remaining%: 88.63%
Last contact: Fri Apr 23 15:39:10 CST 2010



(3) Stop the daemons:

zyx@nodeC:~/hadoop-0.20.2/bin$ stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode



Running a simple Java program

1) First create two files, file01 and file02, on the local disk:

[cuijj@station1 ~]$ echo "Hello cuijj bye cuijj" > file01
[cuijj@station1 ~]$ echo "Hello Hadoop Goodbye Hadoop" > file02

2) Create an input directory in HDFS:

[cuijj@station1 ~]$ hadoop dfs -mkdir input

3) Copy file01 and file02 into the input directory in HDFS:

zyx@nodeC:~$ hadoop dfs -copyFromLocal /home/zyx/file0* input

Check that the input directory now exists in HDFS:

zyx@nodeC:~$ hadoop dfs -ls
Found 1 items
drwxr-xr-x   - zyx supergroup          0 2010-04-23 16:40 /user/zyx/input



Check that file01 and file02 were copied into the input directory:

zyx@nodeC:~$ hadoop dfs -ls input
Found 2 items
-rw-r--r--   1 zyx supergroup          0 2010-04-23 16:40 /user/zyx/input/file01
-rw-r--r--   1 zyx supergroup          0 2010-04-23 16:40 /user/zyx/input/file02

Run wordcount (make sure there is no output directory in HDFS yet):

zyx@nodeC:~/hadoop-0.20.2$ hadoop jar hadoop-0.20.2-examples.jar wordcount input output

10/04/24 09:25:10 INFO input.FileInputFormat: Total input paths to process : 2
10/04/24 09:25:11 INFO mapred.JobClient: Running job: job_201004240840_0001
10/04/24 09:25:12 INFO mapred.JobClient:  map 0% reduce 0%
10/04/24 09:25:22 INFO mapred.JobClient:  map 100% reduce 0%
10/04/24 09:25:34 INFO mapred.JobClient:  map 100% reduce 100%
10/04/24 09:25:36 INFO mapred.JobClient: Job complete: job_201004240840_0001
10/04/24 09:25:36 INFO mapred.JobClient: Counters: 17
10/04/24 09:25:36 INFO mapred.JobClient:   Job Counters
10/04/24 09:25:36 INFO mapred.JobClient:     Launched reduce tasks=1
10/04/24 09:25:36 INFO mapred.JobClient:     Launched map tasks=2
10/04/24 09:25:36 INFO mapred.JobClient:     Data-local map tasks=2
10/04/24 09:25:36 INFO mapred.JobClient:   FileSystemCounters
10/04/24 09:25:36 INFO mapred.JobClient:     FILE_BYTES_READ=79
10/04/24 09:25:36 INFO mapred.JobClient:     HDFS_BYTES_READ=50
10/04/24 09:25:36 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=228
10/04/24 09:25:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
10/04/24 09:25:36 INFO mapred.JobClient:   Map-Reduce Framework
10/04/24 09:25:36 INFO mapred.JobClient:     Reduce input groups=5
10/04/24 09:25:36 INFO mapred.JobClient:     Combine output records=6
10/04/24 09:25:36 INFO mapred.JobClient:     Map input records=2
10/04/24 09:25:36 INFO mapred.JobClient:     Reduce shuffle bytes=85
10/04/24 09:25:36 INFO mapred.JobClient:     Reduce output records=5
10/04/24 09:25:36 INFO mapred.JobClient:     Spilled Records=12
10/04/24 09:25:36 INFO mapred.JobClient:     Map output bytes=82
10/04/24 09:25:36 INFO mapred.JobClient:     Combine input records=8
10/04/24 09:25:36 INFO mapred.JobClient:     Map output records=8
10/04/24 09:25:36 INFO mapred.JobClient:     Reduce input records=6



View the results:

zyx@nodeC:~/hadoop-0.20.2$ hadoop fs -cat output/part-r-00000
Goodbye 1
Hadoop  2
Hello   2
bye     1
cuijj   2



Installing MapReduce

Running a MapReduce program









http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)#Excursus:_Hadoop_Distributed_File_System_.28HDFS.29



11. Compiling a .java program against Hadoop:

root@nodeC:/home/zyx/hadoop-0.20.2# javac -classpath /home/zyx/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/zyx/hadoop-0.20.2/lib/commons-cli-1.2.jar -d /home/zyx/wordcount_class /home/zyx/hadoop-0.20.2/src/examples/org/apache/hadoop/examples/WordCount.java

12. Packaging the .class files into a .jar file:

root@nodeC:/home/zyx/wordcount_class/org/apache/hadoop/examples# jar -cvf /home/zyx/wordcount.jar /home/zyx/wordcount_class/.
added manifest
adding: home/zyx/wordcount_class/ (in = 0) (out= 0) (stored 0%)
adding: home/zyx/wordcount_class/org/ (in = 0) (out= 0) (stored 0%)
adding: home/zyx/wordcount_class/org/apache/ (in = 0) (out= 0) (stored 0%)
adding: home/zyx/wordcount_class/org/apache/hadoop/ (in = 0) (out= 0) (stored 0%)
adding: home/zyx/wordcount_class/org/apache/hadoop/examples/ (in = 0) (out= 0) (stored 0%)
adding: home/zyx/wordcount_class/org/apache/hadoop/examples/WordCount.class (in = 1911) (out= 996) (deflated 47%)
adding: home/zyx/wordcount_class/org/apache/hadoop/examples/WordCount$TokenizerMapper.class (in = 1790) (out= 765) (deflated 57%)
adding: home/zyx/wordcount_class/org/apache/hadoop/examples/WordCount$IntSumReducer.class (in = 1789) (out= 746) (deflated 58%)
adding: WordCount.class (in = 1911) (out= 996) (deflated 47%)
adding: WordCount$TokenizerMapper.class (in = 1790) (out= 765) (deflated 57%)
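
Note that because the jar was built from an absolute path, its entries are stored under home/zyx/wordcount_class/... (as the output above shows), so the JVM cannot resolve the classes by their package name org.apache.hadoop.examples. Rebuilding with -C roots the entries at org/ instead; a corrected sketch using the same paths (and the input directory created earlier):

root@nodeC:~# jar -cvf /home/zyx/wordcount.jar -C /home/zyx/wordcount_class .
root@nodeC:~# hadoop jar /home/zyx/wordcount.jar org.apache.hadoop.examples.WordCount input output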



Example: WordCount v1.0

Before diving into the details, let's look at an example Map/Reduce application to get a first sense of how it works.

WordCount is a simple application that counts the number of occurrences of each word in a given data set.

This application works with all three Hadoop installation modes: standalone, pseudo-distributed, and fully distributed.

Source code

WordCount.java:

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

  public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

Usage

Assuming the environment variable HADOOP_HOME points to the installation root and HADOOP_VERSION is the installed Hadoop version, compile WordCount.java and build the jar as follows:

$ mkdir wordcount_classes
$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java
$ jar -cvf /usr/joe/wordcount.jar -C wordcount_classes/ .

Assume that:

/usr/joe/wordcount/input  - the input directory in HDFS
/usr/joe/wordcount/output - the output directory in HDFS

Use sample text files as input:

$ bin/hadoop dfs -ls /usr/joe/wordcount/input/
/usr/joe/wordcount/input/file01
/usr/joe/wordcount/input/file02
$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file01
Hello World Bye World
$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file02
Hello Hadoop Goodbye Hadoop

Run the application:

$ bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input /usr/joe/wordcount/output

The output is:

$ bin/hadoop dfs -cat /usr/joe/wordcount/output/part-00000
Bye 1
Goodbye 1
Hadoop 2
Hello 2
World 2

Applications can use the -files option to specify a comma-separated list of paths that will be made available in the current working directory of each task. The -libjars option adds jars to the classpaths of the map and reduce tasks. The -archives option passes archives as arguments; the archives are unpacked, and a symbolic link (named after the archive) to the unpacked directory is created in the task's current working directory. For more details on command-line options, see the Commands manual.

Running the wordcount example with -libjars and -files:

hadoop jar hadoop-examples.jar wordcount -files cachefile.txt -libjars mylib.jar input output


