Add a user with sudo privileges on the freshly installed Ubuntu system.
root@nodeA:~# sudo adduser zyx
Adding user `zyx' ...
Adding new group `zyx' (1001) ...
Adding new user `zyx' (1001) with group `zyx' ...
Creating home directory `/home/zyx' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for zyx
Enter the new value, or press ENTER for the default
        Full Name []: ^Cadduser: `/usr/bin/chfn zyx' exited from signal 2. Exiting.
root@nodeA:~#
root@nodeA:~# sudo usermod -G admin -a zyx
root@nodeA:~#
Set up passwordless SSH login
(1) On the namenode, enable passwordless login to the local machine
zyx@nodeA:~$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Created directory '/home/zyx/.ssh'.
Your identification has been saved in /home/zyx/.ssh/id_dsa.
Your public key has been saved in /home/zyx/.ssh/id_dsa.pub.
The key fingerprint is:
65:2e:e0:df:2e:61:a5:19:6a:ab:0e:38:45:a9:6a:2b zyx@nodeA
The key's randomart image is:
+--[ DSA 1024]----+
|                 |
|              .  |
|           o . o |
|        o . . +. |
|       . . S = . |
|        .oo. = o |
|       + . o ... |
|      E ......   |
|       ... o...  |
+-----------------+
zyx@nodeA:~$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
zyx@nodeA:~$
(2) Enable passwordless login from the namenode to the other datanodes
hadoop@nodeB:~$ scp hadoop@nodea:/home/hadoop/.ssh/id_dsa.pub /home/hadoop
hadoop@nodea's password:
id_dsa.pub                                    100%  602     0.6KB/s   00:00
hadoop@nodeB:~$ cat id_dsa.pub >> .ssh/authorized_keys
hadoop@nodeB:~$ sudo ufw disable
Copy the JDK file (jdk-6u20-linux-i586.bin) to Linux
Using the F-Secure SSH File Transfer Trial tool, drag and drop it directly.
Installing and configuring jdk-6u20-linux-i586.bin
(1) Installation
zyx@nodeA:~$ ls
Examples  jdk
zyx@nodeA:~$ cd jdk
zyx@nodeA:~/jdk$ ls
jdk-6u20-linux-i586.bin
zyx@nodeA:~/jdk$ chmod a+x jdk-6u20-linux-i586.bin
zyx@nodeA:~/jdk$ ./jdk-6u20-linux-i586.bin
The license agreement is displayed next; accept it with yes and press Enter, and the installation finishes.
zyx@nodeA:~/jdk$ ls
jdk1.6.0_20  jdk-6u20-linux-i586.bin
(2) Configuration
Open .bashrc with root@nodeA:/home/zyx# vi .bashrc, then append the following lines at the end:
export JAVA_HOME=/home/zyx/jdk/jdk1.6.0_20
export JRE_HOME=/home/zyx/jdk/jdk1.6.0_20/jre
export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH:$HOME/bin
Installing Hadoop
Download from:
http://labs.renren.com/apache-mirror/hadoop/core/hadoop-0.20.2/hadoop-0.20.2.tar.gz
Put hadoop-0.20.2.tar.gz under /home/zyx/hadoop, then unpack it:
zyx@nodeB:~/hadoop$ tar -zvxf hadoop-0.20.2.tar.gz
Set the environment variables by adding the following to /home/zyx/.bashrc:
zyx@nodeA:~$ vi .bashrc
export HADOOP_HOME=/home/zyx/hadoop/hadoop-0.20.2
export PATH=$HADOOP_HOME/bin:$PATH
Configuring Hadoop
Set the Java environment in conf/hadoop-env.sh:
export JAVA_HOME=/home/zyx/jdk/jdk1.6.0_20
Configure the conf/masters and conf/slaves files; this is only needed on the namenode.
Configure core-site.xml, hdfs-site.xml and mapred-site.xml:
zyx@nodeC:~/hadoop-0.20.2/conf$ more core-site.xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- <value>hdfs://192.168.1.103:54310</value> -->
    <value>hdfs://192.168.1.103:9000</value>
  </property>
</configuration>
zyx@nodeC:~/hadoop-0.20.2/conf$ more hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
zyx@nodeC:~/hadoop-0.20.2/conf$ more mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <!-- <value>hdfs://192.168.1.103:54320</value> -->
    <value>hdfs://192.168.1.103:9001</value>
  </property>
</configuration>
Running Hadoop
(0) Format the namenode:
zyx@nodeC:~/hadoop-0.20.2/bin$ hadoop namenode -format
(1) Check the running processes with jps:
zyx@nodeC:~/hadoop-0.20.2/bin$ jps
31030 NameNode
31488 TaskTracker
31283 SecondaryNameNode
31372 JobTracker
31145 DataNode
31599 Jps
(2) Check the cluster status
zyx@nodeC:~/hadoop-0.20.2/bin$ hadoop dfsadmin -report
Configured Capacity: 304716488704 (283.79 GB)
Present Capacity: 270065557519 (251.52 GB)
DFS Remaining: 270065532928 (251.52 GB)
DFS Used: 24591 (24.01 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)
Name: 192.168.1.103:50010
Decommission Status : Normal
Configured Capacity: 304716488704 (283.79 GB)
DFS Used: 24591 (24.01 KB)
Non DFS Used: 34650931185 (32.27 GB)
DFS Remaining: 270065532928 (251.52 GB)
DFS Used%: 0%
DFS Remaining%: 88.63%
Last contact: Fri Apr 23 15:39:10 CST 2010
(3) Stop the daemons:
zyx@nodeC:~/hadoop-0.20.2/bin$ stop-all.sh
stopping jobtracker
localhost: stopping tasktracker
stopping namenode
localhost: stopping datanode
localhost: stopping secondarynamenode
Running a simple Java program
First create two files, file01 and file02, on the local disk:
[cuijj@station1 ~]$ echo "Hello cuijj bye cuijj" > file01
[cuijj@station1 ~]$ echo "Hello Hadoop Goodbye Hadoop" > file02
2) Create an input directory in HDFS:
[cuijj@station1 ~]$ hadoop dfs -mkdir input
Copy file01 and file02 into the input directory in HDFS:
zyx@nodeC:~$ hadoop dfs -copyFromLocal /home/zyx/file0* input
Check whether the input directory exists in HDFS:
zyx@nodeC:~$ hadoop dfs -ls
Found 1 items
drwxr-xr-x   - zyx supergroup          0 2010-04-23 16:40 /user/zyx/input
Check whether file01 and file02 were copied into the input directory:
zyx@nodeC:~$ hadoop dfs -ls input
Found 2 items
-rw-r--r--   1 zyx supergroup          0 2010-04-23 16:40 /user/zyx/input/file01
-rw-r--r--   1 zyx supergroup          0 2010-04-23 16:40 /user/zyx/input/file02
Run wordcount (make sure there is no output directory in HDFS yet):
zyx@nodeC:~/hadoop-0.20.2$ hadoop jar hadoop-0.20.2-examples.jar wordcount input output
10/04/24 09:25:10 INFO input.FileInputFormat: Total input paths to process : 2
10/04/24 09:25:11 INFO mapred.JobClient: Running job: job_201004240840_0001
10/04/24 09:25:12 INFO mapred.JobClient:  map 0% reduce 0%
10/04/24 09:25:22 INFO mapred.JobClient:  map 100% reduce 0%
10/04/24 09:25:34 INFO mapred.JobClient:  map 100% reduce 100%
10/04/24 09:25:36 INFO mapred.JobClient: Job complete: job_201004240840_0001
10/04/24 09:25:36 INFO mapred.JobClient: Counters: 17
10/04/24 09:25:36 INFO mapred.JobClient:   Job Counters
10/04/24 09:25:36 INFO mapred.JobClient:     Launched reduce tasks=1
10/04/24 09:25:36 INFO mapred.JobClient:     Launched map tasks=2
10/04/24 09:25:36 INFO mapred.JobClient:     Data-local map tasks=2
10/04/24 09:25:36 INFO mapred.JobClient:   FileSystemCounters
10/04/24 09:25:36 INFO mapred.JobClient:     FILE_BYTES_READ=79
10/04/24 09:25:36 INFO mapred.JobClient:     HDFS_BYTES_READ=50
10/04/24 09:25:36 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=228
10/04/24 09:25:36 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
10/04/24 09:25:36 INFO mapred.JobClient:   Map-Reduce Framework
10/04/24 09:25:36 INFO mapred.JobClient:     Reduce input groups=5
10/04/24 09:25:36 INFO mapred.JobClient:     Combine output records=6
10/04/24 09:25:36 INFO mapred.JobClient:     Map input records=2
10/04/24 09:25:36 INFO mapred.JobClient:     Reduce shuffle bytes=85
10/04/24 09:25:36 INFO mapred.JobClient:     Reduce output records=5
10/04/24 09:25:36 INFO mapred.JobClient:     Spilled Records=12
10/04/24 09:25:36 INFO mapred.JobClient:     Map output bytes=82
10/04/24 09:25:36 INFO mapred.JobClient:     Combine input records=8
10/04/24 09:25:36 INFO mapred.JobClient:     Map output records=8
10/04/24 09:25:36 INFO mapred.JobClient:     Reduce input records=6
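These counters can be cross-checked by hand. Each of the two map tasks processed one file, and the combiner collapses a map's output to one record per distinct word in that file, so Combine output records is the sum of the per-file distinct word counts, while Reduce output records is the number of distinct words overall. A minimal plain-Java sketch of this arithmetic (the class name CounterCheck is just for illustration), assuming the file contents created earlier with echo:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class CounterCheck {
    // Tokens emitted by one map task: one record per word occurrence
    static String[] tokens(String line) {
        return line.split("\\s+");
    }

    // Records emitted by the combiner for one map task: one per distinct word
    static Set<String> distinct(String line) {
        return new HashSet<String>(Arrays.asList(tokens(line)));
    }

    public static void main(String[] args) {
        String file01 = "Hello cuijj bye cuijj";
        String file02 = "Hello Hadoop Goodbye Hadoop";

        int mapOutputRecords = tokens(file01).length + tokens(file02).length;

        Set<String> d1 = distinct(file01);
        Set<String> d2 = distinct(file02);
        int combineOutputRecords = d1.size() + d2.size();

        Set<String> allWords = new HashSet<String>(d1);
        allWords.addAll(d2);
        int reduceOutputRecords = allWords.size();

        System.out.println("Map output records = " + mapOutputRecords);         // 8, as in the log
        System.out.println("Combine output records = " + combineOutputRecords); // 6
        System.out.println("Reduce output records = " + reduceOutputRecords);   // 5
    }
}
```

This is why the job sets the Reduce class as the combiner: shuffling 6 records instead of 8 cuts network traffic without changing the final sums.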
View the result of the run:
zyx@nodeC:~/hadoop-0.20.2$ hadoop fs -cat output/part-r-00000
Goodbye	1
Hadoop	2
Hello	2
bye	1
cuijj	2
Installing MapReduce
Running a MapReduce program
http://www.michael-noll.com/wiki/Running_Hadoop_On_Ubuntu_Linux_(Single-Node_Cluster)#Excursus:_Hadoop_Distributed_File_System_.28HDFS.29
11. Compiling a .java program against Hadoop:
root@nodeC:/home/zyx/hadoop-0.20.2# javac -classpath /home/zyx/hadoop-0.20.2/hadoop-0.20.2-core.jar:/home/zyx/hadoop-0.20.2/lib/commons-cli-1.2.jar -d /home/zyx/wordcount_class /home/zyx/hadoop-0.20.2/src/examples/org/apache/hadoop/examples/WordCount.java
12. Packaging the .class files into a .jar file:
root@nodeC:/home/zyx/wordcount_class/org/apache/hadoop/examples# jar -cvf /home/zyx/wordcount.jar /home/zyx/wordcount_class/.
added manifest
adding: home/zyx/wordcount_class/(in = 0) (out= 0)(stored 0%)
adding: home/zyx/wordcount_class/org/(in = 0) (out= 0)(stored 0%)
adding: home/zyx/wordcount_class/org/apache/(in = 0) (out= 0)(stored 0%)
adding: home/zyx/wordcount_class/org/apache/hadoop/(in = 0) (out= 0)(stored 0%)
adding: home/zyx/wordcount_class/org/apache/hadoop/examples/(in = 0) (out= 0)(stored 0%)
adding: home/zyx/wordcount_class/org/apache/hadoop/examples/WordCount.class(in = 1911) (out= 996)(deflated 47%)
adding: home/zyx/wordcount_class/org/apache/hadoop/examples/WordCount$TokenizerMapper.class(in = 1790) (out= 765)(deflated 57%)
adding: home/zyx/wordcount_class/org/apache/hadoop/examples/WordCount$IntSumReducer.class(in = 1789) (out= 746)(deflated 58%)
adding: WordCount.class(in = 1911) (out= 996)(deflated 47%)
adding: WordCount$TokenizerMapper.class(in = 1790) (out= 765)(deflated 57%)
Example: WordCount v1.0
Before diving into the details, let's look at an example Map/Reduce application to get an initial sense of how it works.
WordCount is a simple application that counts the number of occurrences of each word in a given data set.
It works with all three Hadoop installation modes: standalone, pseudo-distributed, and fully distributed.
Source code
WordCount.java:

package org.myorg;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount {

  public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      String line = value.toString();
      StringTokenizer tokenizer = new StringTokenizer(line);
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}

Usage
Assuming the environment variable HADOOP_HOME points to the installation root directory and HADOOP_VERSION is the installed Hadoop version, compile WordCount.java and create the jar as follows:
$ mkdir wordcount_classes
$ javac -classpath ${HADOOP_HOME}/hadoop-${HADOOP_VERSION}-core.jar -d wordcount_classes WordCount.java
$ jar -cvf /usr/joe/wordcount.jar -C wordcount_classes/ .
Assuming:
/usr/joe/wordcount/input  - the input path in HDFS
/usr/joe/wordcount/output - the output path in HDFS
Use the sample text files as input:
$ bin/hadoop dfs -ls /usr/joe/wordcount/input/
/usr/joe/wordcount/input/file01
/usr/joe/wordcount/input/file02
$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file01
Hello World Bye World
$ bin/hadoop dfs -cat /usr/joe/wordcount/input/file02
Hello Hadoop Goodbye Hadoop
Run the application:
$ bin/hadoop jar /usr/joe/wordcount.jar org.myorg.WordCount /usr/joe/wordcount/input /usr/joe/wordcount/output
The output is:
$ bin/hadoop dfs -cat /usr/joe/wordcount/output/part-00000
Bye	1
Goodbye	1
Hadoop	2
Hello	2
World	2
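Since the job's logic is just tokenize-then-sum, this output can be sanity-checked without a cluster. Below is a plain-Java sketch of the same map step (StringTokenizer) and reduce step (summing counts), using a TreeMap so the words come out in the sorted order seen in part-00000; the class name LocalWordCount is just for illustration:

```java
import java.util.Map;
import java.util.SortedMap;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {
    // Mirrors WordCount: map = tokenize each line, reduce = sum per-word counts
    static SortedMap<String, Integer> count(String... lines) {
        SortedMap<String, Integer> counts = new TreeMap<String, Integer>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                String word = tokenizer.nextToken();
                Integer sum = counts.get(word);
                counts.put(word, sum == null ? 1 : sum + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // The contents of file01 and file02 from the example input
        SortedMap<String, Integer> counts =
            count("Hello World Bye World", "Hello Hadoop Goodbye Hadoop");
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

TreeMap's lexicographic ordering matches the byte-order sorting of Hadoop's Text keys for ASCII words, which is why the cluster output appears in the same order.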
Applications can use the -files option to specify a comma-separated list of paths to be made available in the task's current working directory. The -libjars option adds jars to the classpath of the maps and reduces. The -archives option passes archives as arguments; these archives are unpacked, and a symbolic link named after the archive is created in the task's current working directory pointing to the unpacked directory. See the Commands manual for more details on command-line options.
Running the wordcount example with -libjars and -files:
hadoop jar hadoop-examples.jar wordcount -files cachefile.txt -libjars mylib.jar input output