MapReduce Beginner Programming: Word Count

 青语csl 2020-05-15

Lab No.:

Lab Project: MapReduce Programming Beginner Practice - Word Count

Lab Grade:


I. Lab Objectives

1) Understand the basic methods of MapReduce programming;

2) Master solving the word count problem with the MapReduce programming model.

II. Lab Contents

1) Create a new project in Eclipse and add the JAR packages required for MapReduce/Hadoop development;

2) Write a MapReduce-based word count program (containing a map method, a reduce method, and a main method);

3) Package the word count program into a JAR file;

4) Start Hadoop, create the input folder on HDFS, and upload the files to be processed into it (do not pre-create the output folder: the job creates it itself and will fail if it already exists);

5) Run the word count program (the JAR file);

6) View the results (saved in the output folder).

III. Lab Procedure and Analysis

Prerequisites: Eclipse and CentOS 7

(1) Download and install the Hadoop-Eclipse-Plugin

i. Download the plugin to the desktop

ii. Unpack it and copy hadoop-eclipse-plugin-2.6.0.jar from the release folder into the plugins directory under the Eclipse installation directory, then run the eclipse -clean command so the plugin takes effect (a command sketch follows)
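A minimal sketch of these two steps, assuming the archive was saved to the desktop and Eclipse lives under /usr/lib/eclipse (the archive name and both paths are assumptions; adjust them to your machine):

    cd ~/Desktop
    unzip hadoop2x-eclipse-plugin-master.zip        # archive name is an assumption
    sudo cp hadoop2x-eclipse-plugin-master/release/hadoop-eclipse-plugin-2.6.0.jar \
            /usr/lib/eclipse/plugins/
    /usr/lib/eclipse/eclipse -clean                 # relaunch so Eclipse rescans its plugins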

(2) Configure the Hadoop-Eclipse-Plugin

i. Start Hadoop

ii. Start Eclipse and choose Preferences under the Window menu

iii. Switch to the Map/Reduce development perspective:

Window -> Perspective -> Open Perspective -> Other

iv. Establish a connection to the Hadoop cluster: click the Map/Reduce Locations panel in Eclipse, right-click inside the panel, and choose New Hadoop Location
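When filling in the New Hadoop Location dialog, the DFS Master host and port must match the fs.defaultFS value in the cluster's core-site.xml. For a typical pseudo-distributed setup that entry looks like the following (hdfs://localhost:9000 is an assumption; check your own file):

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://localhost:9000</value>
        </property>
    </configuration>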

(3) Operate on the files in HDFS from within Eclipse; the results can be viewed under the output folder

(4) Create a MapReduce project

(5) Write the code

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
//      String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        String[] otherArgs = new String[]{"input", "output"}; // set the input and output paths directly
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <in> [<in>...] <out>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        // the reducer doubles as a combiner, pre-aggregating counts on the map side
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // every argument except the last is an input path; the last is the output path
        for (int i = 0; i < otherArgs.length - 1; ++i) {
            FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }

    // Reducer (and combiner): sums all the 1s emitted for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

        private IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    // Mapper: tokenizes each input line on whitespace and emits (word, 1) per token
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {

        private static final IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }
}
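As a quick sanity check of the data flow, suppose (purely hypothetically) the input holds the single line "Hello Hadoop Hello World". The mapper emits (Hello, 1), (Hadoop, 1), (Hello, 1), (World, 1); the combiner/reducer sums the values per key, and since keys are sorted before the reduce, the output file would read:

    Hadoop	1
    Hello	2
    World	1

(tab-separated key/value pairs, as written by the default TextOutputFormat.)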

(6) Package the code

i. Compile the Java program

ii. Package the generated .class files (both steps are sketched below)
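Both steps can be done from the command line; here is a sketch, assuming WordCount.java sits in the current directory (the classes directory and the JAR name are illustrative):

    # compile against the Hadoop libraries; `hadoop classpath` prints them
    mkdir -p classes
    javac -classpath "$(hadoop classpath)" -d classes WordCount.java
    # bundle the generated .class files into WordCount.jar
    jar -cvf WordCount.jar -C classes .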

(7) Create the input files
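A sketch of preparing the input on HDFS (the local file name is illustrative; relative HDFS paths resolve under /user/<your-username>):

    # create the input folder in the HDFS home directory
    hdfs dfs -mkdir -p input
    # upload a local text file into it
    hdfs dfs -put ~/wordfile.txt input
    # confirm the upload
    hdfs dfs -ls input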

(8) Run the JAR file

View the run results:
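With the JAR packaged as above, the job can be launched and its output printed like so (remember that the output folder must not exist before the run):

    # run the job; the input/output paths are hardcoded in main()
    hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount
    # print the result files
    hdfs dfs -cat output/part-r-*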

IV. Lab Summary (Reflections)

Problems encountered during this lab:

(1) The NameNode was stuck in safe mode

Solution: take the NameNode out of safe mode
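This is done with the stock dfsadmin command:

    # check the current state, then force the NameNode out of safe mode
    hdfs dfsadmin -safemode get
    hdfs dfsadmin -safemode leave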

(2) Files could not be uploaded to HDFS, and files that did upload had no content

Solution: restart HDFS

When the uploads first failed, a Baidu search said the cause was that the firewalls on the DataNode and NameNode had not been turned off; after running a string of commands, I found the firewall had long since been disabled. Later, after I managed to upload a file through Eclipse, I found that no matter what I did, the file would not store any content. At that point, every fix Baidu could offer me was still "turn off the firewall", each attached to yet another hundred-word wall of commands. I thought I was in for a fight to the death with the firewall that day. Then, on impulse, I simply restarted HDFS, and the results appeared. What a marvelous day!
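For the record, restarting HDFS on a pseudo-distributed install is just the two stock scripts under $HADOOP_HOME/sbin:

    stop-dfs.sh     # stops the NameNode, DataNode(s), and SecondaryNameNode
    start-dfs.sh    # starts them again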
