
How to output debugging information from map and reduce in Hadoop (MapReduce)

whbsdu 2015-01-08


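Debugging Hadoop is fairly troublesome. A job's results normally come out only through the reducers, so we can send debugging information through reduce as well and have it written to a dedicated output file.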

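We can give every debugging record the key "debug" and put the debugging message in the value.
(1) In map, emit the debug record directly:
output.collect(new Text("debug"), new Text("debug info"));
(2) In reduce, check for the debug key: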

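// key is a Text, so compare its string form: key.equals("debug") is always false
if (key.toString().equals("debug")) {
    // copy every debug message straight to the output
    while (values.hasNext()) {
        String line = values.next().toString();
        output.collect(new Text("debug"), new Text(line));
    }
    return; // do not run the normal reduce logic on debug records
}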
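To see the whole flow in one place, here is a plain-Java simulation that needs no Hadoop installation. It is only a sketch under stated assumptions. A TreeMap stands in for the shuffle, which groups values by key and sorts the keys. The class and method names (DebugKeySimulation, map, reduce, run) are hypothetical. The "normal" reduce logic is reduced to counting the values for a key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation (no Hadoop needed) of the "debug" key trick:
// map emits a normal record plus a debug record keyed "debug",
// a TreeMap stands in for the shuffle/sort, and reduce passes debug values through.
final class DebugKeySimulation {
    static final String DEBUG_KEY = "debug";

    // Stand-in for the map side: one normal record and one debug record per input line.
    static void map(String line, Map<String, List<String>> shuffle) {
        emit(shuffle, "avg", line.trim());
        emit(shuffle, DEBUG_KEY, "map saw: " + line);
    }

    // Groups values under their key, as the shuffle would.
    static void emit(Map<String, List<String>> shuffle, String key, String value) {
        shuffle.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    // Stand-in for the reduce side: debug values are copied to the output unchanged;
    // any other key gets the "normal" logic, which here just counts its values.
    static List<String> reduce(String key, List<String> values) {
        List<String> out = new ArrayList<>();
        if (key.equals(DEBUG_KEY)) {
            for (String v : values) {
                out.add(DEBUG_KEY + "\t" + v);
            }
            return out;
        }
        out.add(key + "\t" + values.size());
        return out;
    }

    // Runs map over every line, then reduce once per key in sorted key order.
    static List<String> run(List<String> input) {
        Map<String, List<String>> shuffle = new TreeMap<>();
        for (String line : input) {
            map(line, shuffle);
        }
        List<String> output = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : shuffle.entrySet()) {
            output.addAll(reduce(e.getKey(), e.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        run(List.of("90", "80")).forEach(System.out::println);
    }
}
```

The debug records travel through the same shuffle as the real data and only diverge in reduce. This is why the reducer check must run before the normal logic.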


(3) Add a ReportOutFormat class

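// required imports (at the top of the enclosing source file)
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordWriter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputFormat;
import org.apache.hadoop.util.Progressable;

public static class ReportOutFormat<K extends WritableComparable<?>, V extends Writable>
        extends MultipleOutputFormat<K, V> {
    private TextOutputFormat<K, V> theTextOutputFormat = null;

    @Override
    protected RecordWriter<K, V> getBaseRecordWriter(FileSystem fs,
            JobConf job, String name, Progressable progress) throws IOException {
        // create one TextOutputFormat lazily and reuse it for every output file
        if (theTextOutputFormat == null) {
            theTextOutputFormat = new TextOutputFormat<K, V>();
        }
        return theTextOutputFormat.getRecordWriter(fs, job, name, progress);
    }

    @Override
    protected String generateFileNameForKeyValue(K key, V value, String name) {
        // note this check: key is a Text, so compare its string form, not key.equals("debug")
        if (key.toString().equals("debug")) {
            return "debug" + name; // e.g. debugpart-00000
        }
        return name; // normal records keep the default file name
    }
}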
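The key comparison matters because Hadoop's Text.equals returns true only for another Text. Comparing a Text key with the String "debug" is therefore always false, and nothing would ever be routed to the debug file. The sketch below shows this with a hypothetical FakeText class that copies that equality rule. FakeText is a plain-Java stand-in, not Hadoop's class. The fileNameFor method only mirrors what generateFileNameForKeyValue is meant to return.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Plain-Java sketch (no Hadoop needed) of the key check in generateFileNameForKeyValue.
final class DebugFileRouting {
    // Hypothetical stand-in for org.apache.hadoop.io.Text: like the real class,
    // equals() only matches another Text-like object, never a java.lang.String.
    static final class FakeText {
        private final byte[] bytes;

        FakeText(String s) {
            this.bytes = s.getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof FakeText && Arrays.equals(bytes, ((FakeText) o).bytes);
        }

        @Override
        public int hashCode() {
            return Arrays.hashCode(bytes);
        }

        @Override
        public String toString() {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    }

    // The original check: a FakeText compared with a String, so always false.
    static boolean isDebugBuggy(Object key) {
        return key.equals("debug");
    }

    // The corrected check: compare the key's string form.
    static boolean isDebugFixed(Object key) {
        return "debug".equals(key.toString());
    }

    // Mirrors the intended routing: debug records get a "debug" prefix on the leaf file name.
    static String fileNameFor(Object key, String name) {
        return isDebugFixed(key) ? "debug" + name : name;
    }

    public static void main(String[] args) {
        FakeText key = new FakeText("debug");
        System.out.println(isDebugBuggy(key));                              // false
        System.out.println(isDebugFixed(key));                              // true
        System.out.println(fileNameFor(key, "part-00000"));                 // debugpart-00000
        System.out.println(fileNameFor(new FakeText("avg"), "part-00000")); // part-00000
    }
}
```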


(4) Add the following to configJob

protected void configJob(JobConf conf)
{
    conf.setMapOutputKeyClass(Text.class);
    conf.setMapOutputValueClass(Text.class);
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(Text.class);
    conf.setOutputFormat(ReportOutFormat.class); // add this line
}


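With this in place, the debugging records end up in their own output files, whose names start with "debug" (for example debugpart-00000). The approach is rather roundabout, and I am not sure whether there is a more convenient way.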

In addition, there are other ways to debug:

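You can use context.getCounter to create counters. Their values are printed to the console together with the job's other counters when the job completes:
Counter countPrint1 = context.getCounter("Map loop strScore", strScore);
countPrint1.increment(1L);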

This requires the import: import org.apache.hadoop.mapreduce.Counter;
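Every distinct (group, name) pair becomes its own counter. Passing a data value such as strScore as the counter name therefore creates one counter per distinct value. Hadoop limits how many counters a job may create (mapreduce.job.counters.max, 120 by default in Hadoop 2.x), so this only suits small debugging runs. The sketch below mimics getCounter/increment to make the effect visible. It is a hypothetical, Hadoop-free stand-in (CounterSketch, with the limit hard-coded as an assumption), not Hadoop's implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for Hadoop counters: a counter is addressed by (group, name),
// created on first use, and incremented in place. Runs without Hadoop.
final class CounterSketch {
    static final class Counter {
        private long value;

        void increment(long incr) {
            value += incr;
        }

        long getValue() {
            return value;
        }
    }

    // Assumed cap, mirroring Hadoop's default per-job counter limit.
    static final int MAX_COUNTERS = 120;

    private final Map<String, Map<String, Counter>> groups = new LinkedHashMap<>();
    private int total;

    // Returns the existing counter for (group, name), or creates it if the cap allows.
    Counter getCounter(String group, String name) {
        Map<String, Counter> g = groups.computeIfAbsent(group, k -> new LinkedHashMap<>());
        Counter c = g.get(name);
        if (c == null) {
            if (total >= MAX_COUNTERS) {
                throw new IllegalStateException("too many counters: " + (total + 1));
            }
            c = new Counter();
            g.put(name, c);
            total++;
        }
        return c;
    }

    int counterCount() {
        return total;
    }

    public static void main(String[] args) {
        CounterSketch ctx = new CounterSketch();
        for (String strScore : new String[] {"90", "80", "90"}) {
            // the data value is the counter name: one counter per distinct value
            ctx.getCounter("Map loop strScore", strScore).increment(1L);
        }
        System.out.println(ctx.counterCount());                                   // 2
        System.out.println(ctx.getCounter("Map loop strScore", "90").getValue()); // 2
    }
}
```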

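A complete example follows. To try it out, add the two classes below to your project, or simply paste the following lines into an existing map or reduce method:
Counter countPrint1 = context.getCounter("Map loop strScore", "output message");
countPrint1.increment(1L);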

Example:

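// imports (at the top of the enclosing source file)
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Counter;
import org.apache.hadoop.mapreduce.Mapper;

// Mapper class
static class MyMapper extends
        Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String line = value.toString();
        // one counter per distinct input line, so each line shows up in the job's counters
        Counter countPrint = context.getCounter("Map output value", line);
        countPrint.increment(1L);
        // split the input into lines (with TextInputFormat each value is already one line)
        StringTokenizer tokenizerArticle = new StringTokenizer(line, "\n");
        // process each line
        while (tokenizerArticle.hasMoreElements()) {
            // split the line on whitespace; the score is assumed to be the last field,
            // e.g. "Tom 90" or just "90"
            StringTokenizer tokenizerLine = new StringTokenizer(
                    tokenizerArticle.nextToken());
            String strScore = null;
            while (tokenizerLine.hasMoreTokens()) {
                strScore = tokenizerLine.nextToken(); // score part
            }
            if (strScore == null) {
                continue; // blank line
            }
            Counter countPrint1 = context.getCounter("Map loop strScore", strScore);
            countPrint1.increment(1L);
            int scoreInt = Integer.parseInt(strScore);
            // emit every score under the single key "avg" so one reduce call averages them all
            context.write(new Text("avg"), new LongWritable(scoreInt));
        }
    }
}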
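Parsing errors in map() are easiest to find before the job runs on a cluster. The per-line parsing can be pulled out into plain Java and checked locally. This sketch assumes each input line ends with an integer score ("Tom 90" or just "90"), because the original does not show the input format. The ScoreParser class and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical plain-Java copy of the mapper's parsing step, runnable without Hadoop.
// Assumes the score is the last whitespace-separated field on each line.
final class ScoreParser {
    // Returns the score on this line, or null for a blank line.
    static Integer parseScore(String line) {
        StringTokenizer tokens = new StringTokenizer(line);
        String last = null;
        while (tokens.hasMoreTokens()) {
            last = tokens.nextToken();
        }
        return last == null ? null : Integer.parseInt(last);
    }

    // Parses every line, skipping blank ones.
    static List<Integer> parseAll(List<String> lines) {
        List<Integer> scores = new ArrayList<>();
        for (String line : lines) {
            Integer s = parseScore(line);
            if (s != null) {
                scores.add(s);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        System.out.println(parseAll(List.of("Tom 90", "85", "   ", "Ann 70"))); // [90, 85, 70]
    }
}
```

A line whose last field is not a number still throws NumberFormatException here, just as it would inside map(). The difference is that it fails in seconds instead of after a job submission.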


// additional import needed by the reducer
import org.apache.hadoop.mapreduce.Reducer;

// Reducer class
static class MyReduce extends
        Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text k2, Iterable<LongWritable> v2s,
            Context ctx) throws java.io.IOException, InterruptedException {
        long sum = 0;
        long count = 0;
        for (LongWritable v : v2s) {
            sum += v.get(); // total of all scores
            count++;        // number of scores
        }
        long average = sum / count; // integer division: the fractional part is dropped
        ctx.write(k2, new LongWritable(average));
        // counts how many times reduce() is called
        Counter countPrint1 = ctx.getCounter("Reduce call count", "calls");
        countPrint1.increment(1L);
    }
}
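The reducer's arithmetic can also be checked outside Hadoop. Both sum and count are long, so sum / count is integer division and truncates. The hypothetical AverageSketch class below shows the difference from a floating-point average. It is a plain-Java sketch, not part of the original job.

```java
import java.util.List;

// Hypothetical plain-Java copy of MyReduce's arithmetic (no Hadoop needed).
final class AverageSketch {
    // Same as the reducer: long sum / long count, so the result is truncated.
    static long integerAverage(List<Long> scores) {
        long sum = 0;
        long count = 0;
        for (long s : scores) {
            sum += s;
            count++;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no scores");
        }
        return sum / count;
    }

    // Floating-point alternative, for when the fractional part matters.
    static double exactAverage(List<Long> scores) {
        if (scores.isEmpty()) {
            throw new IllegalArgumentException("no scores");
        }
        long sum = 0;
        for (long s : scores) {
            sum += s;
        }
        return (double) sum / scores.size();
    }

    public static void main(String[] args) {
        List<Long> scores = List.of(90L, 80L, 75L);
        System.out.println(integerAverage(scores)); // 81
        System.out.println(exactAverage(scores));   // approximately 81.67
    }
}
```

In the real reducer, count is never zero because reduce() is only called for keys that have at least one value. The guard only matters for this standalone sketch.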

