
Configuring Dependencies in IntelliJ IDEA ---- Hadoop in a Plain Java Project

 dzh1121 2015-04-24
  1. Create a plain Java project.

  2. Add the dependencies.

    1. File -> Project Structure... -> Modules -> Dependencies -> + -> Library... -> Java

    2. Select every folder under /usr/local/Cellar/hadoop/2.5.2/libexec/share/hadoop except httpfs. (On Ubuntu the path is /usr/local/hadoop/share/hadoop/.)

    3. The Name can be anything, e.g. "common"; click OK.

    4. + -> Jars or directories...

    5. Select /usr/local/Cellar/hadoop/2.5.2/libexec/share/hadoop/common/lib. (On Ubuntu the path is /usr/local/hadoop/share/hadoop/common/lib.)

    At this point the Dependencies list should contain two new entries in total: one "common" library and one "lib" directory. (A quick check that the classes actually resolve is sketched after this list.)

  3. Edit Artifacts in Project Structure to add a configuration for generating the jar.
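One quick way to confirm the dependency setup is to compile and run a small class that touches the Hadoop API. This is a minimal sketch, not part of the original project; the class name is arbitrary.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.VersionInfo;

public class HadoopClasspathCheck {
    public static void main(String[] args) {
        // If the Hadoop jars added in the steps above are on the module classpath,
        // this compiles and prints the library version.
        Configuration conf = new Configuration();
        System.out.println("Hadoop version: " + VersionInfo.getVersion());
        System.out.println("fs.defaultFS = " + conf.get("fs.defaultFS", "(not set)"));
    }
}

If this compiles and runs, the project is ready for the MapReduce code below.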

Generating the HelloHadoop jar

Generating the jar is also fairly simple:
1. Choose the menu File->Project Structure to open the Project Structure dialog.
2. Select Artifacts on the left and click the "+" button at the top.
3. In the popup, choose Jar->From modules with dependencies....
4. Select the class to launch, then click OK.
5. After applying, the dialog closes. In IDEA choose the menu Build->Build Artifacts, then Build or Rebuild to generate the jar; the generated jar file is placed under out/artifacts in the project directory.

(This should be enough on Linux, but it does not work on Mac OS X, because on OS X the filesystem is case-insensitive by default. One workaround is to switch to Maven to build the project.)



With the project set up, we can now write the code, compile it, and package it.



Writing the WordCount MapReduce program

The official example code is used directly here:

import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class WordCount {

    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "wordcount");
        job.setJarByClass(WordCount.class); // Note: this line must be added, otherwise Hadoop cannot find the corresponding class
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        job.waitForCompletion(true);
    }
}

Note that the line job.setJarByClass(WordCount.class); must be added to the official code; without it Hadoop cannot locate the job's classes when the job runs on the cluster. A detailed explanation can be found here.


Running the HelloHadoop jar

Copy the generated HelloHadoop.jar to the name node of the Hadoop cluster.
For testing purposes, a simple test data file wctest.txt was written:

this is hadoop test string
hadoop hadoop
test test
string string string

Upload the test file to HDFS:

[hdfs@172-22-195-15 data]$ hdfs dfs -mkdir /user/chenbiaolong/wc_test_input
[hdfs@172-22-195-15 data]$ hdfs dfs -put wctest.txt /user/chenbiaolong/wc_test_input

cd to the directory containing the jar and run the HelloHadoop jar:

[hdfs@172-22-195-15 code]$ cd WorkCount/
[hdfs@172-22-195-15 WorkCount]$ ls
HelloHadoop.jar
[hdfs@172-22-195-15 WorkCount]$ hadoop jar HelloHadoop.jar WordCount /user/chenbiaolong/wc_test_input /user/chenbiaolong/wc_test_output
15/03/26 15:54:19 INFO impl.TimelineClientImpl: Timeline service address: http://:8188/ws/v1/timeline/
15/03/26 15:54:19 INFO client.RMProxy: Connecting to ResourceManager at /172.22.195.17:8050
15/03/26 15:54:20 WARN mapreduce.JobSubmitter: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
15/03/26 15:54:20 INFO input.FileInputFormat: Total input paths to process : 1
15/03/26 15:54:21 INFO mapreduce.JobSubmitter: number of splits:1
15/03/26 15:54:21 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1427255014010_0005
15/03/26 15:54:21 INFO impl.YarnClientImpl: Submitted application application_1427255014010_0005
15/03/26 15:54:21 INFO mapreduce.Job: The url to track the job: http://172-22-195-17.com:8088/proxy/application_1427255014010_0005/
15/03/26 15:54:21 INFO mapreduce.Job: Running job: job_1427255014010_0005
15/03/26 15:54:28 INFO mapreduce.Job: Job job_1427255014010_0005 running in uber mode : false
15/03/26 15:54:28 INFO mapreduce.Job: map 0% reduce 0%
15/03/26 15:54:34 INFO mapreduce.Job: map 100% reduce 0%
15/03/26 15:54:41 INFO mapreduce.Job: map 100% reduce 100%
15/03/26 15:54:42 INFO mapreduce.Job: Job job_1427255014010_0005 completed successfully
15/03/26 15:54:43 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=150
FILE: Number of bytes written=225815
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=210
HDFS: Number of bytes written=37
HDFS: Number of read operations=6
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Rack-local map tasks=1
Total time spent by all maps in occupied slots (ms)=4133
Total time spent by all reduces in occupied slots (ms)=4793
Total time spent by all map tasks (ms)=4133
Total time spent by all reduce tasks (ms)=4793
Total vcore-seconds taken by all map tasks=4133
Total vcore-seconds taken by all reduce tasks=4793
Total megabyte-seconds taken by all map tasks=16928768
Total megabyte-seconds taken by all reduce tasks=19632128
Map-Reduce Framework
Map input records=4
Map output records=12
Map output bytes=120
Map output materialized bytes=150
Input split bytes=137
Combine input records=0
Combine output records=0
Reduce input groups=5
Reduce shuffle bytes=150
Reduce input records=12
Reduce output records=5
Spilled Records=24
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=91
CPU time spent (ms)=3040
Physical memory (bytes) snapshot=1466998784
Virtual memory (bytes) snapshot=8678326272
Total committed heap usage (bytes)=2200961024
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=73
File Output Format Counters
Bytes Written=37
[hdfs@172-22-195-15 WorkCount]$

The results are written to /user/chenbiaolong/wc_test_output:

[hdfs@172-22-195-15 WorkCount]$ hdfs dfs -ls /user/chenbiaolong/wc_test_output
Found 2 items
-rw-r--r-- 3 hdfs hdfs 0 2015-03-26 15:54 /user/chenbiaolong/wc_test_output/_SUCCESS
-rw-r--r-- 3 hdfs hdfs 37 2015-03-26 15:54 /user/chenbiaolong/wc_test_output/part-r-00000
[hdfs@172-22-195-15 WorkCount]$ hdfs dfs -cat /user/chenbiaolong/wc_test_output/part-r-00000
hadoop 3
is 1
string 4
test 3
this 1
[hdfs@172-22-195-15 WorkCount]$

As the output shows, we obtained the correct result.
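Incidentally, the job log above warns that "Hadoop command-line option parsing not performed" and suggests implementing the Tool interface and launching the job with ToolRunner. Below is a rough sketch of such a driver; it assumes it can reuse the Map and Reduce classes from the WordCount code above, and the class name WordCountDriver is made up for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // getConf() already contains any generic options parsed by ToolRunner
        Job job = Job.getInstance(getConf(), "wordcount");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCount.Map.class);
        job.setReducerClass(WordCount.Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // ToolRunner strips generic options (-D, -files, ...) before calling run()
        System.exit(ToolRunner.run(new Configuration(), new WordCountDriver(), args));
    }
}

It is run the same way with hadoop jar, passing the driver class name instead of WordCount; generic options such as -D key=value are then parsed before run() is called, and the warning disappears.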
