MapReduce Quick Start ①: Implementing WordCount
Published: 2019-05-24

1. Data to process

hello word
word count
hello MapReduce

2. Create the Maven project and edit pom.xml

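<repositories>
    <repository>
        <id>cloudera</id>
        <url>https://repository.cloudera.com/artifactory/cloudera-repos/</url>
    </repository>
</repositories>
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.6.0-mr1-cdh5.14.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.6.0-cdh5.14.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.6.0-cdh5.14.0</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-mapreduce-client-core</artifactId>
        <version>2.6.0-cdh5.14.0</version>
    </dependency>
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.11</version>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.testng</groupId>
        <artifactId>testng</artifactId>
        <version>RELEASE</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-compiler-plugin</artifactId>
            <version>3.0</version>
            <configuration>
                <source>1.8</source>
                <target>1.8</target>
                <encoding>UTF-8</encoding>
            </configuration>
        </plugin>
        <plugin>
            <groupId>org.apache.maven.plugins</groupId>
            <artifactId>maven-shade-plugin</artifactId>
            <version>2.4.3</version>
            <executions>
                <execution>
                    <phase>package</phase>
                    <goals>
                        <goal>shade</goal>
                    </goals>
                    <configuration>
                        <!-- element name assumed to be minimizeJar; only the value survived -->
                        <minimizeJar>true</minimizeJar>
                    </configuration>
                </execution>
            </executions>
        </plugin>
    </plugins>
</build>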

3. Write the Mapper class

package com.czxy.wordCount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class WordCountMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Convert the Text value (one input line) to a String
        String s = value.toString();
        // Split the line on spaces
        String[] split = s.split(" ");
        // Loop over the tokens and emit key = word, value = 1
        for (String s1 : split) {
            context.write(new Text(s1), new LongWritable(1));
        }
    }
}
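
To see what the map step emits without starting Hadoop, the same split-and-emit logic can be sketched in plain Java. This is only an illustration: the class name MapPhaseSketch and the helper mapLine are hypothetical, and plain String/Long pairs stand in for Text and LongWritable.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for WordCountMapper that runs without Hadoop.
class MapPhaseSketch {

    // Mirrors WordCountMapper.map: split the line on a single space
    // and emit one (word, 1) pair per token.
    static List<Map.Entry<String, Long>> mapLine(String line) {
        List<Map.Entry<String, Long>> emitted = new ArrayList<>();
        for (String word : line.split(" ")) {
            emitted.add(new SimpleEntry<>(word, 1L));
        }
        return emitted;
    }

    public static void main(String[] args) {
        // The three sample lines from section 1.
        String[] lines = {"hello word", "word count", "hello MapReduce"};
        for (String line : lines) {
            for (Map.Entry<String, Long> pair : mapLine(line)) {
                System.out.println(pair.getKey() + "\t" + pair.getValue());
            }
        }
    }
}
```

Each input line yields one pair per token, so the three sample lines produce six (word, 1) pairs in total. Nothing is summed at this stage; that is left to the reducer.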

4. Write the Reducer class

package com.czxy.wordCount;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class WordCountReduce extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {
        // Running total of how many times this word appears
        // (long, to match LongWritable and avoid int overflow)
        long sumCount = 0;
        for (LongWritable value : values) {
            sumCount += value.get();
        }
        // Emit the result: key = word, value = total count
        context.write(key, new LongWritable(sumCount));
    }
}
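
Between map and reduce, the framework groups all values by key and hands each key to the reducer in sorted order. A rough plain-Java sketch of that grouping plus the summing loop is shown here. The class ReducePhaseSketch and its methods are hypothetical, and a TreeMap merely imitates the sorted-by-key behaviour; it is not how Hadoop implements the shuffle.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in for the shuffle step plus WordCountReduce.
class ReducePhaseSketch {

    // Same summing loop as WordCountReduce.reduce.
    static long reduce(Iterable<Long> values) {
        long sum = 0;
        for (long v : values) {
            sum += v;
        }
        return sum;
    }

    // Groups the emitted keys (each carrying the value 1), then reduces each group.
    static Map<String, Long> shuffleAndReduce(List<String> emittedKeys) {
        // TreeMap keeps keys sorted, imitating the sorted key order a reducer sees.
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (String key : emittedKeys) {
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(1L);
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> entry : grouped.entrySet()) {
            result.put(entry.getKey(), reduce(entry.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        // Keys emitted by the map step for the sample data in section 1.
        List<String> emitted = Arrays.asList("hello", "word", "word", "count", "hello", "MapReduce");
        for (Map.Entry<String, Long> entry : shuffleAndReduce(emitted).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

For the sample data this gives MapReduce 1, count 1, hello 2, word 2. The uppercase key sorts first because keys are compared by character code.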

5. Write the driver class

package com.czxy.wordCount;

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCountDriver extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        // Get the job, reusing the Configuration that ToolRunner injected
        Job job = Job.getInstance(getConf());
        // Let Hadoop locate the job jar from this class
        job.setJarByClass(WordCountDriver.class);
        // Set the mapper class
        job.setMapperClass(WordCountMapper.class);
        // Set the map output key type
        job.setMapOutputKeyClass(Text.class);
        // Set the map output value type
        job.setMapOutputValueClass(LongWritable.class);
        // Set the reducer class
        job.setReducerClass(WordCountReduce.class);
        // Set the reduce output key type
        job.setOutputKeyClass(Text.class);
        // Set the reduce output value type
        job.setOutputValueClass(LongWritable.class);
        // Set the input format and input path
        job.setInputFormatClass(TextInputFormat.class);
        TextInputFormat.addInputPath(job, new Path("./data/wordCount/"));
        // Set the output format and output path
        job.setOutputFormatClass(TextOutputFormat.class);
        TextOutputFormat.setOutputPath(job, new Path("./outPut/wordCount/"));
        // Submit the job and wait for it to finish
        boolean b = job.waitForCompletion(true);
        return b ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        // Run the driver and pass its exit code back to the shell
        System.exit(ToolRunner.run(new WordCountDriver(), args));
    }
}
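
One practical point when re-running the driver: Hadoop refuses to start a job whose output directory already exists, so a second run against ./outPut/wordCount/ fails until that directory is removed. Assuming the job runs against the local file system, as the relative paths above suggest, a small helper along these lines could clear it first. The class OutputDirCleaner is hypothetical and uses only java.nio; on HDFS the directory would have to be removed through Hadoop's own file system tools instead.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: removes a local output directory before a re-run.
class OutputDirCleaner {

    // Deletes dir and everything under it; does nothing if dir is absent.
    static void deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order puts children before their parent directories.
            List<Path> deepestFirst = walk.sorted(Comparator.reverseOrder())
                                          .collect(Collectors.toList());
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same relative output path that WordCountDriver writes to.
        deleteRecursively(Paths.get("./outPut/wordCount/"));
    }
}
```

Running this before WordCountDriver (or calling deleteRecursively at the start of main) lets the job be repeated without deleting the folder by hand.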

6. Execution result

MapReduce	1
count	1
hello	2
word	2


Reposted from: http://zakzi.baihongyu.com/
