A map of the Hadoop ecosystem, listing in detail the data tools that have grown up around it.
- It all started with the arrival of the web data explosion
- A system to crawl that web data - Nutch
- How to store such huge volumes? With a distributed file system, of course - HDFS
- How to use the data? Analyze and process it
- The MapReduce framework lets you write code to analyze big data - in Java, or in any language via Streaming/Pipes
- Collecting and processing unstructured data (web logs, click streams, Apache and server logs) - FUSE, WebDAV, Chukwa, Flume, Scribe
- Loading that data into HDFS, so RDBMSs can join the Hadoop bandwagon too - HIHO, Sqoop
- MapReduce too cumbersome? Fine, work with the data in Hadoop in a way you already know - Pig, Hive, Jaql
- Make your data visible: BI tools with advanced reporting UIs, drill-down and the like - Intellicus
- Manage your workflows over MapReduce jobs with high-level languages - Oozie, Cascading
- Hadoop naturally has its own monitoring and management tools, for running jobs and browsing HDFS - Hue, Karmasphere, the Eclipse plugin, Cacti, Ganglia
- Supporting frameworks - Avro (serialization), ZooKeeper (coordination)
- More services built on top of Hadoop - Mahout, Elastic MapReduce
- OLTP-style storage is possible too - HBase
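
The MapReduce step in the list above can be sketched in miniature. Hadoop Streaming lets any language supply the map and reduce functions; the Python below implements the classic word count, with the shuffle phase simulated in memory (a toy driver for illustration, not the Hadoop runtime):

```python
from itertools import groupby
from operator import itemgetter

def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in the input line."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Reduce phase: sum all counts the shuffle grouped under one word."""
    return (word, sum(counts))

def run_job(lines):
    """Simulate a MapReduce job in memory: map, shuffle (sort + group), reduce."""
    pairs = [kv for line in lines for kv in mapper(line)]
    pairs.sort(key=itemgetter(0))            # the "shuffle" step
    return dict(reducer(w, (c for _, c in group))
                for w, group in groupby(pairs, key=itemgetter(0)))

if __name__ == "__main__":
    print(run_job(["hello world", "hello hadoop"]))
    # {'hadoop': 1, 'hello': 2, 'world': 1}
```

In a real Streaming job the mapper and reducer would be separate scripts reading stdin and writing stdout, and Hadoop itself would perform the sort-and-group between them.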
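
To make the Pig/Hive point concrete: the win of these high-level interfaces is writing one declarative query instead of a hand-coded MapReduce job. A minimal sketch of the idea, using Python's stdlib sqlite3 as a stand-in for the Hive engine (the `logs` table and its columns are invented for illustration):

```python
import sqlite3

# Hypothetical page-view records (in Hive these would live as files in HDFS).
rows = [("/index", 200), ("/index", 200), ("/login", 500), ("/index", 404)]

conn = sqlite3.connect(":memory:")   # sqlite3 stands in for the Hive engine
conn.execute("CREATE TABLE logs (url TEXT, status INTEGER)")
conn.executemany("INSERT INTO logs VALUES (?, ?)", rows)

# The kind of declarative query Hive lets you write instead of a MapReduce job:
result = conn.execute(
    "SELECT status, COUNT(*) FROM logs GROUP BY status ORDER BY status"
).fetchall()
print(result)  # [(200, 2), (404, 1), (500, 1)]
```

Hive compiles such queries into MapReduce jobs over files in HDFS; sqlite3 here only illustrates the declarative interface, not the execution model.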
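
And for the HBase bullet: HBase serves random, real-time reads and writes by keeping a sorted, versioned map on top of HDFS. The toy class below mimics only the shape of that data model (row key, `family:qualifier` columns, timestamped versions); it is not the real HBase client API:

```python
import bisect
import time

class ToyHBaseTable:
    """Toy model of HBase's data model: rows sorted by key, each cell
    addressed by (row key, 'family:qualifier') and versioned by timestamp.
    Illustration only -- not the real HBase client API."""

    def __init__(self):
        self.rows = {}   # row key -> {"family:qualifier": [(ts, value), ...]}

    def put(self, row, column, value, ts=None):
        cell = self.rows.setdefault(row, {}).setdefault(column, [])
        cell.append((ts if ts is not None else time.time(), value))
        cell.sort()      # keep versions ordered by timestamp

    def get(self, row, column):
        """Return the newest version of a cell, like a default HBase Get."""
        versions = self.rows.get(row, {}).get(column, [])
        return versions[-1][1] if versions else None

    def scan(self, start, stop):
        """Range scan over sorted row keys -- the access pattern HBase serves."""
        keys = sorted(self.rows)
        lo, hi = bisect.bisect_left(keys, start), bisect.bisect_left(keys, stop)
        return [(k, self.rows[k]) for k in keys[lo:hi]]
```

In real HBase it is this sorted row-key order that makes range scans cheap, which is why row-key design dominates HBase schema design.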