
Classic Hadoop books

 看风景D人 2014-12-30
Hadoop quickstart (web version):
http://hadoop./docs/r1.0.4/cn/quickstart.html

----------------------------------------------------------
1. The Definitive Guide alone gets you about 80% of the way. The other 20% is not in any book; spend more time in the open-source community. A good reading order: Hadoop in Action --> Hadoop: The Definitive Guide --> Hadoop技术内幕 (Hadoop Internals).
2. Of the Chinese-language books I used while learning Hadoop, I especially recommend the following two (I have read both and worked through the corresponding experiments):

The first: 《Hadoop实战》

The second: 《Hadoop权威指南》 (Hadoop: The Definitive Guide)


There is also another book, 《Hadoop实战》 (Hadoop in Action), which is fine as well.

More importantly, there is an English book called, if I remember correctly, Pro Hadoop. It is excellent, and it gave me a great deal of help and guidance when I was first deploying and tuning my environment.

-----------------------------------------------------------------------------------------

Reposted from http://www./thread-970-1-1.html


Hadoop official website

Hadoop - Cloudera

Hadoop - Yahoo!

Hadoop - Wiki

Doug Cutting - Wiki

Doug Cutting - blog

 

Hadoop includes the following subprojects:

HDFS: A distributed file system that provides high-throughput access to application data. Its design comes from Google's paper The Google File System (GFS).

MapReduce: A software framework for distributed processing of large data sets on compute clusters. Its design comes from Google's paper MapReduce: Simplified Data Processing on Large Clusters.
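To make the HDFS side of that division of labor concrete, here is a toy single-machine sketch of the core idea: a file is split into fixed-size blocks, and each block is placed on several datanodes. The block size, node names, and round-robin placement below are simplifications invented for illustration; real HDFS of the 1.x era (as in the quickstart linked above) defaulted to 64 MB blocks and uses rack-aware placement.

```python
# Toy sketch of HDFS-style block splitting and replica placement.
# Values are deliberately tiny; HDFS 1.x defaulted to 64 MB blocks.

BLOCK_SIZE = 8  # bytes, for illustration only


def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Split raw file bytes into fixed-size blocks, HDFS-style."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks, datanodes, replication=3):
    """Assign each block to `replication` datanodes round-robin.

    Real HDFS placement is rack-aware; this is only a sketch.
    """
    placement = {}
    for idx in range(len(blocks)):
        placement[idx] = [datanodes[(idx + r) % len(datanodes)]
                          for r in range(replication)]
    return placement


data = b"hello hadoop distributed file system"
blocks = split_into_blocks(data)
placement = place_blocks(blocks, ["dn1", "dn2", "dn3", "dn4"])
print(len(blocks), placement[0])
```

The point of the sketch is the separation of concerns: clients see one file, while the namenode-equivalent (here, the `placement` dict) tracks which nodes hold which blocks.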

 

《Hadoop权威指南》 (Hadoop: The Definitive Guide, Chinese edition)

      I bought this book and have read some of its chapters. The translation is noticeably awkward, but it is still very helpful to friends just getting started with Hadoop. Judging from the Chinese edition's content, the English original is of very high quality. I therefore suggest studying and practicing with the Chinese edition, the English edition (an electronic copy is fine; see the download link below), and the official Hadoop documentation together. That seems like a reasonable compromise, given how few classic Chinese-language Hadoop books there are.


《Hadoop: The Definitive Guide 》

    Judging from the Chinese edition's description, this book covers the implementation details of Hadoop's HDFS and MapReduce very thoroughly. In my opinion it rivals 《Java编程思想》 (Thinking in Java). English original download: Oreilly.Hadoop.The.Definitive.Guide.Jun.2009.rar


《云计算的关键技术与应用实例》 (Key Technologies and Application Examples of Cloud Computing)

     I read selected chapters of this book and found that its explanation of cloud computing (both the concepts and the related technologies) has real depth; it is rare to see such difficult material laid out in plain, accessible language. The author clearly understands cloud computing deeply.

 

The Google File System
Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung

Abstract

We have designed and implemented the Google File System, a scalable distributed file system for large distributed data-intensive applications. It provides fault tolerance while running on inexpensive commodity hardware, and it delivers high aggregate performance to a large number of clients.


While sharing many of the same goals as previous distributed file systems, our design has been driven by observations of our application workloads and technological environment, both current and anticipated, that reflect a marked departure from some earlier file system assumptions. This has led us to reexamine traditional choices and explore radically different design points.


The file system has successfully met our storage needs. It is widely deployed within Google as the storage platform for the generation and processing of data used by our service as well as research and development efforts that require large data sets. The largest cluster to date provides hundreds of terabytes of storage across thousands of disks on over a thousand machines, and it is concurrently accessed by hundreds of clients.


In this paper, we present file system interface extensions designed to support distributed applications, discuss many aspects of our design, and report measurements from both micro-benchmarks and real world use.


Appeared in:
19th ACM Symposium on Operating Systems Principles,
Lake George, NY, October, 2003.


Download: PDF Version


MapReduce: Simplified Data Processing on Large Clusters
Jeffrey Dean and Sanjay Ghemawat

Abstract

MapReduce is a programming model and an associated implementation for processing and generating large data sets. Users specify a map function that processes a key/value pair to generate a set of intermediate key/value pairs, and a reduce function that merges all intermediate values associated with the same intermediate key. Many real world tasks are expressible in this model, as shown in the paper.
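The map and reduce functions the abstract describes can be sketched in a few lines of single-machine Python. This is a hedged illustration of the programming model only, not Hadoop's actual Java API: the user supplies `map_fn` and `reduce_fn` (names invented here), and the in-memory sort stands in for the framework's shuffle phase.

```python
# Minimal single-machine sketch of the MapReduce programming model:
# map emits intermediate key/value pairs, the "framework" groups values
# by key, and reduce merges the values for each key. Word count is the
# canonical example from the paper.
from itertools import groupby
from operator import itemgetter


def map_fn(document: str):
    # map: emit (word, 1) for each word in the input record
    for word in document.split():
        yield (word, 1)


def reduce_fn(word, counts):
    # reduce: merge all intermediate values for one key
    return (word, sum(counts))


def run_mapreduce(documents):
    # map phase: apply map_fn to every input record
    intermediate = [pair for doc in documents for pair in map_fn(doc)]
    # shuffle/sort phase: bring equal keys together
    intermediate.sort(key=itemgetter(0))
    # reduce phase: one reduce_fn call per distinct key
    return dict(reduce_fn(key, (v for _, v in group))
                for key, group in groupby(intermediate, key=itemgetter(0)))


print(run_mapreduce(["hadoop is fun", "hadoop scales"]))
```

In real Hadoop the same two functions are written as Java `Mapper` and `Reducer` classes, and the shuffle happens across machines rather than via an in-memory sort.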


Programs written in this functional style are automatically parallelized and executed on a large cluster of commodity machines. The run-time system takes care of the details of partitioning the input data, scheduling the program's execution across a set of machines, handling machine failures, and managing the required inter-machine communication. This allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.
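The run-time responsibilities listed above (partitioning the input, scheduling tasks across workers, merging results) can be imitated on one machine with a thread pool. This is only a loose single-machine analogy under invented names (`count_words`, `parallel_word_count`), not Google's or Hadoop's implementation, which distributes work across machines and also handles failures.

```python
# Single-machine analogy for the MapReduce runtime: partition the input,
# schedule "map tasks" on a worker pool, then merge the partial results.
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    # one "map task": count words in a single input partition
    counts = Counter()
    for line in chunk:
        counts.update(line.split())
    return counts


def parallel_word_count(lines, workers=2, partitions=4):
    # partition the input records into chunks (the "splits")
    size = max(1, len(lines) // partitions)
    chunks = [lines[i:i + size] for i in range(0, len(lines), size)]
    # schedule the map tasks across the worker pool
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(count_words, chunks)
    # merge partial results (a degenerate "reduce")
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


print(parallel_word_count(["a b a", "b c", "a"]))
```

The programmer only wrote `count_words`; everything about partitioning and scheduling lives in `parallel_word_count`, which is exactly the split the abstract credits for making the system usable by non-experts.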


Our implementation of MapReduce runs on a large cluster of commodity machines and is highly scalable: a typical MapReduce computation processes many terabytes of data on thousands of machines. Programmers find the system easy to use: hundreds of MapReduce programs have been implemented and upwards of one thousand MapReduce jobs are executed on Google's clusters every day.


Appeared in:
OSDI'04: Sixth Symposium on Operating System Design and Implementation,
San Francisco, CA, December, 2004.


Download: PDF Version

Slides: HTML Slides
