Top 10 books to get started with Hadoop

Hadoop | Tech and Tools | Published July 3, 2014 | Baiju NT

These books are our recommendations if you are planning to start your Big Data journey with Hadoop, an open source distributed processing framework at the center of a growing big data ecosystem. The books are listed in no particular order.

1. Hadoop: The Definitive Guide

Author: Tom White
Publisher: O’Reilly Media
The book nicely covers basic Hadoop concepts as well as the whole Hadoop galaxy (HDFS, MapReduce, HBase, ZooKeeper, Hive, Pig…). With this comprehensive guide, you’ll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. You’ll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. The third edition covers recent changes to Hadoop, including material on the new MapReduce API, as well as MapReduce 2 and its more flexible execution model (YARN).

2. Hadoop in Practice

Author: Alex Holmes
Publisher: Manning Publications
Hadoop in Practice collects 85 Hadoop examples and presents them in a problem/solution format. Each technique addresses a specific task you’ll face, like querying big data using Pig or writing a log file loader. You’ll explore each problem step by step, learning both how to build and deploy that specific solution along with the thinking that went into its design. As you work through the tasks, you’ll find yourself growing more comfortable with Hadoop and at home in the world of big data.

3. Hadoop in Action

Author: Chuck Lam
Publisher: Manning Publications
Hadoop in Action introduces the subject and shows how to write programs in the MapReduce style. It starts with a few easy examples and then moves quickly to show Hadoop use in more complex data analysis tasks. Included are best practices and design patterns of MapReduce programming.

4. Hadoop Operations

Author: Eric Sammer
Publisher: O’Reilly Media
A guide to running large-scale Hadoop clusters, written by someone who has practical experience in such deployments. If you’ve been asked to maintain large and complex Hadoop clusters, this book is a must.

5. Pro Hadoop

Author: Jason Venner
Publisher: Apress
This book is a step-by-step guide to writing, running and debugging MapReduce jobs using Hadoop, and to installing and managing Hadoop clusters. It is ideal for training new MapReduce users and cluster administrators and for polishing existing Hadoop skills.

6. Hadoop Beginner’s Guide

Author: Garry Turkington
Publisher: Packt Publishing
Written for complete beginners to Hadoop, the book covers how to install and run Hadoop on a local Ubuntu host or create an on-demand Hadoop cluster on Amazon Web Services (EC2), before getting to grips with MapReduce.

7. Optimizing Hadoop for MapReduce

Author: Khaled Tannir
Publisher: Packt Publishing
Optimizing Hadoop for MapReduce is an example-based tutorial on tuning Hadoop for MapReduce job performance. The book introduces advanced MapReduce concepts and teaches you everything from identifying the factors that affect MapReduce job performance to tuning the MapReduce configuration. Based on real-world experience, it will help you make full use of your cluster’s node resources to run MapReduce jobs optimally.

8. Scaling Big Data with Hadoop and Solr

Author: Hrishikesh Karambelkar
Publisher: Packt Publishing
Scaling Big Data with Hadoop and Solr is a step-by-step guide to building a search engine while scaling data. Starting with the basics of Apache Hadoop and Solr, this book then dives into advanced topics of optimizing search with some real-world use cases and sample Java code.

9. Hadoop Operations and Cluster Management Cookbook

Author: Shumin Guo
Publisher: Packt Publishing
Hadoop Operations and Cluster Management Cookbook is a guide for designing and managing a Hadoop cluster.

10. Hadoop Real World Solutions Cookbook

Authors: Jonathan Owens, Brian Femiano, Jon Lentz
Publisher: Packt Publishing
A collection of real-world analytics code and design patterns using various tools from the Hadoop community. Each recipe walks the reader through the implementation or, in some cases, debugging and configuration tuning. The book covers various tools including MapReduce, Hive, Pig, MRUnit, serialization using Avro/Thrift/Protocol Buffers, Giraph, Accumulo and several others.