HBase is an open-source, distributed, versioned, column-oriented store modeled after Google ‘Bigtable’.
This tutorial will describe how to setup and run Hbase cluster, with not too much explanation about hbase. There are a number of articles where the Hbase are described in details.
We will build hbase cluster using three Ubuntu machine in this tutorial.
A distributed HBase depends on a running ZooKeeper cluster. All participating nodes and clients need to be able to get to the running ZooKeeper cluster. HBase by default manages a ZooKeeper cluster for you, or you can manage it on your own and point HBase to it. In our case, we are using default ZooKeeper cluster, which is manage by Hbase
Following are the capacities in which nodes may act in our cluster:
- Hbase Master:- The HbaseMaster is responsible for assigning regions to HbaseRegionserver, monitors the health of each HbaseRegionserver.
- Zookeeper: – For any distributed application, ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
- Hbase Regionserver:- The HbaseRegionserver is responsible for handling client read and write requests. It communicates with the Hbasemaster to get a list of regions to serve and to tell the master that it is alive.
In our case, one machine in the cluster is designated as Hbase master and Zookeeper. The rest of machine in the cluster act as a Regionserver.
Before we start:
Before we start configure HBase, you need to have a running Hadoop cluster, which will be the storage for hbase(Hbase store data in Hadoop Distributed File System). Please refere to Installing Hadoop in the cluster – A complete step by step tutorial post before continuing.