An Introduction to HBase

In our last two blogs, we talked about the HDFS cluster and the ZooKeeper cluster, which are needed for deploying OpenTSDB in clustered mode. Continuing the series, in this blog we are going to talk about HBase, which OpenTSDB will use in the cluster to store data.

HBase is a column-oriented NoSQL database management system that runs on top of the Hadoop Distributed File System (HDFS).

It is a part of the Hadoop ecosystem that provides random real-time read/write access to data in the Hadoop File System.

Data can be stored in HDFS either directly or through HBase. Data consumers use HBase to read and access the data in HDFS randomly; HBase sits on top of HDFS and provides the read and write access.

It is well suited for sparse data sets, which are common in many big data use cases. Like many other Apache projects, it is written mainly in Java. It can store huge amounts of data, from terabytes to petabytes. HBase is not a relational database system, and unlike a relational database system it does not support a structured query language like SQL. It is built for low-latency operations and has some specific features compared to traditional relational models.

Storage Mechanism in HBase:

HBase is a column-oriented database. It stores data in tables, with rows sorted by row ID (row key). The table schema defines only the column families; the data inside them is stored as key-value pairs. A table can have multiple column families, and each column family can have any number of columns. Although HBase stores data on disk in a column-oriented format, it is distinctly different from traditional columnar databases.
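To make the storage model more concrete, here is a toy in-memory sketch in Python. It is not how HBase is implemented; the class name ToyTable, the column families "info" and "metrics", and the sample row keys are all made up for illustration. It only shows the shape of the data: row key, then column family, then column, then value, with rows returned in row key order.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of an HBase-style table: row key -> family -> column -> value."""

    def __init__(self, column_families):
        # Only the column families are fixed in the schema.
        self.column_families = set(column_families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Any column name is accepted inside a known family.
        self.rows[row_key][family][column] = value

    def get(self, row_key, family, column):
        # Returns None when the row, family, or column is absent.
        return self.rows.get(row_key, {}).get(family, {}).get(column)

    def scan(self):
        # Rows come back ordered by row key.
        for row_key in sorted(self.rows):
            families = self.rows[row_key]
            yield row_key, {name: dict(cols) for name, cols in families.items()}


# Hypothetical sample data, inserted out of order on purpose.
table = ToyTable(["info", "metrics"])
table.put(b"row2", "info", "name", b"beta")
table.put(b"row1", "info", "name", b"alpha")
table.put(b"row1", "metrics", "cpu", b"0.7")
scanned_keys = [key for key, _ in table.scan()]
```

Note that "row1" has a column in "metrics" while "row2" does not. Missing columns simply take no space, which is why this layout suits sparse data.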

Architecture:

In HBase, tables are divided into regions, which are served by region servers.

[Figure: HBase architecture]

The main components of HBase are:

  1. Master
  2. Region Server
  3. Region
  4. ZooKeeper
  5. HDFS
  6. HFile
  7. MemStore

HBase Master Server:

  • Uses Apache ZooKeeper and assigns regions to the region servers
  • Handles load balancing: it moves regions from busy servers to less occupied servers
  • Handles schema changes (HBase table creation, creation of column families, etc.)
  • Provides the interface for creating, deleting, and updating tables
  • Monitors all the region servers in the cluster
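The load-balancing duty in the list above can be pictured with a small Python sketch. This is only an illustration that balances on region count; HBase's real balancer takes more factors into account. The function name rebalance and the server and region names are hypothetical.

```python
def rebalance(assignments):
    """Move regions from the busiest server to the least busy one
    until region counts differ by at most one."""
    # Copy so the caller's mapping is left untouched.
    balanced = {server: list(regions) for server, regions in assignments.items()}
    if not balanced:
        return balanced
    while True:
        busiest = max(balanced, key=lambda server: len(balanced[server]))
        idlest = min(balanced, key=lambda server: len(balanced[server]))
        if len(balanced[busiest]) - len(balanced[idlest]) <= 1:
            return balanced
        # Reassign one region from the busy server to the less occupied one.
        balanced[idlest].append(balanced[busiest].pop())


# Hypothetical cluster state: one overloaded server, one empty server.
before = {
    "rs1": ["r1", "r2", "r3", "r4", "r5"],
    "rs2": ["r6"],
    "rs3": [],
}
after = rebalance(before)
region_counts = sorted(len(regions) for regions in after.values())
```

Every region is still assigned to exactly one server after the call; only the owner changes.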

 

HBase Region:

HBase tables are split horizontally by row key range into regions, and each region is managed by a region server.

HBase Region Server:

Regions are assigned to a node in the cluster called a Region Server, which manages those regions. When the data size grows beyond the limit, HBase automatically splits the table and distributes the load to another Region Server, reducing the load on any one server. A single Region Server can serve around 1,000 regions.

The process of splitting tables into regions is called sharding, and it is done automatically.
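The automatic splitting described above can be sketched roughly as follows. This is a simplified Python model, not HBase code: it splits on the number of rows instead of the size in bytes, and the class name ToyRegionedTable and the row keys are hypothetical.

```python
import bisect


class ToyRegionedTable:
    """Keeps row keys in sorted regions and splits a region when it grows too large."""

    def __init__(self, max_rows_per_region):
        if max_rows_per_region < 1:
            raise ValueError("max_rows_per_region must be at least 1")
        self.max_rows = max_rows_per_region
        # Each region is a sorted list of row keys; regions are kept in key order.
        self.regions = [[]]

    def _region_index(self, row_key):
        # The first key of every region after the first acts as its start key.
        start_keys = [region[0] for region in self.regions[1:]]
        return bisect.bisect_right(start_keys, row_key)

    def put(self, row_key):
        index = self._region_index(row_key)
        region = self.regions[index]
        position = bisect.bisect_left(region, row_key)
        if position < len(region) and region[position] == row_key:
            return  # Row already present.
        region.insert(position, row_key)
        if len(region) > self.max_rows:
            middle = len(region) // 2
            # Replace the oversized region with its two halves.
            self.regions[index:index + 1] = [region[:middle], region[middle:]]


toy = ToyRegionedTable(max_rows_per_region=4)
for i in range(10):
    toy.put(f"row{i:02d}")
region_sizes = [len(region) for region in toy.regions]
```

Because each region covers a contiguous range of row keys, finding the region for a key is a lookup over the region start keys, which is what _region_index does here.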

Role of Region Server:

  • Communicates with the client and handles data-related operations
  • Decides the size of the region
  • Splits regions automatically
  • Handles read and write requests for all the regions under it

HFile:

HFile is a file-based data structure that is used to store data in HBase. It is a file of sorted key/value pairs, where both keys and values are byte arrays. This data structure supports random read and write operations on the table, with values looked up and updated by key.
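As a rough illustration of why sorted key/value pairs allow random reads, the Python sketch below keeps the pairs sorted and uses binary search to find a key. It lives in memory only and ignores the real on-disk format, which is more involved. The class name ToyHFile and the sample keys are hypothetical.

```python
import bisect


class ToyHFile:
    """Read-only set of key/value pairs, sorted by key (keys and values are bytes)."""

    def __init__(self, pairs):
        # dict() keeps the last value for a repeated key; items are then sorted by key.
        ordered = sorted(dict(pairs).items())
        self._keys = [key for key, _ in ordered]
        self._values = [value for _, value in ordered]

    def get(self, key):
        # Binary search instead of reading every pair.
        index = bisect.bisect_left(self._keys, key)
        if index < len(self._keys) and self._keys[index] == key:
            return self._values[index]
        return None

    def scan(self, start_key, stop_key):
        # Returns pairs with start_key <= key < stop_key, in key order.
        low = bisect.bisect_left(self._keys, start_key)
        high = bisect.bisect_left(self._keys, stop_key)
        return list(zip(self._keys[low:high], self._values[low:high]))


hfile = ToyHFile([(b"row3", b"c"), (b"row1", b"a"), (b"row2", b"b")])
found = hfile.get(b"row2")
in_range = hfile.scan(b"row1", b"row3")
```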

MemStore:

MemStore is a write buffer. Before a permanent write, data is buffered in the MemStore. When the MemStore is full, its contents are flushed to an HFile. The flush does not write to an existing HFile; instead, it creates a new one.
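The buffer-then-flush behaviour can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the threshold counts entries, whereas HBase measures the MemStore size in bytes, and the flushed "files" are just sorted tuples in memory. The class name ToyStore is hypothetical.

```python
class ToyStore:
    """Write buffer (MemStore) that flushes into new sorted files, never into old ones."""

    def __init__(self, flush_threshold):
        self.flush_threshold = flush_threshold
        self.memstore = {}
        self.hfiles = []  # Flushed files, oldest first; each is a sorted tuple of pairs.

    def put(self, key, value):
        self.memstore[key] = value
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.memstore:
            return
        # A flush always appends a new file; existing files are left untouched.
        self.hfiles.append(tuple(sorted(self.memstore.items())))
        self.memstore = {}

    def get(self, key):
        # The newest data wins: check the buffer first, then files from newest to oldest.
        if key in self.memstore:
            return self.memstore[key]
        for hfile in reversed(self.hfiles):
            for stored_key, stored_value in hfile:  # Linear scan, for brevity only.
                if stored_key == key:
                    return stored_value
        return None


store = ToyStore(flush_threshold=2)
store.put(b"a", b"1")
store.put(b"b", b"2")   # Buffer is full: first file is created.
store.put(b"a", b"3")   # Newer value for "a" stays in the buffer for now.
store.put(b"c", b"4")   # Buffer is full again: second file is created.
file_count = len(store.hfiles)
latest_a = store.get(b"a")
```

After these writes the first file still holds the old value for "a", and the read path returns the newer one because it searches the newest file first.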

HDFS:

HBase uses HDFS to store data. For more info, please refer to our blog: An Introduction to HDFS

ZooKeeper:

HBase uses ZooKeeper as a centralized monitoring server to maintain configuration information. It also provides distributed synchronization. For more info, please refer to our last blog: An Introduction to ZooKeeper

Deploy HBase:

For deploying HBase, we will use the harisekhon/hbase:1.2 Docker image.

hbase-site.xml:

Create the hbase-site.xml file in the /root/hadoop/ location on all 3 VMs.

Replace zoo1, zoo2, and zoo3 with the respective ZooKeeper IPs.
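The contents of the file are not reproduced here, so the following is only a sketch of what a minimal hbase-site.xml for a distributed setup might contain, generated with a small Python helper. The property names (hbase.cluster.distributed, hbase.rootdir, hbase.zookeeper.quorum) are standard HBase settings, but the choice of properties, the helper name render_hbase_site, and the NameNode address are assumptions, not the original configuration.

```python
from xml.sax.saxutils import escape


def render_hbase_site(zookeeper_hosts, root_dir):
    """Render a minimal hbase-site.xml as text (assumed set of properties)."""
    properties = {
        "hbase.cluster.distributed": "true",
        "hbase.rootdir": root_dir,
        "hbase.zookeeper.quorum": ",".join(zookeeper_hosts),
    }
    lines = ["<configuration>"]
    for name, value in properties.items():
        lines += [
            "  <property>",
            f"    <name>{escape(name)}</name>",
            f"    <value>{escape(value)}</value>",
            "  </property>",
        ]
    lines.append("</configuration>")
    return "\n".join(lines) + "\n"


# zoo1, zoo2, zoo3 are the placeholders from the text; replace them with real IPs.
# The HDFS address below is hypothetical.
xml_text = render_hbase_site(["zoo1", "zoo2", "zoo3"], "hdfs://namenode:8020/hbase")
```

Writing xml_text to /root/hadoop/hbase-site.xml on each VM would give all three nodes the same ZooKeeper quorum setting.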

HBase:

HBase on VM 1:

HBase on VM 2:

HBase on VM 3:

Once all the services are deployed, you can see the HBase status at http://<VM1 | VM2 | VM3 IP>:16010/master-status

[Figure: HBase master status page]

In this blog, we learned about HBase and how to create a 3-node HBase cluster. In the next blog, we will study OpenTSDB and add it to our HDFS, ZooKeeper, and HBase cluster.
