Overview of HBase Storage Architecture

Apache HBase is an open-source, distributed, non-relational database modeled after Google's Bigtable and written in Java. It provides Bigtable-like capabilities on top of Hadoop and HDFS (Hadoop Distributed File System), giving a fault-tolerant way to store large quantities of sparse data, which are common in many big data use cases. HBase is used when applications need random, real-time read/write access to very large datasets.

The HBase storage architecture comprises numerous components. Let's look at what each of these components does and how data is written.

HFiles:

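HFiles form the lowest level of HBase's storage architecture. They are the files, stored in HDFS, in which HBase persists its data on disk. Each HFile holds key-value pairs sorted by key, a layout designed so that data can be stored and looked up quickly and efficiently.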
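To make the "sorted and efficient" layout concrete, here is a minimal in-memory sketch. `ToyHFile`, its `block_size` parameter, and the idea of counting blocks in entries are hypothetical simplifications, not the real HFile format (which also stores metadata and optional Bloom filters). The point is only that sorted keys plus a small index of each block's first key let a reader jump straight to one block instead of scanning the whole file.

```python
from bisect import bisect_right


class ToyHFile:
    """Hypothetical, simplified model of an HFile: key-value pairs sorted by
    key, split into fixed-size blocks, plus an index of each block's first key."""

    def __init__(self, items, block_size=4):
        # Sort once at write time; the "file" is never modified afterwards.
        entries = sorted(items)
        self.blocks = [tuple(entries[i:i + block_size])
                       for i in range(0, len(entries), block_size)]
        # Block index: the first key of every block.
        self.index = [block[0][0] for block in self.blocks]

    def get(self, key):
        # Binary-search the index for the only block that could hold the key.
        i = bisect_right(self.index, key) - 1
        if i < 0:
            return None
        # Scan just that block (a real reader would load only this block from disk).
        for k, v in self.blocks[i]:
            if k == key:
                return v
        return None


hfile = ToyHFile([(b"row3", b"c"), (b"row1", b"a"), (b"row2", b"b")], block_size=2)
print(hfile.get(b"row2"))  # b'b'
print(hfile.get(b"row9"))  # None
```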

HMaster:

The HMaster is responsible for assigning regions to each HRegionServer when HBase starts. It also handles administrative operations on tables, such as creating, altering, and deleting them, and coordinates the region servers. The HMaster also manages the cluster's metadata.
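A simplified sketch of startup assignment, assuming a plain round-robin policy. `assign_regions` is a hypothetical helper, not an HBase API; the real HMaster delegates region placement to a pluggable load balancer, so this only illustrates the idea of spreading regions across the available region servers.

```python
from itertools import cycle


def assign_regions(regions, region_servers):
    """Hypothetical sketch: spread regions across region servers round-robin."""
    if not region_servers:
        raise ValueError("no region servers available")
    assignment = {server: [] for server in region_servers}
    # zip stops when the regions run out; cycle() repeats the server list.
    for region, server in zip(regions, cycle(region_servers)):
        assignment[server].append(region)
    return assignment


print(assign_regions(["r1", "r2", "r3", "r4", "r5"], ["rs1", "rs2"]))
# {'rs1': ['r1', 'r3', 'r5'], 'rs2': ['r2', 'r4']}
```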

Components of HBase:

HBase has the following components:

Table – Comprises Regions
Region – Range of rows stored together
Region Server – Serves one or more regions
Master Server – Daemon responsible for managing the HBase cluster
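Because each region covers a contiguous range of row keys, finding the region for a given row is a search over sorted start keys. The sketch below assumes HBase's convention that an empty start key marks the first region and an empty end key marks the last; `Region` and `find_region` are hypothetical names used only for illustration.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    start_key: bytes  # inclusive; b"" means "from the first row"
    end_key: bytes    # exclusive; b"" means "up to the last row"
    server: str       # region server currently serving this region


def find_region(regions, row_key):
    """Return the region whose [start_key, end_key) range contains row_key.
    `regions` must be sorted by start_key and cover the whole key space."""
    starts = [r.start_key for r in regions]
    # bisect_right locates the last region whose start key is <= row_key.
    i = bisect_right(starts, row_key) - 1
    if i < 0:
        raise ValueError("first region must start at the empty key b''")
    region = regions[i]
    if region.end_key and row_key >= region.end_key:
        raise ValueError("regions do not cover this row key")
    return region


table = [Region(b"", b"g", "rs1"), Region(b"g", b"p", "rs2"), Region(b"p", b"", "rs1")]
print(find_region(table, b"grape").server)  # rs2
```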

HBase stores its data directly in HDFS and relies heavily on HDFS's high availability and fault tolerance.

HBase Storage Architecture:

The general flow is that a client first contacts ZooKeeper to find where a particular row key lives. It does so by retrieving a server name from ZooKeeper. With this information, it queries that server to find the server that holds the meta table. Both of these details are cached and looked up only once. Finally, the client queries the meta server to retrieve the server that hosts the row it is looking for.
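The lookup-and-cache behaviour described above can be modelled in a few lines. `ToyCluster`, `ToyClient`, and the `ask_*` methods are hypothetical stand-ins for ZooKeeper, the server it points to, and the meta table; they are not HBase client APIs. The round-trip counter shows the effect of caching: the first lookup costs three hops, a later lookup in an already cached region costs none, and a lookup in a new region costs only the final meta query.

```python
from bisect import bisect_right


class ToyCluster:
    """Hypothetical in-memory stand-in for the lookup hops.
    Each ask_* call counts as one network round trip."""

    def __init__(self, meta_server, regions):
        self.meta_server = meta_server  # server holding the meta table
        self.regions = regions          # sorted (start, end, server) tuples
        self.round_trips = 0

    def ask_zookeeper(self):
        self.round_trips += 1
        return "root-server"            # hypothetical server name

    def ask_root(self, server):
        # `server` is the host being queried; unused in this toy model.
        self.round_trips += 1
        return self.meta_server

    def ask_meta(self, server, row_key):
        self.round_trips += 1
        starts = [start for start, _, _ in self.regions]
        return self.regions[bisect_right(starts, row_key) - 1]


class ToyClient:
    def __init__(self, cluster):
        self.cluster = cluster
        self.meta_server = None  # cached after the first lookup
        self.region_cache = []   # cached (start, end, server) tuples

    def locate(self, row_key):
        # Reuse a cached region if its [start, end) range covers the key.
        for start, end, server in self.region_cache:
            if start <= row_key and (end == b"" or row_key < end):
                return server
        # Resolve the meta server location only once, then cache it.
        if self.meta_server is None:
            root = self.cluster.ask_zookeeper()
            self.meta_server = self.cluster.ask_root(root)
        region = self.cluster.ask_meta(self.meta_server, row_key)
        self.region_cache.append(region)
        return region[2]


cluster = ToyCluster("meta-rs", [(b"", b"m", "rs1"), (b"m", b"", "rs2")])
client = ToyClient(cluster)
print(client.locate(b"apple"), cluster.round_trips)   # rs1 3
print(client.locate(b"banana"), cluster.round_trips)  # rs1 3
print(client.locate(b"zebra"), cluster.round_trips)   # rs2 4
```

A real client must also discard cache entries that go stale when regions move or split; the sketch omits that.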

Once it knows which region the row resides in, the client caches this information as well and contacts the HRegionServer directly. Over time, the client builds up complete information about where to find rows without needing to query the meta server again.

When an HRegion is opened, it sets up a Store instance for each column family (HColumnFamily) of the table. To write data, the client issues a request to the HRegionServer, which passes it to the matching HRegion instance. The first step is to decide whether the data should first be written to the Write-Ahead Log (WAL), represented by the HLog class. This decision is based on a flag set by the client.

Once the data is written to the WAL, it is placed in the MemStore. At the same time, the MemStore is checked to see whether it is full; if it is, a flush to disk is requested, and the MemStore contents are written out to a new HFile.
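The write path can be summarised as a short simulation. `ToyRegion`, its dictionary-based stores, the `write_to_wal` argument, and the entry-count `flush_threshold` are all assumptions for illustration. Real HBase measures MemStore size in bytes, writes HFiles to HDFS, and performs requested flushes asynchronously rather than inside the write call; this sketch also flushes only the store that filled up.

```python
class ToyRegion:
    """Hypothetical model of the write path: WAL append (unless the client
    opts out), MemStore insert, and a flush to a new sorted, immutable
    "HFile" when a MemStore reaches its size limit."""

    def __init__(self, column_families, flush_threshold=3):
        self.wal = []  # stands in for the HLog
        # One Store per column family, each with its own MemStore and HFiles.
        self.stores = {cf: {"memstore": {}, "hfiles": []} for cf in column_families}
        self.flush_threshold = flush_threshold  # max entries per MemStore (assumed unit)

    def put(self, row, family, qualifier, value, write_to_wal=True):
        store = self.stores[family]
        # Step 1: append to the write-ahead log unless the client disabled it.
        if write_to_wal:
            self.wal.append((row, family, qualifier, value))
        # Step 2: place the edit in the MemStore.
        store["memstore"][(row, qualifier)] = value
        # Step 3: if the MemStore is full, flush it to disk as a new HFile.
        if len(store["memstore"]) >= self.flush_threshold:
            self.flush(family)

    def flush(self, family):
        store = self.stores[family]
        # The flushed file is sorted by key and never modified afterwards.
        store["hfiles"].append(tuple(sorted(store["memstore"].items())))
        store["memstore"] = {}


region = ToyRegion(["info"], flush_threshold=2)
region.put(b"row1", "info", b"name", b"Ann")
region.put(b"row2", "info", b"name", b"Bob")  # MemStore full: flushed to an HFile
region.put(b"row3", "info", b"name", b"Cy", write_to_wal=False)
print(len(region.wal))                    # 2
print(region.stores["info"]["memstore"])  # {(b'row3', b'name'): b'Cy'}
```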
