Explain to me the difference between HBase and HDFS

Explain to me the difference between HBase and HDFS. Since the functionality of both looks almost the same, I am getting confused between the two. The only difference I found in my research is that HDFS can work with any kind of data, but HBase needs structured data to work with. Please help me understand this concept better.
Mar 19, 2019
1 answer to this question.

Let's start from scratch.

Hadoop consists of three core components:

  1. HDFS (Hadoop Distributed File System)
  2. MapReduce
  3. YARN (Yet Another Resource Negotiator)

HDFS: the name explains it all. It is a distributed file system that stores data on commodity hardware. HDFS can store any type of data, whether structured, semi-structured, or unstructured. It keeps data available when a node fails by replicating each block across several nodes (three copies by default), at the cost of extra storage. Being just a file system, it stores data in flat files and lacks random read-write capabilities.

  • It gives high throughput when reading large data sets sequentially
  • It follows the "write once, read many" model
  • It lacks random read-write capabilities
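The "write once, read many" and "no random access" points can be illustrated with a small sketch. This is not the HDFS API; `WriteOnceFile` and `find_record` are hypothetical names invented here, and the whole thing runs in memory. It only shows why finding one record in a flat file means reading everything in front of it.

```python
class WriteOnceFile:
    """Toy stand-in for a flat file on HDFS (hypothetical, in-memory only).

    Records can be appended until the file is closed; after that the
    contents can be read but never modified in place.
    """

    def __init__(self):
        self._records = []
        self._closed = False

    def append(self, record):
        if self._closed:
            raise IOError("file is closed; write-once files cannot be modified")
        self._records.append(record)

    def close(self):
        self._closed = True

    def scan(self):
        # Reads always stream from the first record onward.
        yield from self._records


def find_record(f, record_id):
    """Find one record by scanning; returns (record, records_read)."""
    records_read = 0
    for record in f.scan():
        records_read += 1
        if record["id"] == record_id:
            return record, records_read
    return None, records_read


log = WriteOnceFile()
for i in range(1000):
    log.append({"id": i, "value": f"row-{i}"})
log.close()

# Looking up the last record forces a read of all 1000 records.
record, cost = find_record(log, 999)
print(record, cost)
```

A full scan like this is fine for a batch job that needs every record anyway, but it is a poor fit for serving single-record lookups.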
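MapReduce is a framework used to compute and process Big Data in parallel across the cluster. Like HDFS, it is built for sequential access: a job scans whole files rather than looking up individual records. Neither one gives you fast random access to a single record, so this is where HBase, a NoSQL database that runs on top of HDFS, comes into the picture.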
  • HBase stores data as key-value pairs
  • Low-latency access to individual records, regardless of how much data has to be searched
  • Flexible data model
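The three points above can be sketched with a toy table that keeps its row keys sorted, which is how HBase orders rows. This is an in-memory illustration only and not the HBase client API; `SortedKeyValueTable`, the row keys, and the `info:name` style column names are all assumptions made up for the example.

```python
import bisect


class SortedKeyValueTable:
    """Toy key-value table with sorted row keys (hypothetical, in-memory)."""

    def __init__(self):
        self._keys = []   # row keys, kept in sorted order
        self._rows = {}   # row key -> {column name: value}

    def put(self, row_key, column, value):
        # Insert a new row in sorted position, or update an existing cell.
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Binary search on the sorted keys: no full scan is needed.
        i = bisect.bisect_left(self._keys, row_key)
        if i < len(self._keys) and self._keys[i] == row_key:
            return self._rows[row_key]
        return None

    def scan(self, start_key, stop_key):
        # Range scan over [start_key, stop_key) in key order.
        lo = bisect.bisect_left(self._keys, start_key)
        hi = bisect.bisect_left(self._keys, stop_key)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]


table = SortedKeyValueTable()
table.put("user#0002", "info:name", "Bob")
table.put("user#0001", "info:name", "Alice")
# Rows do not need the same columns: only user#0001 has an email.
table.put("user#0001", "info:email", "alice@example.com")
# Unlike a write-once file, an existing cell can be updated.
table.put("user#0002", "info:name", "Robert")

print(table.get("user#0001"))
print(table.scan("user#0001", "user#0003"))
```

The lookup cost grows with the logarithm of the number of rows rather than with the size of the data, which is the intuition behind the low-latency claim.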
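YARN is the resource manager: it allocates cluster resources to the MapReduce jobs that process the data stored in HDFS.
In short, Hadoop (HDFS plus MapReduce) is used for batch processing, while HBase is used when you need real-time reads and writes.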
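To make the batch side of that contrast concrete, here is a toy word count written in the MapReduce style. It is a single-process sketch in plain Python, not the Hadoop API; the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample lines are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(line):
    # Emit a (word, 1) pair for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    # Group all emitted values by key, as the framework does between phases.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Combine all values for one key into a single result.
    return key, sum(values)


lines = ["HBase runs on HDFS", "HDFS stores files", "HBase stores rows"]

# Every input line is read once, from start to finish: a batch job.
pairs = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
print(counts)
```

The job touches every line of input to produce its answer, which suits periodic batch analysis but not a request that needs one row back in milliseconds.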
answered Mar 19, 2019
