Let's start from scratch.
Hadoop consists of three core components:
- HDFS (Hadoop Distributed File System)
- MapReduce
- YARN (Yet Another Resource Negotiator)
HDFS, as the name suggests, is a distributed file system that stores data on commodity hardware. It can hold any type of data, whether structured, semi-structured, or unstructured. It keeps data highly available by replicating each block across several nodes, at the cost of extra storage. Being just a file system, it stores data in flat files, and it lacks random read-write capabilities.
- It speeds up access to Big Data
- It follows the "write once, read many" model
- It lacks random read-write capabilities
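The block-and-replication idea behind HDFS can be sketched in a few lines. This is a toy simulation only, not the HDFS API: `BLOCK_SIZE`, `split_into_blocks`, and `place_blocks` are illustrative names, and real HDFS uses 128 MB blocks and a rack-aware placement policy rather than the round-robin shown here.

```python
BLOCK_SIZE = 4    # toy value; real HDFS defaults to 128 MB
REPLICATION = 3   # HDFS's default replication factor

def split_into_blocks(data: bytes, block_size: int = BLOCK_SIZE):
    """Chop file contents into fixed-size blocks, as the NameNode tracks them."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_blocks(blocks, datanodes, replication: int = REPLICATION):
    """Assign each block to `replication` distinct DataNodes (simple round-robin)."""
    placement = {}
    for i, _ in enumerate(blocks):
        placement[i] = [datanodes[(i + r) % len(datanodes)]
                        for r in range(replication)]
    return placement

blocks = split_into_blocks(b"write once, read many times")
placement = place_blocks(blocks, ["dn1", "dn2", "dn3", "dn4"])
```

Losing any single node here still leaves two copies of every block, which is exactly why HDFS "ends up replicating the data": the redundancy buys fault tolerance.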
MapReduce is a framework for computing and processing Big Data stored in HDFS. HDFS is good at sequential data access, but it cannot read or update individual records in place; this is where HBase comes into the picture.
- HBase stores data as key-value pairs
- It gives low-latency access to individual records, regardless of how large the underlying data is
- It has a flexible data model
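The key-value access pattern in the bullets above can be sketched as a toy table. `KVTable` is an illustrative class, not the HBase client API; the point is that a read goes straight to a row key instead of scanning a flat file, and each row can carry its own set of columns.

```python
class KVTable:
    """Toy key-value table in the spirit of HBase (not the real client API)."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key: str, column: str, value: str) -> None:
        # Rows hold a flexible set of columns, so the schema can vary per row.
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key: str, column: str):
        # Random read: a direct lookup by key, independent of table size.
        return self._rows.get(row_key, {}).get(column)

table = KVTable()
table.put("user#42", "name", "Ada")
table.put("user#42", "city", "London")
print(table.get("user#42", "name"))  # → Ada
```

Contrast this with a flat HDFS file, where finding one record means reading through the file sequentially.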
YARN sits between HDFS and MapReduce as the cluster's resource manager: it allocates CPU and memory across the cluster and schedules the jobs that process data stored in HDFS.
In short, Hadoop (MapReduce over HDFS) is suited to batch processing, while HBase serves real-time, random-access needs.
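The batch-processing style mentioned above is easiest to see with the classic word-count example, written here as plain Python functions. This sketches only the programming model (map, shuffle, reduce); real Hadoop runs each phase in parallel across many machines.

```python
from collections import defaultdict

def map_phase(lines):
    """Map: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["write once read many", "read many read fast"]
counts = reduce_phase(shuffle(map_phase(lines)))
# counts["read"] == 3
```

Notice that the whole input is consumed before any result appears: that is batch processing. HBase, by contrast, answers a single `get` immediately, which is what makes it the real-time side of the pair.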