Apache Hadoop is an open-source framework for efficiently storing and processing large datasets. Data is stored across clusters of inexpensive commodity servers, and the framework's distributed file system enables concurrent processing and fault tolerance. Hadoop and its surrounding ecosystem consist of multiple software components, each serving a different function:
- HDFS (Hadoop Distributed File System), the storage layer
- MapReduce, the batch processing model (see the sketch after this list)
- Hive, which provides SQL-like querying over data in HDFS
- Spark, a general-purpose engine for fast, in-memory data processing
- HBase, a distributed NoSQL database built on HDFS
- YARN, the cluster resource manager, among others
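To make the programming model concrete, below is a word-count job written against Hadoop's Java MapReduce API, closely following the canonical example from the official Hadoop tutorial. The mapper emits a `(word, 1)` pair for each token it reads, and the reducer sums those counts per word; the input and output paths (assumed here to be HDFS directories passed as command-line arguments) are illustrative.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Mapper: for each line of input, emit (word, 1) per token.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer: sum the counts emitted for each distinct word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation on each mapper
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input directory
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output directory (must not exist)
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```

Packaged as a JAR, a job like this is typically submitted with `hadoop jar wordcount.jar WordCount <input> <output>`; YARN then allocates containers and schedules the map and reduce tasks across the cluster, restarting failed tasks on other nodes as part of Hadoop's fault tolerance.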
Apache Hadoop has many use cases. Financial services companies use it to build and run the analytics applications behind risk assessment, investment models, and trading algorithms. Retailers use it to analyze structured and unstructured data so they can better understand and serve their customers.