Sqoop came into the picture because tools already existed for ingesting data from unstructured sources, but in most organizations the data is stored in relational databases. There was a need for a tool that could import and export that data.
So Apache Sqoop is a tool in the Hadoop ecosystem designed to transfer data between HDFS (Hadoop storage) and relational databases such as MySQL. Apache Sqoop imports data from relational databases into HDFS, and exports data from HDFS back to relational databases. It efficiently transfers bulk data between Hadoop and external data stores such as enterprise data warehouses, relational databases, etc.
This is how Sqoop got its name – “SQL to Hadoop & Hadoop to SQL”.
The data residing in relational database management systems needs to be transferred to HDFS. This task used to be done by writing MapReduce code to import and export data between the relational database and HDFS, which was quite tedious. Apache Sqoop automates this process of importing and exporting data.
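As a sketch of what that automation looks like in practice, a single `sqoop import` command can pull a table from MySQL into HDFS. The JDBC URL, table name, credentials, and paths below are hypothetical placeholders, not values from this article:

```shell
# Import the "employees" table from a MySQL database into HDFS.
# Connection string, table, credentials, and paths are placeholders.
sqoop import \
  --connect jdbc:mysql://dbserver:3306/company \
  --username dbuser \
  --password-file /user/hadoop/.db_password \
  --table employees \
  --target-dir /user/hadoop/employees \
  --num-mappers 4
```

Behind this one command, Sqoop generates and runs the MapReduce job for you; `--num-mappers` controls how many parallel map tasks split the import.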
To run a job, you provide Sqoop with basic information such as:
- database authentication details, the source, the destination, the operation to perform, etc.
- Sqoop internally converts the command into MapReduce tasks, which are then executed over HDFS.
- Sqoop uses the YARN framework to import and export the data, which provides fault tolerance.
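The export direction works the same way. A hedged sketch of pushing HDFS data back into an existing relational table (all names and paths are illustrative placeholders):

```shell
# Export files under an HDFS directory back into an existing MySQL table.
# Connection string, table, credentials, and paths are placeholders.
sqoop export \
  --connect jdbc:mysql://dbserver:3306/company \
  --username dbuser \
  --password-file /user/hadoop/.db_password \
  --table employees_summary \
  --export-dir /user/hadoop/employees_summary \
  --input-fields-terminated-by ','
```

Note that for an export the target table must already exist in the database; `--input-fields-terminated-by` tells Sqoop how the HDFS files are delimited.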
I hope this information helps you understand the topic.