Microsoft Certified Azure Data Engineer Assoc ...
- 14k Enrolled Learners
- Weekend
- Live Class
In this blog post, let’s discuss top Hive commands with examples. These Hive commands are very important to set up the foundation for Hive Certification Training.
Apache Hive is a Data warehouse system which is built to work on Hadoop. It is used to querying and managing large datasets residing in distributed storage. Before becoming an open source project of Apache Hadoop, Hive was originated in Facebook. It provides a mechanism to project structure onto the data in Hadoop and to query that data using a SQL-like language called HiveQL (HQL).
Hive is used because the tables in Hive are similar to tables in a relational database. If you are familiar with SQL, it’s a cakewalk. Many users can simultaneously query the data using Hive-QL.
Hive defines a simple SQL-like query language to querying and managing large datasets called Hive-QL ( HQL ). It’s easy to use if you’re familiar with SQL Language. Hive allows programmers who are familiar with the language to write the custom MapReduce framework to perform more sophisticated analysis.
You can even check out the details of Big Data with the Azure Data Engineering Certification in Bangalore.
1. The Apache Hive distributed storage.
2. Hive provides tools to enable easy data extract/transform/load (ETL)
3. It provides the structure on a variety of data formats.
4. By using Hive, we can access files stored in Hadoop Distributed File System (HDFS is used to querying and managing large datasets residing in) or in other data storage systems such as Apache HBase.
• Hive is not designed for Online transaction processing (OLTP ), it is only used for the Online Analytical Processing.
• Hive supports overwriting or apprehending data, but not updates and deletes.
• In Hive, sub queries are not supported.
The following are the reasons why Hive is used in spite of Pig’s availability:
Metastore :
Hive stores the schema of the Hive tables in a Hive Metastore. Metastore is used to hold all the information about the tables and partitions that are in the warehouse. By default, the metastore is run in the same process as the Hive service and the default Metastore is DerBy Database.
SerDe :
Serializer, Deserializer gives instructions to hive on how to process a record.
Data Definition Language (DDL )
DDL statements are used to build and modify the tables and other objects in the database.
DDL Command | Function |
CREATE | It is used to create a table or Database |
SHOW | It is used to show Database, Table, Properties, etc |
ALTER | It is used to make changes to the existing table |
DESCRIBE | It describes the table columns |
TRUNCATE | Used to permanently truncate and delete the rows of table |
DELETE | Deletes the table data, but, can be restored |
Go to Hive shell by giving the command sudo hive and enter the command ‘create database<data base name>’ to create the new database in the Hive.
To list out the databases in Hive warehouse, enter the command ‘show databases’.
The database creates in a default location of the Hive warehouse. In Cloudera, Hive database store in a /user/hive/warehouse.
The command to use the database is USE <data base name>
Copy the input data to HDFS from local by using the copy From Local command.
When we create a table in hive, it creates in the default location of the hive warehouse. – “/user/hive/warehouse”, after creation of the table we can move the data from HDFS to hive table.
The following command creates a table with in location of “/user/hive/warehouse/retail.db”
Note : retail.db is the database created in the Hive warehouse.
Describe provides information about the schema of the table.
Data Manipulation Language (DML )
DML statements are used to retrieve, store, modify, delete, insert and update data in the database.
Example :
LOAD, INSERT Statements.
Syntax :
LOAD data <LOCAL> inpath <file path> into table [tablename]
The Load operation is used to move the data into corresponding Hive table. If the keyword local is specified, then in the load command will give the local file system path. If the keyword local is not specified we have to use the HDFS path of the file.
Here are some examples for the LOAD data LOCAL command
After loading the data into the Hive table we can apply the Data Manipulation Statements or aggregate functions retrieve the data.
Example to count number of records:
Count aggregate function is used count the total number of the records in a table.
‘create external’ Table :
The create external keyword is used to create a table and provides a location where the table will create, so that Hive does not use a default location for this table. An EXTERNAL table points to any HDFS location for its storage, rather than default storage.
Insert Command:
The insert command is used to load the data Hive table. Inserts can be done to a table or a partition.
• INSERT OVERWRITE is used to overwrite the existing data in the table or partition.
• INSERT INTO is used to append the data into existing data in a table. (Note: INSERT INTO syntax is work from the version 0.8)
‘Partitioned by‘ is used to divided the table into the Partition and can be divided in to buckets by using the ‘Clustered By‘ command.
When we insert the data Hive throwing errors, the dynamic partition mode is strict and dynamic partition not enabled. So we need to set the following parameters in Hive shell.
set hive.exec.dynamic.partition=true;
To enable dynamic partitions, by default, it’s false
set hive.exec.dynamic.partition.mode=nonstrict;
Partition is done by the category and can be divided in to buckets by using the ‘Clustered By’ command.
The ‘Drop Table’ statement deletes the data and metadata for a table. In the case of external tables, only the metadata is deleted.
The ‘Drop Table’ statement deletes the data and metadata for a table. In the case of external tables, only the metadata is deleted.
Load data local inpath ‘aru.txt’ into table tablename and then we check employee1 table by using Select * from table name command
To count the number of records in table by using Select count(*) from txnrecords;
Select count (DISTINCT category) from tablename;
This command will count the different category of ‘cate’ table. Here there are 3 different categories.
Suppose there is another table cate where f1 is field name of category.
Group command is used to group the result-set by one or more columns.
Select category, sum( amount) from txt records group by category
It calculates the amount of same category.
The result one table is stored in to another table.
Create table newtablename as select * from oldtablename;
Here one more table is created in the name ‘mailid’
A Join operation is performed to combining fields from two tables by using values common to each.
The result of a left outer join (or simply left join) for tables A and B always contains all records of the “left” table (A), even if the join-condition does not find any matching record in the “right” table (B).
A right outer join (or right join) closely resembles a left outer join, except with the treatment of the tables reversed. Every row from the “right” table (B) will appear in the joined table at least once.
The joined table will contain all records from both tables, and fill in NULLs for missing matches on either side.
Once done with hive we can use quit command to exit from the hive shell.
Hive is just a part of the big puzzle called Big Data and Hadoop. Hadoop is much more than just Hive. Click below to see what other skills you should master in Hadoop. You can even check out the details of Big Data with the Data Engineering Courses.
Got a question for us? Please mention it in the comments section and we will get back to you.
Related Posts:
7 Ways Big Data Training Can Change Your Organization
edureka.co
This blog is wonderful and very informative.
Can I also get some information regarding MLlib and kafka?
Thanks in advance :)
Hi Mridula,
Thank you for the positive feedback. Keep following our blog for more interesting and informative posts.
We would recommend that you get in touch with us for further clarification on Apache Kafka by contacting our sales team on +91-8880862004 (India) or 1800 275 9730 (US toll free). You can mail us on sales@edureka.co.
nice blog.. this blog helps beginners to work and understand the HIVE commands. very useful… :)
Hi Sabarinadh,
Thank you for your positive feedback. We hope that you will find our blog useful in future as well.
Keep visiting the Edureka Blog page for latest posts on this link: https://www.edureka.co/blog/
nice blog mate… can i write blogs at edureka web?
Hi Justin, let us know your area of expertise and also do send couple of samples to content@edureka.co We will get back to you after reviewing it.
Very Informative! Thanks
This should be tightly coupled with HBase also running on Hadoop
Thanks Prateek!
unable to see the images on the page. getting below error
The server at cdn.edureka.co can’t be found, because the DNS lookup failed.
Hi Sneha, the issue has been fixed now.
Thank you for the support. I am able to view the images now.
hi guys can anyone suggest me the command to show the databases in hive using spark….
Hi Sree, You can do the following:
Run Spark Shell to check if you are able to see Hive databases and tables.
$ cd $SPARK_HOME;
$ ./bin/spark-shell
scala> val
sqlContext = new org.apache.spark.sql.hive.HiveContext(sc);
scala>
sqlContext.sql(“”show
databases””).collect().foreach(println);
Hi Awanish, Can you please post a blog on Spark and Storm. Thanks.
Hi Pavan, we will take your request in to consideration. Do go through other blog posts as well.
This is excellent sheet on Hive database operation. team please create same for Pig latin operations
thanks
Thanks Hareesh!! Do go through out other blog posts as well.
please check hadoopmreduce.blogspot.com,
hadoopdatasolutions.blogspot.com
Nice bolg…Below ash is taking abt you tube links can you send me those if you have …
Nice bog :)
This blog and the youtube videos are awesome. Thanks.
Can you send me you tube links
Hi Sushobit, Here the link to our Youtube channel. You can check out the various play lists available. https://www.youtube.com/channel/UCkw4JCwteGrDHIsyIIKo4tQ
Thanks Ash!! Do check out our other posts as well.