We have learnt how to build Spark with Hive and Yarn support. Now let us try out Hive and Yarn examples on Spark.
We will run an example of Hive on Spark: we will create a table, load data into that table, and execute a simple query. When working with Hive, you must construct a HiveContext, which inherits from SQLContext.
Command: cd spark-1.1.1
Command: ./bin/spark-shell
Create an input file 'sample' in your home directory with two tab-separated columns, name and rank (the original post shows a snapshot of this file).
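For reference, a minimal 'sample' file could look like the following. These names and ranks are illustrative placeholders, not the values from the original snapshot; the two columns are separated by a tab:
John	1
Mary	2
Mike	3
Sara	4
Paul	6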
Command: val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
Command: sqlContext.sql("CREATE TABLE IF NOT EXISTS test (name STRING, rank INT) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'")
Command: sqlContext.sql("LOAD DATA LOCAL INPATH '/home/edureka/sample' INTO TABLE test")
Command: sqlContext.sql("SELECT * FROM test WHERE rank < 5").collect().foreach(println)
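Assuming the illustrative sample file shown earlier, the query would print each matching row, for example:
[John,1]
[Mary,2]
[Mike,3]
[Sara,4]
Rows with rank 5 or higher (Paul in our placeholder file) are filtered out by the WHERE clause.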
We will run the SparkPi example on Yarn. We can run Spark on Yarn in two modes: cluster mode and client mode. In yarn-cluster mode, the Spark driver runs inside an application master process which is managed by Yarn on the cluster, and the client can go away after initiating the application. In yarn-client mode, the driver runs in the client process, and the application master is only used for requesting resources from Yarn.
Command: cd spark-1.1.1
Command: SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.1.1-hadoop2.2.0.jar ./bin/spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi --num-executors 1 --driver-memory 2g --executor-memory 1g --executor-cores 1 examples/target/scala-2.10/spark-examples-1.1.1-hadoop2.2.0.jar
After you execute the above command, please wait for some time until you get the SUCCEEDED message.
Browse to localhost:8088/cluster and click on the Spark application.
Click on logs.
Click on stdout to check the output.
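If the run succeeded, the stdout log should contain a line similar to the one below (the exact digits vary between runs, since SparkPi estimates Pi by random sampling):
Pi is roughly 3.1418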
For deploying Yarn on Spark in client mode, just change --deploy-mode to "client". A sketch of the full client-mode command is shown below.
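This sketch assumes the same Spark 1.1.1 build and jar paths as the cluster-mode command above:
Command: SPARK_JAR=./assembly/target/scala-2.10/spark-assembly-1.1.1-hadoop2.2.0.jar ./bin/spark-submit --master yarn --deploy-mode client --class org.apache.spark.examples.SparkPi --num-executors 1 --driver-memory 2g --executor-memory 1g --executor-cores 1 examples/target/scala-2.10/spark-examples-1.1.1-hadoop2.2.0.jar
In client mode the driver runs locally, so the "Pi is roughly ..." line appears directly in your terminal instead of in the Yarn container logs.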
Now you know how to build Spark with Hive and Yarn support, and we have tried out practical examples of both. Got a question for us? Please mention it in the comments section and we will get back to you.
Hi, I got an error. What is this error, and how do I overcome it?
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the “dbcp-builtin” plugin to create a ConnectionPool gave an error : The specified datastore driver (“com.mysql.jdbc.Driver”) was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.
Hi Venu, it is not able to find the JDBC driver class. Add it to your driver classpath as follows:
./bin/spark-sql --driver-class-path /path/to/connector......jar
The connector jar will depend on the database that has been set as the metastore in Hive.
If it is MySQL, then the command would be:
./bin/spark-sql --driver-class-path /home/edureka/spark-1.1.1/lib/mysql-connector-java-5.1.32-bin.jar
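To verify that the connector jar is actually visible, you can try loading the driver class from the Spark shell; this is a quick check, assuming the same MySQL connector jar as above:
Command: ./bin/spark-shell --driver-class-path /home/edureka/spark-1.1.1/lib/mysql-connector-java-5.1.32-bin.jar
Command: Class.forName("com.mysql.jdbc.Driver")
If this returns the driver class instead of throwing a ClassNotFoundException, the jar is on the classpath and the NucleusException should go away.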