Brief Introduction to Oozie

Last updated on Feb 09, 2021


Oozie is a workflow scheduler system for managing Apache Hadoop jobs. It is integrated with the rest of the Hadoop stack and supports several types of Hadoop jobs, such as Java MapReduce, Streaming MapReduce, Pig, Hive and Sqoop. Oozie is a scalable, reliable and extensible system; it is used in production at Yahoo!, running more than 200,000 jobs every day.

Features:

  • Execute and monitor workflows in Hadoop
  • Periodic scheduling of workflows
  • Trigger execution on data availability (see the coordinator sketch after this list)
  • HTTP and command-line interfaces, plus a web console
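
Periodic scheduling and data-availability triggers are handled by Oozie coordinator applications, which wrap a workflow and decide when to run it. The snippet below is a minimal, hypothetical coordinator sketch (the name, dates and application path are assumptions, not taken from this article) that would run a word-count workflow once a day:

<!-- Hypothetical coordinator sketch: name, dates and app-path are illustrative -->
<coordinator-app name="wordcount-coord"
                 frequency="${coord:days(1)}"
                 start="2021-01-01T00:00Z" end="2021-12-31T00:00Z"
                 timezone="UTC"
                 xmlns="uri:oozie:coordinator:0.4">
  <action>
    <workflow>
      <!-- HDFS directory that holds the workflow.xml to run -->
      <app-path>hdfs://bar.com:9000/usr/abc/wordcount</app-path>
    </workflow>
  </action>
</coordinator-app>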

Workflow – Directed Acyclic Graph of Jobs:

Workflow Example:

<workflow-app name="wordcount-wf" xmlns="uri:oozie:workflow:0.5">
  <start to="wordcount"/>
  <action name="wordcount">
    <map-reduce>
      <job-tracker>foo.com:9001</job-tracker>
      <name-node>hdfs://bar.com:9000</name-node>
      <configuration>
        <property>
          <name>mapred.input.dir</name>
          <value>${inputDir}</value>
        </property>
        <property>
          <name>mapred.output.dir</name>
          <value>${outputDir}</value>
        </property>
      </configuration>
    </map-reduce>
    <ok to="end"/>
    <error to="kill"/>
  </action>
  <kill name="kill">
    <message>Word count workflow failed</message>
  </kill>
  <end name="end"/>
</workflow-app>

Workflow Definition:

A workflow definition is a DAG of control flow nodes and action nodes, connected by transition arrows.

Control Flow Nodes:     

Control flow nodes define the workflow's execution path. Flow control within a workflow application is expressed through the following nodes (a short sketch follows the list):

  • Start/end/kill
  • Decision
  • Fork/join
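
As a rough illustration (the node names, transition targets and EL expressions below are assumptions, not part of the word-count example), a decision node and a fork/join pair look like this inside a workflow definition:

<!-- Decision node: picks one outgoing path based on an EL predicate (illustrative names) -->
<decision name="check-input">
  <switch>
    <case to="wordcount">${fs:exists(inputDir)}</case>
    <default to="kill"/>
  </switch>
</decision>

<!-- Fork/join pair: runs two actions in parallel, then waits for both (illustrative names) -->
<fork name="parallel-steps">
  <path start="step-a"/>
  <path start="step-b"/>
</fork>
<join name="after-parallel" to="end"/>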

Action Nodes:

  • Map-reduce
  • Pig
  • HDFS
  • Sub-workflow
  • Java – Run custom Java code
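
For example, a Pig action node follows the same pattern as the map-reduce action shown above; the script name and parameters below are illustrative assumptions:

<!-- Illustrative Pig action; script name and parameters are assumptions -->
<action name="pig-wordcount">
  <pig>
    <job-tracker>foo.com:9001</job-tracker>
    <name-node>hdfs://bar.com:9000</name-node>
    <script>wordcount.pig</script>
    <param>INPUT=${inputDir}</param>
    <param>OUTPUT=${outputDir}</param>
  </pig>
  <ok to="end"/>
  <error to="kill"/>
</action>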

Workflow Application:

A workflow application bundles the workflow definition (workflow.xml) with the files needed to run all its actions; it is deployed to HDFS as a directory. It contains the following files (a typical layout is sketched after the list):

  • Configuration file – config-default.xml
  • App files – lib/ directory with JAR and SO files
  • Pig scripts
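
Assuming the word-count example above, the application directory might look like this before deployment (every file name other than workflow.xml and config-default.xml is an illustrative assumption):

wordcount-wf/
 ├── workflow.xml          <- the workflow definition
 ├── config-default.xml    <- default configuration
 ├── wordcount.pig         <- Pig script (illustrative name)
 └── lib/
      ├── wordcount.jar    <- application JAR (illustrative name)
      └── native-lib.so    <- native library (illustrative name)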

Application Deployment:

$ hadoop fs -put wordcount-wf hdfs://bar.com:9000/usr/abc/wordcount

Workflow Job Parameters:

$ cat job.properties
oozie.wf.application.path=hdfs://bar.com:9000/usr/abc/wordcount
inputDir=/usr/abc/input-data
outputDir=/usr/abc/output-data

Job Execution:

$ oozie job -run -config job.properties
job: 1-20090525161321-oozie-xyz-W
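
The job ID returned above can then be used to check the job's status from the same command-line client; for example (the Oozie server URL is an assumption):

$ oozie job -oozie http://localhost:11000/oozie -info 1-20090525161321-oozie-xyz-W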

Got a question for us? Mention it in the comments section and we will get back to you.


Comments
5 Comments
  • Rajiv says:

    Sir, how to schedule a job using crontab?

    • EdurekaSupport says:

      Hey Rajiv, thanks for checking out our blog. Please refer to the steps given below to set up a cron job:
      1. Prepare the SQL to be run using CRON.
      2. See below for an example of the code that needs to be added to the SQL code for a cron job:
      .logon server/user_id, Teradata password
      For example :
      .logon Mozart/akatarni,Welcome1
      ADD THE SQL CODE HERE
      .logoff
      .quit
      .exit
      3. WinSCP – this is the file transfer application that is used to transfer the .SQL code file to the server.
      a. Open “WinSCP”, Server name = phximdsas02.phx.ebay.com
      b. Give login id and SAS password
      c. Copy the code from your system to the server window; in the attached snapshot we have copied “ask_lstg.sql” from genpact (personal system) to the server window.
      i. The left window shows your personal computer and the right one is the server.
      4. Open “PuTTY”. Use the server phximdsas02.phx.ebay.com.
      https://uploads.disquscdn.com/images/78d54b229f0ce485a72b7984a886306720904f6e58179446069905356f639f94.png
      5. At the prompt, enter your SAS credentials. After entering the password, you will see the attached window.
      https://uploads.disquscdn.com/images/d89991fa7351bb7a80085c13e9ec8028f333f79c3ab3e6da6c406491ceb53dde.png
      6. To open the editor :
      a. Type export EDITOR=vi <hit enter>
      b. Type crontab -e <hit enter>
      i. This command edits your crontab file, or creates one if it doesn’t already exist.
      c. Press “i” to start typing
      d. Press <ESC> to get out of insert mode
      7. Then make the cron job entry:
      A crontab entry has five fields (minute, hour, day of month, month, day of week) followed by the command to be run at that interval.
      00 06 * * * /usr/bin/bteq <fake_lstg.sql> fake_lstg.LOG 2>&1
      The above will run the code at 06:00 hours every day
      In the above example, “fake_lstg.sql” is the SQL file and “fake_lstg.LOG” is the log file where the results will appear.
      15 20 * * 0 /usr/bin/bteq <fake_lstg.sql> fake_lstg.LOG 2>&1
      The above will run the code at 20:15 hours every Sunday
      https://uploads.disquscdn.com/images/4b910f3e7b76f35e76b9e9a338c5f547932c7c2e897d8b8a109c15c224fa0e01.png
      8. Keep adding lines to the crontab file to schedule more jobs.
      a. The easiest way to add a line is to be at the first character in the file, then in ESC mode,
      press <shift> + O (case sensitive). This adds a new line above the current one.
      9. To move around the file, in ESC mode
      “l” – move right
      “h” – move left
      “j” – move down
      “k” – move up
      10. To save the crontab file and exit, press <ESC>, then :wq
      a. To exit the file WITHOUT saving, press <ESC>, then :q!
      11. Type exit at the Unix prompt to exit PuTTY.
      12. The cron job should run at the specified time.
      13. Check the *.LOG file to make sure the code ran successfully.
      Hope this helps. Cheers!

      • Rajiv says:

        Sir, thanks for answering my question. It’s helpful for me, a good and clear description. Thanks to you, sir.

  • Sankalp Tomar says:

    Hi,

    Suppose we want to use the output of a Hive job as an input to a MapReduce job. How can we achieve this?

    • EdurekaSupport says:

      Hey Sankalp, thanks for checking out our blog. With regard to your query, we can first store the output of Hive in HDFS and then use it as the input for the MapReduce code.
      Storing the output of Hive:
      INSERT OVERWRITE DIRECTORY '/path/to/output/dir'
      ROW FORMAT DELIMITED
      FIELDS TERMINATED BY ','
      SELECT books FROM table;
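      For illustration only (the JAR name, driver class and output path below are hypothetical), the exported directory can then be passed straight to the MapReduce job as its input path:
      $ hadoop jar myjob.jar MyMapReduceJob /path/to/output/dir /path/to/mr-output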
      Hope this helps. Cheers!
