Brief Introduction to Oozie

Workflow Example:

<workflow-app name='wordcount-wf'>
    <start to='wordcount'/>
    <action name='wordcount'>
        <map-reduce>
            <job-tracker>foo.com:9001</job-tracker>
            <name-node>hdfs://bar.com:9000</name-node>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to='end'/>
        <error to='kill'/>
    </action>
    <kill name='kill'/>
    <end name='end'/>
</workflow-app>

Workflow Definition:

A workflow definition is a directed acyclic graph (DAG) of control flow nodes and action nodes, in which the nodes are connected by transition arrows.

Control Flow Nodes:

Control flow nodes provide a way to control the workflow execution path. Flow control within a workflow application is handled by the following nodes:

Start/end/kill

Decision

Fork/join
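The control flow nodes listed above might be combined roughly as follows. This is only a sketch: the node names (check-input, split, job-a, job-b, merge), the schema version in xmlns, and the kill message are assumptions made for illustration, and the two forked actions are omitted, so the fragment is not deployable as written.

```xml
<!-- Sketch only: node names and ${...} parameters are hypothetical. -->
<workflow-app name="control-flow-demo-wf" xmlns="uri:oozie:workflow:0.5">
    <start to="check-input"/>

    <!-- Decision: the first case whose predicate evaluates to true is taken. -->
    <decision name="check-input">
        <switch>
            <case to="split">${fs:exists(inputDir)}</case>
            <default to="end"/>
        </switch>
    </decision>

    <!-- Fork: starts both paths concurrently. -->
    <fork name="split">
        <path start="job-a"/>
        <path start="job-b"/>
    </fork>

    <!-- job-a and job-b would be ordinary action nodes (omitted here)
         whose <ok> transitions both point to "merge". -->

    <!-- Join: waits for every forked path to finish before moving on. -->
    <join name="merge" to="end"/>

    <kill name="kill">
        <message>Workflow failed</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

A fork and a join are used in pairs: every path started by the fork is expected to arrive at the same join.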

Action Nodes:

Map-reduce

Pig

HDFS

Sub-workflow

Java – Run custom Java code
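As a rough illustration of two of the action types above, the fragment below chains an HDFS (fs) action and a Java action. The action names, the main class com.example.CustomStep, and the assumption that ${outputDir} holds a full HDFS URI are all hypothetical; the job tracker and name node values are simply reused from the word count example. The fragment is meant to sit inside a workflow-app that defines the "end" and "kill" nodes.

```xml
<!-- Sketch only: action names and the main class are hypothetical. -->
<action name="prepare-output">
    <!-- HDFS action: remove a previous output directory before the next step. -->
    <fs>
        <delete path="${outputDir}"/>
    </fs>
    <ok to="custom-step"/>
    <error to="kill"/>
</action>

<action name="custom-step">
    <!-- Java action: runs the main() method of the given class. -->
    <java>
        <job-tracker>foo.com:9001</job-tracker>
        <name-node>hdfs://bar.com:9000</name-node>
        <main-class>com.example.CustomStep</main-class>
        <arg>${inputDir}</arg>
        <arg>${outputDir}</arg>
    </java>
    <ok to="end"/>
    <error to="kill"/>
</action>
```

Every action node declares both an ok and an error transition, which is how the workflow decides where to go after the action succeeds or fails.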

Comments

Rajiv says:
Dec 31, 2016 at 11:14 am GMT
sir how to schedule job using crontab
- EdurekaSupport says:
  Jan 4, 2017 at 2:35 pm GMT
  Hey Rajiv, thanks for checking out our blog. Please refer to the steps given below to set up a cron job:
  1. Prepare the SQL to be run using cron.
  2. Wrap the SQL in the commands shown below so that it can run as a cron job:
  .logon server/user_id, Teradata password
  For example :
  .logon Mozart/akatarni,Welcome1
  ADD THE SQL CODE HERE
  .logoff
  .quit
  .exit
  3. WinSCP is the file transfer application used to transfer the .sql code file to the server.
  a. Open “WinSCP”, Server name = phximdsas02.phx.ebay.com
  b. Enter your login ID and SAS password
  c. Copy the code file from your system to the server window. In the attached snapshot, we have copied “ask_lstg.sql” from genpact (personal system) to the server window.
  i. The left window shows your personal computer and the right one shows the server
  4. Open “Putty”. Use server phximdsas02.phx.ebay.com.
  https://uploads.disquscdn.com/images/78d54b229f0ce485a72b7984a886306720904f6e58179446069905356f639f94.png
  5. At the prompt, enter your SAS credentials. After entering the password, you will see the attached window.
  https://uploads.disquscdn.com/images/d89991fa7351bb7a80085c13e9ec8028f333f79c3ab3e6da6c406491ceb53dde.png
  6. To open the editor:
  a. Type export EDITOR=vi <hit enter>
  b. Type crontab -e <hit enter>
  i. This command edits your crontab file, or creates one if it doesn’t already exist.
  c. Press “i” to start typing
  d. Press <ESC> to get out of insert mode
  7. Then make the cron job entry:
  A crontab entry has five time fields (minute, hour, day of month, month, day of week) followed by the command to be run on that schedule.
  00 06 * * * /usr/bin/bteq < fake_lstg.sql > fake_lstg.LOG 2>&1
  The above runs the code at 06:00 every day.
  In the above example, “fake_lstg.sql” is the SQL file and “fake_lstg.LOG” is the log file where the results will appear.
  15 20 * * 0 /usr/bin/bteq < fake_lstg.sql > fake_lstg.LOG 2>&1
  The above runs the code at 20:15 every Sunday.
  https://uploads.disquscdn.com/images/4b910f3e7b76f35e76b9e9a338c5f547932c7c2e897d8b8a109c15c224fa0e01.png
  8. Keep adding lines to the crontab file to schedule more jobs.
  a. The easiest way to add a line is to place the cursor on the first character in the file, then, in ESC mode,
  press <shift> + O (case sensitive). This adds a new line above the current one.
  9. To move around the file in ESC mode:
  “l” – move right
  “h” – move left
  “j” – move down
  “k” – move up
  10. To save the crontab file and exit, press <ESC>, then type :wq
  a. To exit the file WITHOUT saving, press <ESC>, then type :q!
  11. Type exit at the Unix prompt to exit Putty.
  12. The cron job should run at the specified time.
  13. Check the *.LOG file to make sure the code ran successfully.
  Hope this helps. Cheers!
  - Rajiv says:
    Jan 4, 2017 at 3:18 pm GMT
    sir thanks for giving answer to my question..its helpful form me…good and fine description..thanks to u sir
Sankalp Tomar says:
Aug 19, 2016 at 12:55 pm GMT
Hi,
Suppose we want to use the output of Hive Job as an input to Mapreduce Job. How can we achieve this??
- EdurekaSupport says:
  Jan 5, 2017 at 7:31 am GMT
  Hey Sankalp, thanks for checking out our blog. With regard to your query, you can first store the output of the Hive job in HDFS and then use that directory as the input path for the MapReduce job.
  Storing the output of Hive:
  INSERT OVERWRITE DIRECTORY '/path/to/output/dir'
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  SELECT books FROM table;
  Hope this helps. Cheers!
