Brief Introduction to Oozie

Workflow Example:

<workflow-app name='wordcount-wf' xmlns='uri:oozie:workflow:0.1'>
    <start to='wordcount'/>
    <action name='wordcount'>
        <map-reduce>
            <job-tracker>foo.com:9001</job-tracker>
            <name-node>hdfs://bar.com:9000</name-node>
            <configuration>
                <property>
                    <name>mapred.input.dir</name>
                    <value>${inputDir}</value>
                </property>
                <property>
                    <name>mapred.output.dir</name>
                    <value>${outputDir}</value>
                </property>
            </configuration>
        </map-reduce>
        <ok to='end'/>
        <error to='kill'/>
    </action>
    <kill name='kill'/>
    <end name='end'/>
</workflow-app>

Workflow Definition:

A workflow definition is a DAG (directed acyclic graph) of control flow nodes and action nodes, in which the nodes are connected by transition arrows.
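To make the DAG idea concrete, here is a minimal Python sketch that reads a trimmed copy of the wordcount workflow, collects the transitions between nodes, and checks that they never loop back. It is an illustration only, not how Oozie itself validates workflows. The helper names `transitions` and `is_acyclic` are hypothetical, and the XML deliberately omits the namespace and the map-reduce configuration so that tag names can be matched directly.

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free copy of the wordcount workflow (illustration only).
WORKFLOW_XML = """
<workflow-app name='wordcount-wf'>
    <start to='wordcount'/>
    <action name='wordcount'>
        <map-reduce/>
        <ok to='end'/>
        <error to='kill'/>
    </action>
    <kill name='kill'/>
    <end name='end'/>
</workflow-app>
"""


def transitions(xml_text):
    """Return {node name: [names of the nodes it can transition to]}."""
    root = ET.fromstring(xml_text)
    graph = {}
    for node in root:
        # The start node has no name attribute, so label it "start".
        name = "start" if node.tag == "start" else node.get("name")
        targets = []
        if node.get("to"):
            targets.append(node.get("to"))
        # Action nodes carry their transitions in <ok> and <error> children.
        for child in node:
            if child.tag in ("ok", "error") and child.get("to"):
                targets.append(child.get("to"))
        graph[name] = targets
    return graph


def is_acyclic(graph):
    """Depth-first search; False if any chain of transitions loops back."""
    visiting, done = set(), set()

    def visit(node):
        if node in done:
            return True
        if node in visiting:
            return False
        visiting.add(node)
        ok = all(visit(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return ok

    return all(visit(node) for node in graph)


graph = transitions(WORKFLOW_XML)
print(graph)
print(is_acyclic(graph))
```

Here the start node leads to the wordcount action, which leads to either end or kill, so the graph has no cycle.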

Control Flow Nodes:

Control flow nodes determine the execution path of a workflow. Within a workflow application, flow control is handled by the following nodes:

Start/end/kill

Decision

Fork/join

Action Nodes:

Map-reduce

Pig

HDFS

Sub-workflow

Java – Run custom Java code


Comments

5 Comments

Rajiv says:
Dec 31, 2016 at 11:14 am GMT
sir how to schedule job using crontab
- EdurekaSupport says:
  Jan 4, 2017 at 2:35 pm GMT
  Hey Rajiv, thanks for checking out our blog. Please follow the steps below to set up a cron job:
  1. Prepare the SQL to be run by cron.
  2. Wrap the SQL in the BTEQ logon and logoff commands shown below so that it can run as a cron job:
  .logon server/user_id, Teradata password
  For example :
  .logon Mozart/akatarni,Welcome1
  ADD THE SQL CODE HERE
  .logoff
  .quit
  .exit
  3. Use WinSCP, a file transfer application, to copy the .sql file to the server.
  a. Open “WinSCP” and set the server name to phximdsas02.phx.ebay.com
  b. Enter your login ID and SAS password
  c. Copy the file from your system to the server window. In the attached screenshot, “ask_lstg.sql” has been copied from genpact (the personal system) to the server window.
  i. The left window shows your personal computer and the right one shows the server
  4. Open “Putty” and connect to the server phximdsas02.phx.ebay.com.
  https://uploads.disquscdn.com/images/78d54b229f0ce485a72b7984a886306720904f6e58179446069905356f639f94.png
  5. At the prompt, enter your SAS credentials. After you enter the password, you will see the attached window.
  https://uploads.disquscdn.com/images/d89991fa7351bb7a80085c13e9ec8028f333f79c3ab3e6da6c406491ceb53dde.png
  6. To open the editor:
  a. Type export EDITOR=vi <hit enter>
  b. Type crontab -e <hit enter>
  i. This command opens your crontab file for editing, or creates one if it doesn’t already exist.
  c. Press “i” to enter insert mode and start typing
  d. Press <ESC> to leave insert mode
  7. Then make the cron job entry:
  A crontab entry has five fields that specify the minute, hour, day of month, month and day of week, followed by the command to run on that schedule.
  00 06 * * * /usr/bin/bteq < fake_lstg.sql > fake_lstg.LOG 2>&1
  The above runs the code at 06:00 every day.
  In this example, “fake_lstg.sql” is the SQL file and “fake_lstg.LOG” is the log file where the results will appear.
  15 20 * * 0 /usr/bin/bteq < fake_lstg.sql > fake_lstg.LOG 2>&1
  The above runs the code at 20:15 every Sunday.
  https://uploads.disquscdn.com/images/4b910f3e7b76f35e76b9e9a338c5f547932c7c2e897d8b8a109c15c224fa0e01.png
  8. Keep adding lines to the crontab file to schedule more jobs.
  a. The easiest way to add a line is to place the cursor on the first character in the file and then, in ESC mode, press <shift> + O (case sensitive). This opens a new line above the current one.
  9. To move around the file in ESC mode:
  “l” – move right
  “h” – move left
  “j” – move down
  “k” – move up
  10. To save the crontab file and exit, press <ESC>, then type :wq
  a. To exit the file WITHOUT saving, press <ESC>, then type :q!
  11. Type exit at the Unix prompt to close Putty.
  12. The cron job should run at the specified time.
  13. Check the *.LOG file to make sure the code ran successfully.
  Hope this helps. Cheers!
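The five schedule fields described in step 7 can be illustrated with a small Python sketch. It is not part of the original reply. The function name `split_crontab_entry` and the labels in `FIELD_NAMES` are hypothetical, and the sketch only separates the fields; it does not validate their values or handle shortcuts such as `@daily`.

```python
# Labels for the five schedule fields, in the order cron reads them.
FIELD_NAMES = ("minute", "hour", "day_of_month", "month", "day_of_week")


def split_crontab_entry(line):
    """Split a crontab line into its five schedule fields and the command."""
    # maxsplit=5 keeps the command, including its redirections, in one piece.
    parts = line.split(None, 5)
    if len(parts) < 6:
        raise ValueError("expected five schedule fields followed by a command")
    schedule = dict(zip(FIELD_NAMES, parts[:5]))
    return schedule, parts[5]


schedule, command = split_crontab_entry(
    "15 20 * * 0 /usr/bin/bteq < fake_lstg.sql > fake_lstg.LOG 2>&1"
)
print(schedule)
print(command)
```

For the Sunday entry from step 7, the minute is 15, the hour is 20, and the day of week is 0 (Sunday), while day of month and month are left as `*` so that they match any value.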
  - Rajiv says:
    Jan 4, 2017 at 3:18 pm GMT
    Sir, thanks for answering my question. It's helpful for me. Good and clear description. Thanks to you, sir.
Sankalp Tomar says:
Aug 19, 2016 at 12:55 pm GMT
Hi,
Suppose we want to use the output of Hive Job as an input to Mapreduce Job. How can we achieve this??
- EdurekaSupport says:
  Jan 5, 2017 at 7:31 am GMT
  Hey Sankalp, thanks for checking out our blog. With regard to your query, you can first store the output of the Hive query in HDFS and then use that output directory as the input for the MapReduce job.
  Storing the output of Hive:
  INSERT OVERWRITE DIRECTORY '/path/to/output/dir'
  ROW FORMAT DELIMITED
  FIELDS TERMINATED BY ','
  SELECT books FROM table;
  Hope this helps. Cheers!
