This is the second post in our series on Apache Pig operators. It covers the diagnostic operators in Apache Pig. You can also refer to our previous post on Relational Operators for more information.
Let’s create two files to run the commands against, named ‘first’ and ‘second’. The first file contains three fields: user, url, and id.
The second file contains two fields: url and rating. Both are CSV files.
The DUMP operator runs Pig Latin statements and displays the results on the screen. In this example, the operator prints the contents of the relation ‘loading1’ to the screen.
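As a minimal sketch of DUMP, assuming ‘first’ is a comma-separated file in the working directory and that the field types below fit the data (the post does not state the types, so they are assumptions):

```pig
-- Load the 'first' file; the column types are assumed, not given in the post.
loading1 = LOAD 'first' USING PigStorage(',') AS (user:chararray, url:chararray, id:int);

-- DUMP triggers execution and prints every tuple of loading1 to the console.
DUMP loading1;
```

Because DUMP writes the whole relation to the screen, it is usually better suited to small test inputs than to full data sets.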
Use the DESCRIBE operator to review the schema of a particular relation. The DESCRIBE operator is best used for debugging a script.
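A short sketch of DESCRIBE, again assuming the relation name ‘loading1’ and field types that the post does not specify:

```pig
-- Load the 'first' file with an explicit (assumed) schema.
loading1 = LOAD 'first' USING PigStorage(',') AS (user:chararray, url:chararray, id:int);

-- DESCRIBE prints the schema only; it does not run a job or read the data.
DESCRIBE loading1;
-- With the schema declared above, the output should look like:
-- loading1: {user: chararray,url: chararray,id: int}
```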
The ILLUSTRATE operator is used to review how data is transformed through a sequence of Pig Latin statements. The ILLUSTRATE command is your best friend when it comes to debugging a script. This command alone might be a good reason for choosing Pig over something else.
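The following sketch shows one way ILLUSTRATE might be used on a small pipeline. The relation names ‘high_ids’ and ‘by_user’, the filter condition, and the field types are hypothetical and chosen only for illustration:

```pig
-- Load the 'first' file; the column types are assumed.
loading1 = LOAD 'first' USING PigStorage(',') AS (user:chararray, url:chararray, id:int);

-- Two transformation steps to inspect (hypothetical example logic).
high_ids = FILTER loading1 BY id > 1;
by_user  = GROUP high_ids BY user;

-- ILLUSTRATE shows sample rows as they pass through each step that leads to by_user.
ILLUSTRATE by_user;
```

ILLUSTRATE works on a small sample of the input, so it tends to be much quicker than running the full script and dumping the intermediate relations.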
The EXPLAIN operator prints the logical, physical, and MapReduce execution plans used to compute a relation.
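As a sketch, EXPLAIN could be applied to a join of the two sample files. The relation names ‘loading2’ and ‘joined’ and all field types are assumptions made for this example:

```pig
-- Load both sample files; the column types are assumed.
loading1 = LOAD 'first'  USING PigStorage(',') AS (user:chararray, url:chararray, id:int);
loading2 = LOAD 'second' USING PigStorage(',') AS (url:chararray, rating:int);

-- Join the two relations on their shared url field.
joined = JOIN loading1 BY url, loading2 BY url;

-- Print the execution plans for computing 'joined' without running the job.
EXPLAIN joined;
```

The output is long and plan-oriented, which makes it most useful when you want to see how many MapReduce jobs a script will produce.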
At the time of writing, 0.12.0 is the current version of Apache Pig. This release includes several new features, such as the ASSERT, IN, and CASE operators.
The ASSERT operator can be used for data validation. For example, the following script will fail if any value of a0 is zero or negative:
a = LOAD 'something' AS (a0:int, a1:int);
ASSERT a BY a0 > 0, 'a0 must be greater than 0';
Previously, Pig had no support for an IN operator. To imitate an IN operation, users had to chain several OR conditions, as shown in the example below:
a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY
(i == 1) OR
(i == 22) OR
(i == 333) OR
(i == 4444) OR
(i == 55555);
Now, this type of expression can be rewritten more compactly using the IN operator:
a = LOAD '1.txt' USING PigStorage(',') AS (i:int);
b = FILTER a BY i IN (1, 22, 333, 4444, 55555);
Earlier, Pig had no support for a CASE statement. To mimic it, users often used nested bincond operators, which could become unreadable when there were multiple levels of nesting. The following is an example of the type of CASE expression that Pig currently supports:
Case_operator = FOREACH foo GENERATE (
    CASE i % 3
        WHEN 0 THEN '3n'
        WHEN 1 THEN '3n+1'
        ELSE '3n+2'
    END
);
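For comparison, here is a sketch of how the same logic might be written with nested bincond operators. It assumes ‘foo’ is loaded from ‘1.txt’ with a single int field i, as in the IN example; the relation name ‘bincond_version’ is hypothetical:

```pig
-- Assumed input: one int per line, as in the IN example.
foo = LOAD '1.txt' USING PigStorage(',') AS (i:int);

-- Nested bincond (condition ? value_if_true : value_if_false) version of the CASE logic.
bincond_version = FOREACH foo GENERATE (
    (i % 3 == 0) ? '3n' : ((i % 3 == 1) ? '3n+1' : '3n+2')
);
```

Each extra branch adds another level of parentheses, which is why CASE is easier to read once there are more than two or three outcomes.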
Got a question for us? Please mention it in the comments section and we will get back to you.
Related Posts:
Operators in Apache Pig – Relational Operators
Is there a command to join two files without duplicate columns?
Very good blog. Easy to understand! Thank you, Edureka!
Hi Teja,
Thank you so much for your great feedback. We hope that you will find our blog useful in future as well.
Keep visiting the Edureka Blog page for the latest posts at this link: https://www.edureka.co/blog/
Hi All,
I need to use IF, then IF, ELSE IF conditions. How can I do that in Pig? Please let me know. Thanks in advance.
Nice blog! Simple and to the point.
Hi Bindu,
Thank you for your positive feedback. We hope that you will find our blog useful in future as well. Keep visiting the Edureka Blog page for the latest posts at this link: https://www.edureka.co/blog/
If I want to use an IN clause with MATCHES, is there a way?
What is the significance of the output given by the EXPLAIN command? Please give details with an example.
Hi Devinder, we use the EXPLAIN operator to review the logical, physical, and MapReduce execution plans that are used to compute the specified relationship.
If no script is given, the logical plan shows a pipeline of operators to be executed to build the relation. Type checking and backend-independent optimizations (such as applying filters early on) also apply. The physical plan shows how the logical operators are translated to backend-specific physical operators. Some backend optimizations also apply. The MapReduce plan shows how the physical operators are grouped into MapReduce jobs. If a script without an alias is specified, it will output the entire execution graph (logical, physical, or MapReduce). If a script with an alias is specified, it will output the plan for the given alias.
I am using Apache Pig version 0.12.0-cdh5.2.1 and ILLUSTRATE is giving an error.
ERROR 2997: Encountered IOException. Exception
It seems it is not supported.
Hi Devinder, can you please share more details about the error? Meanwhile, can you try running this command in Pig's local mode and check?
Nicely explained. If any new updates are coming for this page, please let me know.
Thanks Sushobhit! You can get regular updates by subscribing to our blog. You can use the Subscription form on the right side of this post.