Talend is said to be the next-generation leader in cloud and data integration software and currently holds a market share of 19.3%. This suggests strong demand for professionals with a Talend certification in the near future, so I think this is a good time to grab the opportunity and prepare yourself to get ahead of the competition. In this Talend interview questions blog, I have selected the top 75 questions that will help you crack your interview. I have divided this list of Talend interview questions into 4 sections:
The following are a few of the advantages of Talend:
Feature | Description |
Faster | Talend automates the tasks and further maintains them for you. |
Less Expense | Talend provides open source tools that can be downloaded free of cost. Moreover, as processes speed up, developer costs are reduced as well. |
Future Proof | Talend includes everything you might need to meet the marketing requirements today as well as in the future. |
Unified Platform | Talend builds its products on a common foundation, so it can meet the varied needs of an organization. |
Huge Community | Being open source, it is backed up by a huge community. |
Talend is an open source software integration platform/vendor.
Talend Open Studio is an open source project that is based on Eclipse RCP. It supports ETL oriented implementations and is generally provided for the on-premises deployment. This acts as a code generator which produces data transformation scripts and underlying programs in Java. It provides an interactive and user-friendly GUI which lets you access the metadata repository containing the definition and configurations for each process performed in Talend.
‘Project’ is the highest physical structure which bundles up and stores all types of Business Models, Jobs, metadata, routines, context variables or any other technical resources.
A Job is a basic executable unit of anything that is built using Talend. It is technically a single Java class which defines the working and scope of information available with the help of graphical representation. It implements the data flow by translating the business needs into code, routines, and programs.
A component is a functional piece that performs a single operation in Talend. Everything you see on the palette is a graphical representation of a component, and you can use one with a simple drag and drop. At the backend, a component is a snippet of Java code that is generated as part of a Job (which is basically a Java class). This Java code is automatically compiled by Talend when the Job is saved.
Connections in Talend define either the data to be processed, the data output, or the logical sequence of a Job. The various types of connections provided by Talend are:
OnComponentOk | OnSubjobOk |
1. Belongs to Component Triggers | 1. Belongs to Subjob Triggers |
2. The linked Subjob starts executing only when the previous component successfully finishes its execution | 2. The linked Subjob starts executing only when the previous Subjob completely finishes its execution |
3. This link can be used with any component in a Job | 3. This link can only be used with the first component of the Subjob |
Talend provides a user-friendly GUI where you can simply drag and drop the components to design a Job. When the Job is executed, Talend Studio automatically translates it into a Java class at the backend. Each component present in a Job is divided into three parts of Java code (begin, main and end). This is why Talend studio is called a code generator.
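To make the begin, main and end split more concrete, here is a minimal sketch in plain Java. It is only an illustration of the idea: the class name `BeginMainEndSketch` and the upper-casing logic are hypothetical, and this is not the code that Talend Studio actually generates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: this is NOT the code Talend Studio generates.
class BeginMainEndSketch {
    // Runs the three parts in order over the given rows and returns the output rows.
    static List<String> run(List<String> rows) {
        // "begin" part: executed once, before the first row is read.
        List<String> output = new ArrayList<>();
        int rowCount = 0;

        for (String row : rows) {
            // "main" part: executed once for every row in the flow.
            output.add(row.toUpperCase());
            rowCount++;
        }

        // "end" part: executed once, after the last row has been processed.
        System.out.println("Rows processed: " + rowCount);
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("alice", "bob")));
    }
}
```

The point of the layout is that setup and teardown happen once per component, while the per-row work sits inside the loop.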
Some of the major types of schemas supported by Talend are:
Routines are the reusable pieces of Java code. Using routines you can write custom code in Java in order to optimize data processing, improve Job capacity, and extend Talend Studio features.
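As a rough idea of what a user routine can look like, the sketch below defines one static helper method. The class name `NameRoutines` and the method `capitalize` are hypothetical, and the packaging that Talend Studio applies to routines is left out here.

```java
// Hypothetical user routine: a class whose static methods can be reused
// wherever a Java expression is allowed.
class NameRoutines {
    // Returns the trimmed input with only its first letter in upper case,
    // or an empty string when the input is null or blank.
    static String capitalize(String value) {
        if (value == null || value.trim().isEmpty()) {
            return "";
        }
        String trimmed = value.trim();
        return Character.toUpperCase(trimmed.charAt(0)) + trimmed.substring(1).toLowerCase();
    }

    public static void main(String[] args) {
        System.out.println(capitalize("  tALEND "));
    }
}
```

A component expression could then call something like `NameRoutines.capitalize(row1.name)`, where `row1.name` is an assumed column reference.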
Talend supports two types of routines:
Schemas can’t be defined during runtime. As schemas define the movement of data, they must be defined while configuring the components.
Built-in | Repository |
1. Stored locally inside a Job | 1. Stored centrally inside the Repository |
2. Can be used by the local Job only | 2. Can be used globally by any Job within a project |
3. Can be updated easily within a Job | 3. Data is read-only within a Job |
Context variables are user-defined parameters that are passed into a Job at runtime. Their values may change as the Job is promoted from the Development environment to Test and Production. Context variables can be defined in three ways:
Yes, you can do that by declaring a static variable within a routine. Then you need to add the setter/getter methods for this variable in the routine itself. Once done, this variable will be accessible from multiple Jobs.
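A minimal sketch of such a routine follows. The class name `SharedState` and the variable `batchId` are hypothetical. Note that a static variable is shared only between code running in the same JVM, so this sketch assumes the Jobs involved are not launched as separate processes.

```java
// Hypothetical routine holding a value that several Jobs can read and write.
class SharedState {
    // One copy per JVM: every caller in the same process sees the same value.
    private static String batchId;

    // Setter, e.g. called by the first Job.
    static synchronized void setBatchId(String value) {
        batchId = value;
    }

    // Getter, e.g. called later by another Job in the same JVM.
    static synchronized String getBatchId() {
        return batchId;
    }

    public static void main(String[] args) {
        SharedState.setBatchId("batch-001");
        System.out.println(SharedState.getBatchId());
    }
}
```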
A Subjob can be defined as a single component or a number of components joined by a data flow. A Job must have at least one Subjob. To pass a value from a parent Job to a child Job, you need to use context variables.
Outline View in Talend Open Studio is used to keep track of the return values available in a component. This also includes the user-defined values configured in a tSetGlobalVar component.
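In Job code, values like these are typically read from a map named `globalMap`. The sketch below stands in a plain `HashMap` for it. The key `tFileInputDelimited_1_NB_LINE` follows the usual component-name-plus-variable pattern, but both keys are assumptions chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in: a plain HashMap plays the role of the Job's globalMap.
class GlobalMapSketch {
    // Reads an integer return value, falling back to 0 when the key is absent.
    static int readCount(Map<String, Object> globalMap, String key) {
        Object value = globalMap.get(key);
        return value == null ? 0 : (Integer) value;
    }

    public static void main(String[] args) {
        Map<String, Object> globalMap = new HashMap<>();
        // A component return value (assumed key name).
        globalMap.put("tFileInputDelimited_1_NB_LINE", 42);
        // A user-defined value, as a global-variable component might store it.
        globalMap.put("batchName", "nightly");

        System.out.println(readCount(globalMap, "tFileInputDelimited_1_NB_LINE"));
        System.out.println((String) globalMap.get("batchName"));
    }
}
```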
tMap is one of the core components and belongs to the ‘Processing’ family in Talend. It is primarily used for mapping the input data to the output data. tMap can perform the following functions:
tMap | tJoin |
1. It is a powerful component which can handle complicated cases | 1. Can only handle basic Join cases |
2. Can accept multiple input links (one is main and rest are lookups) | 2. Can accept only two input links (main and lookup) |
3. Can have more than one output links | 3. Can have only two output links (main and reject) |
4. Supports multiple join models, such as unique join, first join, and all join | 4. Supports only unique join |
5. Supports inner join and left outer join | 5. Supports only inner join |
6. Can filter data using filter expressions | 6. Cannot filter data |
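The difference between an inner join and a left outer join against a lookup can be sketched in a few lines of Java. This is only a conceptual illustration with hypothetical names (`LookupJoinSketch`, `join`); it does not reproduce how tMap or tJoin are implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Conceptual illustration of joining a main flow against a lookup flow.
class LookupJoinSketch {
    // innerJoin = true drops main rows without a lookup match;
    // innerJoin = false (left outer join) keeps them with a null lookup value.
    static List<String> join(List<Integer> mainIds, Map<Integer, String> lookup, boolean innerJoin) {
        List<String> output = new ArrayList<>();
        for (Integer id : mainIds) {
            String name = lookup.get(id);
            if (name == null && innerJoin) {
                continue; // unmatched row is rejected by the inner join
            }
            output.add(id + ":" + name);
        }
        return output;
    }

    public static void main(String[] args) {
        Map<Integer, String> lookup = new HashMap<>();
        lookup.put(1, "Alice");
        lookup.put(2, "Bob");
        List<Integer> mainIds = Arrays.asList(1, 2, 3);

        System.out.println(join(mainIds, lookup, true));
        System.out.println(join(mainIds, lookup, false));
    }
}
```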
A scheduler is software that selects processes from the queue and loads them into memory for execution. Talend Open Studio does not provide a built-in scheduler.
ETL stands for Extract, Transform and Load. It refers to a trio of processes which are required to move the raw data from its source to a data warehouse, a business intelligence system, or a big data platform.
ETL | ELT |
1. Data is first Extracted, then it is Transformed before it is Loaded into a target system | 1. Data is first Extracted, then it is Loaded to the target systems where it is further Transformed |
2. With the increase in the size of data, processing slows down as entire ETL process needs to wait till Transformation is over | 2. Processing is not dependent on the size of the data |
3. Easy to implement | 3. Needs deep knowledge of tools in order to implement |
4. Doesn’t provide Data Lake support | 4. Provides Data Lake support |
5. Supports relational data | 5. Supports unstructured data |
No, transfer modes can’t be used in SFTP connections. SFTP doesn’t support any kind of transfer mode, as it is an extension to SSH and assumes an underlying secure channel.
In order to schedule a Job in Talend, you first need to export the Job as a standalone program. Then you can schedule it using your OS’s native scheduling tools (Windows Task Scheduler, cron on Linux, etc.).
tDenormalizeSortedRow belongs to the ‘Processing’ family of the components. It helps in synthesizing sorted input flow in order to save memory. It combines all input sorted rows in a group where the distinct values are joined with item separators.
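The memory saving comes from the input already being sorted: only the current group has to be held at any time. The sketch below shows that idea in plain Java. The names are hypothetical, the output format `key|values` is an assumption, and the sketch joins every value in a group without removing duplicates.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual illustration of denormalizing rows that are already sorted by key.
class DenormalizeSortedSketch {
    // Each row is {key, value}. Returns one "key|v1,v2,..." line per group.
    static List<String> denormalize(List<String[]> sortedRows, String separator) {
        List<String> output = new ArrayList<>();
        String currentKey = null;
        List<String> values = new ArrayList<>();

        for (String[] row : sortedRows) {
            // A key change closes the current group, so it can be flushed.
            if (currentKey != null && !currentKey.equals(row[0])) {
                output.add(currentKey + "|" + String.join(separator, values));
                values.clear();
            }
            currentKey = row[0];
            values.add(row[1]);
        }
        if (currentKey != null) {
            output.add(currentKey + "|" + String.join(separator, values));
        }
        return output;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"A", "1"},
                new String[] {"A", "2"},
                new String[] {"B", "3"});
        System.out.println(denormalize(rows, ","));
    }
}
```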
insert or update: In this action, first Talend tries to insert a record, but if a record with a matching primary key already exists, then it updates that record.
update or insert: In this action, Talend first tries to update a record with a matching primary key, but if there is none, then the record is inserted.
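The two actions end with the same data in the table; they differ in which operation is attempted first. The sketch below models that with an in-memory map standing in for a table keyed by primary key. All names are hypothetical, and no real database calls are involved.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual illustration: a HashMap stands in for a table keyed by primary key.
class UpsertSketch {
    // Tries the insert first and falls back to an update on a key collision.
    static String insertOrUpdate(Map<Integer, String> table, int key, String value) {
        if (!table.containsKey(key)) {
            table.put(key, value);
            return "inserted";
        }
        table.put(key, value);
        return "insert failed, updated";
    }

    // Tries the update first and falls back to an insert when no row matches.
    static String updateOrInsert(Map<Integer, String> table, int key, String value) {
        if (table.containsKey(key)) {
            table.put(key, value);
            return "updated";
        }
        table.put(key, value);
        return "update matched nothing, inserted";
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>();
        System.out.println(insertOrUpdate(table, 1, "a"));
        System.out.println(insertOrUpdate(table, 1, "b"));
        System.out.println(updateOrInsert(table, 1, "c"));
        System.out.println(updateOrInsert(table, 2, "d"));
    }
}
```

One reasonable reading is to pick the action whose first attempt usually succeeds: insert or update when most records are new, update or insert when most already exist.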
tContextLoad belongs to the ‘Misc’ family of components. This component helps in modifying the values of the active context on the fly. Basically, it is used to load a context from a flow. It sends warnings if the parameters defined in the input are not defined in the context and also if the context is not initialized in the incoming data.
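The behavior described above can be approximated in plain Java as follows. This is a sketch under assumptions: the names are hypothetical, the warning texts are invented for illustration, and ignoring keys that the context does not define is a choice made here rather than documented component behavior.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for loading context values from a key/value flow.
class ContextLoadSketch {
    // Overrides matching context entries in place and returns the warnings raised.
    static List<String> load(Map<String, String> context, Map<String, String> incoming) {
        List<String> warnings = new ArrayList<>();
        for (Map.Entry<String, String> entry : incoming.entrySet()) {
            if (context.containsKey(entry.getKey())) {
                context.put(entry.getKey(), entry.getValue()); // override on the fly
            } else {
                warnings.add("not defined in context: " + entry.getKey());
            }
        }
        for (String key : context.keySet()) {
            if (!incoming.containsKey(key)) {
                warnings.add("not set by input: " + key);
            }
        }
        return warnings;
    }

    public static void main(String[] args) {
        Map<String, String> context = new LinkedHashMap<>();
        context.put("host", "dev-db");
        context.put("port", "5432");
        Map<String, String> incoming = new LinkedHashMap<>();
        incoming.put("host", "prod-db");
        incoming.put("user", "etl");

        System.out.println(load(context, incoming));
        System.out.println(context);
    }
}
```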
The -Xms parameter specifies the initial heap size in Java, whereas the -Xmx parameter specifies the maximum heap size.
In the Expression Editor, all expressions (Input, Var, or Output) and constraint statements can be viewed and edited easily. The Expression Editor comes with a dedicated view for writing any function or transformation. You can write the expressions needed for a data transformation directly in the Expression Editor, or open the Expression Builder dialog box and write them there.
There are a few ways in which errors can be handled in Talend:
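To see the effect of these settings, a small Java program can print the heap figures that the running JVM reports. The class name `HeapSettingsSketch` is hypothetical; it could be launched with, for example, `java -Xms256m -Xmx1024m HeapSettingsSketch`, and the reported numbers will vary by JVM and platform.

```java
// Prints the heap sizes reported by the JVM that runs this class.
class HeapSettingsSketch {
    private static final long MEGABYTE = 1024L * 1024L;

    // Upper limit the heap may grow to (governed by -Xmx when it is set).
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / MEGABYTE;
    }

    // Heap currently reserved by the JVM (starts near -Xms when it is set).
    static long currentHeapMb() {
        return Runtime.getRuntime().totalMemory() / MEGABYTE;
    }

    public static void main(String[] args) {
        System.out.println("Current heap (MB): " + currentHeapMb());
        System.out.println("Maximum heap (MB): " + maxHeapMb());
    }
}
```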
Functions | tJava | tJavaRow | tJavaFlex |
1. Can be used to integrate custom Java code | Yes | Yes | Yes |
2. Will be executed only once at the beginning of the Subjob | Yes | No | No |
3. Needs input flow | No | Yes | No |
4. Needs output flow | No | Only if output schema is defined | Only if output schema is defined |
5. Can be used as the first component of a Job | Yes | No | Yes |
6. Can be used as a different Subjob | Yes | No | Yes |
7. Allows Main Flow or Iterator Flow | Both | Only Main | Both |
8. Has three parts of Java code | No | No | Yes |
9. Can auto propagate data | No | No | Yes |
You can execute a Talend Job remotely from the command line. All you need to do is export the Job along with its dependencies and then access its instruction files from the terminal.
Yes, the headers and footers can be excluded easily before loading the data from the input files.
A ‘Heap Space Issue’ occurs when the JVM tries to add more data to the heap than there is space available. To resolve this issue, you need to increase the memory allocated to Talend Studio by modifying the relevant Studio .ini configuration file according to your system and needs.
This component transforms and routes data from single or multiple sources to single or multiple destinations. It is an advanced component tailored for transforming and routing XML data flows, especially when numerous XML data sources need to be processed.
Talend Open Studio for Big Data is a superset of Talend Open Studio for Data Integration. It contains all the functionality provided by TOS for DI, along with additional functionality such as support for Big Data technologies. That is, TOS for DI generates only Java code, whereas TOS for BD generates MapReduce code along with Java code.
In TOS for BD, the Big Data family is very large, and a few of the most used technologies are:
As Talend is a Java code generator, various Jobs and Subjobs can be executed in multiple threads to reduce the runtime of a Job. Basically, there are three ways to achieve parallel execution in Talend Data Integration:
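The general idea of running independent pieces of work on separate threads can be sketched with the standard `java.util.concurrent` API. This is a generic Java illustration with hypothetical names (`ParallelSubjobsSketch`, `runInParallel`), not a reproduction of how Talend implements any of its parallelization options.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Generic illustration of running independent units of work in parallel.
class ParallelSubjobsSketch {
    // Runs every task on its own thread and returns the results in task order.
    static List<String> runInParallel(List<Callable<String>> subjobs) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, subjobs.size()));
        try {
            List<String> results = new ArrayList<>();
            // invokeAll blocks until all tasks finish and keeps the input order.
            for (Future<String> future : pool.invokeAll(subjobs)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for subjobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a subjob failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Callable<String>> subjobs = new ArrayList<>();
        subjobs.add(() -> "customers loaded");
        subjobs.add(() -> "orders loaded");
        System.out.println(runInParallel(subjobs));
    }
}
```

Parallel execution of this kind only helps when the units of work do not depend on each other's output.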
In order to connect to HDFS you must provide the following details:
Zookeeper service is mandatory for coordinating the transactions between TOS and HBase.
Pig Latin is used for scripting in Pig.
This component creates a Kafka topic which the other Kafka components can use as well. It allows you to visually generate the command to create a topic with various properties at topic-level.
Once the data is validated, this component helps in loading the original input data to an output stream in just one single transaction. It sets up a connection to the data source for the current transaction.
Using the tPostJob and tHiveClose components, you can close a Hive connection automatically.
talend components always have a main connection as output?
Thanks for providing this interview questions. These are very helpful when i attend my interview for job
I found a few points that could be enhanced.
OnComponent vs OnSubjob
With OnSubjobOk, the previous function call is complete, so the garbage collector can free up memory. With OnComponentOk, the data may not be committed yet, depending on where you put the link. These are two good reasons to use OnSubjobOk every time.
Since some answers mention features that belong to the Enterprise studio (for example, tParallelize), I feel that the following questions could be enhanced.
Can you define schema at runtime in Talend?
Talend Enterprise has a dynamic schema feature, so you are not limited to defining the schema at design time. However, it makes it hard to modify the data at runtime. (But one can write custom code to do it.)
Scheduling
Talend Enterprise has a web interface that lets you build and schedule jobs.
Hey Balazs, thank you for pointing this out. We hope that you liked our blogs and found it useful :)