Introduction to Pentaho Data Integration

Last updated on Dec 19, 2021

Pentaho Data Integration (PDI) is intended mainly for Extract, Transform, Load (ETL) work. It consists of the following elements:

DI Server (Server Application)

The Data Integration server executes jobs and transformations using the PDI engine. It provides built-in user and role-based security and can also be integrated with existing LDAP/Active Directory security providers. It also lets us store transformations and jobs in one common place.

Design Tools (standalone) – Client tools for designing and running jobs and transformations:

Spoon – GUI tool to develop jobs and transformations

Kitchen – Command-line tool to run jobs (which can in turn call transformations)

Pan – Command-line tool to run transformations only

Carte – Lightweight web server for running jobs and transformations remotely
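Because Kitchen and Pan are command-line programs, a small wrapper script can assemble their arguments. The Python sketch below assumes a Linux/macOS install (kitchen.sh and pan.sh; the Windows equivalents are Kitchen.bat and Pan.bat) under a hypothetical PDI_HOME directory. The `-file`, `-level` and `-param:NAME=value` options are standard Kitchen/Pan options; the helper name `build_pdi_command` is our own.

```python
import os
from typing import Dict, List, Optional

# Hypothetical install directory; override with the PDI_HOME environment variable.
PDI_HOME = os.environ.get("PDI_HOME", "/opt/pentaho/data-integration")


def build_pdi_command(tool: str, file_path: str, level: str = "Basic",
                      params: Optional[Dict[str, str]] = None) -> List[str]:
    """Build a command line for Kitchen (jobs, .kjb) or Pan (transformations, .ktr)."""
    scripts = {"kitchen": "kitchen.sh", "pan": "pan.sh"}
    if tool not in scripts:
        raise ValueError("tool must be 'kitchen' (jobs) or 'pan' (transformations)")
    cmd = [os.path.join(PDI_HOME, scripts[tool]),
           f"-file={file_path}",   # job or transformation file to run
           f"-level={level}"]      # logging level, e.g. Basic, Detailed, Debug
    # Named parameters defined in the job/transformation, passed as -param:NAME=value.
    for name, value in (params or {}).items():
        cmd.append(f"-param:{name}={value}")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`. Kitchen and Pan exit with a non-zero code when the job or transformation fails, so `check=True` turns a failed run into a Python exception.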

In a data warehouse, the historical data available to the organization is loaded in one go (the initial load). After that, reloading the entire data set into the warehouse every day is impractical, so we switch to incremental loads.
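One common way to implement an incremental load, assumed here because the article does not say how changed rows are detected, is a high-water mark: each run picks up only source rows whose last-modified timestamp is newer than the latest one already loaded. In the Python sketch below a dict stands in for the warehouse table; `SourceRow`, its columns, and the function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List, Tuple


@dataclass
class SourceRow:
    # Hypothetical columns; updated_at is the assumed change-tracking column.
    product_id: int
    name: str
    updated_at: datetime


def full_load(source: List[SourceRow]) -> Dict[int, SourceRow]:
    """Initial load: copy all historical rows into the (toy) warehouse in one go."""
    return {row.product_id: row for row in source}


def incremental_load(source: List[SourceRow], warehouse: Dict[int, SourceRow],
                     high_water_mark: datetime) -> Tuple[int, datetime]:
    """Upsert only rows changed since the last run; return (rows loaded, new mark)."""
    changed = [row for row in source if row.updated_at > high_water_mark]
    for row in changed:
        warehouse[row.product_id] = row  # insert new keys, overwrite changed ones
    # Advance the mark to the newest change seen; keep it if nothing changed.
    new_mark = max((row.updated_at for row in changed), default=high_water_mark)
    return len(changed), new_mark
```

In a real PDI job the mark would have to be persisted between runs, for example in a control table, so that the next scheduled run knows where to resume.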

The incremental load involves loading only the data that has changed at the source since the previous run. Because running the job and transformation manually for every load is impractical, we schedule the job. Here we schedule it on a weekly basis using Windows Task Scheduler, which runs the job at a specific time to load the incremental data into the data warehouse. This relies on PDI's command-line tools: the scheduler launches Kitchen (for jobs) or Pan (for transformations) from the command prompt.
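On Windows, the weekly schedule can be created with the built-in `schtasks` command, whose action launches Kitchen against the job file. The sketch below only builds the argument list; the task name, install path, job path, day and start time are hypothetical placeholders, and `/file:` and `/level:` are Kitchen's Windows-style option syntax.

```python
from typing import List

VALID_DAYS = {"MON", "TUE", "WED", "THU", "FRI", "SAT", "SUN"}


def build_schtasks_command(task_name: str, kitchen_bat: str, job_file: str,
                           day: str = "MON", start_time: str = "02:00") -> List[str]:
    """Return a schtasks.exe argument list that runs a PDI job weekly via Kitchen."""
    if day not in VALID_DAYS:
        raise ValueError(f"day must be one of {sorted(VALID_DAYS)}")
    # The action Task Scheduler will launch: Kitchen running the job file.
    # Paths are quoted so spaces in the install or job path do not split the command.
    action = f'"{kitchen_bat}" /file:"{job_file}" /level:Basic'
    return [
        "schtasks", "/Create",
        "/SC", "WEEKLY",      # weekly schedule, as described in the text
        "/D", day,            # day of the week to run
        "/ST", start_time,    # start time, HH:mm (24-hour clock)
        "/TN", task_name,     # name shown in Task Scheduler
        "/TR", action,        # command the task runs
    ]
```

On a Windows machine, `subprocess.run(cmd, check=True)` would register the task, which can then be inspected or edited in the Task Scheduler GUI.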

Data Connections – Used to connect to the source and target databases.

Transformation – Extracts data from the source, transforms it, and loads it into the data warehouse.
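To make the roles of data connections and transformations concrete, here is a Python sketch using the standard-library `sqlite3` module, with two in-memory databases standing in for the source system and the data warehouse. The `products` and `dim_product` tables and the cleaning rule are hypothetical; in PDI the same work would be done with steps configured in Spoon rather than hand-written code.

```python
import sqlite3


def run_transformation(source: sqlite3.Connection, target: sqlite3.Connection) -> int:
    """Extract products from the source, clean them, and load them into the target."""
    # Extract: read raw rows through the source connection.
    rows = source.execute("SELECT id, name, price FROM products").fetchall()
    # Transform: trim and title-case names, and drop rows that have no price.
    cleaned = [(pid, name.strip().title(), price)
               for pid, name, price in rows if price is not None]
    # Load: write into the dimension table through the target connection.
    target.execute("CREATE TABLE IF NOT EXISTS dim_product "
                   "(product_id INTEGER PRIMARY KEY, name TEXT, price REAL)")
    target.executemany("INSERT OR REPLACE INTO dim_product VALUES (?, ?, ?)", cleaned)
    target.commit()
    return len(cleaned)
```

For example, source rows (1, '  blue pen ', 1.5) and (2, 'notebook', NULL) would produce a single `dim_product` row named 'Blue Pen', because the second row has no price.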

What is Spoon?

Spoon is a GUI tool for developing jobs and transformations. It is easy to learn and user-friendly. In the example, a transformation named ‘DIM_Product’ is already open. On the left side there are two tabs, View and Design. In the View tab, we create a Database Connection to read data from a source or load data into the data warehouse. In the Design tab, the available steps are grouped into categories such as:

Input – Steps that extract (read) the data from the source.

Output – Steps that load (write) the data into the target.

Transform – Steps that apply the transformation logic to the data.
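Conceptually, a transformation chains these step types with hops so that rows flow from Input through Transform to Output. The Python generators below are a simplified model of that flow, not how PDI is implemented (PDI runs each step concurrently and passes rows between them through buffers); the step functions and the comma-separated sample format are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def input_step(lines: Iterable[str]) -> Iterator[Row]:
    """Input: extract rows, here from simple 'id,name' text lines."""
    for line in lines:
        product_id, name = line.split(",", 1)
        yield {"product_id": int(product_id), "name": name}


def transform_step(rows: Iterable[Row]) -> Iterator[Row]:
    """Transform: apply logic, here trimming and upper-casing names, dropping blanks."""
    for row in rows:
        name = str(row["name"]).strip()
        if name:
            yield {**row, "name": name.upper()}


def output_step(rows: Iterable[Row], target: List[Row]) -> int:
    """Output: load rows into the target (a list standing in for a table)."""
    count = 0
    for row in rows:
        target.append(row)
        count += 1
    return count
```

Chaining `output_step(transform_step(input_step(["1, pen", "2,  ", "3,ink"])), target)` returns 2 and leaves `target` holding the PEN and INK rows, since the blank name in the second line is filtered out by the transform step.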

If you’re looking to improve your skills, learn more about Business Intelligence tools, and become certified as a Business Intelligence professional, explore the Tableau Training or Pentaho BI Training and certification programs. Designed and taught by industry experts, these programs may be what you’re looking for.

 
