Zookeeper Tutorial: The Guide you need to Master Zookeeper


Apache Zookeeper is a widely used cluster coordination service that relies on robust synchronization techniques to keep the nodes of a cluster connected and consistent. Zookeeper simplifies the management of a distributed environment through its simple architecture and API.

What is Zookeeper?
Architecture of Zookeeper
Zookeeper Data Model
Node Types in Zookeeper
Zookeeper Ensemble
Zookeeper Installation
Zookeeper Command Line Interface
Companies Using Zookeeper

What is Zookeeper?

Zookeeper is a cross-platform cluster coordination service provided by the Apache Software Foundation. It is designed for distributed systems and offers a hierarchical key-value store, which is used to provide a distributed configuration service, a synchronization service, and a naming registry for large distributed systems.

Architecture of Zookeeper

Apache Zookeeper follows a client-server architecture. The architecture is made up of the following five components:

Ensemble
Server
Server Leader
Follower
Client

Ensemble

It is the collection of all the server nodes in the Zookeeper ecosystem. An Ensemble needs a minimum of three nodes so that it can keep working when one of them fails.

Server

It is one of the servers in the Zookeeper Ensemble, and its objective is to provide services to its clients. It sends acknowledgements to its clients to inform them that it is alive and available.

Server Leader

The Ensemble Leader is elected when the service starts up. It performs automatic data recovery when any of the connected nodes fails.

Follower

A follower is one of the servers in the Ensemble. Its duty is to follow the orders passed by the Leader.

Client

Clients are the nodes that request service from a server. Like servers, clients send signals to the server at regular intervals to indicate that they are alive. If the server fails to respond, the client automatically redirects itself to the next available server.
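The failover behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the real client library: the host names and the `pick_server` helper are hypothetical, and a real client also handles reconnects and session re-establishment.

```python
def pick_server(servers, is_alive):
    """Return the first server in `servers` for which `is_alive` is true."""
    for server in servers:
        if is_alive(server):
            return server
    raise ConnectionError("no Zookeeper server is reachable")


# Hypothetical ensemble; zk1 is assumed to be down in this example.
servers = ["zk1:2181", "zk2:2181", "zk3:2181"]
down = {"zk1:2181"}

chosen = pick_server(servers, lambda s: s not in down)
print(chosen)  # zk2:2181
```

Passing the liveness check in as a function keeps the sketch independent of any particular network library.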

Next, in this zookeeper tutorial article, we will learn the Data model of Zookeeper.

Zookeeper Data Model

The Zookeeper data model follows a hierarchical namespace in which each node is called a Znode. The components of a Znode path are separated by a ‘/’, and ‘/’ on its own is the root. In this example, the root has two more namespaces beneath it.

These two nodes are namespaces. The config namespace is used for centralized configuration, and the workers namespace is used for naming. The data model is mainly used to maintain synchronization in the Zookeeper cluster and to describe the metadata of each Znode.
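To make the hierarchy concrete, here is a minimal in-memory sketch of such a namespace in Python. The `Znode` class and the `create` and `list_children` helpers are illustrative only and are not part of any Zookeeper API; the `worker-1` child and its data are made-up values.

```python
class Znode:
    """A toy znode: a data payload plus named children."""

    def __init__(self, data=b""):
        self.data = data
        self.children = {}


def create(root, path, data=b""):
    """Create the znode at `path`; its parent must already exist."""
    parts = path.strip("/").split("/")
    node = root
    for name in parts[:-1]:
        node = node.children[name]  # KeyError if a parent is missing
    if parts[-1] in node.children:
        raise ValueError(f"node already exists: {path}")
    node.children[parts[-1]] = Znode(data)


def list_children(root, path):
    """Return the sorted child names of the znode at `path`."""
    node = root
    for name in filter(None, path.strip("/").split("/")):
        node = node.children[name]
    return sorted(node.children)


root = Znode()
create(root, "/config")
create(root, "/workers")
create(root, "/workers/worker-1", b"host-a:9000")

print(list_children(root, "/"))         # ['config', 'workers']
print(list_children(root, "/workers"))  # ['worker-1']
```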

Now, let us understand the types of znodes.

Node Types in Zookeeper

There are three types of Znodes as mentioned below.

Persistent Znode

Znodes are persistent by default. These nodes stay alive even after the client that created them is disconnected.

Ephemeral Znode

Nodes of this type stay alive only as long as the client that created them is connected. When the client gets disconnected, they are deleted. Nodes of this type are not allowed to have children.

Sequential Znode

It can be either a Persistent Znode or an Ephemeral Znode. When a node is created as a Sequential Znode, Zookeeper sets the path of the Znode by appending a 10-digit sequence number to the original name.
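The naming rule can be illustrated with a short Python sketch. The `sequential_name` helper is hypothetical; in a real ensemble the counter is maintained by the server, not by the client.

```python
def sequential_name(path, counter):
    """Append a zero-padded 10-digit counter to the requested path."""
    return f"{path}{counter:010d}"


name = sequential_name("/EdurekaZnode", 52)
print(name)  # /EdurekaZnode0000000052
```

Because the suffix is zero-padded to a fixed width, sorting the names as plain strings gives the same order as sorting by counter.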

Sessions and Watches

Sessions

A session is the time interval during which a client is connected to the service. Every client is assigned a session ID, and its requests are served in the order in which they are sent. Every client sends heartbeats to the server to keep the session valid. If no heartbeat is received for longer than the session timeout, the server considers the client to be dead.
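The timeout rule can be sketched as follows. This is a simplified model with a hypothetical `Session` class and explicit timestamps in milliseconds; it only illustrates the heartbeat check and is not how the server is actually implemented.

```python
class Session:
    """Toy session that expires when heartbeats stop arriving."""

    def __init__(self, session_id, timeout_ms, now_ms):
        self.session_id = session_id
        self.timeout_ms = timeout_ms
        self.last_heartbeat_ms = now_ms

    def heartbeat(self, now_ms):
        """Record that a heartbeat arrived at time `now_ms`."""
        self.last_heartbeat_ms = now_ms

    def is_expired(self, now_ms):
        """True if more than `timeout_ms` has passed since the last heartbeat."""
        return now_ms - self.last_heartbeat_ms > self.timeout_ms


session = Session(session_id=0x1, timeout_ms=4000, now_ms=0)
session.heartbeat(now_ms=3000)

print(session.is_expired(now_ms=6000))  # False: 3000 ms since the last heartbeat
print(session.is_expired(now_ms=7500))  # True: 4500 ms exceeds the 4000 ms timeout
```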

Watches

Watches are notifications sent to the client. A client sets a watch on a Znode, and when that Znode changes, the ensemble sends the client a notification about the change.
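A standard Zookeeper watch is a one-time trigger: it fires once and must be set again to receive further notifications. The sketch below models that behaviour with a hypothetical `WatchedNode` class; the method names mirror the CLI commands but are not the real client API.

```python
class WatchedNode:
    """Toy znode whose watches fire once and are then discarded."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get(self, watcher=None):
        """Read the data and optionally register a one-shot watcher."""
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        """Update the data and notify the watchers registered so far."""
        self.data = data
        pending, self._watchers = self._watchers, []
        for notify in pending:
            notify("NodeDataChanged")


events = []
node = WatchedNode("v1")
node.get(watcher=events.append)  # set a watch while reading
node.set("v2")                   # fires the watch
node.set("v3")                   # no watch is left, so nothing fires

print(events)  # ['NodeDataChanged']
```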

Zookeeper Ensemble

When a Zookeeper ensemble starts, clients try to connect to one of the nodes in the ensemble. Once connected, the server node sends a confirmation to the client. The client in turn sends heartbeats to confirm that the connection is still alive.

If the client needs to read data from the server, it sends the server the znode path of the data to be read. The server returns the requested information to the client.

If the client needs to store information, it sends the znode path where it wishes to store the data, together with the data itself. This request is first sent to the ensemble leader. The ensemble leader forwards the write command to all the followers. The write is committed only if a majority of the servers respond positively.

Every Zookeeper ensemble has some limitations. Let us discuss those.

Limitations:

We cannot rely on a Zookeeper Ensemble with a single node in production, since the failure of that one node results in complete cluster failure.
With two nodes in the cluster, the failure of one node still brings the ensemble down, since the single remaining node cannot be considered a majority.
If we have three nodes and one fails, the remaining two nodes still form a majority.
Hence, three nodes is the minimum requirement for a stable Ensemble.
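The arithmetic behind these limitations is simple majority counting. The sketch below uses two hypothetical helper functions to show how many servers form a majority and how many failures an ensemble of a given size can survive.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that is a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """Servers that can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

A two-node ensemble tolerates no more failures than a single node, which is why ensembles are usually sized with an odd number of servers.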

Next, in this zookeeper tutorial article, we shall learn the installation of Zookeeper.

Zookeeper Installation

To install Zookeeper on a Linux system, follow the procedure below.

Step 1: Install Java on your local system.

`sudo apt install openjdk-8-jdk-headless`

Step 2: Download a Zookeeper release to your local Ubuntu system (this tutorial uses version 3.5.6).

Step 3: Extract the tar file using the following command.

`tar -xvf apache-zookeeper-3.5.6-bin.tar.gz`

Step 4: Set up the Zookeeper configuration file (conf/zoo.cfg, which can be copied from the bundled conf/zoo_sample.cfg).

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
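The limits in this file are expressed in ticks, so their real durations depend on tickTime. The Python sketch below parses the active settings and converts them to milliseconds. The `parse_cfg` helper is hypothetical, and the session timeout bounds assume the server defaults of 2 and 20 ticks, which apply when minSessionTimeout and maxSessionTimeout are not set.

```python
def parse_cfg(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


cfg = parse_cfg("""
# The number of milliseconds of each tick
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
#maxClientCnxns=60
""")

tick_ms = int(cfg["tickTime"])
init_ms = tick_ms * int(cfg["initLimit"])  # time allowed for the initial sync
sync_ms = tick_ms * int(cfg["syncLimit"])  # time allowed for an acknowledgement
min_session_ms = 2 * tick_ms               # assumed default lower bound
max_session_ms = 20 * tick_ms              # assumed default upper bound

print(init_ms, sync_ms)                # 20000 10000
print(min_session_ms, max_session_ms)  # 4000 40000
```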

Step 5: Start the Zookeeper server.

`./zkServer.sh start`

Step 6: Start the client interface.

`./zkCli.sh`

Zookeeper is now installed and running.

Similarly, when you are finished, you can stop Zookeeper by using the following command.

`./zkServer.sh stop`

Now, let us move ahead to the command-line interface.

Zookeeper Command Line Interface

The ZooKeeper Command Line Interface, or CLI for short, is used to interact with the ZooKeeper ensemble during development. Its main purpose is debugging and trying out the different operations.

To perform any ZooKeeper CLI operation, first start the ZooKeeper server and then the ZooKeeper client. Once the client starts, you can perform the following operations.

Create znodes

Creates a new Znode in the cluster.

`create /EdurekaZnode "Edurekazookeeper-app"`

//Output:

[zk: localhost:2181(CONNECTED) 0] create /EdurekaZnode "Edurekazookeeper-app"
Created /EdurekaZnode

Creation of Sequential Znode

`create -s /EdurekaZnode "data"`

//Output:

[zk: localhost:2181(CONNECTED) 2] create -s /EdurekaZnode "data"
Created /EdurekaZnode0000000052

Creation of Ephemeral Znode

`create -e /EdurekaZnode2 "Ephemeral"`

//Output:

[zk: localhost:2181(CONNECTED) 2] create -e /EdurekaZnode2 "Ephemeral"
Created /EdurekaZnode2

Get data

It returns the data associated with the specified znode along with its metadata.

`get /EdurekaZnode`

//Output:

[zk: localhost:2181(CONNECTED) 1] get /EdurekaZnode
"Edurekazookeeper-app"
cZxid = 0xx21f
ctime = Sat Dec 28 17:18:16 IST 2019
mZxid = 0xx21f
mtime = Sat Dec 28 17:18:16 IST 2019
pZxid = 0xx21f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0

To access a sequential znode, you must enter its complete path, including the sequence number.

`get /EdurekaZnode0000000052`

//Output:

[zk: localhost:2181(CONNECTED) 1] get /EdurekaZnode0000000052
"data"
cZxid = 0xx22
ctime = Sat Dec 28 17:35:55 IST 2019
mZxid = 0xx22
mtime = Sat Dec 28 17:35:55 IST 2019
pZxid = 0xx22
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 13
numChildren = 0

Watch znode for changes

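Sets a watch on a znode so that the client is notified when the znode changes. In Zookeeper 3.5 and later, the CLI reports this positional form as deprecated in favor of `get -w /EdurekaZnode`.

`get /EdurekaZnode 1`

//Output:

WATCHER::

WatchedEvent state:SyncConnected type:NodeDataChanged path:/EdurekaZnode 1
cZxid = 0xx21f
ctime = Sat Dec 28 17:42:28 IST 2019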
mZxid = 0xx21f
mtime = Sat Dec 28 17:42:28 IST 2019
pZxid = 0xx21f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0

Set data

Sets the data of the specified znode.

`set /EdurekaZnode2 updatedata`

//Output:

[zk: localhost:2181(CONNECTED) 1] get /EdurekaZnode2
"updatedata"
cZxid = 0xx22
ctime = Sat Dec 28 17:55:20 IST 2019
mZxid = 0xx22
mtime = Sat Dec 28 17:55:20 IST 2019
pZxid = 0xx22
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0xx16016e32db00012
dataLength = 32
numChildren = 0

Create children of a znode

Creates a child node under an existing znode.

`create /EdurekaZnode/Child1 "EdurekaChild"`

//Output:

[zk: localhost:2181(CONNECTED) 16] create /EdurekaZnode/Child1 "EdurekaChild"
Created /EdurekaZnode/Child1

List children of a znode

Lists the children of a znode. The output shows the names of the children, not their data.

`ls /EdurekaZnode`

//Output:

[zk: localhost:2181(CONNECTED) 2] ls /EdurekaZnode
[Child1]

Check Status

It describes the metadata of the specified znode.

`stat /EdurekaZnode`

//Output:

[zk: localhost:2181(CONNECTED) 1] stat /EdurekaZnode
cZxid = 0xx21f
ctime = Sat Dec 28 18:04:26 IST 2019
mZxid = 0xx21f
mtime = Sat Dec 28 18:04:26 IST 2019
pZxid = 0xx21f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0
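When this output is captured by a script, the `key = value` lines are easy to turn into a dictionary. The sketch below uses a hypothetical `parse_stat` helper and a shortened, hand-written sample rather than real server output. It relies on the fact that ephemeralOwner holds the owning session ID for an ephemeral znode and zero otherwise.

```python
def parse_stat(output):
    """Turn 'key = value' lines from the CLI into a dictionary."""
    stat = {}
    for line in output.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:  # skip lines without the ' = ' separator
            stat[key.strip()] = value.strip()
    return stat


# Hand-written sample in the same layout as the CLI output.
sample = """cversion = 0
dataVersion = 1
ephemeralOwner = 0x0
numChildren = 0"""

stat = parse_stat(sample)
is_ephemeral = int(stat["ephemeralOwner"], 16) != 0

print(stat["dataVersion"], is_ephemeral)  # 1 False
```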

Remove a znode

Removes the specified znode and, recursively, all of its children. In Zookeeper 3.5 and later, the CLI reports `rmr` as deprecated in favor of `deleteall`.

`rmr /EdurekaZnode`

//Output:

[zk: localhost:2181(CONNECTED) 20] rmr /EdurekaZnode
[zk: localhost:2181(CONNECTED) 21] get /EdurekaZnode
Node does not exist: /EdurekaZnode

Companies Using Zookeeper

Many companies use Apache Zookeeper. A few of the major ones are listed below.

With this, we come to the end of this “Zookeeper Tutorial” article. I hope it has shed some light on Zookeeper.

