Zookeeper Tutorial: The Guide you need to Master Zookeeper

Last updated on Apr 28, 2020
Tech Enthusiast working as a Research Analyst at Edureka. Curious about learning more about Data Science and Big-Data Hadoop.


Apache Zookeeper is a cluster coordination service that uses robust synchronization techniques to keep the nodes of a distributed system connected and consistent. Zookeeper simplifies the management of a distributed environment through its simple architecture and API.

 

What is Zookeeper?

Zookeeper is a cross-platform cluster coordination service provided by the Apache Software Foundation. It is designed for distributed systems and offers a hierarchical key-value store, which is used to provide a distributed configuration service, a synchronization service, and a naming registry for large distributed systems.

 

Architecture of Zookeeper

Apache Zookeeper follows a client-server architecture. The architecture is made up of 5 components, as follows:

Ensemble

It is the collection of all the server nodes in the Zookeeper ecosystem. A fault-tolerant ensemble requires a minimum of three nodes, so that a majority survives the failure of one node.

Server

It is one of the servers in the Zookeeper ensemble, and its objective is to provide services to its clients. It sends its alive status to its clients to inform them of its availability.

Server Leader

The ensemble leader is elected at service startup. It can recover the data of any failed node and performs automatic data recovery for clients.

Follower

A follower is one of the servers in the Ensemble. Its duty is to follow the orders passed by the Leader.

Client

Clients are the nodes that request service from a server. Like servers, clients also send signals to the server regarding their availability. If the server fails to respond, the client automatically redirects itself to the next available server.

Next in this Zookeeper tutorial, we will look at the Zookeeper data model.

 

Zookeeper Data Model

The Zookeeper data model follows a hierarchical namespace in which each node is called a znode. Paths are separated by a ‘/’, much like a file system, and ‘/’ on its own is the root. In the diagram for this section, two more namespaces sit under the root: config and workers.

The config namespace is used for centralized configuration, and the workers namespace is used for naming. The data model is mainly used to maintain synchronization in the Zookeeper cluster and to describe the metadata of each znode.
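To make the hierarchy concrete, here is a minimal in-memory sketch in Python. It is not the Zookeeper API: the ZnodeTree class, its methods, and the sample data are hypothetical, and the sketch only illustrates how ‘/’-separated paths relate parent and child znodes.

```python
class ZnodeTree:
    """Toy model of a znode namespace: maps full paths to data (hypothetical)."""

    def __init__(self):
        self.nodes = {"/": b""}  # the root znode always exists

    def create(self, path, data=b""):
        # The parent is everything before the last '/', or the root itself.
        parent = path.rsplit("/", 1)[0] or "/"
        if parent not in self.nodes:
            raise KeyError(f"parent znode does not exist: {parent}")
        if path in self.nodes:
            raise KeyError(f"znode already exists: {path}")
        self.nodes[path] = data

    def children(self, path):
        # Direct children only: names with no further '/' after the prefix.
        prefix = path.rstrip("/") + "/"
        return sorted(
            p[len(prefix):]
            for p in self.nodes
            if p != "/" and p.startswith(prefix) and "/" not in p[len(prefix):]
        )


tree = ZnodeTree()
tree.create("/config", b"db_url=example")  # sample data, made up for the sketch
tree.create("/workers")
tree.create("/workers/worker-1")
print(tree.children("/"))         # ['config', 'workers']
print(tree.children("/workers"))  # ['worker-1']
```

As in the real service, a child cannot be created in this model before its parent exists.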

Now, let us understand the types of znodes.

 

Node Types in Zookeeper

There are three types of Znodes as mentioned below.

Persistent Znode

This is the default znode type: a znode is persistent unless it is created otherwise. Persistent znodes stay alive even after the client that created them disconnects.

Ephemeral Znode

These nodes stay alive only while the client that created them is connected. When the client disconnects, they are deleted. Ephemeral znodes are not allowed to have children.
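The lifetime rule for ephemeral znodes can be sketched with a small Python model. This is an illustration only, assuming a hypothetical EphemeralRegistry class and made-up session IDs; the real bookkeeping happens inside the Zookeeper servers.

```python
class EphemeralRegistry:
    """Toy model: remembers which session owns each ephemeral znode (hypothetical)."""

    def __init__(self):
        self.owner_by_path = {}

    def create_ephemeral(self, path, session_id):
        self.owner_by_path[path] = session_id

    def close_session(self, session_id):
        # Every ephemeral znode owned by the closing session is deleted.
        removed = sorted(
            path for path, owner in self.owner_by_path.items() if owner == session_id
        )
        for path in removed:
            del self.owner_by_path[path]
        return removed


registry = EphemeralRegistry()
registry.create_ephemeral("/workers/worker-1", session_id=0x1)
registry.create_ephemeral("/workers/worker-2", session_id=0x2)
print(registry.close_session(0x1))     # ['/workers/worker-1']
print(sorted(registry.owner_by_path))  # ['/workers/worker-2']
```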

Sequential Znode

It can be either a persistent znode or an ephemeral znode. When a node is created as a sequential znode, Zookeeper sets its path by appending a 10-digit sequence number to the original name.
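The naming rule can be shown in a couple of lines of Python. The helper name sequential_path is hypothetical and the counter value is passed in by hand; in a real ensemble the counter is maintained by the service, not by the client.

```python
def sequential_path(base_path, counter):
    """Append a zero-padded, 10-digit sequence number to the requested path."""
    return f"{base_path}{counter:010d}"


print(sequential_path("/EdurekaZnode", 52))  # /EdurekaZnode0000000052
```

Because the suffix is zero-padded to a fixed width, sorting the resulting names as plain strings also sorts them by creation order.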

Sessions and Watches

Sessions

A session is the time interval during which a client receives service from Zookeeper. Every client is assigned a session ID, and its requests are served in sequential (FIFO) order. The client sends heartbeats to the server to keep the session valid. If no heartbeat is received within the session timeout, the server considers the client dead.
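The heartbeat rule can be sketched as follows. This is a simplified model, not Zookeeper's implementation: the Session class is hypothetical, and timestamps are passed in explicitly (in milliseconds) so that the example is deterministic.

```python
class Session:
    """Toy model of a client session kept alive by heartbeats (hypothetical)."""

    def __init__(self, session_id, timeout_ms, now_ms):
        self.session_id = session_id
        self.timeout_ms = timeout_ms
        self.last_heartbeat_ms = now_ms

    def heartbeat(self, now_ms):
        # Each heartbeat resets the expiry clock.
        self.last_heartbeat_ms = now_ms

    def is_expired(self, now_ms):
        # Expired once more than timeout_ms has passed since the last heartbeat.
        return now_ms - self.last_heartbeat_ms > self.timeout_ms


session = Session(session_id=0x1, timeout_ms=4000, now_ms=0)
session.heartbeat(now_ms=3000)
print(session.is_expired(now_ms=6000))  # False: only 3000 ms since the last heartbeat
print(session.is_expired(now_ms=8000))  # True: 5000 ms of silence exceeds the timeout
```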

Watches

Watches are notifications to the client. A client sets a watch on a znode, and when that znode changes, the ensemble sends the client a notification about the change.
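A small Python model can illustrate how a watch behaves. It assumes the standard one-time watch semantics, in which a watch fires once and must be set again to receive further notifications; the WatchedZnode class and the callback are hypothetical and do not use the Zookeeper client API.

```python
class WatchedZnode:
    """Toy model of a znode with one-time data watches (hypothetical)."""

    def __init__(self, data):
        self.data = data
        self._watchers = []

    def get(self, watcher=None):
        # Reading with a watcher registers it for the next change only.
        if watcher is not None:
            self._watchers.append(watcher)
        return self.data

    def set(self, data):
        self.data = data
        # Fire each registered watcher once, then forget it.
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify("NodeDataChanged")


events = []
znode = WatchedZnode("Edurekazookeeper-app")
znode.get(watcher=events.append)
znode.set("first update")   # triggers the watch
znode.set("second update")  # no watch is left, so nothing fires
print(events)  # ['NodeDataChanged']
```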

 

Zookeeper Ensemble

When a Zookeeper ensemble starts, clients try to connect to one of the nodes in the ensemble. Once connected, the server node sends a confirmation to the client. The client in return sends heartbeats to keep its connection confirmed.

If the client needs to read data, it sends the path of the znode to be read to the server, and the server returns the requested data.

If the client needs to store data, it sends the znode path and the data to the server. The request is first forwarded to the ensemble leader, which sends the write to all the followers. The write is committed only if a majority (quorum) of the servers acknowledge it.
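The majority rule for writes comes down to simple arithmetic, sketched below in Python. The function names are hypothetical, and the sketch assumes a plain majority quorum over all servers in the ensemble.

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many servers can fail while a majority is still available."""
    return ensemble_size - quorum_size(ensemble_size)


def write_committed(ensemble_size, acks):
    """A write goes through only when a majority has acknowledged it."""
    return acks >= quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(n, quorum_size(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
print(write_committed(5, acks=2))  # False
print(write_committed(5, acks=3))  # True
```

The numbers show why ensembles usually have an odd number of servers: a fourth server raises the quorum size without letting the ensemble survive any additional failure.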

The following image depicts the zookeeper ensemble. Every Zookeeper ensemble has some limitations. Let us discuss those.

Limitations:

Next in this Zookeeper tutorial, we will go through the installation of Zookeeper.

 

Zookeeper Installation 

To install Zookeeper on your Linux system, go through the following procedure.

Step 1: Install Java on your local system.

sudo apt install openjdk-8-jdk-headless

Step 2: Download a Zookeeper release to your local Ubuntu system. This tutorial uses version 3.5.6.

Step 3: Extract the tar file using the following command.

tar -xvf apache-zookeeper-3.5.6-bin.tar.gz

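Step 4: Set up the Zookeeper configuration file. Zookeeper reads conf/zoo.cfg by default, and the distribution ships a sample, conf/zoo_sample.cfg, with the following contents.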

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial 
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between 
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just 
# example sakes.
dataDir=/tmp/zookeeper
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the 
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
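Several of these settings are expressed in ticks, so the real time limits depend on tickTime. The Python sketch below parses the active settings from the file above and converts them to milliseconds. The parse_properties helper is hypothetical, and the session timeout bounds assume the documented defaults of 2 and 20 ticks, which apply only when minSessionTimeout and maxSessionTimeout are not set explicitly.

```python
SAMPLE_CONFIG = """
# The number of milliseconds of each tick
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/tmp/zookeeper
clientPort=2181
#maxClientCnxns=60
"""


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


config = parse_properties(SAMPLE_CONFIG)
tick_ms = int(config["tickTime"])

init_limit_ms = tick_ms * int(config["initLimit"])  # initial sync phase
sync_limit_ms = tick_ms * int(config["syncLimit"])  # request/ack window
min_session_ms = 2 * tick_ms    # assumed default lower bound for session timeouts
max_session_ms = 20 * tick_ms   # assumed default upper bound for session timeouts

print(init_limit_ms, sync_limit_ms)    # 20000 10000
print(min_session_ms, max_session_ms)  # 4000 40000
```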

Step 5: Start the Zookeeper server from the bin directory.

./zkServer.sh start

Step 6: Start the client interface.

./zkCli.sh

Zookeeper is now installed and running.

Similarly, when you are finished, you can stop the Zookeeper server with the following command.

./zkServer.sh stop

Now, let us move on to the command-line interface.

 

Zookeeper Command Line Interface

The ZooKeeper Command Line Interface, or CLI for short, is used to interact with the ZooKeeper ensemble during development. Its main purpose is debugging and trying out different options.

To perform any ZooKeeper CLI operation, first start the ZooKeeper server and then the ZooKeeper client. Once the client starts, you can perform the following operations.

Creates new Znodes in the cluster

create /EdurekaZnode "Edurekazookeeper-app"

//Output:

[zk: localhost:2181(CONNECTED) 0] create /EdurekaZnode "Edurekazookeeper-app"
Created /EdurekaZnode

Creation of Sequential Znode

create -s /EdurekaZnode data

//Output:

[zk: localhost:2181(CONNECTED) 2] create -s /EdurekaZnode "data"
Created /EdurekaZnode0000000052

Creation of Ephemeral Znode

create -e /EdurekaZnode2 "Ephemeral"

//Output:

[zk: localhost:2181(CONNECTED) 2] create -e /EdurekaZnode2 "Ephemeral"
Created /EdurekaZnode2

The get command returns the data and the metadata of the specified znode. (In Zookeeper 3.5 and later, use get -s to include the metadata.)

get /EdurekaZnode

//Output:

[zk: localhost:2181(CONNECTED) 1] get /EdurekaZnode
Edurekazookeeper-app
cZxid = 0x21f
ctime = Sat Dec 28 17:18:16 IST 2019
mZxid = 0x21f
mtime = Sat Dec 28 17:18:16 IST 2019
pZxid = 0x21f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0

To access a sequential znode, enter its complete path, including the sequence number.

get /EdurekaZnode0000000052

//Output:

[zk: localhost:2181(CONNECTED) 1] get /EdurekaZnode0000000052
data
cZxid = 0x22
ctime = Sat Dec 28 17:35:55 IST 2019
mZxid = 0x22
mtime = Sat Dec 28 17:35:55 IST 2019
pZxid = 0x22
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 13
numChildren = 0

Sets a watch on a znode so that the client is notified when the znode changes. (Zookeeper 3.5 and later prefer the form get -w /EdurekaZnode.)

get /EdurekaZnode 1

//Output:

WATCHER::

WatchedEvent state:SyncConnected type:NodeDataChanged path:/EdurekaZnode
cZxid = 0x21f
ctime = Sat Dec 28 17:42:28 IST 2019
mZxid = 0x21f
mtime = Sat Dec 28 17:42:28 IST 2019
pZxid = 0x21f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0

Sets the data of the specified znode. Reading the znode back with get shows the new data and an incremented dataVersion.

set /EdurekaZnode2 updatedata

//Output:

[zk: localhost:2181(CONNECTED) 1] get /EdurekaZnode2
updatedata
cZxid = 0x22
ctime = Sat Dec 28 17:55:20 IST 2019
mZxid = 0x22
mtime = Sat Dec 28 17:55:20 IST 2019
pZxid = 0x22
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x16016e32db00012
dataLength = 32
numChildren = 0

Creates the subordinate child nodes

create /EdurekaZnode/Child1 EdurekaChild

//Output:

[zk: localhost:2181(CONNECTED) 16] create /EdurekaZnode/Child1 "EdurekaChild"
Created /EdurekaZnode/Child1

Lists the children of a znode. The output shows the names of the child znodes, not their data.

ls /EdurekaZnode

//Output:

[zk: localhost:2181(CONNECTED) 2] ls /EdurekaZnode
[Child1]

The stat command displays the metadata of the specified znode.

stat /EdurekaZnode

//Output:

[zk: localhost:2181(CONNECTED) 1] stat /EdurekaZnode
cZxid = 0x21f
ctime = Sat Dec 28 18:04:26 IST 2019
mZxid = 0x21f
mtime = Sat Dec 28 18:04:26 IST 2019
pZxid = 0x21f
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 32
numChildren = 0

Removes the specified znode and, recursively, all its children. (From Zookeeper 3.5 onward, rmr is deprecated in favor of deleteall.)

rmr /EdurekaZnode

//Output:

[zk: localhost:2181(CONNECTED) 20] rmr /EdurekaZnode
[zk: localhost:2181(CONNECTED) 21] get /EdurekaZnode
Node does not exist: /EdurekaZnode

 

Companies Using Zookeeper

There are many companies using Apache Zookeeper. A few of the major ones are listed below.

 

With this, we come to the end of this “Zookeeper Tutorial” article. I hope it has shed some light on Zookeeper.

Now that you have understood the fundamentals of Zookeeper from this tutorial, check out the Hadoop training by Edureka, a trusted online learning company with a network of more than 250,000 satisfied learners spread across the globe. The Edureka Big Data Hadoop Certification Training course helps learners become experts in HDFS, Yarn, MapReduce, Pig, Hive, HBase, Oozie, Flume and Sqoop using real-time use cases in the Retail, Social Media, Aviation, Tourism and Finance domains.
