August 16, 2014

Over the past few weeks, I have been working with Apache ZooKeeper for a certain project at work. ZooKeeper is a pretty cool technology and I thought I would document what I’ve learned so far, for others looking for a short introduction and for myself to look back on in the future.

What is ZooKeeper?

ZooKeeper is:

“A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services”.

Informally, ZooKeeper is just a service that distributed systems can rely on for coordination. It allows developers to focus on implementing the core application logic, without having to worry about also implementing any coordination tasks required for the application. In today’s “big data” world, applications are usually individual, independent programs running on separate, physical machines. This sort of distribution further complicates the already-complicated problem of synchronizing shared data and state between processes. ZooKeeper provides a fast, highly available, fault tolerant, and distributed solution to help solve these synchronization issues.

Like a lot of innovative ideas these days, ZooKeeper was originally conceived by Google as Google Chubby. In that paper, Chubby is introduced as a lock service to “allow its clients to synchronize their activities and to agree on basic information about their environment”. In the context of distributed applications, this information can control the cooperation between many competing resources. Here are a few examples of when ZooKeeper can be used:

  • An application in which a single worker machine needs to wait on a certain condition to be satisfied before starting or continuing progress. This is an example of the familiar Mutual Exclusion problem just at a larger, more distributed scale. For this scenario, ZooKeeper can be used as a lock to prevent concurrent execution.
  • An application with a master-slave architecture where a single master delegates work to several worker nodes (see HBase, for an example). In this example, we can use ZooKeeper to elect a leader and to keep track of the worker nodes to see “who’s doing what”.

How does ZooKeeper work?

ZooKeeper’s data model is almost identical to a that of a file system. ZooKeeper works with small data nodes (called znodes) which are organized hierarchically into a tree structure.

Logical representation of znodes in ZooKeeper.

In the above example, we have the root node /, under which we have two logical namespaces config and workers. These just serve to separate out the concerns of the application. The config namespace is used for centralized configuration management. In the leaf node, we can store data (upto 1MB), which in our case contains a BLOCK_SIZE property. The workers namespace is used as a name service to locate worker nodes in our system.

ZooKeeper’s nodes come in two flavors: persistent and ephemeral. The difference is simply that a persistent znode can only be deleted through an explicit delete call. Whereas ephemeral nodes are deleted once the client which created it has disconnected. Nodes can also be sequential. This just means that each node is given a unique, monotonically increasing integer which is appended to the path.


Architecture of ZooKeeper. Source: O'Reilly's ZooKeeper by Flavio Junqueira and Benjamin Reed

The above image shows a basic overview of a typical ZooKeeper interaction. Many application processes open sessions to ZooKeeper via a client library (ZooKeeper provides bindings for Java and C). The ensemble of ZooKeeper servers are responsible for serving client requests. Writes are routed through a single leader and are atomically broadcast to followers. Reads can be served from any follower. There is a small, bounded delay from when a write has been requested to when it is available to be read. This is the reason ZooKeeper is considered eventually consistent.

ZooKeeper is built on three main principles:

  • Replication - The data stored in a single znode is replicated across several servers in the ensemble. This ensures that there is not one single point of failure.
  • Order - Each update of data in ZooKeeper is assigned a global version number. Updates from a client are applied in the order in which they were was sent.
  • Speed - ZooKeeper stores its data in-memory, which allows it to achieve high throughput and low latency.


ZooKeeper provides a very simple API for interacting with znodes. ZooKeeper also comes with a command-line interface that is useful for debugging and playing around with the different operations. Here are the main operations used in the bin/

create [path] [data] - Creates a node for the given path and associates some data with it

delete [path] - Deletes a node with the given path

get [path] - Get the data associated with the node at this path

stat [path] - Gets the metadata of a node. This includes information about when the node was created, the version of the data, the length of the data etc. See documentation for more information about the Stat object.

set [path] [data] [version] - Update the data of a given node. This caller can specify the version that it knows about (returned from stat). If the version is not the latest, ZooKeeper rejects this update.

Recipes and Curator

As mentioned earlier, ZooKeeper is used to implement common synchronization primitives for distributed applications. To that end, ZooKeeper provides a few recipes, which are actual implementations of common synchronization mechanisms that a ZooKeeper user might need. Most of the time, an application’s interaction with ZooKeeper would be through these recipes.

In addition, Apache Curator, a library open-sourced by Netflix, provides a wrapper around the already-pretty-simple API of ZooKeeper. Curator provides support for ZooKeeper recipes as well as a fluent interface API that manages the complexity of connecting to ZooKeeper.


Migrating from Lift to Play

Blog post for Rally Health on migrating from Lift to to Play framework Continue reading