Over the past few weeks, I have been working with Apache ZooKeeper for a certain project at work. ZooKeeper is a pretty cool technology and I thought I would document what I’ve learned so far, for others looking for a short introduction and for myself to look back on in the future.

What is ZooKeeper?

ZooKeeper is:

“A centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services”.

Informally, ZooKeeper is just a service that distributed systems can rely on for coordination. It allows developers to focus on implementing the core application logic, without having to worry about also implementing the coordination tasks the application requires. In today’s “big data” world, applications are usually individual, independent programs running on separate physical machines. This sort of distribution further complicates the already-complicated problem of synchronizing shared data and state between processes. ZooKeeper provides a fast, highly available, fault-tolerant, and distributed solution to help solve these synchronization issues.

Like a lot of innovative ideas these days, ZooKeeper traces its roots to Google: it was inspired by Google’s Chubby lock service. In the Chubby paper, the service is introduced as a lock service that “allow[s] its clients to synchronize their activities and to agree on basic information about their environment”. In the context of distributed applications, this information can control the cooperation between many competing resources. Here are a few examples of when ZooKeeper can be used:

  • An application in which a single worker machine needs to wait on a certain condition to be satisfied before starting or continuing its work. This is the familiar Mutual Exclusion problem, just at a larger, more distributed scale. For this scenario, ZooKeeper can be used as a lock to prevent concurrent execution.
  • An application with a master-slave architecture, where a single master delegates work to several worker nodes (see HBase, for example). Here, we can use ZooKeeper to elect a leader and to keep track of the worker nodes to see “who’s doing what”.

How does ZooKeeper work?

ZooKeeper’s data model is almost identical to that of a file system. ZooKeeper works with small data nodes (called znodes), which are organized hierarchically into a tree structure.

Logical representation of znodes in ZooKeeper.

In the above example, we have the root node /, under which we have two logical namespaces, config and workers. These just serve to separate out the concerns of the application. The config namespace is used for centralized configuration management: in the leaf node, we can store data (up to 1 MB), which in our case contains a BLOCK_SIZE property. The workers namespace is used as a name service to locate worker nodes in our system.

ZooKeeper’s znodes come in two flavors: persistent and ephemeral. The difference is simply that a persistent znode can only be deleted through an explicit delete call, whereas an ephemeral znode is deleted automatically once the client that created it disconnects. Znodes can also be sequential, which just means that each znode is given a unique, monotonically increasing integer that is appended to its path. The sketch below makes these flavors concrete.
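
Here is a minimal sketch of the three flavors using ZooKeeper’s Java API. The connection string, paths, and the null watcher are just for illustration:

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZnodeFlavors {
  public static void main(String[] args) throws Exception {
    // Assumes a ZooKeeper server on localhost and that parent znodes like /workers and /tasks exist
    ZooKeeper zk = new ZooKeeper("localhost:2181", 3000, null);

    // Persistent: lives until an explicit delete call
    zk.create("/config", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

    // Ephemeral: removed automatically when this client's session ends
    zk.create("/workers/worker-1", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);

    // Sequential: ZooKeeper appends a unique, monotonically increasing counter to the path
    String actualPath = zk.create("/tasks/task-", new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE,
        CreateMode.PERSISTENT_SEQUENTIAL);
    System.out.println(actualPath); // e.g. /tasks/task-0000000001

    zk.close();
  }
}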

Architecture

Architecture of ZooKeeper. Source: O'Reilly's ZooKeeper by Flavio Junqueira and Benjamin Reed

The above image shows a basic overview of a typical ZooKeeper interaction. Many application processes open sessions to ZooKeeper via a client library (ZooKeeper provides bindings for Java and C). The ensemble of ZooKeeper servers is responsible for serving client requests. Writes are routed through a single leader and are atomically broadcast to the followers. Reads can be served by any server in the ensemble. There is a small, bounded delay from when a write has been requested to when it is available to be read, which is why ZooKeeper is considered eventually consistent.

ZooKeeper is built on three main principles:

  • Replication - The data stored in a single znode is replicated across several servers in the ensemble. This ensures that there is not one single point of failure.
  • Order - Each update of data in ZooKeeper is assigned a global version number. Updates from a client are applied in the order in which they were sent.
  • Speed - ZooKeeper stores its data in-memory, which allows it to achieve high throughput and low latency.

API

ZooKeeper provides a very simple API for interacting with znodes. ZooKeeper also comes with a command-line interface that is useful for debugging and playing around with the different operations. Here are the main operations available in bin/zkCli.sh:

create [path] [data] - Creates a node for the given path and associates some data with it

delete [path] - Deletes a node with the given path

get [path] - Get the data associated with the node at this path

stat [path] - Gets the metadata of a node. This includes information about when the node was created, the version of the data, the length of the data etc. See documentation for more information about the Stat object.

set [path] [data] [version] - Update the data of a given node. The caller can specify the version that it knows about (returned from stat). If that version is not the latest, ZooKeeper rejects the update.
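
The same operations are available through the client API. Here is a sketch of a conditional update in Java, using the version from the Stat object as a compare-and-set; the path and data are made up, and zk is assumed to be a connected ZooKeeper client:

import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Read the current data; ZooKeeper fills in the Stat (including the version)
Stat stat = new Stat();
byte[] current = zk.getData("/config/BLOCK_SIZE", false, stat);

// Passing stat.getVersion() makes the update conditional: if another client
// modified the znode in the meantime, ZooKeeper throws a BadVersionException
zk.setData("/config/BLOCK_SIZE", "128".getBytes(), stat.getVersion());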

Recipes and Curator

As mentioned earlier, ZooKeeper is used to implement common synchronization primitives for distributed applications. To that end, ZooKeeper provides a few recipes, which are actual implementations of common synchronization mechanisms that a ZooKeeper user might need. Most of the time, an application’s interaction with ZooKeeper would be through these recipes.

In addition, Apache Curator, a library open-sourced by Netflix, provides a wrapper around the already-pretty-simple API of ZooKeeper. Curator provides support for ZooKeeper recipes as well as a fluent interface API that manages the complexity of connecting to ZooKeeper.
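
To give a flavor of Curator, here is a minimal sketch of its distributed lock recipe; the connection string and lock path are illustrative:

import java.util.concurrent.TimeUnit;
import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

CuratorFramework client = CuratorFrameworkFactory.newClient("localhost:2181",
    new ExponentialBackoffRetry(1000, 3)); // retry policy: back off, up to 3 retries
client.start();

InterProcessMutex lock = new InterProcessMutex(client, "/locks/my-resource");
if (lock.acquire(10, TimeUnit.SECONDS)) {
  try {
    // critical section: only one client across the ensemble holds the lock here
  } finally {
    lock.release();
  }
}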

Introduction to the data

In celebration of Super Bowl Sunday, this post will examine how HBase can be used to model NFL play-by-play data.

We will be using data from the 2002 season through the 2012 season. The source of the data is CSV files provided by the awesome people at Advanced NFL Stats. Each file contains data for a single season, and the format of the data is as follows:

gameid|qtr|min|gsec|off|def|down|togo|ydline|description|offscore|defscore|season

The gameid column contains the date of the game and the two teams competing. The description column contains a short natural-language blurb about a certain play. The rest of the columns are self-explanatory. Here is a snippet of a sample file from the 2012 season (which also happens to include Robert Griffin III’s first plays in the NFL):

20120909_WAS@NO,1,-5,0,WAS,NO,,,69,B.Cundiff kicks 65 yards from WAS 35 to end zone Touchback.,0,0,2012
20120909_WAS@NO,1,60,0,NO,WAS,1,10,80,(15:00) D.Brees pass incomplete short left to M.Colston.,0,0,2012
20120909_WAS@NO,1,59,57,NO,WAS,2,10,80,(14:57) D.Brees pass incomplete short left to L.Moore [A.Carriker].,0,0,2012
20120909_WAS@NO,1,59,52,NO,WAS,3,10,80,(14:52) (Shotgun) D.Brees pass incomplete short middle (S.Bowen).,0,0,2012
20120909_WAS@NO,1,59,47,NO,WAS,4,10,80,(14:47) T.Morstead punts 59 yards to WAS 21 Center-J.Drescher. B.Banks pushed ob at WAS 32 for 11 yards (S.Shanle).,0,0,2012
20120909_WAS@NO,1,59,34,WAS,NO,1,10,68,(14:34) (Shotgun) R.Griffin pass short left to P.Garcon to WAS 32 for no gain (C.White). Pass -4 YAC 4,0,0,2012
20120909_WAS@NO,1,58,58,WAS,NO,2,10,68,(13:58) (Shotgun) R.Griffin right end to WAS 44 for 12 yards (S.Shanle).,0,0,2012
20120909_WAS@NO,1,58,25,WAS,NO,1,10,56,(13:25) (Shotgun) R.Griffin pass short left to P.Garcon to NO 44 for 12 yards (R.Harper). Pass -2 YAC 14,0,0,2012
20120909_WAS@NO,1,57,44,WAS,NO,1,10,44,(12:44) (Shotgun) R.Griffin pass short right to P.Garcon to NO 35 for 9 yards (C.Lofton; C.White).,0,0,2012

Why HBase?

In a previous post, we looked at HBase’s data model. One of the killer features built into HBase is the ability to store versioned data. Data that spans a time series can be modeled easily in HBase. NFL play-by-play data fits nicely with this model, since each play of a game can be considered a time slice that captures the state of the game at that instant. This means that we can replace the timestamp field with a play number from 1 to n, where n is the total number of plays in the game.

Another reason for using HBase is its innate ability to scale to large amounts of data. We will be storing data for 11 seasons, each with 269 games (regular season and playoffs) and around 175 plays per game. That totals around 517,825 data points. This may not seem like much, but consider that if we start adding data from games before 2002 and continue to add data up to the current season, we will soon be glad that we chose a scalable data store like HBase.

Schema

We won’t spend too much time discussing the HBase schema decisions; the HBase documentation does a good job of explaining the different schema design considerations. We will be using gameid (see above) as the row key, which allows us to store the data for each game in a single row. We will use a single column family called ‘play’. Our columns will be the different attributes of a single play (for example: yard line, offense score). As discussed before, we will use the play number as the timestamp for each column. Here is a diagram that more succinctly describes our schema:

Our schema for modeling NFL plays in HBase.

Table creation & Data import

Now that we have looked at the data we will be storing and the schema we will follow, let’s finally create our table and put some data in it. There are several APIs and libraries for different languages that allow us to interact with HBase. We will mostly be using the HBase shell, which is a JRuby IRB-based shell that allows us to run simple data management and manipulation commands. To import data from the CSV files into HBase, we will use JRuby, the “Ruby Language on the JVM”.

Let’s start by creating the table. The command below creates a new table called ‘nflplays’, with a single column family called ‘play’ and the specification that we want to store ALL_VERSIONS (2147483647) of a column. This may seem like overkill, since we should never really have more than ~300 plays in a game, but I am doing it to illustrate how we can use constants from HConstants when defining table characteristics.

create 'nflplays', {NAME=> 'play', VERSIONS=>org.apache.hadoop.hbase.HConstants::ALL_VERSIONS}

Next, we use the describe command to verify the table we just created. We can see a few of the characteristics of the table, including whether it is ENABLED:

hbase(main):002:0> describe 'nflplays'
DESCRIPTION                                                                                                                                              ENABLED                                                                             
 'nflplays', {NAME => 'play', DATA_BLOCK_ENCODING => 'NONE', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '2147483647', COMPRESSION => ' true                                                                                
 NONE', MIN_VERSIONS => '0', TTL => '2147483647', KEEP_DELETED_CELLS => 'false', BLOCKSIZE => '65536', IN_MEMORY => 'false', ENCODE_ON_DISK => 'true', B                                                                                     
 LOCKCACHE => 'true'}                                                                                                                                                                                                                        
1 row(s) in 1.5100 seconds

Now that we have our table created, we can finally start importing the data from the CSV files into HBase. For this, I’ve written up this crude script in JRuby that goes through each line and parses out the different elements. Here it is:

include Java
import org.apache.hadoop.hbase.client.HTable
import org.apache.hadoop.hbase.client.Put

# Converts each argument to a Java byte array
def jbytes( *args )
  args.map { |arg| arg.to_s.to_java_bytes }
end

def parse_row(row)
  map = {}
  values = row.split(',')
  map['gameid'] = values[0]
  map['qtr'] = values[1]
  map['min'] = values[2]
  map['sec'] = values[3]
  map['off'] = values[4]
  map['def'] = values[5]
  map['down'] = values[6]
  map['togo'] = values[7]
  map['ydline'] = values[8]
  #The csv file contains some weird characters at the Game over plays, this little hack is to avoid parsing that
  map['description'] = (map['qtr'].to_i >= 4 and map['min'].to_i == 0 and map['down'].empty? and map['togo'].empty?) ? "Game Over" : values[9]
  map['offscore'] = values[10]
  map['defscore'] = values[11]
  map['season'] = values[12]
  return map
end

def put_into_hbase(document, play_number)
  table = HTable.new(@hbase.configuration, 'nflplays')
  table.setAutoFlush(false)
  document.each do |key, value|
    if !value.empty?
      rowkey = document['gameid'].to_java_bytes
      ts = play_number

      p = Put.new(rowkey, ts)
      p.add(*jbytes("play", key, value))
      table.put(p)

      puts play_number.to_s + ":" + key.to_s + ":" + value.to_s
    end
  end
  table.flushCommits()
end

count = 1
seen_before = {}
File.open('2012.csv', 'r:utf-8').each_line do |row|
  data = parse_row(row.strip) # strip, not strip! -- strip! returns nil when the line has no surrounding whitespace
  if !seen_before.has_key?(data['gameid'])
    count = 1
    seen_before[data['gameid']] = true
  end

  put_into_hbase(data, count)
  count += 1
end
exit

A few things to notice:

  • With JRuby, we can import classes from Java using familiar import statements
  • Notice that each record is inserted with a Put object.
  • We set autoFlush to false; this buffers Puts in a client-side write buffer. We force the flush of the buffer using flushCommits(). See the documentation for more information.

After running this script, we can view some of the data that has been inserted into HBase. First, we can count the number of rows we have inserted (I was running HBase on a micro EC2 instance and did not want to strain my already scarce resources, so I only imported the first 8 games from 2012):

hbase(main):009:0> count 'nflplays'
8 row(s) in 0.0260 seconds

We can scan all the rows that we have imported using the ‘scan’ command in the HBase shell. Here is a snippet of the output I get:

ROW                                                          COLUMN+CELL                                                                                                                                                                     
 20120909_STL@DET                                            column=play:def, timestamp=158, value=DET                                                                                                                                       
 20120909_STL@DET                                            column=play:defscore, timestamp=158, value=27                                                                                                                                   
 20120909_STL@DET                                            column=play:description, timestamp=158, value=Game Over                                                                                                                         
 20120909_STL@DET                                            column=play:down, timestamp=157, value=1                                                                                                                                        
 20120909_STL@DET                                            column=play:gameid, timestamp=158, value=20120909_STL@DET                                                                                                                       
 20120909_STL@DET                                            column=play:min, timestamp=158, value=0                                                                                                                                         
 20120909_STL@DET                                            column=play:off, timestamp=158, value=STL                                                                                                                                       
 20120909_STL@DET                                            column=play:offscore, timestamp=158, value=23                                                                                                                                   
 20120909_STL@DET                                            column=play:qtr, timestamp=158, value=4                                                                                                                                         
 20120909_STL@DET                                            column=play:season, timestamp=158, value=2012                                                                                                                                   
 20120909_STL@DET                                            column=play:sec, timestamp=158, value=2                                                                                                                                         
 20120909_STL@DET                                            column=play:togo, timestamp=157, value=10                                                                                                                                       
 20120909_STL@DET                                            column=play:ydline, timestamp=158, value=61                                                                                                                                     
 20120909_WAS@NO                                             column=play:def, timestamp=190, value=WAS                                                                                                                                       
 20120909_WAS@NO                                             column=play:defscore, timestamp=190, value=40                                                                                                                                   
 20120909_WAS@NO                                             column=play:description, timestamp=190, value=Game Over                                                                                                                         
 20120909_WAS@NO                                             column=play:down, timestamp=189, value=2                                                                                                                                        
 20120909_WAS@NO                                             column=play:gameid, timestamp=190, value=20120909_WAS@NO                                                                                                                        
 20120909_WAS@NO                                             column=play:min, timestamp=190, value=0                                                                                                                                         
 20120909_WAS@NO                                             column=play:off, timestamp=190, value=NO                                                                                                                                        
 20120909_WAS@NO                                             column=play:offscore, timestamp=190, value=32                                                                                                                                   
 20120909_WAS@NO                                             column=play:qtr, timestamp=190, value=4                                                                                                                                         
 20120909_WAS@NO                                             column=play:season, timestamp=190, value=2012                                                                                                                                   
 20120909_WAS@NO                                             column=play:sec, timestamp=190, value=1                                                                                                                                         
 20120909_WAS@NO                                             column=play:togo, timestamp=189, value=10                                                                                                                                       
 20120909_WAS@NO                                             column=play:ydline, timestamp=190, value=39

Here we see the last play of two games. Since I didn’t specify how many versions to return, the scan only returns the most recent version (the last play of each game). We also see each column for a single row. Also notice that the timestamp here is the play number (see above for why this is the case). If we wanted to see only the play descriptions for a single game, we can use this query:

scan 'nflplays', {COLUMNS => ['play:description'], STARTROW => '20120909_STL@DET', STOPROW => '20120909_STL@DEU', VERSIONS => 200}

Note that the stop row of a scan is exclusive, which is why STOPROW above is set one character past our game’s key.

Let’s take a look at the Washington vs. New Orleans game. Notice that some columns (like ‘togo’) only have 189 values, whereas the total number of plays is 190. This indicates that for play 190, we did not have a value for ‘togo’. This illustrates the ‘sparse’ property we looked at in the previous post.

Analyzing the data

Now that we have our data in HBase, we can run a few queries on that data. Here is a simple Java program that scans the table for all plays that contain the word ‘TOUCHDOWN’ in the description. You can imagine how we can extend this program to create more complex queries, using different Filters.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RegexStringComparator;
import org.apache.hadoop.hbase.filter.ValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class Touchdowns {
  public static void main(String[] args) throws IOException {
    Configuration config = HBaseConfiguration.create();
    HTable table = new HTable(config, "nflplays");
  
    Scan s = new Scan();
    s.setMaxVersions();
    s.addColumn(Bytes.toBytes("play"), Bytes.toBytes("description"));

    Filter tdFilter = new ValueFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator(".*TOUCHDOWN.*"));
    s.setFilter(tdFilter);
 
    ResultScanner scanner = table.getScanner(s);
    try {
      for (Result result : scanner) {
        for (KeyValue kv : result.raw()) {
          System.out.println("KV: " + kv + ", Value: " + Bytes.toString(kv.getValue()));
        }
      }
    } 
    finally {
      scanner.close();
    }
  }
}

What is HBase?

HBase is:

A “sparse, distributed, consistent, multi-dimensional, sorted map”

We will look at what each of these terms means below. HBase is based on Google’s BigTable and is currently an Apache top-level project. It provides random read/write access to data stored in HDFS (the Hadoop Distributed File System), and it leverages the capabilities provided by Hadoop and HDFS. In a future post, we will look at the architecture of how HBase stores data. This post is more of a high-level introduction to the data model used by HBase.

We will start by looking at what each of the terms in the above quote mean and understand the data model using terms that we are already familiar with.

“Map”

At its core, HBase is a mapping of keys to values. It serves one of the most basic functions of a data store. It stores values, indexed by a key. It retrieves values, given a key.

“Sorted”

HBase guarantees that each cell of data is stored lexicographically by its key. This allows for fast range queries (for example, we can ask HBase to return all values with keys from k1 to k4). In contrast, relational databases provide no such guarantee about the sort order of their values.
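
As an illustration, a range query in the 0.94-era Java client looks something like this (a sketch; assumes table is an open HTable for some existing table):

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// Scan all rows with keys in [k1, k4) -- the stop row is exclusive
Scan scan = new Scan(Bytes.toBytes("k1"), Bytes.toBytes("k4"));
ResultScanner results = table.getScanner(scan);
for (Result r : results) {
  // rows arrive in lexicographic order of their keys
}
results.close();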

“Multi-dimensional”

The key in HBase is actually made up of several parts: row key, column family, column, and timestamp. The timestamp is the killer feature of HBase: it provides a way to store several versions of a value, which makes HBase a good choice for storing time-series data. The key-value pair model now looks like this:

(row key, column family, column, timestamp) -> value

“Sparse”

HBase is a sparse data store in that it stores nothing for empty/null values. There is no cell for a column without a value. In HBase, null values are free to store.

“Distributed”

HBase is built for scale. Data stored in HBase can be spread over many physical machines, and a table can store billions of cells. HBase sits on top of HDFS, which takes care of the distribution and replication of data. In addition to scalability, this “feature” provides protection against node failures.

“Consistent”

HBase is strongly consistent. This means that reads will always return the last written and committed value for a key. HBase guarantees that all changes within the same row are atomic.
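
For example, the client API exposes a single-row compare-and-set. Here is a sketch, assuming table is an open HTable; the row, family, and values are made up:

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// Atomically apply the Put only if fam:col currently holds "expected".
// The check and the write happen as one atomic operation on the row.
Put put = new Put(Bytes.toBytes("row-1"));
put.add(Bytes.toBytes("fam"), Bytes.toBytes("col"), Bytes.toBytes("new-value"));
boolean applied = table.checkAndPut(Bytes.toBytes("row-1"), Bytes.toBytes("fam"),
    Bytes.toBytes("col"), Bytes.toBytes("expected"), put);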

Now that we have broken down the canonical definition of HBase, let’s take a look at some of the important terms that describe how data is stored in HBase.

Data Model

Table

The highest level of organization is the Table, which is similar to the relational definition of the term. We organize logically independent groups of data into Tables. The diagram below shows an empty Table (we will use this diagram to iteratively build our understanding of the different terms).

Logical representation of a Table in HBase. We will build on this representation as we look at more terms.

Rows

Each Table is made up of 1 or more Rows. Rows provide a logical grouping of cells, and row keys are lexicographically sorted; notice in the diagram below that ‘row-10’ comes before ‘row-2’. Row keys are just bytes, which allows us to use a variety of types of data as the key. Each row holds the data for a certain entity. The definition of a Row in HBase is similar to its relational counterpart.

Logical representation of a Table with rows in HBase.

Columns

Each Row is made up of 1 or more Columns. Columns are arbitrary labels for attributes of a row. In contrast with an RDBMS, columns do not need to be specified in advance: as soon as we PUT (insert) a cell into HBase, the column it names is implicitly created (see the sketch after the diagram below). This allows HBase to be a “semi-structured” database by giving it the flexibility to add columns on the fly, rather than declaring them when the table is initially created.

Logical representation of a Table with rows and columns in HBase. So far, our representation is similar to a relational database's table
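
To see how little ceremony this involves, here is a sketch of writing to a never-before-seen column (assumes table is an open HTable; the column name is made up):

import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

// No ALTER TABLE needed: writing a cell into a brand-new column just works,
// as long as the column family (fam1) was declared when the table was created
Put put = new Put(Bytes.toBytes("row-1"));
put.add(Bytes.toBytes("fam1"), Bytes.toBytes("some_new_attribute"), Bytes.toBytes("value"));
table.put(put);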

Column Family

Columns are grouped into Column Families. Column Families define storage attributes for their Columns (compression, number of versions, etc.). They must be declared when a Table is created, and their names must be printable characters. All elements of a column family are stored together on the file system. It is also important to keep the number of Column Families relatively small (we will see the reason for this in a future post).

Logical representation of a Table with two column families. col1 belongs to fam1 and columns col2 and col3 belong to the family fam2.

Cells

At the intersection of a Row, Column Family, and Column is a Cell. Each cell contains a value and a version (usually a timestamp). HBase allows the client to store many versions of a single cell, so data that spans a time period can be modeled easily (see the sketch after the diagram below). Null values are not stored in Cells (see the “Sparse” section above).

Logical representation of a Table with 7 non-empty cells. A few cells contain several versions. The large 'DO NOT ENTER' signs represent the fact that no storage space is wasted in storing NULL values. Those cells are not stored in HBase.
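
Here is a sketch of reading several versions of one cell with the 0.94-era Java client (assumes table is an open HTable; row and column names are made up):

import java.util.List;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

// Ask for up to 5 versions of each cell instead of just the latest
Get get = new Get(Bytes.toBytes("row-1"));
get.setMaxVersions(5);
Result result = table.get(get);

// Versions of fam1:col1, ordered newest first
List<KeyValue> versions = result.getColumn(Bytes.toBytes("fam1"), Bytes.toBytes("col1"));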

Putting it all together

Overall, the data model of HBase is a multi-dimensional key-value store. If you remember one thing from this post, it should be this:

(Table, RowKey, Family, Column, Timestamp) -> Value

Or, if you like to think in terms of Java generics:

Representing HBase's Data Model using Java Generics. The arrows associate each data structure with their HBase equivalent, from above.
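
In code, the nested structure looks something like this (purely illustrative; HBase does not literally expose this type):

// An HBase table viewed as nested sorted maps (illustrative only)
SortedMap<byte[],                      // row key
    SortedMap<byte[],                  // column family
        SortedMap<byte[],              // column qualifier
            SortedMap<Long, byte[]>    // timestamp -> cell value
        >>> table;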

Coursera Android Course

Happy New Year! One of my goals this year is to (finally) complete a Coursera course from start to finish. I’ve tried this at least twice before and have given up because of time constraints, loss of interest, or because the course required Emacs. I’ve been looking for a good introduction to Android internals, to gain more than just an API-level understanding of Android, and I am hoping this course will help me achieve that goal. Maybe after completing this course, I will understand how Intents work.

In this post, I will go through a quick overview of a few uses of Google’s Guava library. When I entered the professional world, I used Guava pretty heavily, and the following blog post includes some of the notes I took when learning about Guava’s many useful tools.

What is Guava?

Guava is an open-source library for the Java language that provides several utilities to make a programmer’s job easier. According to Jared Levy (one of Guava’s creators), “The library’s functionality simplifies your code so it’s easier to write, read, and maintain”. This post is designed to introduce some useful aspects of Guava and expose some gotchas that you might come across. Guava includes utilities for collections, concurrency, primitives, reflection, comparison, I/O, hashing, networking, strings, math, in-memory caching, and various basic data types.

Useful utility classes

Preconditions: Used to validate assumptions at the start of methods or constructors (and fail-fast if they aren’t valid)

public Person(Name name) {
  this.name = Preconditions.checkNotNull(name); // throws NullPointerException if name is null
}
 
public void eat(Food food) {
  Preconditions.checkArgument(!food.isPoisonous(), "Food must not be poisonous"); // throws IllegalArgumentException
  Preconditions.checkState(person.isHungry(), "Person must be hungry"); // throws IllegalStateException
  ...
}

Objects: simplify writing equals(), toString(), hashCode() for a class

// Objects.equal provides null-sensitive comparison 
Objects.equal("a", "a"); // returns true
Objects.equal(null, "a"); // returns false
Objects.equal("a", null); // returns false
Objects.equal(null, null); // returns true
 
// Objects.hashCode provides an order-sensitive hashCode method for a variable number of args. Uses Arrays.hashCode
Objects.hashCode(person.getName(), person.getAge());
 
// Objects.toStringHelper provides an easy, clean way to implement a class' toString method, with overloaded add() methods for each primitive type
Objects.toStringHelper("Person")
       .add("age", 43)
       .add("hungry", null)
       .omitNullValues()
       .toString(); // returns "Person{age=43}"

NOTE: Java 7 provides a new Objects class that provides similar functionality.

Strings: simplify String joining, splitting, matching

// Joiner provides utility methods to join pieces of text into a String
static final Joiner JOINER = Joiner.on("-").skipNulls();
return JOINER.join("Me", null, "Myself", "I"); // returns "Me-Myself-I"
 
// Splitter provides utility methods to split a given String into pieces of text split by a given separator
static final Splitter SPLITTER = Splitter.on(',').trimResults().omitEmptyStrings();
return SPLITTER.split("me,myself,,    I"); // returns an Iterable of ["me", "myself", "I"]
 
// CharMatcher has several common matchers and methods to trim, replace, remove, retain, or collapse occurrences of a match in a String
String onlyDigits = CharMatcher.DIGIT.retainFrom(string); // keep only the digits
String noDigits = CharMatcher.JAVA_DIGIT.replaceFrom(string, "*"); // replace all digits with stars

Functional Programming: Functions and Predicates that can be used to simulate first-class functions in Java

// Function<F, T>: Override the apply method to provide a one-way transformation of F to T
static final Function<String, Integer> LENGTH = new Function<String, Integer>() {
  public Integer apply(String string) {
    return string.length();
  }
};
 
// Predicate<F>: Override the apply method to determine whether the input satisfies the predicate
static final Predicate<String> ALL_CAPS = new Predicate<String>() {
  public boolean apply(String string) {
    return CharMatcher.JAVA_UPPER_CASE.matchesAllOf(string);
  }
};

IntMath/LongMath

Guava provides utility methods for performing int, long, and BigInteger operations in the conveniently named IntMath, LongMath, and BigIntegerMath classes. IntMath and LongMath provide overflow-checked add, subtract, multiply, and divide methods that fail fast on overflow. Here are some examples of the utility methods:

int x = 3;
int y = 45;
long l = 21520;
 
IntMath.checkedMultiply(x, y); // returns 135; throws ArithmeticException on overflow
  
IntMath.factorial(x); // returns 6
IntMath.gcd(x, y); // returns 3
IntMath.isPowerOfTwo(x); // returns false
LongMath.mod(l, 10); // returns 0

Guava Pitfalls/Gotchas

  • Joiner/Splitter instances are always immutable. Configuration methods like skipNulls() return a new instance rather than modifying the receiver:
// This has no effect:
Joiner joiner = Joiner.on(',');
joiner.skipNulls(); // does nothing!
return joiner.join("bad", null, "wrong");
 
// Do this instead:
Joiner joiner = Joiner.on(',').skipNulls();

  • Guava has several methods that return views; a view is backed by the underlying collection, so modifying one modifies the other.