December 09, 2013

Solr

Apache Solr is an enterprise-level HTTP search server built on top of Lucene. Solr exposes Lucene's indexing and search operations and adds features that Lucene alone does not provide. Documents (in XML, JSON, CSV or binary format) are added to the index via an HTTP POST; search results (in XML, JSON, CSV or binary format) are returned via an HTTP GET. Here are a few of the features Solr provides:

  • Full text search - Allows for complex search queries.
  • Logging - Provides logging for debugging and support purposes.
  • Near real time indexing - Search results are updated soon after a document has been indexed.
  • Faceted search - Allows search results to be categorized in sub-groups.
  • Geo-spatial search - Allows searching based on geographic location.
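
The POST-to-index, GET-to-search flow described above can be sketched as follows. This is a minimal illustration, not a full client: it assumes a local Solr instance on port 8983 with a core named "articles" (both hypothetical), and it only builds the two requests rather than sending them, so it runs without a server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption for illustration: a local Solr instance with a core named "articles".
SOLR_BASE = "http://localhost:8983/solr/articles"


def build_index_request(docs):
    """Build an HTTP POST that adds JSON documents to the index."""
    body = json.dumps(docs).encode("utf-8")
    return Request(
        SOLR_BASE + "/update?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_search_request(query, rows=10):
    """Build an HTTP GET that runs a query and asks for JSON results."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return Request(SOLR_BASE + "/select?" + params, method="GET")


index_req = build_index_request([{"id": "1", "title": "Intro to Solr"}])
search_req = build_search_request("title:solr")

# Show the method and URL of each request without sending anything.
print(index_req.get_method(), index_req.full_url)
print(search_req.get_method(), search_req.full_url)
```

Passing either request to `urllib.request.urlopen` would send it to a running server; the response format follows the `wt` parameter.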

SolrCloud

SolrCloud is a distributed version of Solr. It provides distributed indexing and searching for large-scale, fault-tolerant Solr deployments. It uses Apache ZooKeeper for cluster configuration and management. When the data being indexed is too large for a single server, SolrCloud breaks it up into shards. Shards are portions of the entire index and can be distributed across different servers in a cluster. When a document is added to the index, SolrCloud figures out which shard it belongs to, and therefore which machine should receive it. SolrCloud provides additional features on top of Solr, including:

  • Automatic failover - if a single node goes down, a replica of its index on a different node takes over
  • Maintaining consistency - updates to the index must be directed to the correct shard so that one consistent view of each document is maintained
  • Automatic shard partitioning - SolrCloud only needs to know the number of shards and it takes care of partitioning the index; it even forwards updates to the correct shard
  • Simple configuration - SolrCloud uses ZooKeeper for configuring the cluster, which centralizes the configuration for the cluster.
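
To make the idea of routing documents to shards concrete, here is a simplified sketch of hash-based partitioning. It is not SolrCloud's actual router; the MD5-modulo scheme and the function names `shard_for` and `route` are assumptions chosen for illustration. The point is only that, given the number of shards, a document id deterministically picks one shard, so every update to that document lands in the same place.

```python
import hashlib


def shard_for(doc_id, num_shards):
    """Map a document id to a shard index in the range [0, num_shards)."""
    digest = hashlib.md5(doc_id.encode("utf-8")).digest()
    # Use the first 4 bytes of the hash as an unsigned integer.
    return int.from_bytes(digest[:4], "big") % num_shards


def route(docs, num_shards):
    """Group documents by the shard their id hashes to."""
    shards = {i: [] for i in range(num_shards)}
    for doc in docs:
        shards[shard_for(doc["id"], num_shards)].append(doc)
    return shards


docs = [{"id": f"doc-{n}", "title": f"Document {n}"} for n in range(20)]
placement = route(docs, num_shards=3)
for shard, members in placement.items():
    print(shard, [d["id"] for d in members])
```

Because the mapping depends only on the id and the shard count, an update to an existing document is sent to the shard that already holds it, which is what keeps a single consistent view of that document.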

Here is a diagram showing the SolrCloud architecture (source):

[Figure: Solr Architecture]

Elasticsearch vs SolrCloud

Elasticsearch is another enterprise search engine built on top of Apache Lucene. It is a competitor to SolrCloud; both add features to Lucene and provide an HTTP wrapper around Lucene through which documents can be indexed and searched. Here are a few differences and similarities between Elasticsearch and SolrCloud:

  • Solr uses ZooKeeper for cluster configuration, while Elasticsearch uses an internal coordination mechanism for configuration
  • Both Elasticsearch and SolrCloud use the concept of sharding (partitioning the Lucene index).
  • Elasticsearch uses a JSON query syntax, while Solr uses a simple key/value pair query
  • Elasticsearch’s killer feature is the Percolator. This allows the user to register certain queries to generate an alert when documents are added that match that query. Description from the documentation: “Instead of sending docs, indexing them, and then running queries, one sends queries, registers them, and then sends docs and finds out which queries match that doc.”
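
The "register queries first, then send documents" idea behind the Percolator can be sketched with a toy in-memory version. This is not Elasticsearch's implementation or API; the `Percolator` class and its methods are hypothetical, and a "query" here is just a set of terms that must all appear in the document.

```python
class Percolator:
    """Toy reverse search: store queries, then match incoming documents."""

    def __init__(self):
        self._queries = {}

    def register(self, query_id, terms):
        """Register a query as a set of terms that must all be present."""
        self._queries[query_id] = {t.lower() for t in terms}

    def percolate(self, text):
        """Return the ids of all registered queries that match the text."""
        words = set(text.lower().split())
        return sorted(
            qid for qid, terms in self._queries.items() if terms <= words
        )


p = Percolator()
p.register("solr-alerts", ["solr"])
p.register("zk-outage", ["zookeeper", "outage"])

matches = p.percolate("ZooKeeper outage affects Solr cluster")
print(matches)  # ['solr-alerts', 'zk-outage']
print(p.percolate("Solr release notes"))  # ['solr-alerts']
```

An alerting system would call something like `percolate` on each new document and notify the owners of the matching queries.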

See this for more information about the differences between SolrCloud and Elasticsearch.
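
As a rough illustration of the query-syntax difference listed above, the sketch below builds a roughly comparable search in each style: key/value parameters for Solr and a JSON body for Elasticsearch. The field name `title`, the search term, and the helper function names are assumptions for the example; no request is sent.

```python
import json
from urllib.parse import urlencode


def solr_query(field, value, rows):
    """Solr style: the query is a set of key/value URL parameters."""
    return urlencode({"q": f"{field}:{value}", "rows": rows})


def elasticsearch_query(field, value, size):
    """Elasticsearch style: the query is a JSON document in the request body."""
    return json.dumps({"query": {"match": {field: value}}, "size": size})


solr_params = solr_query("title", "lucene", 5)
es_body = elasticsearch_query("title", "lucene", 5)

print(solr_params)  # q=title%3Alucene&rows=5
print(es_body)      # {"query": {"match": {"title": "lucene"}}, "size": 5}
```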
