digitalpebble
 
Digital Pebble is a consulting company specialising in linguistic engineering, document management, information retrieval and information extraction. Our expertise is based on open source solutions such as Lucene and GATE.

We provide consulting services and custom development for leading open source search and text engineering projects such as Lucene, Solr, Nutch, GATE and UIMA.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially one that must run across platforms. Lucene is now considered one of the most successful open source tools and a standard reference for search engine systems.

Nutch is open source web-search software. It builds on Lucene Java, adding web-specific components such as a crawler, a link-graph database, and parsers for HTML and other document formats. We have contributed extensively to Nutch in recent years, and Julien has recently been made one of the committers.

Solr is a high-performance search server built using Lucene Java, with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface. We recently published a review of the book Solr 1.4 Enterprise Search Server from Packt Publishing.

GATE is one of the most widely used human language processing systems in the world. It is developed and maintained at the University of Sheffield in the Natural Language Processing group. It provides ways to get structured information from unstructured textual data and is the perfect complement to IR tools such as Lucene.

UIMA is an Apache project that originated at IBM and is comparable to GATE. It is geared towards multimodal analysis, scalability and interoperability.

Over the years we have actively contributed to some of the projects above and combined them to build bespoke solutions on numerous occasions. We have a strong focus on very large scale processing and have developed solutions based on Hadoop and deployed them on Amazon EC2.

Our open source project Behemoth facilitates the deployment of GATE or UIMA-based applications over a Hadoop cluster. Why not give it a try?