digitalpebble

Development of custom GATE plugins and resources for Named Entity Extraction from contracts.

Customisation and hosting of Nutch for a vertical crawl.

Auditing, redesign and optimisation of a Solr setup for a real estate search system using geolocation.

Whole-web crawling using Nutch on Amazon EC2. Development of custom Nutch plugins and resources.
Consulting on GATE for Named Entity Recognition, including improving the accuracy of the ANNIE application.

Port of the RASP application to Apache UIMA. More details can be found in the Resources section.

Design of an advanced architecture for a search solution based on Nutch / Lucene. Work on a performance benchmark and optimisation of the results.
Work on Term Extraction and Clustering, Text Classification, Ontology Learning and custom Information Extraction.

Strategy review and integration design for search engines based on mobile content.

Design and implementation of a search solution based on Solr.

Development of custom Nutch plugins and resources. Monitoring of crawls. Deployment and tuning of Solr instances.

Implementation of search functionality based on Lucene and compliant with the OpenSearch standard.

Design and development of a Text Classification web service used to identify junk posts in a collection of forum pages indexed with Lucene. Filtering these posts out improves the relevance of the search results, since such documents tend to rank high because they repeat keywords (e.g. product names). The message format used by the service is based on Solr's.
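As a rough illustration of why keyword repetition makes junk posts stand out, the sketch below computes one simple signal: the share of a post's tokens taken by its single most frequent term. This is a hypothetical feature chosen for illustration (the name `max_term_ratio` and the sample posts are invented), not the classifier or features used in the project described above.

```python
import re
from collections import Counter


def max_term_ratio(text: str) -> float:
    """Hypothetical junk-post feature: fraction of tokens that are the most frequent token."""
    # Lowercase and split on word characters; a real system would reuse its Lucene analyzer.
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    # most_common(1) returns [(token, count)] for the most frequent token.
    most_common_count = Counter(tokens).most_common(1)[0][1]
    return most_common_count / len(tokens)


normal = "I bought the phone last week and the battery lasts two days"
junk = "cheap phone phone phone best phone deal phone phone"

print(f"{max_term_ratio(normal):.2f}")  # 0.17 ("the" is 2 of 12 tokens)
print(f"{max_term_ratio(junk):.2f}")    # 0.67 ("phone" is 6 of 9 tokens)
```

A score like this would only be one input among many; a trained classifier combining several such signals is far more robust than a fixed threshold on a single ratio.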

DigitalPebble co-designed and implemented the full-text search functionality of Lingway KM, which uses Lucene as its default implementation, illustrating both Lucene's versatility and its performance.