digitalpebble
 
         
 
Digital Pebble is a consulting company specializing in linguistic engineering, document management, and information retrieval and extraction. Our expertise is based on open source solutions such as Lucene and GATE.

The objective of Information Extraction (IE) is to automatically extract structured or semi-structured information from unstructured machine-readable documents. Its significance grows with the amount of information available in unstructured form (i.e. without metadata), for instance on the Internet. Transforming this information into relational form makes it more accessible.
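As a minimal illustration of turning unstructured text into relational form, the sketch below pulls (person, organization, year) rows out of plain sentences. The sentence pattern, the field names and the sample text are assumptions made up for this example; real systems such as GATE rely on much richer linguistic processing than a single regular expression.

```python
import re

# Hypothetical pattern for sentences shaped like
# "<First Last> joined <Organization> in <year>".
PATTERN = re.compile(
    r"(?P<person>[A-Z][a-z]+ [A-Z][a-z]+) joined "
    r"(?P<org>[A-Z][A-Za-z]+) in (?P<year>\d{4})"
)

def extract_rows(text):
    """Return one dict per match, i.e. one relational row per extracted fact."""
    return [
        {
            "person": m.group("person"),
            "org": m.group("org"),
            "year": int(m.group("year")),
        }
        for m in PATTERN.finditer(text)
    ]

# Made-up sample text, used only for illustration.
text = "Jane Smith joined Acme in 2009. Later, John Doe joined Initech in 2011."
rows = extract_rows(text)
for row in rows:
    print(row)
```

Each resulting row could be inserted directly into a database table, which is what makes the extracted knowledge easier to query than the original text.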

Information Extraction techniques are used to create Semantic Annotations, where the values of the annotations refer to entities in an ontology. This can improve an Information Retrieval system by giving it reasoning capabilities backed by a knowledge base.
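A Semantic Annotation can be pictured as a text span plus a pointer to an ontology entity. The following sketch assumes a tiny hand-written lookup table in place of a real ontology; the `example.org` URIs, the `Annotation` class and the `annotate` function are hypothetical names chosen for illustration, not part of GATE or UIMA.

```python
from dataclasses import dataclass

# Hypothetical mini knowledge base: surface form -> ontology entity URI.
ONTOLOGY = {
    "Sheffield": "http://example.org/onto#City_Sheffield",
    "IBM": "http://example.org/onto#Company_IBM",
}

@dataclass
class Annotation:
    start: int   # character offset where the span begins
    end: int     # character offset just past the span
    entity: str  # URI of the ontology entity the span refers to

def annotate(text):
    """Find every occurrence of a known surface form and link it to its entity."""
    annotations = []
    for surface, uri in ONTOLOGY.items():
        pos = text.find(surface)
        while pos != -1:
            annotations.append(Annotation(pos, pos + len(surface), uri))
            pos = text.find(surface, pos + len(surface))
    return sorted(annotations, key=lambda a: a.start)

# Made-up sample sentence, used only for illustration.
text = "IBM opened a lab near Sheffield."
annotations = annotate(text)
for a in annotations:
    print(text[a.start:a.end], "->", a.entity)
```

Because each annotation carries an entity identifier rather than a plain string, a retrieval system can match documents by entity (and by whatever the knowledge base states about that entity) instead of by exact wording.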

Information Extraction is not limited to the Semantic Web, though, and can be used in many other contexts. Opinion Mining, for instance, can be seen as an IE problem.

DigitalPebble uses and provides consultancy services for the following open source solutions for Information Extraction:

  • GATE is one of the most widely used human language processing systems in the world. It is developed and maintained at the University of Sheffield in the Natural Language Processing group. It provides ways to get structured information from unstructured textual data and is the perfect complement to IR tools such as Lucene.

  • UIMA is an Apache project inherited from IBM, which is comparable to GATE. It is geared towards multimodal analysis, scalability and interoperability.