RASP4UIMA
DigitalPebble has ported the RASP system to Apache UIMA. RASP is a domain-independent, robust parsing system for English. For ease of installation, the system is distributed as binaries for three widespread Unix architectures (Intel 32-bit and 64-bit Linux, and SPARC/Solaris), with source code for most of the modules. It is free for research purposes. RASP was originally developed on a UK EPSRC-funded project and has been extended and enhanced continuously since that project ended. An informal description of the RASP system is online, with examples of system output. The first public release of the system was in January 2002; the second release (RASPv2) is now available. To obtain it, go to the RASP licence and download page. RASP4UIMA wraps the NLP modules of RASP (Sentence Parser, Tokenizer, Part of Speech Tagger, Morphological Analyser and Dependency Parser) as UIMA Analysis Engines.
Version: 1.1 beta Download: rasp4uima1.1.pear Documentation: (html) |
|
GATE Toolbox
DigitalPebble's GATE Toolbox is a collection of Processing Resources for GATE. It contains the following components: SentenceSplitter based on JavaCC. It differs from the default GATE component by:
Download: Toolbox.tar.gz License: LGPL |
|
RASP2 plugin for GATE
DigitalPebble has ported the RASP system to GATE. The RASP plugin wraps the NLP modules of RASP (Tokenizer, Part of Speech Tagger, Morphological Analyser and Dependency Parser) as individual GATE Processing Resources, which allows them to be easily replaced or combined with existing GATE PRs. This component is part of the standard distribution of GATE. |
|
Java API for Web-1T Corpus
The Web 1T 5-gram corpus contains n-grams, from unigrams through 5-grams, with counts compiled from a one-trillion-word corpus. It is distributed by the Linguistic Data Consortium for researchers. We have developed a Java API for querying the Web 1T corpus (or any corpus in a similar format). Unlike Get1T, our API allows on-the-fly queries of the full set of Web 1T n-grams, even on a machine with modest hardware. The API also helps create n-gram corpora from other sources (Lucene indices, the BNC corpus). Contact us for more details and the terms of use. |
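
How the API answers on-the-fly queries is not described on this page. As an illustration only, one common way to query a large sorted n-gram file without loading it into memory is a binary search over the file on disk. The sketch below assumes each line has the form `n-gram<TAB>count` and that lines are sorted in byte order. The class and method names (`NgramLookup`, `lookup`) are hypothetical and are not part of the DigitalPebble API.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper for illustration; not part of the DigitalPebble API. */
final class NgramLookup {

    /**
     * Binary-searches a byte-sorted file of "n-gram<TAB>count" lines.
     * Returns the stored count, or -1 if the n-gram is absent.
     */
    static long lookup(Path file, String ngram) throws IOException {
        // RandomAccessFile.readLine maps each byte to one char, so encode the
        // query the same way; String.compareTo then follows byte order.
        String target = new String(ngram.getBytes(StandardCharsets.UTF_8),
                                   StandardCharsets.ISO_8859_1);
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            long lo = 0;
            long hi = raf.length();
            // Invariant: if the n-gram is present, its line starts in [lo, hi).
            while (lo < hi) {
                long mid = lo + (hi - lo) / 2;
                long start = mid;
                if (mid > 0) {
                    // Advance to the first line that starts at or after mid.
                    raf.seek(mid - 1);
                    raf.readLine();
                    start = raf.getFilePointer();
                }
                if (start >= hi) {
                    // No line starts in [mid, hi): search the left part.
                    hi = mid;
                    continue;
                }
                raf.seek(start);
                String line = raf.readLine();
                if (line == null) {
                    hi = mid;
                    continue;
                }
                int tab = line.indexOf('\t');
                String key = tab < 0 ? line : line.substring(0, tab);
                int cmp = key.compareTo(target);
                if (cmp == 0) {
                    if (tab < 0) {
                        throw new IOException("Malformed line: " + line);
                    }
                    return Long.parseLong(line.substring(tab + 1).trim());
                } else if (cmp < 0) {
                    lo = raf.getFilePointer(); // start of the following line
                } else {
                    hi = mid;
                }
            }
            return -1;
        }
    }

    public static void main(String[] args) throws IOException {
        // Tiny made-up sample in the assumed format; not real Web 1T data.
        Path sample = Files.createTempFile("ngrams", ".txt");
        Files.write(sample, String.join("\n",
                "a big cat\t12", "a big dog\t7",
                "the big cat\t40", "the red dog\t3", "")
                .getBytes(StandardCharsets.UTF_8));
        System.out.println(lookup(sample, "the big cat")); // prints 40
        System.out.println(lookup(sample, "a small cat")); // prints -1
        Files.delete(sample);
    }
}
```

Each lookup reads only a logarithmic number of lines, so memory use stays constant regardless of file size. The unbuffered `readLine` calls keep the sketch short; a production implementation would likely add buffering or an index over the files.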