RASP is a
domain-independent, robust parsing system for English. For
ease of installation, the system is distributed as binaries
for three widespread Unix architectures (Intel 32-bit and 64-bit Linux, and
Sparc/Solaris). It is free for
research purposes. RASP is described in:
Briscoe, E., J. Carroll and R. Watson (2006) The Second Release of the
RASP System. In Proceedings of the COLING/ACL 2006 Interactive
Presentation Sessions, Sydney, Australia.
UIMA is an
Apache project in incubation which provides a component
framework for analysing unstructured content such as text, audio and
video. It comprises an SDK and tooling for composing and running
analytic components written in Java and C++.
Please contact the respective projects with any questions about RASP or UIMA. For questions specific to RASP4UIMA, use the DigitalPebble user group.
Installation
This version of RASP4UIMA has been tested on Apache UIMA 2.1.0. It is
available as a PEAR package and can be installed with the PEAR installer. You
will also need to download and install RASP2 from the RASP
project page.
Run the PEAR installer
(e.g. /usr/local/bin/apache-uima/bin/runPearInstaller.sh). Select
the RASP4UIMA PEAR file and a target directory for the installation. In
this manual we assume RASP4UIMA has been
installed in /usr/local/bin/RASP4UIMA.
Note: RASP4UIMA relies on a Java system property (rasp.home) to determine where the original RASP executables are located. See $RASP4UIMA/metadata/setenv.txt for more details.
Make sure you specify the location of RASP with -Drasp.home when
you call the UIMA executables. For instance, if you want to run your
component in the Collection Processing Engine Configurator GUI
application, you need to add the settings from
the component's setenv.txt file to the cpeGui.bat (cpeGui.sh) script
in the <UIMA_HOME>/bin directory.
Alternatively, you can add this information to the setUimaClassPath script.
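To confirm that a JVM actually receives the -Drasp.home setting, a small standalone check along the following lines may help. This is only a sketch: the class name RaspHomeCheck is hypothetical and not part of RASP4UIMA, and testing that the value points to an existing directory is an assumption about what counts as a usable setting. RASP4UIMA itself may validate the property differently.

```java
import java.io.File;

// Hypothetical helper, not part of RASP4UIMA: reports whether the JVM was
// started with a usable -Drasp.home value.
class RaspHomeCheck {

    // Returns a human-readable status for the given rasp.home value.
    // Assumption: a usable value is the path of an existing directory.
    static String describe(String raspHome) {
        if (raspHome == null || raspHome.trim().isEmpty()) {
            return "rasp.home is not set; start the JVM with -Drasp.home=<RASP directory>";
        }
        File dir = new File(raspHome);
        if (!dir.isDirectory()) {
            return "rasp.home does not point to a directory: " + raspHome;
        }
        return "rasp.home is set to " + dir.getAbsolutePath();
    }

    public static void main(String[] args) {
        // A -Dname=value option on the java command line becomes a system property.
        System.out.println(describe(System.getProperty("rasp.home")));
    }
}
```

After compiling with javac, running "java -Drasp.home=<RASP directory> RaspHomeCheck" (with the placeholder replaced by the actual RASP location) should print the resolved directory. Running it without the -D option prints the "not set" message, which is the situation to expect if the setenv.txt settings were never added to the UIMA script.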
Test
Once RASP4UIMA has been installed with the PEAR installer, you
can test the installation with the Collection Processing Engine (CPE).
Please refer to the UIMA documentation for more details on the use of
this tool. Don't forget to add the variables from setenv.txt to the script (see above).
For this test we'll use the Collection Reader and the Xmi Writer CAS Consumer
available in the UIMA examples. These two components are
respectively in charge of converting a collection of documents into
CASes and serializing the CASes into XML files. Their descriptors can be found in the /examples directory of UIMA.
Click on the Add button in the Analysis Engines section and select the file
SentenceSplitter.xml located in the /desc directory of the RASP4UIMA
installation (e.g. /usr/local/bin/RASP4UIMA/desc).
Repeat the procedure for the files Tokenizer.xml, POStagger.xml, Morpher.xml and Parser.xml.
You should get something similar to the screenshot below:
Click on the Play button; after a while you should get a summary of the
process. To inspect the results, you can use the UIMA AnnotationViewer,
which takes as input a directory containing XML files in the XMI
format and a TypeSystem file. The TypeSystem description for
RASP4UIMA is in the file /desc/RASPTypes.xml.
Once you've specified both the input directory and the TypeSystem,
click on View and
double-click on one of the documents in the list. You should get
something similar to the screenshot below. More details about the
Annotation Types generated by RASP4UIMA can be found in the Modules
section.
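If one of the descriptors cannot be found while setting up the test, it can help to check the installation layout first. The sketch below looks in the desc directory of an installation for the six files named in this section. The class name Rasp4UimaLayoutCheck is hypothetical, and the default path is simply the installation directory assumed in this manual; pass a different directory as the first argument if needed.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of RASP4UIMA: lists which of the files
// named in this manual are absent from <install dir>/desc.
class Rasp4UimaLayoutCheck {

    // The five analysis engine descriptors, in the order this manual lists
    // them, followed by the TypeSystem description.
    static final List<String> EXPECTED = Arrays.asList(
            "SentenceSplitter.xml", "Tokenizer.xml", "POStagger.xml",
            "Morpher.xml", "Parser.xml", "RASPTypes.xml");

    // Returns the expected file names that are not regular files in desc/.
    static List<String> missing(Path installDir) {
        Path desc = installDir.resolve("desc");
        List<String> result = new ArrayList<>();
        for (String name : EXPECTED) {
            if (!Files.isRegularFile(desc.resolve(name))) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Default is the installation directory assumed in this manual.
        Path installDir = Paths.get(args.length > 0 ? args[0] : "/usr/local/bin/RASP4UIMA");
        List<String> missing = missing(installDir);
        if (missing.isEmpty()) {
            System.out.println("All expected files found in " + installDir.resolve("desc"));
        } else {
            System.out.println("Missing from " + installDir.resolve("desc") + ": " + missing);
        }
    }
}
```

The check only looks at file names. It does not validate the descriptors themselves or the RASP installation they depend on.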