RASP4UIMA 1.0 beta

Overview

RASP4UIMA is an integration of the RASP System into the Apache UIMA framework.

RASP is a domain-independent, robust parsing system for English. For ease of installation, the system is distributed as binaries for three widespread Unix architectures (Intel 32-bit and 64-bit Linux, and SPARC/Solaris). It is free for research purposes. RASP is described in:
Briscoe, E., J. Carroll and R. Watson (2006) The Second Release of the RASP System. In Proceedings of the COLING/ACL 2006 Interactive Presentation Sessions, Sydney, Australia.

UIMA is an Apache project in incubation which provides a component framework for analysing unstructured content such as text, audio and video. It comprises an SDK and tooling for composing and running analytic components written in Java and C++.

Please contact the respective projects with any questions related to RASP or UIMA. You can use the DigitalPebble user group for any questions specific to RASP4UIMA.

Installation

This version of RASP4UIMA has been tested on Apache UIMA 2.1.0. It is available as a PEAR package and can be installed with the PEAR installer. You will also need to download and install RASP2 from the RASP project page.

Run the PearInstaller (e.g. /usr/local/bin/apache-uima/bin/runPearInstaller.sh). Select the RASP4UIMA pear file and a target directory for the installation. In this manual we assume RASP4UIMA has been installed in /usr/local/bin/RASP4UIMA.




Note: RASP4UIMA relies on a system property (rasp.home) to determine where the original RASP executables are located. See $RASP4UIMA/metadata/setenv.txt for more details.

Make sure you specify the location of RASP with -Drasp.home when you call the UIMA executables. For instance, to run your component in the Collection Processing Engine Configurator GUI application, add the environment variable settings from the component's setenv.txt file to the cpeGui.bat (or cpeGui.sh) script file in the <UIMA_HOME>/bin directory.
Alternatively, you can add this information to the setUimaClassPath script.
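
A value passed with -Drasp.home is visible to Java code as a system property. The following is a minimal sketch of a pre-flight check you could run with the same JVM options as the UIMA script to confirm that the property arrives and points to a directory. The class RaspHomeCheck and its method are hypothetical helpers written for this manual; they are not part of RASP4UIMA and do not reflect how RASP4UIMA itself reads the property.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper for illustration; not part of RASP4UIMA. */
final class RaspHomeCheck {

    /**
     * Returns null when the value looks usable, otherwise a short
     * description of the problem.
     */
    static String describeProblem(String raspHome) {
        if (raspHome == null || raspHome.trim().isEmpty()) {
            return "rasp.home is not set; pass -Drasp.home=<RASP directory> to the JVM";
        }
        Path dir = Paths.get(raspHome);
        if (!Files.isDirectory(dir)) {
            return "rasp.home does not point to a directory: " + dir;
        }
        return null;
    }

    public static void main(String[] args) {
        // -Drasp.home=... on the java command line ends up here.
        String raspHome = System.getProperty("rasp.home");
        String problem = describeProblem(raspHome);
        System.out.println(problem == null
                ? "rasp.home looks usable: " + raspHome
                : problem);
    }
}
```

Running the class with and without -Drasp.home shows both messages. The check only tests that the directory exists; it does not verify that the RASP executables inside it are present.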

Test

Once RASP4UIMA has been installed with the PEAR installer, you can test the installation with the Collection Processing Engine (CPE).  Please refer to the UIMA documentation for more details on the use of this tool. Don't forget to add the variables from setenv.txt to the script (see above).

For this test we'll use the Collection Reader and the Xmi Writer CAS Consumer available in the UIMA examples. These two components are respectively in charge of converting a collection of documents into CASes and serializing the CASes into XML files. Their descriptors can be found in the /examples directory of UIMA.

Click the Add button in the Analysis Engines section and select the file SentenceSplitter.xml located in the /desc directory of the RASP4UIMA installation (e.g. /usr/local/bin/RASP4UIMA/desc). Repeat the procedure, in this order, for the files Tokenizer.xml, POStagger.xml, Morpher.xml and Parser.xml.
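
Before adding the descriptors in the GUI, it can help to confirm that all five files are present in the installation. The sketch below lists them in the order given above and reports any that are missing. The class DescriptorCheck is a hypothetical helper, not part of RASP4UIMA, and the default path in main is simply the installation directory assumed in this manual.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper for illustration; not part of RASP4UIMA. */
final class DescriptorCheck {

    /** Descriptor files in the order they are added to the CPE. */
    static final List<String> PIPELINE = Arrays.asList(
            "SentenceSplitter.xml",
            "Tokenizer.xml",
            "POStagger.xml",
            "Morpher.xml",
            "Parser.xml");

    /** Returns the descriptors that are absent from installDir/desc. */
    static List<Path> missing(Path installDir) {
        List<Path> absent = new ArrayList<>();
        for (String name : PIPELINE) {
            Path descriptor = installDir.resolve("desc").resolve(name);
            if (!Files.isRegularFile(descriptor)) {
                absent.add(descriptor);
            }
        }
        return absent;
    }

    public static void main(String[] args) {
        // Installation directory assumed in this manual unless one is given.
        Path installDir = Paths.get(args.length > 0 ? args[0] : "/usr/local/bin/RASP4UIMA");
        List<Path> absent = missing(installDir);
        if (absent.isEmpty()) {
            System.out.println("All descriptors found under " + installDir.resolve("desc"));
        } else {
            for (Path p : absent) {
                System.out.println("Missing descriptor: " + p);
            }
        }
    }
}
```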

You should get something similar to the screenshot below:




Click the Play button; after a while you should get a summary of the process. To inspect the results, you can use the UIMA AnnotationViewer, which takes as input a directory containing XML files in XMI format and a TypeSystem file. The TypeSystem description for RASP4UIMA is in the file /desc/RASPTypes.xml.

Once you've specified both the input directory and the TypeSystem, click on View and double-click one of the documents in the list. You should get something similar to the screenshot below. More details about the Annotation Types generated by RASP4UIMA can be found in the Modules section.
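
As a quick sanity check outside the AnnotationViewer, the XML files written by the CAS Consumer can also be summarised with a plain XML parser. The sketch below counts how often each element name occurs in one file, which gives a rough idea of how many annotations of each kind were produced. The class XmiElementCounter is a hypothetical helper, not part of RASP4UIMA or UIMA; the element names you will see depend on the type system, so none are assumed here.

```java
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper for illustration; not part of RASP4UIMA or UIMA. */
final class XmiElementCounter {

    /** Counts occurrences of each element name (prefix included) in the XML stream. */
    static Map<String, Integer> count(InputStream xml) throws Exception {
        // The default factory is not namespace-aware, so qName is the raw
        // element name exactly as written in the file, e.g. "prefix:Type".
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final Map<String, Integer> counts = new TreeMap<>();
        parser.parse(xml, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                counts.merge(qName, 1, Integer::sum);
            }
        });
        return counts;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java XmiElementCounter <file.xmi>");
            return;
        }
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            for (Map.Entry<String, Integer> e : count(in).entrySet()) {
                System.out.println(e.getKey() + "\t" + e.getValue());
            }
        }
    }
}
```

A file in which only the root element and the document text element appear suggests that the analysis engines produced no annotations, which is often a sign that rasp.home was not passed to the JVM.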