get_annotations_by_type.WebAnno_XMI: Import custom annotations from files exported by WebAnno


Description

Import custom annotation layers as exported by the annotation software WebAnno v3.0.0 in UIMA CAS XMI format.

Usage


Arguments

file

File containing the annotations

type

String containing the name of the XMI attribute that holds the annotation.

zipfile

Zipfile containing the annotations

path

Path containing the annotations

Details

The different functions provide different entry points in the directory structure exported by WebAnno, from the innermost XMI file, to the outermost project folder.

Note that the sentence_id, which was encoded as the name of the .txt file input to WebAnno, is not recorded in any of these attributes. The 'sofa' (Subject of Analysis) number appears to be just the order of the file within the WebAnno corpus, although this has not been confirmed. The WebAnno corpus is generally a subset of the documents in our overall corpus, since we don't manually annotate them all.

These functions will not work with other WebAnno export formats. They may work with other versions of WebAnno, but this has not been tested.

Value

A dataframe containing all custom annotations visible from the specified entry point, or an empty dataframe (no rows or columns) if no custom annotations are visible.
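
Because an entry point with no visible custom annotations yields a dataframe with no rows or columns, calling code may want to test for that case before referring to any columns. The following is a minimal base R sketch of one way to do so. The helpers is_empty_annotations and combine_annotations are hypothetical (they are not part of this package), and the column names in the stand-in data are invented for illustration.

```r
# Hypothetical helper: TRUE for the "no annotations" result (0 rows, 0 columns).
is_empty_annotations <- function(x) {
  is.data.frame(x) && nrow(x) == 0L && ncol(x) == 0L
}

# Hypothetical helper: row-bind several results, skipping the empty ones.
combine_annotations <- function(results) {
  non_empty <- Filter(Negate(is_empty_annotations), results)
  if (length(non_empty) == 0L) {
    return(data.frame())
  }
  do.call(rbind, non_empty)
}

# Stand-in data; the real column names depend on the layer being extracted.
found <- data.frame(
  begin = c(0L, 12L),
  end   = c(5L, 20L),
  value = c("pos", "neg"),
  stringsAsFactors = FALSE
)
nothing <- data.frame()

combined <- combine_annotations(list(nothing, found, nothing))
```

Checking both nrow and ncol distinguishes the documented empty result from a dataframe that has columns but happens to have no rows.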

Functions

Implementation notes

The core functionality is contained in get_annotations_by_type.WebAnno_XMI. We extract the data in a slightly hacky way, using XPath expressions based on hardwired knowledge of the layers to be extracted and their representation in XMI. The XPath expressions may need to be tweaked for different WebAnno annotation types (layers). (A more robust approach would be to use the UIMA framework, but this would be considerably more complicated. Also, UIMA is implemented in Java, and R bindings were not available at the time of writing.)
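
To illustrate the general idea of XPath-based extraction, here is a minimal sketch using the xml2 package. It is not the package's actual implementation: the function extract_by_attribute, the sample XMI fragment, the namespace URI, the attribute names (sentiment, topic) and the output columns are all assumptions made for the example.

```r
library(xml2)

# Invented XMI fragment; real WebAnno exports contain many more elements.
xmi_text <- '<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:custom="http:///webanno/custom.ecore" xmi:version="2.0">
  <custom:Sentiment xmi:id="101" sofa="1" begin="0" end="9" sentiment="positive"/>
  <custom:Sentiment xmi:id="102" sofa="1" begin="10" end="19" sentiment="negative"/>
  <custom:Topic xmi:id="103" sofa="1" begin="0" end="19" topic="economy"/>
</xmi:XMI>'

# Hypothetical extractor: select every element carrying the given attribute.
extract_by_attribute <- function(doc, attribute) {
  # The attribute is unprefixed, so no namespace handling is needed here.
  nodes <- xml_find_all(doc, sprintf("//*[@%s]", attribute))
  if (length(nodes) == 0L) {
    return(data.frame())  # mirror the documented "no rows or columns" result
  }
  data.frame(
    sofa  = as.integer(xml_attr(nodes, "sofa")),
    begin = as.integer(xml_attr(nodes, "begin")),
    end   = as.integer(xml_attr(nodes, "end")),
    value = xml_attr(nodes, attribute),
    stringsAsFactors = FALSE
  )
}

doc <- read_xml(xmi_text)
sentiments <- extract_by_attribute(doc, "sentiment")
```

Selecting on the attribute rather than the element name is one way to match the type argument described above, which names the XMI attribute holding the annotation. A layer represented differently in XMI would need a different expression.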

Our layer names may differ from the naming in WebAnno. To abstract the code from the names used in WebAnno, the attribute names used for the layers are stored in .annotation_type_attribute_names (not exported) in the parent environment of the functions, rather than hardwired in the code.
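
The pattern of keeping the name mapping in an enclosing environment, rather than in the function body, can be sketched in base R as follows. The variable name .annotation_type_attribute_names comes from the text above, but its structure (a named character vector), the attribute names it holds, and the functions make_attribute_lookup and attribute_name_for are assumptions for illustration only.

```r
# The factory's environment plays the role of the "parent environment".
make_attribute_lookup <- function() {
  # Hypothetical mapping: our layer name -> attribute name used in the XMI.
  .annotation_type_attribute_names <- c(
    sentiment = "SentimentValue",
    topic     = "TopicLabel"
  )

  # The returned function finds the mapping through lexical scoping.
  function(type) {
    if (!type %in% names(.annotation_type_attribute_names)) {
      stop("Unknown annotation type: ", type)
    }
    unname(.annotation_type_attribute_names[type])
  }
}

attribute_name_for <- make_attribute_lookup()
attribute_name_for("sentiment")
```

With this arrangement, renaming a layer in WebAnno only requires changing one entry in the mapping, and the extraction code itself is untouched.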

The other functions form nested wrappers around this. get_annotations.WebAnno_XMI calls get_annotations_by_type.WebAnno_XMI iteratively over the default types hardwired in the parent environment of that function (not exported). get_annotations.WebAnno_document and get_annotations.WebAnno_project also include doc_id in their output. This is taken from the filename of the unannotated document input to WebAnno, which is preserved in the folder name at an intermediate level in the project directory structure. doc_id can therefore be used to keep track of external identifiers for the texts fed into WebAnno, which are not otherwise known to WebAnno.
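
The nesting and the derivation of doc_id from a folder name can be sketched in base R as below. This is only an illustration under stated assumptions: the directory layout (an annotation folder containing one subfolder per input document), the functions read_one_file, read_document and read_project, and the use of plain text files instead of XMI or zip archives are all invented for the example and do not reproduce the package's code.

```r
# Stub standing in for the innermost reader; a real one would parse XMI.
read_one_file <- function(file) {
  data.frame(value = readLines(file, warn = FALSE), stringsAsFactors = FALSE)
}

# Document level: read every file in one folder and tag rows with doc_id,
# taken here from the folder name (assumed layout).
read_document <- function(doc_dir) {
  parts <- lapply(list.files(doc_dir, full.names = TRUE), read_one_file)
  parts <- Filter(function(p) nrow(p) > 0L, parts)
  if (length(parts) == 0L) {
    return(data.frame())
  }
  data.frame(
    doc_id = basename(doc_dir),
    do.call(rbind, parts),
    stringsAsFactors = FALSE
  )
}

# Project level: loop over the document folders and stack the results.
read_project <- function(project_dir) {
  doc_dirs <- list.dirs(file.path(project_dir, "annotation"), recursive = FALSE)
  parts <- Filter(function(p) nrow(p) > 0L, lapply(doc_dirs, read_document))
  if (length(parts) == 0L) {
    return(data.frame())
  }
  do.call(rbind, parts)
}

# Build a throwaway project folder to exercise the sketch.
project <- file.path(tempdir(), "webanno_project_sketch")
unlink(project, recursive = TRUE)
for (doc in c("sentence_001.txt", "sentence_002.txt")) {
  dir.create(file.path(project, "annotation", doc), recursive = TRUE)
}
writeLines("positive",
           file.path(project, "annotation", "sentence_001.txt", "admin.txt"))
writeLines(c("negative", "neutral"),
           file.path(project, "annotation", "sentence_002.txt", "admin.txt"))

annotations <- read_project(project)
```

Each level only knows how to find the next level down, so the identifier carried in the folder name is attached at the one point where it is visible.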

TO DO:

Examples

## Not run: 
sentiments <- get_annotations_by_type.WebAnno_XMI("temp/webanno/out/admin.xmi", "sentiment")
## End(Not run)
## Not run: 
topics <- get_annotations_by_type.WebAnno_XMI("temp/webanno/out/admin.xmi", "topic")
## End(Not run)

petereckley/webannor documentation built on May 25, 2019, 12:48 a.m.