Import custom annotation layers exported by the annotation software WebAnno v3.0.0 in UIMA CAS XMI format.
File containing the annotations
String containing the name of the XMI attribute that holds the annotation.
Zipfile containing the annotations
Path containing the annotations
The different functions provide different entry points in the directory structure exported by WebAnno, from the innermost XMI file, to the outermost project folder.
Note that the sentence_id, which was encoded as the name of the .txt file input to WebAnno, is not recorded in any of these attributes. I think the 'sofa' (Subject of Analysis) number is just the order of the file within the WebAnno corpus, which is generally a subset of the documents in our overall corpus, since we don't manually annotate them all.
These functions will not work with other WebAnno export formats. They may work with other versions of WebAnno, but this has not been tested.
A dataframe containing all custom annotations visible from the specified entry point, or an empty dataframe (no rows or columns) if no custom annotations are visible.
get_annotations_by_type.WebAnno_XMI: Start from the XMI file for a
single annotator, and get all custom annotations of the specified type.
get_annotations.WebAnno_XMI: Start from the XMI file for a
single annotator, and get all custom annotations of types listed in the
.annotation_type_attribute_names contained in this
get_annotations.WebAnno_annotator: Start from the zipfile for a
single annotator of a single document and get all annotations of the
default types.
get_annotations.WebAnno_document: Start from the folder for a
single document and get all annotations of the default types from all
annotators.
get_annotations.WebAnno_project: Start from the outer WebAnno
project directory (not zipped) and get all annotations of the default types
for all documents and annotators.
The core functionality is contained in
get_annotations_by_type.WebAnno_XMI. We extract the data in a
slightly hacky way, using XPath expressions based on hardwired knowledge of
the layers to be extracted and their representation in XMI. The XPath
expressions may need to be tweaked for different WebAnno annotation types
(layers). (A more robust approach would be to use the UIMA framework, but
that would be considerably more complicated. Also, it is implemented in Java,
and R bindings were not available at the time of writing.)
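As a minimal sketch of the XPath approach described above (not the package's actual code), the following runs an XPath query over a small inline XMI fragment. It uses the xml2 package rather than the XML package. The namespace URIs, the attribute name Credibility, the sample values, and the helper name extract_layer are all assumptions made for illustration; only the element name custom:Credibility is taken from this page.

```r
library(xml2)

# Tiny stand-in for a WebAnno XMI export. The namespace URIs, the
# attribute name "Credibility" and the values are assumptions.
xmi_text <- '<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:custom="http:///webanno/custom.ecore" xmi:version="2.0">
  <custom:Credibility xmi:id="11" sofa="1" begin="0" end="5" Credibility="high"/>
  <custom:Credibility xmi:id="12" sofa="1" begin="9" end="14" Credibility="low"/>
</xmi:XMI>'

# Hypothetical helper: one row per matching element, or an empty
# data frame (no rows or columns) if nothing matches.
extract_layer <- function(doc, element, attribute) {
  nodes <- xml_find_all(doc, paste0("//", element), ns = xml_ns(doc))
  if (length(nodes) == 0) return(data.frame())
  data.frame(
    sofa  = as.integer(xml_attr(nodes, "sofa")),
    begin = as.integer(xml_attr(nodes, "begin")),
    end   = as.integer(xml_attr(nodes, "end")),
    value = xml_attr(nodes, attribute),
    stringsAsFactors = FALSE
  )
}

doc <- read_xml(xmi_text)
credibility <- extract_layer(doc, "custom:Credibility", "Credibility")
```

Because the element and attribute names are arguments, the same helper can be pointed at another layer by changing only the two strings, which is where the XPath tweaking mentioned above would happen.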
Our layer names may differ from the naming in WebAnno. To abstract the code from the names used in WebAnno, the attribute names used for the layers are stored in .annotation_type_attribute_names (not exported) in the parent environment of the functions, rather than hardwired in the code.
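The lookup described above might look roughly like the following sketch. The real .annotation_type_attribute_names is not exported, so the table contents, the name layer_attribute_names, and the helper xmi_attribute_for are hypothetical stand-ins.

```r
# Hypothetical lookup table: our layer name -> attribute name in the XMI.
# The entries are invented for illustration.
layer_attribute_names <- c(
  credibility = "Credibility",
  sentiment   = "Sentiment"
)

# Translate one of our layer names into the XMI attribute name,
# failing loudly on layers that are not in the table.
xmi_attribute_for <- function(layer, lookup = layer_attribute_names) {
  if (!layer %in% names(lookup)) {
    stop("Unknown annotation layer: ", layer)
  }
  unname(lookup[layer])
}

attribute <- xmi_attribute_for("credibility")
```

Keeping the mapping in one named vector means a renamed WebAnno layer needs a change in a single place rather than in every function that builds an XPath expression.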
The other functions form nested wrappers around this.
get_annotations.WebAnno_XMI calls
get_annotations_by_type.WebAnno_XMI iteratively over the default
types hardwired in the parent environment of that function (not
exported). The doc_id
is taken from the filename of the unannotated document input to WebAnno,
which is preserved in the folder name at an intermediate level in the
project directory structure.
doc_id can therefore be used to
keep track of external identifiers for the texts fed into WebAnno, which
are not otherwise known to WebAnno.
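The nesting of wrappers and the derivation of doc_id from a folder name can be sketched in base R as follows. This is an illustration under assumptions, not the package's code: the directory layout (one folder per document, one file per annotator), the file names, and the helpers read_one, read_document and read_project are all hypothetical, and read_one returns a fixed row instead of parsing anything.

```r
# Stand-in for the innermost reader; a real one would parse the file.
read_one <- function(file) {
  data.frame(begin = 0L, end = 5L, stringsAsFactors = FALSE)
}

# Combine a list of data frames, returning an empty data frame
# (no rows or columns) when there is nothing to combine.
combine <- function(parts) {
  parts <- Filter(function(p) nrow(p) > 0, parts)
  if (length(parts) == 0) return(data.frame())
  do.call(rbind, parts)
}

# Hypothetical document-level wrapper: the folder name doubles as doc_id.
read_document <- function(doc_dir, reader = read_one) {
  files <- list.files(doc_dir, full.names = TRUE)
  combine(lapply(files, function(f) {
    ann <- reader(f)
    if (nrow(ann) == 0) return(data.frame())
    data.frame(
      doc_id    = basename(doc_dir),
      annotator = tools::file_path_sans_ext(basename(f)),
      ann,
      stringsAsFactors = FALSE
    )
  }))
}

# Hypothetical project-level wrapper: one sub-folder per document.
read_project <- function(project_dir, reader = read_one) {
  doc_dirs <- list.dirs(project_dir, recursive = FALSE)
  combine(lapply(doc_dirs, read_document, reader = reader))
}

# Build a throwaway layout: <project>/<doc_id>/<annotator>.zip
project <- file.path(tempdir(), "webanno_demo")
for (d in c("sentence_001.txt", "sentence_002.txt")) {
  dir.create(file.path(project, d), recursive = TRUE, showWarnings = FALSE)
  file.create(file.path(project, d, "alice.zip"))
}
annotations <- read_project(project)
```

Each level only adds the identifier it knows about (annotator from the file name, doc_id from the folder name) and delegates the rest, so the external identifier travels with every row.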
Consider whether my function naming convention is the most appropriate, given that it's not really an S3 generic. I could create classes for the different WebAnno file types but provide a minimal implementation (essentially just passing through file / path names without checking whether they are well-formed, though existence could be checked).
Add examples that aren't subject to copyright.
Write unit tests.
Add error handling, e.g. for bad file | zipfile | path arguments.
Add a warning if the result contains no annotations.
Consider removing the suppressWarnings(), or dealing only with particular types of warnings at that stage, rather than a catch-all.
Separate the function logic that processes the XML from the provision of an XML source, which need not be a file (e.g. a network connection).
Consider whether to use the xml2 package rather than the XML package.
Remove the hardwired names for annotation types from
get_annotations_by_type.WebAnno_XMI, so that function can be called
with the name as it appears in the XMI, and a wrapper takes care of converting
that name to our preferred default type name. This will make the function
useable for arbitrary layers without having to edit
Make the element type a parameter, instead of hardwired to "custom:Credibility" to facilitate extraction of other types of annotations.
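Several of the items above (parameterising the element type, checking arguments, warning on empty results) could start from something like this base R sketch. All three helper names are hypothetical and none of this exists in the package; "custom:Other" is an invented element type.

```r
# Hypothetical helper: build the XPath for any element type instead of
# hardwiring "custom:Credibility".
layer_xpath <- function(element_type = "custom:Credibility") {
  stopifnot(is.character(element_type), length(element_type) == 1,
            nzchar(element_type))
  paste0("//", element_type)
}

# Hypothetical argument check for the file entry point.
check_xmi_file <- function(file) {
  if (!is.character(file) || length(file) != 1) {
    stop("`file` must be a single file path")
  }
  if (!file.exists(file)) {
    stop("File not found: ", file)
  }
  invisible(file)
}

# Hypothetical post-check: warn, but still return the (empty) result.
warn_if_empty <- function(annotations) {
  if (nrow(annotations) == 0) warning("No custom annotations found")
  annotations
}

default_xpath <- layer_xpath()
other_xpath <- layer_xpath("custom:Other")
```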