Import custom annotation layers exported by the annotation software WebAnno v3.0.0 in UIMA CAS XMI format.
file | File containing the annotations
type | String containing the name of the XMI attribute that holds the annotation
zipfile | Zipfile containing the annotations
path | Path containing the annotations
The different functions provide different entry points in the directory structure exported by WebAnno, from the innermost XMI file, to the outermost project folder.
Note that the sentence_id, which was encoded as the name of the .txt file input to WebAnno, is not recorded in any of these attributes. I think the 'sofa' (Subject of Analysis) number is just the position of the file within the WebAnno corpus, which is generally a subset of the documents in our overall corpus, since we don't manually annotate them all.
These functions will not work with other WebAnno export formats. They may work with other versions of WebAnno, but this has not been tested.
A dataframe containing all custom annotations visible from the specified entry point, or an empty dataframe (no rows or columns) if no custom annotations are visible.
get_annotations_by_type.WebAnno_XMI: Start from the XMI file for a single annotator, and get all custom annotations of the specified type.
get_annotations.WebAnno_XMI: Start from the XMI file for a single annotator, and get all custom annotations of the types listed in the unexported list .annotation_type_attribute_names contained in this package.
get_annotations.WebAnno_annotator: Start from the zipfile for a single annotator of a single document, and get all annotations of the default types.
get_annotations.WebAnno_document: Start from the folder for a single document, and get all annotations of the default types from all annotators.
get_annotations.WebAnno_project: Start from the outer WebAnno project directory (not zipped), and get all annotations of the default types for all documents and annotators.
The core functionality is contained in get_annotations_by_type.WebAnno_XMI. We extract the data in a slightly hacky way, using XPath expressions based on hardwired knowledge of the layers to be extracted and their representation in XMI. The XPath expressions may need to be tweaked for different WebAnno annotation types (layers). (A more robust approach would be to use the UIMA framework, but that would be considerably more complicated. Also, UIMA is implemented in Java, and R bindings were not available at the time of writing.)
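As a rough illustration of the XPath approach described above, the sketch below pulls one attribute from every matching element of a small inline XMI fragment, using the XML package. The fragment itself, the namespace URI, the helper name extract_layer() and the returned column names are all assumptions made for illustration; they are not taken from this package's source, and a real WebAnno export contains many more elements.

```r
library(XML)

# Hypothetical, heavily trimmed XMI fragment (assumed layout, not a real export).
xmi_text <- '<?xml version="1.0" encoding="UTF-8"?>
<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
         xmlns:custom="http:///webanno/custom.ecore" xmi:version="2.0">
  <custom:Credibility xmi:id="10" sofa="1" begin="0" end="5" sentiment="positive"/>
  <custom:Credibility xmi:id="11" sofa="1" begin="6" end="12" topic="economy"/>
  <custom:Credibility xmi:id="12" sofa="1" begin="13" end="20" sentiment="negative"/>
</xmi:XMI>'

# Assumed namespace prefix and URI for the custom layer.
ns <- c(custom = "http:///webanno/custom.ecore")

# Hypothetical helper: one row per element that carries the requested attribute.
extract_layer <- function(doc, attribute, ns) {
  xpath <- sprintf("//custom:Credibility[@%s]", attribute)
  nodes <- getNodeSet(doc, xpath, namespaces = ns)
  # Mirror the documented behaviour: an empty dataframe when nothing matches.
  if (length(nodes) == 0) return(data.frame())
  get_attr <- function(name) {
    vapply(nodes, xmlGetAttr, character(1), name = name, default = NA_character_)
  }
  data.frame(
    sofa  = as.integer(get_attr("sofa")),
    begin = as.integer(get_attr("begin")),
    end   = as.integer(get_attr("end")),
    type  = attribute,
    value = get_attr(attribute),
    stringsAsFactors = FALSE
  )
}

doc <- xmlParse(xmi_text, asText = TRUE)
sentiments <- extract_layer(doc, "sentiment", ns)
```

Here the attribute name plays the role of the type argument: only elements that actually carry that attribute are selected, so a layer that shares an element type with other layers can still be pulled out on its own.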
Our layer names may differ from the naming in WebAnno. To abstract the code from the names used in WebAnno, the attribute names used for the layers are stored in .annotation_type_attribute_names (not exported) in the parent environment of the functions, rather than hardwired in the code.
The other functions form nested wrappers around this. get_annotations.WebAnno_XMI calls get_annotations_by_type.WebAnno_XMI iteratively over the default types hardwired in the parent environment of that function (not exported). get_annotations.WebAnno_document and get_annotations.WebAnno_project include doc_id. This is taken from the filename of the unannotated document input to WebAnno, which is preserved in the folder name at an intermediate level in the project directory structure. doc_id can therefore be used to keep track of external identifiers for the texts fed into WebAnno, which are not otherwise known to WebAnno.
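The wrapper pattern described above might look roughly like the following base R sketch. All of the function names here (fake_extract(), collect_types(), collect_document()), the example path, and the choice of which folder level supplies doc_id are hypothetical; the stand-in extractor returns canned data instead of reading a file, so this shows only the shape of the nesting, not the package's actual code.

```r
# Stand-in for the per-type extractor (hypothetical): returns canned rows for
# types it "finds" and an empty dataframe otherwise. No file is read.
fake_extract <- function(xmi_file, type) {
  found <- list(
    sentiment = data.frame(
      begin = c(0L, 13L),
      end   = c(5L, 20L),
      value = c("positive", "negative"),
      stringsAsFactors = FALSE
    )
  )
  if (is.null(found[[type]])) return(data.frame())
  cbind(type = type, found[[type]], stringsAsFactors = FALSE)
}

# Inner wrapper: loop over the default types and stack the non-empty results.
collect_types <- function(xmi_file, types, extract = fake_extract) {
  parts <- lapply(types, function(type) extract(xmi_file, type))
  parts <- Filter(function(d) nrow(d) > 0, parts)
  if (length(parts) == 0) return(data.frame())
  do.call(rbind, parts)
}

# Outer wrapper: tag every row with a doc_id taken from the enclosing folder
# name (assumed here to be the immediate parent directory of the XMI file).
collect_document <- function(xmi_file, types) {
  annotations <- collect_types(xmi_file, types)
  if (nrow(annotations) == 0) return(annotations)
  doc_id <- basename(dirname(xmi_file))
  cbind(doc_id = doc_id, annotations, stringsAsFactors = FALSE)
}

result <- collect_document(
  "project/annotation/doc_0042.txt/admin.xmi",
  c("sentiment", "topic")
)
```

Dropping empty pieces before calling rbind() avoids column-mismatch errors when a type has no annotations, and keeps the "empty dataframe if nothing is visible" behaviour at every level.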
TO DO:
Consider whether my function naming convention is the most appropriate, given that it's not really an S3 generic. I could create classes for the different WebAnno file types, but provide a minimal implementation (essentially just passing through file / path names, without checking whether they are well-formed, though existence could be checked).
Add examples that aren't subject to copyright.
Write unit tests.
Add error handling, e.g. for bad file | zipfile | path arguments.
Add a warning if the result contains no annotations.
Consider removing the suppressWarnings(), or dealing only with particular types of warnings at that stage, rather than a catch-all.
Separate the logic that processes the XML from the provision of an XML source, which need not be a file (e.g. a network connection).
Consider whether to use the xml2 package rather than the XML package.
Disentangle our hardwired names for annotation types from get_annotations_by_type.WebAnno_XMI, so that the function can be called with the name as it appears in the XMI, and a wrapper takes care of converting that name to our preferred default type name. This will make the function usable for arbitrary layers without having to edit .annotation_type_attribute_names.
Make the element type a parameter, instead of hardwired to "custom:Credibility" to facilitate extraction of other types of annotations.
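For the suppressWarnings() item above, one possible direction is a handler that muffles only warnings matching a known pattern and lets everything else through. The sketch below uses base R's withCallingHandlers(); the helper name muffle_matching(), the noisy() stand-in and the warning messages are invented for illustration and do not correspond to warnings actually raised by the package.

```r
# Hypothetical helper: muffle only warnings whose message matches `pattern`;
# any other warning propagates to the caller as usual.
muffle_matching <- function(expr, pattern) {
  withCallingHandlers(
    expr,
    warning = function(w) {
      if (grepl(pattern, conditionMessage(w))) invokeRestart("muffleWarning")
    }
  )
}

# Stand-in for a parsing step that raises one expected and one unexpected warning.
noisy <- function() {
  warning("known parser noise")
  warning("unexpected problem")
  42
}

# Only "unexpected problem" is reported; the value is still returned.
value <- muffle_matching(noisy(), "known parser noise")
```

Matching on message text is fragile if messages are translated or reworded, so matching on a condition class would be preferable wherever the warning source provides one.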
## Not run: 
sentiments <- get_annotations_by_type.WebAnno_XMI("temp/webanno/out/admin.xmi", "sentiment")
## End(Not run)
## Not run: 
topics <- get_annotations_by_type.WebAnno_XMI("temp/webanno/out/admin.xmi", "topic")
## End(Not run)