Extracting annotations

Load the package

library(xmlAnnotate)

and load up some test data.

folder <- system.file("extdata", "fomc", package = "xmlAnnotate")
dir(folder)

Extract the 'hedge' tags from the first file in that folder

f <- file.path(folder, "2004_03_2-1.xml")
f
ftags <- get_tagset(f)

and take a look

knitr::kable(ftags)

By default, this function gets hedge tags only, so the call above is equivalent to

ftags <- get_tagset(f, nodes=c('hedge'))

We can get the note tags too by adding 'note' to the nodes argument

ftags2 <- get_tagset(f, nodes=c('hedge', 'note'))

which looks like

knitr::kable(ftags2)

And if we want these tags extracted from all the XML files in a folder

fftags <- get_tagsets(folder, nodes=c('hedge', 'note'))

This row-binds the results from all the XML files it finds in the folder into a single data frame.
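As a rough illustration of what row-binding per-file results means, here is a minimal base R sketch on toy data. The data frames and the type column are made up for the example and are not the package's actual output schema; only the file column is known from the calls in this document.

```r
# Toy stand-ins for the per-file results; the 'type' column is hypothetical.
per_file <- list(
  data.frame(file = "a.xml", type = c("hedge", "note"), stringsAsFactors = FALSE),
  data.frame(file = "b.xml", type = "hedge", stringsAsFactors = FALSE)
)

# Stack the per-file data frames on top of each other.
combined <- do.call(rbind, per_file)

# One row per tag, with the originating file kept in the 'file' column.
table(combined$file)
```

Keeping the file name in its own column is what makes it possible to split the combined result by file again later.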

Match Tags

If we want to match the word and note tags to the hedge tags based on their positions in the text, we first extract all three tag types and then call match_nodes

fftag <- get_tagset(f, nodes=c('hedge','word', 'note'))

fftag2 <- match_nodes(fftag, match_x = "hedge", match_y = c("word","note"))

which gives you all word and note tags that fall within the span of the respective hedge tags.

knitr::kable(fftag2)
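The idea of position-based matching can be sketched in base R on toy data. This is not the match_nodes implementation: the column names type, start and end are hypothetical, and the sketch assumes that a tag matches only when its span lies entirely inside the hedge span, which the package may handle differently.

```r
# Toy tag table; column names are assumptions, not xmlAnnotate's schema.
tags <- data.frame(
  type  = c("hedge", "word", "note", "word"),
  start = c(10, 12, 15, 40),
  end   = c(30, 14, 20, 42),
  stringsAsFactors = FALSE
)

hedges <- tags[tags$type == "hedge", ]
others <- tags[tags$type != "hedge", ]

# For each hedge, keep the other tags whose span lies inside the hedge span.
inside <- lapply(seq_len(nrow(hedges)), function(i) {
  h <- hedges[i, ]
  hit <- others[others$start >= h$start & others$end <= h$end, ]
  if (nrow(hit) == 0) return(NULL)
  cbind(hedge_start = h$start, hedge_end = h$end, hit)
})

# Drop hedges without matches and stack the rest.
matched <- do.call(rbind, Filter(Negate(is.null), inside))
matched
```

In this toy table the word at 40 to 42 lies outside the hedge span of 10 to 30, so only the first word and the note are kept.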

match_nodes only works on get_tagset output, that is, on data extracted from a single .xml file. For get_tagsets output generated from multiple files, you have to apply match_nodes to the subset for each file name

fftags2 <- plyr::ddply(fftags, ~file, match_nodes, match_x = "hedge", match_y = c("word", "note"))
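If plyr is not available, the same split-apply-combine pattern can be written in base R. The sketch below uses toy data and a hypothetical stand-in function, count_tags, where match_nodes would go; the type column is likewise made up for the example.

```r
# Toy multi-file table; only the 'file' column mirrors the real output.
toy <- data.frame(
  file = c("a.xml", "a.xml", "b.xml"),
  type = c("hedge", "word", "hedge"),
  stringsAsFactors = FALSE
)

# Hypothetical per-file function standing in for match_nodes.
count_tags <- function(d) {
  data.frame(file = d$file[1], n_tags = nrow(d), stringsAsFactors = FALSE)
}

# Split by file, apply the function to each piece, and row-bind the results.
result <- do.call(rbind, lapply(split(toy, toy$file), count_tags))
rownames(result) <- NULL
result
```

Unlike plyr::ddply, this version does not add the grouping column back automatically, so the per-file function has to carry the file name through itself.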


jwillisch/xmlAnnotate documentation built on May 20, 2019, 6:26 a.m.