R/FactivaSource.R

Defines functions FactivaSource getElem.FactivaSource

Documented in FactivaSource getElem.FactivaSource

FactivaSource <- function(x, encoding = "UTF-8", format = c("auto", "XML", "HTML")) {
    format <- match.arg(format)

    # XML format
    if(format == "XML" ||
       (format == "auto" && grepl("\\.(xml|XML)$", x))) {
        XMLSource(x,
                  function(tree) xml_children(xml_children(xml_children(xml_children(xml_ns_strip(tree))))),
                  readFactivaXML)
    }
    # HTML format
    else {
        tree <- read_html(x, encoding=encoding)

        # The full class is "article XXArticle", with XX the language code
        content <- xml_find_all(tree, "//div[starts-with(@class, 'article ')]")

        SimpleSource(encoding, length=length(content),
                     content=content, uri=x,
                     reader=readFactivaHTML, class="FactivaSource")
    }
}
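With `format = "auto"`, the choice between the XML and HTML readers depends only on a regular expression matched against the file name. The base-R check below shows why the dot in that pattern should be escaped. Unescaped, `.` matches any character, so a name with no extension at all, such as `newsxml`, would also be sent to the XML reader. The file names are made-up examples.

```r
# Made-up file names used only to exercise the two patterns.
files <- c("news.xml", "news.XML", "news.html", "newsxml")

# "." in a regex matches any single character, so "sxml" also satisfies it.
unescaped <- grepl(".(xml|XML)$", files)
# "\\." matches a literal dot, so only a real ".xml"/".XML" extension passes.
escaped <- grepl("\\.(xml|XML)$", files)

print(data.frame(file = files, unescaped = unescaped, escaped = escaped))
```

Only the last file name differs between the two patterns. Neither pattern catches mixed-case extensions such as `.Xml`; passing `ignore.case = TRUE` to `grepl()` would be one way to handle those.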

# This method needs to be exactly the same as the one for XMLSource,
# since it can be used with the Factiva XML source
getElem.FactivaSource <- function(x) list(content = x$content[[x$position]], uri = x$uri)
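`getElem.FactivaSource()` returns the element at the source's current `position` along with the source `uri`. In tm, `position` is advanced as documents are read, and each returned element is then handed to the reader (`readFactivaHTML` here). The sketch below exercises that contract without tm or a real Factiva file. It uses a hand-built list carrying the three fields the method reads. The list, its HTML strings, and `"example.html"` are hypothetical stand-ins for what `SimpleSource()` would store.

```r
# Same definition as the method above, repeated so this block runs on its own.
getElem.FactivaSource <- function(x) list(content = x$content[[x$position]], uri = x$uri)

# Hypothetical stand-in for a FactivaSource object: only the fields the
# method reads (content, position, uri) are filled in.
src <- structure(
  list(content = list("<div>first</div>", "<div>second</div>"),
       position = 2, uri = "example.html"),
  class = "FactivaSource"
)

# position = 2 selects the second stored element; uri is passed through.
elem <- getElem.FactivaSource(src)
str(elem)
```

Because `[[` is used on `content`, the method works the same whether `content` is a plain list, as here, or the node set returned by `xml_find_all()` in the HTML branch.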

tm.plugin.factiva documentation built on Oct. 30, 2019, 11:23 a.m.