R/EuropresseSource.R

Defines functions getElem.EuropresseSource EuropresseSource

Documented in EuropresseSource getElem.EuropresseSource

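# Build a tm Source from a Europresse HTML export; each article is one element.
EuropresseSource <- function(x, encoding = "UTF-8") {
    tree <- htmlParse(x, encoding=encoding)

    # New format: one <article> element per document
    content <- getNodeSet(tree, "/html/body/article")
    reader <- readEuropresseHTML2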

    # Old format: no <article> elements, one table cell per document
    if(length(content) == 0) {
        content <- getNodeSet(tree, "/html/body/table/tbody/tr/td")

        # Some HTML files do not have <tbody> (depending on the browser?)
        if(length(content) == 0)
            content <- getNodeSet(tree, "/html/body/table/tr/td")
        reader <- readEuropresseHTML1
    }

    free(tree)

    SimpleSource(encoding, length(content),
                 content=content, uri=x,
                 reader=reader, class="EuropresseSource")
}

# This function is the same as that for XMLSource.
# The path is passed to SimpleSource() as `uri`, so it is read back in lower case.
getElem.EuropresseSource <- function(x)
    list(content = saveXML(x$content[[x$position]]), uri = x$uri)
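
The XPath fallback used by EuropresseSource can be tried in isolation. The following is a minimal sketch, assuming only the XML package is installed; `first_match` and the sample HTML string are hypothetical and not part of tm.plugin.europresse. It tries each path in turn and keeps the first one that matches any nodes.

```r
library(XML)

# Hypothetical helper (not in tm.plugin.europresse): return the first
# XPath expression that matches at least one node, with the matched nodes.
first_match <- function(tree, paths) {
    for (p in paths) {
        nodes <- getNodeSet(tree, p)
        if (length(nodes) > 0)
            return(list(path = p, nodes = nodes))
    }
    list(path = NA_character_, nodes = list())
}

# Made-up sample in the old table-based layout, without <tbody>
old_html <- paste0("<html><body><table>",
                   "<tr><td>First article</td></tr>",
                   "<tr><td>Second article</td></tr>",
                   "</table></body></html>")

tree <- htmlParse(old_html, asText = TRUE, encoding = "UTF-8")

# Same order of paths as in EuropresseSource
res <- first_match(tree, c("/html/body/article",
                           "/html/body/table/tbody/tr/td",
                           "/html/body/table/tr/td"))

# Text content of each matched cell
texts <- sapply(res$nodes, xmlValue)
```

Here the third path is the one expected to match, since the sample has neither `<article>` nor `<tbody>` elements.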


tm.plugin.europresse documentation built on May 29, 2017, 11:01 a.m.