tm.plugin.waterloo: Import articles from the Waterloo Region Record

NewsSource<-function(x, encoding = "UTF-8") {
    # Use filename 'x' to read the file and split it
    # into a character vector called 'content' with one item
    # per document
    content<-readLines(x)
    content<-content[1:(length(content)-6)]
    ###The following are lines of usually superfluous information that are placed between the full text and Title:.  They are annoying because they are not always attached to every article, making them useless as markers to end the full text section via grep in readNews. If we did not delete them, they would end up in the full-text section of every corpus element.  They have to be deleted with the invert argument because grepping those lines that do not match ^Subject: via the -grep argument fails when there are none of these lines in a document. This has happened.
    #content<-   content[grep('^Credit:', content, invert=TRUE)]
    #content<-   content[grep('^Subject:', content, invert=TRUE)]
    #content<-content[grep('^Location:', content, invert=TRUE)]
    #content<-content[grep('^People:', content, invert=TRUE)]
    #content<-content[-grep('^Company/Organization:', content)]
    #content<-content[-grep('Caption:?', content)]
    # content<-content[-grep('________', content)]
    content<-content[grep('.', content)]
    num <- cumsum(grepl("______________", content))
    content<-split(content, num)
    
    s<-tm::SimpleSource(encoding, length = length(content), content = content,
    uri = x, reader = readNews, class = "NewsSource")

    s
}

sjkiss/tm.plugin.waterloo documentation built on May 30, 2019, 8:08 p.m.

rdrr.io home R language documentation Run R code online

CRAN packages Bioconductor packages R-Forge packages GitHub packages

Note that we can't provide technical support on individual packages. You should contact the package authors for that.

sjkiss/tm.plugin.waterloo
Import articles from the Waterloo Region Record

R/newsSource.R
In sjkiss/tm.plugin.waterloo: Import articles from the Waterloo Region Record

R Package Documentation

Browse R Packages

We want your feedback!

sjkiss/tm.plugin.waterloo Import articles from the Waterloo Region Record

R/newsSource.R In sjkiss/tm.plugin.waterloo: Import articles from the Waterloo Region Record

R Package Documentation

Browse R Packages

We want your feedback!

sjkiss/tm.plugin.waterloo
Import articles from the Waterloo Region Record

R/newsSource.R
In sjkiss/tm.plugin.waterloo: Import articles from the Waterloo Region Record