tm.plugin.webmining: Retrieve Structured, Textual Data from Various Web Sources

Facilitates text retrieval from feed formats such as XML (RSS, ATOM) and JSON. Direct retrieval from HTML is also supported. Because most (news) feeds include only a small fraction of the original text, tm.plugin.webmining also retrieves and extracts the text from the original source.

Install the latest version of this package by entering the following in R:
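A minimal sketch, assuming the package is still served by your configured CRAN mirror (it dates from 2015 and may have been archived since):

```r
# Standard CRAN installation call; assumes a CRAN mirror is configured
# and that the package is still available there.
install.packages("tm.plugin.webmining")
```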

Author: Mario Annau [aut, cre]
Date of publication: 2015-05-11 00:20:43
Maintainer: Mario Annau

Man pages

corpus.update: Update/Extend 'WebCorpus' with new feed items.

encloseHTML: Enclose Text Content in HTML tags

extract: Extract main content from 'TextDocument's.

extractContentDOM: Extract Main HTML Content from DOM

extractHTMLStrip: Simply strip HTML Tags from Document

feedquery: Buildup string for feedquery.

getEmpty: Retrieve Empty Corpus Elements through '$postFUN'.

getLinkContent: Get main content for corpus items, specified by links.

GoogleFinanceSource: Get feed Meta Data from Google Finance.

GoogleNewsSource: Get feed data from Google News Search <URL:...

nytimes_appid: AppID for the NYtimes-API.

NYTimesSource: Get feed data from NYTimes Article Search (<URL:...

parse: Wrapper/Convenience function to ensure right encoding for...

readWeb: Read content from WebXMLSource/WebHTMLSource/WebJSONSource.

removeNonASCII: Remove non-ASCII characters from Text.

ReutersNewsSource: Get feed data from Reuters News RSS feed channels. Reuters...

source.update: Update WebXMLSource/WebHTMLSource/WebJSONSource

tm.plugin.webmining-package: Retrieve structured, textual data from various web sources

trimWhiteSpaces: Trim White Spaces from Text Document.

WebCorpus: WebCorpus constructor function.

WebSource: Read Web Content and respective Link Content from feedurls.

YahooFinanceSource: Get feed data from Yahoo! Finance.

YahooInplaySource: Get News from Yahoo Inplay.

yahoonews: WebCorpus retrieved from Yahoo! News for the search term...

YahooNewsSource: Get news data from Yahoo! News (<URL:...
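
As a hedged illustration of how some of the entries listed above fit together, the sketch below loads the bundled `yahoonews` corpus and defines a small helper around `WebCorpus`, `GoogleNewsSource` and `corpus.update`. The helper name `fetch_and_update` is hypothetical and not part of the package, and the live feed it targets may no longer be reachable, so it is defined but not called here.

```r
library(tm)                   # provides meta() and the corpus classes
library(tm.plugin.webmining)

# Load the example WebCorpus bundled with the package (no network needed).
data(yahoonews)
length(yahoonews)             # number of documents in the corpus
meta(yahoonews[[1]])          # metadata of the first document

# Hypothetical helper (not part of the package): build a corpus from a live
# feed, then extend it with feed items published since the first retrieval.
# Assumes the Google News feed is still reachable, which may not hold.
fetch_and_update <- function(query) {
  corpus <- WebCorpus(GoogleNewsSource(query))
  corpus.update(corpus)
}
```

Calling `fetch_and_update("Microsoft")` would only succeed where the underlying feed still responds; the bundled data set is the safer starting point for experimentation.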


Functions

assignValues Man page
calcDensity Man page
corpus.update Man page
corpus.update.WebCorpus Man page
encloseHTML Man page
encloseHTML.character Man page
encloseHTML.PlainTextDocument Man page
extract Man page
extractContentDOM Man page
extractHTMLStrip Man page
extract.PlainTextDocument Man page
feedquery Man page
getEmpty Man page
getEmpty.WebCorpus Man page
getLinkContent Man page
getMainText Man page
GoogleFinanceSource Man page
GoogleNewsSource Man page
json_content Man page
nytimes_appid Man page
NYTimesSource Man page
parse Man page
readGoogle Man page
readNYTimes Man page
readReutersNews Man page
readWeb Man page
readWebHTML Man page
readWebJSON Man page
readWebXML Man page
readYahoo Man page
readYahooHTML Man page
readYahooInplay Man page
removeNonASCII Man page
removeNonASCII.PlainTextDocument Man page
removeTags Man page
ReutersNewsSource Man page
source.update Man page
source.update.WebHTMLSource Man page
source.update.WebJSONSource Man page
source.update.WebXMLSource Man page
tm.plugin.webmining Man page
tm.plugin.webmining-package Man page
trimWhiteSpaces Man page
WebCorpus Man page
webmining Man page
WebSource Man page
YahooFinanceSource Man page
YahooInplaySource Man page
yahoonews Man page
YahooNewsSource Man page
