tm.plugin.webmining: Retrieve Structured, Textual Data from Various Web Sources

Facilitates text retrieval from feed formats such as XML (RSS, ATOM) and JSON. Direct retrieval from HTML is also supported. Because most (news) feeds include only a small fraction of the original text, tm.plugin.webmining also retrieves and extracts the text from the original source.

Author: Mario Annau [aut, cre]
Date of publication: 2015-05-11 00:20:43
Maintainer: Mario Annau <mario.annau@gmail.com>
License: GPL-3
Version: 1.3
https://github.com/mannau/tm.plugin.webmining

Man pages

corpus.update: Update/Extend 'WebCorpus' with new feed items.

encloseHTML: Enclose Text Content in HTML tags

extract: Extract main content from 'TextDocument's.

extractContentDOM: Extract Main HTML Content from DOM

extractHTMLStrip: Simply strip HTML Tags from Document

feedquery: Build up string for feedquery.

getEmpty: Retrieve Empty Corpus Elements through '$postFUN'.

getLinkContent: Get main content for corpus items, specified by links.

GoogleFinanceSource: Get feed Meta Data from Google Finance.

GoogleNewsSource: Get feed data from Google News Search <URL:...

nytimes_appid: AppID for the NYtimes-API.

NYTimesSource: Get feed data from NYTimes Article Search (<URL:...

parse: Wrapper/Convenience function to ensure right encoding for...

readWeb: Read content from WebXMLSource/WebHTMLSource/WebJSONSource.

removeNonASCII: Remove non-ASCII characters from Text.

ReutersNewsSource: Get feed data from Reuters News RSS feed channels. Reuters...

source.update: Update WebXMLSource/WebHTMLSource/WebJSONSource

tm.plugin.webmining-package: Retrieve structured, textual data from various web sources

trimWhiteSpaces: Trim White Spaces from Text Document.

WebCorpus: WebCorpus constructor function.

WebSource: Read Web Content and respective Link Content from feedurls.

YahooFinanceSource: Get feed data from Yahoo! Finance.

YahooInplaySource: Get News from Yahoo Inplay.

yahoonews: WebCorpus retrieved from Yahoo! News for the search term...

YahooNewsSource: Get news data from Yahoo! News (<URL:...

Files in this package

tm.plugin.webmining
tm.plugin.webmining/inst
tm.plugin.webmining/inst/NEWS.Rd
tm.plugin.webmining/inst/doc
tm.plugin.webmining/inst/doc/ShortIntro.Rnw
tm.plugin.webmining/inst/doc/ShortIntro.pdf
tm.plugin.webmining/inst/doc/ShortIntro.R
tm.plugin.webmining/tests
tm.plugin.webmining/tests/testthat.R.temp
tm.plugin.webmining/tests/testthat.R
tm.plugin.webmining/tests/testthat
tm.plugin.webmining/tests/testthat/test-source-googlefinance.R
tm.plugin.webmining/tests/testthat/test-source-googlenews.R
tm.plugin.webmining/tests/testthat/test-source-yahoofinance.R
tm.plugin.webmining/tests/testthat/test-source-nytimes.R
tm.plugin.webmining/tests/testthat/test-source-reutersnews.R
tm.plugin.webmining/tests/testthat/test-source-yahooinplay.R
tm.plugin.webmining/tests/testthat/test-source-yahoonews.R
tm.plugin.webmining/NAMESPACE
tm.plugin.webmining/data
tm.plugin.webmining/data/nytimes_appid.rda
tm.plugin.webmining/data/yahoonews.rda
tm.plugin.webmining/R
tm.plugin.webmining/R/getLinkContent.R
tm.plugin.webmining/R/tm.plugin.webmining-package.R
tm.plugin.webmining/R/corpus.R
tm.plugin.webmining/R/feedquery.R
tm.plugin.webmining/R/source.R
tm.plugin.webmining/R/extract.R
tm.plugin.webmining/R/transform.R
tm.plugin.webmining/R/trimWhiteSpaces.R
tm.plugin.webmining/R/reader.R
tm.plugin.webmining/R/parser.R
tm.plugin.webmining/vignettes
tm.plugin.webmining/vignettes/ShortIntro.Rnw
tm.plugin.webmining/vignettes/tables
tm.plugin.webmining/vignettes/tables/sources.tex
tm.plugin.webmining/vignettes/references.bib
tm.plugin.webmining/MD5
tm.plugin.webmining/build
tm.plugin.webmining/build/vignette.rds
tm.plugin.webmining/DESCRIPTION
tm.plugin.webmining/man
tm.plugin.webmining/man/extractHTMLStrip.Rd
tm.plugin.webmining/man/yahoonews.Rd
tm.plugin.webmining/man/trimWhiteSpaces.Rd
tm.plugin.webmining/man/parse.Rd
tm.plugin.webmining/man/tm.plugin.webmining-package.Rd
tm.plugin.webmining/man/WebSource.Rd
tm.plugin.webmining/man/YahooInplaySource.Rd
tm.plugin.webmining/man/WebCorpus.Rd
tm.plugin.webmining/man/ReutersNewsSource.Rd
tm.plugin.webmining/man/GoogleNewsSource.Rd
tm.plugin.webmining/man/getEmpty.Rd
tm.plugin.webmining/man/YahooNewsSource.Rd
tm.plugin.webmining/man/encloseHTML.Rd
tm.plugin.webmining/man/corpus.update.Rd
tm.plugin.webmining/man/readWeb.Rd
tm.plugin.webmining/man/removeNonASCII.Rd
tm.plugin.webmining/man/YahooFinanceSource.Rd
tm.plugin.webmining/man/extractContentDOM.Rd
tm.plugin.webmining/man/source.update.Rd
tm.plugin.webmining/man/feedquery.Rd
tm.plugin.webmining/man/GoogleFinanceSource.Rd
tm.plugin.webmining/man/extract.Rd
tm.plugin.webmining/man/getLinkContent.Rd
tm.plugin.webmining/man/NYTimesSource.Rd
tm.plugin.webmining/man/nytimes_appid.Rd