tm.plugin.webmining: Retrieve Structured, Textual Data from Various Web Sources

Facilitates text retrieval from feed formats such as XML (RSS, Atom) and JSON. Direct retrieval from HTML is also supported. Because most (news) feeds include only a small fraction of the original text, tm.plugin.webmining also retrieves and extracts the full text from the original source.
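
As a rough illustration of the description above, the following sketch loads the example corpus that ships with the package (the yahoonews dataset listed under the man pages) and inspects it without any network access. It assumes that tm and tm.plugin.webmining are installed. The commented-out live call uses YahooNewsSource and WebCorpus from the man page index with an arbitrary search term chosen for illustration; the remote feed may no longer respond, so treat that part as a sketch rather than a guaranteed working call.

```r
# Assumption: both packages are installed locally.
library(tm)
library(tm.plugin.webmining)

# Load the bundled example corpus; this needs no network access.
data(yahoonews)

# A WebCorpus behaves like an ordinary tm corpus.
print(class(yahoonews))
print(length(yahoonews))

# Live retrieval follows the same pattern: build a source, then wrap it
# in WebCorpus(). Left commented out because it needs network access and
# the remote service may be unavailable.
# corpus <- WebCorpus(YahooNewsSource("Microsoft"))
# corpus <- corpus.update(corpus)  # fetch feed items added since the last call
```

Working from the bundled dataset keeps the example reproducible, since results from the live sources change over time.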

Author
Mario Annau [aut, cre]
Date of publication
2015-05-11 00:20:43
Maintainer
Mario Annau <mario.annau@gmail.com>
License
GPL-3
Version
1.3

Man pages

corpus.update
Update/Extend 'WebCorpus' with new feed items.
encloseHTML
Enclose Text Content in HTML tags
extract
Extract main content from 'TextDocument's.
extractContentDOM
Extract Main HTML Content from DOM
extractHTMLStrip
Simply strip HTML Tags from Document
feedquery
Build up the query string for a feed query.
getEmpty
Retrieve Empty Corpus Elements through '$postFUN'.
getLinkContent
Get main content for corpus items, specified by links.
GoogleFinanceSource
Get feed Meta Data from Google Finance.
GoogleNewsSource
Get feed data from Google News Search <URL:...
nytimes_appid
AppID for the NYtimes-API.
NYTimesSource
Get feed data from NYTimes Article Search (<URL:...
parse
Wrapper/Convenience function to ensure right encoding for...
readWeb
Read content from WebXMLSource/WebHTMLSource/WebJSONSource.
removeNonASCII
Remove non-ASCII characters from Text.
ReutersNewsSource
Get feed data from Reuters News RSS feed channels. Reuters...
source.update
Update WebXMLSource/WebHTMLSource/WebJSONSource
tm.plugin.webmining-package
Retrieve structured, textual data from various web sources
trimWhiteSpaces
Trim White Spaces from Text Document.
WebCorpus
WebCorpus constructor function.
WebSource
Read Web Content and respective Link Content from feedurls.
YahooFinanceSource
Get feed data from Yahoo! Finance.
YahooInplaySource
Get News from Yahoo Inplay.
yahoonews
WebCorpus retrieved from Yahoo! News for the search term...
YahooNewsSource
Get news data from Yahoo! News (<URL:...

Files in this package

tm.plugin.webmining
tm.plugin.webmining/inst
tm.plugin.webmining/inst/NEWS.Rd
tm.plugin.webmining/inst/doc
tm.plugin.webmining/inst/doc/ShortIntro.Rnw
tm.plugin.webmining/inst/doc/ShortIntro.pdf
tm.plugin.webmining/inst/doc/ShortIntro.R
tm.plugin.webmining/tests
tm.plugin.webmining/tests/testthat.R.temp
tm.plugin.webmining/tests/testthat.R
tm.plugin.webmining/tests/testthat
tm.plugin.webmining/tests/testthat/test-source-googlefinance.R
tm.plugin.webmining/tests/testthat/test-source-googlenews.R
tm.plugin.webmining/tests/testthat/test-source-yahoofinance.R
tm.plugin.webmining/tests/testthat/test-source-nytimes.R
tm.plugin.webmining/tests/testthat/test-source-reutersnews.R
tm.plugin.webmining/tests/testthat/test-source-yahooinplay.R
tm.plugin.webmining/tests/testthat/test-source-yahoonews.R
tm.plugin.webmining/NAMESPACE
tm.plugin.webmining/data
tm.plugin.webmining/data/nytimes_appid.rda
tm.plugin.webmining/data/yahoonews.rda
tm.plugin.webmining/R
tm.plugin.webmining/R/getLinkContent.R
tm.plugin.webmining/R/tm.plugin.webmining-package.R
tm.plugin.webmining/R/corpus.R
tm.plugin.webmining/R/feedquery.R
tm.plugin.webmining/R/source.R
tm.plugin.webmining/R/extract.R
tm.plugin.webmining/R/transform.R
tm.plugin.webmining/R/trimWhiteSpaces.R
tm.plugin.webmining/R/reader.R
tm.plugin.webmining/R/parser.R
tm.plugin.webmining/vignettes
tm.plugin.webmining/vignettes/ShortIntro.Rnw
tm.plugin.webmining/vignettes/tables
tm.plugin.webmining/vignettes/tables/sources.tex
tm.plugin.webmining/vignettes/references.bib
tm.plugin.webmining/MD5
tm.plugin.webmining/build
tm.plugin.webmining/build/vignette.rds
tm.plugin.webmining/DESCRIPTION
tm.plugin.webmining/man
tm.plugin.webmining/man/extractHTMLStrip.Rd
tm.plugin.webmining/man/yahoonews.Rd
tm.plugin.webmining/man/trimWhiteSpaces.Rd
tm.plugin.webmining/man/parse.Rd
tm.plugin.webmining/man/tm.plugin.webmining-package.Rd
tm.plugin.webmining/man/WebSource.Rd
tm.plugin.webmining/man/YahooInplaySource.Rd
tm.plugin.webmining/man/WebCorpus.Rd
tm.plugin.webmining/man/ReutersNewsSource.Rd
tm.plugin.webmining/man/GoogleNewsSource.Rd
tm.plugin.webmining/man/getEmpty.Rd
tm.plugin.webmining/man/YahooNewsSource.Rd
tm.plugin.webmining/man/encloseHTML.Rd
tm.plugin.webmining/man/corpus.update.Rd
tm.plugin.webmining/man/readWeb.Rd
tm.plugin.webmining/man/removeNonASCII.Rd
tm.plugin.webmining/man/YahooFinanceSource.Rd
tm.plugin.webmining/man/extractContentDOM.Rd
tm.plugin.webmining/man/source.update.Rd
tm.plugin.webmining/man/feedquery.Rd
tm.plugin.webmining/man/GoogleFinanceSource.Rd
tm.plugin.webmining/man/extract.Rd
tm.plugin.webmining/man/getLinkContent.Rd
tm.plugin.webmining/man/NYTimesSource.Rd
tm.plugin.webmining/man/nytimes_appid.Rd