tm.plugin.webmining-package: Retrieve structured, textual data from various web sources


Description

tm.plugin.webmining facilitates the retrieval of textual data from web feeds in formats such as XML and JSON. Direct retrieval from HTML is also supported. Because most (news) feeds contain only a small fraction of the original text, tm.plugin.webmining goes a step further and also retrieves and extracts the text from the original source. In general, the retrieval procedure is a two-step process:

Meta Retrieval

In the first step, all relevant meta feeds are retrieved, and the relevant metadata items are extracted from them.

Content Retrieval

In the second step, the relevant source content is retrieved. With the boilerpipeR package, even the main content of HTML pages can be extracted.
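The two steps can be pictured with a small offline sketch in base R. It does not use the tm.plugin.webmining API: the in-memory `feed`, the `pages` lookup, and the helpers `retrieve_meta()` and `retrieve_content()` are hypothetical names introduced only for illustration, and the actual package works against live web sources.

```r
# Hypothetical in-memory "feed": each item carries metadata and a teaser only.
feed <- list(
  list(title = "Item A", link = "http://example.org/a", description = "Teaser A"),
  list(title = "Item B", link = "http://example.org/b", description = "Teaser B")
)

# Hypothetical stand-in for the web: maps a link to the full article text.
pages <- c(
  "http://example.org/a" = "Full text of article A.",
  "http://example.org/b" = "Full text of article B."
)

# Step 1 (meta retrieval): extract the metadata items from the feed.
retrieve_meta <- function(feed) {
  data.frame(
    title = vapply(feed, function(item) item$title, character(1)),
    link  = vapply(feed, function(item) item$link, character(1)),
    stringsAsFactors = FALSE
  )
}

# Step 2 (content retrieval): follow each link to obtain the full source text.
retrieve_content <- function(meta, pages) {
  meta$content <- unname(pages[meta$link])
  meta
}

meta   <- retrieve_meta(feed)
result <- retrieve_content(meta, pages)
print(result)
```

The point of the split is that the feed alone yields only titles, links, and teasers; the full text becomes available only after each link has been followed in the second step.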

Author(s)

Mario Annau mario.annau@gmail

See Also

WebCorpus, GoogleFinanceSource, GoogleNewsSource, NYTimesSource, ReutersNewsSource, YahooFinanceSource, YahooInplaySource, YahooNewsSource

Examples

## Not run: 
library(tm.plugin.webmining)
googlefinance <- WebCorpus(GoogleFinanceSource("NASDAQ:MSFT"))
googlenews <- WebCorpus(GoogleNewsSource("Microsoft"))
# nytimes_appid is not defined in this example; it is assumed to hold your New York Times API key
nytimes <- WebCorpus(NYTimesSource("Microsoft", appid = nytimes_appid))
reutersnews <- WebCorpus(ReutersNewsSource("businessNews"))
yahoofinance <- WebCorpus(YahooFinanceSource("MSFT"))
yahooinplay <- WebCorpus(YahooInplaySource())
yahoonews <- WebCorpus(YahooNewsSource("Microsoft"))
liberation <- WebCorpus(LiberationSource("latest"))

## End(Not run)

mannau/tm.plugin.webmining documentation built on May 21, 2019, 11:24 a.m.