Retrieve Structured, Textual Data from Various Web Sources

corpus.update                 Update/Extend 'WebCorpus' with new feed items.
encloseHTML                   Enclose Text Content in HTML tags
extract                       Extract main content from 'TextDocument's.
extractContentDOM             Extract Main HTML Content from DOM
extractHTMLStrip              Simply strip HTML Tags from Document
feedquery                     Buildup string for feedquery.
getEmpty                      Retrieve Empty Corpus Elements through '$postFUN'.
getLinkContent                Get main content for corpus items, specified by links.
GoogleFinanceSource           Get feed Meta Data from Google Finance.
GoogleNewsSource              Get feed data from Google News Search <URL:...
nytimes_appid                 AppID for the NYtimes-API.
NYTimesSource                 Get feed data from NYTimes Article Search (<URL:...
parse                         Wrapper/Convenience function to ensure right encoding for...
readWeb                       Read content from WebXMLSource/WebHTMLSource/WebJSONSource.
removeNonASCII                Remove non-ASCII characters from Text.
ReutersNewsSource             Get feed data from Reuters News RSS feed channels. Reuters...
source.update                 Update WebXMLSource/WebHTMLSource/WebJSONSource
tm.plugin.webmining-package   Retrieve structured, textual data from various web sources
trimWhiteSpaces               Trim White Spaces from Text Document.
WebCorpus                     WebCorpus constructor function.
WebSource                     Read Web Content and respective Link Content from feedurls.
YahooFinanceSource            Get feed data from Yahoo! Finance.
YahooInplaySource             Get News from Yahoo Inplay.
yahoonews                     WebCorpus retrieved from Yahoo! News for the search term...
YahooNewsSource               Get news data from Yahoo! News (<URL:...
