View source: R/getLinkContent.R
Description

getLinkContent downloads and extracts content from web links for Corpus objects. It is typically registered as a post-processing function (the $postFUN field) of a WebSource object. getLinkContent downloads content in chunks, which has proven to be a more stable approach for large content requests.
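The chunked-retrieval idea described above can be sketched in base R. This is an illustrative sketch, not the package source: `download_in_chunks` and `fetch_one` are hypothetical names, and the loop mirrors what the `chunksize`, `retry.empty`, and `sleep.time` parameters suggest.

```r
# Sketch: split a link vector into fixed-size chunks, download each
# chunk, pause between chunks, and retry links that came back empty
# a limited number of times. fetch_one is a hypothetical function
# that takes one URL and returns its content as a character string.
download_in_chunks <- function(links, fetch_one, chunksize = 20,
                               retry.empty = 3, sleep.time = 3) {
  content <- rep(NA_character_, length(links))
  for (attempt in seq_len(retry.empty + 1)) {
    # indices of links still missing or empty
    todo <- which(is.na(content) | !nzchar(content))
    if (length(todo) == 0) break
    # partition the remaining indices into chunks of size chunksize
    chunks <- split(todo, ceiling(seq_along(todo) / chunksize))
    for (idx in chunks) {
      content[idx] <- vapply(links[idx], fetch_one, character(1),
                             USE.NAMES = FALSE)
      Sys.sleep(sleep.time)  # pause between chunks to avoid hammering servers
    }
  }
  content
}
```

Failing a whole request because one large download stalls is avoided this way: only the affected chunk is retried.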
Usage

getLinkContent(corpus, links = sapply(corpus, meta, "origin"),
  timeout.request = 30, chunksize = 20, verbose = getOption("verbose"),
  curlOpts = curlOptions(verbose = FALSE, followlocation = TRUE,
    maxconnects = 5, maxredirs = 20, timeout = timeout.request,
    connecttimeout = timeout.request, ssl.verifyhost = FALSE,
    ssl.verifypeer = FALSE, useragent = "R", cookiejar = tempfile()),
  retry.empty = 3, sleep.time = 3, extractor = ArticleExtractor,
  .encoding = integer(), ...)
Arguments

| corpus | object of class Corpus |
| links | character vector specifying links to be used for download, defaults to sapply(corpus, meta, "origin") |
| timeout.request | timeout (in seconds) to be used for connections/requests, defaults to 30 |
| chunksize | size of download chunks to be used for parallel retrieval, defaults to 20 |
| verbose | specifies whether retrieval info should be printed, defaults to getOption("verbose") |
| curlOpts | curl options to be passed to the download call |
| retry.empty | number of times sites returning empty content should be retried, defaults to 3 |
| sleep.time | sleep time (in seconds) between chunked downloads, defaults to 3 |
| extractor | extractor to be used for content extraction, defaults to ArticleExtractor |
| .encoding | encoding to be used for the download |
| ... | additional parameters passed on |
Value

A corpus including the downloaded link content.
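A minimal usage sketch, assuming the tm.plugin.webmining package (which provides getLinkContent and WebCorpus) is installed and network access is available; GoogleNewsSource is assumed here as one of the package's WebSource constructors, and the parameter values are arbitrary illustrations:

```r
# Sketch, not a definitive recipe: build a web corpus and (re)download
# the full article content behind each document's "origin" link.
library(tm.plugin.webmining)

# WebCorpus construction normally triggers getLinkContent through the
# source's $postFUN; calling it directly re-fetches and re-extracts.
corpus <- WebCorpus(GoogleNewsSource("Microsoft"))
corpus <- getLinkContent(corpus,
                         chunksize   = 10,  # smaller chunks per request burst
                         retry.empty = 2,   # retry empty pages twice
                         sleep.time  = 5)   # wait 5 seconds between chunks
```

Lower chunksize and higher sleep.time values trade speed for robustness when target servers throttle repeated requests.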