View source: R/getLinkContent.R
getLinkContent

Description

Downloads and extracts content from web links for Corpus objects.
Typically it is integrated and called as a post-processing function
(field: $postFUN) for most WebSource objects. getLinkContent
implements content download in chunks, which has proven to be a more
stable approach for large content requests.
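A minimal sketch of how the function is typically reached, assuming the tm.plugin.webmining package is installed and the feed used by YahooNewsSource is reachable (both the source constructor and the query string here are illustrative):

```r
library(tm.plugin.webmining)

# WebCorpus() builds a corpus from a WebSource and runs the source's
# $postFUN on it, which for most sources is getLinkContent -- so the
# full article text is fetched automatically after the feed is parsed.
corpus <- WebCorpus(YahooNewsSource("Microsoft"))

# getLinkContent can also be called manually, e.g. to retry documents
# whose content came back empty on the first pass:
corpus <- getLinkContent(corpus, retry.empty = 5, sleep.time = 1)
```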
Usage

getLinkContent(corpus, links = sapply(corpus, meta, "origin"),
  timeout.request = 30, chunksize = 20, verbose = getOption("verbose"),
  curlOpts = curlOptions(verbose = FALSE, followlocation = TRUE,
    maxconnects = 5, maxredirs = 20, timeout = timeout.request,
    connecttimeout = timeout.request, ssl.verifyhost = FALSE,
    ssl.verifypeer = FALSE, useragent = "R", cookiejar = tempfile()),
  retry.empty = 3, sleep.time = 3, extractor = ArticleExtractor,
  .encoding = integer(), ...)
Arguments

corpus
    object of class Corpus whose documents are to be augmented with the
    downloaded content

links
    character vector specifying the links to be used for download,
    defaults to sapply(corpus, meta, "origin")

timeout.request
    timeout (in seconds) to be used for connections/requests, defaults
    to 30

chunksize
    size of the download chunks to be used for parallel retrieval,
    defaults to 20

verbose
    specifies whether retrieval information should be printed, defaults
    to getOption("verbose")

curlOpts
    curl options to be used for the download requests (see curlOptions
    in the Usage section for the defaults)

retry.empty
    number of times sites returning empty content should be retried,
    defaults to 3

sleep.time
    sleep time (in seconds) between chunked downloads, defaults to 3

extractor
    extractor to be used for content extraction, defaults to
    ArticleExtractor

.encoding
    encoding to be used for the downloaded content

...
    additional parameters passed on to the extractor
Value

corpus object including the downloaded link content
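Since the chunked retrieval relies on RCurl, the connection behaviour can be tuned by supplying a custom curlOpts value. A short sketch, assuming a corpus built as above; the user-agent string and timeout values are illustrative:

```r
library(RCurl)
library(tm.plugin.webmining)

# Tighter timeouts and an explicit user agent; curlOptions() comes from
# RCurl, which getLinkContent uses for retrieval.
opts <- curlOptions(followlocation = TRUE, maxredirs = 20,
                    timeout = 10, connecttimeout = 10,
                    useragent = "my-crawler/0.1")

# Smaller chunks trade throughput for gentler load on the target hosts.
corpus <- getLinkContent(corpus, curlOpts = opts, chunksize = 10)
```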