getLinkContent: Get main content for corpus items, specified by links.
In mannau/tm.plugin.webmining: Retrieve Structured, Textual Data from Various Web Sources

Description

getLinkContent downloads and extracts the main content of web links for `Corpus` objects. It is typically integrated and called as the post-processing function (field `$postFUN`) of most `WebSource` objects. getLinkContent downloads content in chunks, an approach that has proven more stable for large content requests.
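The chunking idea can be illustrated with a minimal base-R sketch. The helper `download_in_chunks()` and its `fetch` argument are hypothetical and not part of tm.plugin.webmining: `fetch` stands in for whatever downloads a vector of URLs (for example, a wrapper around `getURL`) and is assumed to return one string per URL. The sketch only shows how `chunksize` and `sleep.time` shape the request pattern; the package's own implementation may differ.

```r
# Hypothetical sketch of chunked retrieval (not the package's actual code).
# `fetch` is an assumed stand-in: it takes a character vector of URLs and
# returns one character string of content per URL.
download_in_chunks <- function(links, fetch, chunksize = 20, sleep.time = 3) {
  # Give each link a chunk index: links 1..chunksize -> 1, the next chunksize -> 2, ...
  chunk_id <- ceiling(seq_along(links) / chunksize)
  chunks <- split(links, chunk_id)
  content <- character(0)
  for (i in seq_along(chunks)) {
    # Download one whole chunk at a time, keeping the original link order
    content <- c(content, fetch(chunks[[i]]))
    # Pause between chunks (not after the last one) to spread out the load
    if (i < length(chunks) && sleep.time > 0) Sys.sleep(sleep.time)
  }
  content
}
```

With 45 links and `chunksize = 20`, `fetch` is called three times (with 20, 20 and 5 links), pausing `sleep.time` seconds between calls. Keeping each batch small bounds the number of requests issued at once, which is the stability argument made above.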

Usage

getLinkContent(corpus, links = sapply(corpus, meta, "origin"),
  timeout.request = 30, chunksize = 20, verbose = getOption("verbose"),
  curlOpts = curlOptions(verbose = FALSE, followlocation = TRUE,
    maxconnects = 5, maxredirs = 20, timeout = timeout.request,
    connecttimeout = timeout.request, ssl.verifyhost = FALSE,
    ssl.verifypeer = FALSE, useragent = "R", cookiejar = tempfile()),
  retry.empty = 3, sleep.time = 3, extractor = ArticleExtractor,
  .encoding = integer(), ...)

Arguments

`corpus`	object of class `Corpus` for which link content should be downloaded
`links`	character vector specifying the links to be used for download, defaults to `sapply(corpus, meta, "origin")`
`timeout.request`	timeout (in seconds) to be used for connections/requests, defaults to 30
`chunksize`	size of the download chunks (number of links per chunk) to be used for parallel retrieval, defaults to 20
`verbose`	specifies whether retrieval info should be printed, defaults to `getOption("verbose")`
`curlOpts`	curl options to be passed to `getURL`
`retry.empty`	number of times links returning empty content should be retried, defaults to 3
`sleep.time`	sleep time (in seconds) between chunk downloads, defaults to 3
`extractor`	extractor to be used for content extraction, defaults to `ArticleExtractor`
`.encoding`	encoding to be used for `getURL`, defaults to `integer()` (i.e. autodetect)
`...`	additional parameters to `getURL`
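The `retry.empty` and `sleep.time` arguments suggest a simple retry loop: links whose content comes back empty are fetched again, up to `retry.empty` times. The base-R sketch below uses a hypothetical helper `retry_empty_links()` and the same assumed `fetch` stand-in (a function returning one string per URL). It illustrates the idea under those assumptions and is not the package's code.

```r
# Hypothetical sketch of the retry.empty idea (not the package's actual code).
# `content` holds the result of a first download pass, aligned with `links`;
# `fetch` is an assumed stand-in returning one string per URL.
retry_empty_links <- function(links, content, fetch, retry.empty = 3, sleep.time = 3) {
  for (attempt in seq_len(retry.empty)) {
    # Treat both NA and "" as empty content
    empty <- which(is.na(content) | !nzchar(content))
    if (length(empty) == 0) break  # nothing left to retry
    if (sleep.time > 0) Sys.sleep(sleep.time)
    # Re-fetch only the links that are still empty
    content[empty] <- fetch(links[empty])
  }
  content
}
```

Only the still-empty links are re-requested on each pass, so a link that keeps failing costs at most `retry.empty` extra requests and is then returned as empty.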

Value

The corpus, including the downloaded link content.

See Also

`WebSource`, `getURL`, `Extractor`

mannau/tm.plugin.webmining documentation built on May 21, 2019, 11:24 a.m.