crm_plain: Get full plain text




Get full plain text


crm_plain(url, overwrite_unspecified = FALSE, ...)



url: A URL (character) or an object of class tdmurl from a call to crm_links(). If you'll be getting text from publishers that use Crossref TDM (which requires authentication), we strongly recommend calling crm_links() first and passing its output here, because crm_links() grabs the publisher's Crossref member ID, which we use for authentication and for other publisher-specific fixes to URLs.


overwrite_unspecified: (logical) Sometimes the Crossref API returns mime type 'unspecified' for the full text links (for some Wiley DOIs, for example). This parameter overrides the mime type to be type. Default: FALSE


...: Named curl options passed on to crul::verb-GET; see curl::curl_options() for available curl options. See especially the User-agent section below.


Note that this function is not vectorized. To do many requests use a for/while loop or lapply family calls, or similar.
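Because the function handles one link per call, many requests are typically wrapped in a loop. The following is a minimal sketch of the lapply approach, not part of crminer: fetch_many and fake_fetch are hypothetical names introduced for illustration, and the stand-in fetcher only exists so that the example runs offline. By default the helper would call crm_plain() on each link.

```r
# Sketch only: `fetch_many` is a hypothetical helper, not part of crminer.
# `fetch` defaults to crminer::crm_plain; the default is evaluated lazily,
# so crminer is only needed when no other fetcher is supplied.
fetch_many <- function(links, fetch = crminer::crm_plain, ...) {
  lapply(links, function(link) {
    # One request per link. A failed request yields NULL instead of
    # aborting the remaining requests.
    tryCatch(fetch(link, ...), error = function(e) NULL)
  })
}

# Offline demonstration with a stand-in fetcher (an assumption for
# illustration; a real run would use crm_plain() and network access).
fake_fetch <- function(link, ...) {
  if (identical(link, "bad")) stop("request failed")
  paste("text for", link)
}

res <- fetch_many(list("a", "bad", "b"), fetch = fake_fetch)
```

With crminer itself, links would presumably be a list of tdmurl objects, each created by crm_links(), and the call would be fetch_many(links). Any extra named arguments are forwarded to the fetcher unchanged.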

Note that some links returned will not in fact lead you to full text content, as you would understandably expect. That is, if you use the filter parameter with e.g., rcrossref::cr_works() and filter to only full text content, some links may actually give back only metadata for an article. Elsevier is perhaps the worst offender: they have a lot of entries in Crossref TDM, but most of the links that are apparently full text are not in fact full text, only metadata.

Check out auth for details on authentication.


You can optionally set a user agent string with the curl option useragent, like crm_text("some doi", "pdf", useragent = "foo bar"). User agent strings are sometimes used by servers to decide whether to provide a response (in this case, the full text article). Sometimes, a browser-like user agent string will make the server happy. By default, all requests in this package have a user agent string like libcurl/7.64.1 r-curl/4.3 crul/0.9.0, which is a string with the names and versions of the HTTP clients used under the hood. If you supply a user agent string using the useragent curl option, we'll use it instead. For more information on user agents, and examples of user agent strings you can use here, see


## Not run: 
link <- crm_links("10.1016/j.physletb.2010.10.049", "plain")

# another example, which requires Crossref TDM authentication; see ?auth
link <- crm_links(dois_elsevier[3], "plain")
# crm_plain(link)

## End(Not run)

crminer documentation built on July 2, 2020, 2:11 a.m.