crm_links: Get full text links from a DOI
In ropenscilabs/crminer: Fetch 'Scholary' Full Text from 'Crossref'

crm_links

R Documentation

Get full text links from a DOI

Description

Get full text links from a DOI

Usage

crm_links(doi, type = "all", ...)

Arguments

`doi`	(character) A Digital Object Identifier (DOI). required.
`type`	(character) One of 'xml', 'html', 'plain', 'pdf', 'unspecified', or 'all' (default). optional.
`...`	Named parameters passed on to `crul::HttpClient()`

Details

Note that this function is not vectorized.

Some links returned will not in fact lead to full text content, as you would understandably expect. That is, even if you use the filter parameter with, e.g., rcrossref::cr_works() to restrict results to full text content, some links may return only metadata for an article. Elsevier is perhaps the worst offender, partly because it has many entries in Crossref TDM: most of its links that appear to be full text in fact return only metadata. You can get the full text if you belong to an institution that subscribes to that specific Elsevier content; otherwise, you will not be able to access it.

Note that there are still some bugs in the data returned from Crossref. For example, for the publisher eLife, Crossref returns a single URL with content-type application/pdf, but that URL serves both XML and PDF: you can request either format from the same URL by setting the content type to XML or PDF.

In another example, at the time of writing all Elsevier URLs have the http scheme, but those URLs do not work, so this function applies a custom fix for that publisher. Expect this behavior to change.
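The page does not say what the Elsevier fix consists of. The sketch below assumes it amounts to swapping the http scheme for https. `fix_elsevier_scheme()` is a hypothetical helper written for illustration. It is not crminer's actual code, and the example URLs are placeholders.

```r
# Hypothetical helper, NOT part of crminer: assumes the Elsevier fix
# replaces a leading "http://" scheme with "https://".
fix_elsevier_scheme <- function(url) {
  # Only touch the scheme at the start of the string; https URLs pass through
  sub("^http://", "https://", url)
}

fix_elsevier_scheme("http://example.org/article")   # rewritten to https
fix_elsevier_scheme("https://example.org/article")  # left unchanged
```

Anchoring the pattern with `^` ensures that only the scheme is rewritten, and never an "http://" that appears later in a query string.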

Value

NULL if no full text links are found; otherwise, a list of tdmurl objects. A tdmurl object is an S3 class wrapped around a simple list, with attributes for:

type: type, matching the type passed to the function
doi: DOI
member: Crossref member ID
intended_application: intended application, e.g., text-mining
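The following sketch shows how the attributes listed above can be read with base R. It uses a mock object built by hand with `structure()` so that it runs without network access. The list element name `xml` and every attribute value are made-up placeholders and are not real Crossref output. The real element names inside a tdmurl object may differ.

```r
# Mock tdmurl-like object: a plain list with class "tdmurl" and the four
# attributes described above. All values are placeholders.
mock <- structure(
  list(xml = "https://example.org/fulltext.xml"),
  class = "tdmurl",
  type = "xml",
  doi = "10.5555/515151",
  member = "12345",
  intended_application = "text-mining"
)

# Read individual attributes back out
attr(mock, "type")
attr(mock, "doi")
inherits(mock, "tdmurl")

# crm_links() returns a list of such objects; collect one attribute from each
links <- list(mock)
vapply(links, function(x) attr(x, "type"), character(1))
```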

Register for the Polite Pool

crm_links() uses the Crossref API. You should send your email address with your crm_links() requests, because this places your queries in the polite pool of servers. Even when the non-polite pool is having server problems, the polite pool is often still working. Including your email address is good practice, as described in the Crossref documentation under Good manners.

To pass your email address to Crossref, store it as an environment variable in your .Renviron file, as either crossref_email=name@example.com or CROSSREF_EMAIL=name@example.com. Save the file and restart your R session.

To stop sharing your email, delete it from your .Renviron file. To stop using it temporarily, unset it for the session with Sys.unsetenv('crossref_email'). To confirm that you are in the polite pool, use curl's verbose output, e.g., crm_links(doi = "10.5555/515151", verbose = TRUE).
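For a quick test within one session, you can set and unset the same variable from R itself instead of editing .Renviron. The sketch below uses only base R. The address is a placeholder, and the check simply reports whether either spelling of the variable is currently set.

```r
# Set the email for the current session only (placeholder address).
# A line such as crossref_email=name@example.com in ~/.Renviron makes it
# persist across sessions after a restart.
Sys.setenv(crossref_email = "name@example.com")

# TRUE if either spelling of the variable is set and non-empty
nzchar(Sys.getenv("crossref_email")) || nzchar(Sys.getenv("CROSSREF_EMAIL"))

# Stop sending the email for the rest of this session
Sys.unsetenv("crossref_email")
Sys.getenv("crossref_email")  # "" once the variable is unset
```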

Examples

## Not run: 
data(dois_crminer)

# pdf link
crm_links(doi = "10.5555/515151", "pdf")

# pdf, xml, and plain text links
crm_links(dois_crminer[1], "pdf")
crm_links(dois_crminer[6], "xml")
crm_links(dois_crminer[7], "plain")
crm_links(dois_crminer[1]) # all is the default

# pdf link
crm_links(doi = "10.3897/phytokeys.52.5250", "pdf")

# many calls: crm_links() is not vectorized, so use e.g., lapply
lapply(dois_crminer[1:3], crm_links)

# elsevier
## DOI that is open access
crm_links('10.1016/j.physletb.2010.10.049')
## DOI that is not open access
crm_links('10.1006/jeth.1993.1066')

## End(Not run)

ropenscilabs/crminer documentation built on May 18, 2022, 7:36 p.m.