- crm_pdf() and crm_text() lose the cache parameter, which toggled whether or not to use caching. Those functions always cache requests now (#37)
- crm_extract() gains parameter try_ocr (logical, default: FALSE) to optionally try Optical Character Recognition (OCR) on the extracted pdf pages if the pdf consists of scanned images. Extraction can take a while, but the result is cached, so subsequent requests for the same article will be very fast (#37)
- crm_plain(), crm_xml(), crm_html(), and crm_text() now cache articles, as crm_pdf() has for a while. Along with this change, caching is now split into separate folders for pdf, txt (for plain), xml, and html (#17)
- Added a User-agent section to the docs for crm_html(), crm_pdf(), crm_plain(), crm_xml(), and crm_text() detailing how users can set a user agent string with the useragent curl option (#41) (#42)
- Replaced pdf with pdfdirect for better access (#40)
- Added try_extract_pdf_errors() to attempt to extract various errors that occur when trying to download and extract text from pdfs (#40)
- Fix for crm_links(): the older URL was leading to article landing pages (#6)
- Fix for malformed pdfs in crm_pdf()/crm_text() (with type="pdf"). The problem arose from a Cambridge publisher article; the fix will hopefully handle all malformed pdfs (#45)
- Changed crm_links() to always include a pdf link even if none is returned by Crossref, since there is almost always a pdf for every article and the link may simply not have been included in the metadata sent to Crossref (#37)
- Elsevier fixes: A) text after ? in URLs was being dropped (as it was all likely query params that we didn't need), but Elsevier gives the content type as a query param. B) Some DOIs that are listed as having a non-Elsevier owner are actually owned by Elsevier now; special handling was added for those DOIs. C) (#37)
- vcr for tests that write to disk (#34)
- Replaced xml2::xml_find_one with xml2::xml_find_first (#32)
- crm_links(): fix full text links from Elsevier that have httpss instead of https (#30) thanks @njahn82
- crm_links(): the function wasn't using the email header for the Crossref polite pool. Now it does if you provide your email address, see docs (#31)
- crm_cache$cache_path_set() gains the ability to set the full cache path directly via its full_path parameter, via an update to package hoardr (#27)
- Added raw as another parameter in crm_extract() to allow raw byte extraction from a pdf (#24)
- crm_links(): allow filtering on the intended application (#28)
- crm_cache for managing cached files, see ?crm_cache after installation (#19)
- hoardr for managing cached files (#19)
- crm_pdf() and crm_text() lose the path parameter; the cache directory is instead managed through crm_cache
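
Two of the link fixes noted above (the httpss scheme in Elsevier full text links, and query parameters that Elsevier needs to keep) can be illustrated with a small base R sketch. This is not the package's actual implementation: the helper names fix_scheme(), strip_query(), and clean_link(), the keep_hosts argument, and the example URLs (including the placeholder PII) are all assumptions made up for illustration.

```r
# Hypothetical helpers for illustration only; these are NOT crminer functions.

# Repair a doubled "s" in the scheme, e.g. "httpss://" -> "https://".
fix_scheme <- function(url) {
  sub("^httpss://", "https://", url)
}

# Drop the query string, except for hosts where the query carries
# information that is needed (assumed here: Elsevier's API host, which
# passes the content type as a query parameter).
strip_query <- function(url, keep_hosts = "api.elsevier.com") {
  # Extract the host part between "scheme://" and the next "/", "?" or "#".
  host <- sub("^[a-z]+://([^/?#]+).*$", "\\1", url)
  ifelse(host %in% keep_hosts, url, sub("\\?.*$", "", url))
}

# Apply both clean-up steps in order.
clean_link <- function(url) strip_query(fix_scheme(url))

# Made-up example links; the PII value is a placeholder.
links <- c(
  "httpss://api.elsevier.com/content/article/PII:S0000000000000000?httpAccept=text/xml",
  "https://example.org/article.pdf?download=true"
)
clean_link(links)
```

In this sketch the Elsevier link keeps its httpAccept query parameter and only has its scheme repaired, while the other link loses its query string. Treating the exception as a list of hosts is one possible design; the package may decide this differently.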