* `crm_pdf()` and `crm_text()` lose the `cache` parameter, which toggled whether or not to use caching. Those functions always cache requests now (#37)
* `crm_extract()` gains parameter `try_ocr` (logical, default: `FALSE`) to optionally try Optical Character Recognition (OCR) on the extracted pdf pages if the pdf consists of scanned images. Extraction can take a while, but the result is cached, so subsequent requests for the same article will be very fast (#37)
* `crm_plain()`, `crm_xml()`, `crm_html()`, and `crm_text()` now cache articles, as `crm_pdf()` has for a while. Along with this change, caching is now split into separate folders for pdf, txt (for plain), xml, and html (#17)
* add `User-agent` section to the docs for `crm_html()`, `crm_pdf()`, `crm_plain()`, `crm_xml()`, and `crm_text()` detailing how users can set a user agent string with the `useragent` curl option (#41) (#42)
* replace `pdf` with `pdfdirect` for better access (#40)
* add `try_extract_pdf_errors()` to attempt to extract various errors that occur when trying to download and extract text from pdfs (#40)
* fix `crm_links()`: older url was leading to article landing pages (#6)
* fix in `crm_pdf()`/`crm_text()` (with `type="pdf"`) - arose from a Cambridge publisher article, hopefully will handle all malformed pdfs (#45)
* fix `crm_links()` to always include a pdf link even if none is returned by Crossref - there is almost always a pdf for every article, but the link may just not have been included in the metadata sent to Crossref (#37)
* [...] `?` (as they were all likely query params that we didn't need), but Elsevier gives the content type as a query param. B) some dois that are listed as having a non-Elsevier owner are actually owned by Elsevier now; special handling for those dois. C) (#37)
* use `vcr` for tests that write to disk (#34)
* replace `xml2::xml_find_one` with `xml2::xml_find_first` (#32)
* `crm_links()`: fix full text links from Elsevier that have `httpss` instead of `https` (#30) thanks @njahn82
* `crm_links()`: the function wasn't using the email header for the Crossref polite pool - now it does if you provide your email address, see docs (#31)
* `crm_cache$cache_path_set()` gains ability to set the full cache path directly via its `full_path` parameter, via an update to package `hoardr` (#27)
* add `raw` as another parameter in `crm_extract()` to allow raw byte extraction from a pdf (#24)
* `crm_links()`: allow filtering on the intended application (#28)
* add `crm_cache` for managing cached files, see `?crm_cache` after installation (#19)
* use `hoardr` for managing cached files (#19)
* `crm_pdf()` and `crm_text()` lose the parameter `path` - instead the `cache` directory is managed through `crm_cache`
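
The Elsevier link entries above (#30, #37) describe two kinds of URL cleanup: repairing an `httpss` scheme, and (as the truncated entry appears to say) dropping query strings except where Elsevier carries the content type in one. A minimal base R sketch of that idea follows. The helper names `fix_scheme()` and `strip_query()`, the host check, and the example URLs are all assumptions made for illustration; this is not the package's internal code.

```r
# Illustrative helpers only; names and the host check are assumptions,
# not crminer's internal implementation.

# Repair a doubled "s" in the scheme, e.g. "httpss://" -> "https://".
fix_scheme <- function(url) {
  sub("^httpss://", "https://", url)
}

# Drop the query string unless the URL matches `keep_query_for`, because
# Elsevier gives the content type as a query parameter.
strip_query <- function(url, keep_query_for = "elsevier.com") {
  keep <- grepl(keep_query_for, url, fixed = TRUE)
  ifelse(keep, url, sub("\\?.*$", "", url))
}

# Made-up example URLs
urls <- c(
  "httpss://api.elsevier.com/content/article/PII:S0000000000000000?httpAccept=text/xml",
  "https://example.org/article.pdf?download=true"
)

cleaned <- strip_query(fix_scheme(urls))
print(cleaned)
```

Both helpers are vectorized, so they can be applied to a whole set of links at once; the Elsevier URL keeps its `httpAccept` parameter while the other URL loses its query string.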