regex_doi: Find DOIs with regular expressions
In subugoe/biblids: Working with bibliometric identifiers


View source: R/doi.R

Description

Find DOIs with regular expressions

Usage

regex_doi(type = c("doi.org", "cr-modern"), ...)

doi_patterns(type = c("doi.org", "cr-modern"))

str_extract_doi(string)

str_extract_all_doi(string, type = "doi.org")

Arguments

type

A character string giving the type of validation to run. Implemented as regular expressions (see source code). Must be one of these syntax specifications:

"doi.org" (recommended), from doi.org via Stack Overflow, follows the actual specification, but can cause problems when DOIs are not separated by whitespace or line breaks, because many other characters are valid in a DOI and will be extracted along with it.
"cr-modern", from Crossref, is less vulnerable to over-extracting, but excludes some DOIs that, while uncommon today, are syntactically valid. See examples.

...

Arguments passed on to stringr::regex

pattern: Pattern to modify behaviour.
ignore_case: Should case differences be ignored in the match?
multiline: If TRUE, ^ and $ match the beginning and end of each line. If FALSE, the default, they only match the start and end of the input.
comments: If TRUE, white space and comments beginning with # are ignored. Escape literal spaces with \ .
dotall: If TRUE, . will also match line terminators.

string

Input vector. Either a character vector, or something coercible to one.

Functions

doi_patterns: Find DOI fields with regular expressions

str_extract_doi: Extract the first DOI from each character string

str_extract_all_doi: Extract all DOIs from character strings

See Also

Other doi: doiEntry, doi_api, doi_examples(), doi_ra, doi(), view_doi_matches()

Examples

regex_doi("doi.org")
regex_doi("cr-modern")

str_extract_doi(string = c(
  "10.1594/PANGAEA.726855",  # nothing to do here
  "10.1119/1.16433 ",  # remove trailing space
  " 10.1594/PANGAEA.667386",  # remove leading space
  "doi:10.3866/PKU.WHXB201112303",  # remove "doi:" prefix
  "http://dx.doi.org/10.3352/jeehp.2013.10.3",  # parse URL
  "10.3972/water973.0145.db&",  # remove forbidden symbol
  "foo bar"  # no DOI here
))
str_extract_all_doi(string = c(
  # nothing to do here
  "10.17487/rfc1149",
  # separated by space
  "10.1016/j.iheduc.2003.11.004 doi:10.7875/leading.author.2.e008",
  # separated by forbidden symbol
  "doi:10.6084/m9.figshare.97218&doi:10.1126/science.169.3946.635 ",
  # separated by linebreak
  "10.5194/wes-2019-70\n10.5194/wes-5-819-202",
  # no DOI here
  "quux"
))
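
The trade-off between the two type options described under Arguments can be sketched in base R, without the package. The two patterns below are assumptions modeled on the commonly cited doi.org/Stack Overflow and Crossref "modern" expressions; the names doi_org_like, cr_modern_like and extract_all are hypothetical, and the regular expressions actually used by regex_doi() are in R/doi.R and may differ.

```r
# Assumed patterns, for illustration only (not taken from the package source).
# Permissive: any run of non-whitespace characters except " & ' < >
doi_org_like <- "10[.][0-9]{4,}(?:[.][0-9]+)*/(?:(?![\"&'<>])\\S)+"
# Restrictive: suffix limited to a small set of characters
cr_modern_like <- "10[.][0-9]{4,9}/[-._;()/:A-Za-z0-9]+"

# Hypothetical helper: return all matches of `pattern` in each element of `x`
extract_all <- function(x, pattern) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

# Two DOIs separated by a comma instead of whitespace
x <- "10.1000/abc,10.1000/def"
extract_all(x, doi_org_like)    # comma is a permitted character here
extract_all(x, cr_modern_like)  # comma is outside the allowed set

# A syntactically valid suffix containing "+"
y <- "10.1000/abc+def"
extract_all(y, doi_org_like)
extract_all(y, cr_modern_like)
```

Under these assumed patterns, the permissive expression should treat the comma-joined string as a single match (over-extraction), while the restrictive one should split it into two DOIs. For the second input the roles reverse: the permissive expression keeps the full suffix, and the restrictive one stops before the "+".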