regex_doi: Find DOIs with regular expressions


View source: R/doi.R

Description

Find DOIs with regular expressions

Usage

regex_doi(type = c("doi.org", "cr-modern"), ...)

doi_patterns(type = c("doi.org", "cr-modern"))

str_extract_doi(string)

str_extract_all_doi(string, type = "doi.org")

Arguments

type

A character string giving the type of validation to run. Implemented as regular expressions (see source code). Must be one of these syntax specifications:

  • "doi.org" (recommended), from doi.org via Stack Overflow, follows the actual DOI specification. It can cause problems when DOIs are not separated by whitespace or line breaks, because many other characters are valid in a DOI and will be extracted as part of it.

  • "cr-modern", from Crossref, is less vulnerable to over-extracting, but excludes some DOIs which, while uncommon today, are syntactically valid. See examples.
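The trade-off between the two types can be sketched in base R. The two patterns below are illustrative assumptions, not the patterns shipped in the package: `loose` simply accepts any run of non-whitespace characters after the prefix, and `strict` is modelled on the character set Crossref recommends for modern DOIs. The helper `extract_all()` and both input strings are likewise hypothetical.

```r
# Illustrative patterns only; NOT copied from the biblids source code.
# Loose: DOI prefix followed by any run of non-whitespace characters.
loose <- "10\\.[0-9]{4,9}/[^[:space:]]+"
# Strict: suffix limited to a character set modelled on Crossref's
# recommendation for modern DOIs.
strict <- "10\\.[0-9]{4,9}/[-._;()/:A-Za-z0-9]+"

# Hypothetical helper: all matches of `pattern` in each element of `x`.
extract_all <- function(x, pattern) {
  regmatches(x, gregexpr(pattern, x, perl = TRUE))
}

# Two DOIs glued together by "&" with no whitespace in between.
glued <- "10.6084/m9.figshare.97218&doi:10.1126/science.169.3946.635"
extract_all(glued, loose)[[1]]   # one over-long match spanning both DOIs
extract_all(glued, strict)[[1]]  # two separate matches

# An older SICI-style string (for illustration) with "<" and ">" in the suffix.
sici <- "10.1002/(SICI)1097-4571(199205)43:4<284::AID-ASI3>3.0.CO;2-0"
extract_all(sici, loose)[[1]]    # the full string
extract_all(sici, strict)[[1]]   # cut off at the first "<"
```

The loose pattern over-extracts when DOIs are not separated by whitespace, while the strict pattern truncates suffixes that contain characters outside its set. This is the same tension described for "doi.org" and "cr-modern" above.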

...

Arguments passed on to stringr::regex

pattern

Pattern to modify behaviour.

ignore_case

Should case differences be ignored in the match?

multiline

If TRUE, $ and ^ match the beginning and end of each line. If FALSE, the default, only match the start and end of the input.

comments

If TRUE, white space and comments beginning with # are ignored. Escape literal spaces with \ .

dotall

If TRUE, . will also match line terminators.

string

Input vector. Either a character vector, or something coercible to one.

See Also

Other doi: doiEntry, doi_api, doi_examples(), doi_ra, doi(), view_doi_matches()

Examples

regex_doi("doi.org")
regex_doi("cr-modern")

str_extract_doi(string = c(
  "10.1594/PANGAEA.726855",  # nothing to do here
  "10.1119/1.16433 ",  # remove trailing space
  " 10.1594/PANGAEA.667386", # remove leading space
  "doi:10.3866/PKU.WHXB201112303", # remove "doi:" prefix
  "http://dx.doi.org/10.3352/jeehp.2013.10.3", # parse URL
  "10.3972/water973.0145.db&", # remove forbidden symbol
  "foo bar" # no DOI here
))
str_extract_all_doi(string = c(
  # nothing to do here
  "10.17487/rfc1149",
  # space separated
  "10.1016/j.iheduc.2003.11.004 doi:10.7875/leading.author.2.e008",
  # separated by forbidden
  "doi:10.6084/m9.figshare.97218&doi:10.1126/science.169.3946.635 ",
  # separated by linebreak
  "10.5194/wes-2019-70\n10.5194/wes-5-819-202",
  # no DOI here
  "quux"
))
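For readers without the package installed, the single-match behaviour exercised above can be approximated in base R. This is a rough sketch under stated assumptions, not the package's implementation: `extract_first()` is a hypothetical helper, its default pattern is an illustrative Crossref-style expression, and returning `NA` for inputs without a match is an assumption about how a `str_extract`-style function behaves.

```r
# Hypothetical helper, not part of biblids: first DOI-like match per
# element, NA where nothing matches. The pattern is an assumption.
extract_first <- function(x, pattern = "10\\.[0-9]{4,9}/[-._;()/:A-Za-z0-9]+") {
  out <- rep(NA_character_, length(x))
  m <- regexpr(pattern, x, perl = TRUE)  # -1 marks elements with no match
  out[m != -1] <- regmatches(x, m)       # regmatches() returns matches only
  out
}

res <- extract_first(c(
  " 10.1594/PANGAEA.667386",                    # leading space is dropped
  "http://dx.doi.org/10.3352/jeehp.2013.10.3",  # URL part is dropped
  "10.3972/water973.0145.db&",                  # trailing "&" is dropped
  "foo bar"                                     # no DOI, so NA
))
res
```

Because the match starts at the `10.` prefix and stops at the first character outside the allowed set, surrounding whitespace, URL scaffolding and trailing symbols fall away without a separate clean-up step.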

subugoe/biblids documentation built on Dec. 11, 2021, 6:55 a.m.