cas_detect: Detect if something is (likely) a validly formatted CAS RN


cas_detect    R Documentation

Detect if something is (likely) a validly formatted CAS RN

Description

A simple format check for a Chemical Abstracts Service Registry Number (CAS RN).

Usage

cas_detect(x, preprocess = FALSE, output = c("check", "count", "result"))

Arguments

x

chr. A vector of values to check

preprocess

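logi. Trim leading and trailing whitespace and pare all runs of consecutive hyphens (-{2,}) to a single hyphen (-)? Defaults to FALSE. Preprocessing is applied to an intermediate copy only: it does not carry over to the return values and does not modify the input in any way. See details.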

output

chr. What should the function return? One of "check", "count", or "result". Defaults to "check".

Details

A quick, though highly imperfect, way to determine whether something has the basic formatting characteristics of a CAS RN. Only the basic CAS formatting rules are checked, i.e.

  • Three sections of digits

  • Each separated by a single hyphen (-)

  • Section 1 length >=2 and <=7

  • Section 2 length ==2

  • Section 3 length ==1
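The rules above can be written as a regular expression. The following is a minimal base R sketch for illustration only: `cas_pattern` and `looks_like_cas` are hypothetical names, and the pattern actually used inside cas_detect may differ (for example, in how it treats surrounding text).

```r
# Hypothetical pattern derived from the formatting rules listed above:
# 2 to 7 digits, hyphen, exactly 2 digits, hyphen, exactly 1 digit.
cas_pattern <- "[0-9]{2,7}-[0-9]{2}-[0-9]"

# Hypothetical helper: TRUE only where the whole string matches the pattern.
looks_like_cas <- function(x) {
  grepl(paste0("^", cas_pattern, "$"), x)
}

looks_like_cas(c("123-45-6",       # well formed
                 "1-23-4",         # section 1 too short
                 "12345678-90-1",  # section 1 too long
                 "123--45-6"))     # doubled hyphen
```

Anchoring with ^ and $ is a choice made for this sketch so that each rule can be seen to fail on its own.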

If preprocess = TRUE, note that str_trim is called with its default options, i.e. side = "both", and this is the only whitespace modification performed on the intermediate vector. Internal whitespace is left untouched. If you need all whitespace removed, do so upstream; this function will not strip it for you.
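As a rough illustration of the preprocessing described above, the sketch below uses base R's trimws as a stand-in for str_trim. `preprocess_sketch` and `strip_all_ws` are hypothetical helpers, not part of the package, and the package's own code may differ.

```r
# Hypothetical approximation of what preprocess = TRUE is described as doing:
# trim both ends, then collapse runs of hyphens to a single hyphen.
preprocess_sketch <- function(x) {
  x <- trimws(x, which = "both")
  gsub("-{2,}", "-", x)
}

# Hypothetical upstream step for callers who want ALL whitespace removed.
strip_all_ws <- function(x) {
  gsub("[[:space:]]+", "", x)
}

x <- c(" 123-45-6", "123--45-6", "123- 45 -6")
preprocess_sketch(x)                # internal spaces in the third element remain
preprocess_sketch(strip_all_ws(x))  # all three become "123-45-6"
```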

Value

By default (output == "check"), a logical vector of length equal to x.
If output == "count", an integer vector of length equal to x.
If output == "result", a list of length equal to x containing the extracted result(s) for each element.
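To make the three return shapes concrete, here is a base R sketch that produces a logical vector, an integer vector, and a list from the same set of matches. It assumes that "count" means the number of matches found in each element; that reading, the pattern, and all variable names are assumptions for illustration, not the package's implementation.

```r
# Hypothetical pattern following the formatting rules in Details.
pattern <- "[0-9]{2,7}-[0-9]{2}-[0-9]"

x <- c("123-45-6", "no match here", "50-00-0 and 64-17-5")

# All matches per element; elements without a match yield character(0).
hits <- regmatches(x, gregexpr(pattern, x))

result <- hits          # list, one entry per element of x
count  <- lengths(hits) # integer vector, same length as x
check  <- count > 0     # logical vector, same length as x
```

All three objects have the same length as x, which mirrors the lengths stated above.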

Note

There is a significant difference between something that looks like a CAS RN and something that is a valid CAS RN. This function should have a low false negative rate, i.e. if it flags an element of x as FALSE, you can believe the result. A return of TRUE, however, only means that the input meets the formatting criteria listed above (see details).

Use cas_checkSum to determine if TRUE inputs also pass the last-digit checksum check. See examples.
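For readers unfamiliar with the last-digit check, the sketch below applies the standard CAS check-digit rule: reading the digits before the check digit from right to left, multiply each by its position (1, 2, 3, ...), sum the products, and compare the sum modulo 10 with the final digit. `checksum_ok` is a hypothetical helper written for this illustration. It is not cas_checkSum, whose arguments and return value are not described here.

```r
# Hypothetical helper: TRUE if a single, correctly formatted CAS RN string
# passes the check-digit rule.
checksum_ok <- function(cas) {
  digits <- as.integer(strsplit(gsub("-", "", cas), "")[[1]])
  n <- length(digits)
  check_digit <- digits[n]
  body <- rev(digits[-n])  # remaining digits, rightmost first
  sum(body * seq_along(body)) %% 10 == check_digit
}

checksum_ok("7732-18-5")  # water: weighted sum is 105, 105 %% 10 is 5
checksum_ok("123-45-6")   # well formed, but weighted sum is 35, so the check digit would be 5
```

The second call shows why a format check alone is not enough: "123-45-6" satisfies every formatting rule yet cannot be a valid CAS RN.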

See Also

Other cas_functions: cas_checkSum(), cas_check()

Examples

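# Inputs: embedded in text, clean, leading space, doubled hyphen, internal spaces
toCheck <- c("cas rn: 123-45-6", "123-45-6", " 123-45-6", "123--45-6", "123- 45 -6")
# Default: logical check per element, no preprocessing
cas_detect(toCheck)
# Trim outer whitespace and collapse repeated hyphens before checking
cas_detect(toCheck, preprocess = TRUE)
# Return the extracted result(s) per element instead of a logical vector
cas_detect(toCheck, output = "result")
cas_detect(toCheck, preprocess = TRUE, output = "result")
# Strip everything except word characters and hyphens upstream, then check
cas_detect(gsub("[^\\w-]", "", toCheck, perl = TRUE), preprocess = TRUE, output = "result")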

slin30/wzMisc documentation built on Jan. 27, 2023, 1 a.m.