cas_checkSum: Check CAS RN validity via checksum method

View source: R/cas_checkSum.R

cas_checkSum    R Documentation

Check CAS RN validity via checksum method

Description

For a suspected CAS RN, determine validity by calculating the final-digit checksum.

Usage

cas_checkSum(x, checkLEN = TRUE)

Arguments

x

chr. Input vector of values to check. Standard CAS notation using hyphens is fine, as all non-digit characters are stripped for checksum calculation. Each element of x should contain only one suspected CAS RN to check.

checkLEN

logi. Should the function check that each element of x, after non-digit characters are stripped, is at least 4 but no more than 10 digits long? Defaults to TRUE.

Details

This function performs a very specific type of check for CAS validity, namely whether the final digit checksum follows the CAS standard. By default, it also ensures that the digit length is compatible with CAS standards. It does nothing more.
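As a rough illustration of the check described above, the CAS check digit equals the weighted sum of the preceding digits (weights 1, 2, 3, ... counted from the rightmost body digit) modulo 10. The sketch below is not the package's implementation; the name cas_checksum_sketch and its check_len argument are hypothetical and only mirror the documented behavior.

```r
# Hypothetical helper for illustration only; not the wzMisc source.
cas_checksum_sketch <- function(x, check_len = TRUE) {
  vapply(x, function(el) {
    if (is.na(el)) return(NA)                      # NA in, NA out
    # Strip every non-digit character, then split into single digits
    digits <- as.integer(strsplit(gsub("[^0-9]", "", el), "")[[1]])
    n <- length(digits)
    if (n < 2) return(FALSE)                       # need a body and a check digit
    if (check_len && (n < 4 || n > 10)) return(FALSE)
    body <- digits[-n]                             # all digits except the last
    weights <- rev(seq_along(body))                # rightmost body digit gets weight 1
    sum(body * weights) %% 10 == digits[n]
  }, logical(1), USE.NAMES = FALSE)
}

cas_checksum_sketch(c("71-43-2", "61-43-2", NA))
```

For "71-43-2" the weighted sum is 3*1 + 4*2 + 1*3 + 7*4 = 42, and 42 modulo 10 gives the check digit 2.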

This means that there is no check for valid CAS format. Use the cas_detect function to check CAS format beforehand, or write your own function if necessary.
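If you do write your own format check, one possible starting point is a regular expression for the standard CAS layout: 2 to 7 digits, a hyphen, 2 digits, a hyphen, and a single check digit. The function name cas_format_sketch is hypothetical, and cas_detect may apply different or stricter rules.

```r
# Hypothetical format check for illustration; cas_detect may behave differently.
cas_format_sketch <- function(x) {
  # 2-7 digits, hyphen, 2 digits, hyphen, 1 check digit; nothing else allowed
  out <- grepl("^[0-9]{2,7}-[0-9]{2}-[0-9]$", x)
  out[is.na(x)] <- NA                              # grepl() returns FALSE for NA
  out
}

cas_format_sketch(c("71-43-2", "71432", "7-43-2", NA))
```

A value would then be accepted only if it passes both this format check and the checksum check.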

Value

A logical vector of the same length as x denoting whether each element of x is a valid CAS RN by the checksum method. NA input values will remain NA.

Note

This is a vectorized, reasonably high-performance version of the is.cas function found in the webchem package. It covers only the checksum portion of webchem::is.cas; as mentioned in Details, combine it with cas_detect to recreate the CAS format + checksum checking in webchem::is.cas. See examples.

Short of looking up against the CAS registry, there is no way to be absolutely sure that even inputs that pass the checksum test are actually registered CAS RNs. The short digit length of CAS IDs combined with the modulo 10 single-digit checksum means that even within a set of randomly generated validly-formatted CAS entities, ~10% will pass checksum.
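The ~10% figure can be checked by enumeration rather than random sampling. The sketch below assumes the standard weighting (rightmost body digit weighted 1) and, for simplicity, considers every 5-digit string (4 body digits plus 1 check digit), ignoring any formatting rule beyond digit count. The variable names are illustrative.

```r
# Enumerate all 100,000 combinations of 4 body digits and 1 check digit
grid <- expand.grid(d1 = 0:9, d2 = 0:9, d3 = 0:9, d4 = 0:9, check = 0:9)

# Expected check digit: weighted sum of the body digits, modulo 10
expected <- (4 * grid$d1 + 3 * grid$d2 + 2 * grid$d3 + 1 * grid$d4) %% 10

# Share of candidates whose final digit matches the expected check digit
pass_rate <- mean(expected == grid$check)
pass_rate
```

Each body has exactly one matching check digit out of ten, so one candidate in ten passes regardless of body length.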

See Also

Other cas_functions: cas_check(), cas_detect()

Examples

cas_good <- c("71-43-2", "18323-44-9", "7732-18-5") #benzene, clindamycin, water
cas_bad  <- c("61-43-2", "18323-40-9", "7732-18-4") #single digit change from good
cas_checkSum(c(cas_good, cas_bad))

slin30/wzMisc documentation built on Jan. 27, 2023, 1 a.m.