concatMatch: Value Matching With Option For Concatenated Terms

concatMatch    R Documentation

Value Matching With Option For Concatenated Terms

Description

This is a match()-like function that allows searching among concatenated terms/IDs. Additional options to remove text patterns, such as terminal lowercase extensions, are available. The function returns a named vector indicating the positions of (first) matches, similar to match.

Usage

concatMatch(
  x,
  table,
  sep = ",",
  sepPattern = NULL,
  globalPat = "digitExtension",
  nomatch = NA_integer_,
  incomparables = NULL,
  extensPat = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Arguments

x

(vector) the values to be matched

table

(vector) the values to be matched against (ie reference)

sep

(character) separator character used to split entries when concatenated terms are tested

sepPattern

(character or NULL) optional custom pattern for splitting concatenations of x and table (in case the default NULL is not sufficient)

globalPat

(character) pattern for additional trimming of search terms. If globalPat="digitExtension", terminal digit extensions (eg the '-4' in 'A1234-4') are not considered when matching

nomatch

(vector) similar to match, the value to be returned when no match is found

incomparables

(vector) similar to match, a vector of values that cannot be matched. Any value in x matching a value in this vector is assigned the nomatch value.

extensPat

(logical) presumably controls whether terminal lowercase extensions (see Description) are removed before matching

silent

(logical) suppress messages

debug

(logical) additional messages for debugging

callFrom

(character) allow easier tracking of messages produced

Details

The main motivation for this function was to handle concatenated entries and to check whether any of the concatenated values match 'x'. The function offers additional options for trimming values before running the main comparison.

Of course, the concatenation strategy must be known, and only a single concatenation separator (which may be multiple characters long) may be used for both x and table. Thus, the result only indicates that at least one of the concatenated terms had a match, but not which one. Finally, both vectors x and table may contain concatenated terms. In this case the function requires considerably more computational resources, since the number of term-by-term comparisons grows quickly with larger vectors.

Please note that in case of multiple-to-multiple matches, only the first hit gets reported.
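
The matching of concatenated terms described above can be sketched in base R. This is only an illustration of the idea, not the implementation used by concatMatch: the helper name firstConcatHit is hypothetical, it assumes a fixed single separator, and it performs no trimming of extensions.

```r
# Hypothetical helper (not part of wrMisc): for each entry of 'x', report the
# first position in 'table' where any concatenated term matches exactly.
firstConcatHit <- function(x, table, sep = ",") {
  xTerms   <- strsplit(x, sep, fixed = TRUE)       # terms of each 'x' entry
  tabTerms <- strsplit(table, sep, fixed = TRUE)   # terms of each 'table' entry
  tabLong  <- unlist(tabTerms)                     # all table terms in one vector
  tabPos   <- rep(seq_along(tabTerms), lengths(tabTerms))  # their origin in 'table'
  out <- vapply(xTerms, function(terms) {
    hit <- tabPos[tabLong %in% terms]
    if (length(hit) > 0) min(hit) else NA_integer_  # only the first hit is kept
  }, integer(1))
  names(out) <- x
  out
}

res <- firstConcatHit(c("AA,Z,Y", "ZZ,Z", "X,E"), c("AA", "WW,Vde,BB-5,E", "CCab"))
res
```

As in the text above, the result only tells that some term of an entry matched and where the first hit lies in table, not which term was responsible.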

The argument globalPat="digitExtension" allows, eg, reducing 'A1234-4' to 'A1234'.
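
The kind of trimming meant here can be mimicked with a regular expression in base R. The pattern below is an assumption chosen to reproduce the 'A1234-4' example; it is not necessarily the pattern concatMatch uses internally.

```r
# Assumed pattern: a hyphen followed by one or more digits at the end of the term
ids <- c("A1234-4", "BB-5", "AA", "A1234")
trimmed <- sub("-[[:digit:]]+$", "", ids)
trimmed
```

Terms without such an extension (here "AA" and "A1234") are left unchanged, so digits that belong to the ID itself are kept.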

Value

This function returns a named vector indicating the positions of (first) matches in table, similar to match. Entries of x without any match are set to the value of nomatch.

See Also

match (for two simple vectors without concatenated terms), grep

Examples

library(wrMisc)
tab1 <- c("AA","BB-5","CCab","FF")                           # no concatenation in 'table'
tab2 <- c("AA","WW,Vde,BB-5,E","CCab","FF,Uef")              # concatenated entries in 'table'
x1 <- c("ZZ","YY","AA","BB-2","DD","CCdef","Dxy")            # modified single IDs (no concatenation)
concatMatch(x1, tab2)
x2 <- c("ZZ,Z","YY,Y","AA,Z,Y","BB-2","DD","X,CCdef","Dxy")  # concatenated entries in 'x'
concatMatch(x2, tab2)
concatMatch(x2, tab1)                                        # concatenation only in 'x'
concatMatch(x1, tab1)                                        # simple case of no concatenation anywhere

wrMisc documentation built on Nov. 17, 2023, 5:09 p.m.