unifyEnumerator: Unify Enumerators


unifyEnumerator    R Documentation

Unify Enumerators

Description

The aim of this function is to help harmonize enumerators at the end of sample-names automatically. When data share the same grouped setup/design, this is often reflected in their names, e.g. 'A_sample1', 'A_sample2' and 'B_sample1'. However, human operators may use several similar (but not identical) ways of expressing the same meaning, e.g. writing 'A_Samp_1'. This function tests a panel of different enumerator extensions and, if one is recognized, replaces it with a user-defined standard text/enumerator. Please note that the more recent function rmEnumeratorName offers better/more flexible options.

Usage

unifyEnumerator(
  x,
  refSep = "_",
  baseSep = c("\\-", "\\ ", "\\."),
  suplEnu = c("Repl", "Rep", "R", "Number", "No", "Sample", "Samp"),
  stringentMatch = TRUE,
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Arguments

x

(character) main input

refSep

(character) separator for output

baseSep

(character) basic separators to test (special characters have to be protected, e.g. "\\.")

suplEnu

(character) additional enumerator text/abbreviations to test in combination with baseSep

stringentMatch

(logical) decide if the enumerator text has to be found in all entries of x or in only one of them

silent

(logical) suppress messages

debug

(logical) display additional messages for debugging

callFrom

(character) allow easier tracking of messages produced

Details

This function has been developed for matching series of the same samples passing in parallel through different evaluation software (see R package wrProteo). The way human operators name things easily leaves room for surprises, and this function tests only a limited number of common ways of writing. Thus, in any case, the user is advised to inspect the results by eye and, if needed, to adjust the parameters.

Basically, enumerator separators are constructed by combining a base-separator baseSep (like '-', '_' etc) and an enumerator-abbreviation suplEnu. Then, all possible combinations are tested for occurrence in the text x. Furthermore, the text searched has to be followed by one or multiple digits at the end of the text-entry (decimal comma-separators etc are not allowed). Thus, if other 'free text' follows to the right of the enumerator-text, this function will not find any enumerators to replace.
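
The combination step described above can be pictured with a few lines of base R. This is only a rough sketch of the idea, not the internal code of unifyEnumerator(); the object names combi, candPat, hits and found are made up for illustration, and the input x is an invented example.

```r
# Illustrative sketch only; NOT the internal implementation of unifyEnumerator()
x <- c("A-Samp1", "A-Samp2", "B-Samp1")            # invented example input
baseSep <- c("\\-", "\\ ", "\\.")                  # defaults from Usage (special characters protected)
suplEnu <- c("Repl", "Rep", "R", "Number", "No", "Sample", "Samp")

# all combinations of base-separator and enumerator-abbreviation
combi <- expand.grid(sep = baseSep, enu = suplEnu, stringsAsFactors = FALSE)

# each candidate has to be followed by one or more digits at the very end of the entry
candPat <- paste0(combi$sep, combi$enu, "[[:digit:]]+$")

# logical matrix: one row per entry of x, one column per candidate pattern
hits <- sapply(candPat, function(p) grepl(p, x))

# candidates that occur in every entry of x
found <- candPat[colSums(hits) == length(x)]
found
```

With this input only the combination of '-' and 'Samp' is found in all entries. An entry such as "A-Samp1 extra" would not match any candidate, since the digits would no longer be at the end of the text.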

The argument stringentMatch defines whether this text has to be found in all text-entries of x or in just one of them. When using stringentMatch=FALSE there is a risk that other text not meant to designate enumerators may be picked up and modified.
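
The difference between the two settings can be illustrated with base R as an 'all' versus 'any' test on a single candidate pattern. This is a hedged sketch of the concept only; the pattern pat and the object isHit are assumptions for illustration and do not reflect how the function is written internally.

```r
# Illustrative sketch only; pat and isHit are hypothetical names
x <- c("ab-1", "c3-2", "dR3")
pat <- "R[[:digit:]]+$"          # candidate: 'R' followed by digits at the end

isHit <- grepl(pat, x)           # which entries contain the candidate
strictOK <- all(isHit)           # idea behind stringentMatch=TRUE: must occur everywhere
looseOK <- any(isHit)            # idea behind stringentMatch=FALSE: one occurrence is enough
c(strict = strictOK, loose = looseOK)
```

Here only "dR3" contains the candidate, so the strict test fails while the loose one succeeds. If the 'R' in "dR3" is part of the sample name rather than an enumerator, the loose setting would pick up text that was not meant to be modified.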

Please note that with large data-sets (i.e. many columns), testing/checking a larger panel of enumerator-abbreviations may result in slower performance. With larger data-sets it may be more efficient to first study the data and then run simple substitutions using sub(), targeted to the specific case.
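
When the naming pattern is already known, such a targeted substitution could look like the following minimal sketch. The input names and the chosen output form '_sample' are assumptions taken from the naming examples in the Description, not output of unifyEnumerator().

```r
# Minimal sketch of a targeted substitution with base R sub()
x <- c("A_Samp_1", "A_Samp_2", "B_Samp_1")     # assumed input names

# replace '_Samp_' plus trailing digits by '_sample' plus the same digits
out <- sub("_Samp_([[:digit:]]+)$", "_sample\\1", x)
out
```

The trailing '$' keeps the substitution restricted to enumerators at the end of each name, so the same text elsewhere in a name is left untouched.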

Value

This function returns a character vector of the same length as the input x, with its content as adjusted enumerators

See Also

rmEnumeratorName for better/more flexible options; grep() or sub(), etc. if exact and consistent patterns are known

Examples

unifyEnumerator(c("ab-1","ab-2","c-3"))
unifyEnumerator(c("ab-R1","ab-R2","c-R3"))
unifyEnumerator(c("ab-1","c3-2","dR3"), stringentMatch=FALSE)


wrMisc documentation built on Sept. 11, 2024, 6:10 p.m.