View source: R/unifyEnumerator.R
unifyEnumerator | R Documentation |
The aim of this function is to provide help in automatically harmonizing enumerators at the end of sample-names.
When data share the same grouped setup/design, this is often reflected in their names, eg 'A_sample1', 'A_sample2' and 'B_sample1'.
However, human operators may use multiple similar (but not identical) ways of expressing the same meaning, eg writing 'A_Samp_1'.
This function allows testing a panel of different enumerator extensions and, if one is recognized, replacing it by a user-defined standard text/enumerator.
Please note that the more recent function rmEnumeratorName
offers better/more flexible options.
unifyEnumerator(
x,
refSep = "_",
baseSep = c("\\-", "\\ ", "\\."),
suplEnu = c("Repl", "Rep", "R", "Number", "No", "Sample", "Samp"),
stringentMatch = TRUE,
silent = FALSE,
debug = FALSE,
callFrom = NULL
)
x |
(character) main input |
refSep |
(character) separator for output |
baseSep |
(character) basic separators to test (you have to protect special characters) |
suplEnu |
(character) additional text (enumerator-abbreviations) to test in combination with baseSep |
stringentMatch |
(logical) decide if enumerator text has to be found in all instances or only once |
silent |
(logical) suppress messages |
debug |
(logical) display additional messages for debugging |
callFrom |
(character) allow easier tracking of messages produced |
This function has been developed for matching series of the same samples passing in parallel through different evaluation software (see R package wrProteo). The way human operators name things easily leaves room for surprises, and this function tests only a limited number of common ways of writing. Thus, in any case, the user is advised to inspect the results by eye and, if needed, to adjust the parameters.
Basically, enumerator separators can be constructed by combining a base-separator baseSep (like '-', '_' etc) and an enumerator-abbreviation suplEnu.
Then, all possible combinations are tested for their occurrence in the text x.
Furthermore, the text searched has to be followed by one or multiple digits at the end of the text-entry (decimal comma-separators etc are not allowed).
Thus, if other 'free text' follows to the right of the enumerator-text, this function will not find any enumerators to replace.
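The combination logic described above can be sketched in base R as follows. This is only an illustration under stated assumptions, not the source code of unifyEnumerator: the variable names cand, pat and found are hypothetical, the list of abbreviations is shortened, and the actual function may differ in details.

```r
# Illustrative sketch only; this is NOT the source code of unifyEnumerator()
x <- c("ab-R1", "ab-R2", "c-R3")
refSep  <- "_"                        # separator used in the output
baseSep <- c("\\-", "\\ ", "\\.")     # regex-protected base-separators
suplEnu <- c("Repl", "Rep", "R")      # shortened list of abbreviations

# every base-separator alone and combined with every abbreviation
cand <- as.vector(outer(baseSep, c("", suplEnu), paste0))
# a candidate must be followed by digits only, up to the end of the entry
pat <- paste0("(", cand, ")([[:digit:]]+)$")

# stringent variant: keep only candidates found in all entries of x
found <- vapply(pat, function(p) all(grepl(p, x)), logical(1))
# replace the first matching candidate by refSep, keeping the digits
out <- if (any(found)) sub(pat[which(found)[1]], paste0(refSep, "\\2"), x) else x
out    # "ab_1" "ab_2" "c_3"
```

Anchoring the pattern with `$` is what makes trailing 'free text' after the digits prevent any match.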
The argument stringentMatch allows defining whether this text has to be found in all text-entries of x or just in one of them.
When using stringentMatch=FALSE there is a risk that other text not meant to designate enumerators may be picked up and modified.
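The difference between both settings can be pictured as requiring a pattern in all entries versus in any entry. The base R lines below are a hedged illustration of that idea with a single hand-written pattern (pat is a hypothetical name); they do not show the internals or the exact return value of unifyEnumerator.

```r
# Illustrative sketch only: 'all entries' versus 'at least one entry'
x <- c("ab-1", "c3-2", "dR3")
pat <- "(\\-)([[:digit:]]+)$"   # hyphen followed by digits at the end

hit <- grepl(pat, x)            # TRUE TRUE FALSE
all(hit)                        # FALSE : fails a stringent (all entries) test
any(hit)                        # TRUE  : passes a non-stringent (any entry) test

# with the non-stringent test only the matching entries get modified
res <- sub(pat, "_\\2", x)
res                             # "ab_1" "c3_2" "dR3"
```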
Please note that with large data-sets (ie many columns) testing/checking a larger panel of enumerator-abbreviations may result in slower performance. In cases of larger data-sets it may be more effective to first study the data and then run simple substitutions using sub() targeted at this very case.
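When the naming pattern is already known, such a targeted substitution could look like the lines below. The sample names and the target style '_sample' are assumptions chosen to mirror the names mentioned in the description.

```r
# Illustrative sketch only: one targeted substitution for a known pattern
x <- c("A_Samp_1", "A_Samp_2", "B_Samp_1")
# replace '_Samp_' plus trailing digits by '_sample' plus the same digits
fixed <- sub("_Samp_([[:digit:]]+)$", "_sample\\1", x)
fixed    # "A_sample1" "A_sample2" "B_sample1"
```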
This function returns a character vector of the same length as input x, with its content as adjusted enumerators
rmEnumeratorName for better/more flexible options; grep or sub(), etc if exact and consistent patterns are known
unifyEnumerator(c("ab-1","ab-2","c-3"))
unifyEnumerator(c("ab-R1","ab-R2","c-R3"))
unifyEnumerator(c("ab-1","c3-2","dR3"), stringentMatch=FALSE)