rmEnumeratorName: Remove or rename enumerator tag/name (or remove entire...

View source: R/rmEnumeratorName.R

rmEnumeratorName    R Documentation

Remove or rename enumerator tag/name (or remove entire enumerator) from trailing enumerators

Description

This function allows identifying, removing or renaming the enumerator tag/name (or removing the entire enumerator) from trailing enumerators (eg 'abc_No1' to 'abc_1'). A panel of potential candidates, built as combinations of separator-symbols and separator text/words, is tested to find one that matches all data. In case the main input is a matrix, all columns are tested independently to find the first column where one specific combination of separator-symbols and separator text/words is found. Several options exist for the output; the combination of separator-symbols and separator text/words found may be included, too.
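The candidate search described above can be pictured with the following minimal base-R sketch. It is not the wrMisc implementation: the helper findEnumPattern() is hypothetical, only combines sepEnum and nameEnum (no trimming or case options), and assumes the candidates contain no regular-expression metacharacters (true for the defaults).

```r
# Hypothetical sketch (not the wrMisc code): return the first (separator, name)
# combination that is followed by trailing digits in EVERY element of x
findEnumPattern <- function(x,
                            nameEnum = c("Number", "No", "#", "Replicate", "Sample"),
                            sepEnum = c(" ", "-", "_")) {
  # all candidate combinations, one row per (separator, name) pair
  cand <- expand.grid(sep = sepEnum, name = nameEnum, stringsAsFactors = FALSE)
  for (i in seq_len(nrow(cand))) {
    # assumption: candidates contain no regex metacharacters
    pat <- paste0(cand$sep[i], cand$name[i], "[[:digit:]]+$")
    if (all(grepl(pat, x))) return(c(sep = cand$sep[i], name = cand$name[i]))
  }
  NULL  # no combination matches all elements
}

findEnumPattern(c("abc_No1", "de_No2", "fgh_No13"))  # sep "_", name "No"
```

For a matrix, the same test could be run on each column in turn (eg via apply(dat, 2, ...)), keeping the first column that returns a match.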

Usage

rmEnumeratorName(
  dat,
  nameEnum = c("Number", "No", "#", "Replicate", "Sample"),
  sepEnum = c(" ", "-", "_"),
  newSep = "",
  incl = c("anyCase", "trim2"),
  silent = FALSE,
  debug = FALSE,
  callFrom = NULL
)

Arguments

dat

(character vector or matrix) main input

nameEnum

(character) potential enumerator-names

sepEnum

(character) potential separators for enumerator-names

newSep

(character) new separator, ie the text inserted in place of the enumerator tag/name that was found

incl

(character) options to include further variants of the enumerator-names: use "rmEnum" for completely removing the enumerator tag/name together with its digits; for different options of trimming the names/tags from nameEnum one may use "anyCase", "trim3" (trimming down to max 3 letters), "trim2" (trimming down to max 2 letters) or "trim1" (trimming down to a single letter); "trim0" works like "trim1" but also includes ' ', ie no enumerator tag/name in front of the digit(s). Use "rmEnumL" to obtain a list as output (see Value).
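As an illustration of what trimming to max n letters may produce, the hypothetical helper trimEnumNames() below (not part of wrMisc) adds truncated copies of the names given in nameEnum; how the package itself builds its variants may differ.

```r
# Hypothetical helper (not part of wrMisc): keep the original names and add
# copies truncated to at most 'n' letters, dropping duplicates
trimEnumNames <- function(nameEnum, n) {
  unique(c(nameEnum, substr(nameEnum, 1, n)))
}

trimEnumNames(c("Number", "No", "#", "Replicate", "Sample"), 2)
# "Number" "No" "#" "Replicate" "Sample" "Nu" "Re" "Sa"
```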

silent

(logical) suppress messages

debug

(logical) display additional messages for debugging

callFrom

(character) allow easier tracking of messages produced

Details

Please note that checking a variety of separator text-words and separator-symbols may result in a large number of combinations to test. In particular, when automatic trimming of separator text-words is added (eg incl="trim2"), the complexity of the associated searches increases quickly. Thus, with large data sets it is suggested to restrict the arguments nameEnum, sepEnum and (in particular) newSep to the most probable terms/options, to help reduce demands on memory and CPU.
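A rough way to see how quickly the search grows is to count the (separator, name) pairs with expand.grid(). This sketch assumes each pair gives one candidate pattern that is tested against every element of dat; the shortened names are typed by hand for illustration and are not taken from the package.

```r
nameEnum <- c("Number", "No", "#", "Replicate", "Sample")
sepEnum  <- c(" ", "-", "_")
# one candidate per (separator, name) pair: 3 x 5
nDefault <- nrow(expand.grid(sep = sepEnum, name = nameEnum))
nDefault  # 15
# adding shortened forms (hand-written here, similar to a 2-letter trim)
nameMore <- c(nameEnum, "Nu", "Re", "Sa")
nMore <- nrow(expand.grid(sep = sepEnum, name = nameMore))
nMore     # 24
```

Under this assumption the total work scales with the number of candidates times the number of elements (and columns) of dat, which is why short nameEnum and sepEnum vectors pay off on large data.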

In case the input dat is a matrix and multiple different enumerator types are found, only the first column (from the left) will be treated. If you wish to remove/substitute multiple types of enumerators, the function rmEnumeratorName must be run independently (eg on each column), see the last example below.
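To illustrate the "first column from the left" rule for one known pattern (here '#' followed by trailing digits), the following base-R sketch checks each column of the matrix from the last example and keeps the first one in which all elements match. It is an illustration only, not the package code.

```r
# matrix as in the last example (cbind() turns everything into character)
xx <- c("hg_Re1", "hjRe2_Re2", "hk-Re3_Re33")
xz <- cbind(a = 11:13, b = c("23#11", "4#2", "567#333"), c = xx)
# TRUE for each column where every element ends with '#' plus digits
hit <- apply(xz, 2, function(col) all(grepl("#[[:digit:]]+$", col)))
firstCol <- which(hit)[1]
firstCol  # column 'b', ie column 2
```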

Value

This function returns a corrected vector (or matrix), or, if incl contains "rmEnumL", a list containing $dat (corrected data), $pattern (the combination of separator-symbols and separator text/words found) and, if the input is a matrix, $column (the column of the input that was identified and treated)

See Also

When the exact pattern is known, grep and sub may allow direct manipulations that are much faster.
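For instance, if all elements are known to end with '_No' followed by digits, base R sub() and grep() can be applied directly (a minimal sketch):

```r
x <- c("abc_No1", "de_No2", "fgh_No13")
# drop the tag 'No' but keep the separator and the digits
sub("_No([[:digit:]]+)$", "_\\1", x)  # "abc_1" "de_2" "fgh_13"
# indices of the elements carrying this enumerator
grep("_No[[:digit:]]+$", x)           # 1 2 3
```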

Examples

xx <- c("hg_Re1","hjRe2_Re2","hk-Re3_Re33")
rmEnumeratorName(xx)
rmEnumeratorName(xx, newSep="--")
rmEnumeratorName(xx, incl="anyCase")

xy <- cbind(a=11:13, b=c("11#11","2_No2","333_samp333"), c=xx)
rmEnumeratorName(xy)
rmEnumeratorName(xy,incl=c("anyCase","trim2","rmEnumL"))

xz <- cbind(a=11:13, b=c("23#11","4#2","567#333"), c=xx)
apply(xz, 2, rmEnumeratorName, sepEnum=c("","_"), newSep="_", silent=TRUE)


wrMisc documentation built on Sept. 11, 2024, 6:10 p.m.