View source: R/rmEnumeratorName.R
rmEnumeratorName | R Documentation |
This function allows identifying, removing or renaming the enumerator tag/name (or removing the entire enumerator) from trailing enumerators (eg 'abc_No1' to 'abc_1'). A panel of candidate combinations of separator-symbols and separator text/words is tested to find one that matches all data. If the main input is a matrix, all columns are tested independently to find the first column where one specific combination of separator-symbols and separator text/words is found. Several options exist for the output; the combination of separator-symbols and separator text/words found may be included, too.
rmEnumeratorName(
dat,
nameEnum = c("Number", "No", "#", "Replicate", "Sample"),
sepEnum = c(" ", "-", "_"),
newSep = "",
incl = c("anyCase", "trim2"),
silent = FALSE,
debug = FALSE,
callFrom = NULL
)
dat |
(character vector or matrix) main input |
nameEnum |
(character) potential enumerator-names |
sepEnum |
(character) potential separators for enumerator-names |
newSep |
(character) new separator to use in the output in place of the removed enumerator-name |
incl |
(character) options to include further variants of the enumerator-names (eg "anyCase", "trim2"); use "rmEnumL" to return a list (see the return value below) |
silent |
(logical) suppress messages |
debug |
(logical) display additional messages for debugging |
callFrom |
(character) allow easier tracking of messages produced |
Please note that checking many different separator text-words and separator-symbols may give a large number of combinations to test. In particular, when automatic trimming of separator text-words is added (eg incl="trim2"), the complexity of the associated searches increases quickly. Thus, with large data-sets it is suggested to restrict the content of the arguments nameEnum, sepEnum and (in particular) newSep to the most probable terms/options, which helps reduce demands on memory and CPU.
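To get a feel for how quickly the number of candidates grows, the following base-R sketch counts the name/separator combinations for the default arguments. It is only an illustration and makes no claim about how rmEnumeratorName builds or tests its patterns internally; the object names combi, nCombi and nCombiSmall are made up for this example.

```r
## Illustration only (not the package's internal code):
## count the name/separator combinations for the default arguments
nameEnum <- c("Number", "No", "#", "Replicate", "Sample")
sepEnum <- c(" ", "-", "_")
combi <- expand.grid(nameEnum=nameEnum, sepEnum=sepEnum, stringsAsFactors=FALSE)
nCombi <- nrow(combi)                 # 5 names x 3 separators = 15

## restricting both arguments to the most probable terms shrinks the panel
combiSmall <- expand.grid(nameEnum=c("No", "#"), sepEnum="_", stringsAsFactors=FALSE)
nCombiSmall <- nrow(combiSmall)       # 2 names x 1 separator = 2
```

Options such as case variants or trimmed names would add further candidates on top of these counts.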
If the input dat is a matrix and multiple different enumerator-types are found, only the first column (from the left) will be treated. If you wish to remove/substitute multiple types of enumerators, the function rmEnumeratorName must be run independently, see the last example below.
This function returns a corrected vector (or matrix), or a list if incl="rmEnumL" containing $dat (corrected data), $pattern (the combination of separator-symbols and separator text/words found) and, if the input is a matrix, $column (which column of the input was identified and treated).
When the exact pattern is known, grep and sub may allow much faster direct manipulation.
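As a minimal sketch of that direct route, assume the enumerator is already known to be '_Re' followed by digits at the end of each entry. The pattern and the object names hasEnum and xxNew are assumptions chosen for this illustration; only base-R grepl and sub are used.

```r
## Assumption: every enumerator has the form '_Re<digits>' at the end of the string
xx <- c("hg_Re1", "hjRe2_Re2", "hk-Re3_Re33")

## check which entries carry the known tag
hasEnum <- grepl("_Re[[:digit:]]+$", xx)

## keep the digits (captured group) and replace the tag '_Re' by '_'
xxNew <- sub("_Re([[:digit:]]+)$", "_\\1", xx)
xxNew      # "hg_1" "hjRe2_2" "hk-Re3_33"
```

Because the pattern is anchored with $, an earlier 'Re2' or 'Re3' inside the name is left untouched.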
xx <- c("hg_Re1","hjRe2_Re2","hk-Re3_Re33")
rmEnumeratorName(xx)
rmEnumeratorName(xx, newSep="--")
rmEnumeratorName(xx, incl="anyCase")
xy <- cbind(a=11:13, b=c("11#11","2_No2","333_samp333"), c=xx)
rmEnumeratorName(xy)
rmEnumeratorName(xy,incl=c("anyCase","trim2","rmEnumL"))
xz <- cbind(a=11:13, b=c("23#11","4#2","567#333"), c=xx)
apply(xz, 2, rmEnumeratorName, sepEnum=c("","_"), newSep="_", silent=TRUE)