uniqtag | R Documentation |
Abbreviate strings to unique substrings of k characters.
uniqtag(xs, k = 9, uniq = make_unique_all_or_none, sep = "-")
xs | a character vector of strings to abbreviate |
k | the length of each tag in characters, an integer |
uniq | a function used to make the tags unique, such as make_unique, make_unique_duplicates, make_unique_all_or_none, make_unique_all, or make.unique; use identity or NULL to disable this step |
sep | a character string separating a duplicated tag from its sequence number |
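As an illustration of what the uniq and sep arguments control, base R's make.unique (one of the listed choices) appends sep and a sequence number to repeated strings and leaves the first occurrence unchanged. The snippet below calls make.unique directly on a hypothetical vector of tags; exactly how uniqtag forwards sep to the uniq function is an assumption not shown here.

```r
# Hypothetical tags, two of which collide.
tags <- c("abc", "abc", "xyz")

# make.unique keeps the first "abc" and suffixes later duplicates with sep and a number.
deduped <- make.unique(tags, sep = "-")
print(deduped)

# identity leaves duplicate tags untouched, which is what disabling uniq amounts to.
unchanged <- identity(tags)
print(unchanged)
```

Here deduped is c("abc", "abc-1", "xyz"), while unchanged still contains the duplicate "abc".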
For each string in a set of strings, determine a tag: a substring of fixed length k that is unique to that string, if such a substring exists. If no unique substring exists, the least frequent substring is used. If several unique substrings exist, the lexicographically smallest one is used. This lexicographically smallest substring of length k is called the UniqTag of that string.
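The selection rule above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the helper names kmers_of and uniqtag_sketch are hypothetical, a string shorter than k is assumed to be its own tag, a substring's frequency is assumed to be the number of strings containing it, ties among the least frequent substrings are assumed to be broken lexicographically as well, and no uniq step is applied.

```r
# Hypothetical helper: all substrings of length k of the string x.
# Assumption: a string no longer than k is returned whole.
kmers_of <- function(x, k) {
  n <- nchar(x)
  if (n <= k) return(x)
  substring(x, 1:(n - k + 1), k:n)
}

# Hypothetical sketch of the UniqTag selection rule (no uniq step).
uniqtag_sketch <- function(xs, k = 9) {
  # Distinct substrings of each string, so each string counts a substring once.
  kmers <- lapply(xs, function(x) unique(kmers_of(x, k)))
  # Number of strings that contain each substring.
  counts <- table(unlist(kmers))
  vapply(kmers, function(km) {
    freq <- as.vector(counts[km])
    # Keep the least frequent substrings (frequency 1 means unique),
    # then take the lexicographically smallest under the current collation.
    candidates <- km[freq == min(freq)]
    sort(candidates)[1]
  }, character(1))
}

tags <- uniqtag_sketch(c("Alabama", "Alaska", "Arizona"), k = 3)
print(tags)
```

For these three names the shared substring "Ala" is skipped, giving the tags "aba", "ask" and "Ari". Without a uniq step, two identical inputs such as c("aaaa", "aaaa") both receive the tag "aa", which is the situation the uniq argument addresses.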
The lexicographically smallest substring depends on the locale's sort order.
You may wish to first call Sys.setlocale("LC_COLLATE", "C") to obtain tags that do not depend on the locale.
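To see why the collation locale matters, the base R snippet below sorts a small hypothetical vector under the C locale, which orders strings by character code, so every uppercase ASCII letter precedes every lowercase one. Ordering under other locales varies by platform, so it is not asserted here.

```r
# Remember the current collation, then switch to the C locale.
old <- Sys.getlocale("LC_COLLATE")
invisible(Sys.setlocale("LC_COLLATE", "C"))

# Under C collation, "Cherry" sorts before the lowercase words.
sorted_c <- sort(c("banana", "apple", "Cherry"))
print(sorted_c)

# Restore the previous collation.
invisible(Sys.setlocale("LC_COLLATE", old))
```

In many non-C locales the same call would instead place "Cherry" last, which is why a UniqTag chosen as "lexicographically smallest" can differ between machines.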
Returns a character vector of the UniqTags of the strings in xs.
See also: abbreviate, locales, make.unique.
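library(uniqtag)

# Use C collation so the tags do not depend on the locale.
Sys.setlocale("LC_COLLATE", "C")

# State names without spaces.
states <- sub(" ", "", state.name)

# Tags of the default length (k = 9) and of shorter lengths.
uniqtags <- uniqtag(states)
uniqtags4 <- uniqtag(states, k = 4)
uniqtags3 <- uniqtag(states, k = 3)
uniqtags3x <- uniqtag(states, k = 3, uniq = make_unique)

# Compare the distribution of string lengths before and after abbreviation.
table(nchar(states))
table(nchar(uniqtags))
table(nchar(uniqtags4))
table(nchar(uniqtags3))
table(nchar(uniqtags3x))

# Default k = 3 tags of the states whose make_unique tags contain "-".
uniqtags3[grep("-", uniqtags3x)]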