shortest_unique_abbreviation: Find the shortest abbreviation to retain unique values
In jmw86069/jamses: Jam SummarizedExperiment Stats

Description

Find the shortest abbreviation of each value in a character vector such that the abbreviated values remain unique.

Usage

shortest_unique_abbreviation(
  x,
  retain_contig_numbers = TRUE,
  verbose = FALSE,
  ...
)

Arguments

x

character vector

retain_contig_numbers

logical, default TRUE, whether numbers at the end of an abbreviated string should remain contiguous.

When TRUE, a numeric value is not split partway through the number.
When FALSE, the string is abbreviated at the first position that gives uniqueness.

verbose

logical, default FALSE, whether to print verbose output.

...

additional arguments are ignored.

Details

This function is intended to abbreviate factor levels used in statistical contrasts, reducing each to the shortest substring that still distinguishes the unique entries provided in x.

For example, c("one", "two", "three", "four") would be converted to c("on", "tw", "th", "fo").
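
The example above is consistent with truncating every value to the smallest common length at which the truncated values are still distinct. A minimal base R sketch of that idea follows. The helper name sketch_unique_prefix is hypothetical and is not part of jamses, and the package's actual implementation may differ.

```r
# Hypothetical helper, not part of jamses: truncate all unique values of x
# to the smallest common length at which they remain distinct.
sketch_unique_prefix <- function(x) {
  ux <- unique(as.character(x))
  if (length(ux) == 0) return(character(0))
  for (n in seq_len(max(nchar(ux)))) {
    abbrev <- substr(ux, 1, n)
    # anyDuplicated() returns 0 when every abbreviation is distinct
    if (!anyDuplicated(abbrev)) {
      return(setNames(abbrev, ux))
    }
  }
  # no shorter length works, so keep the full values
  setNames(ux, ux)
}

sketch_unique_prefix(c("one", "two", "three", "four"))
# values: "on" "tw" "th" "fo", named by the original entries
```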

The default retain_contig_numbers=TRUE will attempt to retain numeric values at the end of a string, to avoid splitting the number at an intermediate position. This option only applies when the character substring is not already unique before encountering the numeric substring.

* For this input: c("a", "p6", "p12", "p21") the output keeps the contiguous numbers together: c("a", "p6", "p12", "p21")
* For this input: c("a", "b6", "c12", "d21") only the first character is retained, because it is already unique: c("a", "b", "c", "d")
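
One way to picture the effect of retain_contig_numbers=TRUE is as a second step after truncation: when the cut lands inside a number, the abbreviation is extended through the remaining digits. The sketch below illustrates that step only. The helper sketch_extend_digits and its argument n (the truncation length) are assumptions for illustration and are not part of jamses.

```r
# Hypothetical helper, not part of jamses: truncate each string to n
# characters, then extend the cut through any digits it would have split.
sketch_extend_digits <- function(x, n) {
  prefix <- substr(x, 1, n)
  rest <- substring(x, n + 1)
  # digits immediately after the cut point (empty string if none)
  next_digits <- sub("^([0-9]*).*$", "\\1", rest)
  # only extend when the prefix itself ends in a digit
  ends_in_digit <- grepl("[0-9]$", prefix)
  ifelse(ends_in_digit, paste0(prefix, next_digits), prefix)
}

x <- c("a", "p6", "p12", "p21")
substr(x, 1, 2)              # "a" "p6" "p1" "p2": numbers are split
sketch_extend_digits(x, 2)   # "a" "p6" "p12" "p21": numbers kept whole

# The letters alone are already unique, so no digits are reached
sketch_extend_digits(c("a", "b6", "c12", "d21"), 1)   # "a" "b" "c" "d"
```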

Value

character vector named by the unique values in x, whose values are the shortest abbreviated substrings that remain unique.

Todo

Consider some method to retain contiguous numbers at the end of a long string, while abbreviating the long string.
- For this input: c("adult", "prenatal6", "prenatal12", "prenatal21") the ideal output would be: c("a", "p6", "p12", "p21")
- To be fair, I do not know how to describe this logic. It may require breaking into words by character/non-character breakpoints, then applying substring to each?
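
As one possible reading of this idea, the sketch below splits each value into leading text and a trailing run of digits, abbreviates only the leading text, and then re-attaches the digits. It is an illustration under that assumption, not the package's method. The helper name sketch_abbreviate_stem is hypothetical, and only a trailing number is handled.

```r
# Hypothetical sketch, not part of jamses: abbreviate the leading text of
# each value and keep any trailing digits intact.
sketch_abbreviate_stem <- function(x) {
  ux <- unique(as.character(x))
  if (length(ux) == 0) return(character(0))
  # split each value into leading text (stem) and a trailing run of digits
  stem <- sub("[0-9]+$", "", ux)
  digits <- substring(ux, nchar(stem) + 1)
  # smallest common prefix length that keeps the distinct stems distinct
  ustem <- unique(stem)
  n <- max(nchar(ustem))
  for (i in seq_len(n)) {
    if (!anyDuplicated(substr(ustem, 1, i))) {
      n <- i
      break
    }
  }
  out <- paste0(substr(stem, 1, n), digits)
  # fall back to the full values if re-attaching digits caused a collision
  if (anyDuplicated(out)) out <- ux
  setNames(out, ux)
}

sketch_abbreviate_stem(c("adult", "prenatal6", "prenatal12", "prenatal21"))
# values: "a" "p6" "p12" "p21"
```

Stems that themselves contain digits can collide after abbreviation, which is why the sketch falls back to the unabbreviated values in that case.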

Examples

x <- c("a", "p6", "p12", "p21");
shortest_unique_abbreviation(x)

# retain_contig_numbers=TRUE is the default, so this is equivalent to the call above
shortest_unique_abbreviation(x, retain_contig_numbers=TRUE)

x1 <- c("male", "female");
shortest_unique_abbreviation(x1)

x2 <- c("Control", "Nicotine");
shortest_unique_abbreviation(x2)

x3 <- c("Control", "Nicotine10", "Nicotine12", "Nicotine20");
shortest_unique_abbreviation(x3)

x4 <- c("one", "two", "three", "four");
shortest_unique_abbreviation(x4)

jmw86069/jamses documentation built on Nov. 4, 2024, 9:25 p.m.