simplifyNames: Removes common substrings (infixes) in a set of strings.

View source: R/fcn_misc.R

simplifyNamesR Documentation

Removes common substrings (infixes) in a set of strings.

Description

Usually handy for plots, where condition names should be as concise as possible. E.g. you do not want names like 'TK20130501_H2M1_010_IMU008_CISPLA_E3_R1.raw' and 'TK20130501_H2M1_026_IMU008_CISPLA_E7_R2.raw' but rather 'TK.._010_I.._E3_R1.raw' and 'TK.._026_I.._E7_R2.raw'

If multiple such substrings exist, the algorithm will remove the longest first and iterate a number of times (two by default) to find the second/third etc longest common substring. Each substring must fulfill a minimum length requirement - if its shorter, its not considered worth removing and the iteration is aborted.

Usage

simplifyNames(
  strings,
  infix_iterations = 2,
  min_LCS_length = 7,
  min_out_length = 7
)

Arguments

strings

A vector of strings which are to be shortened

infix_iterations

Number of successive rounds of substring removal

min_LCS_length

Minimum length of the longest common substring (default:7, minimum: 6)

min_out_length

Minimum length of shortest element of output (no shortening will be done which causes output to be shorter than this threshold)

Value

A list of shortened strings, with the same length as the input

Examples

#library(PTXQC)
simplifyNames(c('TK20130501_H2M1_010_IMU008_CISPLA_E3_R1.raw',
                'TK20130501_H2M1_026_IMU008_CISPLA_E7_R2.raw'), infix_iterations = 2)
# --> "TK.._010_I.._E3_R1.raw","TK.._026_I.._E7_R2.raw"

try(simplifyNames(c("bla", "foo"), min_LCS_length=5))
# --> error, since min_LCS_length must be >=6


PTXQC documentation built on July 26, 2023, 5:27 p.m.