fcommon: Fast Identify Common Substrings In A Pair Of Strings
In akin: Functional Utilities for Data Processing

fcommon

R Documentation


Description

Checks and identifies substrings that are common to a pair of strings.

Usage

fcommon(x, y)

Arguments

x, y

Character vectors of length 1 each: a string, such as a protein chain. y may be missing.

Details

This utility identifies common substrings in the x, y pair of strings by isolating sequences of identical characters in both strings, which are then packed into substrings and validated. All one-character substrings are removed. When y is missing, x is cleaved at each letter, producing all substrings of 2 or more characters.
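The behaviour described above can be approximated with a naive base R sketch. It is not the implementation used by fcommon: the helper names all_substrings and naive_common are hypothetical, and the sketch assumes that every contiguous substring of 2 or more characters counts as valid. It enumerates the substrings of each string and intersects the two sets, which mirrors step 1.3 of the Examples.

```r
# Hypothetical helper: every contiguous substring of s with at least
# min_len characters, de-duplicated and sorted.
all_substrings <- function(s, min_len = 2L) {
  n <- nchar(s)
  if (n < min_len) return(character(0))
  starts <- seq_len(n - min_len + 1L)
  out <- unlist(lapply(starts, function(i) {
    # substring() is vectorised over the end position
    substring(s, i, seq.int(i + min_len - 1L, n))
  }))
  sort(unique(out))
}

# Hypothetical helper: substrings of at least min_len characters
# that occur in both x and y.
naive_common <- function(x, y, min_len = 2L) {
  sort(intersect(all_substrings(x, min_len), all_substrings(y, min_len)))
}

naive_common("abcde", "xbcdy")
```

The sketch enumerates a number of substrings that grows quadratically with string length, so it is only suitable for short inputs. The order produced by sort() depends on the locale and may differ from the order fcommon returns.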

Value

A sorted character vector of common substrings of length >= 2 characters each. When y is missing from the call, a sorted character vector of valid substrings in x of length >= 2 characters each.

Examples


if (interactive()) {

  # 1. Check for common substrings in the pair below

  x = 'dvvmtqsplslpvtpgepasiscrssqslaktyrvvsvltvlhqdwlngkeykckvv'
  y = 'mtqspltyrvvsvltvlhqdwlngkeykcksnkalpapiektisk'

  # 1.1 Common substrings
  system.time(a <- fcommon(x, y))
  print(head(a, 30))

  # 1.2 Cleaving (slow on very long strings!)
  system.time(aa <- fcommon(x))
  system.time(bb <- fcommon(y))

  # 1.3 Identical results
  A = sort(intersect(aa, bb))                  # common substrings
  identical(a, A)                              # TRUE

  # 2. Different methods for valid substrings

  x = 'tyrvvsvltvlhqdwlngkeykck'

  # 2.1 Combinations matrix (slower!)
  system.time(am <- cover(x, valid. = TRUE))   # valid substrings

  # 2.2 String cleaving
  system.time(ac <- fcommon(x))                # valid substrings

  identical(am, ac)                            # TRUE

}

akin documentation built on May 19, 2026, 5:07 p.m.