character-utils: Some utility functions to operate on strings

Description

Some low-level string utilities that operate on ordinary character vectors. For more advanced string manipulations, see the Biostrings package.

Usage

unstrsplit(x, sep="")  # 'sep' default is "" (empty string)

strsplitAsListOfIntegerVectors(x, sep=",")  # 'sep' default is ","

Arguments

x

For unstrsplit: A list-like object where each list element is a character vector, or a character vector (in which case unstrsplit acts as the identity).

For strsplitAsListOfIntegerVectors: A character vector where each element is a string containing decimal integer values separated by sep (a comma by default).

sep

A single string containing the separator. For unstrsplit, the separator can span several characters (the Examples use " => "). For strsplitAsListOfIntegerVectors, the separator must be a single-byte character.

Details

unstrsplit

unstrsplit(x, sep) is equivalent to, but much faster than, sapply(x, paste0, collapse=sep). It performs the reverse transformation of strsplit( , fixed=TRUE): if x is a character vector with no NAs and sep is a single string, then unstrsplit(strsplit(x, split=sep, fixed=TRUE), sep) is identical to x. The notable exception is when strsplit finds a match at the end of a string. In that case the last element of the output (which should normally be an empty string) is not returned (see ?strsplit for the details), so the trailing separator is lost in the round trip.
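The round trip and its exception can be checked with base R alone. The sketch below uses the sapply() form quoted above as a stand-in for unstrsplit(), so it runs without the package; the input strings are made up for this illustration.

```r
## Base-R stand-in for unstrsplit(), using the equivalent sapply() form.
## The input strings are hypothetical and chosen to cover each case.
x <- c("a,b,c", "no-separator", "trailing,", ",leading")

y <- strsplit(x, split=",", fixed=TRUE)
x2 <- sapply(y, paste0, collapse=",")

## A separator at the start of a string yields a leading "" in 'y', so
## that string round-trips. A separator at the end yields no trailing "",
## so that string comes back without its final separator.
roundtrip_ok <- x == x2
data.frame(x, x2, roundtrip_ok)
```

Only the string that ends with the separator should fail to round-trip; the one that starts with it is recovered unchanged.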

strsplitAsListOfIntegerVectors

strsplitAsListOfIntegerVectors is similar to the strsplitAsListOfIntegerVectors2 function shown in the Examples section below, except that the former generally raises an error where the latter would have inserted an NA in the returned object. More precisely:

When it fails, strsplitAsListOfIntegerVectors will print an informative error message. Finally, strsplitAsListOfIntegerVectors is faster and uses much less memory than strsplitAsListOfIntegerVectors2.
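To see why the strsplit() plus as.integer() approach can hide malformed input, here is a minimal base-R sketch of how that combination treats a few problematic strings. The inputs are hypothetical, and the sketch only exercises base R; it does not call strsplitAsListOfIntegerVectors, whose exact error conditions are not reproduced here.

```r
## Hypothetical malformed inputs: a non-numeric token, a non-integer
## value, and a value too large for R's integer type.
bad <- c("1,2,x", "10.3,4", "99999999999,5")

tmp <- strsplit(bad, ",", fixed=TRUE)

## as.integer() emits warnings here rather than errors; they are
## suppressed only to keep the output of this illustration short.
res <- suppressWarnings(lapply(tmp, as.integer))
res
```

In base R, the non-numeric token and the out-of-range value each become NA, and 10.3 is truncated to 10, so every string produces a result even though none of them is a well-formed list of integers.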

Value

unstrsplit returns a character vector with one string per list element in x.

strsplitAsListOfIntegerVectors returns a list where each list element is an integer vector. There is one list element per string in x.

Author(s)

Hervé Pagès

See Also

Examples

## ---------------------------------------------------------------------
## unstrsplit()
## ---------------------------------------------------------------------
x <- list(A=c("abc", "XY"), B=NULL, C=letters[1:4])
unstrsplit(x)
unstrsplit(x, sep=",")
unstrsplit(x, sep=" => ")

data(islands)
x <- names(islands)
y <- strsplit(x, split=" ", fixed=TRUE)
x2 <- unstrsplit(y, sep=" ")
stopifnot(identical(x, x2))

## But...
names(x) <- x
y <- strsplit(x, split="in", fixed=TRUE)
x2 <- unstrsplit(y, sep="in")
y[x != x2]
## In other words: strsplit() behavior sucks :-/

## ---------------------------------------------------------------------
## strsplitAsListOfIntegerVectors()
## ---------------------------------------------------------------------
x <- c("1116,0,-19",
       " +55291 , 2476,",
       "19184,4269,5659,6470,6721,7469,14601",
       "7778889, 426900, -4833,5659,6470,6721,7096",
       "19184 , -99999")

y <- strsplitAsListOfIntegerVectors(x)
y

## In normal situations (i.e. when the input is well-formed),
## strsplitAsListOfIntegerVectors() actually does the same as the
## function below but is more efficient (both in speed and memory
## footprint):
strsplitAsListOfIntegerVectors2 <- function(x, sep=",")
{
    tmp <- strsplit(x, sep, fixed=TRUE)
    lapply(tmp, as.integer)
}
y2 <- strsplitAsListOfIntegerVectors2(x)
stopifnot(identical(y, y2))

AdamLeckenby/S4Vectors_Fix documentation built on May 23, 2019, 2:42 p.m.