tsubstring: substring and transpose the resulting list efficiently

Description

substrings and tsubstrings work in a similar fashion to strsplit and tstrsplit: they split a string, or a vector of strings, at specified positions. Like tstrsplit, tsubstrings is designed for efficient use within data.table; however, it splits string columns at specific indices or widths instead of by regular expressions.
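As a rough base-R illustration (the strings and positions are made up), a regular-expression split needs a delimiter, while a positional split only needs character positions. Transposing the per-string pieces then gives one vector per piece, which is the layout tstrsplit returns:

```r
x <- c("AB-CD", "EF-GH")

# Regular-expression split (what tstrsplit builds on): needs the "-" delimiter
by_regex <- strsplit(x, "-", fixed = TRUE)

# Positional split (the idea behind substrings): characters 1-2 and 4-5
by_position <- lapply(x, substring, first = c(1, 4), last = c(2, 5))

# Transpose: one vector per piece instead of one vector per input string
transposed <- lapply(seq_along(by_position[[1]]),
                     function(i) vapply(by_position, `[`, character(1), i))
```

Both splits produce the same pieces here; only the positional one would still work if the strings had no delimiter.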

substrings is an extended version of substring that supports a variety of cutting behaviors, selected through its arguments.

tsubstr is a basic version of tsubstrings that only calls substr to do its subsetting.

tsubstring is a basic version of tsubstrings that only calls substring to do its subsetting. It is slightly faster than tsubstrings, but has less functionality.

tsub does the same thing as the other transpose wrapper functions above, but lets you supply the string-handling function of your choice.

Usage

substrings(text, cuts = -1L, widths = FALSE, recycle = FALSE,
  extra = FALSE)

tsubstrings(x, cuts, widths = FALSE, recycle = FALSE, extra = FALSE,
  fill = NA, type.convert = FALSE, give.names = NULL)

tsubstr(x, start, stop, fill = NA, type.convert = FALSE,
  give.names = FALSE)

tsubstring(text, first, last = 1000000L, fill = NA, type.convert = FALSE,
  give.names = FALSE)

tsub(X, FUN, ..., fill = NA, type.convert = FALSE, give.names = FALSE)

Arguments

text

A character vector.

cuts

A vector of indices, or of string widths, used to cut the character vector into substrings. By default, cuts = -1. Any negative integer causes the function to return the string unaltered.

widths

Default is FALSE. When TRUE, the values in cuts are interpreted as widths instead of indices.

recycle

FALSE by default. When TRUE, the cuts (or widths) sequence is repeated until the end of the string is reached.

extra

FALSE by default. When TRUE, any characters not covered by the cuts sequence are appended to the end of the returned list.

x

The vector to split (and transpose), usually a column wrapped in a data.table.

fill

Default is NA. Used to pad shorter list elements so that every element of the transposed result has the same length.

type.convert

FALSE by default. If TRUE, type.convert is called with as.is = TRUE on the resulting columns.

give.names

This setting is relevant when tsubstrings is not used within data.table, and is NULL by default. Column names can be passed to this argument; TRUE sets names of the form V1, V2, and so on; FALSE suppresses names entirely. If give.names is not supplied and cuts is named, the names of cuts are used. As with tstrsplit, give.names defaults to FALSE for tsubstr, tsubstring and tsub.

start

An integer giving the position of the first character to extract.

stop

An integer giving the position of the last character to extract.

first

An integer (or integer vector) giving the position of the first character to extract.

last

An integer (or integer vector) giving the position of the last character to extract.

X

a vector (atomic or list) or an expression object. Other objects (including classed objects) will be coerced by base::as.list.

FUN

the function to be applied to each element of X: see ‘Details’. In the case of functions like +, %*%, the function name must be backquoted or quoted.

...

optional arguments to FUN.
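How cuts, widths, recycle and extra combine is easiest to see with concrete positions. The sketch below uses a hypothetical base-R helper, split_at, which is not part of the package. It assumes that an index cut marks the last character of a piece, that recycle repeats the implied piece widths, and that extra appends whatever characters remain after the last piece; the package's own substrings may behave differently in edge cases.

```r
# Hypothetical helper (not the package's substrings): maps cuts/widths/
# recycle/extra onto base substring() under the assumptions stated above.
split_at <- function(text, cuts, widths = FALSE, recycle = FALSE, extra = FALSE) {
  if (any(cuts < 0)) return(text)               # negative cut: string unaltered
  n <- nchar(text)
  w <- if (widths) cuts else diff(c(0, cuts))   # convert index cuts to widths
  if (recycle) {
    reps <- ceiling(n / sum(w))                 # enough repeats to reach the end
    w <- rep(w, times = reps)
  }
  last <- cumsum(w)
  first <- c(1, last[-length(last)] + 1)
  keep <- first <= n                            # drop pieces starting past the end
  first <- first[keep]
  last <- pmin(last[keep], n)                   # truncate a partial final piece
  pieces <- substring(text, first, last)
  if (extra && max(last) < n) {
    pieces <- c(pieces, substring(text, max(last) + 1, n))  # leftover characters
  }
  pieces
}

txt <- "ABCDEFGHIJKLMN"
split_at(txt, c(1, 2, 4))                                 # cuts as indices
split_at(txt, c(1, 2, 4), widths = TRUE)                  # cuts as widths
split_at(txt, c(1, 2, 4), widths = TRUE, recycle = TRUE)  # widths repeated
split_at(txt, c(1, 2, 4), widths = TRUE, extra = TRUE)    # remainder appended
```

Under these assumptions, the recycled-widths call gives "A", "BC", "DEFG", "H", "IJ", "KLMN".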

Details

tsubstrings internally calls substrings first, and then transpose on the result. The give.names argument can be used to return an automatically named list, although it has no effect when used with :=, which requires names to be provided explicitly. It may be useful in other scenarios.
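The transpose step can be sketched in base R as follows. transpose_fill is a hypothetical stand-in for data.table::transpose(l, fill = NA), written only to show how fill pads strings that produced fewer pieces and how give.names = TRUE yields names V1, V2, and so on; it is not the package's implementation.

```r
# Hypothetical stand-in for data.table::transpose(l, fill = NA):
# element i of the result collects the i-th piece of every input string,
# padding with `fill` where a string produced fewer pieces.
transpose_fill <- function(l, fill = NA, give.names = FALSE) {
  n <- max(lengths(l))
  out <- lapply(seq_len(n), function(i) {
    vapply(l, function(el) if (i <= length(el)) el[[i]] else as.character(fill),
           character(1))
  })
  if (isTRUE(give.names)) names(out) <- paste0("V", seq_len(n))
  out
}

# Ragged pieces, e.g. from splitting strings of different lengths with recycling
pieces <- list(c("AB", "CD", "EF"), c("GH", "IJ"), "KL")
res <- transpose_fill(pieces, give.names = TRUE)
```

Here res$V2 is c("CD", "IJ", NA): the third string had no second piece, so fill takes its place.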

tsubstr internally calls substr first, and then transpose on the result.

tsubstring internally calls substring first, and then transpose on the result.
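The practical difference between tsubstr and tsubstring comes from base R: substring recycles text, first and last to a common length, whereas substr returns one result per element of x. A short illustration (the string is arbitrary):

```r
s <- "ABCDEFGH"

# substring() recycles text, first and last to the longest of the three,
# so a single string yields one piece per (first, last) pair
by_substring <- substring(s, first = c(1, 3, 5, 7), last = c(2, 4, 6, 8))

# substr() returns one result per element of x, so with a single string
# only the first start/stop pair is used
by_substr <- substr(s, start = c(1, 3, 5, 7), stop = c(2, 4, 6, 8))
```

This is presumably why the Examples pass a single start and stop to tsubstr but vectors of positions to tsubstring.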

tsub internally calls the function supplied in the FUN argument, and then calls transpose on the result. At its core, it evaluates transpose(lapply(X, FUN, ...)). This is very similar to calling apply(X, 1, FUN) to apply a function row by row. I suspect using apply is more efficient in general.
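The similarity between transpose(lapply(X, FUN, ...)) and apply(X, 1, FUN) can be checked in base R. In this sketch a small vapply-based transpose stands in for data.table::transpose, and x is wrapped in a one-column matrix because apply needs an array; both are assumptions of the sketch, not the package's code.

```r
x <- c("ABCDEFGH", "IJKLMNOP", "QRSTUVWX")
first <- c(1, 3, 5, 7)
last  <- c(2, 4, 6, 8)

# The core of tsub: apply FUN to each element, then transpose
per_string <- lapply(x, substring, first = first, last = last)
via_lapply <- lapply(seq_along(first),
                     function(i) vapply(per_string, `[`, character(1), i))

# apply() over the rows of a one-column matrix: each column of the result
# holds one string's pieces, so row i lines up with via_lapply[[i]]
via_apply <- apply(matrix(x), 1, substring, first = first, last = last)
```

apply returns a 4 x 3 matrix whose i-th row holds the i-th piece of every string, so it matches the transposed list element by element; whether it is faster is not measured here.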

Value

For substrings, a vector of substrings split at the given indices.

For tsubstrings, a transposed list after splitting by the indices provided.

For tsubstr, a transposed list after subsetting by start and stop.

For tsubstring, a transposed list after subsetting by first and last.

For tsub, a transposed list after subsetting by the FUN function and its arguments.
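Each transposed result is a list with one element per piece, where element i holds the i-th piece of every input string (the same layout tstrsplit returns). The sketch below applies type.convert with as.is = TRUE to each element of such a list, which is what the type.convert argument is described as doing; the list itself is invented for illustration.

```r
# Invented transposed result: three strings cut into a code and a number
res <- list(c("AB", "CD", "EF"), c("01", "22", "35"))

# What type.convert = TRUE is documented to do: convert each column,
# keeping character columns as character (as.is = TRUE, no factors)
converted <- lapply(res, type.convert, as.is = TRUE)
```

The first element stays character, while the second becomes the integer vector c(1L, 22L, 35L).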

TODO

Add a "." placement function, like setf.

See Also

substring

tstrsplit, transpose

tstrsplit, transpose

lapply, apply, transpose

Examples

library(data.table)
library(wccmh)  # package documented on this page

txt <- "ABCDEFGHIJKLMN"
cuts <- c(1, 2, 4)

# substrings(): each combination of widths, recycle and extra
substrings(txt, cuts, widths = FALSE, recycle = FALSE, extra = FALSE)
substrings(txt, cuts, widths = TRUE , recycle = FALSE, extra = FALSE)
substrings(txt, cuts, widths = FALSE, recycle = TRUE , extra = FALSE)
substrings(txt, cuts, widths = TRUE , recycle = TRUE , extra = FALSE)
substrings(txt, cuts, widths = FALSE, recycle = TRUE , extra = TRUE )
substrings(txt, cuts, widths = TRUE , recycle = TRUE , extra = TRUE )
substrings(txt, cuts, widths = FALSE, recycle = FALSE, extra = TRUE )
substrings(txt, cuts, widths = TRUE , recycle = FALSE, extra = TRUE )

# tsubstrings() inside data.table; named cuts supply the column names
cnames <- c("one", "third", "two")
cuts <- setNames(cuts, cnames)
widths <- FALSE; recycle <- FALSE; extra <- TRUE; fill <- NA
give.names <- NULL  # or cnames, to name the columns explicitly
x <- rep(txt, 3)
DT <- data.table(x = x)
DT[, tsubstrings(x, cuts, give.names = give.names, extra = extra,
                 recycle = recycle, widths = widths, fill = fill)]

# A single start/stop with tsubstr(); vectors of positions with tsubstring()
DT[, tsubstr(x, start = 3, stop = 4)]
DT[, tsubstring(text = x, first = c(1, 3, 5, 7), last = c(2, 4, 6, 8))]

# The same pieces row by row with apply(), and with tsub()
DT[, apply(.SD, MARGIN = 1, FUN = substring,
           first = c(1, 3, 5, 7), last = c(2, 4, 6, 8)),
   .SDcols = "x"]
DT[, tsub(X = x, FUN = substring, first = c(1, 3, 5, 7), last = c(2, 4, 6, 8))]

JamesDalrymple/wccmh documentation built on May 7, 2019, 10:20 a.m.