concat.split: Split Concatenated Cells in a Dataset


Description

The concat.split function takes a column with multiple values, splits the values into a list or into separate columns, and returns a new data.frame or data.table.

Usage

concat.split(data, split.col, sep = ",", structure = "compact",
  mode = NULL, type = NULL, drop = FALSE, fixed = FALSE,
  fill = NA, ...)

Arguments

data

The source data.frame or data.table.

split.col

The variable that needs to be split; can be specified either by the column number or the variable name.

sep

The character separating each value (defaults to ",").

structure

Can be either "compact", "expanded", or "list". Defaults to "compact". See Details.

mode

Can be either "binary" or "value". The default, "binary", recodes values to 1 or NA, like Boolean data but without assuming 0 when data is not available. This setting only applies when structure = "expanded"; a warning message will be issued if it is used with other structures.

type

Can be either "numeric" or "character" ("numeric" is the default). This setting only applies when structure = "expanded"; a warning message will be issued if it is used with other structures.

drop

Logical (whether to remove the original variable from the output or not). Defaults to FALSE.

fixed

Logical. Should the sep value be treated as a fixed string (TRUE) or as a regular expression (FALSE)? See Details.

fill

The "fill" value for missing values when structure = "expanded". Defaults to NA.

...

Additional arguments to cSplit().

Details

structure
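
To make the difference between mode = "binary" and mode = "value" concrete, here is a minimal base R sketch of an "expanded" layout. It is not the package's implementation: it assumes that the expanded layout allots one column per integer from 1 to the largest value present, and the names cells, width, and expand_row are hypothetical.

```r
# Base R illustration only; concat.split() itself is not called here.
cells <- c("1,3", "2", "1,2,3")            # hypothetical concatenated cells

# Split each cell on the separator and convert the pieces to integers.
values <- lapply(strsplit(cells, ",", fixed = TRUE), as.integer)

# Assumption: one output column per integer from 1 to the largest value.
width <- max(unlist(values))

# Build one expanded row: positions named by the values are filled,
# everything else stays NA.
expand_row <- function(v, mode) {
  out <- rep(NA_integer_, width)
  out[v] <- if (mode == "binary") 1L else v
  out
}

binary <- t(sapply(values, expand_row, mode = "binary"))
value  <- t(sapply(values, expand_row, mode = "value"))

print(binary)  # presence is marked with 1, absence with NA
print(value)   # presence is marked with the value itself
```

In the binary result, the first row is 1, NA, 1; in the value result, the same row is 1, NA, 3. Absent values stay NA rather than 0 in both cases.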

fixed
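
The distinction between a fixed separator and a regular expression can be illustrated with base R's strsplit(). This is a sketch of the general idea only, not of how concat.split() processes sep internally; the sample strings are hypothetical.

```r
# Base R illustration only; concat.split() itself is not called here.

# A regular expression can match any one of several separators.
by_regex <- strsplit(c("a,b;c", "d|e"), "[,;|]")
print(by_regex)

# A fixed separator is matched literally, so "." means a literal dot.
literal <- strsplit("1.2.3", ".", fixed = TRUE)
print(literal)

# Read as a regular expression, "." matches every character,
# so the split yields only empty strings.
as_regex <- strsplit("1.2.3", ".")
print(as_regex)
```

When the separator contains characters that are special in regular expressions (such as "." or "|"), either treat it as fixed or escape it.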

Note

This is more of a "legacy" or "convenience" wrapper function encompassing the features available in the separated functions of cSplit(), cSplit_l(), and cSplit_e().
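
As a rough sketch of that relationship, the calls below pair each structure value with the specialised function that appears to cover the same ground. The toy data frame is hypothetical, and the pairings are an assumption based on this note rather than a statement of how the wrapper dispatches internally.

```r
library(splitstackshape)

# Hypothetical toy data, not shipped with the package.
toy <- data.frame(id = 1:3,
                  vals = c("1,3", "2", "1,2,3"),
                  stringsAsFactors = FALSE)

# The wrapper, once per structure.
compact  <- concat.split(toy, "vals", structure = "compact")
as_list  <- concat.split(toy, "vals", structure = "list")
expanded <- concat.split(toy, "vals", structure = "expanded",
                         mode = "value", type = "numeric")

# The specialised functions called directly.
# cSplit() drops the original column by default, so keep it explicitly.
direct_compact  <- cSplit(toy, "vals", sep = ",", drop = FALSE)
direct_list     <- cSplit_l(toy, "vals", sep = ",")
direct_expanded <- cSplit_e(toy, "vals", sep = ",",
                            mode = "value", type = "numeric")

str(as_list)
```

For new code, calling the specialised functions directly makes the intended output shape explicit.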

Author(s)

Ananda Mahto

See Also

cSplit(), cSplit_l(), cSplit_e()

Examples

## Load some data
temp <- head(concat.test)

# Split up the second column, selecting by column number
concat.split(temp, 2)

# ... or by name, and drop the original column from the output
concat.split(temp, "Likes", drop = TRUE)

# The "Hates" column uses a different separator
concat.split(temp, "Hates", sep = ";", drop = TRUE)

## Not run: 
# You'll get a warning here: mode only applies when structure = "expanded"
concat.split(temp, 2, mode = "value", drop = TRUE)

## End(Not run)

# Try again. Notice the differing number of resulting columns
concat.split(temp, 2, structure = "expanded",
             mode = "value", type = "numeric", drop = TRUE)

# Let's try splitting some strings... Same syntax
concat.split(temp, 3, drop = TRUE)

# Strings can also be split to binary representations
concat.split(temp, 3, structure = "expanded",
             type = "character", fill = 0, drop = TRUE)

# Split up the "Likes" column into a list variable; retain original column
head(concat.split(concat.test, 2, structure = "list", drop = FALSE))

# View the structure of the output to verify
# that the new column is a list; note the
# difference between "Likes" and "Likes_list".
str(concat.split(temp, 2, structure = "list", drop = FALSE))

mrdwab/splitstackshape documentation built on May 23, 2019, 7:16 a.m.