
dt.combine    R Documentation

Combines values of partially duplicated columns from a data.table into new columns.

Description

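Combines the values of partially duplicated columns of a data.table, such as the '.x' and '.y' columns created by merge(), into new columns, thereby reducing the number of missing values.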

Usage

dt.combine(DT, col1 = NULL, col2 = NULL, keep.colname = NULL, check.len = TRUE)

Arguments

DT

A data.table resulting from a merge() operation. By default, partially duplicated columns (columns that hold identical values at some rows, while at other rows only one of the two contains an NA) are detected automatically from their column-name suffixes '.x' and '.y', and combined into new columns, which reduces the number of missing values. The original duplicated columns are then removed.
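The suffix-based detection described above can be sketched in base R as follows. This is a minimal illustration, assuming that a pair is any column stem present with both a '.x' and a '.y' suffix; find_suffix_pairs() is a hypothetical helper, not part of DTrsiv.

```r
# Hypothetical helper (not part of DTrsiv): return the column stems that
# occur with both the ".x" and the ".y" suffix, as produced by merge().
find_suffix_pairs <- function(nms) {
  # "\\." matches a literal dot; "$" anchors the suffix at the end
  stems_x <- sub("\\.x$", "", nms[grepl("\\.x$", nms)])
  stems_y <- sub("\\.y$", "", nms[grepl("\\.y$", nms)])
  # Keep only the stems present with both suffixes
  intersect(stems_x, stems_y)
}

find_suffix_pairs(c("col1", "col2.x", "col2.y", "col3.x"))  # "col2"
```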

col1

A character string giving the name of a column of DT that you suspect to be a partial duplicate of column col2. If col1 is NULL, dt.combine() looks for duplicated columns automatically (Default: col1 = NULL).

col2

A character string giving the name of a column of DT that you suspect to be a partial duplicate of column col1. If col2 is NULL, dt.combine() looks for duplicated columns automatically (Default: col2 = NULL).
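For an explicit pair col1/col2, the definition of "partially duplicated" given for DT can be written as a small test. The sketch below assumes that the two columns must agree at every row where both hold a value, share at least one such row, and differ in missingness at least once; is_partial_dup() is hypothetical and not DTrsiv's code.

```r
# Hypothetical helper (not part of DTrsiv): TRUE when x and y match at every
# row where both are non-NA, share at least one such row, and have at least
# one row where exactly one of them is NA.
is_partial_dup <- function(x, y) {
  both <- !is.na(x) & !is.na(y)      # rows where both columns hold a value
  one_na <- xor(is.na(x), is.na(y))  # rows where only one column is NA
  any(both) && all(x[both] == y[both]) && any(one_na)
}

is_partial_dup(c("a", "b", NA), c("a", NA, "c"))  # TRUE
is_partial_dup(c("a", "b"), c("z", NA))           # FALSE
```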

keep.colname

An integer equal to 1 or 2, or NULL. If 1, the resulting combined column is named after 'col1'. If 2, it is named after 'col2'. If NULL, keep.colname is not used to name the resulting combined column (Default: keep.colname = NULL).
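The naming rule can be sketched as below. Only the 1 and 2 branches come from the description above; the NULL branch, which strips the '.x'/'.y' suffixes and refuses to guess when the stems differ, is an assumption modelled on the example where 'col2.x' and 'col2.y' become 'col2'. combined_name() is a hypothetical helper.

```r
# Hypothetical naming rule (an assumption, not DTrsiv's code).
combined_name <- function(col1, col2, keep.colname = NULL) {
  if (!is.null(keep.colname)) {
    if (keep.colname == 1) return(col1)  # name after col1
    if (keep.colname == 2) return(col2)  # name after col2
    stop("keep.colname must be 1, 2 or NULL")
  }
  # NULL: assumed to drop the merge() suffixes when both names share a stem
  stem1 <- sub("\\.x$", "", col1)
  stem2 <- sub("\\.y$", "", col2)
  if (stem1 != stem2) stop("no common name; set keep.colname to 1 or 2")
  stem1
}

combined_name("col2.x", "col2.y")     # "col2"
combined_name("a", "b", keep.colname = 2)  # "b"
```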

check.len

A logical specifying whether the length of each value in the resulting combined column should be checked (Default: check.len = TRUE) or not (check.len = FALSE). If check.len = TRUE and any value has a length greater than 1, an error is returned. Setting check.len = FALSE can be useful, and is advised when you know that some values in the columns you want to combine contain whitespace.
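One way to read the check, assumed here rather than taken from DTrsiv, is that the "length" of a combined value is the number of distinct non-NA values found in a row across the two columns; more than one means the columns conflict at that row. combine_values() is a hypothetical sketch of that reading.

```r
# Hypothetical sketch, not DTrsiv's implementation.
combine_values <- function(x, y, check.len = TRUE) {
  # Distinct non-NA values found in each row of the two columns
  vals <- lapply(seq_along(x), function(i) {
    v <- c(x[i], y[i])
    unique(v[!is.na(v)])
  })
  lens <- lengths(vals)
  if (check.len && any(lens > 1)) {
    stop("more than one value in row(s): ",
         paste(which(lens > 1), collapse = ", "))
  }
  # Fill the NAs of x from y; where both hold a value, x is kept
  out <- x
  fill <- is.na(out)
  out[fill] <- y[fill]
  out
}

combine_values(c("a", NA, NA), c("a", "b", NA))  # "a" "b" NA
```

With check.len = FALSE, a conflicting row simply keeps the value from x in this sketch.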

Value

A data.table with duplicated columns removed, and resulting combined columns appended on the right.

Author(s)

Yoann Pageaud.

Examples

library(data.table)

dtbl1 <- data.table(col1 = rev(seq(16)),
                    col2 = c(rep(x = c("hello", "world"), 4), rep(NA, 8)))
dtbl2 <- data.table(col1 = rev(seq(16)),
                    col2 = c(rep(NA, 4), rep(x = c("hello", "world"), 6)))
# 'dtbl1' and 'dtbl2' are both missing values in 'col2', but at different rows.

dtbl.mrg <- merge(x = dtbl1, y = dtbl2, by = "col1")
dtbl.mrg
# The 2nd column is named 'col2' in both 'dtbl1' and 'dtbl2', so merge()
# appends '.x' and '.y' to it, giving 'col2.x' and 'col2.y'.

# Are 'col2.x' and 'col2.y' partially duplicated?
dt.combine(dtbl.mrg) # Yes!
# 'col2.x' and 'col2.y' have been combined into 'col2'.
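For readers without DTrsiv installed, the example can be approximated in base R. combine_suffix_pairs() is a hypothetical sketch that works on a data.frame rather than a data.table, and it assumes that pairs are found by their '.x'/'.y' suffixes, that conflicting pairs are rejected, and that NAs in the '.x' column are filled from the '.y' column. It is not DTrsiv's implementation.

```r
# Hypothetical sketch, not DTrsiv's code: combine every 'name.x'/'name.y'
# pair of a data.frame into a single column 'name'.
combine_suffix_pairs <- function(df) {
  nms <- names(df)
  stems <- intersect(sub("\\.x$", "", nms[grepl("\\.x$", nms)]),
                     sub("\\.y$", "", nms[grepl("\\.y$", nms)]))
  for (s in stems) {
    x <- df[[paste0(s, ".x")]]
    y <- df[[paste0(s, ".y")]]
    # Refuse to combine if the columns disagree where both hold a value
    both <- !is.na(x) & !is.na(y)
    if (any(x[both] != y[both])) stop("conflicting values in '", s, "'")
    # Fill the NAs of the '.x' column from the '.y' column
    fill <- is.na(x)
    x[fill] <- y[fill]
    # Drop the original pair and append the combined column on the right
    df[[paste0(s, ".x")]] <- NULL
    df[[paste0(s, ".y")]] <- NULL
    df[[s]] <- x
  }
  df
}

d1 <- data.frame(col1 = rev(seq(16)),
                 col2 = c(rep(c("hello", "world"), 4), rep(NA, 8)))
d2 <- data.frame(col1 = rev(seq(16)),
                 col2 = c(rep(NA, 4), rep(c("hello", "world"), 6)))
res <- combine_suffix_pairs(merge(d1, d2, by = "col1"))
names(res)       # "col1" "col2"
anyNA(res$col2)  # FALSE
```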

YoannPa/DTrsiv documentation built on Nov. 28, 2022, 5:44 p.m.