rm_dup: Remove duplicates from a data frame


View source: R/rm_dup.R

Description

rm_dup() finds all rows in a data frame that share the same entry in a target column and returns a data frame in which only one row of each set of duplicates is retained: the most complete row or, among equally complete rows, the first or the last.
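As a rough illustration of "first" versus "last" retention, base R's duplicated() with and without fromLast = TRUE selects the same rows when all duplicates are equally complete. This is a simplification, not rm_dup() itself: it ignores the preference for rows with more non-NA values, and unlike rm_dup() it would also collapse repeated NA keys.

```r
# Simplified illustration using base R's duplicated(); this is not rm_dup()
# and ignores its preference for rows with more non-NA values.
df <- data.frame(x = c(1, 3, 1, 1, 5),
                 y = c(3, 1, 4, 8, 10))

# Analogue of keep_last = FALSE: keep the first row of each set of duplicates
first_kept <- df[!duplicated(df$x), ]

# Analogue of keep_last = TRUE: keep the last row of each set of duplicates
last_kept <- df[!duplicated(df$x, fromLast = TRUE), ]

print(first_kept)
print(last_kept)
```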

Usage

rm_dup(df, ind_col, keep_last = FALSE)

Arguments

df

data frame

ind_col

character value giving the name of the column to be searched for duplicate entries

keep_last

logical value indicating whether the last, instead of the first, row of each set of duplicates should be retained. Defaults to FALSE, i.e. the first row is retained. This choice applies among rows with the same number of non-NA values (see Details).

rm_na

logical value. If set to TRUE, rows with NA in the column given by ind_col are removed.
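A minimal sketch of what rm_na = TRUE is described to do, assuming it simply drops rows whose target column is NA. drop_na_target() is a hypothetical helper for illustration, not part of the package.

```r
# Hypothetical helper: drop rows whose target column is NA, as rm_na = TRUE
# is described to do. Not part of the package.
drop_na_target <- function(df, ind_col) {
  df[!is.na(df[[ind_col]]), , drop = FALSE]
}

df <- data.frame(x = c(1, NA, 5, NA),
                 y = c(3, 1, 4, 8))
cleaned <- drop_na_target(df, "x")
print(cleaned)
```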

Details

rm_dup() finds all rows in a data frame that share the same entry in a target column and returns a data frame from which the duplicates have been removed. For each set of duplicates, it first retains the row with the most non-missing (non-NA) values. Second, if several rows in a set tie for the most non-NA values (for example, when more than one row has non-NA values in all columns), either the first (keep_last = FALSE) or the last (keep_last = TRUE) of these rows is kept. Rows with NA in the target column are left as they are, even if there are several of them, unless rm_na = TRUE, in which case they are removed. If no duplicates are found, the data frame is returned as-is.
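The two-step rule above can be sketched in base R as follows. rm_dup_sketch() is a hypothetical name and this is not the package's source: it assumes "most complete" means the largest count of non-NA values across all columns, it has no rm_na argument, and it returns the kept rows in their original order, whereas the package function may order its output differently.

```r
# Minimal base-R sketch of the two-step de-duplication rule.
# rm_dup_sketch() is hypothetical; it is not the package's implementation.
rm_dup_sketch <- function(df, ind_col, keep_last = FALSE) {
  key <- df[[ind_col]]
  n_present <- rowSums(!is.na(df))   # number of non-NA values in each row
  keep <- is.na(key)                 # rows with an NA key are always kept
  for (k in unique(key[!is.na(key)])) {
    idx <- which(!is.na(key) & key == k)   # rows sharing this key
    # Step 1: candidates with the most non-NA values
    best <- idx[n_present[idx] == max(n_present[idx])]
    # Step 2: break remaining ties by position
    chosen <- if (keep_last) best[length(best)] else best[1]
    keep[chosen] <- TRUE
  }
  df[keep, , drop = FALSE]
}

df <- data.frame(x = c(1, 3, 1, 1, 5, 8, 5, 9, 10, NA),
                 y = c(3, 1, 4, 8, 10, 8, 9, 10, 11, 5))
first_res <- rm_dup_sketch(df, "x")
last_res <- rm_dup_sketch(df, "x", keep_last = TRUE)

# Step 1 in action: complete rows 2 and 3 beat row 1, which has an NA
df2 <- data.frame(x = c(1, 1, 1), y = c(NA, 2, 3))
tie_first <- rm_dup_sketch(df2, "x")
tie_last <- rm_dup_sketch(df2, "x", keep_last = TRUE)
```

Keeping the NA-key rows via the initial keep vector mirrors the Details statement that such rows are left untouched, even when several share NA.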

See Also

order

Examples

# x repeats 1 (rows 1, 3, 4) and 5 (rows 5, 7) and has one NA (row 10)
df <- data.frame(x = c(1, 3, 1, 1, 5, 8, 5, 9, 10, NA),
                 y = c(3, 1, 4, 8, 10, 8, 9, 10, 11, 5))
rm_dup(df, "x")

AnonZebra/synratss documentation built on Oct. 9, 2021, 2:31 a.m.