delete_duplicates_DF: Delete rows with duplicated values

delete_duplicates_DF    R Documentation

Delete rows with duplicated values

Description

Delete the rows of a data frame whose values in a given variable are duplicated.

Usage

delete_duplicates_DF(
  data,
  duplicated.var,
  exact = FALSE,
  stay = "first",
  choose.var,
  choose.stay.val,
  pattern,
  mc.cores = 1,
  verbose = TRUE
)

Arguments

data

data frame to be processed

duplicated.var

vector (for example, a column such as data$name) that is searched for duplicated values

exact

logical; if TRUE, values are matched as is (exact matching); if FALSE, matching is not exact

stay

character; which of the rows with duplicated values is kept. Possible values are "first" (the first of the duplicated rows is kept), "choose" (the kept row depends on the value of another variable) and "none" (all rows whose values contain pattern are removed)

choose.var, choose.stay.val

choose.var is a vector (an additional variable) used to choose the preferred row, and choose.stay.val is its preferred value; used only if stay = "choose"

pattern

pattern(s) to delete: rows whose values contain them are removed; used only if stay = "none"

mc.cores

integer; number of processors for parallel computation (not supported on Windows)

verbose

logical; if TRUE, messages are shown
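To picture the difference that exact makes, here is a small base R sketch. flag_duplicates() is a hypothetical helper, not part of disprose, and the substring rule used for exact = FALSE is an assumption inferred from the examples below (which compare 1 with 11 and "A" with "AA"), not documented behaviour.

```r
# Hypothetical helper (not part of disprose): flags positions whose value
# repeats an earlier value.
# exact = TRUE : only identical values count (base R duplicated()).
# exact = FALSE: ASSUMPTION - a value also counts if it contains an
#                earlier value as a fixed substring (e.g. "AA" contains "A").
flag_duplicates <- function(x, exact = FALSE) {
  x <- as.character(x)
  if (exact) {
    return(duplicated(x))
  }
  vapply(seq_along(x), function(i) {
    earlier <- x[seq_len(i - 1)]  # values before position i
    any(vapply(earlier, function(e) grepl(e, x[i], fixed = TRUE), logical(1)))
  }, logical(1))
}

N <- c(1:5, 11:15)
flag_duplicates(N, exact = TRUE)   # no identical values: all FALSE
flag_duplicates(N, exact = FALSE)  # 11 to 15 contain "1": last five TRUE
```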

Details

This function checks whether duplicated.var contains repeated values. If it does and stay = "first", the first row with each duplicated value is kept and the others are deleted. If stay = "choose", the first of the duplicated rows for which choose.var equals choose.stay.val is kept; if no such row exists, the first row is kept.
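The "first" and "choose" rules can be sketched in base R as follows, assuming exact matching. keep_rows() is a hypothetical helper that only mirrors the selection logic described above; it is not the disprose implementation and ignores exact and mc.cores.

```r
data <- data.frame(
  N = c(1:5, 11:15),
  name = c(rep("A", 4), "AA", rep("B", 3), "BB", "C"),
  choose = c(rep(c("yes", "no"), 3), "yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Hypothetical sketch: returns a logical vector marking the rows to keep.
keep_rows <- function(dup, stay = "first", choose.var = NULL, choose.stay.val = NULL) {
  dup <- as.character(dup)
  keep <- logical(length(dup))       # all FALSE to start
  for (v in unique(dup)) {
    idx <- which(dup == v)           # rows sharing this value
    chosen <- idx[1]                 # default: the first such row
    if (stay == "choose") {
      pref <- idx[which(choose.var[idx] == choose.stay.val)]
      if (length(pref) > 0) chosen <- pref[1]  # first row with the preferred value
    }
    keep[chosen] <- TRUE
  }
  keep
}

data[keep_rows(data$name), ]                                 # rows 1, 5, 6, 9, 10
data[keep_rows(data$name, "choose", data$choose, "yes"), ]   # rows 1, 5, 7, 9, 10
```

For "B" (rows 6 to 8) the "choose" rule picks row 7, the first of them with choose = "yes"; for "BB" no row has "yes", so the sketch falls back to the first row, as the text describes.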

If stay = "none", all rows whose values contain pattern are removed.
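A minimal base R sketch of the stay = "none" case. drop_pattern_rows() is hypothetical, and the way it uses exact (whole-value match when TRUE, fixed substring match when FALSE) is an assumption about the package's behaviour, not taken from its source.

```r
data <- data.frame(
  N = c(1:5, 11:15),
  name = c(rep("A", 4), "AA", rep("B", 3), "BB", "C"),
  choose = c(rep(c("yes", "no"), 3), "yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# Hypothetical sketch of stay = "none" (not the disprose implementation).
drop_pattern_rows <- function(data, values, pattern, exact = FALSE) {
  values <- as.character(values)
  hit <- if (exact) {
    values %in% pattern  # ASSUMPTION: remove values equal to a pattern
  } else {
    # ASSUMPTION: remove values containing any pattern as a fixed substring
    Reduce(`|`, lapply(pattern, function(p) grepl(p, values, fixed = TRUE)), FALSE)
  }
  data[!hit, , drop = FALSE]
}

drop_pattern_rows(data, data$name, c("A", "B"), exact = TRUE)   # keeps AA, BB, C
drop_pattern_rows(data, data$name, c("A", "B"), exact = FALSE)  # keeps C
```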

Value

A data frame without the rows identified as duplicates in duplicated.var (or, if stay = "none", without the rows whose values contain pattern).

Author(s)

Elena N. Filatova

Examples

data <- data.frame(N = c(1:5, 11:15), name = c(rep("A", 4), "AA", rep("B", 3), "BB", "C"),
                   choose = c(rep(c("yes", "no"), 3), "yes", "yes", "no", "no"))
delete_duplicates_DF(data = data, duplicated.var = data$N, exact = TRUE, stay = "first")
delete_duplicates_DF(data = data, duplicated.var = data$N, exact = FALSE, stay = "first")
delete_duplicates_DF(data = data, duplicated.var = data$name, exact = TRUE, stay = "first")
delete_duplicates_DF(data = data, duplicated.var = data$name, exact = TRUE,
                     stay = "choose", choose.var = data$choose, choose.stay.val = "yes")
delete_duplicates_DF(data = data, duplicated.var = data$name, exact = FALSE, stay = "first")
delete_duplicates_DF(data = data, duplicated.var = data$name, exact = FALSE,
                     stay = "choose", choose.var = data$choose, choose.stay.val = "yes")
delete_duplicates_DF(data = data, duplicated.var = data$name, stay = "none",
                     pattern = c("A", "B"), exact = TRUE)
delete_duplicates_DF(data = data, duplicated.var = data$name, stay = "none",
                     pattern = c("A", "B"), exact = FALSE)


disprose documentation built on March 19, 2022, 2:15 a.m.