ez.duplicated: duplicated

View source: R/frame.R

ez.duplicated    R Documentation

duplicated

Description

Find duplicated rows/columns in a data frame, or duplicated elements in a vector of any data type (factor, character, numeric).
ez.notduplicated is not the same as unique: it returns the unique/distinct values minus any value that is duplicated, i.e., only the values that occur exactly once.
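
The difference from unique can be illustrated with base R alone. The following is a minimal sketch that does not call ez.notduplicated; the variable names are illustrative, and it only mimics the behaviour described above.

```r
x <- c(1, 1, 4, 5, 4, 6)

# unique() keeps one copy of every value, including repeated ones
u <- unique(x)

# Flag every occurrence of a repeated value (first and later copies alike)
dup_all <- duplicated(x) | duplicated(x, fromLast = TRUE)

# Keep only the values that occur exactly once
once <- x[!dup_all]

print(u)     # 1 4 5 6
print(once)  # 5 6
```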

Usage

ez.duplicated(
  x,
  col = NULL,
  vec = TRUE,
  vecgroup = FALSE,
  dim = 1,
  incomparables = FALSE,
  value = FALSE,
  keepall = TRUE,
  ...
)

ez.notduplicated(
  x,
  col = NULL,
  vec = TRUE,
  dim = 1,
  incomparables = FALSE,
  value = FALSE,
  keepall = TRUE,
  ...
)

Arguments

x

a data frame or a vector/col of any data type (factor, char, numeric)

col

The columns in which to search for duplicates, evaluated by eval('dplyr::select()'); e.g., 3, c(3), 2:5, "place", c("place","age").
If x is a data frame and col is specified (e.g., "cond"), only those columns are checked.
If x is a data frame and col is unspecified (the default, NULL), all columns in x are checked.
If x is not a data frame, col is ignored.

vec

TRUE/FALSE. If TRUE, returns a logical vector indicating duplicates.
If FALSE, returns a data frame with a single logical column 'Duplicated', which is useful for binding with other data frames.
The TRUE/FALSE values can be replaced by 0, 1, 2, etc.; see vecgroup.

vecgroup

TRUE/FALSE. If TRUE, returns a vector of 0, 1, 2, etc. indicating duplicates, where
0 = not duplicated (FALSE); 1 = duplicates group 1; 2 = duplicates group 2, etc. Only available in ez.duplicated.

dim

1 = find duplicated rows, 2 = find duplicated columns. dim has no effect when x is a vector.

incomparables

A vector of values that cannot be compared. FALSE is a special value, meaning that all values can be compared,
and may be the only value accepted for methods other than the default. It will be coerced internally to the same type as x.
Not applicable when x is a data frame (see https://stackoverflow.com/a/29730485/2292993), but works when x is a vector.

value

TRUE/FALSE. If TRUE, returns the actual duplicated values instead of logicals. The returned data type is the same as the input (data frame -> data frame, factor -> factor, etc.), because the result is obtained purely by slicing with logicals. Overrides vec and vecgroup.

keepall

TRUE/FALSE. Only applicable when value=TRUE (otherwise ignored). When col is specified, value returns only those columns; use keepall=TRUE to return all columns of the input data frame.
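
To make the return shapes described by vec, vecgroup, value and keepall more concrete, here is a rough base R sketch. It does not call ez.duplicated and is not taken from its source. The data frame, the variable names, and in particular the order in which group numbers are assigned are assumptions made for illustration.

```r
df <- data.frame(
  place = c("a", "b", "a", "c", "b"),
  age   = c(10, 20, 30, 40, 50),
  stringsAsFactors = FALSE
)

# Restrict the search to one column (the role played by col = "place")
key <- df["place"]

# Logical vector marking every row whose key is repeated (like vec = TRUE)
dup_all <- duplicated(key) | duplicated(key, fromLast = TRUE)

# One-column data frame, convenient for cbind() (like vec = FALSE)
dup_df <- data.frame(Duplicated = dup_all)

# Group labels: 0 = not duplicated, 1, 2, ... = one label per repeated value
# (like vecgroup = TRUE; numbering by first appearance is an assumption)
group <- integer(nrow(df))
group[dup_all] <- match(df$place[dup_all], unique(df$place[dup_all]))

# Actual duplicated rows, all columns (like value = TRUE, keepall = TRUE)
rows_all <- df[dup_all, , drop = FALSE]

# Actual duplicated rows, searched column only (like keepall = FALSE)
rows_key <- df[dup_all, "place", drop = FALSE]

print(group)
print(rows_all)
```

Slicing with a logical index keeps the input type, which is why value=TRUE can return a data frame for a data frame and a factor for a factor.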

Value

The return value depends on the arguments; see vec above. By default, missing values are regarded as equal; to avoid that, pass incomparables=NA.
This differs from the built-in R duplicated, which flags only the second and later occurrences of a value. For x <- c(1, 1, 4, 5, 4, 6):
duplicated(x) returns [1] FALSE TRUE FALSE FALSE TRUE FALSE
ez.duplicated(x) returns [1] TRUE TRUE TRUE FALSE TRUE FALSE
The function also uses a trick so that duplicated columns can be checked, whereas the native duplicated cannot be applied directly to columns. See https://stackoverflow.com/questions/9818125/
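
The "all occurrences" result and a column-wise check can both be approximated in base R. This is only a sketch under the assumption that combining a forward and a backward pass of duplicated, and treating a data frame as a list of columns, gives comparable results; it is not necessarily how ez.duplicated is implemented.

```r
x <- c(1, 1, 4, 5, 4, 6)

# Base R flags only the second and later occurrences
first_pass <- duplicated(x)

# OR-ing a forward and a backward pass flags every occurrence
both_pass <- duplicated(x) | duplicated(x, fromLast = TRUE)

# Column-wise check: a data frame is a list of columns, and duplicated()
# accepts a list, so identical columns can be detected this way
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6), c = c(1, 2, 3))
cols <- as.list(df)
col_dup <- duplicated(cols) | duplicated(cols, fromLast = TRUE)
names(col_dup) <- names(df)

print(first_pass)  # FALSE TRUE FALSE FALSE TRUE FALSE
print(both_pass)   # TRUE TRUE TRUE FALSE TRUE FALSE
print(col_dup)     # a and c are flagged, b is not
```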

Examples

c(2,2,3) %>% data.frame(col=.) %>% ez.duplicated(incomparables = 4)  # error
c(2,2,3) %>% ez.duplicated(incomparables = 4)  # OK  note that 4 is not even an element of the vector
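
The effect of incomparables on a vector can be seen with the built-in duplicated, which ez.duplicated is described as building on. The snippet is a base R sketch with illustrative variable names, not a call to the package itself.

```r
x <- c(1, NA, NA, 2)

# Default: missing values are treated as equal, so the second NA is flagged
na_equal <- duplicated(x)

# With incomparables = NA, missing values are never matched to each other
na_apart <- duplicated(x, incomparables = NA)

# A value that does not occur in the vector is accepted and changes nothing
y <- c(2, 2, 3)
absent <- duplicated(y, incomparables = 4)

print(na_equal)  # FALSE FALSE TRUE FALSE
print(na_apart)  # FALSE FALSE FALSE FALSE
print(absent)    # FALSE TRUE FALSE
```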

jerryzhujian9/zmisc documentation built on March 9, 2024, 12:49 a.m.