checkdupl: Find and remove duplicated row observations between two data...

checkdupl    R Documentation

Find and remove duplicated row observations between two data sets


Function checkdupl finds the duplicated row observations between two matrices or data frames.

Function rmdupl removes the duplicated row observations within a single matrix or data frame.
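The idea of finding rows shared by two data sets can be sketched in base R with an inner join. This is only an illustration, not the package's implementation: the helper name find_shared_rows is hypothetical, and the rownum.X and rownum.Y columns are borrowed from the examples on this page.

```r
# Hypothetical helper (not part of the package): returns one line per pair of
# rows of X and Y that carry identical values over the columns in `nam`.
find_shared_rows <- function(X, Y, nam = colnames(X)) {
  X <- as.data.frame(X)[, nam, drop = FALSE]
  Y <- as.data.frame(Y)[, nam, drop = FALSE]
  # Remember the original row positions before joining
  X$rownum.X <- seq_len(nrow(X))
  Y$rownum.Y <- seq_len(nrow(Y))
  # Inner join on the compared columns keeps only the matching rows
  merge(X, Y, by = nam)
}

X <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 3, byrow = TRUE,
            dimnames = list(NULL, c("v1", "v2")))
Y <- matrix(c(3, 4, 9, 9), nrow = 2, byrow = TRUE,
            dimnames = list(NULL, c("v1", "v2")))

res <- find_shared_rows(X, Y)
res
```

Here the second row of X and the first row of Y are the only matching pair.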


checkdupl(X, Y, nam = NULL, digits = NULL, check.all = FALSE)

rmdupl(X, nam = NULL, digits = NULL, check.all = FALSE)



X: A matrix or data frame, compared to Y.


Y: A matrix or data frame, compared to X.


nam: The names of the variables to consider in X and Y: the test of duplication is undertaken only over the variables in nam. If NULL (default), nam is set to all the column names of X. The variables set in nam must be present in both X and Y.


digits: The number of digits used when rounding the variables (set in nam) before the test. Defaults to NULL (no rounding).


check.all: Logical (default = FALSE). If TRUE, an additional test of duplication is undertaken considering all the columns of X (even if nam selects only a subset of these columns).
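The effect of restricting the test to a subset of columns and of rounding before the test can be illustrated with a small base R sketch. It assumes nothing about how the package performs the comparison: the helper row_keys and the toy data are hypothetical, and only the meaning of nam and digits is taken from the descriptions above.

```r
X <- data.frame(v1 = c(1.001, 2.5), v2 = c(10, 20), v3 = c("a", "b"))
Y <- data.frame(v1 = c(1.002, 2.9), v2 = c(10, 20), v3 = c("z", "b"))

nam <- c("v1", "v2")   # columns over which duplication is tested
digits <- 2            # rounding applied before the test

# Hypothetical helper: builds one text key per row from the columns in `nam`,
# optionally rounding the (numeric) columns first.
row_keys <- function(d, nam, digits = NULL) {
  d <- d[, nam, drop = FALSE]
  if (!is.null(digits)) d <- round(d, digits)
  do.call(paste, c(d, sep = "\r"))
}

# Without rounding, no row of X is found in Y
in_y_exact <- row_keys(X, nam) %in% row_keys(Y, nam)

# With rounding to 2 digits, 1.001 and 1.002 both become 1, so row 1 matches
in_y_rounded <- row_keys(X, nam, digits) %in% row_keys(Y, nam, digits)
```

Column v3 is ignored in both tests because it is not listed in nam.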


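A data frame reporting the duplicated rows. For checkdupl, as used in the examples, it includes the columns rownum.X and rownum.Y, which identify the matching rows in X and Y.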


dat1 <- matrix(c(1:5, 1:5, c(1, 2, 7, 4, 8)), nrow = 3, byrow = TRUE)
dimnames(dat1) <- list(1:3, c("v1", "v2", "v3", "v4", "v5"))

dat2 <- matrix(c(6:10, 1:5, c(1, 2, 7, 6, 12)), nrow = 3, byrow = TRUE)
dimnames(dat2) <- list(1:3, c("v1", "v2", "v3", "v4", "v5"))


checkdupl(dat1, dat2)

checkdupl(dat1, dat2, nam = c("v1", "v2"))

checkdupl(dat1, dat2, nam = c("v1", "v2"), check.all = TRUE)

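# Compare dat1 with itself and keep only the pairs made of two different rows
z <- checkdupl(X = dat1, Y = dat1)
z[z$rownum.X != z$rownum.Y, ]

# Same comparison, restricted to the variables v1 and v2
z <- checkdupl(dat1, dat1, nam = c("v1", "v2"))
z[z$rownum.X != z$rownum.Y, ]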


rmdupl(dat1, nam = c("v1", "v2"))
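For comparison, removing duplicated rows within a single data set over a subset of columns can be sketched in base R with duplicated(). This assumes, from the usage signature, that rmdupl operates on X alone. Whether rmdupl keeps the first occurrence of each duplicate is not stated on this page, so the sketch below, which does keep it, may differ from the package's behavior.

```r
dat <- data.frame(v1 = c(1, 1, 6), v2 = c(2, 2, 7), v3 = c(3, 7, 8))
nam <- c("v1", "v2")

# TRUE for every row whose values over `nam` already appeared in an earlier row
is_dup <- duplicated(dat[, nam, drop = FALSE])

# Keep the first occurrence of each combination of v1 and v2
dedup <- dat[!is_dup, , drop = FALSE]
dedup
```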

