duplicated returns a logical vector indicating which rows of a data.table have duplicate rows (by key).
unique returns a data table with duplicated rows (by key) removed, or (when no key) duplicated rows by all columns removed.
anyDuplicated returns the index i of the first duplicated entry if there is one, and 0 otherwise.
## S3 method for class 'data.table'
duplicated(x, incomparables=FALSE, fromLast=FALSE, by=key(x), ...)
## S3 method for class 'data.table'
unique(x, incomparables=FALSE, fromLast=FALSE, by=key(x), ...)
## S3 method for class 'data.table'
anyDuplicated(x, incomparables=FALSE, fromLast=FALSE, by=key(x), ...)
x
A data.table.
...
Not used at this time.
incomparables
Not used. Here for S3 method consistency.
fromLast
logical indicating if duplication should be considered from the reverse side, i.e., the last (or rightmost) of identical elements would correspond to
by
Because data.tables are usually sorted by key, tests for duplication are
especially quick when only the keyed columns are considered.
Unlike unique.data.frame, paste is not used to ensure equality of floating point data. This is done directly (for speed)
whilst still respecting tolerance in the same spirit as all.equal.
Any combination of columns (not just the key columns) can be used to test for
uniqueness; the columns are specified via the by parameter. To get the
analogous data.frame functionality for unique and duplicated, set by to NULL.
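As a minimal sketch of the by parameter, the example below uses a hypothetical table (the column names id and grp are illustrative, not from this page). It tests uniqueness on a non-key column, then on all columns. The all-columns case passes every column name explicitly, so the result does not depend on the default value of by in the installed version, and it is compared against the data.frame method.

```r
library(data.table)

# Hypothetical table: keyed on "id", with a non-key column "grp".
DT <- data.table(id  = c(1L, 1L, 2L, 2L, 3L),
                 grp = c("a", "a", "a", "b", "b"),
                 key = "id")

# Uniqueness judged on a non-key column only.
dup_grp <- duplicated(DT, by = "grp")

# Uniqueness judged on every column, named explicitly.
dup_all  <- duplicated(DT, by = names(DT))
uniq_all <- unique(DT, by = names(DT))

# The all-columns result agrees with the data.frame method.
identical(dup_all, duplicated(as.data.frame(DT)))
```

Here only row 2 repeats an earlier row across all columns, whereas rows 2, 3 and 5 repeat an earlier value of grp.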
From v1.9.4, both the duplicated and unique methods also gain the logical argument fromLast, as in base; it defaults to FALSE.
Conceptually duplicated(x, by=cols, fromLast=TRUE) is equivalent to rev(duplicated(rev(x), by=cols)), but is much faster. rev(x) is used just to illustrate the concept, as it clearly applies only to vectors. In the context of data.table, rev(x) would mean rearranging the rows of all columns in reverse order.
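The equivalence can be checked with a short sketch that reverses the rows by hand, since rev() itself does not reverse the rows of a data.table. The variable names (cols, rev_DT, from_last, via_rev) are illustrative only.

```r
library(data.table)

DT <- data.table(A = rep(1:3, each = 4),
                 B = rep(1:4, each = 3),
                 C = rep(1:2, 6),
                 key = c("A", "B"))
cols <- "B"

# Direct form: the last row of each group of identical rows is treated as
# the original, and earlier ones are flagged as duplicates.
from_last <- duplicated(DT, by = cols, fromLast = TRUE)

# Conceptual form: reverse the row order, flag duplicates from the front,
# then reverse the resulting logical vector.
rev_DT  <- DT[rev(seq_len(nrow(DT)))]
via_rev <- rev(duplicated(rev_DT, by = cols))

identical(from_last, via_rev)
```

Both vectors flag the first two rows of each run of equal B values and leave the third unflagged.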
v1.9.4 also implements an anyDuplicated method for data.table. It calculates the duplicate entries and returns the index of the first duplicate, if one exists, and 0 otherwise. It is very similar to any(duplicated(DT)), except that the latter returns TRUE or FALSE.
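The difference between the two calls can be seen in a small sketch on a hypothetical table (columns A and B here are illustrative). by is passed explicitly so the result does not depend on its default.

```r
library(data.table)

# Hypothetical unkeyed table: row 3 repeats row 2, and row 5 repeats row 4.
DT <- data.table(A = c(1L, 2L, 2L, 3L, 3L),
                 B = c("x", "y", "y", "z", "z"))

# Index of the first duplicated row (here, row 3).
first_dup <- anyDuplicated(DT, by = c("A", "B"))

# Logical answer only: is there any duplicate at all?
has_dup <- any(duplicated(DT, by = c("A", "B")))

# On a subset with no repeated rows, anyDuplicated gives 0.
no_dup <- anyDuplicated(DT[c(1, 2, 4)], by = c("A", "B"))
```

The index form is useful when the position of the first offending row is needed, for example in an error message; the logical form suffices for a simple check.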
duplicated returns a logical vector of length nrow(x) indicating which rows are duplicates.
unique returns a data table with duplicated rows removed.
anyDuplicated returns an integer value with the index of the first duplicate. If none exists, 0L is returned.
data.table, duplicated, unique, all.equal
DT <- data.table(A = rep(1:3, each=4), B = rep(1:4, each=3), C = rep(1:2, 6), key = "A,B")
duplicated(DT)
unique(DT)
duplicated(DT, by="B")
unique(DT, by="B")
duplicated(DT, by=c("A", "C"))
unique(DT, by=c("A", "C"))
DT = data.table(a=c(2L,1L,2L), b=c(1L,2L,1L)) # no key
unique(DT) # rows 1 and 2 (row 3 is a duplicate of row 1)
DT = data.table(a=c(3.142, 4.2, 4.2, 3.142, 1.223, 1.223), b=rep(1,6))
unique(DT) # rows 1,2 and 5
DT = data.table(a=tan(pi*(1/4 + 1:10)), b=rep(1,10)) # example from ?all.equal
length(unique(DT$a)) # 10 strictly unique floating point values
all.equal(DT$a,rep(1,10)) # TRUE, all within tolerance of 1.0
DT[,which.min(a)] # row 10, the strictly smallest floating point value
identical(unique(DT),DT[1]) # TRUE, stable within tolerance
identical(unique(DT),DT[10]) # FALSE
# fromLast=TRUE
DT <- data.table(A = rep(1:3, each=4), B = rep(1:4, each=3), C = rep(1:2, 6), key = "A,B")
duplicated(DT, by="B", fromLast=TRUE)
unique(DT, by="B", fromLast=TRUE)
# anyDuplicated
anyDuplicated(DT, by=c("A", "B")) # 2L
any(duplicated(DT, by=c("A", "B"))) # TRUE
[1] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
A B C
1: 1 1 1
2: 1 1 2
3: 1 2 2
4: 2 2 1
5: 2 2 2
6: 2 3 1
7: 2 3 2
8: 3 3 1
9: 3 4 2
10: 3 4 1
[1] FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE
A B C
1: 1 1 1
2: 1 2 2
3: 2 3 1
4: 3 4 2
[1] FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE FALSE FALSE TRUE TRUE
A B C
1: 1 1 1
2: 1 1 2
3: 2 2 1
4: 2 2 2
5: 3 3 1
6: 3 4 2
a b
1: 2 1
2: 1 2
a b
1: 3.142 1
2: 4.200 1
3: 1.223 1
[1] 10
[1] TRUE
[1] 10
[1] FALSE
[1] FALSE
[1] TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE TRUE TRUE FALSE
A B C
1: 1 1 1
2: 2 2 2
3: 3 3 1
4: 3 4 2
[1] 2
[1] TRUE