These S3 methods are alternative (typically much faster) implementations of counterparts in the base
package for atomic matrices.
unique.matrix
returns a matrix with duplicated rows (or columns) removed.
duplicated.matrix
returns a logical vector indicating which rows (or columns) are duplicated.
anyDuplicated.matrix
returns an integer indicating the index of the first duplicate row (or column) if any, and 0L
otherwise.
## S3 method for class 'matrix'
unique(x, incomparables = FALSE, MARGIN = 1,
       fromLast = FALSE, signif = Inf, ...)
## S3 method for class 'matrix'
duplicated(x, incomparables = FALSE, MARGIN = 1,
           fromLast = FALSE, signif = Inf, ...)
## S3 method for class 'matrix'
anyDuplicated(x, incomparables = FALSE,
              MARGIN = 1, fromLast = FALSE, signif = Inf, ...)
x |
an atomic matrix of mode |
incomparables |
a vector of values that cannot be compared, as in |
fromLast |
a logical scalar indicating if duplication should be considered
from the last, as in |
... |
arguments for particular methods. |
MARGIN |
a numeric scalar, the matrix margin to be held fixed, as in |
signif |
a numerical scalar only applicable to numeric or complex |
These S3 methods are alternative implementations of counterparts in the base package for atomic matrices (i.e., double, integer, logical, character, complex and raw), built directly on the C++98 Standard Template Library (STL) std::set or the C++11 STL std::unordered_set. The implementation treats the whole row (or column) vector as the key, without the intermediate steps of converting the mode to character and collapsing the values into a scalar, as done in base. On systems where `R CMD config CXX1X` is empty, the C++98 STL std::set is used; it is typically implemented as a self-balancing tree (usually a red-black tree) and takes O(n log n) time to find all duplicates, where n = dim(x)[MARGIN]. On systems where `R CMD config CXX1X` is non-empty, the C++11 STL std::unordered_set is used, with average O(n) and worst-case O(n^2) performance.
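To make the "collapse into a scalar" step concrete, here is a minimal sketch in plain R of the indirect approach that the paragraph above attributes to base. The helper name `dup_rows_by_paste` and the `"\r"` separator are illustrative assumptions, not code taken from either package. The only point is that each row is converted to character and pasted into one string before duplicates are sought, which is the intermediate work the set-based implementation is described as avoiding.

```r
# Hypothetical helper (not part of base or uniqueAtomMat):
# find duplicated rows by collapsing each row into one character key.
dup_rows_by_paste <- function(x) {
  keys <- apply(x, 1L, function(row) paste(row, collapse = "\r"))
  duplicated(unname(keys))
}

m <- matrix(c(1L, 2L, 1L,
              3L, 4L, 3L), ncol = 2L)   # rows: (1,3), (2,4), (1,3)

res <- dup_rows_by_paste(m)
print(res)

# Row-wise result from the base matrix method on the same input
print(as.vector(duplicated(m)))
```

Keying on the whole row directly, as the text describes, skips both the character conversion and the string concatenation.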
Missing values are regarded as equal, but NaN is not equal to NA_real_.
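The rule can be seen with the base generics, which document the same convention. The sketch below uses only base R, so it illustrates the rule rather than the behavior of this package's own methods.

```r
# Vector case: NA matches NA and NaN matches NaN, but NA does not match NaN
v <- c(NA_real_, NA_real_, NaN, NaN)
dup_v <- duplicated(v)
print(dup_v)

# Row-wise case: rows 1 and 2 are equal; row 3 differs because NaN is not NA
m <- rbind(c(1, NA_real_),
           c(1, NA_real_),
           c(1, NaN))
dup_m <- as.vector(duplicated(m))
print(dup_m)
```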
Further, in contrast to the base counterparts, characters are compared directly by their internal representations, i.e., encodings are not taken into account. Complex values are compared by their real and imaginary parts separately.
unique.matrix
returns a matrix with duplicated rows (if MARGIN=1) or columns (if MARGIN=2) removed.
duplicated.matrix
returns a logical vector indicating which rows (if MARGIN=1) or columns (if MARGIN=2) are duplicated.
anyDuplicated.matrix
returns an integer indicating the index of the first (if fromLast=FALSE) or last (if fromLast=TRUE) duplicate row (if MARGIN=1) or column (if MARGIN=2) if any, and 0L otherwise.
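As a small worked illustration of these return values, the sketch below calls the generics on a three-row matrix. It runs with base R alone; with the package attached, the same calls would presumably dispatch to its methods, which the examples further down check against base.

```r
m <- rbind(c(1, 2),
           c(3, 4),
           c(1, 2))                        # row 3 repeats row 1

u_first <- unique(m)                       # keeps rows 1 and 2
u_last  <- unique(m, fromLast = TRUE)      # keeps rows 2 and 3
d_rows  <- as.vector(duplicated(m))        # one logical flag per row
first_i <- anyDuplicated(m)                # scanning from the first row
last_i  <- anyDuplicated(m, fromLast = TRUE)  # scanning from the last row

# The same question asked column-wise on the transposed matrix
d_cols  <- as.vector(duplicated(t(m), MARGIN = 2))

print(u_first); print(u_last)
print(d_rows); print(first_i); print(last_i); print(d_cols)
```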
In contrast to the base counterparts, characters are compared directly based on their internal representations without considering encoding issues; for numeric and complex matrices, the default signif is Inf, i.e., floating point values are compared directly without rounding; and long vectors are not supported yet.
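The practical effect of comparing without rounding can be sketched in base R. The example assumes that a finite `signif` argument acts like applying `base::signif` to the matrix before comparison; the text above does not spell this out, so treat that as an assumption. The choice of 15 digits is likewise arbitrary.

```r
a <- 0.1 + 0.2
b <- 0.3

# Two rows that look identical when printed but differ in the last bits
m <- rbind(c(a, 1),
           c(b, 1))
exact_equal <- all(m[1, ] == m[2, ])
print(exact_equal)

# Rounding to 15 significant digits first makes the rows compare equal
rounded <- signif(m, 15)
dup_rounded <- as.vector(duplicated(rounded))
print(dup_rounded)
```

With exact comparison the two rows count as distinct, so callers who want tolerance need to round first or pass a finite `signif`.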
base::duplicated, base::unique, signif, grpDuplicated
## prepare test data:
set.seed(9992722L, kind="Mersenne-Twister")
x.double=model.matrix(~gl(5,8))[sample(40), ]
## typical uses
unique(x.double)
unique(x.double, fromLast=TRUE)
unique(t(x.double), MARGIN=2)
unique(t(x.double), MARGIN=2, fromLast=TRUE)
anyDuplicated(x.double)
anyDuplicated(x.double, fromLast = TRUE)
## additional atomic test data
x.integer=as.integer(x.double); attributes(x.integer)=attributes(x.double)
x.factor=as.factor(x.integer); dim(x.factor)=dim(x.integer); dimnames(x.factor)=dimnames(x.integer)
x.logical=as.logical(x.double); attributes(x.logical)=attributes(x.double)
x.character=as.character(x.double); attributes(x.character)=attributes(x.double)
x.complex=as.complex(x.double); attributes(x.complex)=attributes(x.double)
x.raw=as.raw(x.double); attributes(x.raw)=attributes(x.double)
## compare results with base:
stopifnot(identical(base::duplicated.matrix(x.double),
uniqueAtomMat::duplicated.matrix(x.double)
))
stopifnot(identical(base::duplicated.matrix(x.integer, fromLast=TRUE),
uniqueAtomMat::duplicated.matrix(x.integer, fromLast=TRUE)
))
stopifnot(identical(base::duplicated.matrix(t(x.logical), MARGIN=2L),
uniqueAtomMat::duplicated.matrix(t(x.logical), MARGIN=2L)
))
stopifnot(identical(base::duplicated.matrix(t(x.character), MARGIN=2L, fromLast=TRUE),
uniqueAtomMat::duplicated.matrix(t(x.character), MARGIN=2L, fromLast=TRUE)
))
stopifnot(identical(base::unique.matrix(x.complex),
uniqueAtomMat::unique.matrix(x.complex)
))
stopifnot(identical(base::unique.matrix(x.raw),
uniqueAtomMat::unique.matrix(x.raw)
))
stopifnot(identical(base::unique.matrix(x.factor),
uniqueAtomMat::unique.matrix(x.factor)
))
stopifnot(identical(base::duplicated.matrix(x.double, MARGIN=0),
uniqueAtomMat::duplicated.matrix(x.double, MARGIN=0)
))
stopifnot(identical(base::anyDuplicated.matrix(x.integer, MARGIN=0),
uniqueAtomMat::anyDuplicated.matrix(x.integer, MARGIN=0)
))
## benchmarking
if (require(microbenchmark)) {
  print(microbenchmark(base::duplicated.matrix(x.double)))
  print(microbenchmark(uniqueAtomMat::duplicated.matrix(x.double)))
  print(microbenchmark(base::duplicated.matrix(x.character)))
  print(microbenchmark(uniqueAtomMat::duplicated.matrix(x.character)))
} else {
  print(system.time(replicate(5e3L, base::duplicated.matrix(x.double))))
  print(system.time(replicate(5e3L, uniqueAtomMat::duplicated.matrix(x.double))))
  print(system.time(replicate(5e3L, base::duplicated.matrix(x.character))))
  print(system.time(replicate(5e3L, uniqueAtomMat::duplicated.matrix(x.character))))
}