grpDuplicated: Grouping by duplicated elements

View source: R/duplicated.matrix.R

grpDuplicated    R Documentation

Grouping by duplicated elements


grpDuplicated() is a generic function that takes an indexed set of "elements" and returns an integer vector with one component per element. The "elements" can be the components of a vector, or the row vectors or column vectors of a matrix. A component of the output is 0 if and only if the corresponding element is unique; such an element forms a singleton group. Output components have equal positive integer values if and only if the corresponding elements are identical to each other. These elements form a non-singleton group, and the positive integer is called the group number.

The number of singleton groups is equal to #(zeros), which is equal to #(elements) - #(elements that belong to a non-singleton group).
The number of non-singleton groups is equal to max(output vector).
The number of all groups is equal to #(zeros) + max(output vector).
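
The definition and the three counts above can be checked on a small vector with base R alone. The following is only a sketch: grp_sketch() is a hypothetical helper, not the package's implementation, and it numbers groups in order of first appearance, which may differ from the numbering that grpDuplicated() returns.

```r
# Hypothetical base-R helper that follows the definition above (vectors only).
grp_sketch <- function(x) {
  first     <- match(x, x)                         # index of the first occurrence of each element
  counts    <- tabulate(first, nbins = length(x))  # number of occurrences, stored at the first index
  dup_first <- which(counts > 1L)                  # first indices of the non-singleton groups
  match(first, dup_first, nomatch = 0L)            # 0 for unique elements, else a group number
}

x <- c(3, 1, 3, 2, 1, 3, 5)
g <- grp_sketch(x)            # 1 2 1 0 2 1 0

n_single <- sum(g == 0L)      # number of singleton groups
n_multi  <- max(g)            # number of non-singleton groups
n_all    <- n_single + n_multi

# The number of all groups equals the number of distinct elements.
stopifnot(n_all == length(unique(x)))
```

Here the values 2 and 5 are unique, so they get 0, while the three copies of 3 and the two copies of 1 form the two non-singleton groups.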


## Default S3 method:
grpDuplicated( x, ... )
## S3 method for class 'matrix'
grpDuplicated( x, MARGIN=1, ... )



x    a vector or matrix of atomic mode "numeric", "integer", "logical", "complex", "character" or "raw".


MARGIN    an integer scalar, the matrix margin to be held fixed, as in apply. MARGIN=1 means that the function looks for duplicated rows, and MARGIN=2 means that it looks for duplicated columns. Other values are invalid.


...    arguments for particular methods.
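
As a rough illustration of how MARGIN selects the elements, the sketch below uses only base R's duplicated(), not grpDuplicated() itself. The helper in_group() is hypothetical; it returns TRUE for every row or column that has at least one identical partner, which are the positions where a positive group number would be expected.

```r
# A 3 x 4 matrix: rows 1 and 3 are identical, and columns 1 and 3 are identical.
m <- rbind(c(1, 2, 1, 2),
           c(3, 4, 3, 5),
           c(1, 2, 1, 2))

# Hypothetical helper: TRUE where the row (MARGIN=1) or column (MARGIN=2)
# occurs more than once, scanning from both ends so first occurrences count too.
in_group <- function(m, MARGIN) {
  as.vector(duplicated(m, MARGIN = MARGIN) |
            duplicated(m, MARGIN = MARGIN, fromLast = TRUE))
}

rows_in_group <- in_group(m, 1)   # TRUE FALSE TRUE        (length nrow(m))
cols_in_group <- in_group(m, 2)   # TRUE FALSE TRUE FALSE  (length ncol(m))
```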


The implementation is based on std::unordered_map from C++11, which is a hash table.
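
To give a feel for how a hash table supports this kind of grouping in a single pass, here is a minimal sketch in R that uses an environment as the hash table. It is an illustration under stated assumptions, not the package's C++ code: grp_hash_sketch() is a hypothetical name, it accepts only non-empty, non-NA character keys, and it numbers groups in the order in which the first duplicate of each key is found, which is not claimed to be the package's rule.

```r
# Hypothetical one-pass grouping with an environment used as a hash table.
grp_hash_sketch <- function(keys) {
  stopifnot(is.character(keys), !anyNA(keys), all(nzchar(keys)))
  first_index <- new.env(hash = TRUE)   # key -> index of its first occurrence
  out <- integer(length(keys))          # starts as all zeros (all unique so far)
  n_groups <- 0L
  for (i in seq_along(keys)) {
    k <- keys[[i]]
    j <- first_index[[k]]               # NULL when the key has not been seen
    if (is.null(j)) {
      first_index[[k]] <- i             # remember where the key first appeared
    } else {
      if (out[j] == 0L) {               # first duplicate of this key: open a new group
        n_groups <- n_groups + 1L
        out[j] <- n_groups
      }
      out[i] <- out[j]                  # later copies share the group number
    }
  }
  out
}

h <- grp_hash_sketch(c("a", "b", "a", "c", "b", "a"))   # 1 2 1 0 2 1
```

Each element costs one lookup and at most one insertion, so the expected running time is linear in the number of elements.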


The return value is an integer vector with all elements ranging from 0 to K, where K is the number of non-singleton groups.
For a vector x, the elements are the vector components, and the output has the same length as the input.
For a matrix x with MARGIN=1, the elements are the rows of the matrix and the output has length nrow(x).
For a matrix x with MARGIN=2, the elements are the columns of the matrix and the output has length ncol(x).
The 'ngroups' attribute of the returned vector is set to an integer 3-vector. The 1st component is the total number of groups, the 2nd component is the number of singleton groups, and the 3rd component is the number of non-singleton groups K.
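
The layout of the 'ngroups' attribute, and one way to use the returned vector, can be sketched with base R alone. The vector g below is typed in by hand to mimic a return value with 5 singleton groups and 7 non-singleton groups; it is not computed by grpDuplicated() here, and the variable names are only illustrative.

```r
# Hand-typed stand-in for a returned vector (an assumption for illustration).
g <- c(7L, 6L, 5L, 4L, 3L, 2L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 2L, 3L, 4L, 5L, 6L, 7L)

n_single <- sum(g == 0L)   # singleton groups: one per zero
n_multi  <- max(g)         # non-singleton groups: K
ngroups  <- c(n_single + n_multi, n_single, n_multi)   # 12 5 7, same layout as the attribute

# Indices of the elements in each group; the entry named "0" collects all unique elements.
members <- split(seq_along(g), g)
members[["3"]]             # 5 15
```

With a real return value, the same 3-vector would be read with attr(g, "ngroups") rather than recomputed.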


The templated C++ function that does the real work is taken from the package uniqueAtomMat by Long Qu, but the returned vector is slightly modified by Glenn Davis.


Long Qu and Glenn Davis

The package uniqueAtomMat was removed from CRAN by its author Long Qu.



library(zonohedra)

#   test a numeric vector
x = rnorm(7)
y = rnorm(5)
grpDuplicated( c(x,y,rev(x)) )
##  [1] 7 6 5 4 3 2 1 0 0 0 0 0 1 2 3 4 5 6 7
##  attr(,"ngroups")
##  [1] 12  5  7

# test a numeric matrix, both rows and columns
A = matrix( rnorm(3*7), 3, 7 )
B = matrix( rnorm(3*5), 3, 5 )

#   the columns of cbind(A,B,A) have the duplicates one would expect
grpDuplicated( cbind(A,B,A), MARGIN=2 )
##  [1] 1 2 3 4 5 6 7 0 0 0 0 0 1 2 3 4 5 6 7
##  attr(,"ngroups")
##  [1] 12  5  7

# but the rows of cbind(A,B,A) are unique
grpDuplicated( cbind(A,B,A), MARGIN=1 )
##  [1] 0 0 0
##  attr(,"ngroups")
##  [1] 3 3 0

zonohedra documentation built on Sept. 11, 2024, 5:20 p.m.