Description

Removes duplicate rows from a data frame.

Usage

    remove.dup.rows(dfr)

Arguments

    dfr    A data frame.

Details

Uses the helper function eql() for NA-aware comparison of values.

Value

The data frame with only one copy of each set of identical rows.
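
Note

Method: sort the data frame, then find the rows whose values are all identical to those of their successor in sorted order. This gives a logical vector in the order of the sorted rows, so reorder it back to the original row order. Finally, select the rows that are not duplicates. As a "bonus feature", I think this will also remove any row consisting entirely of NAs: such rows sort last, and the last sorted row is compared against NA padding, which eql() treats as equal.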
A major stumbling block is that you will want two NAs to compare equal, but NA == NA gives NA rather than TRUE; hence the eql() function.
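The method described in the two paragraphs above can be sketched in base R as follows. This is not the package's source: remove_dup_rows_sketch() and this definition of eql() are hypothetical stand-ins written from the description. The sketch passes drop = FALSE so that a one-column data frame is not reduced to a vector, and calls unname() on the columns before order() so that a column named, say, decreasing is not matched to an argument of order().

```r
# Hypothetical NA-aware equality standing in for eql():
# TRUE when both values are NA, or both are non-NA and equal.
eql <- function(x, y) {
  (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
}

# Hypothetical sketch of the sort-and-compare method (not the package source).
remove_dup_rows_sketch <- function(dfr) {
  n <- nrow(dfr)
  # Sort rows by all columns; unname() stops column names such as
  # "decreasing" from being matched to arguments of order().
  o <- do.call(order, unname(as.list(dfr)))
  sorted <- dfr[o, , drop = FALSE]
  # Index of each sorted row's successor; the last row is paired with NA.
  nxt <- c(seq_len(n)[-1], NA_integer_)
  # Per column: does each value equal the value in the next sorted row?
  same <- lapply(sorted, function(x) eql(x, x[nxt]))
  # A row is a duplicate when every column matches its successor.
  dup <- Reduce(`&`, same, rep(TRUE, n))
  # dup is in sorted order; map it back to the original row positions.
  dup[o] <- dup
  dfr[!dup, , drop = FALSE]
}

dfr <- data.frame(matrix(c(1:3, 2:4, 1:3, 1:3, 2:4, 3:5), 6, byrow = TRUE))
remove_dup_rows_sketch(dfr)      # keeps rows 4, 5 and 6

# An all-NA row is dropped too, as the Note predicts.
with_na <- data.frame(a = c(1, NA, 1), b = c("x", NA, "x"))
remove_dup_rows_sketch(with_na)  # keeps row 3 only
```

Because order() leaves ties in their original order, the copy kept from each group of identical rows is the last one in the original data (rows 4, 5 and 6 above), whereas base R's unique(dfr) keeps the first copy (rows 1, 2 and 6).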
Actually, I think you can do away with the isdup array and do (with o the row ordering from the sort)

    all.dup <- do.call("pmin", lapply(dfr[o, , drop = FALSE], function(x) eql(x, c(x[-1], NA))))

and there may be further cleanups possible.
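The pmin() shortcut works because the elementwise minimum of logical vectors taken across columns is a row-wise all(): a single FALSE in any column pulls that row's result to FALSE. The flags below are made-up values, and the cbind()/apply() lines are only an assumption about what an isdup array might look like.

```r
# Made-up per-column flags: does each sorted row equal its successor?
flags <- list(x1 = c(TRUE, TRUE, FALSE), x2 = c(TRUE, FALSE, FALSE))

# Matrix route (assumed shape of an isdup array): one column per variable.
isdup <- do.call(cbind, flags)
apply(isdup, 1, all)                  # TRUE FALSE FALSE

# pmin() across the columns gives the same answer without the matrix;
# as.logical() covers the case where the result comes back as 0/1 integers.
as.logical(do.call("pmin", flags))    # TRUE FALSE FALSE

# Reduce() with & is an equivalent base R spelling.
Reduce(`&`, flags)                    # TRUE FALSE FALSE
```

Since eql() never returns NA, pmin() needs no na.rm handling here.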
One dirty trick, which is much quicker but not quite as reliable, is

    dfr[!duplicated(do.call("paste", dfr)), ]

(watch out for character strings with embedded spaces, since paste() joins the columns with a space, and for numeric values whose differences are too small to survive conversion to text!)
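Both pitfalls can be reproduced with small made-up data frames. For comparison, duplicated() also has a data-frame method; in recent versions of R it compares rows without pasting them into strings, so it flags neither pair below.

```r
# Embedded spaces: two different rows paste to the same string,
# because paste() separates columns with a single space by default.
words <- data.frame(a = c("a b", "a"), b = c("c", "b c"))
do.call("paste", words)                          # "a b c" "a b c"
words[!duplicated(do.call("paste", words)), ]    # row 2 is wrongly dropped

# Tiny numeric differences: as.character() keeps 15 significant digits,
# so 0.1 + 0.2 and 0.3 become the same text although they are not equal.
nums <- data.frame(x = c(0.1 + 0.2, 0.3))
nums$x[1] == nums$x[2]                           # FALSE
do.call("paste", nums)                           # "0.3" "0.3"

# duplicated() applied to the data frame itself flags neither pair.
duplicated(words)                                # FALSE FALSE
duplicated(nums)                                 # FALSE FALSE
```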

Author(s)

Peter Dalgaard, p.dalgaard@biostat.ku.dk

Examples

    dfr <- data.frame(matrix(c(1:3, 2:4, 1:3, 1:3, 2:4, 3:5), 6, byrow = TRUE))
    remove.dup.rows(dfr)