remove.dup.rows: Remove duplicate rows


Description

Removes duplicate rows from a dataframe.

Usage

remove.dup.rows(dfr)

Arguments

dfr

A dataframe

Details

Uses the function eql() to compare values, so that two NAs are treated as equal.

Value

The dataframe with only one copy of each set of identical rows.

Note

Method: Sort the dataframe, then figure out which rows have all values identical to their successor. This gives a logical vector in the order of the sorted rows, so reorder it back to the original row order. Finally, select the non-duplicates. As a "bonus feature", I think this will also remove any row containing all NAs...

A major stumbling block is that you'll want two NAs to compare equal, hence the eql() function.
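A minimal sketch of the kind of comparison eql() is described as providing, assuming it means elementwise equality in which two NAs count as equal and an NA never equals a non-NA value. The actual cwhmisc definition may differ; `eql_sketch` is a hypothetical name used only for illustration.

```r
# Hypothetical stand-in for cwhmisc's eql(): elementwise equality in which
# two NAs compare equal and an NA never equals a non-NA value.
eql_sketch <- function(x, y) {
  both_na <- is.na(x) & is.na(y)
  # Where either side is NA, one !is.na() term is FALSE, and FALSE & NA is
  # FALSE in R, so this term is never NA.
  same <- !is.na(x) & !is.na(y) & x == y
  both_na | same
}

x <- c(1, NA, 3, NA)
y <- c(1, NA, 4, 5)
x == y            # ordinary comparison: TRUE NA FALSE NA
eql_sketch(x, y)  # TRUE TRUE FALSE FALSE
```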

Actually, I think you can do away with the isdup array and do

all.dup <- do.call("pmin", lapply(dfr[o,], function(x) eql(x,c(x[-1],NA))))

and there may be further cleanups possible.
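The sort-and-compare method described above can be sketched in R as follows. This is not the cwhmisc source: `na_equal` and `remove_dup_rows_sketch` are hypothetical names, the NA-aware comparison stands in for eql(), and `Reduce("&", ...)` is used in place of the pmin() call as an elementwise AND across columns. Unlike the version described in the Note, it compares each sorted row only with the next one (the last sorted row is never flagged), so it does not drop a unique all-NA row.

```r
# Hypothetical NA-aware equality, standing in for eql().
na_equal <- function(x, y) {
  (is.na(x) & is.na(y)) | (!is.na(x) & !is.na(y) & x == y)
}

remove_dup_rows_sketch <- function(dfr) {
  n <- nrow(dfr)
  if (n < 2) return(dfr)
  # Sort rows so that identical rows become adjacent; order() breaks
  # remaining ties by original position. unname() keeps column names from
  # clashing with order()'s own arguments.
  o <- do.call(order, unname(as.list(dfr)))
  sorted <- dfr[o, , drop = FALSE]
  # For each column, compare sorted row i with sorted row i + 1.
  same_as_next <- lapply(sorted, function(x) na_equal(x[-n], x[-1]))
  # A row duplicates its successor only if every column matches.
  dup_sorted <- c(Reduce(`&`, same_as_next), FALSE)
  # Map the flags back to the original row order: sorted position k is
  # original row o[k].
  is_dup <- logical(n)
  is_dup[o] <- dup_sorted
  dfr[!is_dup, , drop = FALSE]
}

dfr <- data.frame(matrix(c(1:3,2:4,1:3,1:3,2:4,3:5),6,byrow=TRUE))
remove_dup_rows_sketch(dfr)  # keeps rows 4, 5 and 6
```

Because ties are broken by original position, this sketch keeps the last copy of each group of identical rows and returns the survivors in their original order.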

One dirty trick which is much quicker but not quite as reliable is

dfr[!duplicated(do.call("paste",dfr)), ]

(watch out for character strings with embedded spaces, and for numeric values that differ only in digits that paste() does not print!)
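To illustrate both caveats, the following sketch builds two small made-up data frames (`d1`, `d2`) whose rows differ but whose pasted keys coincide, so the paste() trick wrongly treats them as duplicates.

```r
# Embedded spaces: "a b" + "c" and "a" + "b c" both paste to "a b c".
d1 <- data.frame(p = c("a b", "a"), q = c("c", "b c"))
key1 <- do.call("paste", d1)
key1                     # "a b c" "a b c"
d1[!duplicated(key1), ]  # wrongly drops the second row

# Numeric precision: 0.1 + 0.2 and 0.3 are different doubles, but
# paste() converts both to the string "0.3".
d2 <- data.frame(v = c(0.1 + 0.2, 0.3), w = c("a", "a"))
d2$v[1] == d2$v[2]       # FALSE
key2 <- do.call("paste", d2)
key2                     # "0.3 a" "0.3 a"
d2[!duplicated(key2), ]  # wrongly drops the second row
```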

Author(s)

Peter Dalgaard, p.dalgaard@biostat.ku.dk

Examples

  dfr <- data.frame(matrix(c(1:3,2:4,1:3,1:3,2:4,3:5),6,byrow=TRUE))
  remove.dup.rows(dfr)

cwhmisc documentation built on May 1, 2019, 7:55 p.m.