Deduplicate records
dedup(x, how = "one", tolerance = 0.9)
x
(data.frame) A data.frame, tibble, or data.table
how
(character) How to deal with duplicates. The default of "one" keeps one record of each group of duplicates, and drops the others, putting them into the
tolerance
(numeric) Score (0 to 1) at which to determine a match. You'll want to inspect outputs closely to tweak this value based on your data, as results can vary.
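To make the tolerance argument more concrete, here is a minimal base R sketch of how a 0 to 1 similarity score and a cutoff can flag near-duplicate records. It is an illustration under stated assumptions, not the package's actual matching code: the `similarity()` helper is hypothetical, the choice of a scaled Levenshtein distance (via `utils::adist`) is an assumption, and the records are the ones shown in the example output on this page, collapsed to one string each.

```r
# Hypothetical illustration only: this is NOT how dedup() is implemented.
# Each record is collapsed to one string; values are taken from the
# example output on this page.
row_strings <- c(
  "Ursus americanus|-76.78671|35.53079|2015-04-05 23:00:00|1088954559",
  "Ursus americanus|-76.78671|35.53079|2015-04-05 23:00:00|1088954555",
  "Ursus americanus|-78.25027|36.93018|2015-03-20 21:11:24|1088923534"
)

# Assumed scoring rule: 1 minus the edit distance divided by the length
# of the longer string, so identical strings score 1.
similarity <- function(a, b) {
  d <- utils::adist(a, b)                      # pairwise edit distances
  longest <- outer(nchar(a), nchar(b), pmax)   # longer length of each pair
  1 - d / pmax(longest, 1)                     # guard against empty strings
}

scores <- similarity(row_strings, row_strings)

# A pair counts as a match when its score reaches the tolerance.
# upper.tri() keeps each pair once and skips self-comparisons.
tolerance <- 0.9
is_match <- scores >= tolerance & upper.tri(scores)
which(is_match, arr.ind = TRUE)
```

In this sketch the first two records differ only in the last digit of the key, so they clear the 0.9 cutoff, while the third record does not. Raising the cutoff makes matching stricter and lowering it makes matching looser, which is why inspecting the output for your own data matters.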
Returns a data.frame, optionally with attributes
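The idea of returning a data.frame that carries extra information in an attribute can be sketched in base R as follows. This is a simplified stand-in, not the package's code: `keep_one()` is a hypothetical helper, it only detects exact duplicates via `duplicated()` (no tolerance-based matching), and the `toy` data is made up for illustration. The attribute name "dups" follows the examples on this page.

```r
# Hypothetical helper, not part of any package: an exact-match stand-in
# for the "keep one record, set the others aside" behaviour.
keep_one <- function(x) {
  is_dup <- duplicated(x)                 # TRUE for repeats of an earlier row
  kept <- x[!is_dup, , drop = FALSE]      # first record of each group
  # Attach the dropped rows to the result as the "dups" attribute
  attr(kept, "dups") <- x[is_dup, , drop = FALSE]
  kept
}

# Made-up toy data: rows 2 and 3 are identical
toy <- data.frame(
  name = c("a", "b", "b"),
  key = c(1, 2, 2)
)

out <- keep_one(toy)
NROW(out)           # number of records kept
attr(out, "dups")   # records that were dropped
```

Storing the dropped rows as an attribute keeps the return value usable as an ordinary data.frame while still letting you audit what was removed.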
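library("scrubr")  # provides dedup(), dframe(), and sample_data_1
df <- sample_data_1
smalldf <- df[1:20, ]
# Append a copy of row 10, then change its key so the copy is a near duplicate
smalldf <- rbind(smalldf, smalldf[10, ])
smalldf[21, "key"] <- 1088954555
NROW(smalldf)
dp <- dframe(smalldf) %>% dedup()
NROW(dp)
# Dropped duplicates are stored in the "dups" attribute
attr(dp, "dups")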
# Another example - more than one set of duplicates
df <- sample_data_1
twodups <- df[1:10, ]
twodups <- rbind(twodups, twodups[c(9, 10), ])
rownames(twodups) <- NULL
NROW(twodups)
dp <- dframe(twodups) %>% dedup()
NROW(dp)
attr(dp, "dups")
[1] 21
[1] 20
<scrubr dframe>
Size: 1 X 5
name longitude latitude date key
(chr) (dbl) (dbl) (time) (dbl)
1 Ursus americanus -76.78671 35.53079 2015-04-05 23:00:00 1088954555
[1] 12
[1] 10
<scrubr dframe>
Size: 2 X 5
name longitude latitude date key
(chr) (dbl) (dbl) (time) (int)
1 Ursus americanus -78.25027 36.93018 2015-03-20 21:11:24 1088923534
2 Ursus americanus -76.78671 35.53079 2015-04-05 23:00:00 1088954559