dedup | R Documentation |
Deduplicate records
dedup(x, how = "one", tolerance = 0.9)
x | (data.frame) A data.frame, tibble, or data.table |
how | (character) How to deal with duplicates. The default of "one" keeps one record of each group of duplicates, and drops the others, putting them into the |
tolerance | (numeric) Score threshold (0 to 1) used to decide whether two records match. Inspect the output closely and adjust this value for your data, as results can vary. |
Returns a data.frame, optionally with attributes
df <- sample_data_1
smalldf <- df[1:20, ]
smalldf <- rbind(smalldf, smalldf[10, ])
smalldf[21, "key"] <- 1088954555
NROW(smalldf)
dp <- dframe(smalldf) %>% dedup()
NROW(dp)
attr(dp, "dups")

# Another example - more than one set of duplicates
df <- sample_data_1
twodups <- df[1:10, ]
twodups <- rbind(twodups, twodups[c(9, 10), ])
rownames(twodups) <- NULL
NROW(twodups)
dp <- dframe(twodups) %>% dedup()
NROW(dp)
attr(dp, "dups")
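To build intuition for the tolerance argument, the sketch below scores pairs of records on a 0 to 1 scale and compares each score against a threshold. It is an illustration only. The similarity() helper and the example record strings are hypothetical, and the scoring (normalized Levenshtein distance via utils::adist from base R) is an assumption, not necessarily how dedup() computes its score.

```r
# Hypothetical helper, not part of the package: similarity in [0, 1]
# derived from Levenshtein edit distance (utils::adist ships with base R).
similarity <- function(a, b) {
  d <- utils::adist(a, b)[1, 1]
  1 - d / max(nchar(a), nchar(b))
}

# Made-up records flattened to strings, for illustration only
rec1 <- "Quercus robur 1088954555"
rec2 <- "Quercus robur 1088954556"  # differs from rec1 in the last character
rec3 <- "Pinus sylvestris 99120034" # an unrelated record

s12 <- similarity(rec1, rec2)
s13 <- similarity(rec1, rec3)

tolerance <- 0.9
s12 >= tolerance  # near-identical pair clears the threshold
s13 >= tolerance  # unrelated pair does not
```

Under this assumed scoring, lowering the threshold would flag more loosely similar records as duplicates, and raising it would flag only near-identical ones, which is why it is worth inspecting the results at a few values.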