dupes | R Documentation |
Checks a data frame for duplicated rows, based either on the variables specified via ... or on all columns if none are specified. dupes is a convenience shortcut for copies with the "filter" argument set to "dupes" and the "sort_by_copies" argument set to TRUE by default. For greater flexibility in checking row copy numbers or filtering for distinct rows, use copies instead.
dupes behaves similarly to get_dupes but is substantially faster due to the use of data.table as a backend.
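To make the idea concrete, here is a minimal base-R sketch of duplicate detection by key columns. It is an illustration only, not the package's data.table-based implementation: find_dupes() and the toy data frame df are hypothetical names introduced for this example.

```r
# Hypothetical helper illustrating the concept; NOT the implementation of dupes().
find_dupes <- function(data, cols = names(data)) {
  # Build one text key per row from the chosen columns
  key <- do.call(paste, c(unname(data[cols]), sep = "\u001f"))
  # Count how many rows share each key
  n_copies <- as.integer(ave(seq_along(key), key, FUN = length))
  out <- cbind(data, n_copies = n_copies)
  # Keep only rows whose key occurs more than once
  out <- out[out$n_copies > 1, , drop = FALSE]
  # Sort by descending copy number
  out[order(-out$n_copies), , drop = FALSE]
}

# Toy data (made up for this sketch)
df <- data.frame(
  id = c(1, 2, 3, 3, 3, 4),
  g  = c("a", "a", "b", "b", "b", "c")
)

by_g   <- find_dupes(df, "g")  # duplicates judged on column g only
by_all <- find_dupes(df)       # duplicates judged on every column
```

Checking on g alone flags five rows, while checking on all columns flags only the three fully identical rows, which mirrors the difference between passing column names and passing none.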
dupes(
  data,
  ...,
  keep_all_cols = TRUE,
  sort_by_copies = TRUE,
  order = c("d", "a", "i"),
  na_last = FALSE,
  output = c("same", "tibble", "dt", "data.frame")
)
data |
a data frame, tibble, or data.table. |
... |
This special argument accepts any number of unquoted column names
(also present in the data source) to use when searching for duplicates,
e.g. |
keep_all_cols |
If column names are specified using |
sort_by_copies |
If TRUE (the default), sorts the results by the number
of copies, in order specified by the |
order |
If sort_by_copies is set to TRUE, this controls whether the results should be sorted in order of descending/decreasing = "d" (the default) or ascending/increasing = "a" or "i" copy numbers. |
na_last |
Should rows of the specified columns with missing values be listed below non-missing values (TRUE/FALSE)? Default is FALSE. |
output |
"tibble" for tibble, "dt" for data.table, or "data.frame" for a data frame. "same", the default option, returns the same format as the input data. |
A subset of the input data frame consisting of duplicated rows that were detected based on specified variables used to condition the search. A message will also be printed to the console indicating whether or not duplicates were detected. An n_copies column is appended specifying the total number of copies of each row that were detected.
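The documented arguments and the appended n_copies column can be exercised roughly as follows. This is a sketch under one assumption: that dupes() and the pdata example data are provided by a package named "elucidate". The package name is not stated on this page, so adjust it to match your installation. The code is guarded so that it does nothing when the package is absent.

```r
# Assumption: dupes() and pdata come from a package called "elucidate".
if (requireNamespace("elucidate", quietly = TRUE)) {
  library(elucidate)

  # Duplicates of g, fewest copies first, returned as a data.table
  by_g <- dupes(pdata, g, order = "a", output = "dt")

  # Duplicates of the high_low/g combination, left unsorted,
  # returned as a plain data frame
  by_pair <- dupes(pdata, high_low, g,
                   sort_by_copies = FALSE, output = "data.frame")

  # Each result carries the appended copy-count column
  "n_copies" %in% names(by_g)
}
```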
Craig P. Hutton, craig.hutton@gov.bc.ca
copies, get_dupes
# check for duplicates based on one variable, "g" in this case
dupes(pdata, g)

## Not run:
dupes(pdata, high_low, g) # check based on 2 variables

# check based on all variables, i.e. fully duplicated rows
dupes(pdata)

## End(Not run)