dupes: Check the number of duplicated rows in a data frame.
In bcgov/elucidate: Convenience Functions to Help Researchers Elucidate Patterns in Their Data

dupes

R Documentation

Description

Checks a data frame for duplicated rows, based either on the variables specified via `...` or on all columns if none are specified. `dupes` is a convenience shortcut for `copies` with the "filter" argument set to "dupes" and the "sort_by_copies" argument set to TRUE by default. For greater flexibility in checking row copy numbers or filtering for distinct rows, use `copies` instead. `dupes` behaves similarly to `get_dupes()` but is substantially faster due to the use of data.table as a backend.
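The shortcut relationship described above can be sketched as the pair of calls below. This is only an illustration: it assumes the elucidate package is installed and that its `pdata` example data set (used in the Examples section) is available. The `filter` and `sort_by_copies` argument names are taken from the description, and the two results are expected, not guaranteed here, to contain the same rows.

```r
# Runs only if elucidate is installed (assumption: pdata ships with the package)
if (requireNamespace("elucidate", quietly = TRUE)) {
  library(elucidate)

  # Shortcut form
  via_dupes <- dupes(pdata, g)

  # Longer form that dupes() is described as wrapping
  via_copies <- copies(pdata, g, filter = "dupes", sort_by_copies = TRUE)
}
```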

Usage

dupes(
  data,
  ...,
  keep_all_cols = TRUE,
  sort_by_copies = TRUE,
  order = c("d", "a", "i"),
  na_last = FALSE,
  output = c("same", "tibble", "dt", "data.frame")
)

Arguments

`data`	a data frame, tibble, or data.table.
`...`	This special argument accepts any number of unquoted column names (also present in the data source) to use when searching for duplicates, e.g. `x, y, z`. Also accepts a character vector of column names or index numbers, e.g. c("x", "y", "z") or c(1, 2, 3), but not a mixture of formats in the same call. If no column names are specified, all columns will be used.
`keep_all_cols`	If column names are specified using `...`, setting this to FALSE drops the unspecified columns, similarly to the `.keep_all` argument for `dplyr::distinct()`. Default is TRUE.
`sort_by_copies`	If TRUE (the default), sorts the results by the number of copies, in order specified by the `order` argument.
`order`	If sort_by_copies is set to TRUE, this controls whether the results should be sorted in order of descending/decreasing = "d" (the default) or ascending/increasing = "a" or "i" copy numbers.
`na_last`	Should rows of the specified columns with missing values be listed below non-missing values (TRUE/FALSE)? Default is FALSE.
`output`	"tibble" for tibble, "dt" for data.table, or "data.frame" for a data frame. "same", the default option, returns the same format as the input data.
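As a hedged illustration of how these arguments might be combined, the calls below use only argument names and values listed above. They assume the elucidate package is installed and that its `pdata` example data set (with the `g` and `high_low` columns used in the Examples section) is available. The object names `by_names`, `by_chr`, and `slim` are made up for this sketch.

```r
# Runs only if elucidate is installed (assumption: pdata ships with the package)
if (requireNamespace("elucidate", quietly = TRUE)) {
  library(elucidate)

  # Unquoted column names
  by_names <- dupes(pdata, high_low, g)

  # The same columns passed as a character vector
  by_chr <- dupes(pdata, c("high_low", "g"))

  # Drop unspecified columns, sort by ascending copy number, return a tibble
  slim <- dupes(pdata, g, keep_all_cols = FALSE, order = "a", output = "tibble")
}
```

Mixing formats in a single call, such as an unquoted name together with a character vector, is not supported according to the description of `...`.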

Value

A subset of the input data frame consisting of duplicated rows that were detected based on specified variables used to condition the search. A message will also be printed to the console indicating whether or not duplicates were detected. An n_copies column is appended specifying the total number of copies of each row that were detected.
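To make the shape of the returned value concrete, here is a minimal base R sketch of the same idea: count the copies of each key combination, keep rows with more than one copy, and sort by descending copy number. It is not the package's implementation (which uses data.table). The helper `count_copies` and the toy data frame `df` are hypothetical, and missing values are not given any special treatment.

```r
# Toy data (hypothetical, not the package's pdata)
df <- data.frame(
  id = c(1, 2, 2, 3, 3, 3),
  g  = c("a", "b", "b", "c", "c", "c"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: append an n_copies column giving the number of rows
# that share the same values in the chosen key columns
count_copies <- function(data, cols = names(data)) {
  key <- do.call(paste, c(data[cols], sep = "\r"))
  data$n_copies <- as.integer(ave(seq_along(key), key, FUN = length))
  data
}

with_counts <- count_copies(df, c("id", "g"))

# Keep only rows that occur more than once, most-copied first
dup_rows <- with_counts[with_counts$n_copies > 1, ]
dup_rows <- dup_rows[order(-dup_rows$n_copies), ]
```

In this toy case the single row with `id` 1 is dropped, and the remaining five rows carry `n_copies` values of 3 or 2.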

Author(s)

Craig P. Hutton, craig.hutton@gov.bc.ca

Examples


# check for duplicates based on one variable, "g" in this case
dupes(pdata, g)

## Not run: 
# check based on 2 variables
dupes(pdata, high_low, g)

# check based on all variables, i.e. fully duplicated rows
dupes(pdata)

## End(Not run)

bcgov/elucidate documentation built on Sept. 3, 2022, 7:16 p.m.
