poorman: A Poor Man's Dependency Free Recreation of 'dplyr'

distinct

R Documentation

Subset distinct/unique rows

Description

Select only distinct/unique rows from a data.frame.

Usage

distinct(.data, ..., .keep_all = FALSE)

Arguments

`.data`	A `data.frame`.
`...`	Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row is preserved. If omitted, all variables are used.
`.keep_all`	`logical(1)`. If `TRUE`, keep all variables in `.data`. If a combination of `...` is not distinct, this keeps the first row of values.
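
The first-row rule described above can be illustrated in base R with `duplicated()`, which flags every repeat of a value after its first occurrence. This is only a minimal sketch of the semantics and not necessarily how poorman implements `distinct()`; the helper `distinct_sketch()` and its `cols` and `keep_all` arguments are hypothetical names introduced for this illustration.

```r
# Hypothetical helper: a base R illustration of the first-row rule.
# `cols` is a character vector of column names (an assumption of this
# sketch; distinct() itself takes unquoted variables via `...`).
distinct_sketch <- function(data, cols = names(data), keep_all = FALSE) {
  # duplicated() marks repeats, so !duplicated() keeps first occurrences
  keep <- !duplicated(data[, cols, drop = FALSE])
  out <- if (keep_all) {
    data[keep, , drop = FALSE]     # all columns, first row per combination
  } else {
    data[keep, cols, drop = FALSE] # only the columns used for uniqueness
  }
  rownames(out) <- NULL
  out
}

df <- data.frame(x = c(1, 1, 2, 3, 2), y = c("a", "b", "c", "d", "e"))
distinct_sketch(df, "x")                   # keeps x only
distinct_sketch(df, "x", keep_all = TRUE)  # keeps y from the first row of each x
```

With `keep_all = TRUE`, the `y` values come from rows 1, 3 and 4 of `df`, the first rows in which each value of `x` appears.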

Value

A data.frame with the following properties:

Rows are a subset of the input but appear in the same order.
Columns are not modified if `...` is empty or `.keep_all` is `TRUE`. Otherwise, `distinct()` first calls `mutate()` to create new columns.
Groups are not modified.
data.frame attributes are preserved.
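
The compute-then-deduplicate order described for columns can be sketched in base R as follows. Here `transform()` stands in for `mutate()`, and the variable names `with_diff` and `first_rows` are made up for this example; it is an illustration under those assumptions, not the package's internal code.

```r
df <- data.frame(x = c(1, 4, 2, 5), y = c(3, 2, 4, 5))

# Step 1: create the computed column (transform() stands in for mutate())
with_diff <- transform(df, diff = abs(x - y))

# Step 2: keep the first row for each distinct value of the new column
first_rows <- !duplicated(with_diff$diff)

with_diff[first_rows, "diff", drop = FALSE]  # analogous to .keep_all = FALSE
with_diff[first_rows, , drop = FALSE]        # analogous to .keep_all = TRUE
```

Because the retained rows are selected with a logical index, they stay in their original order, which matches the first property listed above.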

Examples

library(poorman)

df <- data.frame(
  # spell out `replace` rather than relying on partial argument matching
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
)
nrow(df)
nrow(distinct(df))
nrow(distinct(df, x, y))

distinct(df, x)
distinct(df, y)

# You can choose to keep all other variables as well
distinct(df, x, .keep_all = TRUE)
distinct(df, y, .keep_all = TRUE)

# You can also use distinct on computed variables
distinct(df, diff = abs(x - y))

# The same behaviour applies for grouped data frames,
# except that the grouping variables are always included
df <- data.frame(
  g = c(1, 1, 2, 2),
  x = c(1, 1, 2, 1)
) %>% group_by(g)
df %>% distinct(x)

nathaneastwood/poorman documentation built on Feb. 10, 2024, 1:41 p.m.