distinct: Select distinct/unique rows


View source: R/distinct.R

Description

Retain only unique/distinct rows from an input tbl. This is similar to unique.data.frame(), but considerably faster.
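As a rough illustration of the similarity to unique.data.frame(), the sketch below runs both functions on a small made-up data frame (the name `df_small` and its values are hypothetical, not from the package). It only compares the results and makes no claim about speed.

```r
library(dplyr)

# Hypothetical data: rows 1 and 2 are exact duplicates.
df_small <- data.frame(
  x = c(1, 1, 2),
  y = c("a", "a", "b")
)

# Base R: dispatches to unique.data.frame() and keeps the original row names.
base_result <- unique(df_small)

# dplyr: same rows, first occurrence of each combination is kept.
dplyr_result <- distinct(df_small)

nrow(base_result)
nrow(dplyr_result)
```

Both calls keep the first occurrence of each duplicated row, so the two results contain the same values here.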

Usage

distinct(.data, ..., .keep_all = FALSE)

Arguments

.data

a tbl

...

Optional variables to use when determining uniqueness. If there are multiple rows for a given combination of inputs, only the first row will be preserved. If omitted, will use all variables.

.keep_all

If TRUE, keep all variables in .data. If a combination of ... is not distinct, this keeps the first row of values.
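The arguments above can be sketched on a small, deterministic table. The data and the name `orders` are hypothetical and chosen only so that the effect of `...` and `.keep_all` is easy to see.

```r
library(dplyr)

# Hypothetical data: customer "a" and customer "b" each appear twice.
orders <- tibble(
  customer = c("a", "a", "b", "b"),
  item     = c("pen", "ink", "pen", "pen")
)

# Uniqueness is judged on `customer` only, and only that column is returned.
by_customer <- distinct(orders, customer)

# .keep_all = TRUE returns every column, taking the first row seen
# for each distinct customer.
first_rows <- distinct(orders, customer, .keep_all = TRUE)

# With no variables given, all columns determine uniqueness.
all_columns <- distinct(orders)
```

Here `by_customer` has two rows and one column, `first_rows` has two rows and both columns (the "pen" row for each customer, since it comes first), and `all_columns` has three rows because only the two ("b", "pen") rows are exact duplicates.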

Details

Comparing list columns is not fully supported. Elements in list columns are compared by reference. A warning will be given when trying to include list columns in the computation. This behavior is kept for compatibility reasons and may change in a future version. See examples.
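If list elements need to be compared by value instead, one possible workaround is to filter with base R's duplicated(), which compares list elements by value. This is a minimal sketch under that assumption, not behaviour documented for distinct(); the names `df_list` and `deduped` are hypothetical.

```r
library(dplyr)

# Two of the three list elements hold the same value (1).
df_list <- tibble(a = as.list(c(1, 1, 2)))

# base::duplicated() flags the second element as a duplicate of the first,
# so filtering on its negation keeps one row per distinct value.
deduped <- df_list %>% filter(!duplicated(a))

nrow(deduped)
```

Because the comparison is by value, the two elements equal to 1 collapse into a single row regardless of how the list was constructed.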

Examples

library(dplyr)

df <- tibble(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
)
nrow(df)
nrow(distinct(df))
nrow(distinct(df, x, y))

distinct(df, x)
distinct(df, y)

# Can choose to keep all other variables as well
distinct(df, x, .keep_all = TRUE)
distinct(df, y, .keep_all = TRUE)

# You can also use distinct on computed variables
distinct(df, diff = abs(x - y))

# The same behaviour applies for grouped data frames
# except that the grouping variables are always included
df <- tibble(
  g = c(1, 1, 2, 2),
  x = c(1, 1, 2, 1)
) %>% group_by(g)
df %>% distinct()
df %>% distinct(x)

# Values in list columns are compared by reference, which can lead to
# surprising results
tibble(a = as.list(c(1, 1, 2))) %>% glimpse() %>% distinct()
tibble(a = as.list(1:2)[c(1, 1, 2)]) %>% glimpse() %>% distinct()

Example output

Attaching package: 'dplyr'

The following objects are masked from 'package:stats':

    filter, lag

The following objects are masked from 'package:base':

    intersect, setdiff, setequal, union

[1] 100
[1] 62
[1] 62
# A tibble: 10 x 1
       x
   <int>
 1     7
 2     1
 3     3
 4     4
 5     9
 6     8
 7     2
 8     6
 9     5
10    10
# A tibble: 10 x 1
       y
   <int>
 1     7
 2     9
 3     6
 4    10
 5     1
 6     2
 7     3
 8     4
 9     8
10     5
# A tibble: 10 x 2
       x     y
   <int> <int>
 1     7     7
 2     1     9
 3     3     6
 4     4    10
 5     9     6
 6     8     1
 7     2     2
 8     6     9
 9     5     4
10    10    10
# A tibble: 10 x 2
       x     y
   <int> <int>
 1     7     7
 2     1     9
 3     3     6
 4     4    10
 5     1     1
 6     2     2
 7     8     3
 8     8     4
 9     1     8
10     2     5
# A tibble: 10 x 1
    diff
   <int>
 1     0
 2     8
 3     3
 4     2
 5     6
 6     7
 7     5
 8     1
 9     4
10     9
# A tibble: 3 x 2
# Groups:   g [2]
      g     x
  <dbl> <dbl>
1     1     1
2     2     2
3     2     1
# A tibble: 3 x 2
# Groups:   g [2]
      g     x
  <dbl> <dbl>
1     1     1
2     2     2
3     2     1
Observations: 3
Variables: 1
$ a <list> [1, 1, 2]
# A tibble: 3 x 1
          a
     <list>
1 <dbl [1]>
2 <dbl [1]>
3 <dbl [1]>
Observations: 3
Variables: 1
$ a <list> [1, 1, 2]
# A tibble: 2 x 1
          a
     <list>
1 <int [1]>
2 <int [1]>

dplyr documentation built on July 5, 2018, 9:04 a.m.