distinct: Select distinct/unique rows

Description

Select distinct/unique rows

Usage

## S3 method for class 'RxFileData'
distinct(.data, ..., .keep_all = FALSE,
  .outFile = tbl_xdf(.data), .rxArgs)

## S3 method for class 'grouped_tbl_xdf'
distinct(.data, ..., .keep_all = FALSE,
  .outFile = tbl_xdf(.data), .rxArgs)

## S3 method for class 'RxDataSource'
distinct(.data, ...)

Arguments

.data

A tbl for an Xdf data source, or a raw Xdf data source.

...

Variables to use for determining uniqueness. If left blank, all variables in .data are used to determine uniqueness.

.outFile

Output format for the returned data. If not supplied, an xdf tbl is created; if NULL, a data frame is returned; if a character string naming a file, an Xdf file is saved at that location.

.rxArgs

A list of RevoScaleR arguments. See rxArgs for details.

.keep_all

If TRUE, keep all variables in the dataset; otherwise, only keep variables used in defining uniqueness.
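
The effect of .keep_all can be illustrated with plain dplyr on an in-memory data frame. This is only a sketch: it assumes the Xdf methods follow the same semantics as dplyr::distinct, as the argument description suggests, and it does not touch RevoScaleR or any Xdf file.

```r
library(dplyr)

# Only the variables used to determine uniqueness are returned
keys_only <- distinct(mtcars, am, vs)

# All variables are returned; each row is the first one seen
# for its (am, vs) combination
all_cols <- distinct(mtcars, am, vs, .keep_all = TRUE)

names(keys_only)
ncol(all_cols)
```

Both results have one row per distinct combination of am and vs; they differ only in which columns are kept.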

Details

This verb calls dplyr::distinct on each chunk in an Xdf file. The resulting data frames are then combined with rbind, and dplyr::distinct is called once more on the combined result to remove duplicates that span chunks. This may be slow if the file contains many chunks, and the operation is limited by available memory if the number of distinct rows is large.
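
The chunk-then-combine approach described above can be sketched with plain dplyr. This is an illustration under stated assumptions, not the package's actual implementation: an in-memory data frame stands in for the Xdf file, and chunk_size and the split() call are hypothetical stand-ins for the chunked reads that RevoScaleR would perform.

```r
library(dplyr)

# Hypothetical chunk size; a real Xdf file defines its own chunks
chunk_size <- 10

# Simulate a chunked file by splitting the data frame into row blocks
chunk_id <- ceiling(seq_len(nrow(mtcars)) / chunk_size)
chunks <- split(mtcars, chunk_id)

# Step 1: find the distinct rows within each chunk
per_chunk <- lapply(chunks, function(chunk) distinct(chunk, am, vs))

# Step 2: combine the per-chunk results and remove duplicates
# that appear in more than one chunk
combined <- bind_rows(per_chunk)
result <- distinct(combined, am, vs)

# For comparison: the same operation on the whole data frame at once
expected <- distinct(mtcars, am, vs)
```

The second call to distinct is needed because the same combination of values can occur in several chunks. The combined intermediate result must fit in memory, which is why a large number of distinct rows limits the operation.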

This verb can be used on HDFS data in the local compute context (on the edge node), but not in the Hadoop or Spark compute contexts.

Value

An object representing the unique rows. Its type depends on the .outFile argument: if missing, it will be an xdf tbl object; if NULL, a data frame; and if a filename, an Xdf data source referencing a file saved to that location.

See Also

distinct in package dplyr

Examples

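library(dplyrXdf)

# create an Xdf data source from the mtcars data frame
mtx <- as_xdf(mtcars, overwrite=TRUE)

# distinct rows over all variables, over am only, and over am and vs
tbl1 <- distinct(mtx)
tbl2 <- distinct(mtx, am)
tbl3 <- distinct(mtx, am, vs)

# number of distinct rows in each result
nrow(tbl1)
nrow(tbl2)
nrow(tbl3)

# save to a persistent Xdf file
distinct(mtx, am, vs, .outFile="mtcars_distinct.xdf")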

RevolutionAnalytics/dplyrXdf documentation built on June 3, 2019, 9:08 p.m.