dplyrXdf: Tools for working with Microsoft R Server Xdf files and the dplyr package


Select distinct/unique rows

## S3 method for class 'RxFileData'
distinct(.data, ..., .keep_all = FALSE,
  .outFile = tbl_xdf(.data), .rxArgs)

## S3 method for class 'grouped_tbl_xdf'
distinct(.data, ..., .keep_all = FALSE,
  .outFile = tbl_xdf(.data), .rxArgs)

## S3 method for class 'RxDataSource'
distinct(.data, ...)

`.data`	A tbl for an Xdf data source; or a raw Xdf data source.
`...`	Variables to use for determining uniqueness. If left blank, all variables in `.data` are used to determine uniqueness.
`.keep_all`	If `TRUE`, keep all variables in the dataset; if `FALSE` (the default), keep only the variables used to determine uniqueness.
`.outFile`	Output format for the returned data. If not supplied, create an xdf tbl; if `NULL`, return a data frame; if a character string naming a file, save an Xdf file at that location.
`.rxArgs`	A list of RevoScaleR arguments. See `rxArgs` for details.

This verb calls `dplyr::distinct` on each chunk in an Xdf file. The resulting data frames are combined with `rbind`, and `dplyr::distinct` is called once more on the combined result. This may be slow if the file contains many chunks, and the operation is limited by available memory if the number of distinct rows is large.
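The chunk-wise strategy described above can be illustrated with plain dplyr on in-memory data frames. This is only a sketch of the idea, not the package's actual implementation: `chunked_distinct()` is a hypothetical helper, and a list of data frames produced by `split()` stands in for the chunks of an Xdf file.

```r
library(dplyr)

# Hypothetical helper for illustration only; it is not part of dplyrXdf.
chunked_distinct <- function(chunks, ...) {
  # Step 1: reduce each chunk to its own distinct rows
  per_chunk <- vector("list", length(chunks))
  for (i in seq_along(chunks)) {
    per_chunk[[i]] <- distinct(chunks[[i]], ...)
  }
  # Step 2: bind the per-chunk results into a single data frame
  combined <- bind_rows(per_chunk)
  # Step 3: a final pass removes duplicates that span several chunks
  distinct(combined, ...)
}

# Simulate a chunked file by splitting mtcars into blocks of 10 rows
chunks <- split(mtcars, ceiling(seq_len(nrow(mtcars)) / 10))
result <- chunked_distinct(chunks, am, vs)
nrow(result)
```

The final `distinct` call is needed because the same combination of values can appear in more than one chunk. The combined intermediate result must fit in memory, which is why a large number of distinct rows limits the operation.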

This verb can be used on HDFS data in the local compute context (on the edge node), but not in the Hadoop or Spark compute contexts.

An object representing the unique rows. This depends on the `.outFile` argument: if missing, it will be an xdf tbl object; if `NULL`, a data frame; and if a filename, an Xdf data source referencing a file saved to that location.

distinct in package dplyr

mtx <- as_xdf(mtcars, overwrite=TRUE)
tbl1 <- distinct(mtx)
tbl2 <- distinct(mtx, am)
tbl3 <- distinct(mtx, am, vs)
nrow(tbl1)
nrow(tbl2)
nrow(tbl3)

# save to a persistent Xdf file
distinct(mtx, am, vs, .outFile="mtcars_distinct.xdf")

RevolutionAnalytics/dplyrXdf documentation built on June 3, 2019, 9:08 p.m.
