drSubset: Subsetting Distributed Data Frames


Description

Return a subset of a "ddf" object to memory

Usage

drSubset(data, subset = NULL, select = NULL, drop = FALSE,
  preTransFn = NULL, maxRows = 500000, params = NULL, packages = NULL,
  control = NULL, verbose = TRUE)

Arguments

data

object to be subsetted: an object of class "ddf" or "ddo" (in the latter case, preTransFn must be specified to coerce each subset into a data frame)

subset

logical expression indicating elements or rows to keep: missing values are taken as false

select

expression, indicating columns to select from a data frame

drop

passed on to the [ indexing operator

preTransFn

a transformation function (if desired) to be applied to each subset prior to division. Note: this is deprecated; instead, use addTransform prior to calling divide

maxRows

the maximum number of rows to return

params

a named list of objects external to the input data that are needed in the distributed computation (most are handled automatically, so this rarely needs to be specified)

packages

a vector of R package names that contain functions used in fn (most are handled automatically, so this rarely needs to be specified)

control

parameters specifying how the backend should handle things (most likely parameters to rhwatch in RHIPE); see rhipeControl and localDiskControl

verbose

logical - print messages about what is being done

Value

data frame

Author(s)

Ryan Hafen

Examples

d <- divide(iris, by = "Species")
drSubset(d, Sepal.Length < 5)
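
The subset, select and drop arguments are described in the same terms as those of base R's subset() for data frames. The sketch below uses only base R on a small made-up data frame (df, x and y are hypothetical names, not part of datadr) to illustrate those semantics. It assumes, without confirmation from this page, that drSubset treats these arguments the same way.

```r
# Base R only: illustrates the semantics described for `subset`, `select`
# and `drop`, on the ASSUMPTION that drSubset() mirrors base::subset().
# The data frame below is made up for illustration.
df <- data.frame(
  x = c(1, NA, 3, 4),
  y = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Rows where the condition evaluates to NA are dropped (NA is taken as FALSE).
kept <- subset(df, x > 2)

# `select` picks columns; with drop = FALSE a single column stays a data frame.
one_col_df <- subset(df, x > 2, select = y, drop = FALSE)

# With drop = TRUE the single selected column collapses to a plain vector.
one_col_vec <- subset(df, x > 2, select = y, drop = TRUE)

print(kept)
print(one_col_df)
print(one_col_vec)
```

Here the row with a missing x is excluded from every result, and only the drop = TRUE call returns a vector rather than a data frame.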

datadr documentation built on May 1, 2019, 8:06 p.m.