Multiplyr-class: Parallel processing data frame


Description

With the exception of calling Multiplyr() to create a new data frame, none of the methods or fields here are intended for general use; it is generally best to stick to the manipulation functions. For a better overview, run: vignette("basics")

Arguments

...

Either a data frame or a list of name=value pairs

cl

Cluster object, number of nodes or NULL (default)

alloc

Number of additional columns to allocate

auto_compact

Automatically compact data after filter operations

auto_partition

Automatically re-partition after group_by

profiling

Enable internal profiling code

Value

Object of class Multiplyr

Fields

auto_compact

Compact data after each filtering (or similar) operation

auto_partition

Re-partition after group_by

bindenv

Environment for within_group etc. operations

bm

big.matrix (internal representation of data)

bm.master

big.matrix for certain operations that need non-subsetted data

cls

SOCKcluster created by parallel package

col.names

Name of each column; names starting with "." are special, and NA marks a free column

desc.master

big.matrix.descriptor for setting up shared memory access

empty

Flag indicating that this data frame is empty

factor.cols

Which columns are factors/character

factor.levels

List (same length as factor.cols) containing corresponding factor levels

filtercol

Which column in bm indicates filtering (1=included, 0=excluded)

filtered

Flag indicating that this data frame has had filtering applied

first

Subsetting: first row

group.cols

Which columns are involved in grouping

groupcol

Which column in bm contains the group ID

grouped

Flag indicating whether grouped

groupenv

List of environments corresponding to the group IDs in group

group_max

Number of groups

group_partition

Flag indicating that partition_group() has been used

group_sizes_stale

Flag indicating that group sizes need to be re-calculated

group

Which group IDs are assigned to this data frame

last

Subsetting: last row

nsamode

Flag indicating whether data frame is in no-strings-attached mode

order.cols

Display order of columns

pad

Number of spaces to pad each column or 0 for dynamic

profile_names

Profile names

profile_real

Total elapsed time for each profile

profile_rreal

Reference time for total elapsed

profile_rsys

Reference time for system

profile_ruser

Reference time for user

profile_sys

Total system time for each profile

profile_user

Total user time for each profile

profiling

Flag indicating that profiling is to be used

slave

Flag indicating whether cluster_* operations are valid

tmpcol

Which column may be used for temporary calculations

type.cols

Column type (0=numeric, 1=character, 2=factor)

Methods

alloc_col(name = ".tmp", update = FALSE)

Allocate a new column and optionally update cluster nodes to do the same. Returns the column number
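The col.names field marks free columns with NA, so allocating a column amounts to claiming the first NA slot and freeing one amounts to resetting it to NA. The base-R sketch below illustrates only that bookkeeping, under that assumption; alloc_col_sketch() and free_col_sketch() are hypothetical names, not Multiplyr's implementation, and neither the big.matrix nor the cluster update is modelled.

```r
# Hypothetical sketch: NA in col.names marks a free column (as described for
# the col.names field). Only the name bookkeeping is modelled here.
alloc_col_sketch <- function(col.names, name = ".tmp") {
  free <- which(is.na(col.names))
  if (length(free) == 0) {
    stop("no free columns left (assumption: more must be reserved via alloc)")
  }
  j <- free[1]                            # first free column
  col.names[j] <- name
  list(col.names = col.names, col = j)    # j is the allocated column number
}

free_col_sketch <- function(col.names, cols) {
  col.names[cols] <- NA                   # mark the columns as free again
  col.names
}
```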

build_grouped()

Build group environments

calc_group_sizes(delay = TRUE)

Calculate group sizes (if delay=TRUE then this will just mark group sizes as being stale)
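The delay flag trades an immediate recount for a stale marker (compare the group_sizes_stale field). A minimal sketch, assuming group IDs 1..group_max in the group column and the 1/0 filter column described under filtercol; the environment `state`, its group_sizes entry and calc_group_sizes_sketch() are all hypothetical.

```r
# Hypothetical sketch: `state` is an environment standing in for the Multiplyr
# object, holding a plain matrix `bm` plus groupcol, filtercol and group_max.
calc_group_sizes_sketch <- function(state, delay = TRUE) {
  if (delay) {
    state$group_sizes_stale <- TRUE       # defer the work; just mark as stale
    return(invisible(state))
  }
  included <- state$bm[, state$filtercol] == 1
  # Count included rows for each group ID 1..group_max
  state$group_sizes <- tabulate(state$bm[included, state$groupcol],
                                nbins = state$group_max)
  state$group_sizes_stale <- FALSE
  invisible(state)
}
```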

cluster_eval(...)

Executes the specified expression on the cluster

cluster_export(var, var.as = NULL, envir = parent.frame())

Exports a variable from the current environment to the cluster, optionally with a different name

cluster_export_each(var, var.as = var, envir = parent.frame())

Like cluster_export, but exports only one element of each variable to each node

cluster_export_self()

Exports this data frame to the cluster (naming it .local)

cluster_profile()

Update profile totals to include all nodes' totals (also resets nodes' totals to 0)

cluster_running()

Checks whether the cluster is running

cluster_start(cl = NULL)

Starts a cluster with cl cores if cl is numeric, or detectCores()-1 cores if cl is NULL; if cl is an existing cluster, uses that instead
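The rule for cl can be expressed with the parallel package's detectCores(). This sketch covers the argument resolution only; resolve_cl_sketch() is a hypothetical helper, and the fallback to one node when detectCores() returns NA or 1 is an assumption rather than documented behaviour.

```r
library(parallel)

# Hypothetical helper: resolve the cl argument as cluster_start() describes.
# Returns an existing cluster unchanged, otherwise a node count.
resolve_cl_sketch <- function(cl = NULL) {
  if (inherits(cl, "cluster")) return(cl)   # existing cluster: use as-is
  if (is.null(cl)) {
    n <- detectCores() - 1L                 # default: all cores but one
    if (is.na(n) || n < 1L) n <- 1L         # assumption: fall back to 1 node
    return(n)
  }
  as.integer(cl)                            # numeric: that many nodes
}
```

A numeric result could then be passed to makeCluster(), which creates a SOCKcluster by default, and the cluster released later with stopCluster().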

cluster_stop(only.if.started = FALSE)

Stops the cluster

compact()

Re-sorts data so all rows included after filtering are contiguous (and calls sub.big.matrix in the process)
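To illustrate what compaction means, the sketch below moves included rows ahead of excluded ones and keeps only the included block. It is a stand-in under stated assumptions: a plain matrix replaces the big.matrix, compact_sketch() is hypothetical, and the final slice copies data, whereas sub.big.matrix() gives a view without copying.

```r
# Hypothetical sketch: a plain matrix stands in for the big.matrix, and
# column `filtercol` holds 1 (included) or 0 (excluded).
compact_sketch <- function(bm, filtercol) {
  # Excluded rows sort after included ones; order() is stable, so rows keep
  # their original relative order within each block
  ord <- order(bm[, filtercol] == 0)
  bm <- bm[ord, , drop = FALSE]
  n_included <- sum(bm[, filtercol] == 1)
  # Keep the contiguous included block (a copy here; sub.big.matrix() would
  # instead give a view of these rows)
  bm[seq_len(n_included), , drop = FALSE]
}
```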

describe()

Describes data frame (for later use by reattach_slave)

destroy_grouped()

Removes grouped data on remote nodes

envir(nsa = NULL)

Returns an environment with active bindings to columns (may also temporarily set no strings attached mode)
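Base R's makeActiveBinding() is one way to give an environment column-like variables that read from and write to shared storage. The sketch below assumes a plain matrix held in an environment in place of the big.matrix; envir_sketch() is hypothetical and ignores no strings attached mode.

```r
# Hypothetical sketch: `store` is an environment holding a plain matrix `m`
# (standing in for the big.matrix), so bindings always see its current contents.
envir_sketch <- function(store, col.names) {
  e <- new.env()
  for (j in seq_along(col.names)) {
    local({
      col <- j   # capture this iteration's column number
      makeActiveBinding(col.names[col], function(value) {
        if (missing(value)) {
          store$m[, col]            # read: return the column
        } else {
          store$m[, col] <- value   # write: update the underlying matrix
        }
      }, e)
    })
  }
  e
}
```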

factor_map(var, vals)

For a given set of values (numeric or character), map them to numeric codes: this is how data are stored in the big.matrix
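For character data, the mapping can be sketched with match() against a growing vector of levels, which is roughly what the factor.levels and type.cols fields suggest. The sketch handles character values only, and factor_map_sketch() with its list return value is hypothetical.

```r
# Hypothetical sketch (character values only): existing levels keep their
# codes, unseen values are appended as new levels, and NA stays NA.
factor_map_sketch <- function(lvls, vals) {
  vals <- as.character(vals)
  new_lvls <- setdiff(unique(vals[!is.na(vals)]), lvls)
  lvls <- c(lvls, new_lvls)
  list(levels = lvls, codes = match(vals, lvls))  # integer codes for storage
}
```

Decoding is then a plain lookup: r$levels[r$codes] recovers the original strings.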

filter_range(start, end)

Only include specified rows. Note that start and end are relative to all rows in the big.matrix, filtered or otherwise

filter_rows(rows)

Only include specified numeric rows. Note that rows refer to all rows in the big.matrix, filtered or otherwise

filter_vector(rows)

Only include these rows (given as a vector of TRUE/FALSE values). Note that this applies to all rows in the big.matrix, filtered or otherwise
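The three filter methods differ only in how rows are specified, which the sketch below makes explicit by expressing the range and row-number forms in terms of a logical vector. It assumes a plain matrix in place of the big.matrix, the 1/0 filter column described under filtercol, and that filtering never re-includes an excluded row; all function names are hypothetical.

```r
# Hypothetical sketch: a plain numeric matrix stands in for the big.matrix,
# and column `filtercol` holds 1 (included) or 0 (excluded). Row positions
# always refer to ALL rows, whether currently filtered out or not.
# Assumption: a row that is already excluded stays excluded.
filter_vector_sketch <- function(bm, filtercol, rows) {
  stopifnot(is.logical(rows), length(rows) == nrow(bm))
  bm[, filtercol] <- bm[, filtercol] * rows   # 1 * TRUE = 1, x * FALSE = 0
  bm
}

filter_rows_sketch <- function(bm, filtercol, rows) {
  filter_vector_sketch(bm, filtercol, seq_len(nrow(bm)) %in% rows)
}

filter_range_sketch <- function(bm, filtercol, start, end) {
  filter_rows_sketch(bm, filtercol, start:end)
}
```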

finalize()

Destructor

free_col(cols, update = FALSE)

Free the specified (numeric) columns and optionally update the cluster

get_data(i = NULL, j = NULL, nsa = NULL, drop = TRUE)

Retrieve the given rows (i) and columns (j). With drop=TRUE and a single column, a vector is returned; otherwise a standard data.frame. In no strings attached mode, only a vector or a matrix is returned

group_cache_attach(descres)

Attach data frame to group_cache

group_restrict(grpid = NULL)

Restricts data to the specified group ID only. If NULL, the restriction is removed.

initialize(..., alloc = 0, cl = NULL, auto_compact = TRUE, auto_partition = TRUE, profiling = TRUE)

Constructor

local_subset(first, last)

Applies sub.big.matrix to bm

partition_even(extend = FALSE)

Partitions data evenly across cluster, irrespective of grouping boundaries
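Even partitioning can be sketched as splitting the rows into contiguous blocks whose sizes differ by at most one, each described by a first and last row (compare the first and last fields). partition_even_sketch() is hypothetical, and the extend argument is not modelled.

```r
# Hypothetical sketch: split rows 1..nrows into `nodes` contiguous blocks whose
# sizes differ by at most one, ignoring group boundaries.
partition_even_sketch <- function(nrows, nodes) {
  sizes <- rep(nrows %/% nodes, nodes)
  extra <- nrows %% nodes
  if (extra > 0) {
    # the first `extra` nodes take one additional row each
    sizes[seq_len(extra)] <- sizes[seq_len(extra)] + 1
  }
  last <- cumsum(sizes)
  first <- last - sizes + 1
  data.frame(node = seq_len(nodes), first = first, last = last)
}
```

For 10 rows on 3 nodes this gives blocks 1:4, 5:7 and 8:10.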

profile(action = NULL, name = NULL)

Profiling function: action may be start or stop. If called with no parameters, this returns a data.frame of profiling timings
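The start/stop pattern can be sketched with proc.time(): take reference times at start, then add the difference to per-name totals at stop, loosely mirroring the profile_rreal/rsys/ruser and profile_real/sys/user fields. Everything here is an assumption: the environment layout, the string actions "start" and "stop", and the function names are hypothetical.

```r
# Hypothetical sketch: `p` is an environment whose `ref` list holds reference
# times taken at "start" and whose `totals` list accumulates
# c(real, user, sys) per profile name.
new_profiler_sketch <- function() {
  p <- new.env()
  p$ref <- list()
  p$totals <- list()
  p
}

profile_sketch <- function(p, action = NULL, name = NULL) {
  if (is.null(action)) {
    # No parameters: return accumulated timings as a data.frame
    if (length(p$totals) == 0) {
      return(data.frame(name = character(), real = numeric(),
                        user = numeric(), sys = numeric()))
    }
    m <- do.call(rbind, p$totals)
    return(data.frame(name = names(p$totals), real = m[, 1],
                      user = m[, 2], sys = m[, 3], row.names = NULL))
  }
  if (action == "start") {
    p$ref[[name]] <- proc.time()             # reference times
  } else if (action == "stop") {
    d <- proc.time() - p$ref[[name]]         # time spent since "start"
    delta <- c(d[["elapsed"]], d[["user.self"]], d[["sys.self"]])
    prev <- if (is.null(p$totals[[name]])) c(0, 0, 0) else p$totals[[name]]
    p$totals[[name]] <- prev + delta         # add to running totals
  }
  invisible(NULL)
}
```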

profile_import(prof)

Adds totals from provided profile to this data frame's profiling data

reattach_slave(descres)

Used for nodes to reattach to a specified shared memory object

rebuild_grouped()

Executes destroy_grouped(), followed by build_grouped()

row_names()

Returns some entirely arbitrary row names

set_data(i = NULL, j = NULL, value, nsa = NULL)

Set data in the given rows (i) and columns (j). In no strings attached mode, value must be entirely numeric

sort(decreasing = FALSE, dots = NULL, cols = NULL, with.group = TRUE)

Sorts data by the specified (numeric) columns or by translating from a lazy_dots object. with.group=TRUE sorts by the grouping columns first, so that the rows of each group are contiguous
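Sorting by the grouping columns first is what makes each group's rows contiguous. A minimal sketch on an ordinary data.frame, using order() via do.call(); sort_sketch() and its group.cols argument are hypothetical, it takes column names rather than column numbers or lazy_dots, and decreasing applies to the grouping columns as well.

```r
# Hypothetical sketch: with.group = TRUE puts the grouping columns ahead of
# the requested sort columns, so the rows of each group end up contiguous.
sort_sketch <- function(df, cols, group.cols = character(),
                        decreasing = FALSE, with.group = TRUE) {
  keys <- if (with.group) c(group.cols, cols) else cols
  # unname() so that column names cannot clash with order()'s own arguments
  ord <- do.call(order, c(unname(as.list(df[keys])), decreasing = decreasing))
  df[ord, , drop = FALSE]
}
```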

submatrix(a, b)

Returns a sub.big.matrix between specified rows (a:b)

update_fields(fieldnames)

Update the specified fields of the data frames on the cluster nodes to match this one's

Examples

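library(multiplyr)
# Create a Multiplyr data frame from name=value pairs on a 2-node cluster
dat <- Multiplyr(x=1:100, G=rep(c("A", "B"), each=50), cl=2)
dat %>% shutdown()
# Alternatively, create one from an existing data.frame
dat.df <- data.frame(x=1:100, G=rep(c("A", "B"), each=50))
dat <- Multiplyr(dat.df, cl=2)
dat %>% shutdown()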

jeblundell/multiplyr documentation built on May 19, 2019, 12:39 a.m.