dframe: Creates a distributed data.frame with the specified...


View source: R/dobject.R

Description

Creates a distributed data.frame with the specified partitioning and data.

Usage


Arguments

nparts

A vector specifying the number of partitions along the rows and columns. If missing, 'psize' and 'dim' must be provided.

dim

The dim attribute of the data.frame to be created: a vector specifying the number of rows and columns.

psize

Size of each partition, as a vector specifying the number of rows and columns. This parameter must be provided together with 'dim'.

data

Initial value of all elements in the data.frame. Default is 0.
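The rule that either 'nparts' or both 'psize' and 'dim' must be supplied can be sketched in base R. The helper `resolve_nparts` is hypothetical and not part of ddR. It assumes that a supplied 'nparts' is used as-is, and that otherwise the partition grid is derived as `ceiling(dim / psize)`, which matches the partition counts described in Details.

```r
# Hypothetical helper, NOT part of ddR: resolves the partition grid from the
# documented argument rule. Assumption: a supplied 'nparts' is used as-is;
# otherwise the grid is derived from 'dim' and 'psize'.
resolve_nparts <- function(nparts = NULL, dim = NULL, psize = NULL) {
  if (!is.null(nparts)) {
    return(nparts)
  }
  if (is.null(dim) || is.null(psize)) {
    stop("if 'nparts' is missing, both 'psize' and 'dim' must be provided")
  }
  # ceiling() adds one smaller trailing block along any dimension that is
  # not an integer multiple of the partition size.
  ceiling(dim / psize)
}

resolve_nparts(dim = c(9, 9), psize = c(3, 3))  # 3 3
resolve_nparts(dim = c(5, 5), psize = c(2, 5))  # 3 1
resolve_nparts(nparts = c(5, 1))                # 5 1
```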

Details

Data frame partitions are internally stored as data.frame objects. The last set of partitions may have fewer rows or columns if the dframe dimensions are not an integer multiple of the partition size. For example, the distributed data.frame 'dframe(dim=c(5,5), psize=c(2,5))' has three partitions. The first two partitions have two rows each, but the last partition has only one row. All three partitions have five columns.
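The truncation of the last partitions can be checked with a little arithmetic. The sketch below uses base R only; `block_sizes` and `partition_sizes` are hypothetical helpers, not ddR functions, and they simply list the size of each partition in row-major order for a given 'dim' and 'psize'.

```r
# Hypothetical helpers, NOT part of ddR.
# Split n rows (or columns) into blocks of 'size'; the final block holds
# the remainder when n is not an integer multiple of 'size'.
block_sizes <- function(n, size) {
  nfull <- n %/% size  # number of full-size blocks
  rem <- n %% size     # leftover rows/columns for a smaller final block
  c(rep(size, nfull), if (rem > 0) rem)
}

# Size (rows, cols) of every partition, listed in row-major order:
# all column blocks of the first row block, then the next row block, etc.
partition_sizes <- function(dim, psize) {
  rows <- block_sizes(dim[1], psize[1])
  cols <- block_sizes(dim[2], psize[2])
  # expand.grid varies its first argument fastest, giving row-major order.
  grid <- expand.grid(col = seq_along(cols), row = seq_along(rows))
  cbind(rows = rows[grid$row], cols = cols[grid$col])
}

partition_sizes(dim = c(5, 5), psize = c(2, 5))
# rows: 2, 2, 1; cols: 5, 5, 5
```

For 'dim=c(9,9)' and 'psize=c(3,3)', the same helper returns nine 3 x 3 partitions.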

Distributed data.frames can also be defined by specifying just the number of partitions, but not their sizes. This flexibility is useful when the size of a dframe is not known a priori. For example, 'dframe(nparts=c(5,1))' is a distributed data.frame with five partitions stacked vertically. Each partition can contain any number of rows, though all partitions must have the same number of columns for the result to be a well-formed data.frame.
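The column constraint can be illustrated locally with plain data.frames. This is a base R sketch only: the list `parts` is a hypothetical stand-in for the five partitions of 'dframe(nparts=c(5,1))', and stacking them with rbind is used here only as a conceptual picture of what the whole data.frame looks like, not as ddR's implementation of collect.

```r
# Five hypothetical partitions for a layout like dframe(nparts = c(5, 1)):
# row counts differ, but every partition has the same two columns.
parts <- lapply(c(3, 2, 7, 1, 4), function(n) {
  data.frame(id = seq_len(n), value = rep(0, n))
})

# A well-formed vertical stack requires a common number of columns.
stopifnot(length(unique(vapply(parts, ncol, integer(1)))) == 1)

# Conceptually, the full data.frame stacks the partitions top to bottom.
whole <- do.call(rbind, parts)
dim(whole)  # 17 2
```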

Distributed data.frames can be fetched at the master using collect. The number of partitions can be obtained with nparts. Partitions are numbered from left to right, and then top to bottom, i.e., in row-major order. The dimensions of each partition can be obtained with psize.
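Row-major numbering maps a partition's block coordinates to a single index and back. The base R sketch below uses two hypothetical helpers, `part_index` and `part_coords`, which are not ddR functions. Indices are 1-based, as elsewhere in R, and 'nparts' is taken as (row blocks, column blocks).

```r
# Hypothetical helpers, NOT part of ddR: row-major partition numbering
# for a grid with nparts[1] row blocks and nparts[2] column blocks.
part_index <- function(i, j, nparts) (i - 1) * nparts[2] + j

part_coords <- function(k, nparts) {
  c(row = (k - 1) %/% nparts[2] + 1, col = (k - 1) %% nparts[2] + 1)
}

# A 2 x 3 grid, as in dframe(nparts = c(2, 3)): partitions 1-3 form the
# first row block, partitions 4-6 the second.
nparts <- c(2, 3)
sapply(1:6, part_coords, nparts = nparts)
# row: 1 1 1 2 2 2
# col: 1 2 3 1 2 3
part_index(2, 1, nparts)  # 4
```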

Value

Returns a distributed data.frame with the specified dimensions. Data may reside as partitions on remote nodes.

References

Prasad, S., Fard, A., Gupta, V., Martinez, J., LeFevre, J., Xu, V., Hsu, M., and Roy, I. (2015) Large scale predictive analytics in Vertica: Fast data transfer, distributed model creation and in-database prediction. _SIGMOD 2015_, 1657-1668.

Venkataraman, S., Bodzsar, E., Roy, I., AuYoung, A., and Schreiber, R. (2013) Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices. _EuroSys 2013_, 197-210.

Homepage: https://github.com/vertica/ddR

See Also

collect, psize, dmapply

Examples

## Not run: 
## A 9 partition (each partition 3x3), 9x9 dframe with each element initialized to 5.
a <- dframe(psize=c(3,3),dim=c(9,9),data=5)
collect(a)
b <- dframe(psize=c(3,3),dim=c(9,9)) # Same as 'a', but filled with 0s.
## An empty dframe with 6 partitions arranged in a 2 x 3 grid
## (2 row blocks, 3 column blocks). Named 'd' to avoid masking base::c.
d <- dframe(nparts=c(2,3))

## End(Not run)

ddR documentation built on May 29, 2017, 6:52 p.m.