Creates a distributed data.frame with the specified partitioning and data.
nparts | vector specifying the number of partitions. If missing, 'psize' and 'dim' must be provided.
dim | the dim attribute for the data.frame to be created. A vector specifying the number of rows and columns.
psize | size of each partition as a vector specifying the number of rows and columns. This parameter is provided together with 'dim'.
data | initial value of all elements in the dframe. Default is 0.
Data frame partitions are internally stored as data.frame objects. The last set of partitions may have fewer rows or columns if the dframe dimensions are not an integer multiple of the partition size. For example, the distributed data.frame 'dframe(dim=c(5,5), psize=c(2,5))' has three partitions. The first two partitions have two rows each, but the last partition has only one row. All three partitions have five columns.
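The partition arithmetic in this example can be checked in plain R, without a ddR backend. The following is a minimal sketch; the helper name 'partition_dims' is hypothetical and is not part of ddR, and it assumes that only the last block along each dimension can be smaller than 'psize'.

```r
# Hypothetical helper (not part of ddR): computes the row and column count of
# every partition implied by 'dim' and 'psize', listed in row-major order.
partition_dims <- function(dim, psize) {
  # Number of partitions along rows and columns; a partial block still counts.
  grid <- ceiling(dim / psize)
  # Size of each block: full blocks first, the remainder in the last block.
  row_sizes <- pmin(psize[1], dim[1] - (seq_len(grid[1]) - 1) * psize[1])
  col_sizes <- pmin(psize[2], dim[2] - (seq_len(grid[2]) - 1) * psize[2])
  # Row-major order: the column block index varies fastest.
  data.frame(
    partition = seq_len(grid[1] * grid[2]),
    nrow = rep(row_sizes, each = grid[2]),
    ncol = rep(col_sizes, times = grid[1])
  )
}

sizes <- partition_dims(dim = c(5, 5), psize = c(2, 5))
print(sizes)
```

For 'dim=c(5,5)' and 'psize=c(2,5)' this yields three partitions with 2, 2 and 1 rows, each with five columns, matching the description above.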
Distributed data.frames can also be defined by specifying just the number of partitions, but not their sizes. This flexibility is useful when the size of a dframe is not known a priori. For example, 'dframe(nparts=c(5,1))' is a dframe with five partitions. Each partition can contain any number of rows, though the number of columns should be the same across partitions so that together they form a well-formed data.frame.
Distributed data.frames can be fetched at the master using 'collect'. Number of partitions can be obtained by 'nparts'. Partitions are numbered from left to right, and then top to bottom, i.e., row major order. Dimension of each partition can be obtained using 'psize'.
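The row-major numbering can be illustrated with a small base R sketch that maps a partition number to its position on the partition grid. The helper name 'partition_position' is hypothetical and is not part of ddR; it only restates the numbering rule described above.

```r
# Hypothetical helper (not part of ddR): grid position of partition 'index'
# when partitions are numbered in row-major order on a grid with
# nparts[1] row blocks and nparts[2] column blocks.
partition_position <- function(index, nparts) {
  stopifnot(index >= 1, index <= prod(nparts))
  c(row = (index - 1) %/% nparts[2] + 1,
    col = (index - 1) %% nparts[2] + 1)
}

# On a 3 x 3 grid of partitions, partition 4 is the first block of the
# second row of blocks.
pos <- partition_position(4, nparts = c(3, 3))
print(pos)
```

Numbering left to right first means that consecutive partition numbers share a row of blocks until that row is exhausted.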
Returns a distributed data.frame with the specified dimensions. Data may reside as partitions in remote nodes.
Prasad, S., Fard, A., Gupta, V., Martinez, J., LeFevre, J., Xu, V., Hsu, M., and Roy, I. (2015) Large scale predictive analytics in Vertica: Fast data transfer, distributed model creation and in-database prediction. _SIGMOD 2015_, 1657-1668.
Venkataraman, S., Bodzsar, E., Roy, I., AuYoung, A., and Schreiber, R. (2013) Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices. _EuroSys 2013_, 197-210.
Homepage: https://github.com/vertica/ddR
## Not run:
## A 9 partition (each partition 3x3), 9x9 dframe with each element initialized to 5.
a <- dframe(psize=c(3,3),dim=c(9,9),data=5)
collect(a)
b <- dframe(psize=c(3,3),dim=c(9,9)) # Same as 'a', but filled with 0s.
## An empty dframe with 6 partitions, 2 per column and 3 per row.
c <- dframe(nparts=c(2,3))
## End(Not run)