as.dframe: Convert input matrix or data.frame into a distributed...


View source: R/dobject.R

Description

Convert input matrix or data.frame into a distributed data.frame.

Usage

as.dframe(input, psize = NULL)

Arguments

input

input matrix or data.frame that will be converted to a dframe.

psize

size of each partition, as a vector giving the number of rows and columns, e.g. c(2,5). Defaults to NULL.

Details

If psize is NULL (the default), the input matrix or data.frame is partitioned by rows and striped across the cluster, so the returned distributed frame has approximately as many partitions as there are R instances in the session.
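
As a rough illustration of why the partition count is only approximately equal to the number of R instances, the following base R sketch computes a row-striped partition size. The helper 'default_psize' and its argument 'n_instances' are hypothetical names introduced here, and the even-split rule is an assumption, not a statement of how ddR computes its default.

```r
# Hypothetical helper, not part of ddR.
# Assumption: rows are split as evenly as possible across `n_instances`
# workers, and every partition keeps all columns.
default_psize <- function(dims, n_instances) {
  stopifnot(length(dims) == 2, n_instances >= 1)
  rows_per_part <- ceiling(dims[1] / n_instances)
  c(rows_per_part, dims[2])
}

# Number of row-wise partitions implied by a given partition size
n_partitions <- function(dims, psize) {
  ceiling(dims[1] / psize[1]) * ceiling(dims[2] / psize[2])
}

ps10 <- default_psize(c(10, 8), n_instances = 4)  # 3 rows per partition
ps9  <- default_psize(c(9, 8),  n_instances = 4)  # also 3 rows per partition
n_partitions(c(10, 8), ps10)  # 4 partitions, one per instance
n_partitions(c(9, 8),  ps9)   # 3 partitions, one fewer than the instances
```

Under this assumption, a 9-row input on four instances yields only three partitions, which is one way the count can differ from the number of instances.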

The last set of partitions may have fewer rows or columns if the input's dimensions are not integer multiples of the partition size. For example, if 'A' is a 5x5 matrix, then 'as.dframe(A, psize=c(2,5))' is a distributed frame with three partitions: the first two have two rows each, the last has only one row, and all three have five columns.
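
The arithmetic behind the 5x5 example can be checked without a cluster. The sketch below uses only base R; 'partition_dims' is a hypothetical helper introduced for illustration, and the order in which it lists partitions is arbitrary rather than a claim about ddR's own numbering.

```r
# Hypothetical helper, not part of ddR: list the dimensions of each
# partition implied by cutting an input of size `dims` into blocks of
# size `psize`.
partition_dims <- function(dims, psize) {
  # Split a length into full chunks plus one shorter remainder chunk, if any
  chunk <- function(total, size) {
    full <- total %/% size
    rest <- total %% size
    c(rep(size, full), if (rest > 0) rest)
  }
  row_chunks <- chunk(dims[1], psize[1])
  col_chunks <- chunk(dims[2], psize[2])
  # One partition per (row chunk, column chunk) pair
  grid <- expand.grid(cols = col_chunks, rows = row_chunks)
  data.frame(partition = seq_len(nrow(grid)),
             rows = grid$rows,
             cols = grid$cols)
}

A <- matrix(1:25, nrow = 5)
parts <- partition_dims(dim(A), psize = c(2, 5))
parts
```

For the 5x5 matrix this lists three partitions with 2, 2, and 1 rows and five columns each, matching the description above. Passing 'psize = dim(A)' to the same helper gives a single partition.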

To create a distributed frame with just one partition, pass the dimensions of the input as the partition size, i.e. 'as.dframe(A, psize=dim(A))'.

Value

Returns a distributed data.frame with the same dimensions as the input and partitioned according to the argument 'psize'. The partitions may reside on remote nodes.

References

Prasad, S., Fard, A., Gupta, V., Martinez, J., LeFevre, J., Xu, V., Hsu, M., and Roy, I. (2015) Large scale predictive analytics in Vertica: Fast data transfer, distributed model creation and in-database prediction. _SIGMOD 2015_, 1657-1668.

Venkataraman, S., Bodzsar, E., Roy, I., AuYoung, A., and Schreiber, R. (2013) Presto: Distributed Machine Learning and Graph Processing with Sparse Matrices. _EuroSys 2013_, 197-210.

Homepage: https://github.com/vertica/ddR

See Also

dframe, psize

Examples

## Not run: 
    library(ddR)
    ## Create a 4x4 matrix of random 0/1 values
    mtx <- matrix(sample(0:1, 16, replace=TRUE), nrow=4)
    ## Create a distributed frame spread across the cluster
    df <- as.dframe(mtx)
    psize(df)
    ## Create a distributed frame with a single partition
    db <- as.dframe(mtx, psize=dim(mtx))
    psize(db)
    ## Create a distributed frame with two partitions
    dc <- as.dframe(mtx, psize=c(2,4))
    psize(dc)
    ## Fetch the first partition
    collect(dc, 1)
    ## Build a data.frame to convert
    dfa <- c(2,3,4)
    dfb <- c("aa","bb","cc")
    dfc <- c(TRUE,FALSE,TRUE)
    df <- data.frame(dfa, dfb, dfc)
    ## Create a dframe from the data.frame with the default partition size
    ddf <- as.dframe(df)
    collect(ddf)
    ## Create a dframe from the data.frame with 1x1 partitions
    ddf <- as.dframe(df, psize=c(1,1))
    collect(ddf)

## End(Not run)

Example output

Welcome to 'ddR' (Distributed Data-structures in R)!
For more information, visit: https://github.com/vertica/ddR

Attaching package: 'ddR'

The following objects are masked from 'package:base':

    cbind, rbind

     [,1] [,2]
[1,]    4    4
     [,1] [,2]
[1,]    4    4
     [,1] [,2]
[1,]    2    4
[2,]    2    4
  X1 X2 X3 X4
1  0  1  0  0
2  0  0  0  1
  dfa dfb   dfc
1   2  aa  TRUE
2   3  bb FALSE
3   4  cc  TRUE
  dfa dfb   dfc
1   2  aa  TRUE
2   3  bb FALSE
3   4  cc  TRUE

ddR documentation built on May 29, 2017, 6:52 p.m.