crossprod: Compute the matrix product of 'X^T' and 'Y'.

Description Usage Arguments Value Author(s) See Also Examples

Description

The function computes the cross product of two matrices. The matrix is stored in the table either as multiple columns of data or a column of arrays.

Usage

1
2
## S4 method for signature 'db.obj,ANY'
crossprod(x, y = x)

Arguments

x

A db.obj object. It either has multiple columns or a column of arrays, and thus forms a matrix.

y

A db.obj object, default is the same as x. This represents the second matrix in the cross product.

Value

db.Rcrossprod object, which is subclass of db.Rquery. It is actually a vectorized version of the resulting product matrix represented in an array. If you want to take a look at the actual values inside this matrix, lk or lookat can be used to extract the correct matrix format as long as the matrix can be loaded into the memory. Usually the resulting product matrix is not too large because the number n of columns is usually not too large and the dimension of the resulting matrix is n x n.

Author(s)

Author: Predictive Analytics Team at Pivotal Inc.

Maintainer: Frank McQuillan, Pivotal Inc. [email protected]

See Also

db.array forms an array using columns

Examples

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
## Not run: 
## get the help for a method
## help("crossprod,db.obj-method")



## set up the database connection
## Assume that .port is port number and .dbname is the database name
cid <- db.connect(port = .port, dbname = .dbname, verbose = FALSE)

## create a table from the example data.frame "abalone"
delete("abalone", conn.id = cid)
x <- as.db.data.frame(abalone, "abalone", conn.id = cid, verbose = FALSE)

lookat(crossprod(x[,-c(1,2)]))

x$arr <- db.array(1, x$length, x$diameter)

lookat(crossprod(x$arr))

## -----------------------------------------------------

## Create a function that does Principal Component Analysis in parallel.
## As long as the number of features of the data table is fewer than
## ~ 5000, the matrix t(x) 
## the eigenvalues and eigenvectors. However, the step t(x) 
## be done in-database in parallel, because x can be very big.
pca <- function (x, center = TRUE, scale = FALSE)
{
    y <- scale(x, center = center, scale = scale) # centering and scaling
    z <- as.db.data.frame(y, verbose = FALSE) # create an intermediate table to save computation
    m <- lookat(crossprod(z)) # one scan of the table to compute Z^T * Z
    d <- delete(z) # delete the intermediate table
    res <- eigen(m) # only this computation is in R
    n <- attr(y, "row.number") # save the computation to count rows

    ## return the result
    list(val = sqrt(res$values/(n-1)), # eigenvalues
         vec = res$vectors, # columns of this matrix are eigenvectors
         center = attr(y, "scaled:center"),
         scale = attr(y, "scaled:scale"))
}

## create a data table with a random name
dat <- db.data.frame("abalone", conn.id = cid, verbose = FALSE)

## exclude id and sex columns
p <- pca(dat[,-c(1,2)])

p$val # eigenvalues

db.disconnect(cid, verbose = FALSE)

## End(Not run)

PivotalR documentation built on May 30, 2017, 8:18 a.m.