This function performs a Khatri-Rao product (‘column-wise Kronecker product’, see KhatriRao for more info) on two sparse matrices. However, such a product of sparse matrices normally contains very many empty rows. This function removes those empty rows and, most importantly, produces row names only for the remaining rows. For large sparse matrices this is much more efficient than first producing all row names and then removing the ones that belong to empty rows.
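To make the point about empty rows concrete, here is a minimal sketch on two tiny, made-up matrices. It uses only `KhatriRao` and `sparseMatrix` from the Matrix package; the data and the variable name `nonEmpty` are illustrative assumptions, not part of this function.

```r
library(Matrix)

# Two tiny sparse matrices with the same number of columns (made-up data)
X <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(1, 1), dims = c(3, 2))
Y <- sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(2, 5), dims = c(3, 2))

# The product has nrow(X) * nrow(Y) rows and the same number of columns
M <- KhatriRao(X, Y)
dim(M)                                   # 9 2

# Only rows that received a non-zero product are non-empty
nonEmpty <- which(rowSums(M != 0) > 0)
nonEmpty                                 # 1 6
```

Here 7 of the 9 rows are empty, and the proportion of empty rows tends to grow as the input matrices become larger and sparser.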
rKhatriRao(X, Y,
    rownamesX = rownames(X), rownamesY = rownames(Y),
    simplify = FALSE, binder = ":", FUN = "*")
X,Y |
matrices with the same number of columns. |
rownamesX, rownamesY |
row names of matrices X and Y. These can be specified separately, but they default to the row names of the matrices. |
simplify |
by default, the names of rows and columns are not included in the matrix, to keep the matrix as lean as possible: the row names are returned separately. Using simplify = TRUE returns the matrix with the row names included. |
binder |
symbol to include between the row names of X and Y for the resulting matrix |
FUN |
function to be used in the KhatriRao product, passed internally to the workhorse |
Producing up to 1e6 row names is reasonably quick with the basic KhatriRao function. However, pasting together larger numbers of row names becomes very slow, and the row names take up an enormous amount of RAM. This function solves that problem by producing row names only for the non-empty rows.
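The idea of naming only the non-empty rows can be sketched as follows. This is an illustration under stated assumptions, not necessarily how this function is implemented internally: it relies on the fact that row k of the product pairs row `(k - 1) %/% nrow(Y) + 1` of X with row `(k - 1) %% nrow(Y) + 1` of Y. The data and the names `keep`, `ix`, `iy`, `rn` and `M_small` are made up for the example.

```r
library(Matrix)

# Tiny made-up inputs with row names
X <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(1, 1), dims = c(3, 2),
                  dimnames = list(c("a", "b", "c"), NULL))
Y <- sparseMatrix(i = c(1, 3), j = c(1, 2), x = c(2, 5), dims = c(3, 2),
                  dimnames = list(c("p", "q", "r"), NULL))

# Plain product: by default no row names are generated
M <- KhatriRao(X, Y)

# Indices of the non-empty rows
keep <- which(rowSums(M != 0) > 0)

# Map each kept row back to its source rows in X and Y
ix <- (keep - 1) %/% nrow(Y) + 1
iy <- (keep - 1) %% nrow(Y) + 1

# Paste names only for the kept rows, using ":" as binder
rn <- paste(rownames(X)[ix], rownames(Y)[iy], sep = ":")
M_small <- M[keep, , drop = FALSE]

rn   # "a:p" "b:r"
```

Only `length(keep)` names are pasted, rather than `nrow(X) * nrow(Y)`, which is where the savings in time and memory come from.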
By default, the result is a list of two items:
M |
resulting sparse product matrix with empty rows removed |
rownames |
a vector with the resulting row names for the non-empty rows |
When simplify = TRUE, the matrix is returned with the row names included.
This function allows the row names of the input matrices to be supplied separately, and by default the resulting row names are returned separately as well. This might seem a bit unusual, given how nicely R integrates row names into matrices. However, it often turns out to be easier to store row and column names separately when working efficiently with large sparse matrices.
Michael Cysouw
# two sparse matrices with row names
X <- rSparseMatrix(1e4, 1e3, 1e4)
Y <- rSparseMatrix(1e4, 1e3, 1e4)
rownames(X) <- 1:nrow(X)
rownames(Y) <- 1:nrow(Y)
# the basic KhatriRao product is very fast
# but almost all rows are empty
system.time(M <- KhatriRao(X, Y))
## Not run:
sum(rowSums(M)==0)/nrow(M) # 99.9% empty rows
## End(Not run)
# To produce all row names takes a long time
# with 1e8 row names it took half an hour on my laptop
# so: don't try the following, except on a very large machine!
## Not run:
system.time(M <- KhatriRao(X, Y, make.dimnames = TRUE))
## End(Not run)
# Using the current special version works just fine and is reasonably quick
system.time(M <- rKhatriRao(X, Y))