# jMatrix: Harmonize ('join') sparse matrices

In qlcMatrix: Utility Sparse Matrix Functions for Quantitative Language Comparison

## Description

A utility function to make sparse matrices semantically conformable: not only the sizes of the dimensions are made to match, but also their content. Put differently, this function harmonizes two matrices along a dimension that contains the same entities in both matrices, but in a different order (and possibly with different subsets). Given two matrices with such (partly overlapping) dimensions, two new matrices are generated that reorder the original matrices via a matrix product to make them conformable. In an abstract sense, this is similar to an SQL ‘inner join’ operation.

## Usage

```r
jMatrix(rownamesX, rownamesY, collation.locale = "C")

jcrossprod(X, Y, rownamesX = rownames(X), rownamesY = rownames(Y))
tjcrossprod(X, Y, colnamesX = colnames(X), colnamesY = colnames(Y))
```

## Arguments

| Argument | Description |
|---|---|
| `rownamesX, rownamesY` | rownames to be joined from two matrices. |
| `X, Y` | sparse matrices to be made (semantically) conformable. |
| `colnamesX, colnamesY` | colnames to be joined from two matrices. |
| `collation.locale` | locale to be used for ordering of the joined dimension. Defaults to pure numerical unicode ordering `"C"`. See `ttMatrix` for details. |

## Details

Given a sparse matrix X with rownames rX and a sparse matrix Y with rownames rY, the function `jMatrix` produces joined rownames rXY with all unique entries in `c(rX, rY)`, reordered according to the specified locale, if necessary.

Further, two sparse matrices M1 and M2 are returned to link X and Y to the new joined dimension rXY. Specifically, X2 = M1 %*% X and Y2 = M2 %*% Y will have conformable rXY rows, so crossprod(X2, Y2) can be computed. Note that the result will be empty when there is no overlap between the rownames of X and Y.
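The linking step described above can be illustrated by hand. The following is a minimal sketch that uses only the Matrix package; it is not the implementation used inside `jMatrix`, and the toy matrices, the use of `sort(method = "radix")` for the ordering, and the variable names are assumptions made for illustration.

```r
library(Matrix)

# Two small sparse matrices whose row names overlap only partly (made-up data)
X <- sparseMatrix(i = c(1, 2, 3), j = c(1, 2, 2), x = c(1, 2, 3),
                  dims = c(3, 2),
                  dimnames = list(c("b", "a", "c"), c("x1", "x2")))
Y <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(10, 20),
                  dims = c(2, 2),
                  dimnames = list(c("c", "d"), c("y1", "y2")))

rX <- rownames(X)
rY <- rownames(Y)

# Joined dimension: all unique names, sorted (radix sorting uses the C locale)
rXY <- sort(unique(c(rX, rY)), method = "radix")

# Pattern matrices: entry (k, i) is set when joined name k equals original name i
M1 <- sparseMatrix(i = match(rX, rXY), j = seq_along(rX),
                   dims = c(length(rXY), length(rX)))
M2 <- sparseMatrix(i = match(rY, rXY), j = seq_along(rY),
                   dims = c(length(rXY), length(rY)))

# Both products now have one row per entry of rXY, in the same order
X2 <- M1 %*% X
Y2 <- M2 %*% Y

# Only the shared row name "c" contributes to the crossproduct
result <- crossprod(X2, Y2)
result
```

Rows that occur in only one of the two matrices become all-zero rows in the other product, which is why they drop out of the crossproduct.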

The function `jcrossprod` is a shortcut to compute the above crossproduct immediately, using `jMatrix` internally to harmonize the rows. Similarly, `tjcrossprod` computes the tcrossprod, harmonizing the columns of two matrices using `jMatrix`.

## Value

`jMatrix` returns a list of three elements (for naming, see Details above):

| Element | Description |
|---|---|
| `M1` | sparse pattern matrix of type `ngCMatrix` with dimensions `c(length(rXY),length(rX))` |
| `M2` | sparse pattern matrix of type `ngCMatrix` with dimensions `c(length(rXY),length(rY))` |
| `rownames` | unique joined row names rXY |

`jcrossprod` and `tjcrossprod` return a sparse Matrix of type `ngCMatrix` when both X and Y are pattern matrices. Otherwise they return a sparse Matrix of type `dgCMatrix`.

## Note

Actually, it is unimportant whether the inputs to `jMatrix` are row or column names. However, care has to be taken to use the resulting matrices in the right transposition. To make this function easier to explain, I consistently talk only about row names above.
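To illustrate the point about transposition, here is a small sketch for the column-name case. It assumes that the qlcMatrix and Matrix packages are installed; the toy matrices are made up, and the comparison with `tjcrossprod` is only meant to show the orientation of the products, not to describe the internals of that function.

```r
library(Matrix)
library(qlcMatrix)

# Two small sparse matrices whose column names overlap only partly (made-up data)
X <- sparseMatrix(i = c(1, 2), j = c(1, 2), x = c(1, 2), dims = c(2, 2),
                  dimnames = list(c("r1", "r2"), c("b", "a")))
Y <- sparseMatrix(i = c(1, 1), j = c(1, 2), x = c(5, 7), dims = c(1, 2),
                  dimnames = list("s1", c("a", "c")))

# Join the column names instead of the row names
J <- jMatrix(colnames(X), colnames(Y))

# For columns, the linking matrices are transposed and multiplied from the right
by_hand <- tcrossprod(X %*% t(J$M1), Y %*% t(J$M2))

# The shortcut should give the same values
shortcut <- tjcrossprod(X, Y)
```

For row names the linking matrices are applied from the left without transposition, as in `M1 %*% X`.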

## Author(s)

Michael Cysouw

## Examples

```r
# example about INNER JOIN from wikipedia
# http://en.wikipedia.org/wiki/Sql_join#Inner_join
# this might look complex, but it is maximally efficient on large sparse matrices

# Employee table as sparse Matrix
Employee.LastName <- c("Rafferty","Jones","Heisenberg","Robinson","Smith","John")
Employee.DepartmentID <- c(31,33,33,34,34,NA)
E.LN <- ttMatrix(Employee.LastName, simplify = TRUE)
E.DID <- ttMatrix(Employee.DepartmentID, simplify = TRUE)

( Employees <- tcrossprod(E.LN, E.DID) )

# Department table as sparse Matrix
Department.DepartmentID <- c(31,33,34,35)
Department.DepartmentName <- c("Sales","Engineering","Clerical","Marketing")
D.DID <- ttMatrix(Department.DepartmentID, simplify = TRUE)
D.DN <- ttMatrix(Department.DepartmentName, simplify = TRUE)

( Departments <- tcrossprod(D.DN, D.DID) )

# INNER JOIN on DepartmentID (i.e. on the columns of these two matrices)
# result is a sparse matrix linking Employee.LastName to Department.DepartmentName,
# internally having used the DepartmentID for the linking

( JOIN <- tjcrossprod(Employees, Departments) )

# Note that in this example it is much easier to directly use jMatrix on the DepartmentIDs
# instead of first making sparse matrices from the data
# and then using tjcrossprod on the matrices to get the INNER JOIN
# (only the ordering is different in this direct approach)

J <- jMatrix(Employee.DepartmentID, Department.DepartmentID)
JOIN <- crossprod(J$M1, J$M2)
rownames(JOIN) <- Employee.LastName
colnames(JOIN) <- Department.DepartmentName
JOIN
```

qlcMatrix documentation built on May 2, 2019, 9:14 a.m.