dimRed: Dimensionality Reduction for sparse matrices, based on... In qlcMatrix: Utility Sparse Matrix Functions for Quantitative Language Comparison

Description

To inspect the structure of a large sparse matrix, it is often very useful to reduce the matrix to a few major dimensions (cf. multidimensional scaling). This function implements a rough approach to obtaining such dimensions. It is a simple wrapper around Cholesky (from the Matrix package) and sparsesvd (from the sparsesvd package).

Usage

    dimRed(sim, k = 2, method = "svd")

Arguments

 sim     Sparse, symmetric, positive-definite matrix (typically a similarity matrix produced by one of the sim or assoc functions).
 k       Number of dimensions to be returned; defaults to 2.
 method  Method used for the decomposition. Currently implemented are "svd" and "cholesky".
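The requirements on sim (square, symmetric, positive-definite) can be checked before calling dimRed. The following is a minimal base-R sketch. The helper is_valid_sim() is hypothetical and not part of qlcMatrix, and because it works on a dense copy of the matrix it is only practical for moderately sized inputs.

```r
# Hypothetical helper (not part of qlcMatrix): checks the requirements
# stated for the `sim` argument on a dense copy of the matrix.
is_valid_sim <- function(sim, tol = 1e-8) {
  sim <- as.matrix(sim)
  # the matrix must be square and symmetric
  if (nrow(sim) != ncol(sim) || !isSymmetric(sim)) return(FALSE)
  # a symmetric matrix is positive-definite iff all its eigenvalues are > 0
  ev <- eigen(sim, symmetric = TRUE, only.values = TRUE)$values
  all(ev > tol)
}

# 3 x 3 similarity matrix: 1 on the diagonal, 0.5 elsewhere
sim_ok <- matrix(0.5, 3, 3) + diag(0.5, 3)
# all-ones matrix: symmetric, but of rank 1, so only positive semi-definite
sim_bad <- matrix(1, 3, 3)

is_valid_sim(sim_ok)   # TRUE  (eigenvalues 2, 0.5, 0.5)
is_valid_sim(sim_bad)  # FALSE (eigenvalues 3, 0, 0)
```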

Details

With method = "cholesky", the matrix sim is decomposed into:

L D L'

Here D is a diagonal matrix, whose values are returned as $D. Only the first k columns of the L matrix are returned (possibly after a permutation; see the details at Cholesky in the Matrix package).
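To make the L D L' relation concrete, here is a dense, base-R sketch under stated assumptions: it uses base R's chol() rather than the sparse Cholesky() that dimRed wraps, it applies no permutation, and the helper ldl_dense() is hypothetical. It rescales the upper-triangular factor R returned by chol() (with sim = R'R) into a unit lower-triangular L and a diagonal D.

```r
# Hypothetical helper (not part of qlcMatrix): dense L D L' factorisation
# built on base R's chol(), without any permutation.
ldl_dense <- function(sim, k = ncol(sim)) {
  R <- chol(sim)   # upper triangular factor with sim = t(R) %*% R
  d <- diag(R)
  L <- t(R / d)    # divide row i of R by d[i], then transpose: unit lower triangular
  list(L = L[, seq_len(k), drop = FALSE],  # keep only the first k columns
       D = d^2)                            # diagonal values of D
}

sim <- matrix(0.5, 3, 3) + diag(0.5, 3)   # 1 on the diagonal, 0.5 elsewhere

full <- ldl_dense(sim)
full$D   # 1, 0.75, 2/3
# check the reconstruction sim = L D L'
all.equal(full$L %*% diag(full$D) %*% t(full$L), sim)   # TRUE

two <- ldl_dense(sim, k = 2)   # only the first two columns of L
```

Dividing each row of R by its diagonal entry moves the scale into D, which is why D holds the squared diagonal of R.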

With method = "svd", the matrix sim is decomposed into:

U D V'

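The U matrix and the values of the diagonal matrix D are returned. For a symmetric positive-definite matrix, these singular values coincide with its eigenvalues.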
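Likewise, a dense base-R sketch of a truncated SVD. It uses base R's svd() rather than sparsesvd, and the helper svd_dense() is hypothetical. For the symmetric positive-definite example matrix, the returned singular values equal the eigenvalues reported by eigen().

```r
# Hypothetical helper (not part of qlcMatrix): dense truncated SVD
# sim = U D V' with base R's svd(), keeping k dimensions.
svd_dense <- function(sim, k = 2) {
  s <- svd(sim, nu = k, nv = k)   # s$u has k columns; s$d holds all singular values
  list(U = s$u, D = s$d[seq_len(k)])
}

sim <- matrix(0.5, 3, 3) + diag(0.5, 3)   # eigenvalues 2, 0.5, 0.5

res <- svd_dense(sim, k = 2)
res$D                                      # 2, 0.5
eigen(sim, symmetric = TRUE)$values[1:2]   # the same values

# the full (untruncated) SVD reconstructs sim, up to rounding
s <- svd(sim)
all.equal(s$u %*% diag(s$d) %*% t(s$v), sim)   # TRUE
```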

Value

A list of two elements is returned:

 L   a sparse matrix of class dgCMatrix with k columns
 D   the diagonal values from the Cholesky decomposition, or the eigenvalues from the svd decomposition

Author(s)

Michael Cysouw <cysouw@mac.com>

Examples

    library(qlcMatrix)

    # some random points in two dimensions
    coor <- cbind(sample(1:30), sample(1:30))

    # using cmdscale() to reconstruct the coordinates from a distance matrix
    d <- dist(coor)
    mds <- cmdscale(d)

    # using dimRed() on a similarity matrix.
    # Note that normL works much better than other norms in this 2-dimensional case
    s <- cosSparse(t(coor), norm = normL)
    red <- as.matrix(dimRed(s)$L)

    # show the different point clouds
    par(mfrow = c(1, 3))

    plot(coor, type = "n", axes = FALSE, xlab = "", ylab = "")
    text(coor, labels = 1:30)
    title("Original coordinates")

    plot(mds, type = "n", axes = FALSE, xlab = "", ylab = "")
    text(mds, labels = 1:30)
    title("MDS from euclidean distances")

    plot(red, type = "n", axes = FALSE, xlab = "", ylab = "")
    text(red, labels = 1:30)
    title("dimRed from cosSparse similarity")

    par(mfrow = c(1, 1))

    # ======
    # example, using the iris data
    data(iris)
    X <- t(as.matrix(iris[, 1:4]))
    cols <- rainbow(3)[iris$Species]

    s <- cosSparse(X, norm = norm1)
    d <- dist(t(X), method = "manhattan")

    # avoid masking base::svd() and base::chol() with data objects
    red_svd <- as.matrix(dimRed(s, method = "svd")$L)
    red_chol <- as.matrix(dimRed(s, method = "cholesky")$L)
    mds <- cmdscale(d)

    par(mfrow = c(1, 3))
    plot(mds, col = cols, main = "cmdscale\nfrom manhattan distances")
    plot(red_svd, col = cols, main = "dimRed with svd\nfrom cosSparse with norm1")
    plot(red_chol, col = cols, main = "dimRed with cholesky\nfrom cosSparse with norm1")
    par(mfrow = c(1, 1))
