dsm.projection (R Documentation)
Reduce the dimensionality of a DSM by linear projection of its row vectors into a lower-dimensional subspace. Various projection methods with different properties are available.
dsm.projection(model, n, method = c("svd", "rsvd", "asvd", "ri", "ri+svd"), oversampling = NA, q = 2, rate = 0.01, power = 1, with.basis = FALSE, verbose = FALSE)
model
either an object of class dsm, or a (dense or sparse) numeric matrix of row vectors.
method
projection method to use for dimensionality reduction (see "DETAILS" below).
n
an integer specifying the number of target dimensions.
oversampling
oversampling factor for the stochastic dimensionality reduction algorithms (rsvd, asvd and ri+svd).
q
number of power iterations in the randomized SVD algorithm (Halko et al. 2009 recommend q = 1 or q = 2).
rate
fill rate of random projection vectors. Each random dimension has on average rate * d nonzero components (set to +1 or -1), where d is the number of original dimensions.
power
apply power scaling after SVD-based projection, i.e. multiply each latent dimension with a suitable power of the corresponding singular value. The default power = 1 corresponds to an ordinary orthogonal projection.
with.basis
if TRUE, the orthogonal basis of the latent subspace is returned in the attribute "basis" (only available for orthogonal projections).
verbose
if TRUE, print progress messages during the dimensionality reduction.
The following dimensionality reduction algorithms can be selected with the method argument:
svd
singular value decomposition (SVD), using the efficient SVDLIBC algorithm (Berry 1992) from package sparsesvd if the input is a sparse matrix. If the DSM has been scored with scale="center", this method is equivalent to principal component analysis (PCA).
rsvd
randomized SVD (Halko et al. 2009, p. 9) based on a factorization of rank oversampling * n with q power iterations.
asvd
approximate SVD, which determines latent dimensions from a random sample of matrix rows comprising oversampling * n data points. This heuristic algorithm is highly inaccurate and has been deprecated.
ri
random indexing (RI), i.e. a projection onto random basis vectors that are approximately orthogonal. Basis vectors are generated by setting a proportion of rate elements randomly to +1 or -1. Note that this does not correspond to a proper orthogonal projection, so the resulting coordinates in the reduced space should be used with caution.
ri+svd
RI to oversampling * n dimensions, followed by SVD of the pre-reduced matrix to the final n dimensions. This is not a proper orthogonal projection because the RI basis vectors in the first step are only approximately orthogonal.
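The randomized SVD scheme selected by method="rsvd" can be sketched in a few lines of base R. The function rand_svd below and the toy matrix are illustrative stand-ins, not the package internals: sample the range of the matrix with a Gaussian test matrix of rank oversampling * n, sharpen the spectrum with q power iterations, then compute an exact SVD of the small projected matrix.

```r
set.seed(42)

# illustrative sketch of randomized SVD (Halko et al. 2009); not the
# actual wordspace/rsvd implementation
rand_svd <- function(A, n, oversampling = 2, q = 2) {
  k <- min(n * oversampling, ncol(A))     # rank of intermediate factorization
  Omega <- matrix(rnorm(ncol(A) * k), ncol = k)
  Y <- A %*% Omega                        # sample the range of A
  for (i in seq_len(q)) {
    Y <- A %*% crossprod(A, Y)            # power iteration: (A A^T)^q A Omega
  }
  Q <- qr.Q(qr(Y))                        # orthonormal basis of the sampled range
  B <- crossprod(Q, A)                    # small k x ncol(A) matrix
  sv <- svd(B, nu = n, nv = 0)
  list(u = Q %*% sv$u,                    # lift left singular vectors back up
       d = sv$d[1:n])                     # latent coordinates are u %*% diag(d)
}

A <- matrix(rnorm(200 * 20), nrow = 200)
res <- rand_svd(A, n = 5)
```

The latent coordinates of the rows of A are then res$u %*% diag(res$d), analogous to the matrix returned by dsm.projection.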
A numeric matrix with n columns (latent dimensions) and the same number of rows as the original DSM. Some SVD-based algorithms may discard poorly conditioned singular values, returning fewer than n columns.
If with.basis=TRUE and an orthogonal projection is used, the corresponding orthogonal basis B of the latent subspace is returned in the attribute "basis". B is column-orthogonal, hence B^T projects into latent coordinates and B B^T is an orthogonal subspace projection in the original coordinate system.
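These properties of B can be verified with base R's svd(), used here as a stand-in for the package's SVD backends; the matrix M below is a made-up toy DSM.

```r
set.seed(1)
M <- matrix(rnorm(100 * 10), nrow = 100)   # toy "DSM" with 10 original dimensions
n <- 3
dec <- svd(M, nu = n, nv = n)
B <- dec$v                                  # orthogonal basis of the latent subspace
S <- M %*% B                                # latent coordinates of the row vectors

# B is column-orthogonal: t(B) %*% B is the n x n identity matrix
stopifnot(max(abs(crossprod(B) - diag(n))) < 1e-10)

# B %*% t(B) projects onto the subspace in the original coordinate system;
# as an orthogonal projection it is idempotent (applying it twice = once)
P <- tcrossprod(B)
stopifnot(max(abs(P %*% P - P)) < 1e-10)
```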
For orthogonal projections, the attribute "R2" contains a numeric vector specifying the proportion of the squared Frobenius norm of the original matrix captured by each of the latent dimensions. If the original matrix has been centered (so that a SVD projection is equivalent to PCA), this corresponds to the proportion of variance "explained" by each dimension.
For SVD-based projections, the attribute "sigma" contains the singular values corresponding to the latent dimensions. It can be used to adjust the power scaling exponent at a later time.
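Since the latent coordinates of an SVD projection are the left singular vectors scaled by powers of the singular values, a projection computed with the default power = 1 can be rescaled afterwards from the stored singular values. A sketch with base svd(), where the variables sigma and S mimic the "sigma" attribute and the projection matrix (this is not the package implementation):

```r
set.seed(7)
M <- matrix(rnorm(50 * 8), nrow = 50)
dec <- svd(M, nu = 4, nv = 4)
S <- M %*% dec$v          # power = 1 projection: columns are U_j * sigma_j
sigma <- dec$d[1:4]       # corresponds to attr(S, "sigma")

# rescale to a different exponent p: multiply column j by sigma_j^(p - 1)
p <- 0.5
S_p <- sweep(S, 2, sigma^(p - 1), "*")

# equivalent to computing U %*% diag(sigma^p) directly
stopifnot(max(abs(S_p - dec$u %*% diag(sigma^p))) < 1e-8)
```

With p = 0 all latent dimensions are scaled to equal weight, which discards the information in the singular values entirely.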
Stephanie Evert (https://purl.org/stephanie.evert)
Berry, Michael W. (1992). Large scale sparse singular value computations. International Journal of Supercomputer Applications, 6, 13–49.
Halko, N., Martinsson, P. G., and Tropp, J. A. (2009). Finding structure with randomness: Stochastic algorithms for constructing approximate matrix decompositions. Technical Report 2009-05, ACM, California Institute of Technology.
rsvd for the implementation of randomized SVD, and sparsesvd for the SVDLIBC wrapper.
# 240 English nouns in space with correlated dimensions "own", "buy" and "sell"
M <- DSM_GoodsMatrix[, 1:3]

# SVD projection into 2 latent dimensions
S <- dsm.projection(M, 2, with.basis=TRUE)

100 * attr(S, "R2")             # dim 1 captures 86.4% of distances
round(attr(S, "basis"), 3)      # dim 1 = commodity, dim 2 = owning vs. buying/selling

S[c("time", "goods", "house"), ]  # some latent coordinates

## Not run:
idx <- DSM_GoodsMatrix[, 4] > .85  # only show nouns on "fringe"
plot(S[idx, ], pch=20, col="red", xlab="commodity", ylab="own vs. buy/sell")
text(S[idx, ], rownames(S)[idx], pos=3)
## End(Not run)