LSAfun-package: Computations based on Latent Semantic Analysis

LSAfun-package    R Documentation

Computations based on Latent Semantic Analysis

Description

Offers methods and functions for working with vector space models of semantics (also known as distributional semantic models or word embeddings). The package was originally written for Latent Semantic Analysis (LSA), but it can be used with any vector space model. Such models are created by algorithms that operate on a corpus of text documents and produce high-dimensional vector representations of word (and document) meanings. The exact LSA algorithm is described in Martin & Berry (2007).

Such a representation allows word (and document) similarities to be computed, for example as the cosine of the angle between two vectors.
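As a minimal sketch of this similarity computation, the cosine of the angle between two vectors can be written in base R as follows. The helper name cosine_sim and the toy vectors are illustrative assumptions and are not part of LSAfun.

```r
# Cosine of the angle between two numeric vectors (base R only).
# cosine_sim is a hypothetical helper name used for illustration.
cosine_sim <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

# Toy three-dimensional "word vectors" (made-up values).
v_cat <- c(1, 2, 0)
v_dog <- c(2, 4, 0)   # same direction as v_cat
v_car <- c(0, 0, 3)   # orthogonal to v_cat

cosine_sim(v_cat, v_dog)  # 1, up to floating-point rounding
cosine_sim(v_cat, v_car)  # 0
```

Because the cosine depends only on the direction of the vectors, v_cat and v_dog count as maximally similar even though their lengths differ.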

The focus of this package

This package is not designed to create LSA semantic spaces. In R, this functionality is provided by the package lsa. The focus of LSAfun is to provide functions that are applied to existing LSA (or other) semantic spaces, including

  1. Similarity Computations

  2. Neighborhood Computations

  3. Applied Functions

  4. Composition Methods

Video Tutorials

A video tutorial for this package can be found here: https://youtu.be/IlwIZvM2kg8

A video tutorial for using this package with vision-based representations from deep convolutional neural networks can be found here: https://youtu.be/0PNrXraWfzI

How to obtain a semantic space

LSAfun comes with one example LSA space, the wonderland space.
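A short sketch of how the bundled space might be explored is given below. It assumes LSAfun is installed and that the words "alice" and "rabbit" occur in the wonderland space. The requireNamespace() guard is only there so that the snippet does nothing when the package is missing.

```r
# Sketch: explore the bundled example space (requires LSAfun).
if (requireNamespace("LSAfun", quietly = TRUE)) {
  library(LSAfun)
  data(wonderland)   # loads the example space as `wonderland`

  # Rows are words, columns are dimensions of the space.
  print(dim(wonderland))

  # Cosine similarity between two words in the space.
  print(Cosine("alice", "rabbit", tvectors = wonderland))

  # The 5 nearest neighbors of a word, ranked by cosine.
  print(neighbors("alice", n = 5, tvectors = wonderland))
}
```

The same tvectors argument accepts any other semantic space in matrix form, with one row per word and the words as row names.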

This package can also directly use LSA semantic spaces created with the lsa package, so users can work with their own LSA spaces. (Note that the function lsa returns a list of three matrices. Of those, the term matrix U should be used.)
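To illustrate what the term matrix U is, the following sketch runs a plain truncated singular value decomposition in base R on a made-up term-by-document matrix. It is not the implementation used by the lsa package, which may apply further steps. The counts, the term names, and the choice of k = 2 are assumptions for illustration. In the object returned by lsa(), the term matrix is the element named tk.

```r
# Toy term-by-document count matrix (made-up counts; rows = terms).
tdm <- matrix(
  c(2, 0, 1, 0,
    1, 0, 2, 0,
    0, 3, 0, 1,
    0, 1, 0, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("cat", "dog", "car", "road"), paste0("doc", 1:4))
)

k <- 2            # number of dimensions to keep (assumed)
dec <- svd(tdm)   # tdm = U %*% diag(d) %*% t(V)

# Term matrix U, truncated to k dimensions; one row per term.
U <- dec$u[, 1:k, drop = FALSE]
rownames(U) <- rownames(tdm)

U                 # a matrix of term vectors with words as row names
```

A matrix of this shape, with one row per word and the words as row names, is the form in which a semantic space is passed to the functions in this package.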

The lsa package works with (very) small corpora but has difficulty scaling up to larger ones. For larger corpora, it is recommended to use specialized software for creating semantic spaces, such as

  • S-Space (Jurgens & Stevens, 2010)

  • SemanticVectors (Widdows & Ferraro, 2008)

  • gensim (Rehurek & Sojka, 2010)

  • DISSECT (Dinu, Pham, & Baroni, 2013)

Downloading semantic spaces

Another possibility is to use one of the semantic spaces provided at https://sites.google.com/site/fritzgntr/software-resources. These are stored in the .rda format. To load one of these spaces into the R workspace, save the file to a directory, set the working directory to that directory, and load the space using load().
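The loading step can be sketched as follows. Since no real download is involved here, a tiny made-up matrix saved to a temporary directory stands in for a downloaded space. The object name toy_space and the file name are hypothetical. The sketch passes the full file path to load() instead of changing the working directory, which has the same effect.

```r
# Stand-in for a downloaded space: a tiny matrix saved as .rda.
# The object name `toy_space` and the file name are hypothetical.
toy_space <- matrix(c(0.1, 0.2, 0.3, 0.4), nrow = 2,
                    dimnames = list(c("word1", "word2"), NULL))
space_file <- file.path(tempdir(), "toy_space.rda")
save(toy_space, file = space_file)
rm(toy_space)

# load() restores the saved object(s) under their original names
# and returns those names as a character vector.
loaded <- load(space_file)
loaded                   # "toy_space"

# Fetch the restored object by name.
space <- get(loaded[1])
dim(space)               # 2 2
```

Capturing the return value of load() is useful because the object stored inside an .rda file need not have the same name as the file.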

Author(s)

Fritz Guenther


LSAfun documentation built on Nov. 18, 2023, 1:10 a.m.