select.lspace: Select Latent Semantic Spaces


Retrieve information about, and links to, the available latent semantic spaces (sets of word vectors/embeddings), and optionally download their term mappings (lma_term_map.rda).


select.lspace(query = NULL, dir = getOption("lingmatch.lspace.dir"),
  terms = NULL, get.map = FALSE, check.md5 = TRUE, mode = "wb")



query: A character used to select spaces, based on names or other features. If its length is over 1, get.map is set to TRUE. Use terms alone to select spaces based on term coverage.


dir: Path to a directory containing lma_term_map.rda and downloaded spaces;
by default, the function looks in getOption('lingmatch.lspace.dir') and '~/Latent Semantic Spaces'.
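
As a rough illustration of the lookup described above, the directory could be resolved along these lines. This is a sketch, not the package's internal code: resolve_lspace_dir is a hypothetical helper, and the order (option first, then the default folder) is an assumption.

```r
# Hypothetical helper: resolve the directory as described above.
# Assumption: the option is tried first, then the default folder.
resolve_lspace_dir <- function(dir = getOption("lingmatch.lspace.dir")) {
  if (is.null(dir) || !nzchar(dir)) dir <- "~/Latent Semantic Spaces"
  path.expand(dir)
}

lspace_dir <- resolve_lspace_dir()

# TRUE only if the term map has already been downloaded to that directory
has_map <- file.exists(file.path(lspace_dir, "lma_term_map.rda"))
```

Checking for lma_term_map.rda first makes it clear whether terms-based selection can work without a download.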


terms: A character vector of terms to search for in the downloaded term map, to calculate coverage of spaces, or to select by coverage if query is not specified.
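
To make the idea of coverage concrete, here is a minimal sketch using a made-up term map in the layout this page describes for term_map (terms as row names, spaces as column names, 0 for absent terms). The space names and values are invented, and treating coverage as a proportion of the requested terms is an assumption; the package may report it differently.

```r
# Toy term map (made-up values): rows are terms, columns are spaces,
# values are indices within each space, and 0 means the term is absent.
term_map <- matrix(
  c(1, 0, 3,
    2, 5, 0,
    0, 0, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    c("brexit", "part-time", "debuffs"),
    c("space_a", "space_b", "space_c")
  )
)

terms <- c("brexit", "part-time", "debuffs", "i/o")

# Terms missing from the map entirely count as not covered.
found <- terms[terms %in% rownames(term_map)]

# Proportion of the requested terms present in each space.
coverage <- colSums(term_map[found, , drop = FALSE] != 0) / length(terms)
sort(coverage, decreasing = TRUE)
```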

get.map: Logical; if TRUE and lma_term_map.rda is not found in dir, the term map (lma_term_map.rda) is downloaded and decompressed.


check.md5: Logical; if TRUE (default), retrieves the MD5 checksum from OSF and compares it with the checksum calculated from the downloaded file to check its integrity.
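
The same kind of integrity check can be sketched by hand with tools::md5sum. This is only an illustration: verify_md5 is a hypothetical helper, and the expected checksum is computed locally so the example is self-contained, whereas the function retrieves it from OSF.

```r
# Write a small temporary file to stand in for a downloaded file.
tmp <- tempfile(fileext = ".txt")
writeLines("hello", tmp)

# In practice the expected checksum would come from the host;
# here it is computed locally just to keep the example self-contained.
expected_md5 <- unname(tools::md5sum(tmp))

# Hypothetical helper: TRUE if the file's MD5 matches the expected value.
verify_md5 <- function(path, expected) {
  identical(unname(tools::md5sum(path)), tolower(expected))
}

verify_md5(tmp, expected_md5)
```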


mode: Passed to download.file when downloading the term map.


A list with varying entries:

  • info: The version of the space information stored internally; a data.frame with spaces as row names, and information about each space in columns:

    • terms: number of terms in the space

    • corpus: corpus(es) on which the space was trained

    • model: model from which the space was trained

    • dimensions: number of dimensions in the model (columns of the space)

    • model_info: some parameter details about the model

    • original_max: maximum value used to normalize the space; the original space would be (vectors * original_max) / 100

    • osf_dat: OSF id for the .dat files, used to build their download URL

    • osf_terms: OSF id for the _terms.txt files, used to build their download URL

    • wiki: link to the wiki for the space

    • downloaded: path to the .dat file if downloaded, and '' otherwise.

  • selected: A subset of info selected by query.

  • term_map: If get.map is TRUE or lma_term_map.rda is found in dir, a copy of the term map, which has space names as column names, terms as row names, and indices as values, with 0 indicating the term is not present in the associated space.
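
The original_max entry above gives the formula for recovering the original scale of a space. A minimal sketch with made-up values (the matrix and the 0.8 are invented for illustration, not taken from any real space):

```r
# Toy normalized space (made-up values): rows are terms, columns are dimensions.
vectors <- matrix(
  c(100, -50, 25, 0),
  nrow = 2,
  dimnames = list(c("term_a", "term_b"), NULL)
)

# Hypothetical value standing in for a space's original_max entry.
original_max <- 0.8

# Undo the normalization, following the formula given for original_max.
original <- (vectors * original_max) / 100
```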

# just retrieve information about available spaces
spaces <- select.lspace()
spaces$info[1:10, c("terms", "dimensions", "original_max")]

# retrieve all spaces that used word2vec
w2v_spaces <- select.lspace("word2vec")$selected
w2v_spaces[, c("terms", "dimensions", "original_max")]

## Not run: 

# select spaces by terms
select.lspace(terms = c(
  "part-time", "i/o", "'cause", "brexit", "debuffs"
))$selected[, c("terms", "coverage")]

## End(Not run)

