| select.lspace | R Documentation |
Retrieve information and links to latent semantic spaces (sets of word vectors/embeddings) available at osf.io/489he, and optionally download their term mappings (osf.io/xr7jv).
select.lspace(query = NULL, dir = getOption("lingmatch.lspace.dir"),
terms = NULL, get.map = FALSE, check.md5 = TRUE, mode = "wb")
query |
A character used to select spaces, based on names or other features.
If length is over 1, |
dir |
Path to a directory containing lma_term_map.rda and any downloaded spaces.
terms |
A character vector of terms to search for in the downloaded term map, to calculate
coverage of spaces, or select by coverage if query is not specified.
get.map |
Logical; if TRUE and lma_term_map.rda is not found in dir, the term map (osf.io/xr7jv) is downloaded.
check.md5 |
Logical; if TRUE, the MD5 checksum of the downloaded term map is checked against the one reported by OSF.
mode |
Passed to download.file when downloading the term map.
A list with varying entries:
info: The version of osf.io/9yzca stored internally; a data.frame with spaces as row names, and information about each space in columns:
  terms: number of terms in the space
  corpus: corpus(es) on which the space was trained
  model: model from which the space was trained
  dimensions: number of dimensions in the model (columns of the space)
  model_info: some parameter details about the model
  original_max: maximum value used to normalize the space; the original space would be (vectors * original_max) / 100
  osf_dat: OSF id for the .dat files; the URL would be https://osf.io/osf_dat
  osf_terms: OSF id for the _terms.txt files; the URL would be https://osf.io/osf_terms
  wiki: link to the wiki for the space
  downloaded: path to the .dat file if downloaded, and '' otherwise.
selected: A subset of info selected by query.
term_map: If get.map is TRUE or lma_term_map.rda is found in dir, a copy of osf.io/xr7jv, which has space names as column names, terms as row names, and indices as values, with 0 indicating the term is not present in the associated space.
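As a rough illustration of how the info columns described above might be used, the sketch below builds download URLs from osf_dat and rescales stored vectors with original_max. It does not call select.lspace; the toy_info data.frame, its space names, its OSF ids, and the stored matrix are all made-up stand-ins for what the function would return.

```r
# Hypothetical stand-in for the `info` entry; every value here is invented.
toy_info <- data.frame(
  dimensions = c(300L, 100L),
  original_max = c(5.2, 1.8),
  osf_dat = c("abc12", "xyz89"),
  downloaded = c("", ""),
  row.names = c("toy_space_a", "toy_space_b"),
  stringsAsFactors = FALSE
)

# The URL of each .dat file is the OSF id appended to https://osf.io/
dat_urls <- paste0("https://osf.io/", toy_info$osf_dat)
names(dat_urls) <- rownames(toy_info)

# Stored vectors are normalized; the original scale would be
# (vectors * original_max) / 100
stored <- matrix(c(100, -50, 25, 0), nrow = 2)
original <- (stored * toy_info["toy_space_a", "original_max"]) / 100

# Spaces that have not been downloaded have '' in the downloaded column
not_downloaded <- rownames(toy_info)[toy_info$downloaded == ""]
```

With a real result, toy_info would be replaced by something like select.lspace()$info, and stored by the vectors read from a downloaded .dat file.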
Other Latent Semantic Space functions:
download.lspace(),
lma_lspace(),
standardize.lspace()
# just retrieve information about available spaces
spaces <- select.lspace()
spaces$info[1:10, c("terms", "dimensions", "original_max")]
# retrieve all spaces that used word2vec
w2v_spaces <- select.lspace("word2vec")$selected
w2v_spaces[, c("terms", "dimensions", "original_max")]
## Not run:
# select spaces by terms
select.lspace(terms = c(
"part-time", "i/o", "'cause", "brexit", "debuffs"
))$selected[, c("terms", "coverage")]
## End(Not run)
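To make the coverage column in the last example more concrete, here is a minimal base R sketch of one way coverage could be computed from a term map laid out as described for term_map (terms as row names, spaces as column names, 0 for absent terms). The toy_map matrix and its space names are invented, and the proportion used here (matched terms over all requested terms) is an assumption; the calculation inside select.lspace may differ.

```r
# Hypothetical term map: values are indices into each space, 0 means absent.
toy_map <- matrix(
  c(1L, 2L, 0L,  # column space_a
    1L, 0L, 0L), # column space_b
  nrow = 3,
  dimnames = list(c("brexit", "part-time", "debuffs"), c("space_a", "space_b"))
)

terms <- c("brexit", "part-time", "unseen")

# Keep only the requested terms that appear in the map at all
found <- terms[terms %in% rownames(toy_map)]

# Proportion of requested terms with a non-zero index in each space
coverage <- colSums(toy_map[found, , drop = FALSE] != 0) / length(terms)

# Order spaces from best to worst coverage
ranked <- sort(coverage, decreasing = TRUE)
```

Using drop = FALSE keeps the subset a matrix even when only one term is found, so colSums still returns one value per space.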