View source: R/select.lspace.R
select.lspace | R Documentation |
Retrieve information and links to latent semantic spaces (sets of word vectors/embeddings) available at osf.io/489he, and optionally download their term mappings (osf.io/xr7jv).
select.lspace(query = NULL, dir = getOption("lingmatch.lspace.dir"),
terms = NULL, get.map = FALSE, check.md5 = TRUE, mode = "wb")
query |
A character used to select spaces, based on names or other features.
If length is over 1, |
dir |
Path to a directory containing |
terms |
A character vector of terms to search for in the downloaded term map, to calculate
coverage of spaces, or select by coverage if |
get.map |
Logical; if |
check.md5 |
Logical; if |
mode |
Passed to |
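Because the default for dir is read from the lingmatch.lspace.dir option, one way to avoid passing dir on every call is to set that option once per session. The following is a minimal sketch; the temporary directory is only a stand-in for wherever spaces are actually kept, and the call to select.lspace() is left commented out so the snippet runs without the package installed.

```r
# Hypothetical location: a temporary directory stands in for a real spaces folder.
space_dir <- file.path(tempdir(), "lspaces")
dir.create(space_dir, showWarnings = FALSE, recursive = TRUE)

# select.lspace() takes its default `dir` from this option.
old <- options(lingmatch.lspace.dir = space_dir)
default_dir <- getOption("lingmatch.lspace.dir")

# With the package installed, this call would now use `space_dir` as `dir`:
# spaces <- lingmatch::select.lspace()

# Restore the previous option value when done.
options(old)
```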
A list with varying entries:
info: The version of osf.io/9yzca stored internally; a data.frame with spaces as row names, and information about each space in columns:
  terms: number of terms in the space
  corpus: corpus(es) on which the space was trained
  model: model from which the space was trained
  dimensions: number of dimensions in the model (columns of the space)
  model_info: some parameter details about the model
  original_max: maximum value used to normalize the space; the original space would be (vectors * original_max) / 100
  osf_dat: OSF id for the .dat files; the URL would be https://osf.io/osf_dat
  osf_terms: OSF id for the _terms.txt files; the URL would be https://osf.io/osf_terms
  wiki: link to the wiki for the space
  downloaded: path to the .dat file if downloaded, and '' otherwise.
selected: A subset of info selected by query.
term_map: If get.map is TRUE or lma_term_map.rda is found in dir, a copy of osf.io/xr7jv, which has space names as column names, terms as row names, and indices as values, with 0 indicating the term is not present in the associated space.
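The original_max, osf_dat, and osf_terms columns can be illustrated with a small sketch. The data.frame and matrix below are toy stand-ins, not real entries: the ids, the maximum values, and the vectors are all made up, and the sketch only assumes that stored vectors are on the normalized scale implied by the formula for original_max.

```r
# Toy stand-in for two rows of `info`; ids and values are made up.
info <- data.frame(
  original_max = c(2.5, 4),
  osf_dat = c("abc12", "ghi56"),
  osf_terms = c("def34", "jkl78"),
  row.names = c("toy_a", "toy_b"),
  stringsAsFactors = FALSE
)

# Toy stored vectors for "toy_a": 2 terms by 3 dimensions.
vectors <- matrix(c(100, -40, 0, 20, 60, -100), nrow = 2)

# Undo the normalization: (vectors * original_max) / 100
original <- (vectors * info["toy_a", "original_max"]) / 100

# Build the file URLs by appending each OSF id to the base address.
dat_url <- paste0("https://osf.io/", info["toy_a", "osf_dat"])
terms_url <- paste0("https://osf.io/", info["toy_a", "osf_terms"])
```

With the real info data.frame, the same indexing by space name should apply, since spaces are the row names.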
Other Latent Semantic Space functions: download.lspace(), lma_lspace(), standardize.lspace()
# just retrieve information about available spaces
spaces <- select.lspace()
spaces$info[1:10, c("terms", "dimensions", "original_max")]
# retrieve all spaces that used word2vec
w2v_spaces <- select.lspace("word2vec")$selected
w2v_spaces[, c("terms", "dimensions", "original_max")]
## Not run:
# select spaces by terms
select.lspace(terms = c(
  "part-time", "i/o", "'cause", "brexit", "debuffs"
))$selected[, c("terms", "coverage")]
## End(Not run)
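To make the idea of selecting by terms more concrete, the sketch below computes a coverage value from a toy term map laid out as described for term_map (terms as row names, spaces as column names, 0 for an absent term). The map, its space names, and its indices are made up, and treating coverage as the proportion of queried terms present in each space is an assumption rather than a statement of how select.lspace() computes it.

```r
# Toy term map: terms as row names, spaces as column names, 0 = term absent.
# All names and indices are made up for illustration.
term_map <- matrix(
  c(1, 2, 0, 3,
    5, 0, 0, 0),
  ncol = 2,
  dimnames = list(
    c("brexit", "i/o", "debuffs", "'cause"),
    c("space_a", "space_b")
  )
)

terms <- c("brexit", "debuffs", "'cause")

# Assumed definition: share of queried terms with a nonzero index in each space.
coverage <- colMeans(term_map[terms, , drop = FALSE] != 0)

# Spaces ordered from best to worst coverage.
ranked <- names(sort(coverage, decreasing = TRUE))
```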