read.dsm.matrix | R Documentation |
This function loads a DSM matrix from a disk file in the specified format (see section ‘Formats’ for details).
read.dsm.matrix(file, format = c("word2vec"), encoding = "UTF-8", batchsize = 1e6, verbose=FALSE)
file |
either a character string naming a file or a |
format |
input file format (see section ‘Formats’). The input file format cannot be guessed automatically. |
encoding |
character encoding of the input file (ignored if |
batchsize |
for certain input formats, the matrix is read in batches of |
verbose |
if |
In order to read text formats from a compressed file, pass a gzfile, bzfile or xzfile connection with appropriate encoding in the argument file. Make sure not to open the connection before passing it to read.dsm.matrix.
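As a minimal sketch of the compressed-file case, the snippet below makes a gzip-compressed copy of the sample file that ships with the package and reads it back through an unopened gzfile connection. The temporary file name gz_fn and the copy step are assumptions made for illustration only; in practice the compressed file would already exist.

```r
library(wordspace)

# Sample file shipped with the package (same file as in the package example)
fn <- system.file("extdata", "word2vec_hiero.txt", package = "wordspace")

# Illustration only: create a gzip-compressed copy in a temporary location
gz_fn <- tempfile(fileext = ".txt.gz")
con_out <- gzfile(gz_fn, open = "wt", encoding = "UTF-8")
writeLines(readLines(fn, encoding = "UTF-8"), con_out)
close(con_out)

# Pass the connection without opening it; the encoding is set on the
# connection itself rather than through the encoding argument
M <- read.dsm.matrix(gzfile(gz_fn, encoding = "UTF-8"), format = "word2vec")
dim(M)
```

The result should have the same dimensions and row labels as a direct read of the uncompressed file.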
Currently, the only supported file format is word2vec.
word2vec
This widely used text format for word embeddings is only suitable for a dense matrix. Row labels must be unique and may not contain whitespace. Values are usually rounded to a few decimal digits in order to keep file size manageable.
The first line of the file lists the matrix dimensions (rows, columns) separated by a single blank. It is followed by one text line for each matrix row, starting with the row label. The label and the cell values are separated by single blanks, so row labels cannot contain whitespace.
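To make the layout concrete, the following sketch writes a toy file in this format and parses it by hand with base R. The labels, the values and the name toy_fn are invented for illustration, and the hand-rolled parser only mirrors the description above; it is not how read.dsm.matrix is implemented.

```r
# Toy file: header line "rows columns", then one line per matrix row
lines <- c("3 2",
           "cat 0.12 -0.50",
           "dog 0.10 -0.45",
           "tree -0.80 0.33")
toy_fn <- tempfile(fileext = ".txt")
writeLines(lines, toy_fn)

# Hand-rolled parse, only to make the layout explicit
txt <- readLines(toy_fn)
dims <- as.integer(strsplit(txt[1], " ", fixed = TRUE)[[1]])  # rows, columns
fields <- strsplit(txt[-1], " ", fixed = TRUE)                # split on single blanks
labels <- vapply(fields, `[`, character(1), 1)                # first field is the row label
values <- t(vapply(fields, function(f) as.numeric(f[-1]), numeric(dims[2])))
rownames(values) <- labels

# The header must match the data, and row labels must be unique
stopifnot(identical(dim(values), dims), !anyDuplicated(labels))
```

A file laid out like toy_fn is the kind of input that read.dsm.matrix(toy_fn, format="word2vec") is documented to accept.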
Stephanie Evert (https://purl.org/stephanie.evert)
write.dsm.matrix, read.dsm.triplet, read.dsm.ucs
fn <- system.file("extdata", "word2vec_hiero.txt", package="wordspace")
read.dsm.matrix(fn, format="word2vec")