data_transform | R Documentation |
Transform plain text of word vectors into wordvec (data.table) or embed (matrix), saved in a compressed ".RData" file.
data_transform(
file.load,
file.save,
as = c("wordvec", "embed"),
sep = " ",
header = "auto",
encoding = "auto",
compress = "bzip2",
compress.level = 9,
verbose = TRUE
)
file.load |
File name of raw text (must be plain text). Data must be in this format (values separated by sep):
cat 0.001 0.002 0.003 0.004 0.005 ... 0.300
dog 0.301 0.302 0.303 0.304 0.305 ... 0.600 |
file.save |
File name of to-be-saved R data (must be .RData). |
as |
Transform the text to which R object? Either "wordvec" (data.table) or "embed" (matrix). |
sep |
Column separator. Defaults to " " (a single space). |
header |
Is the 1st row a header (e.g., meta-information such as "2000000 300")? Defaults to "auto". |
encoding |
File encoding. Defaults to "auto". |
compress |
Compression method for the saved file. Defaults to "bzip2". |
compress.level |
Compression level. Defaults to 9. |
verbose |
Print information to the console? Defaults to TRUE. |
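The input layout described for file.load can be illustrated with a toy file. The following is a minimal sketch in base R only: the words, the three-dimensional vectors, and the file name toy.vec are made up for illustration, and the header row follows the "number of words, number of dimensions" convention used by fastText-style files.

```r
# Toy input in the layout described for `file.load`:
# one word per row, followed by its vector values, separated by `sep`.
toy_lines <- c(
  "2 3",                    # optional header row: 2 words, 3 dimensions
  "cat 0.001 0.002 0.003",  # made-up values
  "dog 0.301 0.302 0.303"
)
toy_file <- file.path(tempdir(), "toy.vec")  # hypothetical file name
writeLines(toy_lines, toy_file)

# Sanity check in base R: after dropping the header row,
# every data row should hold 1 word + 3 values = 4 fields.
rows <- strsplit(readLines(toy_file)[-1], " ", fixed = TRUE)
n_fields <- lengths(rows)
print(n_fields)
```

A file like this could then be passed as file.load with header=TRUE; real word-vector files have the same shape, only with far more rows and columns.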
Speed: In total (preprocess + compress + save), it can process about 30000 words/min with the slowest settings (compress="xz", compress.level=9) on a modern computer (HP ProBook 450, Windows 11, Intel i7-1165G7 CPU, 32GB RAM).
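As a rough planning aid, the quoted rate can be turned into a time estimate. This sketch assumes the rate scales linearly with vocabulary size and takes the 2000000-word vocabulary from the header example above; actual times will vary with hardware and settings.

```r
words_per_min <- 30000   # approximate rate quoted for the slowest settings
n_words <- 2000000       # vocabulary size from the "2000000 300" header example

# Assumes processing time grows linearly with the number of words
est_minutes <- n_words / words_per_min
est_hours <- est_minutes / 60
cat(sprintf("about %.0f minutes (%.1f hours)\n", est_minutes, est_hours))
```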
A wordvec (data.table) or embed (matrix).
Download pre-trained word vectors data (.RData): https://psychbruce.github.io/WordVector_RData.pdf
as_wordvec() / as_embed()
load_wordvec() / load_embed()
normalize()
data_wordvec_subset()
## Not run:
# please first manually download plain text data of word vectors
# e.g., from: https://fasttext.cc/docs/en/crawl-vectors.html
# the text file must be on your disk
# the following code cannot run unless you have the file
library(bruceR)
set.wd()
data_transform(file.load="cc.zh.300.vec",        # plain text file
               file.save="cc.zh.300.vec.RData",  # RData file
               header=TRUE, compress="xz")       # of minimal size
## End(Not run)
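To get a feel for how the choice of compress affects file size, the three method names can be compared with base R's save(). This is only a sketch: it does not call data_transform(), it assumes (without confirmation from this page) that the compress and compress.level arguments behave like the compress and compression_level arguments of save(), and the matrix is random data standing in for real word vectors.

```r
# Made-up 1000 x 50 matrix standing in for word vectors
set.seed(1)
m <- matrix(round(rnorm(1000 * 50), 3), nrow = 1000)
rownames(m) <- paste0("word", seq_len(1000))

# Save the same object with each compression method at level 9
sizes <- c(gzip = NA_real_, bzip2 = NA_real_, xz = NA_real_)
for (method in names(sizes)) {
  f <- tempfile(fileext = ".RData")
  save(m, file = f, compress = method, compression_level = 9)
  sizes[method] <- file.size(f)
}
print(sizes)  # bytes on disk per method; values depend on the data

# The saved file restores the identical object
restored <- new.env()
load(f, envir = restored)
```

Random numbers compress poorly compared with real embeddings, so the differences seen here only hint at what to expect on actual data.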